Another day, another odd error. Another trip into the deep, dark depths of Windows. Another enlightening find that reminded me of the inter-dependency of Lync, Windows, and SQL Server.
Product: Microsoft Lync Server 2013, Front End Server - Update 'Lync Server 2013 (KB3120728)' could not be installed. Error code 1603. Additional information is available in the log file D:\Source\Microsoft\05-Lync Server 2013 - Jan 2016 CU\Server.msp-computername-[2016-05-06][15-19-10]_log.txt
So how did this error come about, you ask?
The Back Story
This error was part of a new Front End Pool installation. At this point in the process I had completed the following tasks:
- SQL Express instances had been pre-installed
- Lync Server 2013 Core Components were installed
- Lync Server 2013 deployment wizard steps 1 & 2 were run:
  - Local Configuration Store
  - Local Components and Services
The error itself appeared when I attempted to run the LyncServerUpdateInstaller.exe patch for the January 2016 Cumulative Update. Typically this is a slam-dunk process and goes without issue, but the Front End Server patch failed and rolled back. Examining the log file named in the error message was ultimately helpful, but given the amount of information in there, it was truly like finding a needle in a haystack. But the needle was found:
Product: Microsoft Lync Server 2013, Front End Server -- Error 29024. Error 0x80004005 (Unspecified error) occurred while executing command 'D:\Program Files\Microsoft SQL Server\110\Tools\Binn\osql.exe'. For more details check log file 'C:\users\username\AppData\Local\Temp\LCSSetup_Commands.log'.
A log file within a log file…interesting… Alright, I’ll follow the bread crumbs:
Msg 5011, Level 13, State 9, Server computername\RTCLOCAL, Line 5
User does not have permission to alter database 'rtc', the database does not exist, or the database is not in a state that allows access checks.
Msg 5069, Level 16, State 1, Server computername\RTCLOCAL, Line 5
ALTER DATABASE statement failed.
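For anyone facing a similar haystack, here is a minimal sketch of how the two needles above could be pulled out of the logs with a script. The patterns are my own assumptions, based on the excerpts quoted above plus the "Return value 3" marker that Windows Installer logs write when an action fails; the function name is hypothetical.

```python
import re

# Assumed patterns: derived from the log excerpts in this post, not from any
# official list of Lync or Windows Installer error formats.
NEEDLES = [
    re.compile(r"\bError \d{4,5}\."),               # e.g. "Error 29024."
    re.compile(r"\bReturn value 3\b"),              # MSI action failure marker
    re.compile(r"\bMsg \d+, Level \d+, State \d+"),  # SQL Server error header
]


def find_needles(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in NEEDLES):
            hits.append((number, line.strip()))
    return hits
```

To use it, pass in the lines of the patch log and then of LCSSetup_Commands.log. Windows Installer logs are often UTF-16 encoded, so opening the file with encoding="utf-16" may be necessary; treat that as an assumption to verify against your own log.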
The KB installer is calling an executable, osql.exe, and using a T-SQL script to initiate changes. I had to look up each of the osql.exe command line switches, but the one that is most important to notice is the “-E”:
Uses a trusted connection instead of requesting a password
Effectively what that switch means is “use Windows Integrated Authentication”, which in turn means that my user account should be used. My user account has all the rights in the world (including sysadmin in SQL), so why is this failing? I tried many, many things, even going so far as blowing away Lync (bootstrapper /scorch) and the SQL databases, but the CU installer failed with the same error every time.
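One quick sanity check at this stage is to ask SQL Server which login a trusted connection actually maps to. The sketch below is not something the installer does; it is a hypothetical helper that builds an osql.exe call using the same -E switch, plus -S (server) and -Q (run a query and exit), and asks for SUSER_SNAME(). It assumes osql.exe is on the PATH.

```python
import subprocess


def osql_whoami_args(server):
    """Build an osql command line that reports the connected login name.

    -E  trusted connection (Windows Integrated Authentication)
    -S  server\\instance to connect to
    -Q  run the query, then exit
    """
    return ["osql", "-E", "-S", server, "-Q", "SELECT SUSER_SNAME()"]


def whoami(server):
    """Run the query and return raw osql output (needs osql.exe on PATH)."""
    result = subprocess.run(
        osql_whoami_args(server), capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling whoami(r"computername\RTCLOCAL") returns the login of whatever account launched the process, because -E hands the caller's Windows identity to SQL Server rather than a name and password.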
The Plot Thickens
Given my failure and frustration, I fired up SQL Tracing:
I was very, very surprised to see the “NT AUTHORITY\SYSTEM” account being used for the LoginName. My user account is launching the application executable, so why aren’t those credentials being used!? Looking at Management Studio for the RTCLOCAL instance, the “NT AUTHORITY\SYSTEM” account does have a login, but it is not granted any elevated permissions or rights:
No sysadmin role means that it cannot alter databases within the instance. That’s sort of an explanation, but why is this the first time I’m seeing this problem!?
The ultimate epiphany came when I began to look at how the LyncServerUpdateInstaller.exe worked. The executable extracts .MSP files that contain each of the individual Lync Server application patches. The .MSP file contains all the logic and T-SQL scripts that are being executed for this particular Front End Server patch. The big difference is found in how the .EXE and .MSP differ:
- My user account launches the LyncServerUpdateInstaller.exe executable
- My user account is used to initially launch the .MSP files, but the Local System account actually runs the .MSP files.
Microsoft patch files get executed by the Local System account, so that explains why the -E switch to the osql.exe command was passing the “NT AUTHORITY\SYSTEM” credentials. The osql.exe executable was being called by the .MSP file and that .MSP file was run with the SYSTEM account. OK, fair enough, but why aren’t permissions correct on my SQL configuration, especially considering I’ve done this hundreds of times before without any previous issue?!
I looked at a few other server installs within this environment and within the TechNet virtual labs, and there was one SQL Server login that was missing from this server: BUILTIN\Administrators.
On the other servers this group was granted sysadmin rights, which meant that any local admin of the server had sysadmin rights within SQL. Nearly any SQL administrator will advocate against having the local Administrators group as a login, and generally I would agree that is a best practice. Given all this information, however, it still didn’t explain why the patching process was failing, so further research was required…
The ultimate “A-HA!” moment came when I ran across these articles whilst searching for the relationship between the Local System account and the built-in Administrators group:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684190(v=vs.85).aspx
The LocalSystem account is a predefined local account used by the service control manager. This account is not recognized by the security subsystem, so you cannot specify its name in a call to the LookupAccountName function. It has extensive privileges on the local computer, and acts as the computer on the network. Its token includes the NT AUTHORITY\SYSTEM and BUILTIN\Administrators SIDs; these accounts have access to most system objects.
https://technet.microsoft.com/en-us/library/cc778824(v=ws.10).aspx
System is a hidden member of Administrators. That is, any process running as System has the SID for the built-in Administrators group in its access token.
Go ahead and read those again. See if the “A-HA!” moment comes to you, too… OK…I’ll help you…
Effectively what the articles are saying is that the Local System account is, by default, a bona fide member of the BUILTIN\Administrators group because its token includes the Administrators group SID. Taking it one step further: what server role does that group have on the working servers’ SQL instances? That’s right: sysadmin. Since the .MSP file accesses the SQL instance with the Local System token, and the Administrators group had no login in this instance, it had no rights to actually update the databases.
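To make that reasoning concrete, here is a deliberately simplified model of it. This is not how SQL Server is implemented (it ignores DENY, disabled logins and so on); it only illustrates that a connection picks up server roles through every SID in its token that has a login. The SIDs are the well-known values for the two principals; the function and the two server dictionaries are hypothetical.

```python
SYSTEM_SID = "S-1-5-18"              # NT AUTHORITY\SYSTEM
ADMINISTRATORS_SID = "S-1-5-32-544"  # BUILTIN\Administrators


def effective_server_roles(token_sids, logins):
    """Union the server roles of every login whose SID is in the token."""
    roles = set()
    for sid in token_sids:
        roles |= logins.get(sid, set())
    return roles


# A process running as Local System carries both SIDs in its token.
system_token = {SYSTEM_SID, ADMINISTRATORS_SID}

# Working server: the Administrators group has a login with sysadmin.
working_server = {
    SYSTEM_SID: {"public"},
    ADMINISTRATORS_SID: {"public", "sysadmin"},
}

# Failing server: only the SYSTEM login remains, with the public role.
failing_server = {SYSTEM_SID: {"public"}}
```

Evaluating effective_server_roles(system_token, working_server) includes sysadmin, while the same token against failing_server yields only public, which is the difference between a patch that applies and one that rolls back.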
Note: recall that the SYSTEM account did have a login within SQL, but the available server role rights were set to ‘public’ which means it basically had no rights to do much of anything.
I manually added the “BUILTIN\Administrators” group as a login on the offending Front End Server’s local SQL instances and granted that group sysadmin rights. I re-ran LyncServerUpdateInstaller.exe and…SUCCESS!!!
Had I not found those two articles, I might never have known the true reason for the behavior I was seeing. The fix was simply making sure the “BUILTIN\Administrators” login matched the configuration of the other servers, which in turn matched the TechNet virtual labs.
Coming full circle: the group was present when I initially pre-installed SQL, but the SQL team later removed BUILTIN\Administrators for security reasons, and that removal was unbeknownst to me. All of Microsoft’s standard Lync and Skype installers include that group for the SQL instances (and grant it sysadmin), so it truly is critical that the login exists for the purposes of patching. As I saw, installation of the product succeeded, but patches failed outright because the patching process uses the SYSTEM account and not a specific user account.
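I made the change by hand in Management Studio, but the same thing could be scripted. The sketch below only generates the T-SQL; it assumes SQL Server 2012 or later (where ALTER SERVER ROLE ... ADD MEMBER is available) and that the login does not already exist, since CREATE LOGIN fails if it does. The helper names are hypothetical.

```python
def quote_name(name):
    """Bracket-quote an identifier, doubling any closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def grant_sysadmin_sql(windows_principal):
    """Return T-SQL that creates a Windows login and adds it to sysadmin."""
    principal = quote_name(windows_principal)
    return "\n".join([
        f"CREATE LOGIN {principal} FROM WINDOWS;",
        f"ALTER SERVER ROLE [sysadmin] ADD MEMBER {principal};",
    ])
```

Calling grant_sysadmin_sql("BUILTIN\\Administrators") produces two statements that can be run against each local instance (for example computername\RTCLOCAL) from Management Studio or a trusted osql.exe session under an account that is already sysadmin.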
Note: As an alternative workaround, you could grant the “NT AUTHORITY\SYSTEM” account in SQL sysadmin rights for the purposes of patching processes, but I doubt many people would want to undergo that additional management complexity.
Bottom line: if you choose to change the sysadmin rights on your Lync Front Ends and remove the Administrators group, be aware of this issue and plan for workarounds accordingly!
Note: This issue is another good case study that belongs in my other post, ‘The Dangers of SQL Server Security Hardening for Lync Server & Skype4B Server’, but I separated it into a distinct post for the sake of clarity and so that it would be more easily discoverable via search engines.