When you install Lync 2013 and configure the back end for mirroring, a lot of things happen “under the hood” to configure the mirror (and witness, if applicable). Microsoft really has done a fantastic job of hiding the overall complexity of SQL mirroring through Lync topology builder, but suffice it so say, there are many SQL pieces that get configured and more than a few moving parts. Following the completion of the Lync back end install many companies choose to go back and perform some cleanup duties on the SQL servers. Depending on the company and the DBA, there could be lots of customization, or as I’ve encountered with most deployments, the DBAs usually just want to change the owner of the databases and mirroring endpoint that got created. The assumption by many (myself included) was that this would be a completely seamless and a benign change, but I learned during a recent deployment that it can result in SQL issues that 1) cause the mirror to stop functioning and 2) put both SQL servers in a state where databases could not be accessed on either server. Initially I thought this was something specific to the customer’s SQL configuration, but I was able to confirm this exact behavior in my lab so I knew that the issue could potentially be more widespread than I first thought. It took more than a few hours to figure out the root cause, but thankfully the fix is quick and easy. Read on for the exciting conclusion!
Note: While I expect this issue to be a “your-mileage-may-vary” type of issue, it did give me pause that I could re-create this in my lab, which is very simple and has no complexity. Some may experience this issue and others may not. If you do, the info below should save you some heartache.
Understanding SQL Mirroring
For those less familiar with the inner workings of SQL mirroring, Microsoft has some great information on TechNet that will help you better understand what we’re about to discuss. I highly suggest you read the TechNet documentation first before continuing on.
SQL mirror functionality relies on mirroring endpoints, configured within each SQL instance, to allow inter-server communication. There are two easy ways to view the mirror endpoints:
Option 1 – In SQL Management Studio, go to Server Objects>Endpoints>Database Mirroring
Option 2 – In SQL Management Studio, run this query:
select * from sys.database_mirroring_endpoints
In order to use the mirroring endpoint, logins must exist on the SQL server instance for the account that the SQL services are running under on the server. Just as a domain user account must be added to the logins to connect to a SQL server, so too must the account(s) that the SQL server instance are running under. The principal server will have logins for the mirror and witness, the mirror server will have logins for the principal and witness, and the witness server will have logins for the principal and mirror. In my lab the SQL instances are using the NETWORK SERVICE account on the server itself, so the domain computer accounts are logins within each SQL instance. If your environment uses dedicated domain user accounts to run SQL, then you would see the domain user accounts as logins within each SQL instance. For example, my principal server logins are shown below:
Additionally, the logins must be granted permissions to connect to the mirroring endpoint. Just as a domain user account must be granted access to a database to successfully connect, so too must accounts be given explicit access to consume the database mirroring endpoint. The permission required is a GRANT CONNECT permission, and this permission will match up with the logins on each server. The principal server will have GRANT CONNECT for the mirror and witness, the mirror server will have GRANT CONNECT for the principal and witness, and the witness server will have GRANT CONNECT for the principal and mirror. Remember that the permissions will always be granted to the account that SQL is running under. For example, I can use the following T-SQL query to examine the current permissions on my principal SQL server:
SELECT EPS.name, SPS.STATE, CONVERT(nvarchar(38), SUSER_NAME(SPS.grantor_principal_id))AS [GRANTED BY], SPS.TYPE AS PERMISSION, CONVERT(nvarchar(46),SUSER_NAME(SPS.grantee_principal_id))AS [GRANTED TO] FROM sys.server_permissions SPS , sys.endpoints EPS WHERE SPS.major_id = EPS.endpoint_id ORDER BY Permission,[GRANTED BY], [GRANTED TO]
Lastly, the mirroring endpoint will have an owner who was the person that initially ran Lync topology builder and installed the databases. The owner will be that person’s domain user account and can be determined by the following T-SQL query:
SELECT [PrincipalName] = sp.name, [PrincipalId] = sp.principal_id, me.* FROM sys.database_mirroring_endpoints me with(nolock) inner join sys.server_principals sp with(nolock) on me.principal_id = sp.principal_id
The Issue
As stated at the beginning of the post, my customer went through and changed the owner of the mirroring endpoint to SA as a part of post-installation clean-up. The process to change the owner is a very simple T-SQL command that must be run on each SQL server:
ALTER AUTHORIZATION ON ENDPOINT::mirroring_endpoint TO sa
After running that command, we verified that ownership had changed using the following T-SQL query:
SELECT [PrincipalName] = sp.name, [PrincipalId] = sp.principal_id, me.* FROM sys.database_mirroring_endpoints me with(nolock) inner join sys.server_principals sp with(nolock) on me.principal_id = sp.principal_id
Initially we thought all was well, but when we began testing failover things broke down. We tested a hard failover by simply shutting down the SQL services on the principal and as expected the failover completed successfully. After turning on the SQL services on the principal following the failover, we ended up in a state where mirroring became broken and the Lync front end services couldn’t connect to the back end databases on either server. Looking at SQL Management Studio you could see that both servers thought they were the principal and mirroring was disconnected. We also tried restarting SQL services on the mirror and witness, but nothing helped. Nothing was working and it seemed that the servers, witness included, simply couldn’t talk to one another. When replicating these changes in my lab, I saw this (which was the same experience at my customer):
We were all pretty baffled by this (including the customer’s DBA) and after multiple attempts in my lab, I was able to track down what had occurred. For a reason unbeknownst to me, the ALTER AUTHORIZATION command removes the GRANT CONNECT permissions on the mirroring endpoint. Checking the permissions in my lab, I saw this:
SELECT EPS.name, SPS.STATE, CONVERT(nvarchar(38), SUSER_NAME(SPS.grantor_principal_id))AS [GRANTED BY], SPS.TYPE AS PERMISSION, CONVERT(nvarchar(46),SUSER_NAME(SPS.grantee_principal_id))AS [GRANTED TO] FROM sys.server_permissions SPS , sys.endpoints EPS WHERE SPS.major_id = EPS.endpoint_id ORDER BY Permission,[GRANTED BY], [GRANTED TO]
Notice how the computer accounts have been removed. This little nugget turned out to be the proverbial “needle in a haystack”, as it allowed me to determine that the servers couldn’t connect because the permissions had been removed.
The Fix
After all the fuss, the fix was to simply add the GRANT CONNECT permissions back so that the servers could talk to one another again. Remember that the permissions will always be granted to account that SQL is running under. For example, in my lab I used the following T-SQL commands to restore the permissions:
Principal
GRANT CONNECT ON ENDPOINT::mirroring_endpoint TO [WIDGETS\PIA-SQL-BE02$] GRANT CONNECT ON ENDPOINT::mirroring_endpoint TO [WIDGETS\PIA-SQL-BE03$]
Mirror
GRANT CONNECT ON ENDPOINT::mirroring_endpoint TO [WIDGETS\PIA-SQL-BE01$] GRANT CONNECT ON ENDPOINT::mirroring_endpoint TO [WIDGETS\PIA-SQL-BE03$]
Witness
GRANT CONNECT ON ENDPOINT::mirroring_endpoint TO [WIDGETS\PIA-SQL-BE01$] GRANT CONNECT ON ENDPOINT::mirroring_endpoint TO [WIDGETS\PIA-SQL-BE02$]
Note: If your environment uses domain user accounts for SQL services, make sure to apply the permission to the user accounts and not the computer accounts like I did above.
Once the permissions were in place and verified, mirroring automatically resumed and SQL Management Studio looked much happier:
At my customer we made the same changes (except we applied the permissions to the domain user accounts that SQL was running under) and ended up with the same result as my lab: SQL mirroring was again working. Additionally, the Lync services on the front ends could once again connect to the back end databases. All was once again well in my Lync world.
Conclusion
This is another example of hidden SQL “gotchas” that could cause major heartaches for Lync 2013 deployments using SQL mirroring functionality. It seems that SQL caches the original configuration (which is why mirroring initially worked after we changed it), but once SQL services are restarted it will use the new configuration (which is why it was broken afterwards). If you have SQL mirroring deployed I strongly urge you to double-check permissions if you change the owner of the DB endpoint and thoroughly verify failover following that change. If you don’t, you run the risk of an outage which is exactly what mirroring is supposed to avoid.
Lastly, if anyone out there, DBAs included, can explain why the ALTER AUTHORIZATION command would remove the permissions….I’m all mirrors….I mean ears.