Ah…CMS…the Central Management Store…  How important you are, yet how little respect you are given.  I gaze upon your XML and bask in its glory…  The simplicity and elegance of your knowledge of all things Lync/Skype4B…  The heartache your unknown ways cause when things don’t go as planned…

For those following along with my blog, you may have noticed that I posted a CMS migration guide a while back.  The guide itself is solid and I’ve used it in many migrations across Lync Server 2010, Lync Server 2013 and Skype4B Server 2015.  That doesn’t mean it is correct 100% of the time.  I was reminded of this last week when I encountered a unique CMS migration scenario that forced me to dig way in and use some out-of-the-box thinking to work around a limitation of the Move-CsManagementServer cmdlet.  In the end it was all OK, but my pain will hopefully be your gain.

The Environment

The environment consisted of two Enterprise Edition Front End Pools:  one Lync 2010 pool and one Lync 2013 pool.  Each pool consisted of five servers.  The CMS store was located on the 2010 pool and the goal was to migrate it to the 2013 pool, as shown in the picture below:

[Image: Lync-CMStopology-start]

Following along with the process of my CMS migration post, I was ready to run Step 7 from a server in the receiving pool.  I tee’d up the cmdlet, hit ‘Enter’, watched with a smile as the cmdlet progressed as expected and then my heart sank:

This cmdlet moves Central Management Server to the pool that contains this computer.

 Current State:
 Central Management Server Pool: "pool01.domain.com"
 Central Management File Store: "\\FQDN\share"
 Central Management Store: "SQLFQDN1.domain.com\sqlinst1"
 Central Management Store SCP: "SQLFQDN1.domain.com\sqlinst1"

 Proposed State:
 Central Management Server Pool: "pool02.domain.com"
 Central Management File Store: "\\FQDN\share"
 Central Management Store: "SQLFQDN2.domain.com\sqlinst2"
 Central Management Store SCP: "SQLFQDN2.domain.com\sqlinst2"


Do you want to move the Central Management Server, Central Management Store, and File Store in the current topology and assign permissions for computers in Active Directory? (Note: Please read the help provided for this cmdlet using
the Get-Help cmdlet before you proceed.)
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help
(default is "Y"):A

...

WARNING: Move-CsManagementServer failed.
Move-CsManagementServer : Failed to execute the following PowerShell command -. "D:\Program Files\Microsoft Lync Server 2013\Deployment\Bootstrapper.exe" .
At line:1 char:1

The Error

Looking at the error, my expectation was that the cmdlet failed to install the CMS replication components on the local server. When I examined the HTML log file it became apparent that the error didn’t even have anything to do with the CMS replication components:

[Image: Lync-CMSMigration-Error]
Installing MeetingRoomPortalInstaller.msi(Feature_Web_MeetingRoomPortal_Ext, Feature_Web_MeetingRoomPortal_Int)...Log file was: %TEMP%\Bootstrap-CsMachine-[2016_05_19][23_45_58].html
End executing command "9".
Error: Failed to execute the following PowerShell command - . "D:\Program Files\Microsoft Lync Server 2013\Deployment\Bootstrapper.exe" . 

Verify that the account used to run this cmdlet has sufficient permissions to run bootstrapper located at "D:\Program Files\Microsoft Lync Server 2013\Deployment\Bootstrapper.exe". 

Note: The move operation failed after modifying the topology. This means that there are no active Central Management services to replicate configuration changes. To complete the move please run this cmdlet again after the issues encountered during this run are resolved. If the issues cannot be resolved then run the cmdlet on the original pool with -force option to rollback the move operation.

After picking my heart up off the floor, I began to focus on the error.  MeetingRoomPortalInstaller.msi???  Why on earth would the CMS cmdlet be looking for that MSI?  For whatever reason the cmdlet was expecting it, and it borked, leaving CMS in a horrible state:

 Current State:
 Central Management Server Pool: "pool02.domain.com"
 Central Management File Store: "\\FQDN\share"
 Central Management Store: "SQLFQDN2.domain.com\sqlinst2"
 Central Management Store SCP: "SQLFQDN1.domain.com\sqlinst1"

The CMS topology document indicated that the new pool was the CMS master, but the Active Directory SCP object indicated that the old pool was the master.  Effectively, each side was pointing at the other pool as the CMS owner.  Great.
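This kind of split state can be spotted by comparing the store named in the AD SCP with the store named in the topology. What follows is only a minimal sketch: `Test-CmsPointerMatch` is a hypothetical helper, not a Lync cmdlet, and the guarded section assumes `Get-CsConfigurationStoreLocation` returns either a plain string or an object with a `BackEndServer` property. It also assumes `Get-CsService -CentralManagementDatabase` still answers, which may not be the case when CMS is half-moved.

```powershell
# Hypothetical helper (not a Lync cmdlet): compares two "server\instance"
# strings case-insensitively after trimming whitespace.
function Test-CmsPointerMatch {
    param(
        [string]$TopologyStore,
        [string]$ScpStore
    )
    return ($TopologyStore.Trim() -ieq $ScpStore.Trim())
}

# The values from the half-moved state shown above
Test-CmsPointerMatch -TopologyStore 'SQLFQDN2.domain.com\sqlinst2' -ScpStore 'SQLFQDN1.domain.com\sqlinst1'   # False

# Only runs where the Lync/Skype4B management shell is loaded
if (Get-Command Get-CsConfigurationStoreLocation -ErrorAction SilentlyContinue) {
    $scp = Get-CsConfigurationStoreLocation
    # Assumption: the result is a string, or an object exposing BackEndServer
    $scpStore = if ($scp -is [string]) { $scp } else { [string]$scp.BackEndServer }

    $db = Get-CsService -CentralManagementDatabase
    $topologyStore = '{0}\{1}' -f $db.PoolFqdn, $db.SqlInstanceName

    if (-not (Test-CmsPointerMatch -TopologyStore $topologyStore -ScpStore $scpStore)) {
        Write-Warning "SCP points to '$scpStore' but the topology says '$topologyStore'."
    }
}
```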

Abort!  Abort!

With the customer on the call with me, tensions are beginning to rise.  The order was given to roll back CMS.  So I try running the move commands from a server in the Lync 2010 pool…and that fails, too, with the 2010 shell complaining about entries in the XML file.  Ugh.  We’re officially in a pickle now, and we’ve got no path but forward.  I go grab some coffee and settle in for a long night.

Some Background Info

Going back to the error, I start asking questions…

Me:  “The MeetingRoomPortalInstaller.msi is utilized for Lync Room System management…do you utilize this in your environment?”

Customer:  “Uh…we don’t know.  We have many LRS systems, but we have no knowledge of this portal you’re talking about.”

Me:  “That software is not listed in Add/Remove Programs on this machine…is there a reason it is not installed on this machine?”

Customer:  “That server was recently added to our pool.  We didn’t install any additional software during the provisioning.”

*chatter…grumbling…cross talk*

Me:  “I’ve examined the web.config file for the LRS portal on one of the other servers and it doesn’t appear that it is truly configured for proper operation.  Instead it just looks like someone installed the MSI file and left it as is.”

Customer:  *silence*  “Uh…ok…  So now what?”

I go ahead and install the LRS portal components on the server and try running the move cmdlet again, but it complains that the new pool already owns CMS and won’t complete.  Ugh!

Digging Deeper

Apparently the LRS portal installer adds entries to the CMS topology when it gets installed on a server within a Front End pool, and as a result bootstrapper expects the software to be installed on every server in that pool.  Since it wasn’t installed on this one, bootstrapper failed, which in turn caused the CMS move cmdlet to fail as well.  Unfortunately for me, that left CMS in a half-baked state where the topology doc says it moved but Active Directory says it didn’t.  Back to the logs…

Importing the LIS configuration data into the new Central Management Store.
Executing ImportLisConfigurationTask.
Importing Location Information Services (LIS) configuration.
Begin executing command "5": "Import-CsLisConfiguration -FileName "C:\Users\username\AppData\Local\Temp\2\Move-CsManagementServer-CsLisConfiguration-New-3-316cfe43-bca1-488b-a285-87a06c05ad0b.zip"".

Executing MoveCmsInTopologyTask.
Exporting Central Management Store configuration.
Importing Central Management Store configuration.
Begin executing command "6": "Import-CsConfiguration -FileName "C:\Users\username\AppData\Local\Temp\2\Move-CsManagementServer-CsConfiguration-New-2-722d35fe-c75a-4a87-b3c2-efec483a1bce.zip"".
End executing command "6".

The logs showed that the CMS data itself did get moved to the new store.  Additionally, the CMS replication components did get installed on the new server even though bootstrapper failed on the LRS components:

Installing MgmtServer.msi(Feature_MGMTServer, Feature_FTA, Feature_Master)...success

Remember that at the moment, CMS is not functional.  I can’t start the replication services on the new pool because the AD connection point resolves to the legacy pool and the services stop when they detect a database schema mismatch:

[Image: Lync-2013Pool-CMSServiceStopError]

I also can’t start the replication services on the legacy pool because the topology document says the new pool is the CMS owner and the services stop upon this detection:

[Image: Lync-2010Pool-CMSServiceStopError]

The environment has no functional CMS replication until this can be resolved, so the clock is ticking and people are continuing to get ‘twitchy’.  I come back to what I know:

  1. CMS data did get moved to the new database
  2. Active Directory did not get updated to point to the new SQL server

I’ve got one last-ditch effort to see if I can get this up and running before we have to call Microsoft support.  At this point, if I update AD to resolve to the new SQL server, there is a chance that I can get the CMS components functional again.

A Fix?

There are cmdlets available that allow you to forcibly change the CMS configuration point and Active Directory SCP object, so I tee up the following commands and hit ‘Enter’:

Set-CsConfigurationStoreLocation -SqlServerFqdn "SQLFQDN2.domain.com" -SqlInstanceName "sqlinst2" -Verbose
Set-CsManagementConnection -StoreProvider Sql -Connection "SQLFQDN2.domain.com\sqlinst2" -Verbose

Note:  Be sure to record the original values that these cmdlets overwrite, in case you need to point everything back at the original pool!
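One way to capture those original values before touching anything is to dump them to a file. This is a rough sketch, not something from my migration guide: `Save-CmsPointerSnapshot` and the output file name are hypothetical, and the cmdlet output is stored as plain text via `Out-String` so the sketch does not depend on any particular property names.

```powershell
# Hypothetical helper: writes the pre-change pointers to a JSON file so they
# can be re-entered by hand if the change has to be reverted.
function Save-CmsPointerSnapshot {
    param(
        [string]$ScpStore,
        [string]$ManagementConnection,
        [string]$Path
    )
    $snapshot = [pscustomobject]@{
        TakenAt              = (Get-Date).ToString('s')
        ScpStore             = $ScpStore
        ManagementConnection = $ManagementConnection
    }
    $snapshot | ConvertTo-Json | Set-Content -Path $Path -Encoding UTF8
    return $snapshot
}

# Only runs where the Lync/Skype4B management shell is loaded
if (Get-Command Get-CsManagementConnection -ErrorAction SilentlyContinue) {
    $scp  = (Get-CsConfigurationStoreLocation | Out-String).Trim()
    $conn = (Get-CsManagementConnection | Out-String).Trim()
    # The file name below is an arbitrary choice
    Save-CmsPointerSnapshot -ScpStore $scp -ManagementConnection $conn -Path "$env:TEMP\cms-pointers-before.json"
}
```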

I wait about 15 minutes for Active Directory replication to fully complete and then attempt to start the replication services on the new pool server.  I watch the event log and filter for ‘LS File Transfer Agent Service, LS Master Replicator Agent Service, LS Replica Replicator Agent Service’.  As I’m watching entries come in, the expected entries from my CMS migration blog post come by, indicating the server detected the new values in AD and has assumed the active CMS master role.
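The same filter can be applied from the shell instead of Event Viewer. A small sketch, assuming the events land in the ‘Lync Server’ log; `New-CmsEventFilter` is a hypothetical helper and the 30-minute window is an arbitrary choice.

```powershell
# Hypothetical helper: builds a Get-WinEvent filter for the three CMS
# replication sources named above, limited to the last N minutes.
function New-CmsEventFilter {
    param(
        [int]$Minutes = 30,
        [string]$LogName = 'Lync Server'   # assumption: default Lync/Skype4B log name
    )
    return @{
        LogName      = $LogName
        ProviderName = @(
            'LS File Transfer Agent Service',
            'LS Master Replicator Agent Service',
            'LS Replica Replicator Agent Service'
        )
        StartTime    = (Get-Date).AddMinutes(-$Minutes)
    }
}

# Only runs where the Lync/Skype4B management shell is loaded
if (Get-Command Get-CsManagementConnection -ErrorAction SilentlyContinue) {
    Get-WinEvent -FilterHashtable (New-CmsEventFilter -Minutes 30) -ErrorAction SilentlyContinue |
        Sort-Object TimeCreated |
        Format-Table TimeCreated, ProviderName, Id, LevelDisplayName -AutoSize
}
```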

LS Master Replicator Agent
[Image: Lync-2013Pool-CMSMasterReplicatorSuccess]

LS File Transfer Agent
[Image: Lync-2013Pool-CMSFileTransferSuccess]

I continue on to Step 10 of my post and examine CMS replication…within 5 minutes the CMS replicas all report up to date.  Success!  I log in to Topology Builder and pull down the topology, and all entries are there as expected with the new pool showing as CMS master.  I check the LIS database and all 911 information is there.  With a Hail Mary pass, I somehow averted a disaster.
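For the replication check, something along these lines lists only the replicas that are still behind. It is a sketch: `Get-StaleReplica` is a hypothetical helper and the sample FQDNs are made up, while `Get-CsManagementStoreReplicationStatus` is the stock cmdlet that reports an `UpToDate` flag per replica.

```powershell
# Hypothetical helper: returns the replicas whose UpToDate flag is not true.
function Get-StaleReplica {
    param([object[]]$Status)
    return @($Status | Where-Object { -not $_.UpToDate })
}

# Illustrative objects shaped like replication-status output (made-up FQDNs)
$sample = @(
    [pscustomobject]@{ ReplicaFqdn = 'fe01.domain.com'; UpToDate = $true  },
    [pscustomobject]@{ ReplicaFqdn = 'fe02.domain.com'; UpToDate = $false }
)
(Get-StaleReplica -Status $sample).ReplicaFqdn   # fe02.domain.com

# Only runs where the Lync/Skype4B management shell is loaded
if (Get-Command Get-CsManagementStoreReplicationStatus -ErrorAction SilentlyContinue) {
    Get-StaleReplica -Status (Get-CsManagementStoreReplicationStatus) |
        Select-Object ReplicaFqdn, LastStatusReport, LastUpdateCreation
}
```

An empty result from the last command means every replica reports up to date.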

Bottom Line

This little find was wholly unexpected.  Who would’ve thought that an error like this could send a CMS migration haywire?  Had I run the cmdlet from a server that already had the LRS portal components on it, the move likely would have gone without incident.  At the end of the day this was caused by dependent software (the LRS meeting portal) not being installed on a new server that was recently added to a Front End pool.  I guess this is something I should start checking for before attempting CMS migrations:  that all software is present and accounted for on every server in the pool.  This is also a wake-up call for administrators and architects to make sure that their change management processes flag inconsistencies like this!
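As a starting point for that pre-flight check, the comparison itself is simple once you have a list of installed products per server. The sketch below is illustrative only: `Compare-PoolSoftware` is a hypothetical helper, and the server and product names are placeholders rather than real inventory.

```powershell
# Hypothetical helper: given server -> installed product names, lists what each
# server lacks relative to the union of products found across the pool.
function Compare-PoolSoftware {
    param([hashtable]$Inventory)
    $all = $Inventory.Values | ForEach-Object { $_ } | Sort-Object -Unique
    foreach ($server in ($Inventory.Keys | Sort-Object)) {
        $missing = @($all | Where-Object { $Inventory[$server] -notcontains $_ })
        if ($missing.Count -gt 0) {
            [pscustomobject]@{ Server = $server; Missing = $missing }
        }
    }
}

# Made-up inventory; product names are placeholders, not real display names
$inventory = @{
    'fe01.domain.com' = @('Core Components', 'Meeting Room Portal')
    'fe05.domain.com' = @('Core Components')
}
Compare-PoolSoftware -Inventory $inventory
```

In practice the inventory would have to be gathered from each Front End server first, for example from the installed-programs list on each machine. Any server the helper reports is worth a second look before running the move.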