If your UC deployment is in any way critical, then you should be planning your Lync/Skype4B environment to be not only highly available within your data center, but also available in another data center for disaster recovery purposes. The pool pairing functionality within Lync Server and Skype4B Server provides a very functional solution for ensuring your users have access to UC services in the event of an entire pool failure or a potentially larger failure, such as the loss of a data center. Despite the pool pairing feature being fairly well-executed within the products themselves, there are some aspects of pool pairing that aren’t very well documented, or even documented at all. Microsoft goes to great lengths to describe how Active Directory should be utilized with Exchange Server deployments but doesn’t provide the same detailed guidance for Lync Server or Skype4B Server. Generally speaking, Lync Server and Skype4B Server aren’t as intricately integrated with Active Directory as Exchange Server is, but that doesn’t mean that AD isn’t important. Both UC server products absolutely still depend on Active Directory for operation, and when it comes to pool pairing there are some additional best practices (and requirements) that will be vitally important in failover scenarios.
Infrastructure Placement
When it comes down to having multiple data centers, one of the first AD-related questions will be ‘Should I place Domain Controllers and Global Catalogs at the DR site?’. The short answer is YES. Microsoft would recommend having dedicated DCs (with GC roles) at the remote data center for Exchange Server and I would agree that also stands for Lync Server and Skype4B Server. Your placement of DCs and GCs would look like below:
Infrastructure Sizing
The next question people tend to ask is typically more cost related and results in the ‘Can I get away with less equipment and smaller servers?’ question. The short answer is IT DEPENDS. I would not advise deploying less than your existing production data center for the simple reason of performance. Your users (IT workers included) have come to expect a certain level of performance from your primary data center and will come to expect similar (or identical) performance in the event of DR. The safest approach is to mirror your existing data center if at all possible and if not possible, reduce servers marginally in order to prevent causing performance issues. If you plan on going from 10 domain controllers in your primary data center to 1 domain controller in your DR data center, I can guarantee you’ll cause more issues than you attempted to ‘solve’. Additionally, if your DR data center will be used more in an “Active-Active” configuration then it is more prudent to ensure that performance is the same regardless of data center location. A few samples of this are below:
[Diagrams: BEST OPTION | ACCEPTABLE COMPROMISE | BAD IDEA]
Active Directory Site Design
You’ve decided to deploy an identical mirror at your DR data center. The question now becomes ‘How do I design the AD Site topology?’. The appropriate answer to this question depends largely on your requirements and a few other best practices that Microsoft offers. In practice, your design could go in two directions:
- Add the DR data center subnets to an existing Active Directory site so that they are with the production data center domain controllers, OR…
- Add the DR data center subnets into a dedicated Active Directory site specifically for the DR data center domain controllers
Each option has advantages and disadvantages to consider:
- Having a single site is by far the ‘simplest’ solution
- Having a single site speeds up the replication process by treating the two locations as a single logical location
- Having a single site could cause clients to be referred to DCs that are significantly further away and result in delays in group policy processing, logins, LDAP searches, etc
- Having a single site could result in much higher replication bandwidth between the two data centers
- Having multiple sites allows a replication lag in the event an unwanted change in AD occurs and you need to prevent those changes from replicating to DR
All things considered, I would advise having separate AD sites and I would expect that most AD engineers would agree. As a result, your site design would look like this:
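In PowerShell terms, the separate-site design could be created roughly as follows. This is only a sketch using the ActiveDirectory module: the site names, the subnet, the site link name and the DC name are all hypothetical placeholders, and it assumes the production site already exists.

```powershell
Import-Module ActiveDirectory

# Hypothetical names and subnet; substitute your own.
$prodSite = 'Prod-DataCenter'   # assumed to exist already
$drSite   = 'DR-DataCenter'
$drSubnet = '10.20.0.0/16'

# Create the dedicated DR site and map the DR subnet to it.
New-ADReplicationSite   -Name $drSite
New-ADReplicationSubnet -Name $drSubnet -Site $drSite

# Link the two sites. 180 minutes is the AD default interval.
New-ADReplicationSiteLink -Name 'Prod-DR' -SitesIncluded $prodSite, $drSite `
    -Cost 100 -ReplicationFrequencyInMinutes 180 -InterSiteTransportProtocol IP

# Move an existing DR domain controller (hypothetical name) into the new site.
Move-ADDirectoryServer -Identity 'DC-DR01' -Site $drSite
```

Clients and servers on the DR subnet should then locate the DR domain controllers first, while replication between the two data centers follows the site link.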
Impact to Lync Server and Skype4B Server Pairing
Assuming the AD site design above, there is one considerable item you should think about with pool pairing when a multiple AD site design is deployed: the Central Management Store (CMS) and its AD Service Connection Point (SCP). I actually touched on this topic in a previous post and while the scenario in that post is a bit different, the movement of CMS and the potential delays still apply in a pool pairing scenario. When you invoke a pool failover, the first step that must take place is the movement of CMS, if the pool being failed over hosts CMS. The ‘Invoke-CsManagementServerFailover’ cmdlet updates a Service Connection Point within Active Directory that tells Lync/Skype4B servers where to find the CMS master. The SCP object has to replicate with the updated value (the new pool’s SQL back end) to all AD sites where Lync/Skype4B servers reside before CMS replication will succeed, and successful CMS replication is a prerequisite for performing the pool failover.
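The order of operations described above can be sketched in the Management Shell as follows. Treat this as an outline rather than a runbook: the pool and SQL names are hypothetical placeholders, and it assumes the primary pool (which hosts CMS) is completely unavailable, which is why -Force and -DisasterMode are used.

```powershell
# Run from the Lync/Skype4B Management Shell on a Front End in the backup pool.
# All FQDNs and instance names below are hypothetical placeholders.
$failedPool    = 'pool-prod.contoso.com'
$backupSqlFqdn = 'sql-dr.contoso.com'
$backupSqlInst = 'RTC'

# 1. Confirm which pool hosts CMS and which pool is paired with the failed pool.
Get-CsService -CentralManagement
Get-CsPoolBackupRelationship -PoolFqdn $failedPool

# 2. Move CMS to the backup pool's SQL back end. This is the step that
#    rewrites the SCP in Active Directory.
Invoke-CsManagementServerFailover -BackupSqlServerFqdn $backupSqlFqdn `
    -BackupSqlInstanceName $backupSqlInst -Force -Verbose

# 3. Check CMS replication. Replicas will not report UpToDate until the
#    updated SCP has reached the AD sites where those servers reside.
Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
Get-CsManagementStoreReplicationStatus | Where-Object { -not $_.UpToDate }

# 4. Only once CMS replication is healthy, fail over the pool itself.
Invoke-CsPoolFailover -PoolFqdn $failedPool -DisasterMode -Verbose
```

If step 3 keeps listing replicas that are not up to date, AD replication of the SCP is the first thing to check.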
You can view the current configuration of this SCP object against any DC in your environment by opening the adsiedit.msc MMC snap-in and navigating to Configuration > Services > RTC Service > Topology Settings.
Additionally, you can view the current configuration of this SCP object through these Management Shell cmdlets:
Get-CsManagementConnection
Get-CsConfigurationStoreLocation
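Because the real question during a failover is whether every DC has the new value, it can help to ask each DC directly. The sketch below uses the ActiveDirectory module and the container path from the ADSI Edit step above. One assumption to verify in your own forest: that the back end value is stored in the msRTCSIP-BackEndServer attribute of the object under Topology Settings.

```powershell
Import-Module ActiveDirectory

# Build the DN of the container named in the ADSI Edit path above.
$configNC = (Get-ADRootDSE).configurationNamingContext
$scpBase  = "CN=Topology Settings,CN=RTC Service,CN=Services,$configNC"

# Ask every DC in the current domain what it currently holds for the SCP.
# 'msRTCSIP-BackEndServer' is an assumed attribute name; confirm it in ADSI Edit.
Get-ADDomainController -Filter * | ForEach-Object {
    $dc = $_
    Get-ADObject -SearchBase $scpBase -SearchScope OneLevel -Filter * `
                 -Server $dc.HostName -Properties 'msRTCSIP-BackEndServer', whenChanged |
        Select-Object @{n='DC';   e={ $dc.HostName }},
                      @{n='Site'; e={ $dc.Site }},
                      @{n='BackEndServer'; e={ $_.'msRTCSIP-BackEndServer' }},
                      whenChanged
} | Format-Table -AutoSize
```

Get-ADDomainController -Filter * only returns DCs in the current domain, so a multi-domain forest needs the loop repeated per domain. DCs that still show the old SQL back end have not yet received the change.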
Active Directory replication is dependent upon intra-site AND inter-site replication schedules and by default, inter-site replication intervals are set to 180 minutes, or 3 hours! What this means is that if you make no changes to default configurations in AD site link replication intervals and perform a CMS failover, it could take over 3 hours before the Master Replicator and File Transfer services detect the updated SCP object and allow you to move forward and perform the pool failover.
Note: When separate AD sites are not used in the design, this delay doesn’t apply as intra-site replication does not have the same delay as inter-site replication.
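To see which interval actually applies in your environment, the site links can be listed with the ActiveDirectory module. A minimal sketch; it assumes site names contain no commas, since the site name is simply cut from each distinguished name.

```powershell
Import-Module ActiveDirectory

# Show each site link with its replication interval, cost, options and member sites.
Get-ADReplicationSiteLink -Filter * -Properties options |
    Select-Object Name, ReplicationFrequencyInMinutes, Cost, options,
        @{n='Sites'; e={
            # Reduce each site DN to its leading CN value for readability.
            ($_.SitesIncluded | ForEach-Object { ($_ -split ',')[0] -replace '^CN=' }) -join ', '
        }} |
    Format-Table -AutoSize
```

A ReplicationFrequencyInMinutes of 180 with an empty options column means the link is still at its defaults.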
For most IT pros out there, 3 hours of waiting is not acceptable so you have a few options available to speed up the process and compress the timeline:
- Manually force AD Site Link Replication – If the DCs in the primary data center are still available, you can manually force AD site replication and get replication completed sooner. You would then have to perform this step any time you manually perform a CMS failover between the sites, which increases the administrative overhead a bit but still allows you to move forward at a faster pace than waiting for replication to automatically occur.
- Set AD Site Link Replication to 15 Minutes – The replication interval on a site link can be lowered from the default of 180 minutes to the lowest configurable value of 15 minutes. With 15 minutes configured, your waiting time drops significantly and, if you have time to wait, you no longer need to manually invoke any AD site replication.
- Configure Advanced Replication on the AD Site Link – If 15 minutes is too long and manual intervention is not desired, then specific site links can be configured in a non-default state that makes replication behave as intra-site instead of inter-site. By default the Options attribute on the site link object is not set, which tells the Active Directory KCC to replicate according to the site link interval. If you set the Options attribute to a value of 1, the KCC configures that site link for change notifications and changes will propagate as they occur. In this scenario your second AD site is still a separate logical site, but replication behaves as if it weren’t, resulting in nearly instantaneous replication whilst still maintaining inter-site compression benefits.
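Each of the three options maps to a short command. The sketch below uses repadmin and the ActiveDirectory module. The site link name and DC name are hypothetical, and in practice you would pick one option rather than run all three.

```powershell
Import-Module ActiveDirectory

$linkName = 'Prod-DR'      # hypothetical site link name
$sourceDc = 'DC-PROD01'    # hypothetical DC that already holds the updated SCP

# Option 1: force replication now.
# /A = all partitions, /d = identify servers by DN, /e = cross site boundaries,
# /P = push changes outward from the named DC.
repadmin /syncall $sourceDc /AdeP

# Option 2: lower the site link interval from 180 to the minimum of 15 minutes.
Set-ADReplicationSiteLink -Identity $linkName -ReplicationFrequencyInMinutes 15

# Option 3: enable change notification on the site link (Options bit 1).
# -bor preserves any option bits that are already set on the link.
$link       = Get-ADReplicationSiteLink -Identity $linkName -Properties options
$newOptions = [int]$link.options -bor 1
Set-ADReplicationSiteLink -Identity $linkName -Replace @{ options = $newOptions }
```

For option 1, pushing from a DC that has already received the SCP update matters, because a DC that has not seen the change has nothing new to send.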
Wrapping Up
Any of the three methods above is completely workable within any disaster recovery or pool failover plan. If automation is the name-of-your-game, look at utilizing either option 2 or option 3 above. If failover is to be a truly manual exercise, then option 1 is your game plan. Regardless of method, the update of this SCP object is crucial when CMS is failed over in multi-site scenarios, so make sure you plan this configuration to meet your failover needs. This scenario is often overlooked and can have a significant impact on your failover plans and overall timelines!