20Jun/16

The Curious Case of Lync Server 2013 CLS “Running” but not Actually Working

This was one of those “ohhhh yeah…” moments – one of those issues that makes complete sense once you can see the root cause, but where you can’t “see the forest for the trees” when troubleshooting begins.  It eventually took a case with Premier Support, along with multiple rounds of different Premier engineers, but we finally arrived at the root cause and a solution.  Without further delay, the issue:

Lync2013-CLSCmdlet-Blank
Success Code - 0, Successful on 1 agents

Some of you may look at the above screenshot and think, “uh…Trevor…so what?”, assume I’ve started drinking too early in the morning and move on to someone else’s blog.  The true secret, however, was that CLS wasn’t actually working: the screenshot above shows that the cmdlet output is missing the information that would confirm CLS was functional.  What I should have seen was this:

Lync2013-CLSCmdlet-WorkingExample

Notice how the cmdlet actually returned Tracing Status to me and indicated the status, scenario and computer I’m interacting with.  That’s the piece that was missing in this customer’s environment.

So to recap, this is what I knew to be true:

  1. CLS simply didn’t function at all in any capacity, even though the Windows Service for the CLS process was running.  No amount of service restart or server reboot would help.  No cmdlet requests were being processed or executed to make CLS do what it is supposed to do.
  2. It took a significantly long time – on the order of 5+ minutes – for any of the CLS cmdlets to complete.  I have always known that CLS isn’t exactly the Bugatti Veyron Super Sport of the logging world, but it has never taken that long for a simple Show-CsClsLogging cmdlet to complete in any deployment I’ve been involved in.
  3. This was happening on multiple servers (4 at the time the ticket was opened with Microsoft) so it definitely seemed like something had occurred or changed in this customer’s environment that would be at play with this issue.

Given what I know about the environment and what I know about the functioning of Lync Server and CLS, I start to dig in on my own…

First Things First

Knowing the fact that Sophos Anti-Virus has caused multiple Lync Server related issues in this environment in the past, I immediately began focusing my attention there.  I fired up Process Monitor to begin looking at process traces whilst I executed a CLS cmdlet via PowerShell, and in doing so I saw this:

Lync2013-CLSagent-SophosDetour
Sophos_Detoured_x64.dll

The CLSAGENT.EXE executable is having its operation forced through a Sophos DLL, “Sophos_Detoured_x64.dll”.  I had actually encountered this before and had previously penned a blog post involving security hardening of SQL Server, but I had not yet seen it cause an issue with CLS.  Knowing that this DLL detouring isn’t supported by Microsoft and doesn’t help performance in general, I went into regedit and $NULL’d out the following registry values per the instructions from Sophos:

HKLM\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_Dlls

HKLM\Software\Wow6432Node\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_Dlls

Note:  If the AppInit_Dlls value contains any text – in my case it contained the NTFS file path to the Sophos_Detoured_x64.dll – then DLL detouring is being utilized by your Anti-Virus vendor.

I rebooted the server after making the registry change and tried running the CLS cmdlets again after the reboot, but I didn’t get any different results.  Things still appeared broken.

One Step Closer

I went back into Process Monitor and began looking at traces while I again executed a CLS cmdlet in PowerShell.  The process traces looked very different this time (with Sophos not in the picture) and I could see that the CLSAgent executable was actually trying to do something:

Lync2013-CLSagent-NoSophosDetour

The Process Monitor traces showed that the CLSAGENT.EXE process was stuck in a perpetual loop of “Thread Create” and then immediately after, a “Thread Exit”.  When comparing the log above to a server where CLS was functional, there is a significant difference:

Lync2013-CLSagent-WorkingExample

In the working example directly above, after one of the first “Thread Create” operations, you see the CLSAGENT.EXE process begin writing information to multiple .cache files in the “C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Temp\Tracing” NTFS location.  From that point on in the trace, CLSAGENT.EXE seems perfectly happy.  On the non-working server, the traces never indicated getting to the point where logs were being written.

Final Examinations & Troubleshooting

Thinking logically, it seemed as though the CLSAGENT.EXE process was still potentially being interfered with, so I went through the list of items that made sense to check:

  • Is Sophos configured to exclude all Lync Server related NTFS directories and application executables from real-time scanning?
    • Confirmed that yes, exclusions are in place.
  • Can Sophos be turned off to fully confirm that it is not in any way playing a part?
    • Confirmed that yes, even with Sophos Anti-Virus turned off the end result did not change.
  • Are any firewall ports being blocked that would prevent CLS from functioning?
    • Confirmed both through Wireshark and Process Monitor that TCP 50,001-50,003 were open and that network flows were present.
  • Given that CLS runs under the NetworkService account, are any NTFS restrictions in place that would prevent the account from writing to the desired NTFS locations?
    • Confirmed that while some GPO configuration was present, there were no GPO settings that would prevent the NetworkService account from accessing or writing to the desired NTFS directories.

It was at this point that we got Premier Support from Microsoft involved.  It took a number of weeks and a number of engineers, but we finally had an answer presented to us this past week.

The Root Cause

Before just “showing my cards”, a little background info to help set the stage…

Dynamic Port Background

Starting in Windows Server 2008, Microsoft changed the default dynamic UDP/TCP port range used by the operating system to bring it in line with IANA recommendations.  Prior to Windows Server 2008 the default dynamic port range was 1025-5000, but in Windows Server 2008 and newer the default range is 49152-65535:

https://support.microsoft.com/en-us/kb/929851

What this means is that any process that needs to request a TCP port for network communications (think about applications that use RPC) will, by default, use an open port in the 49152-65535 range. Additionally, you can customize those port ranges to help appease your InfoSec team so that a smaller port range is used – say ports 50000-55000.
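To make the range arithmetic concrete: netsh describes a dynamic port range as a start port plus a port count, so the last port in the range is start + num - 1. The short Python sketch below converts between the two forms using the default range from the KB article and the 50000-55000 example. The helper names are mine and purely illustrative.

```python
def range_end(start: int, num: int) -> int:
    """Last port of a dynamic range given netsh-style start= and num= values."""
    return start + num - 1


def num_for_range(start: int, end: int) -> int:
    """Port count to pass as num= so the range covers start..end inclusive."""
    if not (1 <= start <= end <= 65535):
        raise ValueError("ports must satisfy 1 <= start <= end <= 65535")
    return end - start + 1


# Windows Server 2008+ default: start=49152 with 16384 ports.
default_end = range_end(49152, 16384)

# The narrower 50000-55000 example from the paragraph above.
custom_num = num_for_range(50000, 55000)

print(default_end, custom_num)
```

Working the count out first makes it harder to end up with an off-by-one range when the value is later typed into netsh.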

Specifying Specific Ports for NetLogon

In addition to the dynamic ports of the OS, system administrators can actually set a few registry keys that specify the Windows OS to use specific ports for certain communications:

https://support.microsoft.com/en-us/kb/224196

Registry key 1

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

Registry value: TCP/IP Port
Value type: REG_DWORD
Value data: (available port)

You need to restart the computer for the new setting to become effective.

Registry key 2

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters

Registry value: DCTcpipPort
Value type: REG_DWORD
Value data: (available port)

You need to restart the Netlogon service for the new setting to become effective.

Despite the KB article above talking about Active Directory, the important piece to remember is that LSASS.EXE is utilized even by member servers in the domain.  Additionally, LSASS.EXE is the parent process that spawns the NetLogon process so it consumes any settings that are configured on the base OS build:

https://blogs.technet.microsoft.com/askpfeplat/2015/01/11/rpc-endpoint-mapper-returns-dynamic-port-incorrectly-when-active-directory-is-configured-to-use-static-port/

How it all fits together

In this particular environment, the servers had been customized in regards to the dynamic port range configuration and in regards to static ports for the NetLogon service.  After multiple rounds of logs and investigation, the final engineer eventually focused in on what each ProcessID’s active network ports were on the system by using the ‘netstat’ command in conjunction with the ‘findstr’ command:

netstat -ano | findstr 5000

Lync2013-CLS-NetstatMissingCLS

The engineer then took the ProcessID from the results and looked in Task Manager to find which service or executable was tied to that ProcessID.  What the engineer eventually discovered was that there was another Windows process bound to the CLS TCP ports – lsass.exe.

Lync2013-CLS-LSASSConflict
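One thing worth knowing about the findstr filter is that it is a plain substring match, so "5000" also matches unrelated ports (or PIDs) that happen to contain those digits. As a rough sketch of the same check done with exact port matching, the Python below parses netstat-style text and reports which PIDs are listening on the CLS ports. The sample output and the PID in it are made up for illustration and are not taken from the customer's servers.

```python
CLS_PORTS = {50001, 50002, 50003}  # CLS agent TCP ports mentioned in this post

# Hypothetical `netstat -ano` output; addresses and PIDs are invented.
SAMPLE = """\
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       820
  TCP    0.0.0.0:50001          0.0.0.0:0              LISTENING       612
  TCP    10.0.0.5:50004         10.0.0.9:443           ESTABLISHED     4120
  TCP    [::]:50001             [::]:0                 LISTENING       612
"""


def listeners_on(ports, netstat_text):
    """Return {port: set of PIDs} for LISTENING TCP sockets on the given ports."""
    found = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: protocol, local address, foreign address, state, PID
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        # rsplit handles both 0.0.0.0:50001 and [::]:50001
        port = int(fields[1].rsplit(":", 1)[1])
        if port in ports:
            found.setdefault(port, set()).add(int(fields[4]))
    return found


conflicts = listeners_on(CLS_PORTS, SAMPLE)
print(conflicts)
```

The PID that comes back is what you would then look up in Task Manager, exactly as the engineer did.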

LSASS.EXE is the Windows Local Security Authority Subsystem which is responsible for all security processing on a server including authentication, security policy processing, audit policy processing, and token processing.  But why would LSASS.EXE be listening on the port CLS wants to use?  The answer to that question is two-fold:

  1. Since LSASS.EXE relies on the dynamic port range configuration of the Windows OS, it simply looks for a random port available upon boot up.
    • Note:  Given that the servers also had a static port configuration set for the NetLogon service that overlapped the CLS ports, it meant that even reboots would not have solved the issue because the same port would have been used after every single reboot.
  2. Since LSASS.EXE starts much earlier in the boot process than CLSAGENT.EXE, it has free rein to bind to the TCP ports that CLS needs because the CLS service isn’t running yet.

As a result of the port range configuration and the boot process order, CLS was effectively being starved out of a port needed to function.

The Fix?

In short, the fix was very simple:  change the dynamic port range and move the static NetLogon port configuration.

Dynamic Ports

This is a pretty easy change to implement using netsh commands:

Netsh int ipv4 set dynamicport tcp start=24419 num=16383
Netsh int ipv4 set dynamicport udp start=24419 num=16383

Netsh int ipv6 set dynamicport tcp start=24419 num=16383
Netsh int ipv6 set dynamicport udp start=24419 num=16383

NetLogon Static Ports

This is also a pretty easy change to implement through regedit:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

Registry value: TCP/IP Port
Value type: REG_DWORD
Value data: 30000

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters

Registry value: DCTcpipPort
Value type: REG_DWORD
Value data: 30001
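As a sanity check on the numbers above, here is a small Python sketch that tests whether a dynamic range or a set of static ports collides with the CLS ports. The customer's original customized range isn't reproduced in this post, so the "before" case uses the Windows default range purely as an illustration. The function names are mine.

```python
CLS_PORTS = range(50001, 50004)  # TCP 50001-50003, as checked earlier in this post
MCU_FLOOR = 40803                # ports above this are left to the Lync/Skype4B MCUs


def dynamic_range(start, num):
    """Inclusive port range described by netsh start=/num= values."""
    return range(start, start + num)


def collisions(candidates, reserved):
    """Ports that appear in both collections, sorted."""
    return sorted(set(candidates) & set(reserved))


default_dynamic = dynamic_range(49152, 16384)  # Windows default (illustrative "before")
new_dynamic = dynamic_range(24419, 16383)      # values from the netsh commands above
static_ports = [30000, 30001]                  # NTDS and Netlogon values above

print(collisions(default_dynamic, CLS_PORTS))  # the CLS ports sit inside the default range
print(collisions(new_dynamic, CLS_PORTS))      # no overlap after the change
print(new_dynamic[-1])                         # last port of the new dynamic range
```

With start=24419 and num=16383 the new range tops out just under the MCU floor, which is why those particular values work here.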

The Results…

Following the change of the dynamic port range and the change of the NetLogon configuration, you need to reboot the servers in question.  Lo and behold…after doing so we had functional CLS:

Lync2013-CLSCmdlet-ServerFixed

A quick examination of netstat also confirmed that CLS was bound to the TCP ports, as expected:

Lync2013-CLS-FixedPorts

Short Re-Cap

In the end it was part configuration error and part dumb luck.  This customer had overlapped their dynamic TCP port and static TCP port allocations with ports that CLS wanted to use.  It was simply the luck of the draw that LSASS.EXE had grabbed one of the CLS ports, and only because LSASS.EXE starts when Windows boots…long before CLSAGENT.EXE, which is set to “Delayed Start” by default.  When you combine everything, you get the case of CLS “running” but not actually working.

Short Aside:

If you are using Microsoft’s recommendations for TCP/UDP ports for Lync/Skype4B QoS, any port above 40803 should be dedicated for the various Lync/Skype4B MCUs.  Don’t overlap your dynamic or static OS ports in the same range that your MCUs will operate in…instead move it below the 40803 range, as we did in this fix. You may have to ensure your firewall rules are updated to reflect the new SrcPort for communications (if you are using internal firewalling), but it’s a small price to pay to be able to actually use CLS!

13Jun/16

Multiple E-911 Number Support in Skype4B

Update 9/28/2016 – Including information for Lync Phone Edition Cumulative Update that adds support for multiple E-911 number functionality.  Fixes for client support table.

Update 7/15/2016 – Including information for official release via CU3 for Skype for Business Server, including information for official release via July 5, 2016 CU for Office 2016 Skype for Business client.

E-911 is a passion of mine, largely due to some past experiences in my life that have proven the importance of proper E-911 setup within unified communications environments. Despite having done the configuration multiple times within many Lync Server deployments, there has always been a critical limitation within Lync Server in regards to E-911. Most US-based deployments never really encountered this issue because we here in the US have a single emergency number to call for all emergency types, but when you step into a country like Switzerland, a breakdown occurs because Switzerland offers multiple emergency numbers for different types of emergencies. Lync Server 2010/2013 couldn’t support the multiple number configuration of these countries within the context of the platform’s E-911 logic and resulted in an incomplete solution (and potentially dangerous workarounds). This problem remained throughout the Lync Server 2013 lifecycle and even through RTM of Skype for Business Server 2015 until July of 2016. Microsoft officially added multiple emergency number support within Skype for Business, thus alleviating the issues that plagued the platform prior to now. This is a BIG step forward in making the platform more flexible but isn’t without limitations. It’s a more holistic turn-key solution, but is still incomplete and has some pieces left to fill.

Understanding the Problem

Within Lync Server 2010/2013, there simply did not exist a way to support the dialing of multiple emergency numbers for countries that support it.  For example, in Switzerland you could dial 117 for Police, 144 for Ambulance, and 118 for Fire, in addition to 112 for the EU-centric global emergency number.  In Lync Server 2010/2013 you absolutely could account for users dialing those numbers within a location policy, but the mapping was a 1:M association:

Skype4B-E911-SingleNumberSupport
An E9-1-1 dial mask is effectively a normalization rule, defining all the different digits that are recognized to indicate an emergency call.

An E9-1-1 dial number is the singular number that all dial masks map to and is the number that actually gets dialed by the Skype for Business client.

As you can see, there is a 1:M mapping which effectively means that while you can have multiple numbers defined in the mask, they all get translated into a single emergency number of 112. What this also means is that you cannot support calling the other emergency numbers specified within the dial mask, because the Location Policy logic has no support for multiple emergency numbers.

Note: Some readers would correctly point out that you could actually build normalization rules into dial plans to allow users to dial each of those numbers, and you would be correct. That being said, defining those numbers within a dial plan allows external users and mobile users to make calls to those numbers. The whole purpose of E9-1-1 is to precisely map the caller’s location to a known civic address location via LIS.  LIS functionality isn’t natively available for mobility clients nor external users. I am a strong opponent of allowing normalization rules in dial plans for emergency calls and do not advise this approach.

So what’s New?

The new addition is that Microsoft has added the ability to have the location policy recognize unique and independent combinations of emergency dial masks and emergency dial numbers, effectively giving an M:N option for emergency numbers.

https://technet.microsoft.com/en-us/library/mt723406.aspx

When planning for multiple emergency numbers, keep the following in mind:

  • You can define up to five emergency numbers for a given location.

  • For each emergency number, you can specify an emergency dial mask, which is unique to a given location policy.

A dial mask is a number that you want to translate into the value of the emergency dial number value when it is dialed. For example, assume you enter a value of 212 in this field and the emergency dial number field has a value of 911. When a user dials 212, the call will be made to 911. This allows for alternate emergency numbers to be dialed and still have the call reach emergency services (for example, if someone from a country or region with a different emergency number attempts to dial that country or region’s number rather than the number for the country or region they are currently in). You can define multiple emergency dial masks by separating the values with semicolons. For example, 212;414. The string limit for a dial mask is 100 characters. Each character must be a digit 0 through 9.

  • Each location policy has a single public switched telephone network (PSTN) usage that is used to determine which voice route is used to route emergency calls from clients using this policy. The usage can have a unique route per emergency number.

  • If a location policy has both the EmergencyNumbers and DialString parameters defined, and the client supports multiple emergency numbers, then the emergency number takes precedence. If the client does not support emergency numbers, then the emergency dial string is used.

  • For information about which Skype for Business and Lync clients support receiving multiple emergency numbers, dial masks, and public switched telephone network (PSTN) usages, see Supported clients.
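Those limits are easy to check on paper before touching a location policy. The Python sketch below is my own illustration of the planning rules quoted above: at most five emergency numbers, a mask string of at most 100 characters, and digits only. Treating "unique to a given location policy" as "no mask may appear under two different dial strings" is my reading of the guidance, not something the documentation spells out.

```python
MAX_NUMBERS = 5        # "up to five emergency numbers for a given location"
MAX_MASK_LENGTH = 100  # "the string limit for a dial mask is 100 characters"


def validate_plan(plan):
    """Check (dial_string, dial_mask) pairs against the stated limits.

    Returns a list of human-readable problems; an empty list means the plan passes.
    """
    problems = []
    if len(plan) > MAX_NUMBERS:
        problems.append(f"{len(plan)} emergency numbers defined; limit is {MAX_NUMBERS}")
    seen = {}
    for dial_string, dial_mask in plan:
        if len(dial_mask) > MAX_MASK_LENGTH:
            problems.append(f"mask for {dial_string} exceeds {MAX_MASK_LENGTH} characters")
        for mask in dial_mask.split(";"):
            if not (mask.isascii() and mask.isdigit()):
                problems.append(f"mask {mask!r} for {dial_string} is not all digits 0-9")
            elif mask in seen:
                # Assumption: a mask should map to only one dial string per policy
                problems.append(f"mask {mask} is used by both {seen[mask]} and {dial_string}")
            else:
                seen[mask] = dial_string
    return problems


# The Swiss numbers discussed earlier, with 999 and 911 folded into 112.
swiss_plan = [("117", "117"), ("144", "144"), ("118", "118"), ("112", "112;999;911")]
print(validate_plan(swiss_plan))
```

An empty list back means the plan fits within the published limits. Anything else is worth fixing before you create the numbers in PowerShell.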

To configure this new feature, you must abandon the GUI and turn to PowerShell, using the new New-CsEmergencyNumber cmdlet. This cmdlet allows you to create the individual mask->number mappings, within the limitations above, of course.

Step 1 – Research your Emergency Number Needs

The first task you should take is simply to define your number mappings.  Ask yourself these questions:

  • Given the locale of the office location, how many different number types do I need to support?
  • Given the locale of the office location, do I need to account for other regions’ emergency numbers being dialed by visiting personnel?

Step 2 – Plan your Emergency Number Mappings

Once you have identified the needs above, you can create a table that outlines the configuration you will put into place:

Location Policy Name            | Emergency Dial String | Emergency Dial Mask | PSTN Usage
CH-BE-Bern-Hochschulstrasse-FL1 | 117                   | 117                 | CH-BE-Bern-Hochschulstrasse-Emergency
CH-BE-Bern-Hochschulstrasse-FL1 | 144                   | 144                 | CH-BE-Bern-Hochschulstrasse-Emergency
CH-BE-Bern-Hochschulstrasse-FL1 | 118                   | 118                 | CH-BE-Bern-Hochschulstrasse-Emergency
CH-BE-Bern-Hochschulstrasse-FL1 | 112                   | 112;999;911         | CH-BE-Bern-Hochschulstrasse-Emergency

You’ll notice in my table above, I’ve accounted for each of the individual emergency types first and mapped them directly to their unique dial string. This configuration ensures that each user can dial each emergency number and not have that number be changed in any way. The last line of the table is the “catch all”, allowing users to dial the EU-centric emergency number ‘112’, along with some other emergency numbers from other locales (such as the UK and the US) that will automatically map to ‘112’. This final configuration helps to ensure that emergency calls complete for users who may not know the specific emergency number for a given locale (think visitors).
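To show what the table is meant to achieve, here is a small Python model of the mapping: dialed digits go in, and the number that should actually be called comes out. This only illustrates the intent of the plan. It says nothing about how the Skype for Business client performs the match internally.

```python
# Rows from the planning table above: (dial string, dial mask).
PLAN = [
    ("117", "117"),
    ("144", "144"),
    ("118", "118"),
    ("112", "112;999;911"),
]


def build_lookup(plan):
    """Flatten (dial_string, dial_mask) rows into {dialed digits: number to call}."""
    lookup = {}
    for dial_string, dial_mask in plan:
        lookup[dial_string] = dial_string
        for mask in dial_mask.split(";"):
            lookup[mask] = dial_string
    return lookup


LOOKUP = build_lookup(PLAN)


def resolve(dialed):
    """Return the emergency number to call, or None if the digits are not in the plan."""
    return LOOKUP.get(dialed)


for digits in ("117", "144", "911", "999", "5551234"):
    print(digits, "->", resolve(digits))
```

The three Swiss service numbers pass through unchanged, while the visitor numbers 999 and 911 collapse to 112.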

Step 3 – Configure the Number Mappings

PowerShell will be your friend here, as it is the only method by which you can configure this new functionality. Do not attempt to configure this within the Control Panel web portal, as you won’t find it anywhere!  Open up your Skype for Business Server Management Shell and add the configuration:

# Each New-CsEmergencyNumber object pairs one dial string with its dial mask(s)
$a = New-CsEmergencyNumber -DialString 117 -DialMask 117
$b = New-CsEmergencyNumber -DialString 144 -DialMask 144
$c = New-CsEmergencyNumber -DialString 118 -DialMask 118
# Quote multi-value masks: an unquoted semicolon ends the PowerShell statement
$d = New-CsEmergencyNumber -DialString 112 -DialMask "112;999;911"
Set-CsLocationPolicy -Identity CH-BE-Bern-Hochschulstrasse-FL1 -EmergencyNumbers @{add=$a,$b,$c,$d}

Step 4 – Configure Legacy Client Number Mappings

This is done via the CSCP GUI, just as before. Configure this for the legacy clients that still need emergency services information via the legacy location policy logic. Remember that these clients will be limited to a single emergency number, so make sure to utilize the most global emergency number such as ‘112’:

Skype4B-E911-LegacySingleNumber

Step 5 – Configure Location Policy PSTN Usage

Take a look at your PSTN Usage assigned to your location policy. Remember that your PSTN Usage determines the available voice routes for the calls users make, so you need to ensure that the voice route assigned to the PSTN Usage allows all the different emergency dial numbers you have configured.

Skype4B-E911-MultipleNumberSUpport-PSTNUsage

If needed, make changes to your PSTN Usage, otherwise simply make note of the voice route(s) you need to edit and move on to step 6.

Step 6 – Edit your Emergency Number Voice Route

This can be accomplished in the CSCP GUI. Go into your voice routing configuration and edit the appropriate emergency voice route(s) (the ones tied to the PSTN Usage in Step 5) to now match all the emergency numbers you have configured. Simply use a logical “|” (or) in the matching rule:

Skype4B-E911-MultipleNumberSupport-VoicRoute
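The exact pattern lives in the screenshot, so the following is only a guess at its general shape. It is a Python sketch of an alternation that matches each configured dial string, and the optional leading "+" is an assumption on my part. Note that only the dial strings need to be in the route. Masked numbers such as 911 or 999 have already been translated to 112 by the time the call reaches routing.

```python
import re

# Hypothetical matching pattern; the real one is whatever your voice route uses.
EMERGENCY_ROUTE = re.compile(r"^\+?(112|117|118|144)$")


def route_matches(number):
    """True if the number would be matched by the emergency route pattern."""
    return EMERGENCY_ROUTE.fullmatch(number) is not None


for candidate in ("112", "+144", "118", "911", "1127"):
    print(candidate, route_matches(candidate))
```

Anchoring the pattern at both ends matters here, otherwise a longer number that merely starts with 112 would also land on the emergency route.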

Step 7 – Commit Changes

Commit all your changes and test, test, TEST!

Limitations of this Approach

Nothing is 100% fool-proof and the new multiple emergency number support falls in line with that statement.

Clients

There are now three separate clients that support the multiple E-911 number functionality, a large increase from the original support of one when this feature was first released:

Mobility, Lync for Mac 2011, or legacy 2013/2015 clients seem to be excluded from support.  In addition to that list are unknowns about Polycom VVX phones (or other 3PIP phones from Audiocodes, for example) and the new Skype for Business 2016 client for Mac.  Note:  Given that the new Mac client works off the same functionality as the mobility clients (UCWA), I can almost guarantee support is not available at this time.  I’m sure Microsoft will continue to add to this list as time goes on, but be aware of this limitation and check back on TechNet to find out the latest client supportability.

Note: Sadly, this is simply another reason to make sure you are staying up-to-date with software releases, as the best stuff is only available in the latest versions.

Servers

With Microsoft now releasing information on this, you must have the CU3 update installed for Skype for Business Server 2015.  Microsoft may add this functionality back into Lync Server 2013 (it seems to be back-porting much functionality these days), but I wouldn’t hold my breath.

Wrapping Up

This addition is a BIG-WIN for non-US based Skype4B deployments and adds a sorely missing feature. While I’ve focused mainly on non-US for this post, there are distinct cases where additional emergency numbers could be utilized within the US, such as corporations, manufacturers or hospitals that would require the ability to have multiple emergency numbers.  Those organizations could allow unique emergency number combinations for any number of scenarios that may meet internal life-safety requirements.

23May/16

Dissecting a failed CMS Migration in Lync Server 2013 due to LRS Meeting Portal

Ah…CMS…the Centralized Management Store…  How important you are, yet so little the respect you are given.  I gaze upon your XML and bask in its glory…  The simplicity and elegance of your knowledge of all things Lync/Skype4B…  The heartache your unknown ways cause when things don’t always go as planned…

For those following along with my blog, you may have noticed that I had posted a CMS migration guide a while back.  The guide itself is solid and I’ve used it in many different migrations across Lync Server 2010, Lync Server 2013 and Skype4B Server 2015.  The fact that it exists doesn’t mean it is correct 100% of the time.  I was reminded of this last week when I encountered a unique CMS migration scenario that forced me to dig way in and use some out-of-the-box thinking to work around a limitation of the Move-CsManagementServer cmdlet.  In the end it was all OK but my pain will hopefully be your gain.

The Environment

The environment consisted of two Enterprise Edition Front End Pools:  one Lync 2010 pool and one Lync 2013 pool.  Each pool consisted of five servers.  The CMS store was located on the 2010 pool and the goal was to migrate it to the 2013 pool, as shown in the picture below:

Lync-CMStopology-start

Following along with the process of my CMS migration post, I was ready to run Step 7 from a server in the receiving pool.  I tee’d up the cmdlet, hit ‘Enter’, watched with a smile as the cmdlet progressed as expected and then my heart sank:

This cmdlet moves Central Management Server to the pool that contains this computer.

 Current State:
 Central Management Server Pool: "pool01.domain.com"
 Central Management File Store: "\\FQDN\share"
 Central Management Store: "SQLFQDN1.domain.com\sqlinst1"
 Central Management Store SCP: "SQLFQDN1.domain.com\sqlinst1"

 Proposed State:
 Central Management Server Pool: "pool02.domain.com"
 Central Management File Store: "\\FQDN\share"
 Central Management Store: "SQLFQDN2.domain.com\sqlinst2"
 Central Management Store SCP: "SQLFQDN2.domain.com\sqlinst2"


Do you want to move the Central Management Server, Central Management Store, and File Store in the current topology and assign permissions for computers in Active Directory? (Note: Please read the help provided for this cmdlet using
the Get-Help cmdlet before you proceed.)
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help
(default is "Y"):A

...

WARNING: Move-CsManagementServer failed.
Move-CsManagementServer : Failed to execute the following PowerShell command -. "D:\Program Files\Microsoft Lync Server 2013\Deployment\Bootstrapper.exe" .
At line:1 char:1

The Error

Looking at the error, my expectation was that the cmdlet failed to install the CMS replication components on the local server. When I examined the HTML log file it became apparent that the error didn’t even have anything to do with the CMS replication components:

Lync-CMSMigration-Error
Installing MeetingRoomPortalInstaller.msi(Feature_Web_MeetingRoomPortal_Ext, Feature_Web_MeetingRoomPortal_Int)...Log file was: %TEMP%\Bootstrap-CsMachine-[2016_05_19][23_45_58].html
End executing command "9".
Error: Failed to execute the following PowerShell command - . "D:\Program Files\Microsoft Lync Server 2013\Deployment\Bootstrapper.exe" . 

Verify that the account used to run this cmdlet has sufficient permissions to run bootstrapper located at "D:\Program Files\Microsoft Lync Server 2013\Deployment\Bootstrapper.exe". 

Note: The move operation failed after modifying the topology. This means that there are no active Central Management services to replicate configuration changes. To complete the move please run this cmdlet again after the issues encountered during this run are resolved. If the issues cannot be resolved then run the cmdlet on the original pool with -force option to rollback the move operation.

After picking my heart up off the floor, I began to focus on the error.  MeetingRoomPortalInstaller.msi???  Why on earth would the CMS cmdlet be looking for that MSI?  For whatever reason the cmdlet was expecting it and it borked leaving CMS in a horrible state:

 Current State:
 Central Management Server Pool: "pool02.domain.com"
 Central Management File Store: "\\FQDN\share"
 Central Management Store: "SQLFQDN2.domain.com\sqlinst2"
 Central Management Store SCP: "SQLFQDN1.domain.com\sqlinst1"

The CMS topology document indicated that the new pool was the CMS master, but the Active Directory SCP object indicated that the old pool was the master.  Effectively, each side was pointing at the other pool.  Great.
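The mismatch is easy to miss when you are staring at cmdlet output under pressure, so here is a tiny Python sketch that parses the "Current State" lines shown above and flags when the store recorded in the topology and the store recorded in the SCP disagree. The parsing is my own illustration built around that output format. It is not part of any Lync tooling.

```python
# "Current State" lines as reported after the failed move (copied from above).
STATE_TEXT = r"""
 Central Management Server Pool: "pool02.domain.com"
 Central Management File Store: "\\FQDN\share"
 Central Management Store: "SQLFQDN2.domain.com\sqlinst2"
 Central Management Store SCP: "SQLFQDN1.domain.com\sqlinst1"
"""


def parse_state(text):
    """Parse 'Label: "value"' lines into a {label: value} dict."""
    state = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            state[label.strip()] = value.strip().strip('"')
    return state


def is_split(state):
    """True when the topology and the AD SCP name different CMS databases."""
    store = state["Central Management Store"].lower()
    scp = state["Central Management Store SCP"].lower()
    return store != scp


state = parse_state(STATE_TEXT)
print(is_split(state))
```

For the state above the check reports a split, which is exactly the half-moved condition described here.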

Abort!  Abort!

With the customer on the call with me, tensions are beginning to rise.  The order was given to roll back CMS.  So I try running the move commands from a server in the Lync 2010 pool…and that fails, too, with the 2010 shell complaining about entries in the XML file.  Ugh.  We’re officially in a pickle now, and we’ve got no path but forward.  I go grab some coffee and settle in for a long night.

Some Background Info

Going back to the error, I start asking questions…

Me:  “The MeetingRoomPortalInstaller.msi is utilized for Lync Room System management…do you utilize this in your environment?”

Customer:  “Uh…we don’t know.  We have many LRS systems, but we have no knowledge of this portal you’re talking about.”

Me:  “That software is not listed in Add/Remove Programs on this machine…is there a reason it is not installed on this machine?”

Customer:  “That server was recently added to our pool.  We didn’t install any additional software during the provisioning.”

*chatter…grumbling…cross talk*

Me:  “I’ve examined the web.config file for the LRS portal on one of the other servers and it doesn’t appear that it is truly configured for proper operation.  Instead it just looks like someone installed the MSI file and left it as is.”

Customer:  *silence*  “Uh…ok…  So now what?”

I go ahead and install the LRS portal components on the server and try running the move cmdlet again, but it complains that the new pool already owns CMS and won’t complete.  Ugh!

Digging Deeper

Apparently the LRS portal installer adds entries to the CMS topology when it gets installed on a server within a Front End pool and, as a result, bootstrapper expects the software to be installed on every server in that pool.  Since it wasn’t installed on this one, bootstrapper failed, which in turn caused the CMS move cmdlet to fail as well.  Unfortunately for me it left CMS in a half-baked state where the topology doc says it moved but Active Directory says it didn’t.  Back to the logs…

Importing the LIS configuration data into the new Central Management Store.
Executing ImportLisConfigurationTask.
Importing Location Information Services (LIS) configuration.
Begin executing command "5": "Import-CsLisConfiguration -FileName "C:\Users\username\AppData\Local\Temp\2\Move-CsManagementServer-CsLisConfiguration-New-3-316cfe43-bca1-488b-a285-87a06c05ad0b.zip"".

Executing MoveCmsInTopologyTask.
Exporting Central Management Store configuration.
Importing Central Management Store configuration.
Begin executing command "6": "Import-CsConfiguration -FileName "C:\Users\username\AppData\Local\Temp\2\Move-CsManagementServer-CsConfiguration-New-2-722d35fe-c75a-4a87-b3c2-efec483a1bce.zip"".
End executing command "6".

Reading through the logs showed that the CMS data itself did get moved to the new store.  Additionally, the CMS replication components did get installed on the new server even though bootstrapper failed on the LRS components:

Installing MgmtServer.msi(Feature_MGMTServer, Feature_FTA, Feature_Master)...success

Remember that at the moment, CMS is not functional.  I can’t start the replication services on the new pool because the AD connection point resolves to the legacy pool and the services stop when they detect a database schema mis-match:

Lync-2013Pool-CMSServiceStopError

I also can’t start the replication services on the legacy pool because the topology document says the new pool is the CMS owner and the services stop upon this detection:

Lync-2010Pool-CMSServiceStopError

The environment has no functional CMS replication until this can be resolved, so the clock is ticking and people are continuing to get ‘twitchy’.  I come back to what I know:

  1. CMS data did get moved to the new database
  2. Active Directory did not get updated to point to the new SQL server
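To make the stuck state concrete, here is a minimal sketch in Python.  It is a deliberately simplified model, not Lync's actual logic: it assumes a pool can only act as CMS master when both the AD connection point and the topology document name that pool.  The pool labels "legacy" and "new" and the function name are made up for illustration.

```python
# Simplified model of the half-moved CMS state (an assumption, not Lync's code).
def can_run_cms_master(pool, ad_scp_pool, topology_cms_owner):
    """A pool can act as CMS master only when AD and topology both name it."""
    return ad_scp_pool == pool and topology_cms_owner == pool

# State after the failed move: topology says "new", AD still says "legacy".
state = {"ad_scp_pool": "legacy", "topology_cms_owner": "new"}
before = {p: can_run_cms_master(p, **state) for p in ("legacy", "new")}

# Repointing the AD connection point at the new store resolves the split.
state["ad_scp_pool"] = "new"
after = {p: can_run_cms_master(p, **state) for p in ("legacy", "new")}

print(before)  # neither pool can run the replication services
print(after)   # only the new pool can
```

In this model neither pool qualifies until AD and the topology document agree, which is why updating AD is the one remaining lever.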

I’ve got one last ditch effort to see if I can get this up and running before we have to call Microsoft support.  At this point, if I update AD to resolve to the new SQL server there is a chance that I can get CMS components to be functional again.

A Fix?

There are cmdlets available that allow you to forcibly change the CMS configuration point and Active Directory SCP object, so I tee up the following commands and hit ‘Enter’:

Set-CsConfigurationStoreLocation -SqlServerFqdn "SQLFQDN2.domain.com" -SqlInstanceName "sqlinst2" -Verbose
Set-CsManagementConnection -StoreProvider Sql -Connection "SQLFQDN2.domain.com\sqlinst2" -Verbose

Note:  Be sure to make note of the original configuration of the cmdlet values, in case you need to revert the changes back to the original pool!

I wait about 15 minutes for Active Directory replication to fully complete and then attempt to start the replication services on the new pool server.  I watch event log and filter for ‘LS File Transfer Agent Service, LS Master Replicator Agent Service, LS Replica Replicator Agent Service’.  As I’m watching entries come in, the expected entries from my CMS migration blog post come by indicating the server detected the new values in AD and has assumed active CMS master.

LS Master Replicator Agent
Lync-2013Pool-CMSMasterReplicatorSuccess
LS File Transfer Agent
Lync-2013Pool-CMSFileTransferSuccess

I continue to run Step 10 of my post and examine CMS replication…within 5 minutes CMS replicas all report up to date.   Success!  I log in to Topology Builder and pull down the topology, and all entries are there as expected with the new pool showing as CMS master.  I check the LIS database and all 911 information is there.  With a hail-Mary pass, I somehow averted a disaster.

Bottom Line

This little find was wholly unexpected.  Who would’ve thought that an error like this could have resulted in CMS migrations going haywire?  Had I run the cmdlet from a server that already had the LRS portal components on it, I likely would not have had any issues and it would have gone without incident.  At the end of the day this was caused by dependent software (LRS meeting portal) not being installed on a new server that was recently added to a front end pool.  I guess this is something I should start checking for, ensuring all software is present and accounted for, before attempting CMS migrations.  This is also a wake-up call for administrators and architects to make sure that your change management processes alert you to inconsistencies like this!

16May/16

‘Preliminary Primary FileShareName Parameter is Unusable’ with Lync Server 2013

I’ve recently been working through a Lync Server 2010 to Lync Server 2013 migration, and the error below popped up after I upgraded an existing Lync Server 2010 SBA to Lync Server 2013:

Skype4B-EventID32080
Event 32080, LS Storage Service

A queue flush operation has encountered a file error.

Preliminary primary fileShareName parameter: is unusable.  Exception: System.ArgumentNullException: Value cannot be null.
Parameter name: path
     at System.IO.DirectoryInfo..ctor(String path)
     at Microsoft.Rtc.Internal.Storage.Sql.LyssDal.ValidateFileShareName(StoreContext ctx, string fileShareName, String timestamp)
Cause:  There may be permission issues to the file share, local file location, temporary directory, or disk is full.

This is the first time I’ve seen this error – ever – so I was a bit perplexed as to what was causing it.  The error involved the Lync Server Storage Service (LYSS), which is not exactly the best-documented piece of Lync Server (or Skype for Business Server), so finding root cause might be like finding a needle in a haystack.  LYSS, for the curious folks out there, looks like this from a conceptual point of view:

Skype4B-LyncServerStorageService-Conceptual

Note:  For more in-depth information about LYSS, see Mattias Kressmark’s blog post on the topic.

Given that the LYSS Database on each registrar is a temporary storage ground for many Lync Server related activities, it didn’t initially make much sense that LYSS was involving a file share. The only file share that could come to my mind was the Front End pool file share which is defined in topology.  When opening up topology and looking at the file share configuration, some alarm bells started to sound:

LyncServer-FileShareConfig-Invalid
File server FQDN:  NETBIOS
File share:  SHARE\FOLDER

I’ve changed the true names above to protect the innocent, but the configuration above gives enough of a picture.  To my eyes there were two significant issues:

  1. The File server FQDN was not defined as an FQDN.  It was defined as NETBIOS.
  2. The File share was not defined as a share.  It was defined as a share\folder.

Nearly all Microsoft documentation is pretty clear on defining the file share as FQDN\Share.  Could it be possible that this is truly the root cause?
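The two concerns above can be expressed as a quick sanity check.  This is only a sketch of my own reasoning in Python; it is not how Topology Builder validates anything, and the function name and the "fs01.domain.com" server are made up for illustration.

```python
def file_store_warnings(server, share):
    """Flag the two suspicious patterns: a non-FQDN server and a share\\folder path."""
    warnings = []
    if "." not in server:
        # A name without any dots is treated as a NETBIOS name here (assumption).
        warnings.append("server is not an FQDN")
    if "\\" in share.strip("\\"):
        # A backslash inside the value means a folder below the share was given.
        warnings.append("share includes a subfolder")
    return warnings

print(file_store_warnings("NETBIOS", r"SHARE\FOLDER"))   # both warnings
print(file_store_warnings("fs01.domain.com", "SHARE"))   # no warnings
```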

A Short Aside

Almost any Lync/Skype4B architect would agree with avoiding #1 above – everything in Topology must be defined as an FQDN.  Or, well, it should be and Topology Builder should help enforce that.  Lync Server 2013 Topology Builder does not validate the File server FQDN format.  Go ahead and try it yourself…you’ll see no validation errors when you configure things with NETBIOS name within Lync Server 2013 Topology Builder.  Try and do that within Skype for Business Topology Builder, however, and you’ll see a validation error saying the file share is not in FQDN format:

Skype4B-ToplogyBuilderFileShare-Invalid

Some Lync/Skype4B architects may not completely agree with avoiding #2 above – that you should only define a share name and not share\folder.  Perform a quick internet search and you’ll find plenty of examples of file shares configured this way with no documented issues that result from it.  Additionally, Microsoft makes no specific statement about this issue in a KB article dedicated to documenting errors for unsupported File Share configuration.  Historically speaking, I have never configured a file share as share\folder, so I couldn’t be 100% certain it was unsupported or caused issues.

Aside Over

The SBA seemed to be perfectly functional, with the exception of the error at the start of this post, and the front end pool (Lync Server 2013) was also functional (and did not exhibit the error), so I needed to try and remove variables from the equation.  There seemed to be two potential root causes:

  1. File Share Permissions Issues
  2. SBA code issues with current file share configuration

Checking #1 proved to be very simple and we were able to confirm permissions were correct:

Group                          Permission  Note
RTCHSUniversalServices         Change      Standard Requirement for Lync Server
RTCComponentUniversalServices  Change      Standard Requirement for Lync Server
RTCUniversalServerAdmins       Change      Standard Requirement for Lync Server
RTCUniversalConfigReplication  Change      Standard Requirement for Lync Server

So if it’s truly not the file share permissions, the question becomes whether the file store configuration simply won’t work with the SBA code.

Digging In…

First step in obtaining more information is through CLS logs.  I already had the AlwaysOn scenario running on the entire environment so gathering the needed data would be easy.  Additionally, the Event 32080 error was only generated once per day and it was almost at the exact same time every day so I could pull CLS information for a very short period of time and more easily find the information. After pulling the data and looking at the Trace logs I was able to find the exact moment when the error was generated:

LyncSBA-LyssDalValidateFileShareName-Broken
LyssDal.ValidateFileShareName:lyssdal.cs
Preliminary primary fileShareName parameter: is unusable.  Exception: System.ArgumentNullException: Value cannot be null.

Looking at the second entry above showed that the code also began to access a local NTFS location of ‘C:\ProgramData\Microsoft\Lync Server’.  Examining that folder path showed that, indeed, there was a folder structure that seemed to have been created by the LYSS functionality:

LyncSBA-LyssDalValidateFileShareName-LocalNTFSFolders

Even better, I had Process Monitor loaded at the time the error was recorded and I could validate through Process Monitor logs that the SBA a) never attempted to contact any UNC-based resources and b) did successfully access the local NTFS folder location above:

LyncSBA-LyssDalValidateFileShareName-ProcMon

Interesting…very interesting…

Based on the information I’d found thus far, it certainly seems like the SBA code doesn’t like something with the file share.  How about the Front End Servers though?  What do they have to say about this, if anything?

After pulling the data and looking at the Trace logs I was able to find the exact moment when the Front End kicked off this process:

LyncFE-LyssDalValidateFileShareName-Working
LyssDal.ValidateFileShareName:lyssdal.cs
Primary file share location is [\\NETBIOS\SHARE\FOLDER\1-WebServices-29\StorageService]

The Front End has no issues whatsoever: it validates the share name and begins a process to connect to ‘1-WebServices-29\StorageService\DataExport\Date’.  Examining that folder path showed that, indeed, there was a folder structure that included a folder for each front end in the pool, created by the LYSS functionality:

LyncFE-LyssDalValidateFileShareName-ShareNTFSFolders

Getting Closer…

Given that the Front End servers seemed to be ‘happy’, it certainly seems like the SBA code might have a specific issue with the file share configuration.  The file share is defined in topology and that gets replicated by CMS to each server in topology.  CMS replication was showing UpToDate for all nodes and I even went so far as examining the local XML data within the FE and SBA local databases to make sure they truly were working off the same topology data.  Indeed, each server had the correct data from CMS:

File Server Configuration (both servers)
Lync-LocalCMSComparison-FileShareConfiguration
SBA File Share Configuration (SBA)
LyncSBA-LocalCMS-FileShareConfiguration
Front End File Share Configuration (Front End)
LyncFE-LocalCMS-FileShareConfiguration

The servers do have correct CMS information and the SBA still doesn’t like what’s configured.  Running out of options in my arsenal, I was led to a nearly final conclusion:  define a new file store configuration in topology (a supported file store configuration) and migrate content for the Lync Server 2013 infrastructure in topology.  This process has multiple steps and is well documented within these URLs:

  1. http://social.technet.microsoft.com/wiki/contents/articles/15374.change-the-file-store-location-for-lync-server-2013-pool.aspx
  2. http://ucken.blogspot.com/2014/01/presentation-issues-after-moving-lync.html

The Results Were?

So we went through the process and made the changes.  Updated Topology to use a correctly formatted file store and updated all related configuration items.  Restarted services.  Following all the changes, I was able to validate…failure.  The change didn’t seem to have any impact and the errors persist today.  A big thanks goes out to Amanda Debler for confirming that she too sees these errors in her SBA event logs even though the file store in that environment is specified in the recommended manner.

Bottom line:  I have no idea why the error appears nor any idea of negative ramifications the error indicates.  Sadly I don’t have a fix for this one…yet.  Even so, this customer would have had to change their file store configuration anyway since the Skype4B Topology Builder wouldn’t allow them to use the current one due to the NETBIOS name configuration.  Not a complete loss, but not the home-run I was hoping for.  🙁

Note:  The SBA was running January 2016 Cumulative Update for Lync Server 2013.

Note:  If you want to manually invoke this LyssDal.Cs process, run the Invoke-CsStorageServiceFlush cmdlet specifying a flushtype of “FullFlush”.  This cmdlet will generate the error on command.

09May/16

Lync Server 2013 Front End Patch Installer Fails with Error 1603

Another day, another odd error.  Another trip into the deep, dark depths of Windows.  Another enlightening find that reminded me of the inter-dependency of Lync, Windows, and SQL Server.

The error:

Product: Microsoft Lync Server 2013, Front End Server - Update 'Lync Server 2013 (KB3120728)' could not be installed.  Error code 1603.  Additional information is available in the log file D:\Source\Microsoft\05-Lync Server 2013 - Jan 2016 CU\Server.msp-computername-[2016-05-06][15-19-10]_log.txt

So how did this error come about, you ask?

The Back Story

This error was part of a new Front End Pool installation.  At this point in the process I had completed the following tasks:

  • SQL Express instances had been pre-installed
  • Lync Server 2013 Core Components were installed
  • Lync Server 2013 deployment wizard steps 1 & 2 were run
    • Local Configuration Store
    • Local Components and Services

The error itself was appearing when I was attempting to run the LyncServerUpdateInstaller.exe patch for the January 2016 Cumulative Update.  Typically this is a slam-dunk process and goes without issue, but the Front End Server patch failed and rolled back.  Examining the log file in the error message was ultimately helpful, but given the amount of information in there, it was truly finding a needle in a haystack.  But the needle was found:

Lync-CUInstaller-FirstLogError
Product: Microsoft Lync Server 2013, Front End Server -- Error 29024.  Error 0x80004005 (Unspecified error) occurred while executing command 'D:\Program Files\Microsoft SQL Server\110\Tools\Binn\osql.exe'. For more details check log file 'C:\users\username\AppData\Local\Temp\LCSSetup_Commands.log'.

A log file within a log file…interesting…  Alright, I’ll follow the bread crumbs:

Lync-CUInstaller-SecondLogError
Msg 5011, Level 13, State 9, Server computername\RTCLOCAL, Line 5
User does not have permission to alter database 'rtc', the database does not exist, or the database is not in a state that allows access checks.
Msg 5069, Level 16, State 1, Server computername\RTCLOCAL, Line 5
ALTER DATABASE statement failed.

The KB installer is calling an executable, osql.exe, and using a T-SQL script to initiate changes.  I had to look up each of the osql.exe command line switches, but the one that is most important to notice is the “-E”:

Uses a trusted connection instead of requesting a password

Effectively what that command means is “use Windows Integrated Authentication”, which thereby means that my user account should be used.  My user account has all the rights in the world (including sysadmin in SQL), so why is this failing?  I tried many, many things – even going so far as blowing away Lync (bootstrapper /scorch) and SQL databases – but none of them made any difference.  The CU installer would always fail with the same error every time.  Nothing seemed to make a difference.

The Plot Thickens

Given my failure and frustration, I fired up SQL Tracing:

Lync-SQLTracing-Failure

I was very, very surprised to see the “NT AUTHORITY\SYSTEM” account being used for the LoginName.  My user account is launching the application executable – why aren’t those credentials being used!?  Looking at Management Studio for the RTCLOCAL instance, the “NT AUTHORITY\SYSTEM” account does have a login, but it is not granted any elevated permissions or rights:

Lync-SQLSYSTEM-ServerRoleRights

No sysadmin role means that it cannot alter databases within the instance.  That’s sort of an explanation, but why is this the first time I’m seeing this problem!?

Digging Further

The ultimate epiphany came when I began to look at how the LyncServerUpdateInstaller.exe worked.  The executable extracts .MSP files that contain each of the individual Lync Server application patches.  The .MSP file contains all the logic and T-SQL scripts that are being executed for this particular Front End Server patch.  The big difference is found in how the .EXE and .MSP differ:

  • My user account launches the LyncServerUpdateInstaller.exe executable
  • My user account is used to initially launch the .MSP files, but the Local System account actually runs the .MSP files.

Microsoft patch files get executed by the Local System account, so that explains why the -E switch to the osql.exe command was passing the “NT AUTHORITY\SYSTEM” credentials.  The osql.exe executable was being called by the .MSP file and that .MSP file was run with the SYSTEM account.  OK, fair enough, but why aren’t permissions correct on my SQL configuration, especially considering I’ve done this hundreds of times before without any previous issue?!

I looked at a few other server installs within this environment and within the TechNet virtual labs and there was one SQL Server login that was missing from this server:

Lync-SQLSYSTEM-WorkingServerRoleRights
BUILTIN\ADMINISTRATORS

This group was granted sysadmin rights, which meant that any local admin of the server had sysadmin rights within SQL.  Nearly any SQL administrator will advocate for not having the local server Administrator group as a login and generally I would agree that is a best practice.  Given all this information, however, it still didn’t explain why the patching process is failing so further research was required…

Continuing On

The ultimate “A-HA!” moment came when I ran across these articles whilst searching for the relationship between the Local System account and the built-in Administrators group:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms684190(v=vs.85).aspx

The LocalSystem account is a predefined local account used by the service control manager. This account is not recognized by the security subsystem, so you cannot specify its name in a call to the LookupAccountName function. It has extensive privileges on the local computer, and acts as the computer on the network. Its token includes the NT AUTHORITY\SYSTEM and BUILTIN\Administrators SIDs; these accounts have access to most system objects.

https://technet.microsoft.com/en-us/library/cc778824(v=ws.10).aspx

System is a hidden member of Administrators. That is, any process running as System has the SID for the built-in Administrators group in its access token.

Go ahead and read those again.  See if the “A-HA!” moment comes to you, too…  OK…I’ll help you…

Effectively what the articles are saying is that the LOCAL SYSTEM account is, by default, a bona fide member of the BUILTIN\Administrators group because its token includes the Administrators group SID.  Taking it one step further:  What server role does that group have on the working servers’ SQL instances?…that’s right….sysadmin.  Since the .MSP file is attempting to access the SQL instance as the LOCAL SYSTEM account, and the Administrators group was not a login in the instance, it has no rights to actually update the databases.

Note:  recall that the SYSTEM account did have a login within SQL, but the available server role rights were set to ‘public’ which means it basically had no rights to do much of anything.
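If it helps, the logic can be sketched in a few lines of Python.  This is a simplified model under my own assumptions (effective rights are the union of the server roles of every SQL login that matches an identity in the caller's token); the function and variable names are made up for illustration and nothing here queries a real SQL instance.

```python
# Identities carried in the LocalSystem token, per the MSDN quote above.
SYSTEM_TOKEN = {"NT AUTHORITY\\SYSTEM", "BUILTIN\\Administrators"}

def effective_roles(token, logins):
    """Union of server roles from every login that matches an identity in the token."""
    roles = set()
    for principal, granted in logins.items():
        if principal in token:
            roles |= granted
    return roles

# Logins on a working server versus the server where patching failed.
working = {"NT AUTHORITY\\SYSTEM": {"public"}, "BUILTIN\\Administrators": {"sysadmin"}}
broken = {"NT AUTHORITY\\SYSTEM": {"public"}}

print("sysadmin" in effective_roles(SYSTEM_TOKEN, working))  # True
print("sysadmin" in effective_roles(SYSTEM_TOKEN, broken))   # False
```

On the broken server the only matching login is the SYSTEM login with ‘public’, so the ALTER DATABASE in the patch has nothing to stand on.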

A Fix?

I manually added the “BUILTIN\Administrators” group to the offending Front End Servers’ local SQL instances and granted that group sysadmin rights.  I re-ran the LyncServerUpdateInstaller.exe updater and…SUCCESS!!!

Lync-CUInstaller-Success
Done: Installing KB3120728 for Server.msp 

Had I not found those two articles, I may have never known the true reasons for the behavior I was seeing.  This was the fix though:  making sure the “BUILTIN\Administrators” login simply matched the configuration of the other servers, which also matched the TechNet virtual labs configuration.

Wrap Up

Coming full circle: What had actually occurred was that the SQL team had removed the BUILTIN\Administrators group for security reasons after I had initially pre-installed SQL (the group was included at the time of my installation), and that removal was unbeknownst to me.  All of Microsoft’s standard Lync and Skype installers include that group for the SQL instances (and grant it sysadmin), so it truly is critical that the login exists for the purposes of patching.  As I saw, installation of the product occurred just fine but patches would begin to fail outright because the patching process uses the SYSTEM account and not a specific user account.

Note:  As an alternative workaround, you could grant the “NT AUTHORITY\SYSTEM” account in SQL sysadmin rights for the purposes of patching processes, but I doubt many people would want to undergo that additional management complexity.

Bottom line:  if you choose to change the sysadmin rights on your Lync Front Ends and remove the Administrators group, be aware of this issue and plan for workarounds accordingly!

Note:  This issue is another good case study that belongs in my other post, ‘The Dangers of SQL Server Security Hardening for Lync Server & Skype4B Server’, but I separated it into a distinct post for the sake of clarity and so that it would be more easily discoverable via search engines.

02May/16

‘Gateway peer in inbound call is not found in topology document’ with Lync Server 2013

This issue was discovered during an SBA upgrade from Lync Server 2010 to Lync Server 2013.  While the issue does have a few random musings out on the Internet that seemed to be loosely-related (although none were a direct resolution to my issue), it ultimately turned out to be that I was the problem.  I’ve since learned my lesson and am much wiser now…and hopefully you all will be too after reading through this post.

Without further delay…the issue (and thus error):

Lync-QoEGatewayPeerError-NotFound
10013; reason="Gateway peer in inbound call is not found in topology document or does not depend on this Mediation Server"

Well that’s interesting…  It seems a Mediation Server is receiving a call from a gateway that isn’t defined in Topology.  Which gateway and which mediation server, though?  Looking through QoE reports I was able to determine it was coming from a PSTN gateway in a branch that I had just completed an SBA upgrade on.  I had to pull the syslogs from the Audiocodes Mediant 1000 gateway and when I did, I discovered something entirely unexpected:

Lync-GWError-SIP400BadRequest
SIP/2.0 400 Bad Request

Inbound PSTN calls were failing with a SIP/2.0 400 Bad Request error that was being generated by Lync.  Even more interesting was that the gateway was actually sending calls to the Front End Mediation Pool and not the SBA Mediation Pool.  This was surprising for a number of reasons:

  • The SBA is configured as the first entry in the ProxySet within the Audiocodes gateway – all incoming calls from the PSTN should go there first.
  • The Front End Mediation pool is configured as the second entry in the ProxySet within the Audiocodes gateway – incoming calls from the PSTN will go there only if the SBA is marked as down.
Lync-GWConfig-LyncProxySet
  • The Front End is configured for resiliency with the SBA within topology, so calls should be accepted no problem in the event the SBA is down.
Lync-TopologyConfig-SBAResiliency
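The ProxySet behavior I was expecting can be sketched roughly as follows in Python.  This is a simplified model that assumes strict priority order with failover to the next entry; it is not Audiocodes’ implementation, and the host names and the `pick_proxy` function are hypothetical.

```python
def pick_proxy(proxy_set, is_up):
    """Return the first proxy, in configured order, that is marked as up, else None."""
    for proxy in proxy_set:
        if is_up(proxy):
            return proxy
    return None

# Hypothetical ProxySet: SBA first, Front End Mediation pool second.
proxy_set = ["sba.domain.com", "fepool.domain.com"]

# Hypothetical health state: the SBA is not answering, the Front End pool is.
status = {"sba.domain.com": False, "fepool.domain.com": True}

print(pick_proxy(proxy_set, status.get))  # fepool.domain.com
```

Note that the gateway only decides where to send the call; whether the receiving Mediation Server accepts it is a separate question.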

Digging In

First off I had to determine why the calls weren’t going to the SBA Mediation Server.  The Audiocodes gateway monitors the ProxySet peers through SIP OPTIONS requests and sends calls to available servers.   Upon examination it turned out that the Mediation Server service on the SBA had simply failed to start up and wasn’t running.  As a result, calls were being sent to the Front End pool per the ProxySet configuration, which is by design.  Why, however, is the Front End pool failing to process the calls?

Using the AlwaysOn and IncomingandOutgoingCall scenarios in CLS, I was able to examine the debug traces on the Front End servers and find the following piece of information:

Lync-MediationServerError-NextHopPeerNotFound
The host portion of the from header, gwfqdn.domain.com, arriving at MS listening port (5068) did not match any next hop peers' FQDN or IP Address.

Huh?  What?  This gateway had previously worked just fine through the SBA and I re-confirmed this by turning on the Mediation Server service on the SBA and calls immediately began working.  Turning off the Mediation Server service on the SBA resulted in the calls failing to the Front End pool again.

Down Into the Rabbit Hole…

Into Topology Builder I go and begin examining the Branch Site configuration.

  • PSTN Gateway object correctly defined?  CHECK.
  • Trunk object correctly defined to SBA?  CHECK.
  • SBA Resiliency correctly defined to FE Pool?  CHECK.

At this point my brain is spinning in circles trying to figure out what’s wrong and then it dawns on me…the error in the debug logs is telling me exactly what I need to know but I wasn’t understanding how it was saying it.  The error again:

The host portion of the from header, gwfqdn.domain.com, arriving at MS listening port (5068) did not match any next hop peers' FQDN or IP Address.

What this error is really saying is:

An incoming call from the PSTN gateway, gwfqdn.domain.com, arrived at my Mediation Server PSTN gateway port, but I can't accept the call because that gateway is not defined as a next-hop for me.

A Mediation Server cannot accept calls from a PSTN Gateway that it has no association with.  How do you define that association in Topology, you ask?  Simple:  you define a TRUNK.

Looking at the Central Site configuration, my error became instantly apparent:  I did not have a Trunk defined between the Front End Mediation Pool and the Gateway in the branch.  😳
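Here is the same idea as a small Python sketch.  It is a simplified model of the next-hop check described in the trace (the real check matches on FQDN or IP address), and the pool names and the `accepts_inbound` function are hypothetical.

```python
# Hypothetical topology: Mediation pool -> gateways it has a trunk to.
trunks = {
    "sba.domain.com": {"gwfqdn.domain.com"},
    "fepool.domain.com": set(),  # no trunk to the branch gateway yet
}

def accepts_inbound(mediation_pool, from_host, trunks):
    """Accept a call only if the sending gateway is a defined next hop for this pool."""
    return from_host in trunks.get(mediation_pool, set())

before = accepts_inbound("fepool.domain.com", "gwfqdn.domain.com", trunks)

# Defining the missing trunk is the only change.
trunks["fepool.domain.com"].add("gwfqdn.domain.com")
after = accepts_inbound("fepool.domain.com", "gwfqdn.domain.com", trunks)

print(before, after)  # False True
```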

I quickly configured a new trunk and published Topology with the new information:

Lync-TopologyConfig-FETrunkToGW

Once CMS replication had completed to the SBA, I monitored the syslog traffic from the Audiocodes gateway and SUCCESS!  Calls were now completing:

Lync-GWSuccess-CallCompleted

I can assure you that no other changes were made; the only thing that changed was the Topology configuration.  In addition, I tested this exact scenario both against a Lync Server 2010 infrastructure and a Lync Server 2013 infrastructure, and in both cases calls would fail unless a trunk was correctly defined in Topology.

My Concern and Confusion

Confusion #1

The biggest confusion I had was incorrectly assuming the resiliency configuration in Topology would handle this particular failure scenario – namely the failure of the Mediation Server on the SBA for incoming call scenarios.  In reality, it doesn’t.  I would strongly recommend that people test this scenario because I am very confident you will have the exact same failures I had.  As a result you will need to define additional trunks in Topology to properly handle this failure scenario in Lync Server 2013 (and above) environments.

Concern #1

There isn’t a single piece of documentation (that I can find) from Audiocodes or Sonus or Microsoft that talks about the requirement for adding an additional trunk to the central site mediation pool when deploying SBAs.  Audiocodes clearly defines adding the FEPool as the second entry in the ProxySet but that configuration in-and-of-itself does you absolutely zero good without the additional trunk configuration in the Lync Server topology.  Configuring the ProxySet without the additional trunk in Topology doesn’t get you automatic inbound call failover, it gets you failed calls.  Not good, not good at all.

Concern #2

SBAs have been around since Lync Server 2010, but Lync Server 2010 is restricted when compared to Lync Server 2013, especially with regard to Mediation Server flexibility.  In the Lync Server 2010 world a Mediation Server can be 1:N, meaning a single mediation server (or pool) can have only a single trunk to a particular gateway, but that same server (or pool) can connect to multiple different gateways.

Note:  Yes, there are ways of “faking” this and creating multiple trunks to a single gateway using alias DNS records, but many don’t do that and the documentation for SBA installs doesn’t speak of this anyway.

Where it becomes a problem with Lync Server 2010 is that, since you are unable to define a second trunk to the Mediation Server, automatic inbound call failover from SBA to FEMediation won’t work.  As a result, in the Lync Server 2010 failure scenario you are forced to update the PSTN Gateway association in Topology first, but only after you’ve detected the failure and have chosen to continue to route through the FEPool.  Not exactly automated…and a bit of a “chicken and egg” scenario.

Concern #3

I’ve talked with many colleagues on this and many were surprised at my findings and/or suspect there’s a “code issue”.  I tested this exact scenario both against Lync Server 2010 Mediation Servers and Lync Server 2013 Mediation Servers and got the same behavior with both.

Note:  I don’t think I’m crazy here, but I will acknowledge a mistake if one has been made.  If someone can prove me wrong, please let me know the details and I’ll update the post to be correct.

That being said, the behavior makes complete sense in that a Mediation Server won’t accept a call from a gateway for which it has no trunk configured in Topology.  You can take this same scenario and stretch it to a non-SBA deployment:  take any gateway in your Topology and begin sending calls to a Mediation Server for which there is no association defined (or Trunk, in Lync-2013 parlance) and your calls will begin to fail.

Bottom Line

If you have Lync Server 2013 (or above), make sure you define alternate trunks to the Front End Mediation Pool so that automatic inbound call failover can occur with your branch SBA deployments.  You don’t necessarily have to use those trunks for outbound calls if you don’t want to, but given that the SBA Mediation Server could fail it does provide you an alternate path to the gateway through backup PSTNUsage routing.  If you have Lync Server 2010 – which is beginning to be long in the tooth, anyway – you will have to manually update the PSTN Gateway association to reflect the Front End Mediation Pool (in a scenario where the SBA Mediation Server is down) so that calls are accepted and processed.

Again, what this boiled down to was a horribly ignorant understanding of inbound call processing, my incorrect assumptions of SBA/FE resiliency functionality, and an ignorance of configured Trunks in Topology.  It’s my own fault, but I’m appreciative of the opportunity to correct an oversight and set concepts straight within my brain!  As an additional note, make sure that you define alternate trunks for non-SBA deployment scenarios as well – say a FE disaster recovery scenario – so that you don’t run into the same issue in that scenario!

Note:  This environment is running the January 2016 CU for Lync Server 2013, so if some of you are thinking it’s a code issue…I’m not sure I agree!

17Mar/16

Lync Phone Edition TLS Limitations

Updated 12/21/2018 – Added information about removal of 3DES cipher coming on February 28, 2019.

Updated 1/22/2018 – Added information about forced usage of TLS 1.2 coming in CY 2018.

Updated 3/21/2016 – Added information for SNI support

Updated 6/1/2016 – Added KB article for SNI support

Updated 8/17/2016 – Added information for PCI-DSS 3.1

For a long time in the Lync Server world, Lync Phone Edition phones were the most optimal solution for environments that still required physical handsets on desks or within conference rooms (IMHO).  The devices are very simplistic and user friendly, but the down-side is they offer less-than-ideal flexibility and administrative control when compared to Audiocodes or Polycom 3PIP phones.  Despite the limitations, many customers accepted the LPE devices as-is and continued deploying the phones if they were required.  Microsoft has publicly stated that there is a limited support lifecycle still left on those phones (along with the fact that no new development is being geared towards them) so over time it has made the 3PIP phones much more attractive.  As with many things in IT, you discover new information as a result of “breaking something”, and we can officially add one more significant limitation to the Lync Phone Edition list:  TLS support.

TLS Background

Almost everything we do is secured using TLS – banking, e-mail, Facebook, Instagram…you get the picture.  TLS is the successor to SSL and has gone through several iterations to date:

Most modern browsers and Operating Systems support at least TLS 1.0, which has largely become the baseline standard for encrypting communications on the public Internet today.  The newer TLS 1.1 and 1.2 standards add enhancements such as Elliptic Curve Cryptography and Perfect Forward Secrecy that make TLS communications significantly stronger through the hashes, ciphers, and cipher suites they utilize.  For many companies out there (and indeed the industry as a whole), advancing security is part of business and operational excellence – PCI and FIPS 140-2 are evidence of this ongoing approach to security.  Many companies have a strong desire to enhance security and, as a result, begin changing TLS configurations across their infrastructure to make communications “more secure”.  Enter Lync Phone Edition…

Lync Phone Edition TLS Support

Lync Phone Edition is a customized Windows application running on a Windows Embedded CE 6.0 operating system.  The Windows CE 6.0 OS was originally released back in November of 2006 – I emphasize that date because much has changed in 10 years since that release.  The most important thing to remember though is that any application running on Windows CE 6 is subject to the limitations of the OS platform when it comes to SChannel support…which means that the newer versions of TLS are not available to those devices.  What Windows CE 6.0 does support is:

  • SSL 2.0
  • SSL 3.0
  • SSL 3.1 (TLS 1.0)

The other significant limitation to consider is the set of cipher suites supported within those specific SSL/TLS versions.  As of the December 2015 Cumulative Update, Lync Phone Edition offers these ciphers when trying to negotiate secured communication paths with external entities such as Exchange, Lync/Skype, or ADFS:

  • TLS_RSA_WITH_RC4_128_MD5
  • TLS_RSA_WITH_RC4_128_SHA
  • TLS_RSA_WITH_3DES_EDE_CBC_SHA
  • TLS_RSA_WITH_DES_CBC_SHA
  • TLS_RSA_EXPORT1024_WITH_RC4_56_SHA
  • TLS_RSA_EXPORT1024_WITH_DES_CBC_SHA
  • TLS_RSA_EXPORT_WITH_RC4_40_MD5
  • TLS_RSA_EXPORT_WITH_RC2_CBC_40_MD5
LPE-Dec2015CU-TLSCiphers

Note:  There are quite a few very insecure ciphers that are offered in that list – just goes to show how much has changed since November of 2006!

Heightened Security Ramifications

SChannel Cipher Support

Lync Phone Edition doesn’t support anything higher than TLS 1.0.  When companies begin a process to harden their infrastructure and remove TLS 1.0 to bolster security (or maybe in preparation for PCI DSS 3.1), you effectively remove the ability of the LPE devices to connect to..well…anything.  Consider the following potential scenarios:

  • If you alter SChannel on Lync/Skype4B Front Ends to not allow TLS 1.0, LPE devices can’t register.
  • If you alter SChannel on Exchange Servers to not allow TLS 1.0, LPE devices can’t access Exchange Web Services.
  • If you alter SChannel on Active Directory Federation Services to not allow TLS 1.0, LPE devices can’t authenticate to ADFS to allow access to Office365 services, such as Exchange Online or Skype4B Online.
  • If you alter SChannel on Reverse Proxies to not allow TLS 1.0, LPE devices cannot access pool web services externally.
  • If you alter SChannel on HTTP proxies to not allow TLS 1.0, LPE devices can’t negotiate a compatible TLS version to allow access to Office365 services, such as Exchange Online or Skype4B Online.

Due to the overwhelming outages that can result from removing TLS 1.0, you are far better off not removing TLS 1.0 support!  That is, while you still have the option not to.
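To make the failure mode concrete, here is a minimal sketch of version negotiation, assuming each side simply picks the highest protocol version both have enabled.  The server configurations (`default`, `hardened`) are hypothetical examples, not values pulled from a real Front End, and the function is an illustration rather than how SChannel is actually implemented.

```python
# Protocol versions in ascending order of strength.
PROTOCOL_ORDER = ["SSL 2.0", "SSL 3.0", "TLS 1.0", "TLS 1.1", "TLS 1.2"]

# What Lync Phone Edition (Windows CE 6.0) can speak, per the list earlier in this post.
LPE_PROTOCOLS = {"SSL 2.0", "SSL 3.0", "TLS 1.0"}


def negotiate(client_protocols, server_protocols):
    """Return the highest protocol both sides support, or None if there is none."""
    common = set(client_protocols) & set(server_protocols)
    if not common:
        return None
    return max(common, key=PROTOCOL_ORDER.index)


# Hypothetical server that still allows TLS 1.0.
default = {"TLS 1.0", "TLS 1.1", "TLS 1.2"}
# Hypothetical hardened server with TLS 1.0 removed.
hardened = {"TLS 1.1", "TLS 1.2"}

print(negotiate(LPE_PROTOCOLS, default))   # TLS 1.0
print(negotiate(LPE_PROTOCOLS, hardened))  # None
```

A `None` result is the "can't connect to anything" case: the handshake has no common protocol to fall back on.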

SChannel Cipher Suite Support

Another potential path of bolstering security is changing the cipher suites that are utilized within TLS communications.

Using Windows OSs?  Nartac software has a great utility called IISCrypto that automatically sets the SChannel registry keys according to PCI and FIPS 140-2 compliance requirements.

Using non-Windows OSs?  The weakdh.org website has a great write up that gives you instructions on how to set ciphers and cipher suites according to best practice.

PCI Compliance (PCI DSS 3.0)

If you choose the legacy PCI compliance template within IISCrypto, out of the total cipher suites available, only 2 are supported by Lync Phone Edition (Windows CE 6.0):

  • TLS_RSA_WITH_RC4_128_SHA
  • TLS_RSA_WITH_3DES_EDE_CBC_SHA

PCI Compliance (PCI DSS 3.1)

If you choose the new 3.1 PCI compliance template within IISCrypto, you effectively remove all capabilities of Lync Phone Edition to connect with your systems.  PCI DSS 3.1 requires ‘early TLS’ to be disabled (‘early TLS’ = TLS 1.0).  Lync Phone Edition does not support TLS versions higher than 1.0, so this is truly game over regarding the PCI DSS 3.1 standard.

Short Aside:

Many could argue that applying PCI DSS 3.1 to Lync/Skype systems is a bit over-reaching, and I would definitely agree.  Per the PCI Compliance Website, there are four specific requirements under which the new PCI DSS deprecates the use of TLS 1.0:

  • 2.2.3 – Encryption for VPNs, NetBIOS, file sharing, Telnet, FTP and similar services that are considered insecure;
  • 2.3 – Encryption for web-based management and other remote (non-console) administrative access;
  • 4.1 – Encryption of cardholder data during transmission over open, public networks; and
  • 4.1.1 – Encryption for wireless networks that transmit cardholder data or connect to the cardholder data environment (CDE)

In most environments, cardholder data isn’t being handled by Skype for Business or Lync Server.  That being said, file sharing and transmission of cardholder data could occur in a Skype for Business Server (or Lync Server) implementation so it is still a potentially valid constraint.  Most Windows based clients (Windows 7+) will be capable of negotiating TLS 1.1+, but even though Lync Phone Edition can’t support file sharing or IM transmissions, removing TLS 1.0 support on the server will remove the capability of LPE to connect at all.  At that point, your Polycom CX600 is nothing more than a large paperweight.

FIPS 140-2 Compliance

If you choose FIPS 140-2 compliance within IISCrypto, out of the total cipher suites available, only 1 is supported by Lync Phone Edition (Windows CE 6.0):

  • TLS_RSA_WITH_3DES_EDE_CBC_SHA

Note:  The big difference between FIPS and PCI lies in the hash support and the cipher suite order configured within the registry for SChannel.
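If you want to sanity-check a proposed cipher suite list before pushing it out, the comparison is just a set intersection against what LPE offers.  The sketch below uses the LPE list from earlier in this post; the `proposed` server list is a made-up example and is not the actual content of any IISCrypto template.

```python
# Cipher suites offered by Lync Phone Edition as of the December 2015 CU.
LPE_CIPHER_SUITES = [
    "TLS_RSA_WITH_RC4_128_MD5",
    "TLS_RSA_WITH_RC4_128_SHA",
    "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
    "TLS_RSA_WITH_DES_CBC_SHA",
    "TLS_RSA_EXPORT1024_WITH_RC4_56_SHA",
    "TLS_RSA_EXPORT1024_WITH_DES_CBC_SHA",
    "TLS_RSA_EXPORT_WITH_RC4_40_MD5",
    "TLS_RSA_EXPORT_WITH_RC2_CBC_40_MD5",
]


def common_suites(server_suites):
    """Return the suites the server allows that LPE also offers, in server order."""
    offered = set(LPE_CIPHER_SUITES)
    return [suite for suite in server_suites if suite in offered]


# Hypothetical hardened server list (illustration only).
proposed = [
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
]

print(common_suites(proposed))  # ['TLS_RSA_WITH_3DES_EDE_CBC_SHA']
```

An empty result means LPE has no cipher suite in common with that server and will fail the handshake.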

Server Name Indication Support

Simply put, Lync Phone Edition does not support Server Name Indication because the Windows CE OS doesn’t support it:

You can't sign in to Skype for Business or Lync clients on devices that don’t support Server Name Indication (SNI)
https://support.microsoft.com/en-us/kb/2973873

SNI is a TLS extension that the client’s TLS stack has to implement, and the SChannel implementation in Windows CE 6.0 does not, so the feature isn’t available to these legacy clients.  As a result of this limitation, be careful when you configure the Web Application Proxy role for ADFS 3.0 due to its use of SNI and the default non-fallback setting.  It can be made to operate in a method that will support non-SNI clients, but be careful when making this change if your WAP is handling traffic for non-ADFS entities!

Office365 TLS Support

With much of the computing landscape shifting to the cloud, you must also look at Office365 to determine which TLS services are supported.  For Microsoft, they must offer FIPS 140-2 support for their services within Office365 so you can look at SSLLabs to determine which versions and cipher suites are supported:

https://www.ssllabs.com/ssltest/analyze.html?d=sipdir.online.lync.com&s=134.170.54.26

O365-Mar2016-TLSCiphers

For Lync Phone Edition devices, the only available cipher suite is TLS_RSA_WITH_3DES_EDE_CBC_SHA.  This largely matches what would be available with on-premises SChannel configurations when deploying FIPS 140-2 security requirements.

Microsoft’s TLS Stance for Office365 – Updated

In December 2017, Microsoft officially announced that TLS 1.0 and 1.1 would be unsupported for connections to Office365 as of October 31, 2018, making TLS 1.2 the supported baseline.  They have not gone so far as to block TLS 1.0/1.1 but when they do, given that LPE simply doesn’t support TLS 1.2, any LPE device attempting to connect to Office365 will not succeed due to a lack of common protocol.  As a result, those devices would be useless as you will be unable to sign in to Skype4B Online, or connect to EWS Online, or integrate with federated partners that use Office365.  Your only recourse in that instance is to look at third-party phones, such as the Polycom VVX series, to work around this limitation.  Or migrate to the desktop client.

Microsoft’s TLS Stance for Office365 – Updated Again

In December 2018, Microsoft officially announced that the 3DES cipher suite would be removed from Office365 as of February 28, 2019.  Given that 3DES is the only cipher suite LPE and Office365 have in common, this means Phone Edition devices will not be able to establish a connection due to the lack of a common cipher suite.  As a result, those devices would be useless as you will be unable to sign in to Skype4B Online, or connect to EWS Online, or integrate with federated partners that use Office365.  This is the long-foretold death knell for these devices, so your only recourse is to look at third-party phones, such as the Polycom VVX series, to work around this limitation.  Or migrate to the desktop client.

PCI Compliance (PCI DSS 3.1) – Updated

Microsoft Azure must support PCI DSS standards and does so through PCI DSS 3.1 support.  Office365, on the other hand, isn’t explicitly outlined as meeting PCI DSS 3.1 requirements yet it does seem to pass PCI DSS 3.0 requirements (at least from a cursory glance).

The Warning

Bottom line:  be very, VERY careful with how you harden your infrastructure when it comes to TLS versions and cipher support.  If you go too far, you’ll shoot yourself in the foot and prevent things from working.  Additionally make sure you examine ALL potential integration points that the LPE devices will contact, especially ADFS and HTTP Proxy scenarios when integrating with Office365!

Note:  TLS 1.1 is available within Windows CE 7, but there has been no public information from Microsoft about updating Lync Phone Edition phones to CE 7.  At the current time, the only recourse is to begin migrating from the LPE devices to 3PIP phones, such as a Polycom VVX to get the improved TLS versions, OR begin migrating to the UC client on desktops.

14Mar/16

Enhanced 911 in Skype4B Server and Lync Server – Part 4

This is a continuation of the series on E9-1-1 within Skype4B Server and Lync Server.  Today, we begin to look at configuration required for our PSTN Gateway and additional coordination required with our telco.

ELIN Configuration

In our last post we had already ensured that LIS data was available and that calls would get routed to our PSTN gateway.  At this point we need to ensure that the PSTN gateway correctly handles the PIDF-LO data and performs ELIN functionality so that it sends a specific PS-ANI for the outbound 9-1-1 call.  This can be a complex topic, sadly, but I’ll try to offer some insight for two of the most common PSTN gateway vendors:  Audiocodes and Sonus.

Audiocodes

Note:  I will not give detailed how-to’s about creating a routing configuration within your gateway.  This will be specific to ELIN configuration only!

Enabling ELIN functionality on Audiocodes gateways is very, very simple.  The specific location of this menu item may vary a little bit given the different hardware platforms Audiocodes has but the settings you need to enable are here:

Advanced SIP Parameters

Skype4B-Audiocodes-ELINGatewayEnable

E911 Gateway

Under the E911 Gateway setting, make sure that ‘NG911 Callback Gateway’ is set.  That setting turns on the ELIN gateway functionality within the Audiocodes device itself and can also be found within the .INI file configuration here:

[SIP Params]

PLAYRBTONE2IP = 0
MEDIACHANNELS = 48
ISPROXYUSED = 1
PLAYRBTONE2TEL = 3
SECURECALLSFROMIP = 1
GWDEBUGLEVEL = 5
;ISPRACKREQUIRED is hidden but has non-default value
ENABLEEARLYMEDIA = 1
SIPGATEWAYNAME = 'noam-us-tn-sbc01.widgets.com'
PROXYREDUNDANCYMODE = 1
SIPTRANSPORTTYPE = 1
TCPLOCALSIPPORT = 5061
TLSLOCALSIPPORT = 5061
LOCALISDNRBSOURCE = 1
PLAYRBTONE2TRUNK = 1
MEDIASECURITYBEHAVIOUR = 3
REDUNDANTROUTINGMODE = 2
SOURCENUMBERPREFERENCE = 'From'
FORKINGHANDLINGMODE = 1
MSLDAPPRIMARYKEY = 'telephoneNumber'
ENABLEEARLY183 = 1
FAKERETRYAFTER = 60
ENABLESYMMETRICMKI = 1
E911GATEWAY = 1
ENABLECALLTRANSFERUSINGREINVITES = 1
RESETSRTPSTATEUPONREKEY = 1
ENERGYDETECTORCMD = 587202560
ANSWERDETECTORCMD = 10485760
SYSLOGCPUPROTECTION = 0

Once the NG911 gateway feature is enabled within the configuration, there is very little left to be done with exception of making sure that a 9-1-1 call has an available route to an available ISDN circuit.

Note:  NG911, or ELIN gateway functionality, is a licensed feature in the Audiocodes world, so make sure you have appropriate licensing on your gateways/SBCs to be able to use it!

E911 Callback Timeout

This setting defines how long the gateway leaves an outbound emergency call mapping in the callback table.  The callback table matches the ELIN used on the call to the internal number/user that dialed 9-1-1 and is sometimes required by legislation so that a PSAP can reach the original caller in case the call gets unexpectedly dropped.  By default it is set to 30, which means 30 minutes.  I would expect few to edit this parameter, but you can change it to a different value if you so choose.
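Conceptually, the callback table is a lookup from ELIN to the original caller, with entries aging out after the timeout.  The sketch below is my own illustration of that idea and is not Audiocodes’ implementation; the class name, method names, and the caller number are all hypothetical.

```python
import time


class CallbackTable:
    """Maps the ELIN used on a 9-1-1 call back to the caller, until the entry expires."""

    def __init__(self, timeout_minutes=30):
        self.timeout_seconds = timeout_minutes * 60
        self._entries = {}  # ELIN -> (caller, time the 9-1-1 call was placed)

    def record_call(self, elin, caller, now=None):
        now = time.time() if now is None else now
        self._entries[elin] = (caller, now)

    def lookup(self, elin, now=None):
        """Return the caller for a PSAP callback to this ELIN, or None if expired/unknown."""
        now = time.time() if now is None else now
        entry = self._entries.get(elin)
        if entry is None:
            return None
        caller, placed_at = entry
        if now - placed_at > self.timeout_seconds:
            del self._entries[elin]  # mapping has aged out of the table
            return None
        return caller


table = CallbackTable(timeout_minutes=30)
table.record_call("6155551000", "+16155552345", now=0)
print(table.lookup("6155551000", now=29 * 60))  # +16155552345
print(table.lookup("6155551000", now=31 * 60))  # None
```

The `now` parameter exists only so the example can be exercised without waiting half an hour.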

IP to Tel Routing

Skype4B-Audiocodes-911CallRoute

At a minimum you need a single route to match for the destination phone prefix (or the called number), which is ‘911’:

Route Eight

Dest. Host Prefix = *
Source Host Prefix = *
Dest. Phone Prefix = 911
Source Phone Prefix = *
Source IP Address = *
Trunk Group ID = 1
IP Profile ID = 1

Here you are simply matching for calls destined to “911” and telling the Audiocodes to send the call to a specific trunk group associated with your ISDN PRI trunk group.  In my configuration shown above, I only have a single external trunk group ID so all calls get sent to the trunk group ID of ‘1’.

Final Audiocodes Note

Once the route and ELIN functionality are enabled, the Audiocodes gateway will read the PIDF-LO data sent by Skype4B/Lync and use the ‘CompanyName’ field for configuration data input.  Remember that the ‘CompanyName’ field contains the ELINs you want to use for a given ERL, so the gateway simply accepts the values and uses those to generate the PS-ANI on the outbound ISDN call.

Note:  You may need to tweak your Audiocodes manipulation rules for ANI/PS-ANI for your ISDN trunks to make sure that calls get sent with the appropriate calling party number information.  Whatever you do, make sure that the correct (and correctly formatted) PS-ANI gets used – e.g: 10 digit number or 11 digit number.
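For illustration, the 10 digit vs. 11 digit formatting decision can be sketched as a small normalization function.  This assumes North American (NANP) numbers only; the function name is hypothetical and the real work is done by the manipulation rules on the gateway, not by a script.

```python
def format_ps_ani(number, digits=10):
    """Strip formatting from a NANP number and return it as 10 or 11 digits."""
    only_digits = "".join(ch for ch in number if ch.isdigit())
    if len(only_digits) == 11 and only_digits.startswith("1"):
        national = only_digits[1:]
    elif len(only_digits) == 10:
        national = only_digits
    else:
        raise ValueError(f"not a NANP number: {number!r}")
    if digits == 10:
        return national
    if digits == 11:
        return "1" + national
    raise ValueError("digits must be 10 or 11")


print(format_ps_ani("+16155551000"))      # 6155551000
print(format_ps_ani("615-555-1000", 11))  # 16155551000
```

Which of the two formats your telco expects is something only they can confirm.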

Sonus

The UX platform is a bit more involved in the configuration of ELIN functionality, but all things considered, it actually offers a bit more flexibility than what you can do with Audiocodes.  The URLs below give Sonus’s opinion on setup so I’ll expand on what they have and clarify a few things:

https://support.sonus.net/display/UXDOC50/Configuring+the+Sonus+SBC+1000-2000+and+Lync+to+Support+Enhanced+Emergency+%28911%29+Services

https://support.sonus.net/display/UXDOC50/Configuring+the+Sonus+SBC+1000-2000+for+Lync+E911+with+PSAP+Callback+Number

SIP Profile

Sonus-SIPProfile-PIDFLO

ELIN Identifier

You define which PIDF-LO attribute the gateway uses to determine location information within the MIME Payloads section.  Using the ‘LOC’ field (Location field in LIS/PIDF-LO) is best practice here so long as you don’t have duplicate names of any ‘Location’ fields within LIS.

Bottom line:  if each location in LIS is unique, use the ‘LOC’ field.  Otherwise use one of the alternative options (assuming those options in LIS are unique, as well).

PIDF-LO Passthrough

You can turn this on but it actually won’t be utilized for ISDN calls.  If you were using a SIP-based E9-1-1 service, you absolutely need this setting set to TRUE so that PIDF-LO data gets sent to the Selective Router in the E9-1-1 service.

Bottom line:  turn this on for both ISDN and SIP based implementation.

Unknown Subtype Passthrough

You can turn this on but it actually won’t be utilized for ISDN calls.  If you were using a SIP-based E9-1-1 service, you may need this setting set to TRUE, depending on the E9-1-1 service you are utilizing.  As an aside, this setting may need to be tweaked to make sure that sufficient SIP elements are sent to your ITSP SIP trunks to prevent call setup failures.

Bottom line:  you shouldn’t need this on, but you can turn on if desired or if testing dictates it must be on.

Callback Number Pool

Sonus-CallbackTable-ELINs

A callback number pool is defined to configure ELINs for your ERLs.  From our previous post we know that three ELINs per ERL are required, so in the Sonus gateway we would define two callback number pools, each with three unique and non-overlapping numbers.  The picture above shows an example of a single ERL with three ELINs.

Description

I would strongly advise making the description the same name as your ERL.  This simplifies administration and helps you maintain uniformity between Skype4B and Sonus.

Callback Numbers List

Enter the ELINs here.  They can be in any order but just know that the first 9-1-1 call received for this particular ERL will use the first number in the table.  The second call uses the second and so on.
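The in-order behaviour described above can be sketched as a simple rotating pool.  Note that what the gateway does after the last number has been handed out is not covered here; the wrap-around back to the first number in this sketch is purely my assumption, and the class and method names are hypothetical.

```python
from itertools import cycle


class CallbackNumberPool:
    """Hands out ELINs in list order; wraps to the start when exhausted (assumption)."""

    def __init__(self, description, elins):
        if not elins:
            raise ValueError("a callback number pool needs at least one ELIN")
        self.description = description
        self._next_elin = cycle(elins)

    def allocate(self):
        """Return the ELIN to use for the next 9-1-1 call from this ERL."""
        return next(self._next_elin)


pool = CallbackNumberPool(
    "333-FL10-NW Wing", ["6155551000", "6155551001", "6155551002"]
)
print(pool.allocate())  # 6155551000 (first call)
print(pool.allocate())  # 6155551001 (second call)
```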

PSAP Number

Put simply, this is your emergency services number that you dial.  For US customers it will be ‘911’.  For other locales it may be ‘112’ or ‘999’.  Match this number with the Emergency Dial String from your Skype4B location policy.

Translation Table

Sonus-TransTable-withELIN

At a minimum you need two transformation rules to match:

  1. Called Address/Number= \911
  2. Input Field Type = ELIN Identifier

Transformation Rule One – Called Address/Number

Input Field Type = Called Address/Number
Input Field Value = \911
Output Field Type = Called Address/Number
Output Field Value = 911
Match Type = Mandatory

Note:  Here you are simply matching for calls destined to “911” and telling Sonus to send the call to “911”.  Do not use the configuration as shown in the picture above – make sure you create a rule that matches your outbound voice route in Skype4B to match the DNIS format (911 vs. +911).

Transformation Rule Two – ELIN Identifier

Input Field Type = ELIN Identifier
Input Field Value = ERL location name
E.g.:  "333-FL10-NW Wing"
Output Field Type = Callback Pool Identifier
Output Field Value = Callback Pool Name
E.g.:  "333-FL10-NW Wing"
Match Type = Mandatory

Note:  Here we are matching for the ERL name within the PIDF-LO data.  Once we find the name, we match that name to the callback  number pool we previously configured.  Do not use the configuration as shown in the picture – make sure you use the specific configuration outlined in the section above.

Call Routing Table

Sonus-CallRouteTable-withELIN

A call route table entry is required to get calls to search through a transformation table.  The entry needs to be at the very top of the call routing table to ensure that 9-1-1 calls get the first preference.  The call route entry simply needs to reference the transformation table where you created the ELIN matching rules.

Final Sonus Note

You need to have as many of the second transformation rule above as required for each of the ERLs you have defined!  If you have 10 ERLs, then you will have 10 ELIN identifier matches to 10 individual callback number pools.  5 ERLs = 5 ELIN identifier matches.  Etc…
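With more than a handful of ERLs it is easy to miss a pool or reuse a number, so a quick offline check of your planning data can help.  The sketch below assumes you keep the ERL names and pool contents in a spreadsheet or script of your own; it does not talk to the gateway, and the function name and data layout are hypothetical.

```python
def check_pools(erl_names, pools, elins_per_erl=3):
    """Return a list of problems; pools maps pool description -> list of ELINs."""
    problems = []
    for name in erl_names:
        if name not in pools:
            problems.append(f"{name}: no callback number pool")
    owner = {}  # ELIN -> first pool that uses it
    for name, elins in pools.items():
        unique = list(dict.fromkeys(elins))  # de-duplicate, keep order
        if len(unique) != elins_per_erl:
            problems.append(
                f"{name}: expected {elins_per_erl} unique ELINs, found {len(unique)}"
            )
        for elin in unique:
            if elin in owner:
                problems.append(f"{elin}: used by both {owner[elin]} and {name}")
            else:
                owner[elin] = name
    return problems


erls = ["333-FL10-NW Wing", "333-FL10-SE Wing"]
pools = {
    "333-FL10-NW Wing": ["6155551000", "6155551001", "6155551002"],
    "333-FL10-SE Wing": ["6155551003", "6155551004", "6155551005"],
}
print(check_pools(erls, pools))  # []
```

An empty list means every ERL has a pool named after it with three unique, non-overlapping ELINs.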

Note:  In previous UX versions there was a limitation on the total number of Callback Number Pools that could be created.  For topologies that have centralized call routing for E9-1-1 or topologies with a large number of ERLs, this may be an issue for you.  The last time I checked the limitation was 20, or 25…I can’t really remember as it has been a few years.  This may alter your gateway planning/deployment strategy so be sure to check with Sonus for latest numbers!

Telco Configuration

This is the last step and unfortunately the most difficult piece because there is no good way for me to provide information for all the various LECs, C-LECs, and other telco providers out there.  Additionally, many of the telcos have very poorly trained sales staff (and sales engineers) that don’t understand how E9-1-1 needs to work, so you often face an uphill battle in getting the correct information.

Note:  I cannot tell you how many times I’ve spoken with telco engineers who know nothing about PS-ALI/PS-ANI or claim it isn’t available or claim it must be purchased separately.  It is available and must be available since E9-1-1 is a requirement by law.  Do not let them provide false info or “drop the ball” on E9-1-1!

Bottom line, make sure your sales engineers know that you are subject to E9-1-1 compliance and need to ensure the ability to utilize PS-ALI/PS-ANI services through their ISDN PRI circuits.  As a result of that request, there may be some additional charges they pass along your way (sorry) but you need to make sure the telco has the following info in order to correctly set up E9-1-1 within their systems:

  • What PS-ANIs are in use for each physical office location
  • The specific PS-ALI data for each PS-ANI (ELIN)
ELINs:  +16155551000, +16155551001, +16155551002

PS-ALI/LIS Location:

LIS Field           PIDF-LO Field   Setting
Location            LOC             333-FL10-NW Wing
CompanyName         NAM             6155551000;6155551001;6155551002
HouseNumber         HNO             333
HouseNumberSuffix   HNS
PreDirectional      PRD
StreetName          RD              Commerce
StreetSuffix        STS             St
PostDirectional     POD
City                A3              Nashville
State               A1              TN
PostalCode          PC              37021
Country             Country         US

Make sure the telco has the address data correctly populated within the PS-ALI database for each of the PS-ANI entries!  Testing is worthless until you have confirmation that it has been completed!

Testing

Once you have confirmation that everything is in place, you MUST TEST!  Contact your local non-emergency number and make them aware that you are testing a new telephony system and want to ensure that E9-1-1 is functioning correctly.  Some locales may have you specify a date/time to begin testing the functionality, but the most important piece is to get the local government & PSAP involved to make sure you have all players alerted and all paperwork completed to ensure compliance.

Once all is in place, begin making calls and testing the ERL recognition across the campus, building, floors, etc.  Test all your ERLs – perform due diligence to make sure that all necessary network components are being recognized before signing off and saying that “yes, things work”.

  • Are your BSSIDs correct?
  • Are your LLDP-MED items correct?
  • Are subnets correct?
  • Are PS-ANIs being used correctly by the gateway?
  • Does the PSAP operator receive the PS-ALI information?

Do not take anything for granted or assume anything works.  Make sure you have concrete validation that E9-1-1 is working as you expect it!

Wrapping Up

We’ve made it!  It was a long road and the work isn’t done.  You will now need to maintain this configuration and update as appropriate as the network topology changes or as offices come and go.  It is an ongoing work that is never complete so don’t become complacent with the responsibility!

This concludes the ISDN-related portion of this E9-1-1 series.  What follows next is actually utilizing NG9-1-1 within Skype4B Server. Stay tuned!

07Mar/16

Enhanced 911 in Skype4B Server and Lync Server – Part 3

This is a continuation of the series on E9-1-1 within Skype4B Server and Lync Server.  Today, we begin to take our emergency requirements and turn them into a logical configuration within Skype4B.

Skype4B Server LIS Network Requirements

Just when you thought that all the requirements were in place, there comes one more.  Sorry…don’t shoot the messenger!  We know from the previous post that we need to have two ERLs per floor but we also know that the client must send an identifying network attribute to LIS so that a single, definite location can be ascertained.  Recall that there are multiple options for location recognition by the client (in order of most preferred to least preferred):

  1. BSSID MAC of Wi-Fi access point
  2. Switch port ID from LLDP-MED
  3. Switch Chassis ID from LLDP-MED
  4. Subnet
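The preference order above amounts to "try the most specific identifier first and fall back from there".  Here is a minimal sketch of that fallback, assuming LIS data is represented as plain dictionaries; the dictionary keys and function name are hypothetical, and this is not how the LIS web service is actually implemented.

```python
# Most preferred to least preferred, matching the list above.
LOOKUP_ORDER = ("bssid", "port_id", "chassis_id", "subnet")


def resolve_location(client, lis):
    """Return (attribute, location) for the first attribute that matches, else None."""
    for attribute in LOOKUP_ORDER:
        value = client.get(attribute)
        location = lis.get(attribute, {}).get(value) if value else None
        if location:
            return attribute, location
    return None


# Hypothetical LIS contents.
lis = {
    "bssid": {"18-8b-9d-74-77-3f": "333-FL10-NW Wing"},
    "subnet": {"10.7.100.0": "333-FL10-NW Wing", "10.7.101.0": "333-FL10-SE Wing"},
}

wired = {"subnet": "10.7.101.0"}
wifi = {"bssid": "18-8b-9d-74-77-3f", "subnet": "10.7.101.0"}

print(resolve_location(wired, lis))  # ('subnet', '333-FL10-SE Wing')
print(resolve_location(wifi, lis))   # ('bssid', '333-FL10-NW Wing')
```

The Wi-Fi client shows why the order matters: the BSSID match wins even though the subnet alone would have pointed at a different ERL.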

Using IP subnets to identify locations is by far the most simple solution but there are two other factors you must be extra aware of:

  1. You can’t use subnets if a single subnet spans two ERLs.
  2. You can’t use subnets if a single subnet exists in two separate physical locations.

Regarding issue #1

Take a look at the diagram below:

Skype4B-E911-ERLLayout

If you intend to use IP subnets for location recognition, you must ensure that an IP subnet does not leak outside the physical bounds of the defined ERL it is serving.  Thus, an IP subnet used within ERL 2 must not be used for any client that may exist outside the physical bounds of ERL 2.

Regarding issue #2

This type of scenario tends to exist in medium to large organizations that have grown by acquisition.  In that case, a single subnet (10.0.100.0/24) might be used at a site in the US and also within a site in the UK.  To differentiate on the WAN and keep Layer 3 routing alive, network engineers use Network Address Translation (NAT) to ensure that traffic can route between the two locations.  The major issue with this approach is that LIS is global so you cannot define two separate locations within LIS for a single subnet.

Note:  It’s not that you can’t define two separate locations for a single subnet because you very much can.  The issue is that you can only have one that is active.  As soon as you create the second subnet & location, the first location becomes inactive and thus completely negates E9-1-1 location information you had previously entered.

Additionally, the client sends the distinguishing subnet ID to LIS WS within the web service communication, so the fact that a NAT is in place is completely irrelevant – NAT will change the SrcIP on the Layer 3 traffic between client and server but it doesn’t change the information within the web services request.
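If you have a list of subnets per site from your network team, you can check for this problem before you ever touch LIS.  The sketch below flags any two entries whose address ranges overlap, which catches both exact duplicates and supernets; the site names are hypothetical and the input format is simply my assumption of what an export might look like.

```python
import ipaddress
from itertools import combinations


def find_subnet_conflicts(site_subnets):
    """site_subnets: list of (site, cidr). Return the pairs whose ranges overlap."""
    parsed = [(site, ipaddress.ip_network(cidr)) for site, cidr in site_subnets]
    conflicts = []
    for (site_a, net_a), (site_b, net_b) in combinations(parsed, 2):
        if net_a.overlaps(net_b):
            conflicts.append((site_a, str(net_a), site_b, str(net_b)))
    return conflicts


inventory = [
    ("US-Nashville", "10.0.100.0/24"),
    ("UK-London", "10.0.100.0/24"),   # same subnet reused behind NAT
    ("US-Nashville", "10.7.100.0/24"),
]

for conflict in find_subnet_conflicts(inventory):
    print(conflict)
# ('US-Nashville', '10.0.100.0/24', 'UK-London', '10.0.100.0/24')
```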

Workarounds

With the two caveats above in mind, you have two options to get around these issues:

  1. Utilize unique IP subnets within each ERL and ensure IP subnets are unique across all sites, OR…
  2. Utilize the switch port ID or switch MAC address to identify the ERL and don’t use subnets

Note:  You could technically use Option 2 and forgo Option 1 and identify ERLs by switch information, but the shared-subnet-uniqueness will still be a problem for you if not remediated.  Fixing that problem is painful and hard work to accomplish, but the rewards are plentiful because it removes other configuration blockers that arise for Skype4B/Lync when subnet-uniqueness is not available.  As an aside – IPv6 doesn’t allow that kind of shared-subnet-uniqueness anyway, so network engineers should not be fighting the inevitable (IMHO).  Remember, resistance is futile.

It may not seem like a big deal but the whole goal of E9-1-1 is to provide location information that is as accurate as possible, so if you can’t ensure subnet uniqueness and subnet geofencing then you’ve broken E9-1-1.  If you have unique IP subnets but can’t keep the subnet within the physical bounds of an ERL then enable LLDP and utilize option 2 above.

Note:  You must utilize LLDP-MED on your switches for Option 2 above, as the Skype clients obtain that information from the LLDP-MED tags on the network.  In addition, you must be using an IP Phone that supports it (LPE and qualified phones do) or a Windows OS that supports it (Windows 8.1 or above) if you are using soft phones only.  An example of information available via LLDP is below.

noam-us-tn-accsw02# show lldp neighbors detail

Chassis id: 0019.2fa7.b28d
Port id: Gi0/13
Port Description: GigabitEthernet0/13
System Name: noam-us-tn-accsw01.widgets.com

System Description: 
Cisco IOS Software, C3560 Software (C3560-ADVIPSERVICESK9-M), Version 12.2(44)SE, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2008 by Cisco Systems, Inc.
Compiled Sat 05-Jan-08 00:15 by weiliu

Time remaining: 114 seconds
System Capabilities: B,R
Enabled Capabilities: B
Management Addresses - not advertised
Auto Negotiation - supported, enabled
Physical media capabilities:
 10base-T(HD)
 10base-T(FD)
 100base-TX(HD)
 100base-TX(FD)
 1000baseT(HD)
 1000baseT(FD)
Media Attachment Unit type: 16
---------------------------------------------

Total entries displayed: 1

For the purposes of our sample scenario, our network engineer can ensure that subnets are unique and isolated to the physical location of each ERL.  Given that all requirements are in place, we can begin building the pieces of the solution.

Creating the LIS Components

First up is creating the subnet and access point information within LIS:

Set-CsLisSubnet -Subnet 10.7.100.0 -Description "" -Location "333-FL10-NW Wing" -CompanyName "6155551000;6155551001;6155551002" -HouseNumber "333" -HouseNumberSuffix "" -PreDirectional "" -StreetName "Commerce" -StreetSuffix "St" -PostDirectional "" -City "Nashville" -State "TN" -PostalCode "37021" -Country "US"
Set-CsLisSubnet -Subnet 10.7.101.0 -Description "" -Location "333-FL10-SE Wing" -CompanyName "6155551003;6155551004;6155551005" -HouseNumber "333" -HouseNumberSuffix "" -PreDirectional "" -StreetName "Commerce" -StreetSuffix "St" -PostDirectional "" -City "Nashville" -State "TN" -PostalCode "37021" -Country "US"
Set-CsLisWirelessAccessPoint -Bssid 18-8b-9d-74-77-3f -Description "" -Location "333-FL10-NW Wing" -CompanyName "6155551000;6155551001;6155551002" -HouseNumber "333" -HouseNumberSuffix "" -PreDirectional "" -StreetName "Commerce" -StreetSuffix "St" -PostDirectional "" -City "Nashville" -State "TN" -PostalCode "37021" -Country "US"
Set-CsLisWirelessAccessPoint -Bssid 18-8b-9d-74-77-3b -Description "" -Location "333-FL10-SE Wing" -CompanyName "6155551003;6155551004;6155551005" -HouseNumber "333" -HouseNumberSuffix "" -PreDirectional "" -StreetName "Commerce" -StreetSuffix "St" -PostDirectional "" -City "Nashville" -State "TN" -PostalCode "37021" -Country "US"

A few important notes here:

‘Location’ Parameter

The ‘Location’ field contains the specifics of your ERL.  You are limited to 20 characters in this field, so settle on a naming standard up front.  Since this field is actually shown in the client, don’t make it so convoluted that it stops being helpful to end-users who may want to know where a certain person is by looking at their Skype4B Contact Card.  I usually utilize “HouseNumber-Floor-Location” or “Building-Floor-Location” for mine.  Examples:

  • “333-FL10-STE 1001”
  • “333-FL10-NW Wing”
  • “333-FL2-RM 210”
  • “BldgA-FL1-NW Wing”
  • “Bldg10-FL1-RM 210”
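A small helper can enforce the naming standard and the 20 character limit before anything gets imported.  This is only a sketch of the “Building-Floor-Location” convention described above; the function name is hypothetical.

```python
MAX_LOCATION_LENGTH = 20  # limit on the 'Location' field noted above


def build_location(building, floor, area):
    """Join the parts as Building-Floor-Location and enforce the length limit."""
    name = f"{building}-{floor}-{area}"
    if len(name) > MAX_LOCATION_LENGTH:
        raise ValueError(
            f"{name!r} is {len(name)} characters; the limit is {MAX_LOCATION_LENGTH}"
        )
    return name


print(build_location("333", "FL10", "NW Wing"))  # 333-FL10-NW Wing
```

Spelling out “Northwest Wing” instead of “NW Wing” would push that example to 23 characters and raise the error, which is one more reason to standardize on abbreviations.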

Note:  For pure SIP-trunk E9-1-1 scenarios, this information gets passed all the way to the PSAP operator.  For that additional reason it is extremely important to use a clear naming standard.  I recommend using USPS abbreviations to denote floors, wings, suites, etc, so that a PSAP operator has a clear understanding of where you are.

‘CompanyName’ Parameter

Are you using an Audiocodes gateway?  If so, the CompanyName parameter should include the numbers that will be used as the ELINs.  The gateways detect that information in the PIDF-LO data and use the numbers for outbound PS-ANI on outbound 9-1-1 calls.

Are you using a Sonus gateway?  If so, the CompanyName parameter can be anything you want but will likely be your company name.  Duh, right?  Sonus gateways have ELINs defined within the gateways themselves, specifically in the Callback Table configuration, so there is no need to define it within Skype4B.
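
For the AudioCodes case, the ELINs in my examples are joined with semicolons.  Here is a small sketch for building that string from a list.  Join-ElinList is a hypothetical helper, and the 10-digit check is my own assumption based on the numbers used in this post:

```powershell
# Hypothetical helper: builds the semicolon-delimited ELIN string for -CompanyName.
function Join-ElinList {
    param([Parameter(Mandatory)][string[]]$Elin)
    foreach ($number in $Elin) {
        # Assumption: ELINs are plain 10-digit numbers, as in the examples in this post.
        if ($number -notmatch '^\d{10}$') {
            throw "ELIN '$number' is not a 10-digit number."
        }
    }
    $Elin -join ';'
}

Join-ElinList -Elin '6155551000', '6155551001', '6155551002'
# returns 6155551000;6155551001;6155551002
```

The result can be passed straight to the CompanyName parameter of the Set-CsLis cmdlets shown above.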

‘Subnet’ Parameter

The subnet ID must match the subnet your DHCP server hands out to the clients.  No supernetting allowed here.  Be specific; no subnet mask is required.

‘BSSID’ Parameter

The BSSID MAC address must be defined for all of your access points.  You will typically see multiple MAC addresses for each AP, one for each frequency, SSID, etc.  Make sure you obtain all Wi-Fi MAC addresses from your network engineers and import them appropriately.
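
Network teams tend to hand over MAC addresses in whatever format their tooling exports (colons, dots, upper case).  A sketch like this can normalize them to the hyphen-separated form used in my commands above.  ConvertTo-LisBssid is a hypothetical helper name, not a Skype4B cmdlet:

```powershell
# Hypothetical helper: normalizes a MAC address to the nn-nn-nn-nn-nn-nn form.
function ConvertTo-LisBssid {
    param([Parameter(Mandatory)][string]$MacAddress)
    # Strip common separators (colon, hyphen, dot) and lowercase the hex digits.
    $hex = ($MacAddress -replace '[:\-\.]', '').ToLowerInvariant()
    if ($hex -notmatch '^[0-9a-f]{12}$') {
        throw "'$MacAddress' is not a 48-bit MAC address."
    }
    # Insert a hyphen after every pair of digits except the last pair.
    $hex -replace '(..)(?!$)', '$1-'
}

ConvertTo-LisBssid -MacAddress '18:8B:9D:74:77:3F'   # returns 18-8b-9d-74-77-3f
ConvertTo-LisBssid -MacAddress '188b.9d74.773b'      # returns 18-8b-9d-74-77-3b
```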

Other Parameters

You should use address information that matches the USPS delivery address, if at all possible.  The USPS address lookup is closely linked to proper MSAG-validated addresses, so it helps to perform that due diligence.

Note:  You’ll notice that no MSAG validation is performed as a part of these steps.  For ISDN-based scenarios, it is not required.  When we talk about SIP-based scenarios later on, it is required.  Regardless, get the address information correct the first time you do this so you don’t have to go back and change things after MSAG validation.

Note:  You’ll notice none of the address information changed between the pairs of subnet/BSSID parameters.  Each BSSID and subnet is mapped to a single physical location – the only difference being the ‘Location’ field between the two sets.  This is critical to ensure that only a single location (ERL) is created within LIS and that each of the network elements maps to a single location.

Note:  When you use the Set-CsLis cmdlets, if no existing location in the database matches the address information you’ve entered, a new location will be created automatically.  This is by far the easiest way to enter LIS data.
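
Because the cmdlets create locations on the fly, bulk loading from a spreadsheet becomes practical.  Below is a minimal sketch, assuming a CSV whose column names match the Set-CsLisSubnet parameter names.  The CSV layout and the ConvertTo-LisSubnetParameter helper are my own invention for illustration, and the actual Set-CsLisSubnet call is left as a comment so the sketch runs outside the Skype4B Management Shell:

```powershell
# Hypothetical CSV layout: one row per subnet, columns named after the cmdlet parameters.
$csv = @'
Subnet,Location,CompanyName,HouseNumber,StreetName,StreetSuffix,City,State,PostalCode,Country
10.7.100.0,333-FL10-NW Wing,6155551000;6155551001;6155551002,333,Commerce,St,Nashville,TN,37021,US
10.7.101.0,333-FL10-SE Wing,6155551003;6155551004;6155551005,333,Commerce,St,Nashville,TN,37021,US
'@

# Hypothetical helper: turns one CSV row into a hashtable suitable for splatting.
function ConvertTo-LisSubnetParameter {
    param([Parameter(Mandatory, ValueFromPipeline)]$Row)
    process {
        $params = @{}
        foreach ($property in $Row.PSObject.Properties) {
            $params[$property.Name] = [string]$property.Value
        }
        $params
    }
}

$rows = $csv | ConvertFrom-Csv | ConvertTo-LisSubnetParameter
foreach ($params in $rows) {
    # In the Skype4B Management Shell, this is where each row would be written to LIS:
    # Set-CsLisSubnet @params
    '{0} -> {1}' -f $params.Subnet, $params.Location
}
```

Keeping the address columns identical across rows for the same ERL is what ensures only one location is created per ERL.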

Check your work!

Before moving any further, you need to check and make sure that LIS locations are correct!

Get-CsLisLocation
Get-CsLisSubnet
Skype4B-E911-LISDBSubnets
Get-CsLisWirelessAccessPoint
Skype4B-E911-LISDBBSSIDs
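
One thing worth checking specifically is that a typo in an address field hasn’t split a single ERL into two LIS locations.  The sketch below groups entries by Location and flags any ERL that ends up with more than one distinct address.  Find-InconsistentErl is a hypothetical helper, and I am assuming the objects carry Location, HouseNumber, StreetName, StreetSuffix, City, State and PostalCode properties matching the cmdlet parameter names:

```powershell
# Hypothetical helper: flags ERL names that map to more than one distinct civic address.
function Find-InconsistentErl {
    param([Parameter(Mandatory)][object[]]$Entry)
    $Entry | Group-Object -Property Location | ForEach-Object {
        $group = $_
        # Build one address string per entry, then keep only the distinct values.
        $addresses = @($group.Group | ForEach-Object {
            '{0} {1} {2}, {3}, {4} {5}' -f $_.HouseNumber, $_.StreetName,
                $_.StreetSuffix, $_.City, $_.State, $_.PostalCode
        } | Sort-Object -Unique)
        if ($addresses.Count -gt 1) {
            [pscustomobject]@{ Location = $group.Name; Addresses = $addresses }
        }
    }
}

# Sample data standing in for LIS output; the second entry has a StreetSuffix typo.
$sample = @(
    [pscustomobject]@{ Location = '333-FL10-NW Wing'; HouseNumber = '333'; StreetName = 'Commerce'; StreetSuffix = 'St'; City = 'Nashville'; State = 'TN'; PostalCode = '37021' }
    [pscustomobject]@{ Location = '333-FL10-NW Wing'; HouseNumber = '333'; StreetName = 'Commerce'; StreetSuffix = 'Street'; City = 'Nashville'; State = 'TN'; PostalCode = '37021' }
    [pscustomobject]@{ Location = '333-FL10-SE Wing'; HouseNumber = '333'; StreetName = 'Commerce'; StreetSuffix = 'St'; City = 'Nashville'; State = 'TN'; PostalCode = '37021' }
)
Find-InconsistentErl -Entry $sample

# Against a live deployment, something along these lines should work:
# Find-InconsistentErl -Entry (@(Get-CsLisSubnet) + @(Get-CsLisWirelessAccessPoint))
```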

If the locations are correct within LIS, then you need to take the config and make it active:

Publish-CsLisConfiguration -Verbose

Creating the Voice Routing

In order for the 9-1-1 call to route outside of Skype4B Server, you need a voice route and a PSTN Usage.  You could, in practice, go two major routes (pun intended):

  1. Create a single voice route and single PSTN usage for all your ERLs within the location, OR…
  2. Create a dedicated route and dedicated PSTN usage for each ERL within the location

The option you use generally depends on the routing requirements and/or gateway availability within your environment.  I cannot give you a “do this to set up E9-1-1” answer that accurately covers all scenarios.  In almost all cases there are various potential answers so it is up to you to decide what is appropriate given the requirements and gateway availability.  In my fictitious scenario we will send all routes for multiple ERLs out to a single gateway:

#Create new PSTN Usage
Set-CsPSTNUsage -Identity global -Usage @{Add="US-TN-Nashville-333Commerce-Emergency"} -WarningAction:SilentlyContinue

#Create new voice route match for 911
New-CSVoiceRoute -Name "US-TN-Nashville-333Commerce-Emergency" -PSTNUsages "US-TN-Nashville-333Commerce-Emergency" -NumberPattern '^\+(911)$' -Description "Emergency routing for Nashville, TN – 333 Commerce"

#Add PSTN gateway to voice route
Set-CSVoiceRoute -Identity "US-TN-Nashville-333Commerce-Emergency" -PSTNGatewayList @{Add="PstnGateway:noam-us-tn-nashSBC1.widgets.com"}

#Alter trunk to enable PIDF-LO
New-CsTrunkConfiguration -Identity "PstnGateway:noam-us-tn-nashSBC1.widgets.com" -EnableBypass $TRUE -EnablePIDFLOSupport $TRUE -ForwardCallHistory $TRUE -ForwardPAI $TRUE -SRTPMode Optional

I won’t belabor the configuration of voice routing here, but just know that we should have a specific voice route AND specific PSTN usage that will be dedicated 100% to emergency calls.  This usage and route should not be shared with any other type of outbound call from Skype4B.  Additionally, we are enabling PIDF-LO support on the trunk out to our voice gateway so that Skype4B can pass the location data within SIP messaging to the gateway – this is required so that the gateway can perform ELIN functionality for us.  After all is said and done, it ends up looking like this:

Skype4B-E911-VoiceRoutingConfig

Creating the Network Components

Once you have routing configuration and LIS data populated there is another piece of topology that is required, and that is configuring the Skype4B Network Configuration.  Many people get confused at this part because it seems foolish that you must separately tie LIS data into a network configuration, but it truly is required to ensure that 9-1-1 call routing is defined to match only the physical network topology.

Create Network Subnets

Each subnet must be created within the network topology:

New-CsNetworkSubnet -Identity 10.7.100.0 -MaskBits 24
New-CsNetworkSubnet -Identity 10.7.101.0 -MaskBits 24

A few important notes here:

‘Identity’ & ‘MaskBits’ Parameter

The identity and mask bits must match the subnet and subnet mask your DHCP server hands out to the clients.  No supernetting allowed here.
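
If all you have is a client’s IP address and the mask from its DHCP scope, the subnet ID to use for the identity can be worked out with a bitwise AND.  This is a minimal IPv4-only sketch, and Get-SubnetId is a hypothetical helper name:

```powershell
# Hypothetical helper: derives the IPv4 subnet ID from an address and its mask bits.
function Get-SubnetId {
    param(
        [Parameter(Mandatory)][string]$IPAddress,
        [Parameter(Mandatory)][ValidateRange(1, 32)][int]$MaskBits
    )
    $bytes = [System.Net.IPAddress]::Parse($IPAddress).GetAddressBytes()
    if ($bytes.Length -ne 4) { throw 'Only IPv4 addresses are handled in this sketch.' }
    $networkBytes = for ($i = 0; $i -lt 4; $i++) {
        # Number of mask bits that fall within this octet (0 to 8).
        $bitsInOctet = [Math]::Min(8, [Math]::Max(0, $MaskBits - (8 * $i)))
        $octetMask = (0xFF -shl (8 - $bitsInOctet)) -band 0xFF
        $bytes[$i] -band $octetMask
    }
    $networkBytes -join '.'
}

Get-SubnetId -IPAddress '10.7.100.57' -MaskBits 24    # returns 10.7.100.0
Get-SubnetId -IPAddress '10.7.101.130' -MaskBits 25   # returns 10.7.101.128
```

The second example shows why supernetting bites you: a client in a /25 scope belongs to 10.7.101.128, not 10.7.101.0.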

Create Network Sites

A network site is analogous to an ERL within LIS but it is actually a completely separate entity within Skype4B Server.  In my scenario I could do two things:

  1. Create a network site for each ERL, OR…
  2. Create a network site for each floor

The major restriction is this:  a location policy (which defines E9-1-1 calling parameters) is applied at the network site.  If you need specific voice routing or specific IM notifications for each ERL, you must ensure that no network subnet spans more than one ERL, so that each subnet is fully contained within one of the network sites you create under Option 1 above.

Note:  This is another scenario where having non-unique subnets used in multiple sites causes issues.  Network subnets are a global configuration in Skype4B so you can’t have a single subnet exist in two separate locations.  While I’ve discussed mainly E9-1-1 impacts thus far, this problem also causes issues for things like CAC policies and P2P traffic flows.  If this non-uniqueness exists in your current network topology, then a plan needs to be in place to begin migrating sites to unique subnets across the entire enterprise.

For my fictitious scenario, we have subnets contained within the physical bounds of the ERL, so we can utilize Option 1:

New-CsNetworkSite -Identity "333_FL10_NW Wing" -NetworkRegionID NOAM

New-CsNetworkSite -Identity "333_FL10_SE Wing" -NetworkRegionID NOAM

A few important notes here:

‘Identity’ Parameter

For easiest configuration, I set the identity parameter to be the same as the ERL name.  Technically, it does not need to be the same, but I can tell you from experience that the administration becomes easier if you do so.  Additionally, these network site names show up in the Monitoring Reports so it becomes a huge advantage to have detailed naming, especially for troubleshooting and performance reports.

Note:  This is the only place where you cannot use a ‘-‘ in the identity.  I do not know why, but it’s a huge pain in my naming schemes.  I just switch to ‘_’ for network site names.
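
Since I keep the site name identical to the ERL name apart from that one character, the conversion is a one-liner.  ConvertTo-NetworkSiteName is a hypothetical helper name, shown only to make the convention explicit:

```powershell
# Hypothetical helper: derives a network site name from an ERL name by swapping '-' for '_'.
function ConvertTo-NetworkSiteName {
    param([Parameter(Mandatory)][string]$ErlName)
    $ErlName -replace '-', '_'
}

ConvertTo-NetworkSiteName -ErlName '333-FL10-NW Wing'   # returns 333_FL10_NW Wing
```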

‘NetworkRegionID’ Parameter

You must already have a network region ID created in order to create a network site.  I use ‘NOAM’ simply as an example for a previously created network region.

Create Location Policies

Location policies contain the specific configuration for E9-1-1 call behavior and routing within Skype4B.  While there are many ways you could configure this, in practice I see it boiling down to two main methods:

  1. Create a location policy for each ERL and assign it to the corresponding network site, OR…
  2. Create a single location policy and assign to all applicable network sites, for example: per floor or per building

If you have requirements for specific alerting or routing behaviors per ERL then use Option 1.  In my fictitious scenario we don’t need unique settings for each of our ERLs within the floor, so I can utilize Option 2 and configure a single location policy that covers the entire floor:

New-CsLocationPolicy -Identity "333-FL10" -EnhancedEmergencyServicesEnabled $TRUE -PSTNUsage "US-TN-Nashville-333Commerce-Emergency" -EmergencyDialMask "9911;112;999;000" -EmergencyDialString "911" -LocationRequired Disclaimer -NotificationUri "sip:[email protected]"

A few important notes here:

‘Identity’ Parameter

For easiest configuration, I typically set the identity parameter to be the same as the ERL name or as another identifier such as floor name or wing or building.

‘EnhancedEmergencyServicesEnabled’ Parameter

Many people get confused here and turn this off because they aren’t subscribing to an E9-1-1 service.  In reality, you must have this turned on, even for purposes as simple as using location information internally.

‘PSTNUsage’ Parameter

This is the usage we created earlier.

‘EmergencyDialMask’ Parameter

Many people also confuse the overall purpose of this parameter and include the logic within their dial plan normalization rules – that’s a huge ‘No-No’ in my opinion.  The dial mask should include all the different ways someone may attempt to dial 9-1-1.  For US residents it may be 9-1-1, or 9-9-1-1, but for others in the world it may be 1-1-2 or 9-9-9 or 0-0-0.  Configure the most common values here.

Note:  I never, EVER configure a dial plan normalization rule to match dialing emergency numbers, because it could allow normalization and outbound emergency call completion when location information may not be available, such as a call through Skype4B Mobile clients or external desktop clients.  In the event one of those calls can be completed (mobile/external/etc) you can’t accurately control routing and location information recognition so you will have calls going to PSAPs with no information about where the caller physically is.  Emergency calling configuration should be isolated to emergency calling components within Skype4B to ensure that physical location information is the only parameter that can be matched and utilized for all 9-1-1 calls.

‘EmergencyDialString’ Parameter

This is what the dial mask gets normalized into.  In the US, it will simply be ‘911’ but in other countries it may be “112” or “999”, etc.
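
To make the relationship between the mask, the dial string and the voice route concrete, here is a rough illustration.  It is not how the client or server implements anything; Resolve-EmergencyDial is a hypothetical helper, and I am assuming that a leading “+” is added to the dial string before route matching, which is why the emergency route pattern is written as ^\+(911)$:

```powershell
# Hypothetical helper: maps a dialed number to the emergency dial string if it is
# the dial string itself or one of the mask values; returns nothing otherwise.
function Resolve-EmergencyDial {
    param(
        [Parameter(Mandatory)][string]$Dialed,
        [string]$DialString = '911',
        [string]$DialMask = '9911;112;999;000'
    )
    $masks = $DialMask -split ';'
    if ($Dialed -eq $DialString -or $masks -contains $Dialed) {
        # Assumption: a leading "+" is prepended before the voice route is matched.
        return "+$DialString"
    }
    return $null
}

$routePattern = '^\+(911)$'
foreach ($dialed in '911', '9911', '112', '6155551000') {
    $result = Resolve-EmergencyDial -Dialed $dialed
    $matchesRoute = ($null -ne $result) -and ($result -match $routePattern)
    '{0} -> {1} (matches emergency route: {2})' -f $dialed, $result, $matchesRoute
}
```

The point of the sketch is that every masked value collapses to the same string, so a single route pattern covers them all.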

‘LocationRequired’ Parameter

I generally put ‘disclaimer’ here but it is up to your discretion.  If the location information is configured correctly you never have to worry about this setting appearing to end users, but if you want users to see a warning within the Skype4B client when location information is not available, then configure this setting appropriately.

‘NotificationURI’ Parameter

This can be a user or an e-mail distribution group of people who should receive an IM when an emergency call is made.  Populate the group as required and put its SIP URI in the configuration field.

‘ConferenceURI’ and ‘ConferenceMode’ Parameters

The ‘ConferenceURI’ and ‘ConferenceMode’ settings do not apply when using ELIN gateways to integrate with ISDN circuits.  You must use a SIP-based E9-1-1 service, such as 911Enable or RedSky, to take advantage of those settings.  The reason is that the SIP-based E9-1-1 service extracts the values from the PIDF-LO data and initiates the call on your behalf.  ISDN has no capability to pass this information on the outbound call so it does not work there.  Additionally, Skype4B server does not perform the call merging itself, so please don’t configure these two settings and wonder why it doesn’t work!

Connect the dots

With all components ready, you simply need to connect the dots by assigning subnets to your network sites and applying the location policy to the network site:

Set-CsNetworkSubnet -Identity 10.7.100.0 -MaskBits 24 -NetworkSite "333_FL10_NW Wing"
Set-CsNetworkSubnet -Identity 10.7.101.0 -MaskBits 24 -NetworkSite "333_FL10_SE Wing"

Set-CsNetworkSite -Identity "333_FL10_NW Wing" -NetworkRegionID NOAM -LocationPolicy "333-FL10"
Set-CsNetworkSite -Identity "333_FL10_SE Wing" -NetworkRegionID NOAM -LocationPolicy "333-FL10"
Network Subnets
Skype4B-E911-NetworkSubnetConfig 
Network Sites
Skype4B-E911-NetworkSiteConfig
Location Policy
Skype4B-E911-LocationPolicyConfig
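
If you would rather not eyeball screenshots, the same check can be scripted.  The sketch below looks for subnets with no site and sites with no location policy.  Test-E911Wiring is a hypothetical helper, and I am assuming the subnet objects expose SubnetID and NetworkSiteID properties and the site objects expose NetworkSiteID and LocationPolicy properties:

```powershell
# Hypothetical helper: reports gaps between network subnets, sites and location policies.
function Test-E911Wiring {
    param(
        [Parameter(Mandatory)][object[]]$Subnet,
        [Parameter(Mandatory)][object[]]$Site
    )
    $siteIds = @($Site | ForEach-Object { $_.NetworkSiteID })
    foreach ($s in $Subnet) {
        if (-not $s.NetworkSiteID) {
            "Subnet $($s.SubnetID) is not assigned to a network site."
        } elseif ($siteIds -notcontains $s.NetworkSiteID) {
            "Subnet $($s.SubnetID) points to unknown site '$($s.NetworkSiteID)'."
        }
    }
    foreach ($n in $Site) {
        if (-not $n.LocationPolicy) {
            "Site '$($n.NetworkSiteID)' has no location policy."
        }
    }
}

# Sample data standing in for live output; the second site is missing its policy.
$subnets = @(
    [pscustomobject]@{ SubnetID = '10.7.100.0'; NetworkSiteID = '333_FL10_NW Wing' }
    [pscustomobject]@{ SubnetID = '10.7.101.0'; NetworkSiteID = '333_FL10_SE Wing' }
)
$sites = @(
    [pscustomobject]@{ NetworkSiteID = '333_FL10_NW Wing'; LocationPolicy = '333-FL10' }
    [pscustomobject]@{ NetworkSiteID = '333_FL10_SE Wing'; LocationPolicy = $null }
)
Test-E911Wiring -Subnet $subnets -Site $sites

# Against a live deployment, something along these lines should work:
# Test-E911Wiring -Subnet (Get-CsNetworkSubnet) -Site (Get-CsNetworkSite)
```

No output means every subnet landed in a known site and every site has a policy.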

With the LIS configuration in place, any time a client communicates with the LIS web services they will now receive proper location information.  With the network & voice routing configuration in place, any 9-1-1 call will be routed correctly and send specific PIDF-LO information to our PSTN gateway for the purposes of 9-1-1 routing.  So long as that gateway is a certified Skype4B device then we can use the native ELIN functionality to take the PIDF-LO data that will be passed from Skype4B and ensure that the gateway sends the dedicated PS-ANI on the outbound call across the ISDN-PRIs.

Wash, Rinse, Repeat

I’ve given a base configuration above, but you would need to implement the same type of configuration for every single ERL within the locations where you plan to deploy voice.  Every subnet, every LIS location, every Wi-Fi AP, etc…  Quite frankly, it’s a lot of work, but every piece needs to be accounted for to ensure that E9-1-1 is truly in place.  Don’t miss something!

Wrapping Up

We’re not done, not quite yet, but we are awfully close.  We have ensured that regardless of where you go on that floor, your client will receive correct location data and that Skype4B Server will transmit that unique location data across the SIP trunk to the PSTN gateway.  Stay tuned for the next post as we discuss what needs to be communicated to your telco to ensure that all the work you’ve done is handled on their end as well!

29Feb/16

Enhanced 911 in Skype4B Server and Lync Server – Part 2

This is a continuation of the series on E9-1-1 within Skype4B Server and Lync Server.  Today, we begin to take some concepts and structure them around the capabilities provided within Skype4B.

Skype4B Server Location Information Service

All location-recognition functionality is provided through the Location Information Service (LIS) within Skype4B Server 2015.  At a high level, LIS is comprised of two major components:

  1. LIS Database
  2. LIS Web Services
Skype4B-E911-LISConceptual
LIS Database

The LIS database is used to house all the information about your network, physical office information and ERL information.  It is used to store a ‘network map’ that maps physical network elements, such as an IP subnet, or wireless access point MAC address, or switch MAC address or switch port ID, to a physical location.

Skype4B-E911-LISDBLocationComponents

Each configured location within the LIS database is comprised of two major pieces of information:

  1. ERL Name
  2. Civic Address
Skype4B-E911-LISDBLocations

Note:  While I show them separately above, the two objects are actually not separate at all.  To define a location within LIS, you must define both, and all within a single PowerShell cmdlet.

The “ERL name” above corresponds to the “Location” field within the LIS location parameters.  The civic address information is just like any other mailing address info we are used to seeing.  When you combine the two together, you get something very specific, such as:

LIS Field            LIS Location
Location             333-FL10-NW Wing
CompanyName          Widgets, Inc
HouseNumber          333
HouseNumberSuffix    (empty)
PreDirectional       (empty)
StreetName           Commerce
StreetSuffix         St
PostDirectional      (empty)
City                 Nashville
State                TN
PostalCode           37201
Country              US

So in essence, the entire table entry above is a single location, or ERL, within LIS.  Any number of network elements could tie you into the location above.  If multiple ERLs are required within a single site (such as a single building with multiple floors), you will have multiple locations within the LIS database (or multiple tables such as above) and the only distinguishing difference between them will be the value configured in the ‘Location’ field.

LIS Web Services

The LIS web services (LIS WS) serve as a conduit for clients and phones to query the LIS database to determine where they are.

Note:  Mobility clients are not included.  In fact, they don’t support getting location information from LIS at all.  This is primarily a server architecture limitation because LIS WS only works on the internal IIS website and not the external IIS website, which is where mobility clients (even when physically internal) work from.

After a client has successfully registered to a front end pool (steps 1 & 2), it asks LIS web services “where am I?” (steps 3 & 4):

Skype4B-E911-LISWSConceptual

As part of the LIS web service query, the client provides LIS with one of the identifying network attributes it knows about its current network whereabouts, such as a currently connected Wi-Fi access point MAC address, or subnet, or switch MAC address, or switchport ID.  An important note is that there is a preference or precedence of network objects that the clients use when communicating with LIS WS.  In order of most preferred to least preferred:

  1. BSSID MAC of Wi-Fi access point
  2. Switch port ID from LLDP-MED
    • Note:  Not available unless you use LLDP on your switches!
  3. Switch Chassis ID from LLDP-MED
    • Note:  Not available unless you use LLDP on your switches!
  4. Subnet

Thus, if all values are present, the client will send the BSSID MAC within the LIS web services request.  If BSSID MAC is not present but the remainder of the values are present, the client will send the switch port ID within the LIS web services request… and so on and so forth.  Bottom line:  the client sends the distinguishing value to LIS web services so that LIS can look up that value in the database and then return the location information to the client.  When the location information is returned, only the ‘Location’ parameter information (or ERL) is displayed within the Skype4B client:

Skype4B-E911-LISClientDisplay

All other location information is cached within the Skype4B/Lync application itself and only transmitted to the registrar server via SIP/PIDF-LO when an emergency call is actually made.
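
The precedence described above can be sketched as a simple first-match lookup.  This is only an illustration of the ordering, not the client’s actual code; Select-LisQueryKey and the hashtable key names are hypothetical:

```powershell
# Hypothetical helper: picks the network identifier a client would prefer to send,
# following the most-preferred to least-preferred order listed above.
function Select-LisQueryKey {
    param([Parameter(Mandatory)][hashtable]$NetworkInfo)
    foreach ($key in 'Bssid', 'SwitchPortId', 'SwitchChassisId', 'Subnet') {
        if ($NetworkInfo[$key]) {
            return [pscustomobject]@{ Type = $key; Value = $NetworkInfo[$key] }
        }
    }
    return $null
}

# A Wi-Fi client knows both its BSSID and its subnet, so the BSSID wins.
Select-LisQueryKey -NetworkInfo @{ Bssid = '18-8b-9d-74-77-3f'; Subnet = '10.7.100.0' }

# A wired client without LLDP only knows its subnet.
Select-LisQueryKey -NetworkInfo @{ Subnet = '10.7.101.0' }
```

This is also why it pays to populate BSSIDs carefully: for wireless clients, the subnet entry is only reached when no higher-precedence value is available.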

Now that you understand the framework of location services within Skype4B and Lync, you need to understand what dictates the design of that framework…

Letter of the Law

It seems that we can’t escape regulations within our lives these days and E9-1-1 is no different.  While you can wax poetic about the technical capabilities of a communications solution, the reality you must understand is that the E9-1-1 configuration within Skype4B (and any other communications platform) is completely dependent upon the exact legal/regulatory requirements that may apply to the physical office location.  The most common requirements boil down to:

  • How do you define your ERLs?
    • Per civic address (assuming a single building)?
    • Per building (assuming multiple buildings at a single civic address)?
    • Per floor within each building?
    • A certain number of square feet per floor within each building?
    • Per suite number?
    • Per office number?
    • Other?
  • What are callback requirements?

Hands down, the best public resources for determining E9-1-1 requirements for each state within the US are the RedSky and 911Enable websites.

RedSky’s website is a bit more friendly, contains excerpts of each state’s requirements and is often the best place to start in understanding what state legislation may exist, whereas 911Enable includes a small snippet of each state’s requirements, as they exist on the individual state’s website, and isn’t quite as easy to read.  Regardless of the site you use, you need to do more homework and find out what local legislation may exist in addition to any national/state requirements.  Some cities may have their own E9-1-1 requirements that aren’t captured within either website’s analysis, so it is important to perform due diligence beyond what RedSky and 911Enable have.  IT engineers often don’t know where to find these national/state/local regulations, so plan on getting HR and Legal involved early on.  Once they are involved…hold on for the wild ride.

Internal Politics

The HR and legal meetings tend to be the most difficult aspect of any internal or external E9-1-1 discussions, as they often involve folks who aren’t technical and don’t understand the core concepts and difficulties of E9-1-1.  Regardless of the difficulty, pain and frustration endured, don’t give up on this.  Continue pressing for answers and firm, official decisions.  Get all parties to understand the problem, discover all requirements, discuss potential solutions and accept a design that meets the internal and external needs.  Bottom line, you need to get universal agreement on the following items:

  • Are the state requirements for emergency location identification (ERL) sufficient?
    • Do they match the expectation of safety/security that HR and legal are willing to sign off on for employees?
  • Are more specific emergency location identification (ERLs) required, above and beyond what the state requires?
    • If the state/city requires none, does the company want to provide something better?
  • What is the life safety plan and how does E9-1-1 fit in?
    • How will this new E9-1-1 system fit in to existing policies and procedures around handling emergencies?
  • How does the life safety plan handle multiple emergencies within a single emergency location?
  • How does the life safety plan handle off-hours emergencies?
  • Are notifications needed to alert employees and/or non-employees of an emergency?
  • How do analog phones, such as elevator phones, fit in?
  • Are any additional services or solutions required to meet any of the policy needs?

Until you have almost all of those questions answered, it becomes nearly impossible to architect a technical solution to fit the policy requirements.  Once you do have all the questions answered, you can easily implement the requirements within Skype4B.

The Scenario

Going back to the scenario from my first post, assume we’re back at the same office building.  Each floor is laid out in this general fashion:

Skype4B-E911-FloorLayout

After discussions with HR and legal, the determination has been made that the following E9-1-1 support must be provided for:

  • Unique emergency location per 10,000 sq ft per floor
  • Each emergency location must be able to support a maximum of 3 emergency calls
  • Each emergency call must provide the correct emergency location so that emergency responders can quickly search an area
  • Each emergency call must provide an instant message alert to a group of users that contains the location of the emergency and the caller
  • To ensure E9-1-1 capabilities with the telephone company, each emergency call must use the right ELIN/PS-ALI based on the caller’s physical location

Given that we have the requirements above, we can state the following as technical requirements:

  • We require two ERLs (and thus two LIS locations) per floor
    • NW Wing and SE Wing
  • We require 3 ELINs per ERL
  • We require that location information is populated within LIS and proper configuration within Skype4B to ensure that all calls contain the proper ERL of the location of the caller
  • We require a unique IM notification configuration per floor
  • We require a PSTN gateway that supports ELIN functionality to send the appropriate PS-ALI
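
Those requirements translate directly into how many ELINs you need to obtain from the telco.  A quick sketch of the arithmetic follows; Get-ElinCount is a hypothetical helper, and the number of floors is an input you supply since it depends on your building:

```powershell
# Hypothetical helper: total ELINs = floors x ERLs per floor x ELINs per ERL.
function Get-ElinCount {
    param(
        [Parameter(Mandatory)][ValidateRange(1, 1000)][int]$Floors,
        [int]$ErlsPerFloor = 2,   # NW Wing and SE Wing
        [int]$ElinsPerErl = 3     # up to 3 simultaneous emergency calls per ERL
    )
    $Floors * $ErlsPerFloor * $ElinsPerErl
}

Get-ElinCount -Floors 1    # returns 6
Get-ElinCount -Floors 10   # returns 60
```

A single floor works out to six numbers: three for the NW Wing and three for the SE Wing.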

Wrapping Up

Now that we have policy decisions in place, the next steps are to take these configuration values and implement within Skype4B.  The configuration of E9-1-1 will include items such as configuring the LIS database, configuring network sites, and enabling location policies.  Stay tuned for the next post on those topics!