During the week of January 9, 2017, Microsoft added some considerable new offerings to the Skype Operations Framework (SOF). While SOF is helpful in many respects, its breadth and scope make it difficult to understand what to use, where to use it, and how to use it. In more than one way, trying to understand SOF is like the old saying ‘trying to drink from a fire hose’ – the content is all good but the sheer volume gets in the way. Even so, Microsoft hit a home run with the new content by giving customers a template to use with the Call Quality Dashboard (CQD) in Skype for Business Online. If you are using Skype for Business Online today, you should go download this template and begin looking at your data, because the findings will be eye-opening and worthwhile.
What’s CQD, anyway?
Some of you may have no idea what CQD is. Maybe you don’t use Skype4B. Maybe you do but you haven’t delved into the inner workings. Either way, CQD can simply be described as an advanced way to analyze media streams, media quality, and usage metrics. Before diving into CQD though, you need a small history lesson…
Within on-premises deployments, you have two databases that comprise what’s known as the ‘Monitoring Databases’:
- CDR.mdf – The CDR database contains call detail records – session information that captures who did something, what they did, and when they did it. Examples include: SIP URIs, modality type, timestamps, etc.
- QoE.mdf – The QoE database contains quality metrics – specific network and performance information that contains where someone did something and how it performed. Examples include: IP addresses, modality type, packet loss, jitter, MOS, etc.
The big problem back in the Lync Server 2010/2013 era was that while the CDR/QoE information was great to have, the Monitoring Reports that MSFT provided to query the data weren’t overly robust. The pre-built reports offered value but they were not customizable (meaning they were static in the data they queried), and creating new reports required an intimate knowledge of SSRS, T-SQL, and the CDR/QoE database schema. Most folks – myself included – don’t have that level of understanding so we simply used things as-is.
When Skype for Business Server 2015 landed, Microsoft offered a new solution called the ‘Call Quality Dashboard’. There are several good things about the solution but my top three would be:
- Reporting and analysis using the power and speed of Microsoft SQL Server Analysis Services – CQD utilizes Microsoft SQL Analysis Services to provide fast summary, filter, and pivoting capabilities to power the dashboard via an Analysis Cube. Reporting execution speed and the ability to drill down into the data can reduce analysis times dramatically.
- New data schema optimized for call quality reporting – The Cube has a schema designed for voice quality reporting and investigations. Portal users can focus on the reporting tasks instead of figuring out how the QoE Metrics database schema maps to the views they need. The combination of the QoE Archive and the Cube provides an abstraction that reduces the complexity of reporting and analysis via CQD. The QoE Archive database schema also contains tables that can be populated with deployment-specific data to enhance the overall value of the data.
- Built-in report designer and in-place report editing – The Portal component comes with several built-in reports modeled after the Call Quality Methodology. Portal users can modify the reports and create new reports via the Portal’s editing functionality.
It’s fast. It’s easier to use. It’s customizable. Win-win-win. Not so fast…
A significant remaining limitation was the lack of in-depth templates (and thus, guidance) for what you should be querying, but bigger than that was the complete lack of visibility into user accounts hosted within Skype for Business Online. Customers were left completely in the dark, unable to examine quality issues for users homed online. Microsoft heard the complaints though and eventually released the Call Quality Dashboard for Skype for Business Online, thus giving customers the same data analysis that is available in CQD on-premises. Even though CQD (in both scenarios, on-premises and online) contains some pre-built reports, customers were still left scratching their heads about what other pieces of information they should be examining. What other metrics could shed light on issues they’ve been having? Enter the CQD SOF template (v1)…
What’s Included?
The SOF CQD sample template is a multi-layered set of reports with a primary mission of examining audio quality. While audio is the primary reporting factor, that does not mean you cannot duplicate reports to search for video or application sharing data. If a customer wants to report on that data then they absolutely can, but start with audio analysis, resolve your issues, and you should see the remaining modalities start to fall in line.
The top-most report is a usage/trend report that shows you the total number of streams and the percentage of those streams classified as poor (or as outright failures):
If you click ‘Edit’ to examine the query, you see the data that is being pulled:
A few things to know and understand when looking at queries in the query editor:
- Dimensions – These are the items placed on the X-axis of the chart. They act as groupings that summarize the queried measurements into a clearer visualization of the data. Month/Year is very common, but you could report on others as well, such as Network Subnet or Building Name.
- Measurements – These are the pieces of data that make up the Y-axis of the chart. The available options include all the pieces of stream information imported from the QoE database: jitter, packet loss, round-trip time, percentages, etc. are all at your disposal.
- Filters – Filters can be used to isolate and return only specific sets of data from the larger CQD data sets. Filters impact what is returned for the ‘measurements’ and could be configured to be many things. Month/Year is common (to look at data from only a certain set of months) or you could configure a filter to look at only internal network segments, etc.
Each of the types above effectively correlates back to T-SQL, so if you can wrap your mind around T-SQL then you can work with the query editor more easily:
- Dimensions are like T-SQL GROUP BY clauses
- Measurements are like the aggregated columns in a T-SQL SELECT list
- Filters are like T-SQL WHERE clauses
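To make the analogy concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names are made up for illustration and are not the real CQD or QoE schema; the only point is to show which part of a query plays the role of a dimension, a measurement, and a filter.

```python
import sqlite3

# Hypothetical table and column names for illustration; this is not the CQD schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE streams (month TEXT, subnet TEXT, is_internal INTEGER, is_poor INTEGER)"
)
conn.executemany(
    "INSERT INTO streams VALUES (?, ?, ?, ?)",
    [
        ("2017-01", "10.0.1.0", 1, 0),
        ("2017-01", "10.0.1.0", 1, 1),
        ("2017-01", "10.0.2.0", 1, 0),
        ("2017-02", "10.0.1.0", 1, 1),
        ("2017-02", "192.0.2.0", 0, 1),  # external stream, removed by the filter
    ],
)

query = """
    SELECT month,                      -- dimension (also listed in GROUP BY)
           COUNT(*)     AS streams,    -- measurement
           SUM(is_poor) AS poor        -- measurement
    FROM streams
    WHERE is_internal = 1              -- filter
    GROUP BY month                     -- dimension
    ORDER BY month
"""
rows = conn.execute(query).fetchall()
for month, streams, poor in rows:
    print(month, streams, poor, f"{100 * poor / streams:.1f}% poor")
```

Swapping `month` for `subnet` in both the SELECT list and the GROUP BY is the equivalent of changing the dimension in the query editor.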
To dig deeper into these reports, you simply ‘follow the rabbit hole’ by clicking on the hyperlink of the report name.
Note: If a report name is clickable, that means there are sub-reports available, otherwise the report name will not be clickable.
One-Level Deeper
The first sub-report contains a bevy of data and includes reports that offer even more sub-reports. The second level top reports include:
Audio
This report (and its sub-reports) is where you will likely spend most of your time. We’ll dive a bit further into this report as we keep peeling back the layers of the CQD onion.
Media Reliability – Call Setup Failures
This report (and its sub-reports) is another useful report where you will likely spend some time. If you want to determine why clients aren’t able to connect to media or for supplementary information about why calls are poor, then you’ll find that additional data here. We’ll dive a bit further into this report as we keep peeling back the onion layers.
User Reported Experiences – Rate My Call
This report (and its sub-reports) is, IMHO, useless. Most people I know don’t fill out those prompts asking them to rate a call. Maybe your users do but I don’t find this report all that useful. YMMV.
Client Versions
This report is a useful way to track client versions. Guidance from MSFT is to remain no more than 4 months behind the current version of the client software and this report will help you identify folks using out-of-date versions.
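If you export the client version data, the ‘4 months behind’ rule is easy to check outside of CQD. The sketch below is only an illustration: the user-to-release-date mapping is hypothetical (CQD reports version strings, so you would need to map builds to release dates yourself), and 120 days is my rough stand-in for four months.

```python
from datetime import date

MAX_AGE_DAYS = 120  # rough stand-in for "4 months"; an assumption, adjust as needed


def outdated_clients(clients, current_release, max_age_days=MAX_AGE_DAYS):
    """Return users whose client build was released more than
    max_age_days before the current release date."""
    return sorted(
        user
        for user, released in clients.items()
        if (current_release - released).days > max_age_days
    )


# Hypothetical data: user -> release date of the client build they run.
clients = {
    "alice": date(2016, 12, 6),
    "bob": date(2016, 6, 7),
    "carol": date(2016, 9, 13),
}
print(outdated_clients(clients, current_release=date(2017, 1, 3)))
```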
Devices – Microphone
This report is a useful way to track microphones used by clients. Want to find people who are using internal microphones instead of certified devices? This report (and sub-reports) will tell you.
Devices – Speaker
This report is a useful way to track speakers used by clients. Want to find people who are using internal speakers instead of qualified devices? This report (and sub-reports) will tell you.
Audio Sub-Reports
You will spend most of your time here, as these reports identify the specific stream paths and metrics that point to the biggest problems in your network environment. The best and most useful reports here (IMHO) are as follows:
Client-to-Server Poor Audio Streams
Use this report to easily examine the number of streams considered poor that involve things like conferences or Cloud PBX calling. Dig further into the sub-reports to begin identifying which buildings and/or subnets are the most prone to issues…
Client<->Server Poor Audio Streams by Building
This report will give you the exact location (assuming you have filled out and imported your subnet locations – which you absolutely should!) of the building and network name involved in your poor streams. Dig further into the sub-reports to begin identifying what made the calls poor, including metrics and/or connection type…
Client<->Server Poor Audio Streams by Building, Subnet and Network Connection
This report will give you the reason ‘why’ a call was classified as poor and where those calls came from. Instead of simply identifying the call as ‘poor’, the table shows you the poor calls by classification – packet loss, degradation, round trip, concealed ratio. Another report at this level includes additional information that can potentially help you identify last-hop routing issues…
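For a feel of how a per-metric classification like this works, here is a small sketch. The threshold values are my assumption based on Microsoft’s published stream classification guidance as I remember it, so verify them against the current documentation before relying on them; the field names are hypothetical and not CQD column names.

```python
# Assumed thresholds; verify against Microsoft's current CQD documentation.
THRESHOLDS = {
    "packet_loss_rate": 0.1,      # fraction of packets lost
    "degradation_avg": 1.0,       # average network MOS degradation
    "round_trip_ms": 500,         # round-trip time in milliseconds
    "concealed_ratio_avg": 0.07,  # ratio of concealed samples
}


def poor_reasons(stream):
    """Return the metrics that exceed their threshold, sorted by name.

    An empty list means the stream would not be flagged by any of
    these four metrics. Missing metrics are skipped.
    """
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if stream.get(name) is not None and stream[name] > limit
    )


# Hypothetical stream: high loss and degradation, acceptable round trip.
stream = {"packet_loss_rate": 0.15, "round_trip_ms": 120, "degradation_avg": 1.4}
print(poor_reasons(stream))
```

A stream can trip more than one metric at once, which is why the same poor stream may be counted under several classification columns.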
Client<->Server Poor Audio Streams by Reflexive IP
This report will give you the reason of ‘why’ a call was classified as poor and where those calls are from, but also adds the reflexive IP address used in the stream. The Reflexive IP is the IP address as seen by Office365 (the NAT IP or STUN address) of the stream. Use this to help you determine if media streams are egressing from an unexpected network location or to identify if a particular network egress point is potentially saturated.
TCP Usage
This report (and sub-reports) will identify audio streams that use TCP for transport instead of UDP. These reports will effectively help you quantify and isolate firewall configurations that don’t allow the right protocol or right ports. Dive in further to determine network subnets that are the culprits…
TCP Breakdown by Building and Subnet
This report gives you the subnets involved in calls using TCP as transport. If TCP is used for transport, there is a possibility that either ports or IPs are misconfigured in your network firewalls. Another possibility is that client streams may be egressing via an HTTP proxy…
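If you export the stream data, ranking subnets by their share of TCP streams is a quick way to decide where to look first. This is a minimal sketch with hypothetical field names (`subnet`, `transport`), not the actual column names from a CQD export.

```python
from collections import defaultdict


def tcp_share_by_subnet(streams):
    """Return (subnet, fraction of streams using TCP) pairs, worst first."""
    totals = defaultdict(int)
    tcp = defaultdict(int)
    for s in streams:
        totals[s["subnet"]] += 1
        if s["transport"].upper() == "TCP":
            tcp[s["subnet"]] += 1
    shares = {subnet: tcp[subnet] / totals[subnet] for subnet in totals}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical exported rows.
streams = [
    {"subnet": "10.0.1.0", "transport": "UDP"},
    {"subnet": "10.0.1.0", "transport": "TCP"},
    {"subnet": "10.0.2.0", "transport": "TCP"},
    {"subnet": "10.0.2.0", "transport": "TCP"},
    {"subnet": "10.0.3.0", "transport": "UDP"},
]
for subnet, share in tcp_share_by_subnet(streams):
    print(f"{subnet}: {share:.0%} TCP")
```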
HTTP Proxy Usage
If you have streams egressing via a proxy, be ready for some significant issues. Avoid proxies if at all possible; you can do so by ensuring that traffic to the Skype4B Online IPs bypasses the proxy. Unfortunately this report doesn’t show what network sites these calls are coming from, but one could easily build a sub-report to do so.
Peer-to-Peer Sub-Reports
Without re-posting a bunch of pictures, this report (and its sub-reports) contains the same information as the Client-to-Server reports, but it is filtered to show calls between endpoints (P2P) within your network. These calls should never go out to the Internet (or ExpressRoute), so you can use them to isolate and identify problematic segments within your internal network.
Media Reliability Sub-Reports
Call Setup Failures by Building, Subnet and Reflexive IP
This report helps you identify subnets and/or external IPs that have firewall rules blocking traffic to the Skype4B Online IP ranges. It potentially also helps you identify firewall rules that may be configured for SSL/TLS inspection or DPI/IPS traffic manipulation. Use the ‘Call Setup Failure Reason’ column to help with that identification. Despite this being great, it doesn’t identify which IP addresses are in the communication failure…
Custom Report – Failures to Office365 Media Relays
You can create this custom report to identify exactly which Office365 IP addresses are in the failed communication path. Your firewall team claims that they have things right? Well…if so, this report will show nearly zero failures. If failures exist, then somewhere there is a firewall or router blocking communication to Office365 and you can show them this data to prove it.
Limitations to Note
While the data is great, you should note a few ‘gotchas’:
Limitation One
Since the client submits the QoE reports at the end of the call, the default report data may include information for elements outside of your corporate network. CDR/QoE reports include all parties for a call and/or conference, so they could include federated partners or anonymous guests. As a result, your reports and tables will include IP addresses that confuse and confound both you and your network team. You will almost certainly need to filter the queries to isolate your internal network using one of a few methods:
- Use the Second Tenant ID filter
- Use the Second Inside Corp filter
- Use the Inside Corp Pair filter
There seem to be multiple ways to filter the data, and unfortunately I get varying results from each of the filters above. You’ll likely need to play around with the queries and export the data to CSV for some manual analysis, but at the end of the day you can begin to identify network segments using one (or many) of the methods above.
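For the manual CSV analysis, one approach is to apply two of the filters to the same export and look at where they disagree. The sketch below uses an inline stand-in for an exported file; the column names, tenant value, and ‘Inside’/‘Outside’ labels are all hypothetical, so adjust them to whatever your export actually contains.

```python
import csv
import io

# Stand-in for an exported CSV; column names and values are hypothetical.
EXPORT = """\
Second Subnet,Second Tenant Id,Second Inside Corp,Poor Streams
10.0.1.0,contoso-tenant,Inside,12
10.0.2.0,contoso-tenant,Outside,3
198.51.100.0,partner-tenant,Outside,7
203.0.113.0,,Outside,2
"""

MY_TENANT = "contoso-tenant"  # placeholder for your own tenant identifier

rows = list(csv.DictReader(io.StringIO(EXPORT)))

# Subnets kept by each filtering method.
by_tenant = {r["Second Subnet"] for r in rows if r["Second Tenant Id"] == MY_TENANT}
by_inside_corp = {r["Second Subnet"] for r in rows if r["Second Inside Corp"] == "Inside"}

# Subnets where the two filters disagree are the ones worth a manual look.
disagree = sorted(by_tenant ^ by_inside_corp)
print("tenant filter:", sorted(by_tenant))
print("inside-corp filter:", sorted(by_inside_corp))
print("disagree:", disagree)
```

In this made-up data, a subnet that belongs to the tenant but is marked ‘Outside’ could simply be missing from the building data, which is one plausible reason the filters return different results.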
Limitation Two
CQD is all historical data. You cannot use CQD to pre-emptively identify quality issues, nor is CQD useful if you haven’t imported your building data. Take the time before deployment to fill out this data.
What’s Next?
The template is great. The insights are valuable. It’s not perfect, however. I’ve already built some tables and reports with content that I’d like to see, especially around actual metrics reports for streams and not just what CQD uses as classification for ‘poor calls’. Microsoft will undoubtedly continue building this template and I definitely look forward to what’s next. Kudos to MSFT for a solid foundation on this!