Staying Ahead Of Microsoft Office 365 Outages

by December 29, 2017

When an enterprise moves its mission-critical office applications to Microsoft Office 365, it takes a leap of faith that the Microsoft data centers hosting the servers, and the Internet connection to them, will remain available and perform as expected.

The most frustrating issue for a SaaS application customer is being at the mercy of the service provider to learn of any outage. The service provider may not know of an outage affecting your tenancy, or may not post notifications on its service portal promptly or frequently enough, and as an IT admin you are left helpless, fielding calls from frustrated users.

Outages happen, and Microsoft Office 365 is no exception. What an IT admin wants is to be told when one occurs. That gives the admin a chance to get in front of an outage instead of being run over by it when tickets from angry users start arriving.

A varied set of stakeholders is interested in the O365 status:

  1. Ops engineer in IT who is responsible for day-to-day operations
  2. Reseller who wants to know the health of the tenancies they sold
  3. Helpdesk professional who wants to keep tabs on general service issues
  4. Support engineer who is working with a given Office 365 user on their reliability issues
  5. Consulting professional working on an Office 365 solution
  6. Global Administrator at a company who is typically the first to help colleagues with their tech problems
  7. All-around admin at a small or medium enterprise who takes care of everything from sales to IT

As with any other SaaS application, maintenance and management of Office 365 are the domain of the service provider.

For the Office 365 services a tenant subscribes to, such as Exchange Online, SharePoint Online and Yammer, stakeholders want to know not only the service status for the tenant but also incident details such as the total, resolved and pending counts. With these metrics, an IT admin can resolve tickets confidently and quickly.

Figure 1 – Current Service Status
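
As a rough illustration of the incident metrics described above, the following Python sketch counts total, resolved and pending incidents per service. The record layout and field names (`service`, `status`) are assumptions made for this example, not the format of any Microsoft or CA UIM payload.

```python
# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"service": "Exchange Online", "status": "resolved"},
    {"service": "Exchange Online", "status": "pending"},
    {"service": "SharePoint Online", "status": "resolved"},
    {"service": "Yammer", "status": "pending"},
]


def summarize_incidents(records):
    """Return {service: {"total": n, "resolved": n, "pending": n}}."""
    summary = {}
    for record in records:
        counts = summary.setdefault(
            record["service"], {"total": 0, "resolved": 0, "pending": 0}
        )
        counts["total"] += 1
        # Only the two states named in the text are tallied separately.
        if record["status"] in ("resolved", "pending"):
            counts[record["status"]] += 1
    return summary


summary = summarize_incidents(incidents)
for service, counts in sorted(summary.items()):
    print(service, counts)
```

A helpdesk view could read such a summary directly when answering a ticket, for instance to confirm that a pending Exchange Online incident already exists.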

 

In addition, for a SaaS application like Office 365, a key parameter to monitor is the outage duration for each service. Using the value of this metric accumulated over a period such as a day or a calendar month, an enterprise can claim a credit from its service provider or reseller under the SLA contract.

Figure 2 – Trend Of Service Incident Count
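
To make the accumulated outage duration concrete, here is a minimal Python sketch that sums downtime per service for one calendar month and converts it to an availability percentage. The outage windows are made-up sample data, and the function names are hypothetical; the threshold that triggers a credit would come from the actual SLA contract and is not assumed here.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical outage windows as (service, start, end); values are made up.
outages = [
    ("Exchange Online", datetime(2017, 12, 4, 9, 0), datetime(2017, 12, 4, 10, 30)),
    ("Exchange Online", datetime(2017, 12, 18, 14, 0), datetime(2017, 12, 18, 14, 45)),
    ("SharePoint Online", datetime(2017, 12, 11, 8, 0), datetime(2017, 12, 11, 8, 20)),
]


def monthly_downtime(windows, year, month):
    """Sum outage duration per service for outages starting in the given month.

    Simplification: an outage that crosses a month boundary is counted
    entirely in the month in which it started.
    """
    totals = defaultdict(timedelta)
    for service, start, end in windows:
        if start.year == year and start.month == month:
            totals[service] += end - start
    return dict(totals)


def availability_percent(downtime, period):
    """Availability over the period, as a percentage."""
    return 100.0 * (1 - downtime / period)


downtime = monthly_downtime(outages, 2017, 12)
december = timedelta(days=31)
for service, lost in sorted(downtime.items()):
    print(service, lost, round(availability_percent(lost, december), 4))
```

The resulting percentage can then be compared against whatever availability commitment the SLA contract states.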

 

Because an enterprise obtains user licenses for an expected number of end users, information about the license utilization trend can help the IT administrator optimize and plan usage costs. For planning purposes, not only the utilization metric but also the number of assigned and expired licenses, along with the valid license count for each SKU type procured, helps in managing procurement and usage.

Figure 3 – License Utilization Trend For Each Subscribed SKU
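
One plausible way to derive the utilization metric is assigned licenses divided by valid licenses for each SKU. The Python sketch below assumes that definition; the SKU names and counts are placeholders, not real Office 365 SKUs or data.

```python
# Hypothetical per-SKU license counts; names and numbers are placeholders.
skus = [
    {"sku": "SKU_A", "valid": 200, "assigned": 150, "expired": 10},
    {"sku": "SKU_B", "valid": 50, "assigned": 50, "expired": 0},
]


def utilization_percent(assigned, valid):
    """Share of valid licenses that are assigned, as a percentage."""
    if valid == 0:
        return 0.0  # avoid division by zero for SKUs with no valid licenses
    return 100.0 * assigned / valid


report = {
    item["sku"]: {
        "utilization": utilization_percent(item["assigned"], item["valid"]),
        "unassigned": item["valid"] - item["assigned"],
        "expired": item["expired"],
    }
    for item in skus
}

for sku, row in report.items():
    print(sku, row)
```

Sampling such a report daily gives the trend: a SKU that stays well below full utilization is a candidate for a smaller renewal, while one at full utilization needs more licenses before new users arrive.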

Microsoft broadcasts notifications about active alerts and planned incidents to every tenant. Unified IT infrastructure monitoring tools like CA UIM can capture those notifications and raise them internally as IT alarms for the configured tenant only, thereby filtering out the flood of notifications from Microsoft.

Figure 4 – Notification to CA UIM Alarm
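
The filtering idea can be sketched in a few lines of Python: keep only notifications for the configured tenant and turn each into an alarm. This is an illustration of the concept, not how CA UIM implements it; the `Alarm` class, the notification fields and the severity mapping are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    tenant_id: str
    service: str
    severity: str
    message: str


# Hypothetical mapping from notification kind to alarm severity.
SEVERITY_BY_KIND = {"active_alert": "major", "planned_incident": "information"}


def notifications_to_alarms(notifications, configured_tenant):
    """Create alarms only for notifications that target the configured tenant."""
    alarms = []
    for note in notifications:
        if note["tenant_id"] != configured_tenant:
            continue  # drop notifications for tenants that are not monitored
        alarms.append(
            Alarm(
                tenant_id=note["tenant_id"],
                service=note["service"],
                severity=SEVERITY_BY_KIND.get(note["kind"], "information"),
                message=note["title"],
            )
        )
    return alarms


# Made-up sample notifications for two tenants.
notifications = [
    {"tenant_id": "contoso", "service": "Exchange Online",
     "kind": "active_alert", "title": "Mail delivery delays"},
    {"tenant_id": "fabrikam", "service": "Yammer",
     "kind": "active_alert", "title": "Feed not loading"},
    {"tenant_id": "contoso", "service": "SharePoint Online",
     "kind": "planned_incident", "title": "Scheduled maintenance"},
]

alarms = notifications_to_alarms(notifications, "contoso")
for alarm in alarms:
    print(alarm)
```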

The CA UIM alarm is updated as Microsoft changes the state of the notification. For automated IT service management, if an enterprise uses an ITSM tool such as CA Service Desk, ServiceNow or any other for ticketing, CA UIM can integrate with that existing ITSM tool so that alarms are mapped to tickets.
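
The alarm-to-ticket mapping can be pictured as a lookup keyed by alarm: the first notification opens a ticket, and later state changes update the same ticket rather than opening new ones. The Python sketch below is a conceptual model under that assumption; `TicketSync`, the ticket ID format and the state names are invented for illustration and do not reflect the CA UIM or any ITSM tool's API.

```python
class TicketSync:
    """Keeps one ticket per alarm and updates it as the alarm state changes."""

    def __init__(self):
        self.tickets = {}  # alarm_id -> ticket dict
        self._next_number = 1

    def on_alarm(self, alarm_id, state, summary):
        ticket = self.tickets.get(alarm_id)
        if ticket is None:
            # First time this alarm is seen: open a new ticket.
            ticket = {"ticket_id": f"TCK-{self._next_number}"}
            self._next_number += 1
            self.tickets[alarm_id] = ticket
        # Every update refreshes the summary and the open/closed status.
        ticket["summary"] = summary
        ticket["status"] = "closed" if state == "resolved" else "open"
        return ticket


sync = TicketSync()
first = sync.on_alarm("alarm-1", "investigating", "Mail delivery delays")
first_id, first_status = first["ticket_id"], first["status"]
second = sync.on_alarm("alarm-1", "resolved", "Mail delivery restored")
print(first_id, first_status, "->", second["ticket_id"], second["status"])
```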

If the goal is to stay in front of cloud outages, tools like CA UIM can help with SaaS, PaaS or even IaaS monitoring. In CA UIM, monitoring is done by lightweight agents called probes. For Office 365, CA UIM has a dedicated probe, one among 140+ other monitoring probes, that gets its non-synthetic monitoring data from the published Microsoft management API over REST for each tenant registered in Microsoft Azure Active Directory.
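
The per-tenant collection pattern can be sketched as a loop that calls a REST fetcher once for each registered tenant and parses the JSON it returns. The Python example below is only a sketch of that pattern: the fetcher is injected as a function so no real endpoint is assumed, and `poll_tenants`, `fake_fetch` and the payload shape are hypothetical stand-ins, not the probe's actual code or Microsoft's response format.

```python
import json


def poll_tenants(tenant_ids, fetch_status):
    """Call fetch_status(tenant_id) for each tenant and parse the JSON reply.

    fetch_status is expected to return a JSON string; in a real deployment
    it would wrap an authenticated REST call to the management API.
    """
    results = {}
    for tenant_id in tenant_ids:
        try:
            results[tenant_id] = json.loads(fetch_status(tenant_id))
        except (OSError, ValueError) as exc:
            # Network failures and malformed JSON are recorded per tenant
            # so one failing tenant does not stop the others.
            results[tenant_id] = {"error": str(exc)}
    return results


def fake_fetch(tenant_id):
    """Stand-in for the REST call; returns a made-up payload."""
    if tenant_id == "broken":
        return "not json"
    return json.dumps(
        {"tenant": tenant_id,
         "services": [{"name": "Exchange Online", "status": "healthy"}]}
    )


status = poll_tenants(["contoso", "broken"], fake_fetch)
for tenant, payload in status.items():
    print(tenant, payload)
```

Injecting the fetcher keeps the polling logic testable without network access and leaves authentication details to the code that wraps the real API.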