Tracking Service-Oriented and Web-Oriented Architecture

SOA & WOA Magazine




Related Topics: SOA & WOA Magazine, IT Strategy, Web Analytics

Article

Predictive Analytics Product Strategy Echoed by Monitoring Expert

Schlossnagle examines interesting concepts on high metric volume performance monitoring, visualization, and anomaly detection

As Netuitive's director of product management, I was fortunate to attend the O'Reilly Velocity Conference this year in San Francisco, where one presentation in particular stood out.

The presentation, "Monitoring & Observability - Getting Off the Starting Blocks," was given by technology visionary Theo Schlossnagle. It examined several interesting concepts in high-metric-volume performance monitoring, visualization, anomaly detection, and alerting. In his summary (see slides 69 to 73), he lists several monitoring tenets (my editorial comments are in parentheses):

  • While monitoring all things seems like a good approach, alerting on things that do not have specific remediation requirements is horribly damaging (i.e., don't cry "wolf").
  • Use the same datasets for alerting and visualization (i.e., your visualization has to enhance your understanding of your alarms, not contradict it).
  • Never alert against a set of metrics without documentation that:
      o Explains the failure in plain English
      o Provides concise and repeatable remediation advice and an escalation path
      o Presents the business impact of the failure condition
    (i.e., Level 1-3 operations staff want actionable alerts, and executives want to know how the problem is affecting the business.)
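
To make the last tenet concrete, here is a minimal Python sketch of an alert registry that refuses any rule lacking the three kinds of documentation listed above. It is an illustration only and does not come from the presentation or from any product; the names AlertDoc and register_alert and all field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AlertDoc:
    """Documentation that must accompany an alert (hypothetical field names)."""
    explanation: str       # the failure in plain English
    remediation: str       # concise, repeatable remediation advice
    escalation_path: str   # who to involve if remediation does not work
    business_impact: str   # what the failure condition means for the business

    def missing_fields(self) -> list[str]:
        # A field counts as missing if it is empty or whitespace only.
        return [name for name, value in vars(self).items() if not value.strip()]


def register_alert(rules: dict[str, AlertDoc], metric_set: str, doc: AlertDoc) -> None:
    """Add an alert rule only if its documentation is complete."""
    missing = doc.missing_fields()
    if missing:
        raise ValueError(
            f"refusing to alert on {metric_set!r}: undocumented fields {missing}"
        )
    rules[metric_set] = doc
```

Metrics that fail this check can still be collected and visualized; under this sketch they simply never page anyone.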

This is not only good advice; it is a survival strategy for dealing with the monitoring data deluge faced by infrastructure and application support staff. Over the past 18 months, Netuitive Product Management took a hard, introspective look at what is and is not working in anomaly detection and alerting for our customers. The results of that analysis mirror much of what Mr. Schlossnagle says, and some of these concepts have been "baked in" to Netuitive's product strategy since the release of Netuitive 6.0 in June 2012. Specifically:

  • Netuitive has made it much easier to leverage a Knowledge Base to help operations staff understand impact and recommended remediation steps when an "anomaly" is detected.
  • By default, Netuitive only presents Level 1 operations staff with an alert if a set of anomalies can be matched to a Knowledge Base entry with remediation advice.
  • Netuitive's Integration Studio tools make it easy to incorporate virtually any business KPI into the statistical performance models for an application - so that users and executives can quickly assess the business impact of any potential failure condition.
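
The default behavior in the second bullet can be sketched in a few lines of Python. This is not Netuitive's implementation or API; KnowledgeBaseEntry, match_entries, level1_alerts, and the rule that an entry matches when all of its required anomalies are present are assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeBaseEntry:
    """A hypothetical knowledge base record tied to a pattern of anomalies."""
    title: str
    required_anomalies: frozenset[str]  # metric names that must all be anomalous
    remediation: str
    business_impact: str


def match_entries(
    anomalies: set[str], kb: list[KnowledgeBaseEntry]
) -> list[KnowledgeBaseEntry]:
    """Return entries that have remediation advice and whose pattern is fully present."""
    return [
        entry
        for entry in kb
        if entry.remediation and entry.required_anomalies <= anomalies
    ]


def level1_alerts(anomalies: set[str], kb: list[KnowledgeBaseEntry]) -> list[str]:
    """Build alert text only for anomaly sets that match a knowledge base entry."""
    return [
        f"{entry.title}: {entry.remediation} (impact: {entry.business_impact})"
        for entry in match_entries(anomalies, kb)
    ]
```

Anomalies that match no entry produce an empty list here, which corresponds to nothing being shown to Level 1 staff.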

All this came from a deep-rooted desire to make it easier for Netuitive's Global 2000 customers to operationalize the use of our analytics software.  The last thing we want our customers' executives to feel is that they've bought more expensive shelfware.

By the way, we weren't the only people who noticed the nuggets of wisdom in this presentation. Gartner's Jonah Kowall also blogged about it. Great minds...

More Stories By Marcus Jackson

Marcus is Director of Product Management at Netuitive. He is responsible for the direction of Netuitive's flagship product, including analytics and data visualization. He has over 20 years of experience in software engineering and performance management. Previously, he headed development for Netuitive and the IEEE Computer Society. Marcus holds a bachelor's degree in Computer Science from Harvard University.