Datadog Incident Management streamlines on-call response workflows for DevOps teams

Datadog announced the launch of Incident Management. This new product streamlines on-call response workflows for DevOps teams by unifying alerting data, documentation, and collaboration in a centralized pane of glass. Accessing this disparate information from a single location saves DevOps and security teams significant time while they troubleshoot issues and outages.

A performance or security incident can lead to degraded user experience, lost revenue, and damage to a company’s reputation. Every minute an incident continues increases the number of affected users and transactions.

Currently, on-call engineers and security teams who respond to incidents rely on multiple pieces of data and documentation that live in disconnected tools, all of which must be evaluated simultaneously.

Datadog Incident Management brings data, documentation, and collaboration together in a single location from which all engineers and security team members can work jointly. This greatly reduces the time spent repeatedly querying multiple systems for data, as well as the time needed to “onboard” a new team member coming in to help.

“Managing incidents across disparate tools for alerting, communicating, tracking, and investigating makes it difficult to both solve a problem while it’s happening and conduct a post-mortem to prevent it from happening again,” said Marc Weisman, Vice President, Product at Datadog.

“With these new Incident Management features now available alongside our powerful tools for alerting, monitoring, and collaboration, Datadog customers can manage and resolve incidents in a single, unified platform, saving time when it matters.”

“At Olo it’s critical to always have our finger on the pulse of our systems in order to keep restaurants up and running,” said Greg Shackles, Vice President, Technology at Olo.

“While Datadog is already an important part of that, the release of integrated Incident Management can further improve our team’s ability to respond quickly and effectively, in those times it matters most.”

“effx provides users the context and contributing factors of an incident in real-time,” said Joey Parsons, Founder & CEO at effx. “We’re excited to partner with Datadog for Incident Management to provide further insight into managing incidents that can potentially impact just one or thousands of microservices.”

Datadog Incident Management will help teams manage and respond to incidents by allowing users to collect issue signals from across Datadog data sources and declare incidents directly from anomalous graphs or alerts.

Incident responders will then be able to document a timeline of each incident and record incident follow-up tasks. The collaboration and data for each incident will be saved in an incident history for post-mortem review and analysis.

Additionally, Datadog is releasing the following capabilities to support the Incident Management workflow:

  • Mobile App: An Android and iOS application for interacting with Datadog monitors, dashboards, etc., on the go, now generally available.
  • ChatBot: A chat integration with Slack for managing incidents and accessing Datadog data in chat-workflows.
  • Collaborative Notebooks: Improvements to Datadog Notebooks for real-time collaboration, including presence detection and live updates without refresh.

According to 451 Research, “As the world moves essentially entirely away from physical commerce to digital due to the coronavirus outbreak, the pressure on digital properties to perform well escalates.

“Businesses require monitoring tools to alert them when problems are occurring and assist in quickly identifying the root cause of issues. They will rely on incident management and ticketing tools, as well as adjacent collaboration tools, that ensure they can resolve performance problems before they cause a loss of customers – and revenue.”
