Build automated self-healing systems with xMatters and Dynatrace (Part 1 of 3)

Dynatrace and xMatters have teamed up to help organizations meet the challenges of their increasingly complex cloud environments. Our out-of-the-box xMatters integration automates and closes the feedback loop between Dev and Ops, allowing for automatic push notifications from Dynatrace to xMatters environments. This enables the timely routing of critical information to the responsible team members.

Flow Designer for more consistency in the delivery cycle

At this year’s Google Cloud Next conference, xMatters introduced Flow Designer, a visual designer that enables users to resolve issues without writing a single line of code. How is this done? The toolkit comes with built-in steps for the most requested applications, which flattens the learning curve for integrated toolchains. You can create toolchains that automate remediation by simply dragging and dropping the applications you need. Flow Designer then connects the tools for you. Whether you’re rolling back a release or applying a hotfix, Flow Designer increases speed and creates consistency in the delivery cycle.

xMatters Flow Designer allows you to create toolchains by simply dragging and dropping the applications you need

In this three-part blog series, we’ll walk through three common problem scenarios that you can solve by building an automated self-healing system with Dynatrace and xMatters Flow Designer:

  • Process crash
  • Full disk
  • Slow microservices

As a first use case, let’s explore how your DevOps teams can prevent a process crash from taking down services across an organization—in five easy steps.

Use case #1: Prevent a process crash from taking down services across an organization

  • Step 1 — Dynatrace identifies the root cause of the problem

Dynatrace is built to understand how dependencies across an environment impact one another. Should an issue arise, Dynatrace AI-driven full-stack monitoring automatically analyzes all dependencies to pinpoint the root cause and provide context around other impacted services. Once it identifies the root cause, it’s time to fix the problem.

  • Step 2 — xMatters passes Dynatrace data into alerts with actionable responses

xMatters passes Dynatrace data to alerts, giving you full incident context to inform your remediation path. Depending on the type of Dynatrace issue, xMatters prompts on-call resources with response option buttons that launch workflows across your systems to start the automated self-healing process—and to keep stakeholders and customers updated.
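As a rough illustration of this step, the sketch below turns a Dynatrace problem notification into an alert that carries response options chosen by issue type. The input keys (`ProblemID`, `ProblemTitle`, `State`, `ImpactedEntity`) are assumed to follow the placeholders used in Dynatrace custom webhook payloads. The alert structure, the `RESPONSE_OPTIONS` table, and the `build_alert` function are hypothetical and are not part of the xMatters API.

```python
from typing import Any

# Hypothetical mapping from issue type to the response buttons shown to on-call staff.
RESPONSE_OPTIONS: dict[str, list[str]] = {
    "PROCESS_CRASH": ["Restart process", "Roll back release", "Escalate"],
    "DISK_FULL": ["Clean up disk", "Escalate"],
}
DEFAULT_OPTIONS = ["Acknowledge", "Escalate"]


def build_alert(problem: dict[str, Any], issue_type: str) -> dict[str, Any]:
    """Turn a problem notification into an alert with response options."""
    return {
        "subject": f"[{problem['State']}] {problem['ProblemTitle']}",
        "problem_id": problem["ProblemID"],
        # The impacted entity is optional in this sketch.
        "impacted_entity": problem.get("ImpactedEntity", "unknown"),
        # Unknown issue types fall back to generic options.
        "response_options": RESPONSE_OPTIONS.get(issue_type, DEFAULT_OPTIONS),
    }
```

Keeping the option table separate from the alert builder means a new issue type only needs a new table entry, not a change to the alerting logic.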

  • Step 3 — xMatters creates and updates Jira issues with incident information from Dynatrace

Teams rely heavily on tickets during postmortems to identify repeatable processes that can be used to prevent similar incidents in the future. To keep these tickets consistent and up to date, bi-directional integration with ticketing systems is paramount. With xMatters, Jira, and Zendesk, these steps become part of an automated toolchain process that allows xMatters to create tickets (including complete Dynatrace incident data), assign and update them, and append incident resolution information from your other systems (for example, Slack or Ansible).
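To make the ticket-creation step more concrete, here is a minimal sketch of the request body such a toolchain might send to Jira. The `fields` layout (`project`, `summary`, `description`, `issuetype`) follows the Jira REST API v2 create-issue request. The function name `jira_issue_payload`, the problem keys, and the `Bug` issue type are assumptions for illustration; this is not how xMatters builds its tickets internally.

```python
from typing import Any


def jira_issue_payload(problem: dict[str, Any], project_key: str) -> dict[str, Any]:
    """Build a create-issue request body from problem data (illustrative only)."""
    description = (
        f"Dynatrace problem: {problem['ProblemID']}\n"
        f"State: {problem['State']}\n"
        f"Impacted entity: {problem.get('ImpactedEntity', 'unknown')}"
    )
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": problem["ProblemTitle"],
            "description": description,
            # Assumed issue type; a real project may use a different one.
            "issuetype": {"name": "Bug"},
        }
    }
```

The function only builds the payload and makes no network call, so it can be checked in isolation before it is wired into a real integration.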

  • Step 4 — xMatters creates a dedicated Slack channel: users can leverage Slackbot to find and invite the right teams to join the channel.

During an incident, DevOps on-call resources typically rely on a chat platform, like Slack. xMatters Flow Designer can automatically spin up a dedicated Slack channel populated with your critical Dynatrace incident data. Slackbot then references your on-call schedule and groups to invite your teammates to join so you can discuss and execute remediation actions in other tools (for example, Ansible) without ever leaving Slack. Once an incident is resolved, your Slack channel transcript is automatically attached to the related Jira issue, ready for your postmortem review.

  • Step 5 — One-click rollbacks with the xMatters mobile app

When it’s time to push your fix, configuration management tools such as Ansible allow you to quickly control and execute jobs in your impacted systems. Dynatrace ensures that you know which services are affected so you can launch the proper workflow automation from xMatters to Ansible (while simultaneously updating Jira, Dynatrace, and your other integrated systems).
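The idea of launching the proper workflow for the affected services can be sketched as a small dispatch table. Everything here is hypothetical: the job names in `REMEDIATION_JOBS`, the `plan_remediation` function, and the shape of the returned plan are illustrative assumptions, and `extra_vars` only borrows the Ansible term for variables passed to a job.

```python
from typing import Any

# Hypothetical mapping from issue type to a remediation job name.
REMEDIATION_JOBS: dict[str, str] = {
    "PROCESS_CRASH": "rollback-release",
    "DISK_FULL": "cleanup-disk",
}


def plan_remediation(issue_type: str, affected_services: list[str]) -> dict[str, Any]:
    """Pick a remediation job and the variables to pass to it."""
    job = REMEDIATION_JOBS.get(issue_type)
    if job is None:
        # No automated fix is known for this issue type, so hand off to a person.
        return {"action": "escalate", "services": affected_services}
    return {
        "action": "launch_job",
        "job": job,
        "extra_vars": {"target_services": affected_services},
    }
```

Falling back to escalation for unknown issue types keeps the automation from running a job that was never designed for the problem at hand.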

Depending on the type of the issue, xMatters launches workflows across your systems to start the automated self-healing process.

Wrap up

A great customer experience requires more than developing a useful new feature; features must also be available 24/7 without interruption. By integrating your systems and putting an automated self-healing toolchain in place for restoring services, you can drastically reduce incident resolution time and limit the impact of issues on customers.

For more details on these five steps, see Automated self-healing: Crash remediation with xMatters and Dynatrace on the xMatters blog. For more information on Dynatrace and xMatters, please visit our technology partner page.