Path to NoOps part 2: How infrastructure as code makes cloud automation attainable—and repeatable—at scale

NoOps may seem like a pipe dream. But our own experience at Dynatrace illustrates how embracing an infrastructure-as-code mindset can make NoOps far more attainable than you might think.

Infrastructure as code is a way to automate infrastructure provisioning and management. And it’s a crucial step toward achieving cloud automation on the path to NoOps.

In my previous blog post, Path to NoOps part 1: How modern AIOps brings NoOps within reach, I explored the aspirations of NoOps and how modern AIOps makes it possible. But how does it work in practice? Is it practical? And can other organizations realistically expect the same results? In this blog, I explore how Dynatrace has made cloud automation attainable—and repeatable—at scale by embracing the principles of infrastructure as code.

NoOps through modern AIOps: The Dynatrace story

As a torchbearer of modern AIOps, the Dynatrace AI engine, Davis®, provides a purpose-built AI platform for today’s web-scale modern cloud. Davis analyzes hundreds of billions of dependencies and auto-discovers billions of dynamic topology changes per second.

Mean time to repair (MTTR) includes the time teams need to detect, identify, fix, and verify issues. Without automation, the time these stages take can vary widely.

[Figure: Mean time to repair can span a long time without cloud automation]

With its AI platform approach, however, Dynatrace eliminates mean time to detect (MTTD) and mean time to identify (MTTI) an issue’s root cause. Using AI, Dynatrace instantly and automatically identifies problems. Dynatrace can then automatically raise a service ticket and directly update the root cause in the service ticket. This automatic response eliminates time-consuming triage and reduces mean time to repair (MTTR) to just a few minutes, including fixing and verifying the issue.
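
As a rough illustration of that arithmetic, the sketch below treats MTTR as the sum of its four stages and compares a manual timeline with one where detection and identification are automated. The `RepairTimeline` class and every duration in it are hypothetical values chosen for the example, not Dynatrace measurements.

```python
from dataclasses import dataclass


@dataclass
class RepairTimeline:
    """Minutes spent in each MTTR stage for one incident (illustrative only)."""
    detect: float
    identify: float
    fix: float
    verify: float

    def total(self) -> float:
        # MTTR for this incident is the sum of all four stages.
        return self.detect + self.identify + self.fix + self.verify


# Hypothetical durations for a manually handled incident.
manual = RepairTimeline(detect=30, identify=90, fix=20, verify=10)

# Assumption: automated detection and root-cause identification
# drive the first two stages to zero; fix and verify are unchanged.
automated = RepairTimeline(detect=0, identify=0,
                           fix=manual.fix, verify=manual.verify)

print(manual.total())     # 150
print(automated.total())  # 30
```

In this made-up case, only the fix and verify stages remain, which is why the remaining repair time shrinks to the minutes spent on those two steps.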

[Figure: Cloud automation enables partial automation of MTTR processes]

With this kind of reliable and automatic intelligence, teams can even automate fixing and verifying, thus attaining NoOps.

[Figure: Cloud automation makes it possible to fully automate MTTR for NoOps]

Dynatrace itself has implemented a NoOps model for our own IT operations. With a skeleton staff of 7 on a 24×7 schedule, we increased the number of releases from 2 to 26 per year and reduced production bugs by 93% since 2014.

Dynatrace achieved its own IT transformation using infrastructure-as-code principles and Cloud Automation.

Hear the story of how Dynatrace achieved “NoOps” directly from our Chief Technology Officer, Bernd Greifeneder, in “From 0 to NoOps in 80 days.”

But to fully integrate and automate our cloud-native continuous delivery and operations, we needed a control plane to automatically orchestrate most of the operations tasks. So we built one: the Dynatrace Cloud Automation control plane.

What is Dynatrace Cloud Automation?

Dynatrace Cloud Automation is an enterprise-grade control plane that extends the intelligent observability, automation, and orchestration capabilities of the Dynatrace platform to DevOps pipelines. The goal of Cloud Automation is for development teams to build better software faster, and for operations teams to automate mundane, repetitive tasks so they can focus on innovation.

This AI-driven control plane further abstracts away the complexity of underlying webhook-based integrations, enabling IT to assemble end-to-end processes that manage related interdependencies regardless of the underlying technology. As a result, IT teams can automate and manage processes across on-premises and cloud-based systems, or between multiple cloud services to prevent vendor lock-in.

Cloud Automation use cases

Let’s look at some scenarios where teams can use Cloud Automation in their NoOps journey.

Closed-loop remediation

Remediating an issue consists of the following stages:

  1. Identify the fix
  2. Apply the fix
  3. Validate the fix
  4. Update the service ticket
  5. Notify the relevant people

Individually, the tasks may take only a few minutes, but they add considerable overhead for the support staff. Many organizations execute scripts or runbooks manually to remediate trivial issues. These teams often hesitate to automate these runbooks because the triggering condition could result in a false positive. In other words, the condition could match even when the issue has disappeared or was not even there in the first place. As a result, teams must verify their data first because the triggering condition could be based on discovered data that is stale.

However, by advancing AIOps, Dynatrace considers dynamic configuration item (CI) relationships and dependencies instantly and automatically. As a result, false positives are far less likely. Using Davis, Cloud Automation can trigger the right fix for an issue, validate the fix by running a synthetic test, update the service ticket, and notify stakeholders using communication channels, all in an automated way.
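
The five remediation stages listed earlier can be sketched as a single function that runs them in order. This is a minimal illustration, not the Cloud Automation API: `Ticket`, `remediate`, the fix catalog, and the callback parameters are all hypothetical names, and the callbacks stand in for whatever deployment, synthetic-test, ticketing, and messaging tools a team actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ticket:
    """Hypothetical service ticket carrying the detected root cause."""
    problem_id: str
    root_cause: str
    status: str = "open"
    notes: List[str] = field(default_factory=list)


def remediate(
    ticket: Ticket,
    fixes: Dict[str, str],             # root cause -> name of a known fix
    apply_fix: Callable[[str], None],  # runs the named fix
    validate: Callable[[], bool],      # e.g. a synthetic test
    notify: Callable[[str], None],     # e.g. a chat or email hook
) -> bool:
    # 1. Identify the fix from the root cause on the ticket.
    fix = fixes.get(ticket.root_cause)
    if fix is None:
        ticket.status = "escalated"
        ticket.notes.append("no automated fix known")
        notify(f"{ticket.problem_id}: {ticket.status}")
        return False

    # 2. Apply the fix.
    apply_fix(fix)
    ticket.notes.append(f"applied fix: {fix}")

    # 3. Validate the fix.
    healthy = validate()

    # 4. Update the service ticket with the outcome.
    ticket.status = "resolved" if healthy else "escalated"
    ticket.notes.append("validation passed" if healthy else "validation failed")

    # 5. Notify the relevant people.
    notify(f"{ticket.problem_id}: {ticket.status}")
    return healthy


# Usage with in-memory stand-ins for the real integrations.
applied: List[str] = []
messages: List[str] = []
ticket = Ticket(problem_id="P-1", root_cause="disk_full")
ok = remediate(ticket, {"disk_full": "rotate_logs"},
               applied.append, lambda: True, messages.append)
```

Passing the integrations in as callables keeps the loop itself independent of any particular tool, and a failed validation or an unknown root cause falls back to escalation rather than silently closing the ticket.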

[Figure: Davis AI automatically detects problems in incident response using cloud automation and infrastructure as code]

The benefits of this closed-loop remediation include the following:

  • Freeing up your support staff from routine tasks
  • Lower MTTR
  • More controlled, consistent, and sustainable problem resolution
  • Transparency and scalability

Infrastructure as code

Infrastructure as code uses a declarative language to achieve the desired state as opposed to scripts that define a set of steps to execute. The purpose of infrastructure as code is to enable developers or operations teams to automatically manage, monitor, and provision resources, rather than manually configure discrete hardware devices and operating systems. Infrastructure as code is sometimes referred to as programmable or software-defined infrastructure. With the proliferation of infrastructure-as-code tools, operations teams can:

  • Deploy, configure, or tear down workloads on an instance in real time
  • Ramp resources up or down in real time based on workload requirements
  • Block or provision access to resources in real time based on threats or requests
  • Proactively manage web and mobile applications based on user experience or traffic
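
To make the declarative idea concrete, here is a minimal sketch of the comparison a declarative tool performs: the operator states the desired resources, and the tool works out which actions close the gap with what is running. The `plan` function and the resource names are hypothetical and do not represent any specific tool's behavior.

```python
from typing import Dict, List, Tuple

Spec = Dict[str, int]


def plan(desired: Dict[str, Spec], actual: Dict[str, Spec]) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to reach the desired state."""
    actions: List[Tuple[str, str]] = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))   # declared but missing
        elif actual[name] != spec:
            actions.append(("update", name))   # present but drifted
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))   # running but no longer declared
    return actions


# Hypothetical desired and observed states.
desired = {"web": {"replicas": 3}, "cache": {"replicas": 1}}
actual = {"web": {"replicas": 2}, "batch": {"replicas": 1}}

print(plan(desired, actual))
# [('update', 'web'), ('create', 'cache'), ('delete', 'batch')]
print(plan(desired, desired))
# []
```

Running the plan against a state that already matches produces no actions, which is the property that makes a declarative definition safe to apply repeatedly, unlike a script that executes the same steps every time.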

Embracing the concept of infrastructure as code has also helped to ease our own DevSecOps journey. Like our customers, we use Dynatrace to monitor thousands of environments and supporting applications, and we needed a way to streamline the configuration process. In response, Dynatrace introduced Monaco (Monitoring-as-code). Using Monaco, organizations can offer true application monitoring as a self-service to their users and applications. Teams can now onboard hosts to the Dynatrace platform in a few minutes instead of hours, reducing human dependencies.

Through its webhook-based integrations, Cloud Automation can further push configurations and deployments to a wide variety of tools using their infrastructure-as-code capabilities.

SecOps

Dynatrace provides real-time, continuous surveillance of day 1 and day 2 vulnerabilities with full runtime context, with no additional agents to install, no static scans, and no manual analysis. The Dynatrace AI engine, Davis, adds intelligence and context to detected events and helps decide the remediation workflow automatically. These actions could include resetting a password, disabling a VM, blocking an IP address, or even patching a vulnerable application.
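
One simple way to picture how a detected event is routed to a remediation action is a lookup table from event type to action. The sketch below is only an illustration under stated assumptions: the event type names, the `PLAYBOOK` table, and the action functions are hypothetical, and each action merely returns a description instead of calling a real system.

```python
from typing import Callable, Dict


# Placeholder actions: each returns a description rather than touching a real system.
def reset_password(target: str) -> str:
    return f"reset password for {target}"


def disable_vm(target: str) -> str:
    return f"disabled VM {target}"


def block_ip(target: str) -> str:
    return f"blocked IP address {target}"


def patch_application(target: str) -> str:
    return f"patched application {target}"


# Hypothetical event types mapped to the remediation actions named in the text.
PLAYBOOK: Dict[str, Callable[[str], str]] = {
    "credential_leak": reset_password,
    "compromised_vm": disable_vm,
    "malicious_ip": block_ip,
    "vulnerable_library": patch_application,
}


def choose_remediation(event_type: str, target: str) -> str:
    """Pick and run the action for an event, or escalate if none is defined."""
    action = PLAYBOOK.get(event_type)
    if action is None:
        return f"no automated action for {event_type}; escalate to a human"
    return action(target)


print(choose_remediation("malicious_ip", "203.0.113.7"))
# blocked IP address 203.0.113.7
print(choose_remediation("unknown_event", "host-1"))
# no automated action for unknown_event; escalate to a human
```

Keeping the mapping in data rather than in branching logic makes it easy to review which events are handled automatically and which still require a person.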

Using Dynatrace Cloud Automation, teams can easily mitigate data breaches and security violations.

Infrastructure as code and cloud automation pave the way to NoOps

While NoOps may well be a never-ending journey, it is not just a passing cloud. It’s achievable by continuously building autonomic applications that require no manual intervention when an issue arises. With constant innovations based on our own path to NoOps, Dynatrace helps organizations on their own NoOps journeys. While NoOps may not eliminate the need for Ops staff altogether, organizations can still benefit by greatly reducing their investment in operations and focusing on customer satisfaction and application performance.

To learn more about how Dynatrace modern AIOps makes NoOps possible, join me on December 14, 2022, for our live event, the DevOps and SRE Virtual Workshop: Realize NoOps using Dynatrace Cloud Automation.