Why applying chaos engineering to data-intensive applications matters

Dynatrace

Stream processing enables software engineers to model their applications’ business logic as high-level representations in a directed acyclic graph without explicitly defining a physical execution plan. We designed experimental scenarios inspired by chaos engineering. This significantly increases event latency.

Site reliability engineering: 5 things you need to know

Dynatrace

What is site reliability engineering? Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. SRE focuses on automation.

Trending Sources

Site reliability engineering: 5 things you need to know

Dynatrace

Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. Organizations can then integrate these skilled engineers at key points in the DevOps life cycle.

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

Site reliability engineering (SRE) has become a critical discipline in recent years as the world has shifted in favor of web-based interactions. This shift is leading more organizations to hire site reliability engineers to guarantee the reliability and resiliency of their services. Mobile retail e-commerce spending in the U.

The Speed of Time

Brendan Gregg

A Cassandra database cluster had switched to Ubuntu and noticed write latency increased by over 30%. Measuring the speed of time: Is there already a microbenchmark for os::javaTimeMillis()? I happened to be speaking at a technical conference while still debugging this, and mentioned what I was working on to a processor engineer.

Speed Up Presto at Uber with Alluxio Local Cache

Uber Engineering

Uber’s interactive analytics team shares how they integrated Alluxio’s data caching into Presto, the SQL query engine powering thousands of daily active users on petabyte scale at Uber, to dramatically reduce data scan latencies through leveraging Presto on local disks.

Automated Change Impact Analysis with Site Reliability Guardian

Dynatrace

This is where Site Reliability Engineering (SRE) practices are applied. SREs use Service-Level Indicators (SLIs) to see the complete picture of service availability, latency, performance, and capacity across various systems, especially revenue-critical systems.
