Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

As a result, site reliability has emerged as a critical success metric for many organizations. Uptime Institute’s 2022 Outage Analysis report found that over 60% of system outages resulted in at least $100,000 in total losses, up from 39% in 2019. More than one in seven outages cost more than $1 million.

Netflix at AWS re:Invent 2019

The Netflix TechBlog

In this session, we discuss the technologies used to run a global streaming company, growing at scale, billions of metrics, benefits of chaos in production, and how culture affects your velocity and uptime. Netflix runs dozens of stateful services on AWS under strict sub-millisecond tail-latency requirements, which brings unique challenges.

AWS 100

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

Reconstructing a streaming session was a tedious and time-consuming process that involved tracing all interactions (requests) between the Netflix app, our Content Delivery Network (CDN), and backend microservices. Using simple lookup indices in Cassandra gives us the ability to maintain acceptable read latencies while doing heavy writes.

Netflix at AWS re:Invent 2019

The Netflix TechBlog

In this session, we discuss the technologies used to run a global streaming company, growing at scale, billions of metrics, benefits of chaos in production, and how culture affects your velocity and uptime. In 2019, Netflix moved thousands of container hosts to bare metal.

AWS 37

Four tips to maximise your time at DevOps Enterprise Summit 2019, London

Tasktop

In one week’s time, thousands of IT and business professionals will descend on London for the latest iteration of DevOps Enterprise Summit London 2019 (June 25-27 – InterContinental O2, London, UK). Here are four tips to get the most out of DOES London 2019: Tip #1 – Develop a plan of attack. The countdown is on.

DevOps 74

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

Failure can occur due to a myriad of reasons: misbehaving clients that trigger a retry storm, an under-scaled service in the backend, a bad deployment, a network blip, or issues with the cloud provider. Those two metrics are approximate indicators of failures and latency. Let’s dig into how we accomplished this.

Traffic 252