article thumbnail

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure By Manuel Correa , Arthur Gonigberg , and Daniel West Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. CRITICAL : This traffic affects the ability to play.

Traffic 252
article thumbnail

Netflix at AWS re:Invent 2019

The Netflix TechBlog

Netflix shares how Amazon EC2 Auto Scaling allows its infrastructure to automatically adapt to changing traffic patterns in order to keep its audience entertained and its costs on target. In this talk, we share how Netflix deploys systems to meet its demands, Ceph’s design for high availability, and results from our benchmarking.

AWS 100
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Netflix at AWS re:Invent 2019

The Netflix TechBlog

Netflix shares how Amazon EC2 Auto Scaling allows its infrastructure to automatically adapt to changing traffic patterns in order to keep its audience entertained and its costs on target. In this talk, we share how Netflix deploys systems to meet its demands, Ceph’s design for high availability, and results from our benchmarking.

AWS 100
article thumbnail

So many bad takes?—?What is there to learn from the Prime Video microservices to monolith story

Adrian Cockcroft

Then they tried to scale it to cope with high traffic and discovered that some of the state transitions in their step functions were too frequent, and they had some overly chatty calls between AWS lambda functions and S3. They state in the blog that this was quick to build, which is the point.

article thumbnail

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

Uptime Institute’s 2022 Outage Analysis report found that over 60% of system outages resulted in at least $100,000 in total losses, up from 39% in 2019. At the lowest level, SLIs provide a view of service availability, latency, performance, and capacity across systems. Make SLOs realistic.

article thumbnail

Stuff The Internet Says On Scalability For March 8th, 2019

High Scalability

There was already a payment system — it was called the credit card. They'll learn a lot and love you even more. All of the heavy-lifting infrastructure was already in place for it. There was already a telecommunication network, which became the backbone of the internet.

Internet 146
article thumbnail

Achieving observability in async workflows

The Netflix TechBlog

However, they are scattered across multiple systems, and there isn’t an easy way to tie related messages together. You’re joining tables, resolving status types, cross-referencing data manually with other systems, and by the end of it all you ask yourself why? Things got hairy. We wanted a scalable service that was near real-time, 2.

Traffic 160