So many bad takes: What is there to learn from the Prime Video microservices to monolith story

Adrian Cockcroft

When the Prime Video team tried to scale their service to cope with high traffic, they discovered that some of the state transitions in their AWS Step Functions workflow were too frequent, and that some calls between AWS Lambda functions and S3 were overly chatty. The blog states that the original design was quick to build, which is the point.

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

Uptime Institute’s 2022 Outage Analysis report found that over 60% of system outages resulted in at least $100,000 in total losses, up from 39% in 2019, and that more than one in seven outages cost more than $1 million. At the lowest level, SLIs (service-level indicators) provide a view of service availability, latency, performance, and capacity across systems.
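SLIs are commonly computed as the ratio of good events to total events in a measurement window. A minimal Python sketch, assuming a hypothetical `Request` record with an HTTP status and a latency in milliseconds, and assuming that only 5xx responses count against availability (an illustration, not Dynatrace's definition):

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Request:
    # Hypothetical request record: HTTP status code and latency in milliseconds.
    status: int
    latency_ms: float


def availability_sli(requests: Sequence[Request]) -> float:
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not requests:
        return 1.0  # assumption: an empty window counts as fully available
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests: Sequence[Request], threshold_ms: float) -> float:
    """Fraction of requests served faster than threshold_ms."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return fast / len(requests)


window = [Request(200, 120.0), Request(200, 340.0), Request(503, 90.0), Request(200, 80.0)]
print(availability_sli(window))                  # 0.75
print(latency_sli(window, threshold_ms=100.0))   # 0.5
```

Returning 1.0 for an empty window is a design choice in this sketch; some teams treat a window with no traffic as undefined instead.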

Netflix at AWS re:Invent 2019

The Netflix TechBlog

Netflix shares how Amazon EC2 Auto Scaling lets its infrastructure adapt automatically to changing traffic patterns, keeping its audience entertained and its costs on target. Netflix runs dozens of stateful services on AWS under strict sub-millisecond tail-latency requirements, which brings unique challenges. Wednesday, December
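One common way to drive this kind of scaling is target tracking: capacity is adjusted in proportion to how far a per-instance metric (for example, CPU utilization) sits from its target. The sketch below is a simplified proportional rule under that assumption; `desired_capacity` and its parameters are hypothetical, and it is not the exact algorithm used by EC2 Auto Scaling or Netflix (production autoscalers add smoothing and warm-up handling omitted here).

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional target tracking (simplified sketch).

    Scales the instance count so the per-instance metric moves toward
    `target`, then clamps the result to the group's size limits.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))


# 10 instances at 80% CPU with a 50% target: scale out.
print(desired_capacity(10, metric=80.0, target=50.0, min_size=2, max_size=40))  # 16
# 10 instances at 20% CPU with a 50% target: scale in.
print(desired_capacity(10, metric=20.0, target=50.0, min_size=2, max_size=40))  # 4
```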

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure

By Manuel Correa, Arthur Gonigberg, and Daniel West

Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Logs and background requests are examples of lower-priority traffic.
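Prioritized load shedding can be sketched as an admission check: as load rises, the least important classes of traffic are rejected first so that critical requests keep being served. A minimal Python sketch, assuming four hypothetical priority levels and illustrative utilization thresholds (none of these names or numbers come from the Netflix post):

```python
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical priority levels; a lower value means more important.
    CRITICAL = 0      # e.g. playback requests
    DEGRADED = 1
    BEST_EFFORT = 2   # e.g. logs
    BULK = 3          # e.g. background requests


def max_admitted_priority(utilization: float) -> Priority:
    """Least important priority still admitted at this utilization (0.0-1.0).

    Thresholds are illustrative assumptions: as utilization rises,
    progressively more important classes of traffic are shed.
    """
    if utilization < 0.60:
        return Priority.BULK
    if utilization < 0.75:
        return Priority.BEST_EFFORT
    if utilization < 0.90:
        return Priority.DEGRADED
    return Priority.CRITICAL


def should_admit(priority: Priority, utilization: float) -> bool:
    """Admit a request only if its priority is within the admitted range."""
    return priority <= max_admitted_priority(utilization)


print(should_admit(Priority.CRITICAL, 0.95))     # True
print(should_admit(Priority.BULK, 0.95))         # False
print(should_admit(Priority.BEST_EFFORT, 0.70))  # True
```

Shedding by class rather than at random means an overloaded service degrades non-essential features (such as logging) before it affects what the viewer is watching.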

Achieving observability in async workflows

The Netflix TechBlog

Prodicle Distribution

Our service is required to be elastic and handle bursty traffic. We are expected to process 1,000 watermarks for a single distribution in a minute, with non-linear latency growth as the number of watermarks increases. Things got hairy. We wanted a scalable service that was near real-time.

O’Reilly serverless survey 2019: Concerns, what works, and what to expect

O'Reilly

Rather than buying racks and racks of servers that must handle the maximum potential traffic and sit idle most of the time, organizations seem to be finding that serverless's model of paying only for the compute they use benefits their bottom lines. The serverless adoption survey ran in June 2019.
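The cost argument can be made concrete with a toy comparison. The sketch below assumes a hypothetical 24-hour load profile (20 quiet hours, 4 peak hours), a hypothetical server that handles 100 units of work per hour for 100 cents, and a hypothetical pay-per-use price of 2 cents per unit; none of these figures are real cloud prices or survey data.

```python
from typing import Sequence


def provisioned_cost(hourly_load: Sequence[int], capacity_per_server: int,
                     cents_per_server_hour: int) -> int:
    """Servers are sized for the peak hour and paid for every hour."""
    peak = max(hourly_load)
    servers = (peak + capacity_per_server - 1) // capacity_per_server  # ceiling division
    return servers * cents_per_server_hour * len(hourly_load)


def pay_per_use_cost(hourly_load: Sequence[int], cents_per_unit: int) -> int:
    """Only the work actually performed is billed."""
    return sum(hourly_load) * cents_per_unit


# Hypothetical day: 20 quiet hours at 100 units, 4 peak hours at 1,000 units.
load = [100] * 20 + [1000] * 4
print(provisioned_cost(load, capacity_per_server=100, cents_per_server_hour=100))  # 24000
print(pay_per_use_cost(load, cents_per_unit=2))                                    # 12000
```

In this toy model each unit of work costs twice as much on the pay-per-use side, yet the provisioned fleet is sized for the peak and runs at only 25% average utilization, so it costs twice as much overall.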