article thumbnail

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

The Netflix TechBlog

Migrating Critical Traffic At Scale with No Downtime — Part 1 Shyam Gala , Javier Fernandez-Ivern , Anup Rokkam Pratap , Devang Shah Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic 339
article thumbnail

Migrating Critical Traffic At Scale with No Downtime?—?Part 2

The Netflix TechBlog

Migrating Critical Traffic At Scale with No Downtime — Part 2 Shyam Gala , Javier Fernandez-Ivern , Anup Rokkam Pratap , Devang Shah Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic 279
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

The three strategies we will discuss today are AB Testing , Replay Testing, and Sticky Canaries. To launch Phase 1 safely, we used AB Testing. To launch Phase 2 safely, we used Replay Testing and Sticky Canaries. We knew we could test the same query with the same inputs and consistently expect the same results.

Traffic 353
article thumbnail

Efficient SLO event integration powers successful AIOps

Dynatrace

For instance, consider how fine-tuned failure rate detection can provide insights for comprehensive understanding. Please refer to How to fine-tune failure detection (dynatrace.com) for further information. SLOs must be evaluated at 100%, even when there is currently no traffic. What characterizes a weak SLO?

article thumbnail

How Dynatrace boosts production resilience with Site Reliability Guardian

Dynatrace

Validation tasks are then extended left to cover performance testing and release validation in a pre-production environment. While the first guardian validates the traffic, the second guardian checks the business transactions generated during the observation period. The functionality is implemented via an automated workflow.

DevOps 195
article thumbnail

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure By Manuel Correa , Arthur Gonigberg , and Daniel West Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Logs and background requests are examples of this type of traffic.

Traffic 252
article thumbnail

9 key DevOps metrics for success

Dynatrace

Two important ways to improve this metric are to implement quality assurance testing throughout multiple development environments and to automate testing and DevOps processes. A change failure rate above 40% can indicate poor testing procedures, which means teams will need to make more changes than necessary, eroding efficiency.

DevOps 209