Migrating Critical Traffic At Scale with No Downtime: Part 1

The Netflix TechBlog

Replay Traffic Testing: Replay traffic refers to production traffic that is cloned and forked over to a different path in the service call graph, allowing us to exercise new or updated systems in a manner that simulates actual production conditions. This approach has a handful of benefits.

Traffic 339

Service level objectives: 5 SLOs to get started

Dynatrace

More than half of CIOs confirmed that they often make tradeoffs among code quality, security, and reliability to meet the need for rapid software delivery. Fitness app: The fitness app should offer a response time of less than 500 milliseconds for exercise tracking and data recording.

Latency 182
Trending Sources

Service level objective examples: 5 SLO examples for faster, more reliable apps

Dynatrace

More than half of CIOs confirmed that they often make tradeoffs among code quality, security, and reliability to meet the need for rapid software delivery. Fitness app: The fitness app should offer a response time of less than 500 milliseconds for exercise tracking and data recording.

Traffic 173

Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

On the Android team, while most of our time is spent working on the app, we are also responsible for maintaining this backend that our app communicates with, and its orchestration code. (Image taken from a previously published blog post.) As you can see, our code was just a part (#2 in the diagram) of this monolithic service.

Latency 233

MezzFS: Mounting object storage in Netflix’s media processing platform

The Netflix TechBlog

We parallelize rerun jobs with Titus, Netflix’s container management platform, which allows us to exercise many hundreds of replay files in minutes. Here’s an attempt to represent this logic with some pseudocode: [link] There is a limit on the throughput you can get out of a single HTTP connection to S3.

Media 214

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages (Part II)

The Morning Paper

At 1:18pm a key observation was made that an API call to populate the homepage sidebar saw a huge jump in latency. The process tracing exercise included: examining IRC transcripts from multiple channels; gathering timestamps of changes made to application code during the outage; and semi-structured interviews using cued recall.

Fixing a slow site iteratively

CSS-Tricks

With all of this in mind, I thought improving the speed of my own version of a slow site would be a fun exercise. The code for the site is available on GitHub for reference. I’m going to update my referenced URL to the new site to help decrease latency that adds drag to the initial page load.

Cache 92