Implementing service-level objectives to improve software quality

Dynatrace

By implementing service-level objectives, teams can avoid collecting and checking a huge number of metrics for each service. First, it helps to understand that applications, and all the services and infrastructure that support them, generate telemetry data based on traffic from real users. So how can teams start implementing SLOs?

Software 273
Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

As an example, to render the screen shown here, the app sends a query that looks like this: paths: ["videos", 80154610, "detail"]. A path starts from a root object, and is followed by a sequence of keys that we want to retrieve the data for. Instead, it is part of a different path: [videos, <id>, similars].
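The path idea in this excerpt can be illustrated with a minimal sketch that walks a list of keys from a root object. The `GRAPH` data and the `resolve_path` helper are hypothetical and exist only for illustration; they are not the Netflix app's or API's actual implementation.

```python
from typing import Any, Sequence

# Hypothetical in-memory data graph; the contents are illustrative only.
GRAPH = {
    "videos": {
        80154610: {
            "detail": {"title": "Example title"},
            "similars": [80100001, 80100002],
        }
    }
}


def resolve_path(root: Any, path: Sequence[Any]) -> Any:
    """Walk a sequence of keys from the root object and return the value found."""
    node = root
    for key in path:
        node = node[key]  # raises KeyError if the path does not exist
    return node


# The two paths mentioned in the excerpt, resolved against the toy graph.
detail = resolve_path(GRAPH, ["videos", 80154610, "detail"])
similars = resolve_path(GRAPH, ["videos", 80154610, "similars"])
```

Each key narrows the lookup by one level, which is why the detail data and the similars data for the same video sit under two separate paths.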

Latency 233
Data Reprocessing Pipeline in Asset Management Platform @Netflix

The Netflix TechBlog

Existing data was updated to be backward compatible without impacting the existing running production traffic. Data sharding strategy in Elasticsearch is updated to provide low search latency (as described in the blog post). Design of new Cassandra reverse indices to support different sets of queries.

Media 237
Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic 279
Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

Investigating a video streaming failure consists of inspecting all aspects of a member account. If we had an ID for each streaming session, then distributed tracing could easily reconstruct a session failure by providing service topology, retry and error tags, and latency measurements for all service calls.

Percentiles don’t work: Analyzing the distribution of response times for web services

Adrian Cockcroft

There is no way to model how much more traffic you can send to that system before it exceeds its SLA. I presented this analysis of response time distributions in a talk at Microxchg in Berlin in 2016 (video). Mu is the mean of each component, the latency. I've been thinking about this for a long time.

Lambda 98
Migrating Netflix to GraphQL Safely

The Netflix TechBlog

We could also swap out the implementation of a field from GraphQL Shim to Video API with federation directives. So, we relied on higher-level metrics-based testing: A/B testing and Sticky Canaries. To determine customer impact, we could compare various metrics such as error rates, latencies, and time to render.

Traffic 353