Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

For each route we migrated, we wanted to make sure we were not introducing any regressions: either in the form of missing (or worse, wrong) data, or by increasing the latency of each endpoint. Being able to canary a new route let us verify latency and error rates were within acceptable limits. This meant that data that was static (e.g.
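The canary verification mentioned in this excerpt can be sketched as a comparison of a canary route's metrics against a baseline. This is a minimal illustration under stated assumptions, not Netflix's implementation. The `RouteMetrics` type, the use of p99 latency, the 10% latency tolerance, and the 0.5 percentage-point error-rate tolerance are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class RouteMetrics:
    """Hypothetical per-route measurements collected during a canary run."""
    latencies_ms: list[float]  # needs at least two samples for quantiles()
    requests: int
    errors: int

    def p99(self) -> float:
        # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
        # "inclusive" interpolates within the observed range instead of extrapolating.
        return quantiles(self.latencies_ms, n=100, method="inclusive")[-1]

    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_passes(baseline: RouteMetrics, canary: RouteMetrics,
                  max_latency_increase: float = 0.10,
                  max_error_rate_increase: float = 0.005) -> bool:
    """Return True if the canary's p99 latency and error rate stay within
    the assumed tolerances relative to the baseline route."""
    latency_ok = canary.p99() <= baseline.p99() * (1 + max_latency_increase)
    errors_ok = canary.error_rate() <= baseline.error_rate() + max_error_rate_increase
    return latency_ok and errors_ok


if __name__ == "__main__":
    baseline = RouteMetrics([100.0] * 100, requests=1000, errors=2)
    canary = RouteMetrics([105.0] * 100, requests=1000, errors=3)
    print(canary_passes(baseline, canary))  # True: +5% latency, +0.1 pp errors
```

Gating on both latency and error rate mirrors the two kinds of regression the excerpt names; checking for missing or wrong data would need a separate response-comparison step.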

Automating chaos experiments in production

The Morning Paper

This is a fascinating paper from members of Netflix’s Resilience Engineering team describing their chaos engineering initiatives: automated controlled experiments designed to verify hypotheses about how the system should behave under gray failure conditions, and to probe for and flush out any weaknesses.

Fixing a slow site iteratively

CSS - Tricks

With all of this in mind, I thought improving the speed of my own version of a slow site would be a fun exercise. Redirects are often pretty light in terms of the latency that they add to a website, but they are an easy first thing to check, and they can generally be removed with little effort.

Failure Modes and Continuous Resilience

Adrian Cockcroft

There are many possible failure modes, and each exercises a different aspect of resilience. Another problem is that a design control, intended to mitigate a failure mode, may not work as intended. STPA (System-Theoretic Process Analysis) is based on a functional control diagram of the system, and the safety constraints and requirements for each component in the design.

Why I hate MPI (from a performance analysis perspective)

John McCalpin

This is an intellectually challenging and labor-intensive exercise, requiring detailed review of the published details of each of the components of the system, and usually requiring significant “detective work” (using customized microbenchmarks, hardware performance counter analysis, and creative thinking) to fill in the gaps.

A persistent problem: managing pointers in NVM

The Morning Paper

Byte-addressable non-volatile memory (NVM) will fundamentally change the way hardware interacts, the way operating systems are designed, and the way applications operate on data. Therefore, any programming abstraction must be low-latency, and the kernel needs to be kept off the path of persistent data access as much as possible.