Code, Exercise and Latency - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

Replay Traffic Testing: Replay traffic refers to production traffic that is cloned and forked over to a different path in the service call graph, allowing us to exercise new/updated systems in a manner that simulates actual production conditions. This approach has a handful of benefits.

Traffic

Traffic Latency Tuning Systems

Service level objectives: 5 SLOs to get started

Dynatrace

JUNE 1, 2023

More than half of CIOs confirmed that they often make tradeoffs among code quality, security, and reliability to meet the need for rapid software delivery. Fitness app: The fitness app should offer a response time of less than 500 milliseconds for exercise tracking and data recording.

Latency

Latency Website Traffic Virtualization

Service level objective examples: 5 SLO examples for faster, more reliable apps

Dynatrace

JUNE 1, 2023

More than half of CIOs confirmed that they often make tradeoffs among code quality, security, and reliability to meet the need for rapid software delivery. Fitness app: The fitness app should offer a response time of less than 500 milliseconds for exercise tracking and data recording.

Traffic

Traffic Latency Website Virtualization

Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

SEPTEMBER 8, 2020

On the Android team, while most of our time is spent working on the app, we are also responsible for maintaining this backend that our app communicates with, and its orchestration code. As you can see, our code was just a part (#2 in the diagram) of this monolithic service.

Latency

Latency Cache Java Traffic

MezzFS - Mounting object storage in Netflix’s media processing platform

The Netflix TechBlog

MARCH 6, 2019

We parallelize rerun jobs with Titus, Netflix’s container management platform, which allows us to exercise many hundreds of replay files in minutes. Here’s an attempt to represent this logic with some pseudo code: [link] There is a limit on the throughput you can get out of a single HTTP connection to S3.

Media

Media Storage Processing Cache

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages (Part II)

The Morning Paper

JANUARY 23, 2020

1:18pm a key observation was made that an API call to populate the homepage sidebar saw a huge jump in latency. The process tracing exercise included: Examining IRC transcripts from multiple channels. Gathering timestamps of changes made to application code during the outage. Semi-structured interviews using cued-recall.

Internet

Internet Internet Cache Engineering

Fixing a slow site iteratively

CSS - Tricks

APRIL 1, 2021

With all of this in mind, I thought improving the speed of my own version of a slow site would be a fun exercise. The code for the site is available on GitHub for reference. I’m going to update my referenced URL to the new site to help decrease latency that adds drag to the initial page load.

Cache

Cache Social Media Media Network

Why I hate MPI (from a performance analysis perspective)

John McCalpin

AUGUST 1, 2018

Bandwidth, performance analysis has two recurring themes: How fast should this code (or “simple” variations on this code) run on this hardware? Interacting components in the execution of an MPI job — a brief outline (from memory): The user source code, which contains an ordered set of calls to MPI routines.

Hardware

Hardware Transportation Performance Latency

Failure Modes and Continuous Resilience

Adrian Cockcroft

NOVEMBER 11, 2019

There are many possible failure modes, and each exercises a different aspect of resilience. Collecting some critical metrics at one second intervals, with a total observability latency of ten seconds or less matches the human attention span much better. This is followed by some common code related failure modes.

Latency

Latency Engineering Systems Hardware

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing)

John McCalpin

JANUARY 22, 2018

Introduction: In December 2017, my colleague Damon McDougall (now at AMD) asked for help in porting the fused multiply-add example code from a Colfax report ([link]) to the Xeon Phi x200 (Knights Landing) processors here at TACC. Instead, we found puzzle after puzzle.

Latency

Latency Hardware Code Testing

A persistent problem: managing pointers in NVM

The Morning Paper

DECEMBER 8, 2019

Therefore any programming abstraction must be low latency and the kernel needs to be kept off the path of persistent data access as much as possible. The Twizzler KVS (key-value store) is just 250 lines of C code, and uses one persistent object for the index structure, and a second one for the data. What about security?

Hardware

Hardware Programming Media Storage

The Agile PMO: Consistent Project Gatekeepers

The Agile Manager

DECEMBER 17, 2008

Traditional IT projects are mass economy-of-scale exercises: once development begins, armies of developers are unleashed. Clearly, a small team delivering documentation is nowhere near as significant an event as a large team delivering executable code. An Agile team is not an exercise in scale.

Latency

Latency Code Metrics Testing

Transforming enterprise integration with reactive streams

O'Reilly Software

MARCH 7, 2018

This is mixing concerns and leads to code that becomes strongly coupled, monolithic, hard to write, hard to read, hard to evolve, hard to test, and hard to reuse. Let’s take a look at a code snippet of a simple streaming pipeline to better understand how these pieces fit together.

Transportation

Transportation Java Programming Architecture
