Availability, Exercise, Latency and Strategy - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

This blog series will examine the tools, techniques, and strategies we have utilized to achieve this goal. In this testing strategy, we execute a copy (replay) of production traffic against a system’s existing and new versions to perform relevant validations. This approach has a handful of benefits.
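The replay idea in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the tooling the post describes: `replay_and_compare`, `old_version`, `new_version` and the recorded requests are all hypothetical stand-ins, and a real setup would call the two deployed versions of the service rather than local functions.

```python
from typing import Callable, Iterable, List, Tuple


def replay_and_compare(
    requests: Iterable[dict],
    old_handler: Callable[[dict], dict],
    new_handler: Callable[[dict], dict],
) -> List[Tuple[dict, dict, dict]]:
    """Replay each recorded request against both versions and collect mismatches."""
    mismatches = []
    for request in requests:
        old_response = old_handler(request)
        new_response = new_handler(request)
        # The validation here is plain equality; real checks may ignore
        # fields that are expected to differ (timestamps, request ids).
        if old_response != new_response:
            mismatches.append((request, old_response, new_response))
    return mismatches


# Hypothetical stand-ins for the existing and new versions of a system.
def old_version(request: dict) -> dict:
    return {"title": request["title"].upper()}


def new_version(request: dict) -> dict:
    return {"title": request["title"].strip().upper()}


# Hypothetical copy of production traffic.
recorded = [{"title": "home"}, {"title": " search "}]
mismatches = replay_and_compare(recorded, old_version, new_version)
for request, old, new in mismatches:
    print(f"mismatch for {request}: old={old} new={new}")
```

Because the replayed copy is compared offline, a mismatch like the one above surfaces without affecting the response served to the real user.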

Traffic

Traffic Latency Tuning Systems

Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

SEPTEMBER 8, 2020

Over the course of this post, we will talk about our approach to this migration, the strategies that we employed, and the tools we built to support this. Functional Testing: Functional testing was the most straightforward of them all: a set of tests alongside each path exercised it against the old and new endpoints.

Latency

Latency Cache Java Traffic

Real user monitoring vs. synthetic monitoring: Understanding best practices

Dynatrace

JUNE 27, 2022

This includes development, user acceptance testing, beta testing, and general availability. connectivity, access, user count, latency) of geographic regions. The result is a more comprehensive and robust monitoring strategy that will have a longer-lasting impact on user performance and experience. The bottom line?

Best Practices

Best Practices Monitoring Wireless Traffic

Bring Your Own Cloud (BYOC) vs. Dedicated Hosting at ScaleGrid

Scalegrid

APRIL 16, 2020

In this post, we compare ScaleGrid’s Bring Your Own Cloud (BYOC) plan vs. the standard Dedicated Hosting model to help you determine the best strategy for your MySQL, PostgreSQL, Redis™ and MongoDB® database deployment. Both AWS EC2 instances and Azure VM instances are available as Reserved Instances, and can be used through the BYOC plan.

Cloud

Cloud Azure AWS Database

Taiji: managing global user traffic for large-scale Internet services at the edge

The Morning Paper

NOVEMBER 14, 2019

Taiji’s routing table is a materialized representation of how user traffic at various edge nodes ought to be distributed over available data centers to balance data center utilization and minimize latency. For example, balance utilisation across all data centers, or optimise for network latency. Sharing is caring caching.
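To make the notion of a routing table concrete, here is a small sketch of one as a mapping from edge node to per-data-center traffic fractions. It uses a simple greedy nearest-first heuristic chosen for illustration and is not Taiji's actual algorithm; `build_routing_table`, the node names, loads, capacities and latencies are all made-up assumptions.

```python
from typing import Dict


def build_routing_table(
    edge_load: Dict[str, float],
    capacity: Dict[str, float],
    latency: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Map each edge node to fractions of its traffic per data center.

    Greedy heuristic (an assumption, not Taiji's method): each edge node
    fills its lowest-latency data center first, then spills over.
    Load beyond total capacity is left unassigned.
    """
    remaining = dict(capacity)
    table: Dict[str, Dict[str, float]] = {}
    for edge, total in edge_load.items():
        table[edge] = {}
        load = total
        for dc in sorted(remaining, key=lambda d: latency[edge][d]):
            if load <= 0:
                break
            share = min(load, remaining[dc])
            if share > 0:
                table[edge][dc] = share / total
                remaining[dc] -= share
                load -= share
    return table


# Hypothetical inputs: request load per edge node, capacity per data
# center, and edge-to-data-center latency in milliseconds.
edge_load = {"edge-a": 60, "edge-b": 80}
capacity = {"dc-1": 100, "dc-2": 100}
latency = {
    "edge-a": {"dc-1": 10, "dc-2": 40},
    "edge-b": {"dc-1": 15, "dc-2": 30},
}

table = build_routing_table(edge_load, capacity, latency)
print(table)
```

In this toy input, edge-a sends all of its traffic to the nearer dc-1, while edge-b splits evenly once dc-1 runs out of capacity, which shows the trade-off between utilization and latency that the excerpt mentions.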

Traffic

Traffic Internet Internet Latency

Failure Modes and Continuous Resilience

Adrian Cockcroft

NOVEMBER 11, 2019

There are many possible failure modes, and each exercises a different aspect of resilience. Collecting some critical metrics at one second intervals, with a total observability latency of ten seconds or less matches the human attention span much better. This is why most AWS regions have three availability zones.

Latency

Latency Engineering Systems Hardware

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages (Part II)

The Morning Paper

JANUARY 23, 2020

1:18pm a key observation was made that an API call to populate the homepage sidebar saw a huge jump in latency. The shop had been closed so no data was available. The process tracing exercise included: Examining IRC transcripts from multiple channels. Gathering timestamps of changes made to application code during the outage.

Internet

Internet Internet Cache Engineering

Fixing a slow site iteratively

CSS - Tricks

APRIL 1, 2021

With all of this in mind, I thought improving the speed of my own version of a slow site would be a fun exercise. The code for the site is available on GitHub for reference. I’m going to update my referenced URL to the new site to help decrease latency that adds drag to the initial page load. Again, every millisecond counts.

Cache

Cache Social Media Media Website

Transforming enterprise integration with reactive streams

O'Reilly Software

MARCH 7, 2018

Today, data needs to be available at all times, serving its users—both humans and computer systems—across all time zones, continuously, in close to real time. Its strategies for flow control are either stop-and-wait (i.e., Welcome to a new world of data-driven systems. Illustrates the flow of data and backpressure in a stream topology.

Transportation

Transportation Java Programming Architecture

A Decade of Dynamo: Powering the next wave of high-performance, internet-scale applications

All Things Distributed

OCTOBER 2, 2017

We were pushing the limits of what was a leading commercial database at the time and were unable to sustain the availability, scalability and performance needs that our growing Amazon business demanded. Durable and Highly-Available – DynamoDB maintains data durability and 99.99

Internet

Internet Internet AWS Performance
