Latency, Open Source and Traffic - Technology Performance Pulse

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems.

Systems

Systems Media Cache Open Source

Cluster Diagnostics: Troubleshoot Cluster Issues Using Only SQL Queries

DZone

JULY 6, 2020

TiDB is an open-source, distributed SQL database that supports Hybrid Transactional/Analytical Processing (HTAP) workloads. For external reasons, application traffic may surge and increase the pressure on the cluster. Ideally, a TiDB cluster should always be efficient and problem-free. However, reality is often unsatisfactory.

Open Source

Open Source Latency Traffic Analytics

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

If we had an ID for each streaming session, distributed tracing could easily reconstruct a session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Using simple lookup indices in Cassandra gives us the ability to maintain acceptable read latencies while doing heavy writes.

Infrastructure

Infrastructure Transportation Storage Open Source

MySQL Key Performance Indicators (KPI) With PMM

Percona

JUNE 22, 2023

A monitoring tool like Percona Monitoring and Management (PMM) is a popular choice among open source options for effectively monitoring MySQL performance. That said, MySQL should also be monitored for usage, which reveals the traffic pressure on the database. This is not an exhaustive list, but an example of what we can watch for.

Performance

Performance Monitoring Traffic Database

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

Netflix shares how Amazon EC2 Auto Scaling allows its infrastructure to automatically adapt to changing traffic patterns in order to keep its audience entertained and its costs on target. Netflix runs dozens of stateful services on AWS under strict sub-millisecond tail-latency requirements, which brings unique challenges.

AWS

AWS Entertainment Open Source Benchmarking

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Azure Traffic Manager is among the newly supported services. Another service on the list enables you to use popular open-source frameworks such as Hadoop, Spark, and Kafka in Azure cloud environments. Azure Front Door enables you to define, manage, and monitor the global routing for your web traffic by optimizing for best performance and quick global failover for high availability.

Azure

Azure Cloud Big Data Virtualization

Performance Testing - Tools, Steps, and Best Practices

KeyCDN

AUGUST 15, 2019

Just because everything works perfectly during production testing doesn’t mean that will be the case when your website is flooded with traffic. Bottlenecks can occur, for example, if you have a sudden surge in traffic that your servers are not equipped to handle.

Testing Tools

Testing Tools Best Practices Performance Testing Testing

Keeping up with Header Bidding’s performance requirements

VoltDB

JUNE 29, 2017

Most existing adtech infrastructure simply cannot achieve the required latency. VoltDB provides the technology needed to achieve the latency that header bidding requires. DSPs need to find the best route to an impression, and will steer traffic towards the best pricing available.

Performance

Performance Hardware Latency Infrastructure

How Google PageSpeed Works: Improve Your Score and Search Engine Ranking

CSS - Tricks

JULY 25, 2019

Lighthouse is an open source project run by a dedicated team from Google Chrome. Speed has become a crucial factor for SEO rankings, especially now that nearly 50% of web traffic comes from mobile devices. Metrics covered include Speed Index, First CPU Idle, and Estimated Input Latency.

Google

Google Engineering Speed Mobile

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

Redis, a renowned open-source, in-memory remote dictionary server, stands out for its diverse data structures and advanced features. Memcached, by contrast, is a high-performance, open-source caching solution that prides itself on its simplicity.

Cache

Cache Storage Scalability Architecture

Revisiting “Serverless Architectures”

The Symphonia

MAY 22, 2018

I also rewrote the section on Startup Latency since Cold Starts are one of the big “FUD” areas of Serverless. Also there’s been a lot of open source updates, including from Amazon and Microsoft. I was glad to be able to talk about Amazon’s automated traffic shifting / canary releases.

Serverless

Serverless Architecture Lambda Azure

Maximizing Performance of AWS RDS for MySQL with Dedicated Log Volumes

Percona

DECEMBER 11, 2023

DLVs are particularly advantageous for databases with large allocated storage, high I/O operations per second (IOPS) requirements, or latency-sensitive workloads. For write-only traffic, the QPS counters match the performance of standard RDS instances at lower thread counts, though at higher thread counts there is a drastic improvement.

AWS

AWS Benchmarking Performance Traffic

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Dynatrace

MAY 17, 2023

OpenTelemetry, the open source observability tool, has become the go-to standard for instrumenting custom applications to collect observability telemetry data. Monitoring DNS query time is important for understanding network latency, ensuring that services are available, troubleshooting issues, and optimizing application performance.
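Measuring DNS query time from the client side can be sketched with only the Python standard library. This is an illustration under stated assumptions, not the OpenTelemetry or Dynatrace instrumentation the excerpt refers to. `dns_lookup_ms` is a hypothetical helper that times `socket.getaddrinfo`, so the figure includes any caching done by the operating system or a local resolver.

```python
import socket
import time


def dns_lookup_ms(hostname: str, port: int = 443) -> float:
    """Time one resolution of `hostname` via the system resolver, in milliseconds.

    Hypothetical helper: it measures what the application experiences,
    including OS/local-resolver caching, not the authoritative server's time.
    """
    start = time.perf_counter()
    socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    for host in ("localhost", "example.com"):
        try:
            print(f"{host}: {dns_lookup_ms(host):.2f} ms")
        except socket.gaierror as exc:  # resolution failed (e.g. no network)
            print(f"{host}: lookup failed ({exc})")
```

Repeated lookups of the same name usually get faster once a cache is warm. That is one reason to record a distribution of lookup times rather than a single number.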

Metrics

Metrics Database Monitoring Network

Compression Methods in MongoDB: Snappy vs. Zstd

Percona

MARCH 29, 2023

Snappy: data size 14.95 GB; size after compression 10.75 GB; avg latency 12.22 ms; avg CPU usage 34%; avg insert rate 16K ops/s; time taken to import 120,000,000 documents: 7292 seconds.

Zstd (default compression level 6): data size 14.95 GB; size after compression 7.69 GB; avg latency 12.52 ms; avg CPU usage 31.72%; avg insert rate 14.8K ops/s.
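A few lines of arithmetic turn the Snappy and Zstd benchmark figures above into directly comparable numbers. The script only re-derives values from the excerpt. `space_saving_pct` and `compression_ratio` are illustrative helper names, not part of any MongoDB tooling.

```python
# Figures copied from the Snappy vs. Zstd excerpt above.
ORIGINAL_GB = 14.95
COMPRESSED_GB = {"snappy": 10.75, "zstd": 7.69}
DOCUMENTS = 120_000_000
SNAPPY_IMPORT_SECONDS = 7292


def space_saving_pct(original: float, compressed: float) -> float:
    """Percentage of the original size removed by compression."""
    return (1 - compressed / original) * 100


def compression_ratio(original: float, compressed: float) -> float:
    """How many times smaller the data became (original / compressed)."""
    return original / compressed


for name, size in COMPRESSED_GB.items():
    print(f"{name}: {space_saving_pct(ORIGINAL_GB, size):.1f}% smaller, "
          f"ratio {compression_ratio(ORIGINAL_GB, size):.2f}x")

# Cross-check: the reported Snappy import time implies an average insert rate.
snappy_rate = DOCUMENTS / SNAPPY_IMPORT_SECONDS
print(f"snappy implied insert rate: {snappy_rate:,.0f} docs/s")
```

Snappy shrinks the data by about 28.1% (ratio 1.39x) and Zstd by about 48.6% (ratio 1.94x). 120,000,000 documents in 7292 seconds works out to roughly 16,456 inserts per second, which is consistent with the reported 16K/s. In this benchmark, Zstd saved about 3 GB more disk space. The cost was about 0.3 ms higher average latency and an insert rate roughly 7.5% lower.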

Storage

Storage Network Open Source Latency

The Best Way to Host MongoDB on DigitalOcean

Scalegrid

DECEMBER 16, 2019

MongoDB is the #3 open source database and the #1 NoSQL database in the world. The comparison also covered Azure and found that DigitalOcean performance was in line with, if not better than, the other providers for both high throughput and low latency in the deployment.

Azure

Azure AWS Latency Database

Understanding What Kubernetes Is Used For: The Key to Cloud-Native Efficiency

Percona

NOVEMBER 9, 2023

At its core, Kubernetes (often abbreviated as K8s) is an open source tool that automates the deployment, scaling, and management of containerized applications. Applications can be horizontally scaled with Kubernetes by adding or deleting containers based on resource allocation and incoming traffic demands.
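Scaling on incoming traffic, as described above, is what the Kubernetes HorizontalPodAutoscaler does. Its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). When that ratio is within a tolerance of 1.0 (0.1 by default), the replica count is left unchanged. The function below is a simplified sketch of that rule only. The real controller also accounts for missing metrics, pod readiness, and stabilization windows. The `min_replicas`/`max_replicas` defaults here are arbitrary assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Simplified HorizontalPodAutoscaler rule for a single metric.

    ratio = current_metric / target_metric. Within `tolerance` of 1.0 nothing
    changes; otherwise ceil(current_replicas * ratio), clamped to the bounds.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    # e.g. average requests per second per pod vs. a target of 100
    print(desired_replicas(4, current_metric=200, target_metric=100))  # 8: traffic doubled
    print(desired_replicas(4, current_metric=105, target_metric=100))  # 4: within tolerance
    print(desired_replicas(8, current_metric=300, target_metric=100))  # 10: capped at max
```

Rounding up rather than to the nearest integer errs on the side of capacity. A traffic surge therefore never leaves the average pod above its target.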

Efficiency

Efficiency Cloud Healthcare Open Source

How We Optimized Performance To Serve A Global Audience

Smashing Magazine

AUGUST 3, 2023

It increases our visibility and enables us to draw a steady stream of organic (or “free”) traffic to our site. While paid marketing strategies like Google Ads play a part in our approach as well, enhancing our organic traffic remains a major priority. The higher our organic traffic, the more profitable we become as a company.

Performance

Performance Cache Traffic Metrics

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

DECEMBER 17, 2019

There are several open source CDC projects, often using the same underlying libraries, database APIs, and protocols. No locks on tables are ever acquired, which prevents any impact on write traffic on the source database. Hence, downstream consumers can be confident that they receive change events as they occur on the source.

Database

Database Traffic Transportation Open Source

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

We also highlight interesting broader events such as regional traffic evacuations and nearby deployments, information that is vital to understanding health holistically. Not all signals carry the same weight: for example, a latency increase is less critical than an error-rate increase, and some error codes are less critical than others.

Monitoring

Monitoring Tuning Traffic Metrics

Data Movement in Netflix Studio via Data Mesh

The Netflix TechBlog

JULY 26, 2021

Operational Reporting is a reporting paradigm specialized in covering high-resolution, low-latency data sets, serving detailed day-to-day activities¹ and processes of a business domain. In the example operational reporting pipeline, the sink is Apache Iceberg, an open source table format for huge analytics datasets.

Big Data

Big Data Government Analytics Processing

Netflix Video Quality at Scale with Cosmos Microservices

The Netflix TechBlog

NOVEMBER 2, 2021

It has become a de facto standard for perceptual quality measurements within Netflix and, thanks to its open-source nature, throughout the video industry. This enables us to use our scale to increase throughput and reduce latencies. Here, based on the video length, the throughput and latency requirements, available scale etc.,

Media

Media Innovation Metrics Latency

Scaling Amazon ElastiCache for Redis with Online Cluster Resizing

All Things Distributed

NOVEMBER 21, 2017

Redis's microsecond latency has made it a de facto choice for caching. Four years ago, as part of our AWS fast data journey, we introduced Amazon ElastiCache for Redis , a fully managed, in-memory data store that operates at microsecond latency. TB of in-memory capacity in a single cluster.

Games

Games Retail Latency Education

Proof of Concept: Horizontal Write Scaling for MySQL With Kubernetes Operator

Percona

MAY 15, 2023

In the MySQL open source ecosystem, we have only two consolidated ways to perform sharding: Vitess and ProxySQL. As illustrated above, ProxySQL allows us to set up a common entry point for the application and then redirect the traffic based on identified sharding keys.
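The routing decision described above can be sketched independently of any proxy: each query goes to the backend that owns its sharding key. Everything here is hypothetical, including the `customer_id` key, the range boundaries, and the hostgroup numbers. In ProxySQL itself, this kind of decision is typically expressed as rows in the `mysql_query_rules` table that match queries and set a `destination_hostgroup`, not as application code.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRange:
    upper_bound: int  # exclusive upper bound of customer_id values in this shard
    hostgroup: int    # hypothetical hostgroup id serving the shard


# Hypothetical layout: customer_id < 1_000_000 -> hostgroup 10, and so on.
SHARDS = [
    ShardRange(1_000_000, 10),
    ShardRange(2_000_000, 20),
    ShardRange(3_000_000, 30),
]
_BOUNDS = [shard.upper_bound for shard in SHARDS]


def route(customer_id: int) -> int:
    """Return the hostgroup that owns `customer_id` (the sharding key)."""
    # bisect_right finds the first range whose exclusive upper bound is > key.
    idx = bisect.bisect_right(_BOUNDS, customer_id)
    if idx == len(SHARDS):
        raise KeyError(f"no shard configured for customer_id={customer_id}")
    return SHARDS[idx].hostgroup


if __name__ == "__main__":
    for key in (42, 1_000_000, 2_500_000):
        print(key, "->", route(key))
```

Range-based sharding keeps neighbouring keys together, which helps range scans. Hashing the key instead spreads load more evenly, but range queries then fan out across shards.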

Traffic

Traffic Scalability Database Servers

MongoDB Database Backup: Best Practices & Expert Tips

Percona

MAY 2, 2023

Also, we will take a look at our open-source backup utility, custom-built to help avoid costs and proprietary software: Percona Backup for MongoDB, or PBM. Restore speed matters especially when moving data into or out of storage types that may throttle bandwidth or network traffic. Many people are very surprised at how long restores take!

Best Practices

Best Practices Database Storage Servers

Proposal for a Realtime Carbon Footprint Standard

Adrian Cockcroft

APRIL 5, 2023

This proposal seeks to define a standard for real-time carbon and energy data as time-series data that would be accessed alongside and synchronized with the existing throughput, utilization and latency metrics that are provided for the components and applications in computing environments.

Energy

Energy Metrics Cloud Operating System

Content Management Systems of the Future: Headless, JAMstack, ADN and Functions at the Edge

Abhishek Tiwari

NOVEMBER 3, 2018

You should expect a one-time implementation cost (depending on the CMS and business requirements, it can cost 200,000 USD to 3M USD) and a yearly hosting infrastructure cost (proportional to load and traffic, but typically 30,000 USD to 300,000 USD per year). In addition, open source CMS solutions also struggle with bloated plugin ecosystems.

Systems

Systems Cache Website Network

MongoDB Best Practices: Security, Data Modeling, & Schema Design

Percona

APRIL 17, 2023

The CFQ scheduler works well for many general use cases but lacks latency guarantees. Two other schedulers are deadline and noop: deadline excels at latency-sensitive use cases (like databases), and noop is closer to no scheduling at all. Separately, sharding allows MongoDB to scale horizontally, handling large datasets and high traffic loads.
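On Linux, the active I/O scheduler for a block device can be read from `/sys/block/<device>/queue/scheduler`. The kernel lists the available schedulers there and wraps the active one in square brackets (for example `noop [deadline] cfq`). The helpers below are a minimal sketch for checking it, with hypothetical function names. Since Linux 5.0, the legacy CFQ, deadline and noop schedulers have been replaced by multi-queue equivalents such as `bfq`, `mq-deadline`, `kyber` and `none`.

```python
from pathlib import Path
from typing import List, Tuple


def parse_scheduler(text: str) -> Tuple[str, List[str]]:
    """Parse a /sys/block/<dev>/queue/scheduler line.

    Returns (active_scheduler, available_schedulers). The kernel marks the
    active scheduler with square brackets, e.g. "noop [deadline] cfq".
    """
    names = text.split()
    active = [name.strip("[]") for name in names if name.startswith("[")]
    if len(active) != 1:
        raise ValueError(f"unexpected scheduler line: {text!r}")
    return active[0], [name.strip("[]") for name in names]


def read_scheduler(device: str) -> Tuple[str, List[str]]:
    """Read the scheduler of a block device such as "sda" (Linux only)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


if __name__ == "__main__":
    print(parse_scheduler("noop [deadline] cfq\n"))  # ('deadline', ['noop', 'deadline', 'cfq'])
```

Writing a scheduler name into the same file as root switches the scheduler until the next reboot (for example `echo mq-deadline > /sys/block/sda/queue/scheduler`). Persistent changes are usually made with a udev rule.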

Best Practices

Best Practices Design Tuning Database

HTTP/3: Performance Improvements (Part 2)

Smashing Magazine

AUGUST 22, 2021

Because we are dealing with network protocols here, we will mainly look at network aspects, of which two are most important: latency and bandwidth. Latency can be roughly defined as the time it takes to send a packet from point A (say, the client) to point B (the server). Two-way latency is often called round-trip time (RTT).
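With RTT defined this way, a rough estimate of connection-setup cost is a count of round trips. The sketch below uses a deliberately simplified model that assumes no DNS lookup, no packet loss, no server processing time, no TCP Fast Open, and no TLS session resumption or 0-RTT. Under that model:

- TCP needs one round trip.
- A full TLS 1.2 handshake needs two.
- TLS 1.3 needs one.
- QUIC folds its transport and TLS 1.3 handshakes into a single round trip.
- The HTTP request itself then costs one more.

```python
# Round trips spent on handshakes before the HTTP request can be sent,
# under the simplified model described above.
HANDSHAKE_RTTS = {
    "TCP + TLS 1.2": 1 + 2,  # TCP 3-way handshake + full TLS 1.2 handshake
    "TCP + TLS 1.3": 1 + 1,  # TCP 3-way handshake + TLS 1.3 handshake
    "QUIC (HTTP/3)": 1,      # QUIC combines transport and TLS 1.3 handshakes
}


def time_to_first_byte_ms(rtt_ms: float, stack: str) -> float:
    """Handshake round trips plus one round trip for the request/response."""
    return (HANDSHAKE_RTTS[stack] + 1) * rtt_ms


if __name__ == "__main__":
    for stack in HANDSHAKE_RTTS:
        print(f"{stack}: {time_to_first_byte_ms(100, stack):.0f} ms at 100 ms RTT")
```

At a 100 ms RTT this prints 400, 300 and 200 ms respectively. Handshake round trips therefore matter far more on high-latency links than on a fast local network.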

Performance

Performance Network Latency Servers

Hobson's Browser

Alex Russell

JULY 14, 2021

Meanwhile, on Android, the #2 and #3 sources of web traffic do not respect browser choice. On Android today and early iOS versions, WebViews allow embedders to observe and modify all network traffic (regardless of encryption). Users can have any browser with any engine they like, but it's unlikely to be used. How can that be?

Google

Google Mobile Engineering Internet

How To Make Performance Visible With GitLab CI And Hoodoo Of GitLab Artifacts

Smashing Magazine

MAY 20, 2020

This metric shows how much time it takes for the server to respond with something. It is important, but quite vague, because it can include anything from server rendering time to latency problems. This saves clients traffic, sometimes traffic which the client is paying for.

Performance

Performance Metrics Best Practices Code

Failure Modes and Continuous Resilience

Adrian Cockcroft

NOVEMBER 11, 2019

Collecting some critical metrics at one-second intervals, with a total observability latency of ten seconds or less, matches the human attention span much better. For latency-sensitive systems, creating two independent ways to succeed is an important technique for greatly reducing 99th percentile latency.
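One common way to get "two independent ways to succeed" is to send the same request down two independent paths, such as two replicas, and use whichever answer arrives first. Suppose each path independently exceeds some latency with probability p. Then both do so with probability p², so a 1-in-100 slow response becomes roughly 1-in-10,000. The price is extra load. The asyncio sketch below illustrates that idea; it is not the specific mechanism from the post. `first_success` and the simulated `replica` coroutine are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def first_success(*attempts: Callable[[], Awaitable[T]]) -> T:
    """Run all attempts concurrently; return the first successful result.

    Slower attempts are cancelled once one succeeds. If every attempt fails,
    the last exception observed is re-raised.
    """
    if not attempts:
        raise ValueError("need at least one attempt")
    tasks = [asyncio.ensure_future(attempt()) for attempt in attempts]
    last_error: Optional[BaseException] = None
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                return await next_done
            except Exception as exc:  # this path failed; keep waiting for others
                last_error = exc
        assert last_error is not None
        raise last_error
    finally:
        for task in tasks:
            task.cancel()
        # Let cancelled attempts unwind and collect their outcomes so that
        # no "exception was never retrieved" warnings are emitted.
        await asyncio.gather(*tasks, return_exceptions=True)


async def replica(name: str, delay_s: float, fail: bool = False) -> str:
    """Simulated backend call (hypothetical): waits, then answers or fails."""
    await asyncio.sleep(delay_s)
    if fail:
        raise ConnectionError(f"{name} failed")
    return name


if __name__ == "__main__":
    print(asyncio.run(first_success(
        lambda: replica("replica-a", 0.2),
        lambda: replica("replica-b", 0.01),
    )))  # replica-b answers first
    print(asyncio.run(first_success(
        lambda: replica("replica-a", 0.2),
        lambda: replica("replica-b", 0.01, fail=True),
    )))  # replica-b fails fast, so replica-a's answer is used
```

The p² argument only holds if the two paths really fail independently; two replicas behind the same overloaded switch do not qualify. A cheaper variant sends the second request only after the first has been outstanding for longer than, say, its usual p95.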

Latency

Latency Engineering Systems Hardware

Hidden in Plain Sight - Public Key Crypto

Nick Desaulniers

FEBRUARY 22, 2015

As you might imagine, all of these back and forth trips made during the TLS handshake add latency overhead when compared to unencrypted HTTP requests. Peer review and open source, battle tested. I hope this post helped you understand how we can use cryptography to exchange secret information through public channels.

C++

C++ Servers Education Government

Aurora vs RDS: How to Choose the Right AWS Database Solution

Percona

JULY 1, 2023

It efficiently manages read and write operations, optimizes data access, and minimizes contention, resulting in high throughput and low latency to ensure that applications perform at their best. Percona XtraBackup is a free, online, open source, and complete database backup solution.

AWS

AWS Database Serverless Storage

HTTP/3: Practical Deployment Options (Part 3)

Smashing Magazine

SEPTEMBER 6, 2021

Finally, not inlining resources has an added latency cost because the file needs to be requested. Luckily, multiple companies have been working on open-source QUIC and HTTP/3 implementations for over five years now, so we have several mature and stable options to choose from. Support is unclear at this time.

Network

Network Servers Cache Traffic

Front-End Performance Checklist 2021

Smashing Magazine

JANUARY 11, 2021

CrUX generates an overview of performance distributions over time, with traffic collected from Google Chrome users. You can create your own on Chrome UX Dashboard. The SiteSpeed.io dashboard (open source), SpeedCurve and Calibre are just a few of the available tools, and you can find more on perf.rocks.

Performance

Performance Cache Media Metrics

Front-End Performance Checklist 2020 [PDF, Apple Pages, MS Word]

Smashing Magazine

JANUARY 6, 2020

There are many tools allowing you to achieve that: SiteSpeed.io dashboard (open source), SpeedCurve and Calibre are just a few of them, and you can find more tools on perf.rocks. For network throttling, we can use Network Link Conditioner on macOS, Windows Traffic Shaper on Windows, netem on Linux, and dummynet on FreeBSD.
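On Linux, netem is configured through the `tc` command from iproute2, for example `tc qdisc add dev eth0 root netem delay 100ms`. A minimal sketch of building such a command from Python follows. `netem_command` is a hypothetical helper, `eth0` is an assumed interface name, and only the `delay` (with optional jitter) and `loss` options are covered.

```python
import shlex
from typing import List


def netem_command(interface: str, delay_ms: int, jitter_ms: int = 0,
                  loss_pct: float = 0) -> List[str]:
    """Build argv for `tc` that adds a netem qdisc emulating a slow, lossy link.

    Hypothetical helper: run the resulting command as root on Linux, and remove
    the emulation afterwards with `tc qdisc del dev <interface> root`.
    """
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if jitter_ms:
        cmd.append(f"{jitter_ms}ms")  # optional jitter follows the delay value
    if loss_pct:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


if __name__ == "__main__":
    # Example: 100 ms +/- 10 ms of added delay and 1% packet loss on eth0.
    print(shlex.join(netem_command("eth0", 100, jitter_ms=10, loss_pct=1)))
    # tc qdisc add dev eth0 root netem delay 100ms 10ms loss 1%
```

Building an argument list rather than a shell string lets you pass it straight to `subprocess.run(cmd, check=True)` without shell-quoting issues.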

Performance

Performance Cache Network Metrics

Front-End Performance Checklist 2019 [PDF, Apple Pages, MS Word]

Smashing Magazine

JANUARY 7, 2019

There are many tools allowing you to achieve that: SiteSpeed.io dashboard (open source), SpeedCurve and Calibre are just a few of them, and you can find more tools on perf.rocks. For network throttling, we can use Network Link Conditioner on macOS, Windows Traffic Shaper on Windows, netem on Linux, and dummynet on FreeBSD.

Performance

Performance Cache Metrics Network
