Architecture, Latency and Tuning - Technology Performance Pulse

Allegro Reduces Kafka Producer Latency Outliers by 82% After Switching to XFS

InfoQ

APRIL 26, 2024

Allegro experimented with different performance optimization options to improve Apache Kafka producer tail latency and eventually switched all its clusters to the XFS filesystem. The company used Kafka protocol sniffing, JVM profiling, and eBPF, which proved instrumental in identifying and eliminating performance bottlenecks.

Latency

Latency Performance Tuning Design

Bending pause times to your will with Generational ZGC

The Netflix TechBlog

MARCH 5, 2024

Reduced tail latencies: In both our GRPC and DGS Framework services, GC pauses are a significant source of tail latencies. In fact, we’ve found for our services and architecture that there is no such trade off. No explicit tuning has been required to achieve these results. There is no best garbage collector.

Latency

Latency Java Tuning Efficiency

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. Logging is selective to cases where the old and new responses do not match.
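
The selective logging mentioned in the excerpt can be illustrated with a minimal sketch. It is not Netflix's implementation: the function name `compare_responses`, the dictionary-shaped responses, and the `ignore_keys` parameter are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("replay_compare")  # hypothetical logger name

_MISSING = object()  # sentinel so a missing key differs from a None value


def compare_responses(request_id, old_response, new_response, ignore_keys=()):
    """Return True when the old and new responses match.

    Nothing is logged for matching responses; a warning listing the
    differing fields is logged only when they do not match.
    """
    # Drop fields that are expected to differ (e.g. timestamps).
    old = {k: v for k, v in old_response.items() if k not in ignore_keys}
    new = {k: v for k, v in new_response.items() if k not in ignore_keys}
    if old == new:
        return True
    differing = sorted(
        k for k in old.keys() | new.keys()
        if old.get(k, _MISSING) != new.get(k, _MISSING)
    )
    logger.warning("mismatch for request %s in fields %s", request_id, differing)
    return False
```

Logging only mismatches keeps log volume proportional to the number of discrepancies rather than to total traffic.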

Traffic

Traffic Latency Tuning Systems

LinkedIn Migrates Espresso to HTTP2 and Reduces Connections by 88% and Latency by 75%

InfoQ

DECEMBER 4, 2023

LinkedIn was able to dramatically improve the scalability and performance of its Espresso database by migrating it from HTTP1.1 to HTTP2, resulting in a reduction in the number of connections, latency, and garbage collection times. To achieve these gains, the team had to optimize Netty’s default HTTP2 stack to make it fit their needs.

Latency

Latency Scalability Database Performance

Optimizing your Kubernetes clusters without breaking the bank

Dynatrace

JANUARY 14, 2022

Tuning thousands of parameters has become an impossible task to achieve via a manual and time-consuming approach. The following figure shows the high-level architecture where any load testing solution (e.g. SREcon21 – Automating Performance Tuning with Machine Learning. The Akamas approach. lower than 2%.).

Latency

Latency Tuning Efficiency AWS

Enhancing Kubernetes cluster management key to platform engineering success

Dynatrace

MARCH 29, 2024

“You can ask for the best configuration to reduce latency or improve the user experience.” And with automatic application tuning, teams spend less time on manually testing and reviewing configurations, resulting in up to five times the productivity of performance engineers, DevOps, and SREs when it comes to application optimization.

Engineering

Engineering DevOps Operating System Open Source

Comparing PostgreSQL DigitalOcean Performance & Pricing – ScaleGrid vs. DigitalOcean Managed Databases

Scalegrid

JUNE 4, 2020

Compare Latency. lower latency compared to DigitalOcean for PostgreSQL. Now, let’s take a look at the throughput and latency performance of our comparison. Next, we are going to test and compare the latency performance between ScaleGrid and DigitalOcean for PostgreSQL. PostgreSQL DigitalOcean Latency Averages (ms).

Database

Database Latency Benchmarking Performance

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. We have also noted a great potential for further improvement by model tuning (see the section of Rollout in Production).

Tuning

Tuning Efficiency Big Data Engineering

PostgreSQL Connection Pooling: Part 1 – Pros & Cons

Scalegrid

OCTOBER 17, 2019

Moving to a multithreaded architecture will require extensive rewrites. But that causes a problem with PostgreSQL’s architecture – forking a process becomes expensive when transactions are very short, as the common wisdom dictates they should be. The PostgreSQL Architecture | Source. The Connection Pool Architecture.
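
To make the pooling idea concrete, here is a minimal sketch of a fixed-size connection pool. It is an illustration only, not the design of any particular PostgreSQL pooler: the class name `ConnectionPool` and the `connect` factory argument are hypothetical, and `connect` stands in for whatever function opens a real database connection.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Open `size` connections once and reuse them for many short transactions."""

    def __init__(self, connect, size):
        # `connect` is a caller-supplied factory (an assumption of this sketch).
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is idle; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)
```

Because connections are created once up front, the per-connection setup cost (a process fork, in PostgreSQL's case) is paid `size` times rather than once per transaction.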

Architecture

Architecture Database Latency Servers

What is serverless computing? Driving efficiency without sacrificing observability

Dynatrace

JANUARY 26, 2021

Within this paradigm, it is possible to run entire architectures without touching a traditional virtual server, either locally or in the cloud. In a serverless architecture, applications are distributed to meet demand and scale requirements efficiently. When an application is triggered, it can cause latency as the application starts.

Serverless

Serverless Efficiency Lambda Azure

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

This is especially crucial in microservice architectures, where the number of components can be overwhelming. Stay tuned for more examples and easy-to-adopt automations provided in our public GitHub project. Furthermore, increasing the frequency of releases requires additional product lifecycle automation and setup.

Best Practices

Best Practices Code Infrastructure Latency

How To Scale a Single-Host PostgreSQL Database With Citus

Percona

NOVEMBER 3, 2023

Rather than listing the concepts, function calls, etc, available in Citus, which frankly is a bit boring, I’m going to explore scaling out a database system starting with a single host. And now, execute the benchmark:

-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench

The results are not pretty.

Database

Database Benchmarking Latency C++

The Most Important MySQL Setting

Percona

APRIL 7, 2023

If we were to select the most important MySQL setting, if we were given a freshly installed MySQL or Percona Server for MySQL and could only tune a single MySQL variable, which one would it be? To be fair, that is also true with PostgreSQL; it hasn’t been tuned either, and it, too, can also perform much better.

Tuning

Tuning Cache Servers Benchmarking

Discord Scales to 1 Million+ Online MidJourney Users in a Single Server

InfoQ

JANUARY 26, 2024

The company evolved the guild component, which is responsible for fanning out billions of message notifications, in a series of performance and scalability improvements supported by system observability and performance tuning. By Rafal Gancarz

Servers

Servers Tuning Scalability Performance

Towards a Reliable Device Management Platform

The Netflix TechBlog

AUGUST 30, 2021

System Setup. Architecture: The following diagram summarizes the architecture description. Figure 1: Event-sourcing architecture of the Device Management Platform. By the following morning, alerts were received regarding high memory consumption and GC latencies, to the point where the service was unresponsive to HTTP requests.

Latency

Latency Traffic Transportation Hardware

Plan Your Multi Cloud Strategy

Scalegrid

MARCH 22, 2024

They can also bolster uptime and limit latency issues or potential downtimes. Adopting Infrastructure as Code (IaC) makes transitioning to a multi-cloud architecture more efficient, allowing streamlined setup processes.

Strategy

Strategy Cloud Government Innovation

What are SLOs? How service-level objectives work with SLIs to deliver on SLAs

Dynatrace

DECEMBER 2, 2021

As organizations adopt microservices-based architecture, service-level objectives (SLOs) have become a vital way for teams to set specific, measurable targets that ensure users are receiving agreed-upon service levels. You can set SLOs based on individual indicators, such as batch throughput, request latency, and failures-per-second.
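
As a rough illustration of an SLO built on a request-latency indicator, the sketch below computes a service-level indicator (SLI) as the fraction of requests served within a threshold and compares it with a target. The function names, the 200 ms threshold, the 95% target, and the sample latencies are all assumptions chosen for the example, not values from the article.

```python
def latency_sli(latencies_ms, threshold_ms):
    """Fraction of requests whose latency is at or below the threshold."""
    if not latencies_ms:
        raise ValueError("no requests observed")
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms)


def meets_slo(sli, target):
    """True when the measured indicator satisfies the objective."""
    return sli >= target


# Hypothetical sample: 8 of 10 requests finish within 200 ms.
sample_ms = [120, 80, 300, 95, 110, 640, 70, 90, 105, 100]
sli = latency_sli(sample_ms, threshold_ms=200)
print(sli)                           # 0.8
print(meets_slo(sli, target=0.95))   # False
```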

Metrics

Metrics Best Practices DevOps Infrastructure

The evolution of single-core bandwidth in multicore processors

John McCalpin

APRIL 25, 2023

To understand what is happening here, we need to understand the way memory bandwidth interacts with memory latency and the concurrency (parallelism) of memory accesses. Stay tuned! I don’t expect all of that, but the core can clearly make use of more than 20 GB/s. Why is the single-core bandwidth increasing so slowly?
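
The interaction between bandwidth, latency, and concurrency follows Little's law: sustained bandwidth equals the bytes in flight divided by the latency. The sketch below works through that arithmetic. The 80 ns latency, the 64-byte cache line, and the count of 10 outstanding lines are assumed values for illustration; only the 20 GB/s figure comes from the excerpt.

```python
def bandwidth_gb_per_s(outstanding_lines, latency_ns, line_bytes=64):
    """Little's law: bytes in flight / latency. Bytes per ns equals GB/s."""
    return outstanding_lines * line_bytes / latency_ns


def lines_needed(target_gb_per_s, latency_ns, line_bytes=64):
    """Cache lines that must be in flight to sustain the target bandwidth."""
    return target_gb_per_s * latency_ns / line_bytes


# Assumed: 10 outstanding 64-byte lines at 80 ns memory latency.
print(bandwidth_gb_per_s(10, 80))  # 8.0
# Concurrency needed for 20 GB/s at the same assumed latency.
print(lines_needed(20, 80))        # 25.0
```

Under these assumptions a single core would need roughly 25 cache lines in flight to reach 20 GB/s, which shows why limited per-core concurrency caps single-core bandwidth even when latency stays constant.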

Benchmarking

Benchmarking Cache Latency Tuning

Snap: a microkernel approach to host networking

The Morning Paper

NOVEMBER 10, 2019

Here are the bombshell paragraphs: Our datacenter applications seek ever more CPU-efficient and lower-latency communication, which Pony Express delivers. Rather than reimplement TCP/IP or refactor an existing transport, we started Pony Express from scratch to innovate on more efficient interfaces, architecture, and protocol.

Network

Network Transportation Latency Entertainment

What Adrian Did Next: 2022 Conference Appearances

Adrian Cockcroft

AUGUST 1, 2022

I gave a talk at Monitorama in Portland, Oregon in June, which set out the idea that carbon is just another metric to monitor, and that in a few years most of the monitoring and performance tuning tools are going to be reporting and optimizing for carbon alongside latency, throughput, availability and cost.

AWS

AWS Virtualization DevOps Latency

Monitoring Serverless Applications

Dotcom-Monitor

NOVEMBER 11, 2020

In serverless architecture, when applications are developed, they are typically composed of many different services. Other benefits to serverless architecture include the following: Cost. There is plenty to like about moving to a serverless architecture, but there can be some disadvantages compared to the traditional, monolithic model.

Serverless

Serverless Monitoring Lambda Latency

Testing MySQL 8.0.16 on Skylake with innodb_spin_wait_pause_multiplier

HammerDB

MAY 5, 2019

Note that the main developer of HammerDB is an Intel employee (#IAMINTEL); however, HammerDB is a personal open source project and HammerDB has no optimization whatsoever for a database running on any particular architecture. In the recent MySQL 8.0.16 So to test I took a system with Skylake CPUs and all storage on a P4800X SSD. linux-glibc2.12-x86_64/data

Testing

Testing Tuning Latency Storage

MongoDB Best Practices: Security, Data Modeling, & Schema Design

Percona

APRIL 17, 2023

The main objective of this post is to share my experience over the past years tuning MongoDB and to gather in one place the diverse sources that I came across along the way. The swap issue is explained in the excellent article by Jeremy Cole on swap insanity and the NUMA architecture. Two other schedulers are deadline and noop.

Best Practices

Best Practices Design Tuning Database

CDN Web Application Firewall (WAF): Your Shield Against Online Threats

IO River

NOVEMBER 15, 2023

Two of them are particularly gnarly: fine-tuning rules to perfection and managing a WAF over a multi-CDN architecture. Configuring and Maintaining WAF on a Multi-CDN: Multi-CDN architectures, the double-edged swords. Let's dive deep into these challenges. 1. But instead of porridge, we're talking about WAF rules.

Traffic

Traffic Network Logistics Architecture

A case for managed and model-less inference serving

The Morning Paper

JUNE 13, 2019

Making queries to an inference engine has many of the same throughput, latency, and cost considerations as making queries to a datastore, and more and more applications are coming to depend on such queries. The following figure highlights how just one of these variables, batch size, impacts throughput and latency on ResNet50.

Hardware

Hardware Latency Serverless Energy

Updated Azure SQL Database Tier Options

SQL Performance

APRIL 27, 2020

I highly recommend that you take a look at the diagram that breaks down the architecture and how it all works in this article. This works well for many SQL Server workloads; however, there have been use cases for a lower CPU latency and higher clock speed for CPU-heavy workloads and a need for higher memory per vCore.

Azure

Azure Database Serverless Hardware

CDN Web Application Firewall (WAF): Your Shield Against Online Threats

IO River

NOVEMBER 2, 2023

Two of them are particularly gnarly: fine-tuning rules to perfection and managing a WAF over a multi-CDN architecture. Configuring and Maintaining WAF on a Multi-CDN: Multi-CDN architectures, the double-edged swords. Let's dive deep into these challenges. 1. That's where Bot Detection comes in.

Traffic

Traffic Network Logistics Architecture

5 tips for architecting fast data applications

O'Reilly Software

APRIL 4, 2018

Considerations for setting the architectural foundations for a fast data platform. Google was among the pioneers that created “web scale” architectures to analyze the massive data sets that resulted from “crawling” the web that gave birth to Apache Hadoop, MapReduce, and NoSQL databases. Back in the days of Web 1.0, At least once?

Architecture

Architecture Scalability Google Operating System

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

Here are 8 fallacies of data pipelines:
1. The pipeline is reliable
2. Topology is stateless
3. Pipeline is infinitely scalable
4. Processing latency is minimum
5. Everything is observable
6. There is no domino effect
7. Pipeline is cost-effective
8. Data is homogeneous
The pipeline is reliable: The inconvenient truth is that the pipeline is not reliable.

Latency

Latency Analytics Scalability Engineering

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

This architecture shift greatly reduced the processing latency and increased system resiliency. We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements as compared to the traditional streaming use case. divide the input video into small chunks 2.

Processing

Processing Media Latency Innovation

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing)

John McCalpin

JANUARY 22, 2018

Each of the two vector units can issue one FMA instruction per cycle, assuming that there are enough independent accumulators to tolerate the 6-cycle dependent-operation latency. of the “adjusted peak performance”, there is no longer a significant upside to performance tuning. vfmadd213pd %zmm16, %zmm17, %zmm26.
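
The accumulator requirement in the excerpt is a latency-times-throughput calculation, sketched below. The two vector units and the 6-cycle latency come from the excerpt; the 512-bit vector width and 64-bit elements in the second function are assumptions based on the `zmm` registers and the `pd` (packed double) suffix in the quoted instruction. The function names are hypothetical.

```python
def min_independent_accumulators(vector_units, fma_latency_cycles,
                                 issue_per_unit_per_cycle=1):
    """Independent FMA chains needed to keep every unit busy each cycle.

    Each unit issues one FMA per cycle, and a dependent FMA cannot start
    until the previous result is ready, so units * latency chains are needed.
    """
    return vector_units * issue_per_unit_per_cycle * fma_latency_cycles


def peak_flops_per_cycle(vector_units, vector_bits=512, element_bits=64):
    """Peak floating-point operations per cycle; one FMA counts as 2 flops."""
    lanes = vector_bits // element_bits
    return vector_units * lanes * 2


print(min_independent_accumulators(2, 6))  # 12
print(peak_flops_per_cycle(2))             # 32
```

With fewer than 12 independent accumulators, the dependent-operation latency rather than the issue rate would bound throughput.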

Latency

Latency Hardware Code Testing

Working at Netflix 2017

Brendan Gregg

MAY 16, 2017

You might imagine that at some point we had a major scaling crisis, where it looked like we'd fail due to an architectural bottleneck, and engineers worked long nights and weekends to save Netflix from certain disaster. A latency outlier issue that happened every 15 minutes. That'd make a great story, but it didn't happen.

Java

Java Entertainment Engineering Scalability

QCon London: Lessons Learned From Building LinkedIn’s AI/ML Data Platform

InfoQ

APRIL 15, 2024

At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence. The presenter shared the lessons learned from evolving and operating the platform, including cluster management and library versioning.

Artificial Intelligence

Artificial Intelligence Big Data Data Engineering Latency

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

Table 1: Movie and File Size Examples. Initial Architecture: A simplified view of our initial cloud video processing pipeline is illustrated in the following diagram. Figure 1: A Simplified Video Processing Pipeline. With this architecture, chunk encoding is very efficient and processed in distributed cloud computing instances.

Cloud

Cloud Media Storage Cache

The Netflix Cosmos Platform

The Netflix TechBlog

MARCH 1, 2021

It supports both high throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The subsystems all communicate with each other asynchronously via Timestone, a high-scale, low-latency priority queuing system.

Serverless

Serverless Media Latency Social Media

Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

JUNE 13, 2023

By collecting and analyzing key performance metrics of the service over time, we can assess the impact of the new changes and determine if they meet the availability, latency, and performance requirements. They enable us to further fine-tune and configure the system, ensuring the new changes are integrated smoothly and seamlessly.

Traffic

Traffic Metrics Systems Strategy

Why LinkedIn chose gRPC+Protobuf over REST+JSON: Q&A with Karthik Ramgopal and Min Chen

InfoQ

DECEMBER 27, 2023

LinkedIn announced that it would be moving to gRPC with Protocol Buffers for the inter-service communication in its microservices platform, where previously an open-source Rest.li framework was used with JSON as a primary serialization format.

Open Source

Open Source Latency Tuning Scalability

Achieving observability in async workflows

The Netflix TechBlog

MAY 14, 2021

Managing and operating asynchronous workflows can be difficult without the proper tools and architecture that puts observability, debugging, and tracing at the forefront. We are expected to process 1,000 watermarks for a single distribution in a minute, with non-linear latency growth as the number of watermarks increases.

Traffic

Traffic Latency Java Google

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

At Netflix, we also heavily embrace a microservice architecture that emphasizes separation of concerns. The data warehouse is not designed to serve point requests from microservices with low latency. Therefore, we must efficiently move data from the data warehouse to a global, low-latency and highly-reliable key-value store.

Latency

Latency Storage Big Data Tuning

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. These principles reduce resource usage by being more efficient and effective while lowering the end-to-end latency in data processing. More processing resources.

Storage

Storage Latency Efficiency Data Engineering

Zero Configuration Service Mesh with On-Demand Cluster Discovery

The Netflix TechBlog

AUGUST 29, 2023

In this architecture, service to service communication no longer goes through the single point of failure of a load balancer. The above architecture has served us well over the last decade, though changing business needs and evolving industry standards have added more complexity to our IPC ecosystem in a number of ways.

Traffic

Traffic Latency Cloud C++

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

Motivation: With the rapid growth in Netflix member base and the increasing complexity of our systems, our architecture has evolved into an asynchronous one that enables both online and offline computation. Architecture: As shown in the diagram above, the RENO service can be broken down into the following components.

Systems

Systems Traffic Architecture Mobile

Netflix Video Quality at Scale with Cosmos Microservices

The Netflix TechBlog

NOVEMBER 2, 2021

The Reloaded system is a well-matured and scalable system, but its monolithic architecture can slow down rapid innovation. This enables us to use our scale to increase throughput and reduce latencies. Here, based on the video length, the throughput and latency requirements, available scale etc., via bug fixes).

Media

Media Innovation Metrics Latency

Data Movement in Netflix Studio via Data Mesh

The Netflix TechBlog

JULY 26, 2021

Netflix is known for its loosely coupled microservice architecture and with a global studio footprint, surfacing and connecting the data from microservices into a studio data catalog in real time has become more important than ever. Most of the business views created on top of the Iceberg tables can tolerate a few minutes of latency.

Big Data

Big Data Government Analytics Processing
