Architecture, Cache, Hardware and Latency - Technology Performance Pulse

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

Current status, needs, and challenges in Heterogeneous and Composable Memory from the HCM workshop (HPCA’23)

ACM Sigarch

MAY 31, 2023

Introduction: Memory systems are evolving into heterogeneous and composable architectures. There are three common mechanisms to access remote memory: modifying applications, modifying virtual memory, and hardware-level cache coherence support. Figure 2: Latency characteristics of memory technologies (source: Maruf et al.,

Latency

Latency Hardware Cache Architecture

USENIX SREcon APAC 2022: Computing Performance: What's on the Horizon

Brendan Gregg

FEBRUARY 28, 2023

My personal opinion is that I don't see a widespread need for more capacity given horizontal scaling and servers that can already exceed 1 Tbyte of DRAM; bandwidth is also helpful, but I'd be concerned about the increased latency for adding a hop to more memory. Ford, et al., “TCP

Performance

Performance Latency Cache Virtualization

Understanding operational 5G: a first measurement study on its coverage, performance and energy consumption

The Morning Paper

OCTOBER 4, 2020

We are standing on the eve of the 5G era… 5G, as a monumental shift in cellular communication technology, holds tremendous potential for spurring innovations across many vertical industries, with its promised multi-Gbps speed, sub-10 ms low latency, and massive connectivity. Throughput and latency. energy consumption).

Energy

Energy Latency Performance Network

Memory Latency on the Intel Xeon Phi x200 “Knights Landing” processor

John McCalpin

DECEMBER 6, 2016

The Xeon Phi x200 (Knights Landing) has a lot of modes of operation (selected at boot time), and the latency and bandwidth characteristics are slightly different for each mode. In “Cache” mode, MCDRAM memory is used as an L3 cache for the main DDR4 memory. numactl).

Latency

Latency Cache Testing Systems

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

The Morning Paper

MAY 14, 2019

Last time around we looked at the DeathStarBench suite of microservices-based benchmark applications and learned that microservices systems can be especially latency sensitive, and that hotspots can propagate through a microservices architecture in interesting ways. on end-to-end latency) and less than 0.15% on throughput.

Big Data

Big Data Cloud Performance Hardware

How Google PageSpeed Works: Improve Your Score and Search Engine Ranking

CSS-Tricks

JULY 25, 2019

Cache-Headers missing? If you’re interested in a high-level overview of Lighthouse architecture, read this guide from the official repository. Estimated Input Latency. Service workers that will cache the bytecode result of a parsed and compiled script. What changed in PageSpeed 5.0?

Google

Google Engineering Speed Mobile

Redis® Monitoring Strategies for 2024

Scalegrid

DECEMBER 21, 2023

With its widespread use in modern application architectures, understanding the ins and outs of Redis® monitoring is essential for any tech professional. Identifying key Redis® metrics such as latency, CPU usage, and memory metrics is crucial for effective Redis monitoring. Redis®, a powerful in-memory data store, is no exception.

Strategy

Strategy Monitoring Latency DevOps

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

Key Takeaways: Distributed storage systems benefit organizations by enhancing data availability, fault tolerance, and system scalability, leading to cost savings from reduced hardware needs, energy consumption, and personnel. By implementing data replication strategies, distributed storage systems achieve greater.

Storage

Storage Systems Big Data Azure

This spring: High-Performance and Low-Latency C++ (Stockholm) and ACCU (Bristol)

Sutter's Mill

FEBRUARY 13, 2017

Tue-Thu Apr 25-27: High-Performance and Low-Latency C++ (Stockholm). On April 25-27, I’ll be in Stockholm (Kista) giving a three-day seminar on “High-Performance and Low-Latency C++.”

Latency

Latency C++ Hardware Performance

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

The Morning Paper

MAY 12, 2019

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems Gan et al., A typical architecture diagram for one of these services looks like this: Suitably armed with a set of benchmark microservices applications, the investigation can begin! Hardware implications.

Open Source

Open Source Hardware Benchmarking Systems

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

This system was designed to supplement and succeed the existing Hadoop-based system, whose data-processing latency and maintenance costs were both too high. In many cases, the join is performed on a finite time window or another type of buffer, e.g., an LFU cache that contains the most frequent tuples in the stream. Jacobsen and R.

Big Data

Big Data Processing Lambda Database

Invited Talk at SuperComputing 2016!

John McCalpin

OCTOBER 16, 2016

The talk will conclude with a discussion of near-term trends in HPC system balances and some ideas on the fundamental architectural changes that will be required if we ever want to obtain large reductions in cost and power consumption. The official announcement: SC16 Invited Talk Spotlight: Dr. John D.

Architecture

Architecture Systems Technology

Time protection: the missing OS abstraction

The Morning Paper

APRIL 14, 2019

The paper sets out what we can do in software given today’s hardware, and along the way also highlights areas where cooperation from hardware will be needed in the future. cache) can be partitioned across domains; for those that are instead time-multiplexed, we have to flush them during domain switches. Threat scenarios.

Hardware

Hardware Cache Latency Speed

Updated Azure SQL Database Tier Options

SQL Performance

APRIL 27, 2020

Gen 5 is now the primary hardware option for most regions since Gen 4 is aging out. Hyperscale achieves high performance from each compute node having SSD-based caches, which help minimize the network round trips to fetch data. New Hardware Configuration for Provisioned Compute Tier. GB per vCore.

Azure

Azure Database Serverless Hardware

Embrace event-driven computing: Amazon expands DynamoDB with streams, cross-region replication, and database triggers

All Things Distributed

JULY 14, 2015

In this blog post, I will explain how these three new capabilities empower you to build applications with distributed systems architecture and create responsive, reliable, and high-performance applications using DynamoDB that work at any scale. DynamoDB Cross-region Replication.

Database

Database Lambda AWS IoT

Why I hate MPI (from a performance analysis perspective)

John McCalpin

AUGUST 1, 2018

According to Dr. Bandwidth, performance analysis has two recurring themes: How fast should this code (or “simple” variations on this code) run on this hardware? The user environment defines the mapping of MPI ranks to hardware resources (cores, sockets, nodes). The MPI runtime library. in ways that are seldom transparent.

Hardware

Hardware Transportation Performance Latency

SQL Server I/O Basics Chapter #1

SQL Server According to Bob

JANUARY 11, 2020

Stable media is commonly physical disk storage, but other devices and certain caching facilities qualify as well. Many high-end disk subsystems provide high-speed cache facilities to reduce the latency of read and write operations. This cache is often supported by a battery-powered backup facility.

Servers

Servers Cache Media Hardware

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture. Apache Arrow's in-memory columnar layout is specifically optimized for data locality for better performance on modern hardware like CPUs and GPUs.

Big Data

Big Data Artificial Intelligence Storage Hardware

MongoDB Best Practices: Security, Data Modeling, & Schema Design

Percona

APRIL 17, 2023

The swap issue is explained in the excellent article by Jeremy Cole at the Swap Insanity and NUMA Architecture. The CFQ works well for many general use cases but lacks latency guarantees. The deadline excels at latency-sensitive use cases (like databases), and noop is closer to no schedule at all.

Best Practices

Best Practices Design Tuning Database

The Performance Inequality Gap, 2021

Alex Russell

MARCH 6, 2021

A then-representative $200 USD device had 4-8 slow (in-order, low-cache) cores, ~2GiB of RAM, and relatively slow MLC NAND flash storage. Hardware Past As Performance Prologue. Regardless, the overall story for hardware progress remains grim, particularly when we recall how long device replacement cycles are.

Performance

Performance Network Cache Metrics

Amazon DynamoDB - a Fast and Scalable NoSQL Database.

All Things Distributed

JANUARY 18, 2012

Amazon DynamoDB offers low, predictable latencies at any scale. This architectural pattern was a response to the scaling challenges that had challenged Amazon.com through its first 5 years, when direct database access was one of the major bottlenecks in scaling and operating the business. This impacts the predictability of a Domain’s

Scalability

Scalability Database Ecommerce Latency

Intel discloses “vector+SIMD” instructions for future processors

John McCalpin

NOVEMBER 5, 2016

The art and science of microprocessor architecture is a never-ending struggle to balance complexity, verifiability, usability, expressiveness, compactness, ease of encoding/decoding, energy consumption, backwards compatibility, forwards compatibility, and other factors. This includes Haswell and newer cores.

Cache

Cache C++ Latency Hardware

Can You Afford It?: Real-world Web Performance Budgets

Alex Russell

OCTOBER 22, 2017

One distinct trend is a belief that a JavaScript framework and Single-Page Architecture (SPA) is a must for PWA development. It simulates a link with a 400ms RTT and 400-600Kbps of throughput (plus latency variability and simulated packet loss). Our baseline, then, should probably trade lower throughput/higher-latency for packet loss.

Performance

Performance Benchmarking Network Mobile

Declarative recursive computation on an RDBMS

The Morning Paper

SEPTEMBER 12, 2019

SQL provides a declarative programming interface, below which the system itself can figure out the most effective execution plans based on data size and statistics, layout, compute hardware etc. Declarative recursive computation on an RDBMS… or, why you should use a database for distributed machine learning Jankov et al., VLDB’19.

Network

Network Database Programming Hardware

Front-End Performance Checklist 2021

Smashing Magazine

JANUARY 11, 2021

Defining The Environment: Choosing a framework, baseline performance cost, Webpack, dependencies, CDN, front-end architecture, CSR, SSR, CSR + SSR, static rendering, prerendering, PRPL pattern. Estimated Input Latency tells us if we are hitting that threshold, and ideally, it should be below 50ms.

Performance

Performance Cache Media Metrics

The evolution of single-core bandwidth in multicore processors

John McCalpin

APRIL 25, 2023

For most high-end processors these values have remained in the range of 75% to 85% of the peak DRAM bandwidth of the system over the past 15-20 years — an amazing accomplishment given the increase in core count (with its associated cache coherence issues), number of DRAM channels, and ever-increasing pipelining of the DRAMs themselves.

Benchmarking

Benchmarking Cache Latency Tuning

Front-End Performance Checklist 2020 [PDF, Apple Pages, MS Word]

Smashing Magazine

JANUARY 6, 2020

Estimated Input Latency tells us if we are hitting that threshold, and ideally, it should be below 50ms. Designed for the modern web, it responds to actual congestion, rather than packet loss like TCP does; it is significantly faster, with higher throughput and lower latency, and the algorithm works differently.

Performance

Performance Cache Network Metrics

Front-End Performance Checklist 2019 [PDF, Apple Pages, MS Word]

Smashing Magazine

JANUARY 7, 2019

Estimated Input Latency tells us if we are hitting that threshold, and ideally, it should be below 50ms. On the other hand, we have hardware constraints on memory and CPU due to JavaScript parsing times (we’ll talk about them in detail later). Bonus: there is also a Webpack config configurator that generates a basic configuration file.

Performance

Performance Cache Metrics Network
