2016, Cache, Hardware and Latency - Technology Performance Pulse

USENIX SREcon APAC 2022: Computing Performance: What's on the Horizon

Brendan Gregg

FEBRUARY 28, 2023

My personal opinion is that I don't see a widespread need for more capacity given horizontal scaling and servers that can already exceed 1 Tbyte of DRAM; bandwidth is also helpful, but I'd be concerned about the increased latency for adding a hop to more memory.

Performance

Performance Latency Cache Virtualization

Invited Talk at SuperComputing 2016!

John McCalpin

OCTOBER 16, 2016

“Memory Bandwidth and System Balance in HPC Systems” If you are planning to attend the SuperComputing 2016 conference in Salt Lake City next month, be sure to reserve a spot on your calendar for my talk on Wednesday afternoon (4:15pm-5:00pm).

Architecture

Architecture Systems Technology Technology

The Performance Inequality Gap, 2021

Alex Russell

MARCH 6, 2021

Back in 2016, I gave a talk outlining the causes and effects of the terrible performance of web apps built using popular tools on the fastest-growing device segment: low-end to mid-range Android phones. A then-representative $200USD device had 4-8 slow (in-order, low-cache) cores, ~2GiB of RAM, and relatively slow MLC NAND flash storage.

Performance

Performance Network Cache Metrics

Progress Delayed Is Progress Denied

Alex Russell

APRIL 29, 2021

At the time of the last Confluence run, the gap had stretched to nearly 1000 APIs, doubling since 2016. For heavily latency-sensitive use-cases like WebXR, this is a critical component in delivering a good experience. An extension to Service Workers that enables browsers to present users with cached content when offline.

Media

Media Games Education Engineering

Intel discloses “vector+SIMD” instructions for future processors

John McCalpin

NOVEMBER 5, 2016

In the latest (October 2016) revision of Intel’s Instruction Extensions Programming Reference, Intel has disclosed a fairly dramatic departure from these “traditional” approaches. With 2 FMA units that have 5-cycle latency, the code must implement at least 2*5=10 independent accumulators in order to avoid stalls.
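The accumulator count in the excerpt comes from multiplying the number of FMA units by the FMA latency. Below is a minimal Python sketch of that arithmetic, together with a toy dot product that spreads its partial sums across that many accumulators. The function names are hypothetical, and Python gains no speed from this structure; it only illustrates the independent dependency chains that compiled vector code would need.

```python
def min_independent_accumulators(fma_units: int, fma_latency_cycles: int) -> int:
    """Number of FMAs that must be in flight to keep every unit busy each cycle."""
    return fma_units * fma_latency_cycles


def dot_with_accumulators(a, b, n_acc):
    """Dot product using n_acc separate partial sums (hypothetical helper).

    Each partial sum forms its own dependency chain, so consecutive
    multiply-adds do not have to wait on one another's results.
    """
    acc = [0.0] * n_acc
    for i, (x, y) in enumerate(zip(a, b)):
        acc[i % n_acc] += x * y
    # Combine the partial sums once at the end.
    return sum(acc)


# Values from the excerpt: 2 FMA units, 5-cycle latency.
n_acc = min_independent_accumulators(2, 5)
result = dot_with_accumulators([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], n_acc)
```

With fewer accumulators than this minimum, each new multiply-add would have to wait for an earlier one in the same chain to finish, which is the stall the excerpt describes.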

Cache

Cache C++ Latency Hardware

Can You Afford It?: Real-world Web Performance Budgets

Alex Russell

OCTOBER 22, 2017

It simulates a link with a 400ms RTT and 400-600Kbps of throughput (plus latency variability and simulated packet loss). Simulated packet loss and variable latency, however, can make benchmarking extremely difficult and slow. Our baseline, then, should probably trade lower throughput/higher-latency for packet loss.
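As a rough illustration of what such a link means in practice, the sketch below estimates how long a payload takes to arrive at the excerpt's 400ms RTT and 400-600Kbps throughput. The 100 KiB payload, the single round trip, and the function name are illustrative assumptions rather than figures from the article, and the model ignores packet loss, TCP slow start, and connection setup.

```python
def transfer_seconds(payload_bytes, throughput_kbps, rtt_ms, round_trips=1):
    """Naive estimate: time on the wire plus time spent waiting on round trips."""
    wire_time = payload_bytes * 8 / (throughput_kbps * 1000)  # Kbps taken as 1000 bits/s
    latency_time = round_trips * rtt_ms / 1000
    return wire_time + latency_time


payload = 100 * 1024  # hypothetical 100 KiB resource

# Bounds of the simulated link described in the excerpt.
slowest = transfer_seconds(payload, throughput_kbps=400, rtt_ms=400)
fastest = transfer_seconds(payload, throughput_kbps=600, rtt_ms=400)

print(f"{fastest:.2f}s to {slowest:.2f}s")
```

Real page loads involve several round trips (DNS, TCP, TLS, request), so raising `round_trips` gives a more pessimistic and likely more realistic estimate.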

Performance

Performance Benchmarking Network Mobile
