Remove 2016 Remove Cache Remove Hardware Remove Latency
article thumbnail

USENIX SREcon APAC 2022: Computing Performance: What's on the Horizon

Brendan Gregg

My personal opinion is that I don't see a widespread need for more capacity given horizontal scaling and servers that can already exceed 1 Tbyte of DRAM; bandwidth is also helpful, but I'd be concerned about the increased latency for adding a hop to more memory. Ford, et al., “TCP

article thumbnail

USENIX SREcon APAC 2022: Computing Performance: What's on the Horizon

Brendan Gregg

My personal opinion is that I don't see a widespread need for more capacity given horizontal scaling and servers that can already exceed 1 Tbyte of DRAM; bandwidth is also helpful, but I'd be concerned about the increased latency for adding a hop to more memory. Ford, et al., “TCP

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Invited Talk at SuperComputing 2016!

John McCalpin

“Memory Bandwidth and System Balance in HPC Systems” If you are planning to attend the SuperComputing 2016 conference in Salt Lake City next month, be sure to reserve a spot on your calendar for my talk on Wednesday afternoon (4:15pm-5:00pm).

article thumbnail

The Performance Inequality Gap, 2021

Alex Russell

Back in 2016, I gave a talk outlining the causes and effects of the terrible performance of web apps built using popular tools on the fastest-growing device segment: low-end to mid-range Android phones. A then-representative $200USD device had 4-8 slow (in-order, low-cache) cores, ~2GiB of RAM, and relatively slow MLC NAND flash storage.

article thumbnail

Progress Delayed Is Progress Denied

Alex Russell

At the time of the last Confluence run, the gap had stretched to nearly 1000 APIs, doubling since 2016. For heavily latency-sensitive use-cases like WebXR, this is a critical component in delivering a good experience. An extension to Service Workers that enables browsers to present users with cached content when offline.

Media 145
article thumbnail

Intel discloses “vector+SIMD” instructions for future processors

John McCalpin

In the latest (October 2016) revision of Intel’s Instruction Extensions Programming Reference , Intel has disclosed a fairly dramatic departure from these “traditional” approaches. With 2 FMA units that have 5-cycle latency, the code must implement at least 2*5=10 independent accumulators in order to avoid stalls.

Cache 40
article thumbnail

Can You Afford It?: Real-world Web Performance Budgets

Alex Russell

It simulates a link with a 400ms RTT and 400-600Kbps of throughput (plus latency variability and simulated packet loss). Simulated packet loss and variable latency, however, can make benchmarking extremely difficult and slow. Our baseline, then, should probably trade lower throughput/higher-latency for packet loss.