Why I hate MPI (from a performance analysis perspective)

John McCalpin

According to Dr. Bandwidth, performance analysis has two recurring themes: How fast should this code (or “simple” variations on this code) run on this hardware? The user environment defines the mapping of MPI ranks to hardware resources (cores, sockets, nodes). The MPI runtime library [...] in ways that are seldom transparent.

Automating chaos experiments in production

The Morning Paper

This is a fascinating paper from members of Netflix’s Resilience Engineering team describing their chaos engineering initiatives: automated controlled experiments designed to verify hypotheses about how the system should behave under gray failure conditions, and to probe for and flush out any weaknesses.

A persistent problem: managing pointers in NVM

The Morning Paper

Byte-addressable non-volatile memory (NVM) will fundamentally change the way hardware interacts, the way operating systems are designed, and the way applications operate on data. Therefore any programming abstraction must be low latency and the kernel needs to be kept off the path of persistent data access as much as possible.

COVID-19 Hazard Analysis using STPA

Adrian Cockcroft

There are many possible failure modes, and each exercises a different aspect of resilience. Another problem is that a design control, intended to mitigate a failure mode, may not work as intended. STPA is based on a functional control diagram of the system, and the safety constraints and requirements for each component in the design.

Failure Modes and Continuous Resilience

Adrian Cockcroft

There are many possible failure modes, and each exercises a different aspect of resilience. Another problem is that a design control, intended to mitigate a failure mode, may not work as intended. This discussion focuses on hardware, software and operational failure modes. Book: Engineering a Safer World by Nancy G.

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing)

John McCalpin

There was no deep goal — just a desire to see the maximum GFLOPS in action. The exercise seemed simple enough — just fix one item in the Colfax code and we should be finished. This is an uninspiring fraction of peak performance that would normally suggest significant inefficiencies in either the hardware or software.
