article thumbnail

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Using simple lookup indices in Cassandra gives us the ability to maintain acceptable read latencies while doing heavy writes.

article thumbnail

Percentiles don’t work: Analyzing the distribution of response times for web services

Adrian Cockcroft

There is no way to model how much more traffic you can send to that system before it exceeds it’s SLA. Bill Kaiser of NewRelic published this blog in 2017 which goes some way towards what I’m talking about, but since then I have figured out a new way to interpret the data. Mu is the mean of each component, the latency.

Lambda 98
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Ciao Milano! – An AWS Region is coming to Italy!

All Things Distributed

Since then, AWS has added another PoP in Palermo in 2017. The website went online in less than one month and was able to support a 250 percent increase in traffic around the launch of the Aventador J. To meet such large traffic numbers, they need a technology infrastructure that is secure, reliable, and flexible.

AWS 167
article thumbnail

Revisiting “Serverless Architectures”

The Symphonia

Gojko Adzic has done some great speaking and writing on his experience here, and I included the link to his talk from late 2017. I also rewrote the section on Startup Latency since Cold Starts are one of the big “FUD” areas of Serverless. I was glad to be able to talk about Amazon’s automated traffic shifting / canary releases.

article thumbnail

Understanding the Importance of 5 Nines Availability

IO River

Delta Air Lines experienced a severe system outage in 2017, resulting in flight cancellations and delays across their network. The stakes are even higher during high-traffic periods such as Black Friday or Cyber Monday. High availability is a business imperative in this sector.

article thumbnail

Understanding the Importance of 5 Nines Availability

IO River

Delta Air Lines experienced a severe system outage in 2017, resulting in flight cancellations and delays across their network. The stakes are even higher during high-traffic periods such as Black Friday or Cyber Monday. High availability is a business imperative in this sector.

article thumbnail

The Speed of Time

Brendan Gregg

A Cassandra database cluster had switched to Ubuntu and noticed write latency increased by over 30%. Since instances of both CentOS and Ubuntu were running in parallel, I could collect flame graphs at the same time (same time-of-day traffic mix) and compare them side by side.

Speed 126