Remove 2019 Remove Latency Remove Traffic Remove Tuning
article thumbnail

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Our engineering teams tuned their services for performance after factoring in increased resource utilization due to tracing.

article thumbnail

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure By Manuel Correa , Arthur Gonigberg , and Daniel West Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Logs and background requests are examples of this type of traffic.

Traffic 252
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Achieving observability in async workflows

The Netflix TechBlog

Prodicle Distribution Our service is required to be elastic and handle bursty traffic. We are expected to process 1,000 watermarks for a single distribution in a minute, with non-linear latency growth as the number of watermarks increases. Things got hairy. We wanted a scalable service that was near real-time, 2.

Traffic 160
article thumbnail

The Speed of Time

Brendan Gregg

A Cassandra database cluster had switched to Ubuntu and noticed write latency increased by over 30%. Since instances of both CentOS and Ubuntu were running in parallel, I could collect flame graphs at the same time (same time-of-day traffic mix) and compare them side by side.

Speed 126
article thumbnail

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

Nonetheless, we found a number of limitations that could not satisfy our requirements e.g. stalling the processing of log events until a dump is complete, missing ability to trigger dumps on demand, or implementations that block write traffic by using table locks. Blocking write traffic by locking tables. Writing events to any output.

Database 197
article thumbnail

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

Nonetheless, we found a number of limitations that could not satisfy our requirements e.g. stalling the processing of log events until a dump is complete, missing ability to trigger dumps on demand, or implementations that block write traffic by using table locks. Blocking write traffic by locking tables. Writing events to any output.

Database 212
article thumbnail

The Speed of Time

Brendan Gregg

A Cassandra database cluster had switched to Ubuntu and noticed write latency increased by over 30%. Since instances of both CentOS and Ubuntu were running in parallel, I could collect flame graphs at the same time (same time-of-day traffic mix) and compare them side by side.

Speed 40