Achieving observability in async workflows

The Netflix TechBlog

We are expected to process 1,000 watermarks for a single distribution in a minute, with non-linear latency growth as the number of watermarks increases. Initial offering of Prodicle Distribution backend When we decided to migrate the asynchronous workflow to Java, we landed on these additional requirements: 1.

Traffic 160
The Speed of Time

Brendan Gregg

A Cassandra database cluster had switched to Ubuntu and noticed write latency increased by over 30%. There's no Java stack—there should be a tower of green Java methods—instead there's only a single green frame or two. This is how Java flame graphs looked at the time. 30.14% in the middle of the flame graph.

Speed 126

Trending Sources

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. We chose Open-Zipkin because it had better integrations with our Spring Boot based Java runtime environment.

The Speed of Time

Brendan Gregg

A Cassandra database cluster had switched to Ubuntu and noticed write latency increased by over 30%. There's no Java stack—there should be a tower of green Java methods—instead there's only a single green frame or two. This is how Java flame graphs looked at the time. This will slow this test a little.)

Speed 52
Analyzing a High Rate of Paging

Brendan Gregg

biolatency: From [bcc], this eBPF tool shows a latency histogram of disk I/O.

Process Name = java
Kbytes : count distribution
0 -> 1 : 0
2 -> 3 : 0
4 -> 7 : 0
8 -> 15 : 31
16 -> 31 : 15
32 -> 63 : 15
64 -> 127 : 15
128 -> 255 : 1682

They are largeish I/O, about 128 Kbytes (divide rkB/s by r/s).

Cache 105
DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

One which: interleaves log with dump events so that both can make progress; allows dumps to be triggered at any time; does not use table locks; uses standardized database features. DBLog Framework: DBLog is a Java-based framework, able to capture changes in real time and to take dumps. Beresford, and Boerge Svingen. Online event processing.

Database 197