Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

If we had an ID for each streaming session, then distributed tracing could easily reconstruct a session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Our trace data collection agent transports traces to a Mantis job cluster via the Mantis Publish library. What’s next?

Towards a Reliable Device Management Platform

The Netflix TechBlog

By Benson Ma, Alok Ahuja. Introduction: At Netflix, hundreds of different device types, from streaming sticks to smart TVs, are tested every day through automation to ensure that new software releases continue to deliver the quality of the Netflix experience that our customers enjoy. In this blog post, we will focus on the latter feature set.

Latency 213

Trending Sources

Expanding the Cloud – The Second AWS GovCloud (US) Region, AWS GovCloud (US-East)

All Things Distributed

The AWS GovCloud (US-East) Region is located in the eastern part of the United States, providing customers with a second isolated Region in which to run mission-critical workloads with lower latency and high availability. By using AWS, they have been able to reduce the time to build, test, and scale software from weeks to hours.

AWS 117

Plan Your Multi Cloud Strategy

Scalegrid

They can also bolster uptime and limit latency issues or potential downtimes. It’s important to ensure the bells and whistles of any software-as-a-service (SaaS) they offer can support where you aim to take your business, keeping your strategy tight and on track.

Strategy 130

Unlocking Enterprise systems using voice

All Things Distributed

The availability of large-scale voice training data, the advances made in software with processing engines such as Caffe, MXNet and TensorFlow, and the rise of massively parallel compute engines with low-latency memory access, such as the Amazon EC2 P3 instances, have made voice processing at scale a reality.

Systems 110

Edgar: Solving Mysteries Faster with Observability

The Netflix TechBlog

This difference has substantial technological implications, from the classification of what’s interesting to transport to cost-effective storage (keep an eye out for later Netflix Tech Blog posts addressing these topics). Distributed tracing is the process of generating, transporting, storing, and retrieving traces in a distributed system.

Latency 296

Snap: a microkernel approach to host networking

The Morning Paper

It’s been clear for a while that software designed explicitly for the data center environment will increasingly want/need to make different design trade-offs to e.g. general-purpose systems software that you might install on your own machines. The desire for CPU efficiency and lower latencies is easy to understand. Enter Google!

Network 92