article thumbnail

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. the retry success probability) and compute cost efficiency (i.e., Multi-objective optimizations.

Tuning 210
article thumbnail

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

Under the hood, Titus is powered by Kubernetes , but it provides a thick layer of enhancements over off-the-shelf Kubernetes, to make it more observable , secure , scalable , and cost-efficient. In other cases, it is more convenient to share the results via a low-latency API.

Systems 226
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Netflix at AWS re:Invent 2019

The Netflix TechBlog

1pm-2pm NFX 207 Benchmarking stateful services in the cloud Vinay Chella , Data Platform Engineering Manager Abstract : AWS cloud services make it possible to achieve millions of operations per second in a scalable fashion across multiple regions. We explore all the systems necessary to make and stream content from Netflix.

AWS 100
article thumbnail

Netflix at AWS re:Invent 2019

The Netflix TechBlog

1pm-2pm NFX 207 Benchmarking stateful services in the cloud Vinay Chella , Data Platform Engineering Manager Abstract : AWS cloud services make it possible to achieve millions of operations per second in a scalable fashion across multiple regions. We explore all the systems necessary to make and stream content from Netflix.

AWS 100
article thumbnail

Friends don't let friends build data pipelines

Abhishek Tiwari

Unfortunately, building data pipelines remains a daunting, time-consuming, and costly activity. Not everyone is operating at Netflix or Spotify scale data engineering function. Often companies underestimate the necessary effort and cost involved to build and maintain data pipelines.

Latency 63
article thumbnail

Optimizing data warehouse storage

The Netflix TechBlog

We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.

Storage 203
article thumbnail

Netflix at AWS re:Invent 2019

The Netflix TechBlog

This talk explores the journey, learnings, and improvements to performance analysis, efficiency, reliability, and security. Netflix runs dozens of stateful services on AWS under strict sub-millisecond tail-latency requirements, which brings unique challenges. In 2019, Netflix moved thousands of container hosts to bare metal.

AWS 37