The Netflix TechBlog

Seeing through hardware counters: a journey to threefold performance increase

By Vadim Filanovsky and Harshad Sane. In one of our previous blog posts, A Microscope on Microservices, we outlined three broad domains of observability (or “levels of magnification,” as we referred to them): Fleet-wide, Microservice and Instance.

For your eyes only: improving Netflix video quality with neural networks

by Christos G. Bampis, Li-Heng Chen and Zhi Li. When you are binge-watching the latest season of Stranger Things or Ozark, we strive to deliver the best possible video quality to your eyes.

Helping VFX studios pave a path to the cloud

By: Peter Cioni (Netflix), Alex Schworer (Netflix), Mac Moore (Conductor Tech.), Rachel Kelley (AWS), Ranjit Raju (AWS). Rendering is core to the VFX process. VFX studios around the world create amazing imagery for Netflix productions.

Machine Learning for Fraud Detection in Streaming Services

By Soheil Esmaeilzadeh, Negin Salajegheh, Amir Ziai, Jeff Boote. Introduction: Streaming services serve content to millions of users all over the world. These services allow users to stream or download content across a broad category of devices including mobile phones, laptops, and televisions.

New Series: Creating Media with Machine Learning

By Vi Iyengar, Keila Fong, Hossein Taghavi, Andy Yao, Kelli Griggs, Boris Chen, Cristina Segalin, Apurva Kansara, Grace Tang, Billur Engin, Amir Ziai, James Ray, Jonathan Solorzano-Hamilton. Welcome to the first post in our multi-part series on how Netflix is developing and using machine learning (ML) to help creators make better media, from

Consistent caching mechanism in Titus Gateway

by Tomasz Bak and Fabio Kung. Introduction: Titus is the Netflix cloud container runtime that runs and manages containers at scale.

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

by Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.

Virtual Production: A Validation Framework For Unreal Engine

By Adam Davis, Jimmy Fusil, Bhanu Srikanth and Girish Balakrishnan. Game Engines in Virtual Production: The use of Virtual Production and real-time technologies has markedly accelerated in the past few years.

Data Mesh: A Data Movement and Processing Platform @ Netflix

By Bo Lei, Guilherme Pires, James Shao, Kasturi Chatterjee, Sujay Jain, Vlad Sydorenko. Background: Realtime processing technologies (A.K.A

How Product Teams Can Build Empathy Through Experimentation

A conversation between Travis Brooks, Netflix Product Manager for Experimentation Platform, and George Khachatryan, OfferFit CEO. Note: I’ve known George for a little while now, and as we’ve talked a lot about the philosophy of experimentation, he kindly invited me to their office (virtually) for their virtual speaker series.

Reinforcement Learning for Budget Constrained Recommendations

by Ehtsham Elahi with James McInerney, Nathan Kallus, Dario Garcia Garcia and Justin Basilico. Introduction: This writeup is about using reinforcement learning to construct an optimal list of recommendations when the user has a finite time budget to make a decision from the list of recommendations.

Rapid Event Notification System at Netflix

By: Ankush Gulati, David Gevorkyan. Additional credits: Michael Clark, Gokhan Ozer. Intro: Netflix has more than 220 million active members who perform a variety of actions throughout each session, ranging from renaming a profile to watching a title.

How Netflix uses eBPF flow logs at scale for network insight

By Alok Tiagi, Hariharan Ananthakrishnan, Ivan Porto Carrero and Keerti Lakshminarayan. Netflix has developed a network observability sidecar called Flow Exporter that uses eBPF tracepoints to capture TCP flows in near real time.

Life of a Netflix Partner Engineer: The case of the extra 40 ms

By: John Blair, Netflix Partner Engineering. The Netflix application runs on hundreds of smart TVs, streaming sticks and pay TV set top boxes.

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

by Aryan Mehra with Farnaz Karimdady Sharifabad, Prasanna Vijayanathan, Chaïna Wade, Vishal Sharma and Mike Schassberger. Aim and Purpose (Problem Statement): The purpose of this article is to give insights into analyzing and predicting “out of memory” or OOM kills on the Netflix App.

Netflix: A Culture of Learning

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, Colin McFarland, Mihir Tendulkar, and Travis Brooks. This is the last post in an overview series on experimentation at Netflix. Need to catch up?

How Netflix Content Engineering makes a federated graph searchable

By Alex Hutter, Falguni Jhaveri and Senthil Sayeebaba. Over the past few years, Content Engineering at Netflix has been transitioning many of its services to use a federated GraphQL platform.

Experimentation is a major focus of Data Science across Netflix

Auto-Diagnosis and Remediation in Netflix Data Platform

By Vikram Srivastava and Marcelo Mayworm. Netflix has one of the most complex data platforms in the cloud, on which our data scientists and engineers run batch and streaming workloads.

How Netflix Content Engineering makes a federated graph searchable (Part 2)

By Alex Hutter, Falguni Jhaveri, and Senthil Sayeebaba. In a previous post, we described the indexing architecture of Studio Search and how we scaled the architecture by building a config-driven self-service platform that allowed teams in Content Engineering to spin up search indices easily.

Remote Workstations for the Discerning Artists

By Michelle Brenner. Netflix is poised to become the world’s most prolific producer of visual effects and original animated content. To meet that demand, we need to attract the world’s best artistic talent.

A Survey of Causal Inference Applications at Netflix

At Netflix, we want to entertain the world through creating engaging content and helping members discover the titles they will love. Key to that is understanding causal effects that connect changes we make in the product to indicators of member joy.

Bringing AV1 Streaming to Netflix Members’ TVs

by Liwei Guo, Ashwin Kumar Gopi Valliammal, Raymond Tam, Chris Pham, Agata Opalach, Weibo Ni. AV1 is the first high-efficiency video codec format with a royalty-free license from the Alliance for Open Media (AOMedia), made possible by wide-ranging industry commitment of expertise and resources.

What is an A/B Test?

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland. This is the second post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. See here for Part 1: Decision Making at Netflix.

Optimized shot-based encodes for 4K: Now streaming!

by Aditya Mavlankar, Liwei Guo, Anush Moorthy and Anne Aaron. Netflix has an ever-expanding collection of titles which customers can enjoy in 4K resolution with a suitable device and subscription plan.

How We Build Micro Frontends With Lattice

Written by Michael Possumato, Nick Tomlin, Jordan Andree, Andrew Shim, and Rahul Pilani. As we continue to grow here at Netflix, the needs of Revenue and Growth Engineering are rapidly evolving, and our tools must evolve just as rapidly.

Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

By Alex Borysov, Ricky Gardiner. Background: At Netflix, we heavily use gRPC for backend-to-backend communication. When we process a request, it is often beneficial to know which fields the caller is interested in and which ones they ignore.

Scaling Appsec at Netflix (Part 2)

By Astha Singhal, Lakshmi Sudheer, Julia Knecht. The Application Security teams at Netflix are responsible for securing the software footprint that we create to run the Netflix product, the Netflix studio, and the business.

Netflix Cloud Packaging in the Terabyte Era

By Xiaomei Liu, Rosanna Lee, Cyril Concolato. Introduction: Behind the scenes of the beloved Netflix streaming service and content, there are many technology innovations in media processing. Packaging has always been an important step in media processing.

Demystifying Interviewing for Backend Engineers @ Netflix

By Karen Casella, Director of Engineering, Access & Identity Management. Have you ever experienced one of the following scenarios while looking for your next role? You study and practice coding interview problems for hours/days/weeks/months, only to be asked to merge two sorted lists.

Open-Sourcing a Monitoring GUI for Metaflow

Open-Sourcing a Monitoring GUI for Metaflow, Netflix’s ML Platform. tl;dr: Today, we are open-sourcing a long-awaited GUI for Metaflow. The Metaflow GUI allows data scientists to monitor their workflows in real time, track experiments, and see detailed logs and results for every executed task.

Evolving Container Security With Linux User Namespaces

By Fabio Kung, Sargun Dhillon, Andrew Spyker, Kyle, Rob Gulewich, Nabil Schear, Andrew Leung, Daniel Muino, and Manas Alekar. As previously discussed on the Netflix Tech Blog, Titus is the Netflix container orchestration system.

Fixing Performance Regressions Before they Happen

Angus Croll. Netflix is used by 222 million members and runs on over 1700 device types ranging from state-of-the-art smart TVs to low-cost mobile devices. At Netflix we’re proud of our reliability and we want to keep it that way.

Evolution of ML Fact Store

by Vivek Kaushal. At Netflix, we aim to provide recommendations that match our members’ interests. To achieve this, we rely on Machine Learning (ML) algorithms. ML algorithms can be only as good as the data that we provide to them.

Data Movement in Netflix Studio via Data Mesh

By Andrew Nguonly, Armando Magalhães, Obi-Ike Nwoke, Shervin Afshar, Sreyashi Das, Tongliang Liu, Wei Liu, Yucheng Zeng. Background: Over the next few years, most content on Netflix will come from Netflix’s own Studio.

Edgar: Solving Mysteries Faster with Observability

Edgar helps Netflix teams troubleshoot distributed systems efficiently with the help of a summarized presentation of request tracing, logs, analysis, and metadata. By Elizabeth Carretto. Everyone loves Unsolved Mysteries. There’s always someone who seems like the surefire culprit.

Data pipeline asset management with Dataflow

by Sam Setegne, Jai Balani, Olek Gorajek. Glossary. asset: any business logic code in a raw (e.g. SQL) or compiled (e.g. JAR) form to be executed as part of the user-defined data pipeline. data pipeline: a set of tasks (or jobs) to be executed in a predefined order (a.k.a.

The Show Must Go On: Securing Netflix Studios At Scale

Written by Jose Fernandez, Arthur Gonigberg, Julia Knecht, and Patrick Thomas. In 2017, Netflix Studios was hitting an inflection point from a period of merely rapid growth to the sort of explosive growth that throws “how do we scale?” into every conversation.

ConsoleMe: A Central Control Plane for AWS Permissions and Access

By Curtis Castrapel, Patrick Sanders, and Hee Won Kim. At AWS re:Invent 2020, we open sourced two new tools for managing multi-account AWS permissions and access.

Practical API Design at Netflix, Part 2: Protobuf FieldMask for Mutation Operations

By Ricky Gardiner, Alex Borysov. Background: In our previous post, we discussed how we utilize FieldMask as a solution when designing our APIs so that consumers can request the data they need when fetched via gRPC.

Interpreting A/B test results: false positives and statistical significance

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland. This is the third post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. Need to catch up?

Snaring the Bad Folks

Project by Netflix’s Cloud Infrastructure Security team (Alex Bainbridge, Mike Grima, Nick Siow). Cloud security is a hard problem, but an even harder one is cloud security at scale.

Optimizing the Aural Experience on Android Devices with xHE-AAC

By Phill Williams and Vijay Gondi. Introduction: At Netflix, we are passionate about delivering great audio to our members. We began streaming 5.1 channel surround sound in 2010, Dolby Atmos in 2017, and adaptive bitrate audio in 2019.