article thumbnail

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

The Netflix TechBlog

Migrating Critical Traffic At Scale with No Downtime — Part 1 Shyam Gala , Javier Fernandez-Ivern , Anup Rokkam Pratap , Devang Shah Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience.

Traffic 339
article thumbnail

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow , an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems. ETL workflows), as well as downstream (e.g.

Systems 226
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

To do this, we devised a novel way to simulate the projected traffic weeks ahead of launch by building upon the traffic migration framework described here. New content or national events may drive brief spikes, but, by and large, traffic is usually smoothly increasing or decreasing.

Traffic 342
article thumbnail

FIFO vs. LIFO: Which Queueing Strategy Is Better for Availability and Latency?

DZone

As an engineer, you probably know that server performance under heavy load is crucial for maintaining the availability and responsiveness of your services. But what happens when traffic bursts overwhelm your system? Queueing requests is a common solution, but what's the best approach: FIFO or LIFO?

Strategy 141
article thumbnail

What is log management? How to tame distributed cloud system complexities

Dynatrace

Log management is an organization’s rules and policies for managing and enabling the creation, transmission, analysis, storage, and other tasks related to IT systems’ and applications’ log data. Distributed cloud systems are complex, dynamic, and difficult to manage without the proper tools. What is log management?

Systems 186
article thumbnail

Managing High Availability in PostgreSQL – Part III: Patroni

Scalegrid

In the final post of this series, we will review the last solution, Patroni by Zalando, and compare all three at the end so you can determine which high availability framework is best for your PostgreSQL hosting deployment. Managing High Availability in PostgreSQL – Part I: PostgreSQL Automatic Failover. Patroni for PostgreSQL.

article thumbnail

MySQL High Availability Framework Explained – Part III: Failover Scenarios

High Scalability

In this three-part blog series, we introduced a High Availability (HA) Framework for MySQL hosting in Part I, and discussed the details of MySQL semisynchronous replication in Part II. Now in Part III, we review how the framework handles some of the important MySQL failure scenarios and recovers to ensure high availability.