article thumbnail

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

The Netflix TechBlog

Migrating Critical Traffic At Scale with No Downtime — Part 1 Shyam Gala , Javier Fernandez-Ivern , Anup Rokkam Pratap , Devang Shah Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience.

Traffic 339
article thumbnail

Congestion Control in Cloud Scale Distributed Systems

DZone

Distributed systems are composed of multiple systems that are wired together to provide a specific functionality. Systems that operate at a cloud scale can get expected or unexpected surges of traffic from one or multiple callers and are expected to perform in a predictable manner.

Systems 290
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Black Friday traffic exposes gaps in observability strategies

Dynatrace

What’s the problem with Black Friday traffic? But that’s difficult when Black Friday traffic brings overwhelming and unpredictable peak loads to retailer websites and exposes the weakest points in a company’s infrastructure, threatening application performance and user experience. These kinds of problems are unacceptable.

Traffic 191
article thumbnail

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow , an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems.

Systems 226
article thumbnail

What is a Distributed Storage System

Scalegrid

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.

Storage 130
article thumbnail

Apollo Router Performance Monitoring with OpenTelemetry and Splunk APM

DZone

Built using Rust, it offers a high degree of flexibility, loose coupling, and exceptional performance. This self-hosted graph routing solution is highly configurable, making it an ideal choice for developers who require a high-performance routing system.

article thumbnail

Jaeger and ScyllaDB Integration: High Performance at Scale

DZone

With the rise of microservices and cloud-native applications, Jaeger has become a crucial tool for developers and system administrators to gain insights into the performance and behavior of their applications. Use the best-performing Jaeger storage backend that you can find.