article thumbnail

Chaos Data Engineering Manifesto: 5 Laws for Successful Failures

DZone

A powerful surge of traffic is inevitable. It's midnight in the dim and cluttered office of The New York Times, currently serving as the "situation room." During every major election, the wave would crest and crash against our overwhelmed systems before receding, allowing us to assess the damage.

article thumbnail

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

InfoQ

HubSpot adopted routing messages over multiple Kafka topics (called swimlanes) for the same producer to avoid the build-up in the consumer group lag and prioritize the processing of real-time traffic. By Rafal Gancarz

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

We started seeing signs of scale issues, like: Slowness during peak traffic moments like 12 AM UTC, leading to increased operational burden. At Netflix, the peak traffic load can be a few orders of magnitude higher than the average load. Hence, the system has to withstand bursts in traffic while still maintaining the SLO requirements.

Java 202
article thumbnail

How LinkedIn Serves Over 4.8 Million Member Profiles per Second

InfoQ

LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that has outgrown their existing database cluster. The new solution achieved over 99% hit rate, helped reduce tail latencies by more than 60% and costs by 10% annually. By Rafal Gancarz

Cache 83
article thumbnail

Experimentation is a major focus of Data Science across Netflix

The Netflix TechBlog

To learn about Analytics and Viz Engineering, have a look at Analytics at Netflix: Who We Are and What We Do by Molly Jackman & Meghana Reddy and How Our Paths Brought Us to Data and Netflix by Julie Beckley & Chris Pham. Curious to learn about what it’s like to be a Data Engineer at Netflix?

article thumbnail

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

While our engineering teams have and continue to build solutions to lighten this cognitive load (better guardrails, improved tooling, …), data and its derived products are critical elements to understanding, optimizing and abstracting our infrastructure. What will be the cost of rolling out the winning cell of an AB test to all users?

article thumbnail

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

VPC Flow Logs VPC Flow Logs is an AWS feature that captures information about the IP traffic going to and from network interfaces in a VPC. At Netflix we publish the Flow Log data to Amazon S3. We also adjusted the frequency in which Spark app instances are spun up such that any backlog would burn off during a trough in traffic.

Network 150