Remove AWS Remove Big Data Remove Scalability Remove Tuning
article thumbnail

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

Challenges The cloud network infrastructure that Netflix utilizes today consists of AWS services such as VPC, DirectConnect, VPC Peering, Transit Gateways, NAT Gateways, etc and Netflix owned devices. After several iterations of the architecture and some tuning, the solution has proven to be able to scale.

Network 325
article thumbnail

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

Usually Data scientists and engineers write Extract-Transform-Load (ETL) jobs and pipelines using big data compute technologies, like Spark or Presto , to process this data and periodically compute key information for a member or a video. The processed data is typically stored as data warehouse tables in AWS S3.

Latency 243
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

Central engineering teams provide paved paths (secure, vetted and supported options) and guard rails to help reduce variance in choices available for tools and technologies to support the development of scalable technical architectures.

article thumbnail

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

As Big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestrating ecosystem have increasingly become more important for our data scientists and the company. Another dimension of scalability to consider is the size of the workflow.

Java 202
article thumbnail

Optimizing data warehouse storage

The Netflix TechBlog

By Anupom Syam Background At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3 , and each day we ingest and create additional Petabytes. Orient: Gather tuning parameters for a particular table that changed. Also, respond to ad-hoc requests created manually by end-users.

Storage 203
article thumbnail

Music to my Ears - All Things Distributed

All Things Distributed

Werner Vogels weblog on building scalable and robust distributed systems. We see that with our Amazon customers; when they hear a great tune on a radio they may identify it using the Shazam or Soundhound apps on their mobile phone and buy that song instantly from the Amazon MP3 store. Driving Storage Costs Down for AWS Customers.

AWS 138