Remove AWS Remove Big Data Remove Definition Remove Tuning
article thumbnail

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

by Jun He , Akash Dwivedi , Natallia Dzenisenka , Snehal Chennuru , Praneeth Yenugutala , Pawan Dixit At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.

Java 202
article thumbnail

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

In addition, we derive lineage information from scheduled ETL jobs by extracting workflow definitions and runtime metadata using Meson scheduler APIs. Netflix’s diverse data landscape made it challenging to capture all the right data and conforming it to a common data model.

article thumbnail

Optimizing data warehouse storage

The Netflix TechBlog

By Anupom Syam Background At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3 , and each day we ingest and create additional Petabytes. Orient: Gather tuning parameters for a particular table that changed. Also, respond to ad-hoc requests created manually by end-users.

Storage 203