Data Engineering and Efficiency - Technology Performance Pulse

Our First Netflix Data Engineering Summit

The Netflix TechBlog

DECEMBER 14, 2023

Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the Data Engineering community! In this video, Sr. In this video, Sr.

Data Engineering

Data Engineering Engineering Software Engineering Best Practices

Automated Testing in Data Engineering: An Imperative for Quality and Efficiency

DZone

JANUARY 9, 2024

In the data-driven landscape of today, automation has become indispensable across industries, not just to maximize efficiency but, more importantly, to ensure quality. This holds true for the critical field of data engineering as well.

Data Engineering

Data Engineering Efficiency Engineering Testing

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty , Bharath Mummadisetty At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. Let’s dive in!

Data Engineering

Data Engineering Engineering Processing Games

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix?—?Interview Interview with Pallavi Phadnis This post is part of our “ Data Engineers of Netflix ” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Data Engineers of Netflix?—?Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Samuel Setegne Samuel Setegne This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

DZone

DECEMBER 27, 2023

Welcome to the first post in our exciting series on mastering offline data pipeline's best practices, focusing on the potent combination of Apache Airflow and data processing engines like Hive and Spark. Working together, they form the backbone of many modern data engineering solutions.

Best Practices

Best Practices Data Engineering Big Data Games

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. the retry success probability) and compute cost efficiency (i.e., Multi-objective optimizations.

Tuning

Tuning Efficiency Big Data Engineering

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

AUGUST 14, 2019

Maintaining Uber’s large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost supersede utility?

Efficiency

Efficiency Engineering Design Storage

2. Diving Deeper into Psyberg: Stateless vs Stateful Data Processing

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty , Bharath Mummadisetty In the inaugural blog post of this series, we introduced you to the state of our pipelines before Psyberg and the challenges with incremental processing that led us to create the Psyberg framework within Netflix’s Membership and Finance data engineering team.

Processing

Processing Data Engineering Efficiency Analytics

3. Psyberg: Automated end to end catch up

The Netflix TechBlog

NOVEMBER 14, 2023

By focusing solely on updates and avoiding reprocessing of data based on a fixed lookback window, both Stateless and Stateful Data Processing maintain a minimal change footprint. This approach ensures data processing is both efficient and accurate.

Tuning

Tuning Processing C++ Efficiency

How TripleLift Built an Adtech Data Pipeline Processing Billions of Events Per Day

High Scalability

JUNE 15, 2020

This is a guest post by Eunice Do , Data Engineer at TripleLift , a technology company leading the next generation of programmatic advertising. The system is the data pipeline at TripleLift. TripleLift is an adtech company, and like most companies in this industry, we deal with high volumes of data on a daily basis.

Processing

Processing Data Engineering Efficiency Engineering

What is IT automation?

Dynatrace

JULY 6, 2022

Ultimately, IT automation can deliver consistency, efficiency, and better business outcomes for modern enterprises. Automating IT practices offers enterprises faster data centers and cloud operations, as well as increased flexibility and accuracy. IT automation tools can achieve enterprise-wide efficiency. Read eBook now!

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

As a micro-service owner, a Netflix engineer is responsible for its innovation as well as its operation, which includes making sure the service is reliable, secure, efficient and performant. In the Efficiency space, our data teams focus on transparency and optimization.

Infrastructure

Infrastructure Cloud Scalability AWS

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Under the hood, Titus is powered by Kubernetes , but it provides a thick layer of enhancements over off-the-shelf Kubernetes, to make it more observable , secure , scalable , and cost-efficient. Internally, we use a production workflow orchestrator called Maestro.

Systems

Systems Media Cache Open Source

How Our Paths Brought Us to Data and Netflix

The Netflix TechBlog

SEPTEMBER 18, 2020

[Julie] Chris and I have the same primary stakeholders (or engineering team that we support): Encoding Technologies. They are continuously innovating compression algorithms to efficiently send high quality audio and video files to our customers over the internet. Is the benefit uniform, or do certain cohorts of members?—?such

Analytics

Analytics Education Innovation Engineering

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

MARCH 2, 2021

At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.

Analytics

Analytics Innovation C++ Engineering

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 17, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Java Software Engineering

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 3, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Games Engineering Java

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 18, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Games Engineering Java

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Scalability Engineering

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 24, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Engineering Big Data

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 9, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Games Engineering Java

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 30, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Engineering Big Data

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

3:15pm-4:15pm OPN 209 Netflix’s application deployment at scale Andy Glover , Director Delivery Engineering & Paul Roberts, AWS Abstract : Spinnaker is an open-source continuous-delivery platform created by Netflix to improve its developers’ efficiency and reduce the time it takes to get an application into production.

AWS

AWS Entertainment Open Source Benchmarking

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

3:15pm-4:15pm OPN 209 Netflix’s application deployment at scale Andy Glover , Director Delivery Engineering & Paul Roberts, AWS Abstract : Spinnaker is an open-source continuous-delivery platform created by Netflix to improve its developers’ efficiency and reduce the time it takes to get an application into production.

AWS

AWS Entertainment Open Source Benchmarking

Google Announces the General Availability of A2 Virtual Machines

InfoQ

APRIL 7, 2021

According to the company, the A2 VMs will allow customers to run their NVIDIA CUDA-enabled machine learning (ML) and high-performance computing (HPC) scale-out and scale-up workloads efficiently at a lower cost. By Steef-Jan Wiggers.

Virtualization

Virtualization Google Availability Engineering

Sustainability at AWS re:Invent 2022 All the talks and videos I could find…

Adrian Cockcroft

FEBRUARY 13, 2023

STP213 Scaling global carbon footprint management — Blake Blackwell Persefoni Manager Data Engineering and Michael Floyd AWS Head of Sustainability Solutions. This is the AWS Professional Services built tooling that customers can use to track the carbon footprint of their operations and processes, along with a customer example.

AWS

AWS Energy Architecture Programming

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.

Storage

Storage Latency Efficiency Data Engineering

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

It also improves the engineering productivity by simplifying the existing pipelines and unlocking the new patterns. We will show how we are building a clean and efficient incremental processing solution (IPS) by using Netflix Maestro and Apache Iceberg. Users configure the workflow to read the data in a window (e.g.

Processing

Processing Big Data Efficiency Engineering

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

Unfortunately, building data pipelines remains a daunting, time-consuming, and costly activity. Not everyone is operating at Netflix or Spotify scale data engineering function. Often companies underestimate the necessary effort and cost involved to build and maintain data pipelines.

Latency

Latency Analytics Scalability Engineering

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This talk explores the journey, learnings, and improvements to performance analysis, efficiency, reliability, and security. Our data scientists are expected to develop and operate large machine learning workflows autonomously without the need to be deeply experienced with systems or data engineering.

AWS

AWS Entertainment Open Source Benchmarking

ETL Workflow Modeling

Abhishek Tiwari

APRIL 14, 2018

First and foremost, modeling ETL process helps in designing an efficient, robust and evolvable ETL. Second, modeling ETL workflow plays an important role to optimize the data warehousing. Modeling of ETL workflow is important for several reasons.

Best Practices

Best Practices Database Efficiency Design

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency By: Di Lin , Girish Lingappa , Jitender Aswani Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question?—?“Can

Infrastructure

Infrastructure Big Data Transportation Architecture

Data Pipelines: The Hammer for Every Nail

Abhishek Tiwari

JULY 7, 2023

Airflow provides rich scheduling and execution semantics enabling data engineers to easily define complex pipelines, running at regular intervals. Workflow platforms have emerged as a crucial component for these companies, enabling them to orchestrate complex tasks, automate processes, and ensure efficient collaboration.

Logistics

Logistics Transportation Scalability Data Engineering

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

It is a general-purpose workflow orchestrator that provides a fully managed workflow-as-a-service (WAAS) to the data platform at Netflix. It serves thousands of users, including data scientists, data engineers, machine learning engineers, software engineers, content producers, and business analysts, for various use cases.

Java

Java Scalability Traffic Architecture

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A unified data management (UDM) system combines the best of data warehouses, data lakes, and streaming without expensive and error-prone ETL. It offers reliability and performance of a data warehouse, real-time and low-latency characteristics of a streaming system, and scale and cost-efficiency of a data lake.

Big Data

Big Data Artificial Intelligence Storage Hardware

Reimagining Experimentation Analysis at Netflix

The Netflix TechBlog

SEPTEMBER 10, 2019

Our data scientists faced numerous challenges in our previous infrastructure. Complex business logic was embedded directly into the ETL pipelines by data engineers. In order to replicate results, scientists had to delve deep into the data, code, and documentation.

Metrics

Metrics Architecture Infrastructure Innovation

Experimentation is a major focus of Data Science across Netflix

The Netflix TechBlog

JANUARY 11, 2022

To learn about Analytics and Viz Engineering, have a look at Analytics at Netflix: Who We Are and What We Do by Molly Jackman & Meghana Reddy and How Our Paths Brought Us to Data and Netflix by Julie Beckley & Chris Pham. Curious to learn about what it’s like to be a Data Engineer at Netflix?

Innovation

Innovation Metrics Engineering Testing

Our First Netflix Data Engineering Summit

Automated Testing in Data Engineering: An Imperative for Quality and Efficiency

Trending Sources

1. Streamlining Membership Data Engineering at Netflix with Psyberg

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

Data Engineers of Netflix?—?Interview with Samuel Setegne

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

2. Diving Deeper into Psyberg: Stateless vs Stateful Data Processing

3. Psyberg: Automated end to end catch up

How TripleLift Built an Adtech Data Pipeline Processing Billions of Events Per Day

What is IT automation?

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

Supporting Diverse ML Systems at Netflix

How Our Paths Brought Us to Data and Netflix

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Netflix at AWS re:Invent 2019

Netflix at AWS re:Invent 2019

Google Announces the General Availability of A2 Virtual Machines

Sustainability at AWS re:Invent 2022 All the talks and videos I could find…

Optimizing data warehouse storage

Incremental Processing using Netflix Maestro and Apache Iceberg

Friends don't let friends build data pipelines

Netflix at AWS re:Invent 2019

ETL Workflow Modeling

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Data Pipelines: The Hammer for Every Nail

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

5 data integration trends that will define the future of ETL in 2018

Reimagining Experimentation Analysis at Netflix

Experimentation is a major focus of Data Science across Netflix

Stay Connected