article thumbnail

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

By Abhinaya Shetty , Bharath Mummadisetty At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. What is late-arriving data? Let’s dive in!

article thumbnail

Secrets Detection: Optimizing Filter Processes

DZone

In a previous article , we explained how we built benchmarks to keep track of those three metrics: precision, recall, and the most important here, speed. These benchmarks taught us a lot about the true internals of our engine at runtime and led to our first improvements.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

3. Psyberg: Automated end to end catch up

The Netflix TechBlog

In the previous installments of this series, we introduced Psyberg and delved into its core operational modes: Stateless and Stateful Data Processing. Pipelines After Psyberg Let’s explore how different modes of Psyberg could help with a multistep data pipeline. In this case, the minimum hour to process the data is hour 2.

Tuning 244
article thumbnail

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

Data: Fast Data Our main data lake is hosted on S3, organized as Apache Iceberg tables. For ETL and other heavy lifting of data, we mainly rely on Apache Spark. In addition to Spark, we want to support last-mile data processing in Python, addressing use cases such as feature transformations, batch inference, and training.

Systems 226
article thumbnail

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

by Jun He , Yingyi Zhang , and Pawan Dixit Incremental processing is an approach to process new or changed data in workflows. The key advantage is that it only incrementally processes data that are newly added or updated to a dataset, instead of re-processing the complete dataset.

article thumbnail

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.

Analytics 207
article thumbnail

Sustainability at AWS re:Invent 2022 All the talks and videos I could find…

Adrian Cockcroft

I asked around and heard that they are still working on it, but the AWS hiring freeze means that they don’t have the headcount they expected and are making slow progress on an API, more detailed metrics, and scope 3, which everyone is waiting for. Portfolio is currently reducing Amazons carbon footprint by 19 Million Metric Tons of CO2e.

AWS 64