article thumbnail

Data Engineers of Netflix?—?Interview with Kevin Wylie

The Netflix TechBlog

Data Engineers of Netflix?—?Interview Interview with Kevin Wylie This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin, what drew you to data engineering?

article thumbnail

Data Engineers of Netflix?—?Interview with Samuel Setegne

The Netflix TechBlog

Data Engineers of Netflix?—?Interview Interview with Samuel Setegne Samuel Setegne This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

For example, a job would reprocess aggregates for the past 3 days because it assumes that there would be late arriving data, but data prior to 3 days isn’t worth the cost of reprocessing. Backfill: Backfilling datasets is a common operation in big data processing. data arrives too late to be useful).

article thumbnail

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

These characteristics allow for an on-call response time that is relaxed and more in line with traditional big data analytical pipelines. Spark could look up and retrieve the data in the s3 files that the Mouthful represented. And excellent logging is needed for debugging purposes and supportability.

Network 150
article thumbnail

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

Today, I am excited to share with you a brand new service called Amazon QuickSight that aims to simplify the process of deriving insights from a wide variety of data sources in a fast and affordable manner. Big data challenges. We believe this is one of the critical parts of our big data offerings.

Cloud 137
article thumbnail

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

In 2018, we will see new data integration patterns those rely either on a shared high-performance distributed storage interface ( Alluxio ) or a common data format ( Apache Arrow ) sitting between compute and storage. For instance, Alluxio, originally known as Tachyon, can potentially use Arrow as its in-memory data structure.

article thumbnail

Reimagining Experimentation Analysis at Netflix

The Netflix TechBlog

This enables us to optimize their experience at speed. Instead of relying on engineers to productionize scientific contributions, we’ve made a strategic bet to build an architecture that enables data scientists to easily contribute. Our data scientists faced numerous challenges in our previous infrastructure.

Metrics 215