Big Data, Data Engineering and Strategy - Technology Performance Pulse

Offline Data Pipeline Best Practices Part 1: Optimizing Airflow Job Parameters for Apache Hive

DZone

DECEMBER 27, 2023

Welcome to the first post in our series on best practices for offline data pipelines, focusing on the combination of Apache Airflow with data processing engines like Hive and Spark. Working together, they form the backbone of many modern data engineering solutions.

Best Practices

Best Practices Data Engineering Big Data Games

What is IT automation?

Dynatrace

JULY 6, 2022

And what are the best strategies to reduce manual labor so your team can focus on more mission-critical issues? This requires significant data engineering efforts, as well as work to build machine-learning models. Big data automation tools. Creating a sound IT automation strategy. So, what is IT automation?

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

MARCH 2, 2021

At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.

Analytics

Analytics C++ Innovation Engineering

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud, coupled with the emergence of new big data frameworks and tools, are forcing us to rethink the whole ETL and data warehousing architecture. There is a strong argument for ELT, i.e., the extract, load, and transform model. Classic ETL. Late transformation. Challenges.

Big Data

Big Data Retail Storage Google

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities/characteristics and runtime data in our big data platform. With large data comes the opportunity to leverage the data for predictive and classification-based analysis.

Big Data

Big Data Cache Engineering Data Engineering

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

For example, a job would reprocess aggregates for the past 3 days because it assumes that there would be late arriving data, but data prior to 3 days isn’t worth the cost of reprocessing. Backfill: Backfilling datasets is a common operation in big data processing. data arrives too late to be useful).

Processing

Processing Big Data Efficiency Engineering

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

Some of the optimizations are prerequisites for a high-performance data warehouse. Sometimes Data Engineers write downstream ETLs on ingested data to optimize the data/metadata layouts to make other ETL processes cheaper and faster.

Storage

Storage Latency Efficiency Data Engineering

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

Today, I am excited to share with you a brand new service called Amazon QuickSight that aims to simplify the process of deriving insights from a wide variety of data sources in a fast and affordable manner. Big data challenges. We believe this is one of the critical parts of our big data offerings.

Cloud

Cloud Big Data AWS Analytics
