AWS, Big Data and Data Engineering - Technology Performance Pulse

AWS

Big Data

Data Engineering

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

DZone

DECEMBER 27, 2023

This post focuses on elevating our data engineering game, streamlining your data workflows, and significantly cutting computing costs. The need to optimize offline data pipeline optimization has become a necessity with the growing complexity and scale of modern data pipelines.

Best Practices

Best Practices Data Engineering Big Data Games

Trending Sources

Dynatrace

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

NOVEMBER 15, 2016

As I mentioned, we live in a world where massive volumes of data are being generated, every day, from connected devices, websites, mobile apps, and customer applications running on top of AWS infrastructure. Auto-discovery : One of the challenges with BI is discovering and accessing the data. Enter Amazon QuickSight.

Analytics

Analytics Availability Media Social Media

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

However, the data infrastructure to collect, store and process data is geared toward developers (e.g., In AWS’ quest to enable the best data storage options for engineers, we have built several innovative database solutions like Amazon RDS, Amazon RDS for Aurora, Amazon DynamoDB, and Amazon Redshift.

Cloud

Cloud Big Data AWS Analytics

Data Pipelines: The Hammer for Every Nail

Abhishek Tiwari

JULY 7, 2023

In the era of big data and complex data processing, data pipelines have emerged as a popular solution for managing and manipulating data. They provide a systematic approach to extract, transform, and load (ETL) data from various sources, enabling organizations to derive valuable insights.

Logistics

Logistics Transportation Scalability Data Engineering

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

Service Segmentation: The ease of the cloud deployments has led to the organic growth of multiple AWS accounts, deployment practices, interconnection practices, etc. VPC Flow Logs VPC Flow Logs is an AWS feature that captures information about the IP traffic going to and from network interfaces in a VPC.

Network

Network Tuning AWS Big Data

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Netflix’s diverse data landscape made it challenging to capture all the right data and conforming it to a common data model.

Infrastructure

Infrastructure Big Data Transportation Architecture

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam Background At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3 , and each day we ingest and create additional Petabytes. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

by Jun He , Akash Dwivedi , Natallia Dzenisenka , Snehal Chennuru , Praneeth Yenugutala , Pawan Dixit At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.

Java

Java Scalability Traffic Architecture

Technology Performance Pulse

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Trending Sources

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

Expanding the Cloud: Introducing Amazon QuickSight

Data Pipelines: The Hammer for Every Nail

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Optimizing data warehouse storage

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

Stay Connected