Data, Data Engineering, Scalability and Storage

Leveraging Infrastructure as Code for Data Engineering Projects: A Comprehensive Guide

DZone

JULY 3, 2023

Data engineering projects often require the setup and management of complex infrastructures that support data processing, storage, and analysis. In this article, we will explore the benefits of leveraging IaC for data engineering projects and provide detailed implementation steps to get started.

Data Engineering

Data Engineering Infrastructure Code Engineering

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam Background At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3 , and each day we ingest and create additional Petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Zendesk Moves from DynamoDB to MySQL and S3 to Save over 80% in Costs

InfoQ

DECEMBER 29, 2023

Zendesk reduced its data storage costs by over 80% by migrating from DynamoDB to a tiered storage solution using MySQL and S3. By Rafal Gancarz

Storage

Storage Scalability Database Technology

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as go to container orchestration platform for data engineering teams. In 2018, a widespread adaptation of Kubernetes for big data processing is anitcipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges. Performance.

Big Data

Big Data Storage Benchmarking Hardware

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

Building data pipelines can offer strategic advantages to the business. Often companies underestimate the necessary effort and cost involved to build and maintain data pipelines. Data pipeline initiatives are generally unfinished projects. In this post, we will discuss why you should avoid building data pipelines in first place.

Latency

Latency Analytics Scalability Engineering

Article: Design Pattern Proposal for Autoscaling Stateful Systems

InfoQ

JANUARY 25, 2023

In this article, Rogerio Robetti discusses the challenges in auto-scaling stateful storage systems and proposes an opinionated design solution to automatically scale up (vertical) and scale out (horizontal) from a single node up to several nodes in a cluster with minimum configuration and interference of the operator. By Rogerio Robetti

Design

Design Systems Storage Data Engineering

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 209 File system as a service at Netflix Kishore Kasi , Senior Software Engineer Abstract : As Netflix grows in original content creation, its need for storage is also increasing at a rapid pace. Technology advancements in content creation and consumption have also increased its data footprint.

AWS

AWS Entertainment Open Source Benchmarking

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 209 File system as a service at Netflix Kishore Kasi , Senior Software Engineer Abstract : As Netflix grows in original content creation, its need for storage is also increasing at a rapid pace. Technology advancements in content creation and consumption have also increased its data footprint.

AWS

AWS Entertainment Open Source Benchmarking

Back-to-Basics Weekend Reading - The 5 Minute Rule - All Things.

All Things Distributed

AUGUST 24, 2012

Werner Vogels weblog on building scalable and robust distributed systems. The AWS team launched this week Amazon Glacier , a cold storage archive service at the very low price point of $0.01 Which makes this week a good moment to read up on some of the historical work around the costs of data engineering. Comments ().

Storage

Storage Hardware AWS Data Engineering

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

We live in a world where massive volumes of data are generated from websites, connected devices and mobile apps. In such a data intensive environment, making key business decisions such as running marketing and sales campaigns, logistic planning, financial analysis and ad targeting require deriving insights from these data.

Cloud

Cloud Big Data AWS Analytics

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 209 File system as a service at Netflix Kishore Kasi , Senior Software Engineer Abstract : As Netflix grows in original content creation, its need for storage is also increasing at a rapid pace. Technology advancements in content creation and consumption have also increased its data footprint.

AWS

AWS Entertainment Open Source Benchmarking

Technology Performance Pulse

Leveraging Infrastructure as Code for Data Engineering Projects: A Comprehensive Guide

Optimizing data warehouse storage

Trending Sources

Zendesk Moves from DynamoDB to MySQL and S3 to Save over 80% in Costs

Kubernetes for Big Data Workloads

Friends don't let friends build data pipelines

Article: Design Pattern Proposal for Autoscaling Stateful Systems

Netflix at AWS re:Invent 2019

Netflix at AWS re:Invent 2019

Back-to-Basics Weekend Reading - The 5 Minute Rule - All Things.

Expanding the Cloud: Introducing Amazon QuickSight

Netflix at AWS re:Invent 2019

Stay Connected