Big Data, Engineering, Storage and Systems - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages. When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results.

Big Data

Big Data Database Artificial Intelligence Open Source

Data Engineers of Netflix - Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. This system has been designed to supplement and succeed the existing Hadoop-based system, whose data processing latency and maintenance costs were too high.

Big Data

Big Data Processing Lambda Database

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next.

Big Data

Big Data Storage Benchmarking Hardware

Scaling Uber’s Apache Hadoop Distributed File System for Growth

Uber Engineering

APRIL 5, 2018

Three years ago, Uber Engineering adopted Hadoop as the storage (HDFS) and compute (YARN) infrastructure for our organization’s big data analysis.

Systems

Systems Big Data Storage Infrastructure

What is container orchestration?

Dynatrace

MARCH 24, 2023

Containers enable developers to package microservices or applications with the libraries, configuration files, and dependencies needed to run on any infrastructure, regardless of the target system environment. This means organizations are increasingly using Kubernetes not just for running applications, but also as an operating system.

Infrastructure

Infrastructure Open Source Operating System Cloud

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Teams have introduced workarounds to reduce storage costs. Additionally, efforts such as lowered data retention times, two-tiered storage systems, shaky index management, sampled data, and data pipelines reduce the overall amount of stored data. Dynatrace discovers logs automatically at scale.

Analytics

Analytics Artificial Intelligence Storage Serverless

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers who generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

As Kubernetes adoption increases and it continues to advance technologically, Kubernetes has emerged as the “operating system” of the cloud. Kubernetes moved to the cloud in 2022.

Open Source

Open Source Java Operating System Programming

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

With these goals in mind, two in-memory data stores, Redis and Memcached, have emerged as the top contenders. This article will explore how they handle data storage and scalability, perform in different scenarios, and, most importantly, how these factors influence your choice.

Cache

Cache Storage Scalability Architecture

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

Understanding Hybrid Cloud Strategy. A hybrid cloud merges the capabilities of public and private clouds into a single, coherent system. This combination allows for the fluid movement of data and applications across different environments, facilitating shared workloads seamlessly. The tool must be compatible with your current systems.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

In fact, Gartner estimates that 80% of enterprises will shut down their on-premises data centers by 2025. This transition to public, private, and hybrid cloud is driving organizations to automate and virtualize IT operations to lower costs and optimize cloud processes and systems. So, what is ITOps? Why is IT operations important?

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.

Big Data

Big Data Analytics AWS Cloud

Expanding the Cloud - Managing Cold Storage with Amazon Glacier

All Things Distributed

AUGUST 20, 2012

With the introduction of Amazon Glacier, IT organizations now have a solution that removes the headaches of digital archiving and provides extremely low cost storage.

Storage

Storage Cloud AWS Media

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

As a production system within Microsoft capturing around a quadrillion events and indexing 16 trillion search keys per day it would be interesting in its own right, but there’s a lot more to it than that. These two narratives of reference architecture and ingestion/indexing system are interwoven throughout the paper.

Cloud

Cloud Big Data Latency Architecture

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

How are we managing the torrent of telemetry that flows into analytics systems from these devices? Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention. The list goes on.

IoT

IoT Analytics Big Data Architecture

Structural Evolutions in Data

O'Reilly

SEPTEMBER 19, 2023

It progressed from “raw compute and storage” to “reimplementing key services in push-button fashion” to “becoming the backbone of AI work”, all under the umbrella of “renting time and storage on someone else’s computers.” (It will be easier to fit in the overhead storage.)

Hardware

Hardware Storage Big Data Blockchain

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A unified data management (UDM) system combines the best of data warehouses, data lakes, and streaming without expensive and error-prone ETL. It offers reliability and performance of a data warehouse, real-time and low-latency characteristics of a streaming system, and scale and cost-efficiency of a data lake.

Big Data

Big Data Artificial Intelligence Storage Hardware

Expanding the Cloud: Introducing the AWS Asia Pacific (Mumbai) Region

All Things Distributed

JUNE 26, 2016

A region in India has been highly sought after by companies around the world who want to participate in one of the most significant economic opportunities in the world – India, a rising economy that holds tremendous promise for growth, a thriving technology hub with a rich eco-system of technology talent, and more.

AWS

AWS Cloud Healthcare Blockchain

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

And while many of our systems are based on the latest in computer science research, this often hasn’t been sufficient: our architects and engineers have had to advance research in directions that no academic had yet taken.

Technology

Technology Technology AWS Storage

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

AliGraph covers Alibaba’s distributed graph engine supporting the development of new GNN applications. Their dataset has about 7B edges… Meanwhile, AnalyticDB is Alibaba’s real-time OLAP RDBMS handling 10PB of data (in excess of 100 trillion rows!). Autoscaling tiered cloud storage in Anna.

Blockchain

Blockchain Hardware Google Analytics

No Server Required - Jekyll & Amazon S3 - All Things Distributed

All Things Distributed

AUGUST 17, 2011

As some of you may remember, I was pretty excited when Amazon Simple Storage Service (S3) released its website feature such that I could serve this weblog completely from S3.

Servers

Servers Social Media AWS Website

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

This aspect of NoSQL is well-studied both in practice and theory because specific non-functional properties are often the main justification for NoSQL usage and fundamental results on distributed systems like the CAP theorem apply well to NoSQL systems. Full Text Search Engines: Apache Lucene, Apache Solr. 2) Aggregates.

Database

Database Ecommerce Efficiency Engineering

Expanding the AWS Cloud – Introducing the AWS Europe (Stockholm) Region

All Things Distributed

DECEMBER 12, 2018

million vehicles in more than 75 countries with services like car locator, engine remote start, driving journal, heater start, and stolen vehicle tracking. We help Supercell to quickly develop, deploy, and scale their games to cope with varying numbers of gamers accessing the system throughout the course of the day.

AWS

AWS Cloud Games Serverless

Software Testing Trends 2021 – What can we expect?

Testsigma

FEBRUARY 12, 2021

The Internet of Things, generally referred to as IoT, encompasses computers, cars, houses, and other connected technological systems. According to Gartner, the greatest technological developments in 2021 will shape the future, from technology affecting how people operate to AI engineering and hyperautomation.

Artificial Intelligence

Artificial Intelligence Software Software IoT

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

USENIX’s LISA conference is the premier event for topics in production system engineering. We both have had long careers supporting system administration, and LISA has always felt like a homecoming, reuniting with old friends while welcoming newcomers. Join us for 3 days in Nashville at LISA'18.

DevOps

DevOps Network Best Practices Programming

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

Our smart phones and tablets are obvious examples, but many other devices are quickly gaining these capabilities; TV sets and hi-fi systems are internet enabled, and soon our treadmills and automobiles will be equally plugged into the digital world.

AWS

AWS Cloud Storage Internet

AWS Elastic Beanstalk: A Quick and Simple Way into the Cloud - All.

All Things Distributed

JANUARY 19, 2011

Flexibility is one of the key principles of Amazon Web Services: developers can select any programming language and software package, any operating system, any middleware and any database to build systems and applications that meet their requirements.

AWS

AWS Cloud Java Scalability

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

All Things Distributed

DECEMBER 5, 2010

I am very excited that today we have launched Amazon Route 53, a high-performance and highly available Domain Name System (DNS) service. Naming is one of the fundamental concepts in Distributed Systems. By Werner Vogels on 05 December 2010 02:00 PM.

Cloud

Cloud Internet Internet AWS

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Not just for HPC but for mission critical enterprise systems such as OLTP. Cluster Compute Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high performance compute and networking.

Cloud

Cloud AWS Automotive Latency

This week in review: GPUs, Zombies, Biomimicry and Tom Waits.

All Things Distributed

NOVEMBER 19, 2010

Big news this week was of course the launch of Cluster GPU instances for Amazon EC2. Science & Engineering. An engineering adventure to break the 1,000 mph barrier in a car.

AWS

AWS Cloud Benchmarking Storage

Around the World in 28 Days - All Things Distributed

All Things Distributed

SEPTEMBER 30, 2010

There is a huge variety in existing architectures, and I am often impressed by the ingenuity of the engineers in how best to transform the application if "Lift & Shift" is not an option.

AWS

AWS Storage Cloud Best Practices

Amazon EC2 Cluster GPU Instances - All Things Distributed

All Things Distributed

NOVEMBER 15, 2010

Now that our ability to generate higher and higher clock rates has stalled and CPU architectural improvements have shifted focus towards multiple cores, we see that it is becoming harder to efficiently use these computer systems.

AWS

AWS Latency Programming Architecture

Expanding the Cloud - Opening the AWS Asia Pacific (Singapore.

All Things Distributed

APRIL 28, 2010

There are four main reasons to do so: Performance - For many applications and services, data access latency to end users is important. You need to be able to place your systems in locations where you can minimize the distance to your most important customers.

AWS

AWS Cloud Latency Storage

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

Actually, we can simulate an additional index set by creating a materialized view in ClickHouse: CREATE MATERIALIZED VIEW rc_id_v ENGINE = MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY (id) POPULATE AS SELECT id, created_utc FROM rc; Here I’m creating a materialized view and populating it initially from the main (rc) table.

Database

Database Analytics Blockchain Healthcare

The Winds of Architecture Changes at the USENIX ATC 2019

ACM Sigarch

NOVEMBER 1, 2019

This blog post gives a glimpse of the computer systems research papers presented at the USENIX Annual Technical Conference (ATC) 2019, with an emphasis on systems that use new hardware architectures. USENIX ATC is a top-tier venue with a broad range of systems research papers from both industry and academia. Heterogeneous ISA.

Architecture

Architecture Hardware Cache Storage
