Big Data, Engineering, Scalability and Storage - Technology Performance Pulse

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next.

Big Data

Big Data Storage Benchmarking Hardware

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The design of the in-stream processing engine itself was driven by the following requirements: SQL-like functionality. Strict fault-tolerance is a principal requirement for the engine.

Big Data

Big Data Processing Lambda Database

What is container orchestration?

Dynatrace

MARCH 24, 2023

Problems include provisioning and deployment; load balancing; securing interactions between containers; configuration and allocation of resources such as networking and storage; and deprovisioning containers that are no longer needed. How does container orchestration work?

Infrastructure

Infrastructure Open Source Operating System Cloud

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.

Big Data

Big Data Analytics AWS Cloud

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Most Kubernetes clusters in the cloud (73%) are built on top of managed distributions from the hyperscalers like AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). Through effortless provisioning, a larger number of small hosts provide a cost-effective and scalable platform.

Open Source

Open Source Java Operating System Programming

Expanding the Cloud - Managing Cold Storage with Amazon Glacier

All Things Distributed

AUGUST 20, 2012

Werner Vogels' weblog on building scalable and robust distributed systems. Managing Cold Storage with Amazon Glacier. With the introduction of Amazon Glacier, IT organizations now have a solution that removes the headaches of digital archiving and provides extremely low cost storage.

Storage

Storage Cloud AWS Media

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention. The best they can usually do in real-time using general purpose tools is to filter and look for patterns of interest.

IoT

IoT Analytics Big Data Architecture

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum uses an MPP database design that can help you develop a scalable, high performance deployment. High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages. Greenplum features a cost-based query optimizer for large-scale, big data workloads.

Big Data

Big Data Database Artificial Intelligence Open Source

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Expanding the AWS Cloud: Introducing the AWS Canada (Central) Region

All Things Distributed

DECEMBER 8, 2016

Given this, enterprises, public sector bodies, startups, and small businesses are looking to adopt agile, scalable, and secure public cloud solutions. Access to secure, scalable, low-cost AWS infrastructure in Canada allows customers to innovate and provide tools to meet privacy, sovereignty, and compliance requirements.

AWS

AWS Cloud Lambda Innovation

No Server Required - Jekyll & Amazon S3 - All Things Distributed

All Things Distributed

AUGUST 17, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. As some of you may remember, I was pretty excited when Amazon Simple Storage Service (S3) released its website feature such that I could serve this weblog completely from S3.

Servers

Servers Social Media AWS Website

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.

Cache

Cache Storage Scalability Architecture

Dutch Enterprises and The Cloud

All Things Distributed

SEPTEMBER 6, 2013

Shell leverages AWS for big data analytics to help achieve these goals. Shell's scientists, especially the geophysicists and drilling engineers, frequently use cloud computing to run models. Essent – supplies customers in the Benelux region with gas, electricity, heat and energy services.

Cloud

Cloud Energy AWS Healthcare

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

USENIX’s LISA conference is the premier event for topics in production system engineering. LISA originally stood for "Large Installation System Administration," where "large" meant systems with more than a gigabyte of storage, or with more than 100 users. Join us for 3 days in Nashville at LISA'18.

DevOps

DevOps Network Best Practices Programming

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

This approach allows companies to combine the security and control of private clouds with public clouds’ scalability and innovation potential. Are you looking to leverage the best of the private and public cloud worlds to propel your business forward? A hybrid cloud strategy could be your answer.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Causal AI, which brings AI-enabled actionable insights to IT operations, and a data lakehouse, such as Dynatrace Grail, can help break down silos among ITOps, DevSecOps, site reliability engineering, and business analytics teams. “The weakness of a data lake is they fail when you need to access them fast,” Pawlowski said.

Analytics

Analytics Infrastructure Storage Efficiency

AWS Elastic Beanstalk: A Quick and Simple Way into the Cloud - All.

All Things Distributed

JANUARY 19, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. and Engine Yard, Springsource users have CloudFoundry. Elastic Beanstalk makes it easy for developers to deploy and manage scalable and fault-tolerant applications on the AWS cloud.

AWS

AWS Cloud Java Operating System

This week in review: GPUs, Zombies, Biomimicry and Tom Waits.

All Things Distributed

NOVEMBER 19, 2010

Werner Vogels' weblog on building scalable and robust distributed systems. Big news this week was of course the launch of Cluster GPU instances for Amazon EC2. Science & Engineering. an engineering adventure to break the 1,000 mph barrier in a car.

AWS

AWS Cloud Benchmarking Storage

Around the World in 28 Days - All Things Distributed

All Things Distributed

SEPTEMBER 30, 2010

Werner Vogels' weblog on building scalable and robust distributed systems. There is huge variety in existing architectures and I am often impressed by the ingenuity of the engineers in how to best transform the application if "Lift & Shift" is not an option.

AWS

AWS Storage Cloud Best Practices

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Werner Vogels' weblog on building scalable and robust distributed systems. Cluster Compute Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high performance compute and networking.

Cloud

Cloud AWS Automotive Latency

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

All Things Distributed

DECEMBER 5, 2010

Werner Vogels' weblog on building scalable and robust distributed systems. There is more than one Werner Vogels in this world and although I never get emails, snail mail or phone calls for any of my peers, I am sure they are somewhat frustrated if they type in our name in a search engine :-).

Cloud

Cloud Internet AWS

Expanding the Cloud - Opening the AWS Asia Pacific (Singapore.

All Things Distributed

APRIL 28, 2010

Werner Vogels' weblog on building scalable and robust distributed systems. Currently, each AWS Region contains multiple Availability Zones, which are distinct locations that are engineered to be insulated from failures in other Availability Zones.

AWS

AWS Cloud Latency Storage

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Teams have introduced workarounds to reduce storage costs. Additionally, efforts such as lowered data retention times, two-tiered storage systems, shaky index management, sampled data, and data pipelines reduce the overall amount of stored data. Dynatrace discovers logs automatically at scale.

Analytics

Analytics Artificial Intelligence Storage Serverless

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

However, the data infrastructure to collect, store and process data is geared toward developers (e.g., In AWS’ quest to enable the best data storage options for engineers, we have built several innovative database solutions like Amazon RDS, Amazon RDS for Aurora, Amazon DynamoDB, and Amazon Redshift.

Cloud

Cloud Big Data AWS Analytics

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. And while many of our systems are based on the latest in computer science research, this often hasn't been sufficient: our architects and engineers have had to advance research in directions that no academic had yet taken.

Technology

Technology AWS Storage

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. The scalability, reliability and durability requirements for Cloud Drive are very high, which is why they decided to make use of the Amazon Simple Storage Service (S3) as the core component of their service.

AWS

AWS Cloud Storage Internet

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

AliGraph covers Alibaba’s distributed graph engine supporting the development of new GNN applications. Their dataset has about 7B edges… Meanwhile, AnalyticDB is Alibaba’s real-time OLAP RDBMS handling 10PB of data (in excess of 100 trillion rows!). Autoscaling tiered cloud storage in Anna. Research papers. (In

Blockchain

Blockchain Hardware Google Analytics

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This user-oriented nature had vast implications: The end user is often interested in aggregated reporting information, not in separate data items, and SQL pays a lot of attention to this aspect. 1) Denormalization.
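
As a rough illustration of the denormalization technique named in the excerpt above, the following Python sketch copies related records into a single per-user document so that an aggregated report can be read without a join. The `users` and `orders` records and the `denormalize` helper are hypothetical names made up for this example; they are not taken from the original article.

```python
# Hypothetical normalized data: each order refers to a user by id.
users = {1: {"name": "Ada"}, 2: {"name": "Lin"}}
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 15.0},
    {"order_id": 12, "user_id": 2, "total": 40.0},
]


def denormalize(users, orders):
    """Build one document per user that embeds the user's orders.

    The embedded copy and the precomputed sum trade extra storage and
    more work on write for a single-lookup read.
    """
    docs = {}
    for user_id, user in users.items():
        docs[user_id] = {"name": user["name"], "orders": [], "order_total": 0.0}
    for order in orders:
        doc = docs[order["user_id"]]
        # Copy the order fields into the user's document.
        doc["orders"].append(
            {"order_id": order["order_id"], "total": order["total"]}
        )
        # Maintain the aggregate alongside the embedded records.
        doc["order_total"] += order["total"]
    return docs


user_docs = denormalize(users, orders)
print(user_docs[1]["order_total"])  # prints 40.0
```

The aggregated value that a report needs is available from one key lookup, at the cost of keeping the duplicated data in sync whenever an order changes.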

Database

Database Ecommerce Efficiency Engineering

Amazon EC2 Cluster GPU Instances - All Things Distributed

All Things Distributed

NOVEMBER 15, 2010

Werner Vogels' weblog on building scalable and robust distributed systems. A third generation of APIs, however, left the graphics specifics interfaces behind and instead focused on exposing the pipeline as a generic highly parallel engine supporting task and data parallelism.

AWS

AWS Latency Programming Architecture

The Winds of Architecture Changes at the USENIX ATC 2019

ACM Sigarch

NOVEMBER 1, 2019

As is the case for many high-quality computer systems conferences, the papers presented here involve a significant amount of engineering and experimentation on real hardware to convincingly evaluate innovative concepts end-to-end in a realistic setting. ATC ’19 was refreshingly different. Heterogeneous ISA. Programmable I/O Devices.

Architecture

Architecture Hardware Cache Storage

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

However, ClickHouse is super efficient for timeseries and provides “sharding” out of the box (scalability beyond one node). In a partitioned massively parallel database system, the storage format and sorting algorithm may not be optimized for that operation as we are reading multiple partitions in parallel. Processed 8.19

Database

Database Analytics Blockchain Healthcare
