Availability, Big Data, Scalability and Storage

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum uses an MPP database design that can help you develop a scalable, high performance deployment. High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages. Greenplum features a cost-based query optimizer for large-scale, big data workloads.

Big Data

Big Data Database Artificial Intelligence Open Source

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios. Data transfer technology.

Cache

Cache Storage Scalability Architecture

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Performance. Native frameworks.

Big Data

Big Data Storage Benchmarking Hardware

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Several pain points have made it difficult for organizations to manage their data efficiently and create actual value. Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes.

Analytics

Analytics Artificial Intelligence Storage Serverless

Advancing Application Performance With NVMe Storage, Part 2

DZone

JUNE 3, 2019

Using local SSDs inside of the GPU node delivers fast access to data during training, but introduces challenges that impact the overall solution in terms of scalability, data access, and data protection.

Storage

Storage Performance Network Scalability

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The pipelines can be stateful and the engine’s middleware should provide a persistent storage to enable state checkpointing. Interoperability with Hadoop.

Big Data

Big Data Processing Lambda Database

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

With more organizations taking the multicloud plunge, monitoring cloud infrastructure is critical to ensure all components of the cloud computing stack are available, high-performing, and secure. Website monitoring examines a cloud-hosted website’s processes, traffic, availability, and resource use. Cloud storage monitoring.

Cloud

Cloud Monitoring Best Practices Infrastructure

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

This approach allows companies to combine the security and control of private clouds with public clouds’ scalability and innovation potential. Are you looking to leverage the best of the private and public cloud worlds to propel your business forward? A hybrid cloud strategy could be your answer.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

What is container orchestration?

Dynatrace

MARCH 24, 2023

This orchestration includes provisioning, scheduling, networking, ensuring availability, and monitoring container lifecycles. The configuration file directs the container orchestration tool on how to retrieve container images, how to create a network between containers, and where to store log data or mount storage volumes.

Infrastructure

Infrastructure Open Source Operating System Cloud

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3. Moving data with Bulldozer at Netflix.

Latency

Latency Storage Big Data Tuning

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Through effortless provisioning, a larger number of small hosts provide a cost-effective and scalable platform. On-premises data centers invest in higher capacity servers since they provide more flexibility in the long run, while the procurement price of hardware is only one of many cost factors.

Open Source

Open Source Java Operating System Programming

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

The first phase involves validating functional correctness, scalability, and performance concerns and ensuring the new systems’ resilience before the migration. It provides a good read on the availability and latency ranges under different production conditions.

Traffic

Traffic Latency Tuning Systems

Expanding the Cloud - Managing Cold Storage with Amazon Glacier

All Things Distributed

AUGUST 20, 2012

Werner Vogels weblog on building scalable and robust distributed systems. Managing Cold Storage with Amazon Glacier. With the introduction of Amazon Glacier, IT organizations now have a solution that removes the headaches of digital archiving and provides extremely low cost storage. All Things Distributed.

Storage

Storage Cloud AWS Media

Expanding the AWS Cloud: Introducing the AWS Canada (Central) Region

All Things Distributed

DECEMBER 8, 2016

Today, I'm happy to share that the Canada (Central) Region is available for use by customers worldwide. The AWS Cloud now operates in 40 Availability Zones within 15 geographic regions around the world, with seven more Availability Zones and three more regions coming online in China, France, and the U.K. Scalability.

AWS

AWS Cloud Lambda Innovation

Track Thousands of Assets in a Time of Crisis Using Real-Time Digital Twins

ScaleOut Software

APRIL 3, 2020

An innovative new software approach called “real-time digital twins” running on a cloud-hosted, highly scalable, in-memory computing platform can help address this challenge. With up-to-date information for all ventilators immediately at hand, analysts can ask questions like: “Where are all the available ventilators at this moment?”.

Logistics

Logistics Analytics Scalability Cloud

Expanding the Cloud - Introducing Amazon ElastiCache - All Things.

All Things Distributed

AUGUST 22, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Caching has become a standard component in many applications to achieve a fast and predictable performance, but maintaining a collection of cache servers in a reliable and scalable manner is not a simple task. Driving Storage Costs Down for AWS Customers.

Cloud

Cloud Cache AWS Storage

The AWS GovCloud (US) Region - All Things Distributed

All Things Distributed

AUGUST 16, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Government and Big Data. One particular early use case for AWS GovCloud (US) will be massive data processing and analytics. The scalability, flexibility and the elasticity of AWS makes it an ideal environment for the agencies to run their analytics.

AWS

AWS Government Big Data Cloud

Expanding the Cloud - Introducing the AWS Asia Pacific (Tokyo.

All Things Distributed

MARCH 2, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Japanese companies and consumers have become used to low latency and high-speed networking available between their businesses, residences, and mobile devices. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

AWS

AWS Cloud Games Latency

New Route 53 and ELB features: IPv6, Zone Apex, WRR and more.

All Things Distributed

MAY 24, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway. a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications. Driving down the cost of Big-Data analytics.

Internet

Internet AWS Scalability

Spot Instances - Increased Control - All Things Distributed

All Things Distributed

JULY 11, 2011

Werner Vogels weblog on building scalable and robust distributed systems. With this change, we will improve the granularity of pricing information you receive by introducing a Spot Instance price per Availability Zone rather than a Spot Instance price per Region. Driving Storage Costs Down for AWS Customers.

AWS

AWS Storage Cloud Big Data

Hacking with AWS at The Next Web Hackaton - All Things Distributed

All Things Distributed

MARCH 24, 2011

Werner Vogels weblog on building scalable and robust distributed systems. It is likely that the Amazon Web Services will be used by many of the participants for their compute, storage, database and other cloud resource needs. To make it easy we have free AWS usage credits available onsite for whoever needs them.

AWS

AWS Internet Storage

AWS Elastic Beanstalk: A Quick and Simple Way into the Cloud - All.

All Things Distributed

JANUARY 19, 2011

Werner Vogels weblog on building scalable and robust distributed systems. A whole range of innovative new services, ranging from media conversion to geo-location-context services have been developed by our customers using this flexibility and are available in the AWS ecosystem. All Things Distributed.

AWS

AWS Cloud Java Operating System

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

Autoscaling tiered cloud storage in Anna. Hyper Dimension Shuffle describes how Microsoft improved the cost of data shuffling, one of the most costly operations, in their petabyte-scale internal big data analytics platform, SCOPE. Research papers. (In random order!). What if the network was no longer the bottleneck?

Blockchain

Blockchain Hardware Google Analytics

APAC Summer Tour - All Things Distributed

All Things Distributed

JULY 3, 2011

Werner Vogels weblog on building scalable and robust distributed systems. There is a tremendous interest in using the AWS multi-Availability Zone features for disaster prevention and recovery in the wake of the March 11 earthquake. I will update this posting as more details about public events become available. APAC Summer Tour.

AWS

AWS Storage Big Data Cloud

Expanding the Cloud - New AWS Region: US-West (Northern.

All Things Distributed

DECEMBER 3, 2009

Werner Vogels weblog on building scalable and robust distributed systems. We have expanded the AWS footprint in the US and starting today a new AWS Region is available for use: US-West (Northern California). AWS is committed to making its services available at low cost. Driving Storage Costs Down for AWS Customers.

AWS

AWS Cloud Latency Storage

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

All Things Distributed

DECEMBER 5, 2010

Werner Vogels weblog on building scalable and robust distributed systems. I am very excited that today we have launched Amazon Route 53, a high-performance and highly-available Domain Name System (DNS) service. Route 53 provides Authoritative DNS functionality implemented using a world-wide network of highly-available DNS servers.

Cloud

Cloud Internet AWS

Powerful New Amazon EC2 Boot Features - All Things Distributed

All Things Distributed

DECEMBER 3, 2009

Werner Vogels weblog on building scalable and robust distributed systems. Today a powerful new feature is available for our Amazon EC2 customers: the ability to boot their instances from Amazon EBS (Elastic Block Store). A wide variety of operating systems and software configurations is available for use.

AWS

AWS Storage Operating System Cloud

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Today, I am very proud to be a part of the Amazon Web Services team as we truly make HPC available as an on-demand commodity for every developer to use. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

Cloud

Cloud AWS Automotive Latency

Expanding the Cloud - Opening the AWS Asia Pacific (Singapore.

All Things Distributed

APRIL 28, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Availability - The cloud makes it possible to build resilient applications to make sure they can survive different failure scenarios. The Asia Pacific (Singapore) region launches with two Availability Zones. Driving Storage Costs Down for AWS Customers.

AWS

AWS Cloud Latency Storage

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. Besides this, elimination of these features had an extremely important influence on the performance and scalability of the stores. Many techniques that are described below are perfectly applicable to this model.

Database

Database Ecommerce Efficiency Engineering

Expanding the Cloud - Amazon EC2 Spot Instances - All Things.

All Things Distributed

DECEMBER 13, 2009

Werner Vogels weblog on building scalable and robust distributed systems. Consistently we have lowered compute, storage and bandwidth prices based on such cost savings. You are assured that your Reserved Instance will always be available in the Availability Zone in which you purchased it. All Things Distributed.

Cloud

Cloud AWS Storage Innovation

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

However, ClickHouse is super efficient for timeseries and provides “sharding” out of the box (scalability beyond one node). With the latest ClickHouse version, all of these features are available, but some of them may not perform fast enough. So can we use it as our main datastore? Deleting messages. Text search.

Database

Database Analytics Blockchain Healthcare

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

However, the data infrastructure to collect, store and process data is geared toward developers (e.g., In AWS’ quest to enable the best data storage options for engineers, we have built several innovative database solutions like Amazon RDS, Amazon RDS for Aurora, Amazon DynamoDB, and Amazon Redshift. Big data challenges.

Cloud

Cloud Big Data AWS Analytics

Expanding the AWS Cloud: Introducing the AWS Europe (London) Region

All Things Distributed

DECEMBER 13, 2016

Today, I'm happy to announce that the AWS Europe (London) Region, our 16th technology infrastructure region globally, is now generally available for use by customers worldwide. Fraud.net use AWS to support highly scalable, big data applications that run machine learning processes for real-time analytics.

AWS

AWS Cloud Artificial Intelligence IoT

Expanding the Cloud - Amazon S3 Reduced Redundancy Storage.

All Things Distributed

MAY 18, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Expanding the Cloud - Amazon S3 Reduced Redundancy Storage. Today a new storage option for Amazon S3 has been launched: Amazon S3 Reduced Redundancy Storage (RRS). By Werner Vogels on 18 May 2010 04:00 PM. Durability in Amazon S3.

Storage

Storage Cloud AWS Scalability

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

Werner Vogels weblog on building scalable and robust distributed systems. The storage systems we've pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

Technology

Technology AWS Storage

New AWS feature: Run your website from Amazon S3 - All Things.

All Things Distributed

FEBRUARY 17, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Since a few days ago this weblog serves 100% of its content directly out of the Amazon Simple Storage Service (S3) without the need for a web server to be involved. Driving Storage Costs Down for AWS Customers. Driving down the cost of Big-Data analytics.

AWS

AWS Website Storage Servers

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

Werner Vogels weblog on building scalable and robust distributed systems. As a big music fan with well over 100Gb in digital music I am particularly excited that I now have access to all my digital music anywhere I go. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

AWS

AWS Cloud Storage Internet

Simplifying IT - Create Your Application with AWS CloudFormation.

All Things Distributed

FEBRUARY 25, 2011

Werner Vogels weblog on building scalable and robust distributed systems. They had taken the approach that they would not only be offering their software as a scalable multi-tenant product but also as a single tenant environment for customers that want to have their own isolated environment. Driving down the cost of Big-Data analytics.

AWS

AWS Cloud Scalability Storage

5 Terabyte Object Support in Amazon S3 - All Things Distributed

All Things Distributed

DECEMBER 9, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Big Just Got Bigger - 5 Terabyte Object Support in Amazon S3. Amazon S3 has always been a scalable, durable and available data repository for almost any customer workload. Driving Storage Costs Down for AWS Customers.

AWS

AWS Big Data Scalability Storage

Choosing Consistency - All Things Distributed

All Things Distributed

FEBRUARY 24, 2010

Werner Vogels weblog on building scalable and robust distributed systems. There are many factors that come into play when you need to meet stringent availability and performance requirements under ultra-scalable conditions. Data Consistency Models in the Amazon Services. All Things Distributed. Choosing Consistency.

AWS

AWS Latency Database Scalability
