Big Data, Development, Storage and Systems - Technology Performance Pulse

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.

Storage

Storage Systems Big Data Azure
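Here is a minimal Python sketch that makes "data spread over multiple servers" concrete. It uses one common placement scheme, consistent hashing with replication. The excerpt does not say which scheme any particular system uses. `HashRing`, `replicas`, the 100 virtual nodes per server, and the replication factor of 3 are illustrative assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit ring position derived from MD5 (used for spreading, not security).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring: each server owns many virtual positions."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    def replicas(self, key, n=3):
        """Return up to n distinct servers for key, walking clockwise from its hash."""
        distinct = len({node for _, node in self._ring})
        n = min(n, distinct)
        i = bisect.bisect(self._positions, _hash(key))
        chosen = []
        while len(chosen) < n:
            _, node = self._ring[i % len(self._ring)]  # wrap around the ring
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    # Three distinct servers; which ones depends on the hash values.
    print(ring.replicas("user:42"))
```

Virtual nodes smooth out the load. Adding a server only remaps the keys that land near its new ring positions. A naive `hash(key) % num_servers` would reshuffle almost every key.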

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum Database is an open-source, hardware-agnostic MPP database for analytics, based on PostgreSQL and developed by Pivotal, which was later acquired by VMware. Greenplum uses a massively parallel processing (MPP) design that can help you build a scalable, high-performance deployment.

Big Data

Big Data Database Artificial Intelligence Open Source
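This is a toy illustration of the MPP idea behind Greenplum. Each segment aggregates only its local rows, and a coordinator merges the partial results. Greenplum's real planner and interconnect are far more involved. `segment_partial_avg`, `distributed_avg`, and the use of threads to stand in for segment hosts are assumptions made for this sketch.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def segment_partial_avg(rows):
    """Runs on one segment: partial (sum, count) per group over local rows only."""
    partial = defaultdict(lambda: [0.0, 0])
    for group, value in rows:
        partial[group][0] += value
        partial[group][1] += 1
    return dict(partial)


def distributed_avg(segments):
    """Coordinator: scatter the partial aggregation to every segment, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(segment_partial_avg, segments))
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            totals[group][0] += s
            totals[group][1] += c
    return {group: s / c for group, (s, c) in totals.items()}


if __name__ == "__main__":
    segments = [[("eu", 10), ("us", 20)], [("eu", 30)], [("us", 40), ("us", 60)]]
    print(distributed_avg(segments))  # {'eu': 20.0, 'us': 40.0}
```

Segments ship (sum, count) pairs instead of averages because averaging per-segment averages would be wrong. In this data it would give `us` a value of (20 + 50) / 2 = 35 instead of the correct 40.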

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering
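The excerpt does not describe how AutoOptimize works internally. The sketch below illustrates one plausible object-layout optimization: planning which small files in a partition to merge so that each rewritten file approaches a target size. It uses first-fit-decreasing bin packing. `plan_compaction` and the target size are hypothetical and are not taken from the Netflix post.

```python
def plan_compaction(file_sizes, target_bytes):
    """Group small files into merge sets whose totals stay <= target_bytes.

    Uses first-fit decreasing. Files already at or above the target are left alone.
    """
    small = sorted((s for s in file_sizes if s < target_bytes), reverse=True)
    bins = []  # each bin is [total_bytes, [file sizes]]
    for size in small:
        for b in bins:
            if b[0] + size <= target_bytes:
                b[0] += size
                b[1].append(size)
                break
        else:
            bins.append([size, [size]])
    # Only bins that combine two or more files correspond to an actual rewrite.
    return [sizes for _, sizes in bins if len(sizes) > 1]


if __name__ == "__main__":
    print(plan_compaction([10, 20, 30, 40, 200], target_bytes=64))  # [[40, 20], [30, 10]]
```

Sorting largest-first tends to fill each output file more tightly than taking files in arrival order. A production system would also weigh the rewrite cost against the expected read savings.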

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. This system was designed to supplement and succeed the existing Hadoop-based system, which suffered from excessive data-processing latency and high maintenance costs.

Big Data

Big Data Processing Lambda Database

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next.

Big Data

Big Data Storage Benchmarking Hardware

Data Engineers of Netflix – Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

During earlier years of my career, I primarily worked as a backend software engineer, designing and building the backend systems that enable big data analytics. I developed many batch and real-time data pipelines using open source technologies for AOL Advertising and eBay. What is your favorite project?

Data Engineering

Data Engineering Engineering Big Data Software Engineering

What is container orchestration?

Dynatrace

MARCH 24, 2023

By embracing public cloud and hybrid cloud computing environments, IT teams can further accelerate development and automate software deployment and management. Container technology enables organizations to efficiently develop cloud-native applications or to modernize legacy applications to take advantage of cloud services.

Infrastructure

Infrastructure Open Source Operating System Cloud

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

However, Memcached's limited feature set compared to Redis might be a disadvantage for applications that require more advanced data structures and persistence. Caching serves a dual purpose in web development: speeding up client requests and reducing server load.

Cache

Cache Storage Scalability Architecture
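The excerpt says caching both speeds up client requests and reduces server load. A minimal cache-aside sketch shows where each benefit comes from. An in-process dictionary stands in for Redis or Memcached, so no client API of either product is used. `CacheAside`, the loader function, and the 60-second TTL are hypothetical.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, load from the backing store and populate."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # e.g. a database query (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            self.hits += 1         # served from memory: no backend work
            return entry[0]
        self.misses += 1
        value = self._loader(key)  # only misses and expired entries reach the backend
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key after a write so the next read fetches fresh data."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    db_reads = []

    def slow_db_lookup(key):
        db_reads.append(key)  # stands in for a real database query
        return f"row-for-{key}"

    cache = CacheAside(slow_db_lookup, ttl_seconds=60.0)
    cache.get("user:1")
    cache.get("user:1")
    print(len(db_reads), cache.hits, cache.misses)  # 1 1 1
```

The TTL bounds how stale a cached value can get. `invalidate` handles writes that must be visible immediately.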

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

This transition to public, private, and hybrid cloud is driving organizations to automate and virtualize IT operations to lower costs and optimize cloud processes and systems. We’ll discuss how the responsibilities of ITOps teams changed with the rise of cloud technologies and agile development methodologies. So, what is ITOps?

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

As Kubernetes adoption increases and it continues to advance technologically, Kubernetes has emerged as the “operating system” of the cloud. Kubernetes moved to the cloud in 2022.

Open Source

Open Source Java Operating System Programming

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

A hybrid cloud merges the capabilities of public and private clouds into a single, coherent system. This combination allows data and applications to move fluidly across different environments, so workloads can be shared seamlessly. We will examine each of these elements in more detail.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.

Big Data

Big Data Analytics AWS Cloud

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3.

Latency

Latency Storage Big Data Tuning

Expanding the Cloud – Managing Cold Storage with Amazon Glacier

All Things Distributed

AUGUST 20, 2012

With the introduction of Amazon Glacier, IT organizations now have a solution that removes the headaches of digital archiving and provides extremely low-cost storage.

Storage

Storage Cloud AWS Media

Structural Evolutions in Data

O'Reilly

SEPTEMBER 19, 2023

It progressed from “raw compute and storage” to “reimplementing key services in push-button fashion” to “becoming the backbone of AI work,” all under the umbrella of “renting time and storage on someone else’s computers.” (It will be easier to fit in the overhead storage.)

Hardware

Hardware Storage Big Data Blockchain

Delta: A Data Synchronization and Enrichment Platform

The Netflix TechBlog

OCTOBER 15, 2019

Beyond data synchronization, some applications also need to enrich their data by calling external services. To address these challenges, we developed Delta, an eventually consistent, event-driven data synchronization and enrichment platform. That system quickly grew very complex and became difficult to maintain.

Transportation

Transportation Architecture Processing Storage

Expanding the Cloud - Amazon S3 Reduced Redundancy Storage.

All Things Distributed

MAY 18, 2010

Today a new storage option for Amazon S3 has been launched: Amazon S3 Reduced Redundancy Storage (RRS). Under the covers, Amazon S3 is a marvel of distributed systems technologies.

Storage

Storage Cloud AWS Scalability

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A unified data management (UDM) system combines the best of data warehouses, data lakes, and streaming without expensive and error-prone ETL. It offers the reliability and performance of a data warehouse, the real-time and low-latency characteristics of a streaming system, and the scale and cost-efficiency of a data lake.

Big Data

Big Data Artificial Intelligence Storage Hardware

Expanding the AWS Cloud: Introducing the AWS Europe (London) Region

All Things Distributed

DECEMBER 13, 2016

With the launch of the AWS Europe (London) Region, AWS can enable many more UK enterprise, public sector and startup customers to reduce IT costs, address data locality needs, and embark on rapid transformations in critical new areas, such as big data analysis and Internet of Things. Fraud.net is a good example of this.

AWS

AWS Cloud Artificial Intelligence IoT

Why test data management is more important than you think

Testsigma

MAY 7, 2020

The IBM Big Data and Analytics Hub website cited a case study in which a US insurance company estimated that 15% of its testing effort went to collecting test data for its backend and frontend systems. Test data management had become a big problem for the company and needed to be solved.

Testing

Testing Storage Database Processing

Expanding the Cloud - AWS Import/Export Support for Amazon EBS.

All Things Distributed

JULY 7, 2011

AWS Import/Export transfers data off of storage devices using Amazon's high-speed internal network, bypassing the Internet. Amazon Import/Export is an important tool for customers to accelerate moving large amounts of data into the AWS storage systems.

AWS

AWS Cloud Storage Internet

Expanding the AWS Cloud – Introducing the AWS Europe (Stockholm) Region

All Things Distributed

DECEMBER 12, 2018

Starting today, developers, startups, and enterprises, as well as government, education, and non-profit organizations, can use the new AWS Europe (Stockholm) Region. In addition, Hemnet's developers can now spin up temporary clones of Hemnet's entire application stack in just a few minutes.

AWS

AWS Cloud Games Serverless

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

In the 2010 shareholder letter, Jeff Bezos writes about the unique technologies developed at Amazon.com over the years. This approach reduces side effects and allows services to evolve at their own pace without impacting the other components of the overall system.

Technology

Technology Technology AWS Storage

No Server Required - Jekyll & Amazon S3 - All Things Distributed

All Things Distributed

AUGUST 17, 2011

As some of you may remember, I was pretty excited when Amazon Simple Storage Service (S3) released its website feature, which let me serve this weblog completely from S3. Jekyll was developed by Tom Preston-Werner of GitHub fame.

Servers

Servers Social Media AWS Website

Software Testing Trends 2021 – What can we expect?

Testsigma

FEBRUARY 12, 2021

With the latest developments in IT services, QA testing has seen significant improvement and growth. Emerging technologies have helped improve the process of software development, testing, design, and deployment. Multi-experience was one of the top development trends in technology in 2020.

Artificial Intelligence

Artificial Intelligence Software Software IoT

New AWS feature: Run your website from Amazon S3 - All Things.

All Things Distributed

FEBRUARY 17, 2011

As of a few days ago, this weblog serves 100% of its content directly out of the Amazon Simple Storage Service (S3), without the need for a web server to be involved.

AWS

AWS Website Storage Servers

AWS Elastic Beanstalk: A Quick and Simple Way into the Cloud - All.

All Things Distributed

JANUARY 19, 2011

Flexibility is one of the key principles of Amazon Web Services: developers can select any programming language and software package, any operating system, any middleware and any database to build systems and applications that meet their requirements.

AWS

AWS Cloud Java Scalability

Expanding the Cloud - Introducing Amazon ElastiCache - All Things.

All Things Distributed

AUGUST 22, 2011

Systems that make extensive use of caching almost all report a significant reduction in the cost of their database tier. For more hands-on information and to get started right away, see Jeff Barr's posting on the AWS Developer Blog.

Cloud

Cloud Cache AWS Storage

Introducing the AWS South America - All Things Distributed

All Things Distributed

DECEMBER 14, 2011

These companies can now benefit from the fact that the new Sao Paulo Region is similar to all other AWS Regions, which enables software developed for other Regions to be quickly deployed in South America as well.

AWS

AWS Latency Storage Big Data

Job Openings in AWS - Senior Leader in Database Services - All.

All Things Distributed

AUGUST 19, 2011

AWS Database Services is responsible for setting the database strategy and delivering distributed structured storage services to our AWS customers. The ideal candidate will be someone who has built and run large-scale distributed systems and/or databases.

AWS

AWS Database Storage Scalability

The AWS GovCloud (US) Region - All Things Distributed

All Things Distributed

AUGUST 16, 2011

One particular early use case for AWS GovCloud (US) will be massive data processing and analytics. AWS GovCloud (US) will be used by several of these agencies to help them with their Bigger-than-Big-Data needs.

AWS

AWS Government Big Data Cloud

Fast key-value stores: an idea whose time has come and gone

The Morning Paper

JUNE 23, 2019

Coupled with stateless application servers to execute business logic and a database-like system to provide persistent storage, they form a core component of popular data center service architectures. Why are developers using RInK systems as part of their design? Fetching too much data in a single query (i.e.,

Cache

Cache Latency Google Lambda

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

AliGraph covers Alibaba’s distributed graph engine supporting the development of new GNN applications. Their dataset has about 7B edges… Meanwhile, AnalyticDB is Alibaba’s real-time OLAP RDBMS handling 10PB of data (in excess of 100 trillion rows!). Autoscaling tiered cloud storage in Anna.

Blockchain

Blockchain Hardware Google Analytics

Hacking with AWS at The Next Web Hackaton - All Things Distributed

All Things Distributed

MARCH 24, 2011

Up to 200 developers and designers will get together to hack up interesting applications using the Internet's APIs and SDKs. If you want to be that team of 5-7 developers/designers, you can sign up using this form.

AWS

AWS Internet Internet Storage

Driving Bandwidth Cost Down for AWS Customers. - All Things.

All Things Distributed

JUNE 29, 2011

For more details, see the announcement, the detail pages of the services at [link], and the posting on the AWS developer blog.

AWS

AWS Retail Innovation Strategy

Simplifying IT - Create Your Application with AWS CloudFormation.

All Things Distributed

FEBRUARY 25, 2011

In addition, they often do specialized development for these customers, meaning that for each production environment there may also be development and testing environments running.

AWS

AWS Cloud Scalability Storage

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

All Things Distributed

DECEMBER 5, 2010

I am very excited that today we have launched Amazon Route 53, a high-performance and highly available Domain Name System (DNS) service. Naming is one of the fundamental concepts in distributed systems. By Werner Vogels on 05 December 2010 02:00 PM.

Cloud

Cloud Internet Internet AWS

5 Terabyte Object Support in Amazon S3 - All Things Distributed

All Things Distributed

DECEMBER 9, 2010

Big Just Got Bigger - 5 Terabyte Object Support in Amazon S3. By supporting such large object sizes, Amazon S3 better enables a variety of interesting big data use cases.

AWS

AWS Big Data Scalability Storage

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Not just for HPC but for mission-critical enterprise systems such as OLTP. Today, I am very proud to be a part of the Amazon Web Services team as we truly make HPC available as an on-demand commodity for every developer to use.

Cloud

Cloud AWS Automotive Latency

Spot Instances - Increased Control - All Things Distributed

All Things Distributed

JULY 11, 2011

To get started using Spot, or for more details, visit the Amazon EC2 Spot Instance web page, the AWS developer blog, and the EC2 Release Notes.

AWS

AWS Storage Cloud Big Data

New Route 53 and ELB features: IPv6, Zone Apex, WRR and more.

All Things Distributed

MAY 24, 2011

More information can be found on the Elastic Load Balancing and Route 53 detail pages and in the two posts on the AWS developer blog.

Internet

Internet Internet AWS Scalability

I am looking for new application and platform services - All Things.

All Things Distributed

APRIL 23, 2010

The ecosystem of new application and platform services in the cloud is the future of application development.

AWS

AWS Storage Cloud Big Data

Amazon EC2 Cluster GPU Instances - All Things Distributed

All Things Distributed

NOVEMBER 15, 2010

Now that our ability to generate higher and higher clock rates has stalled and CPU architectural improvements have shifted focus towards multiple cores, we see that it is becoming harder to efficiently use these computer systems.

AWS

AWS Latency Programming Architecture

Expanding the Cloud - Opening the AWS Asia Pacific (Singapore.

All Things Distributed

APRIL 28, 2010

There are four main reasons to do so: Performance - For many applications and services, data access latency to end users is important. You need to be able to place your systems in locations where you can minimize the distance to your most important customers.

AWS

AWS Cloud Latency Storage
