Big Data, Network, Scalability and Storage - Technology Performance Pulse

Advancing Application Performance With NVMe Storage, Part 2

DZone

JUNE 3, 2019

Using local SSDs inside of the GPU node delivers fast access to data during training, but introduces challenges that impact the overall solution in terms of scalability, data access, and data protection.

Storage

Storage Performance Network Scalability

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as go to container orchestration platform for data engineering teams. In 2018, a widespread adaptation of Kubernetes for big data processing is anitcipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Native frameworks.

Big Data

Big Data Storage Benchmarking Hardware

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The pipelines can be stateful and the engine’s middleware should provide a persistent storage to enable state checkpointing. Interoperability with Hadoop. Pipelining.

Big Data

Big Data Processing Lambda Database

What is container orchestration?

Dynatrace

MARCH 24, 2023

But managing the deployment, modification, networking, and scaling of multiple containers can quickly outstrip the capabilities of development and operations teams. This orchestration includes provisioning, scheduling, networking, ensuring availability, and monitoring container lifecycles. How does container orchestration work?

Infrastructure

Infrastructure Open Source Operating System Cloud

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention. The list goes on. The Limitations of Today’s Streaming Analytics. The heavy lifting is deferred to the back office.

IoT

IoT Analytics Big Data Architecture

Expanding the Cloud ? Managing Cold Storage with Amazon Glacier

All Things Distributed

AUGUST 20, 2012

Werner Vogels weblog on building scalable and robust distributed systems. Managing Cold Storage with Amazon Glacier. With the introduction of Amazon Glacier , IT organizations now have a solution that removes the headaches of digital archiving and provides extremely low cost storage. All Things Distributed. Comments ().

Storage

Storage Cloud AWS Media

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum uses an MPP database design that can help you develop a scalable, high performance deployment. High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages. Greenplum features a cost-based query optimizer for large-scale, big data workloads.

Big Data

Big Data Database Artificial Intelligence Open Source

Expanding the AWS Cloud: Introducing the AWS Canada (Central) Region

All Things Distributed

DECEMBER 8, 2016

Given this, enterprises, public sector bodies, startups, and small businesses are looking to adopt agile, scalable, and secure public cloud solutions. Access to secure, scalable, low-cost AWS infrastructure in Canada allows customers to innovate and provide tools to meet privacy, sovereignty, and compliance requirements. Scalability.

AWS

AWS Cloud Lambda Innovation

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Expanding the Cloud - AWS Import/Export Support for Amazon EBS.

All Things Distributed

JULY 7, 2011

Werner Vogels weblog on building scalable and robust distributed systems. AWS Import/Export transfers data off of storage devices using Amazons high-speed internal network and bypassing the Internet. With this new functionality AWS Import/Export now supports importing data directly into Amazon EBS snapshots.

AWS

AWS Cloud Storage Internet

Register for AWS re: Invent - All Things Distributed

All Things Distributed

JULY 16, 2012

Werner Vogels weblog on building scalable and robust distributed systems. There are sessions in many different categories: Architecture, Big Data, HPC, Computer & Networking, Storage, Databases, Security, Tools & Languages, Media Sharing & Content Delivery, Managing AWS Resources, Enterprise IT, Mobile, Start-up, and more.

AWS

AWS Big Data Media Storage

Expanding the Cloud - Introducing the AWS Asia Pacific (Tokyo.

All Things Distributed

MARCH 2, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Japanese companies and consumers have become used to low latency and high-speed networking available between their businesses, residences, and mobile devices. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

AWS

AWS Cloud Games Latency

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios. Data transfer technology.

Cache

Cache Storage Scalability Architecture

No Server Required - Jekyll & Amazon S3 - All Things Distributed

All Things Distributed

AUGUST 17, 2011

Werner Vogels weblog on building scalable and robust distributed systems. As some of you may remember I was pretty excited when Amazon Simple Storage Service (S3) released its website feature such that I could serve this weblog completely from S3. Driving Storage Costs Down for AWS Customers. All Things Distributed.

Servers

Servers Social Media AWS Website

Dutch Enterprises and The Cloud

All Things Distributed

SEPTEMBER 6, 2013

Shell leverages AWS for big data analytics to help achieve these goals. Due to the exponential growth of the biology and informatics fields, Unilever needs to maintain this new program within a highly-scalable environment that supports parallel computation and heavy data storage demands.

Cloud

Cloud Energy AWS Healthcare

The AWS GovCloud (US) Region - All Things Distributed

All Things Distributed

AUGUST 16, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Government and Big Data. One particular early use case for AWS GovCloud (US) will be massive data processing and analytics. The scalability, flexibility and the elasticity of AWS makes it an ideal environment for the agencies to run their analytics.

AWS

AWS Government Big Data Cloud

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

This approach allows companies to combine the security and control of private clouds with public clouds’ scalability and innovation potential. Mastering Hybrid Cloud Strategy Are you looking to leverage the best private and public cloud worlds to propel your business forward? A hybrid cloud strategy could be your answer.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

Apart from networking, attending conferences like LISA in person is an effective way to upgrade your skills: you can block out work interruptions and absorb new knowledge that's been neatly summarized into sessions. We first met each other at LISA, in addition to making many other important industry connections over the years.

DevOps

DevOps Network Best Practices Programming

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Werner Vogels weblog on building scalable and robust distributed systems. During my academic career, I spent many years working on HPC technologies such as user-level networking interfaces, large scale high-speed interconnects, HPC software stacks, etc. Driving Storage Costs Down for AWS Customers. All Things Distributed.

Cloud

Cloud AWS Automotive Latency

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 29, 2018

Apart from networking, attending conferences like LISA in person is an effective way to upgrade your skills: you can block out work interruptions and absorb new knowledge that's been neatly summarized into sessions. We first met each other at LISA, in addition to making many other important industry connections over the years.

DevOps

DevOps Network Best Practices Programming

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

As cloud and big data complexity scales beyond the ability of traditional monitoring tools to handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. With agent monitoring, third-party software collects data and reports from the component that’s attached to the agent.

Cloud

Cloud Monitoring Best Practices Infrastructure

AWS Elastic Beanstalk: A Quick and Simple Way into the Cloud - All.

All Things Distributed

JANUARY 19, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Elastic Beanstalk makes it easy for developers to deploy and manage scalable and fault-tolerant applications on the AWS cloud. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway. All Things Distributed.

AWS

AWS Cloud Java Scalability

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

All Things Distributed

DECEMBER 5, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Often these namespaces are hierarchical in nature such that it becomes easier to manage them and to decentralize control, which makes the system more scalable. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

Cloud

Cloud Internet Internet AWS

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes. The number and variety of applications, network devices, serverless functions, and ephemeral containers grows continuously. Teams have introduced workarounds to reduce storage costs.

Analytics

Analytics Artificial Intelligence Storage Serverless

Expanding the AWS Cloud: Introducing the AWS Europe (London) Region

All Things Distributed

DECEMBER 13, 2016

With the launch of the AWS Europe (London) Region, AWS can enable many more UK enterprise, public sector and startup customers to reduce IT costs, address data locality needs, and embark on rapid transformations in critical new areas, such as big data analysis and Internet of Things. Fraud.net is a good example of this.

AWS

AWS Cloud Artificial Intelligence IoT

Expanding the Cloud - Amazon S3 Reduced Redundancy Storage.

All Things Distributed

MAY 18, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Expanding the Cloud - Amazon S3 Reduced Redundancy Storage. Today a new storage option for Amazon S3 has been launched: Amazon S3 Reduced Redundancy Storage (RRS). By Werner Vogels on 18 May 2010 04:00 PM. Comments (). Durability in Amazon S3.

Storage

Storage Cloud AWS Scalability

New AWS feature: Run your website from Amazon S3 - All Things.

All Things Distributed

FEBRUARY 17, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Since a few days ago this weblog serves 100% of its content directly out of the Amazon Simple Storage Service (S3) without the need for a web server to be involved. Driving Storage Costs Down for AWS Customers. Driving down the cost of Big-Data analytics.

AWS

AWS Website Storage Servers

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

Werner Vogels weblog on building scalable and robust distributed systems. The storage systems weve pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

Technology

Technology Technology AWS Storage

Fast key-value stores: an idea whose time has come and gone

The Morning Paper

JUNE 23, 2019

Coupled with stateless application servers to execute business logic and a database-like system to provide persistent storage, they form a core component of popular data center service archictectures. We’ve seen similar high marshalling overheads in big data systems too.) Fetching too much data in a single query (i.e.,

Cache

Cache Latency Google Lambda

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

Autoscaling tiered cloud storage in Anna. I don’t think so in this case, but this paper will take you down into the nitty-gritty of getting the best out of modern processors and networks, with up to two orders of magnitude single node throughput gains to be had. What if the network was no longer the bottleneck?

Blockchain

Blockchain Hardware Google Analytics

5 Terabyte Object Support in Amazon S3 - All Things Distributed

All Things Distributed

DECEMBER 9, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Big Just Got Bigger - 5 Terabyte Object Support in Amazon S3. Amazon S3 has always been a scalable, durable and available data repository for almost any customer workload. Driving Storage Costs Down for AWS Customers. All Things Distributed.

AWS

AWS Big Data Scalability Storage

Amazon Cloudfront is Streaming Media 2010 Editor's pick - All.

All Things Distributed

APRIL 19, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Amazon Cloudfront is the Content Delivery Network (CDN) that is dead simple to use both from a technology and a business point of view. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway. Comments ().

Media

Media AWS Storage Big Data

Choosing Consistency - All Things Distributed

All Things Distributed

FEBRUARY 24, 2010

Werner Vogels weblog on building scalable and robust distributed systems. There are many factors that come into play when you need to meet stringent availability and performance requirements under ultra-scalable conditions. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

AWS

AWS Latency Database Scalability

Amazon EC2 Cluster GPU Instances - All Things Distributed

All Things Distributed

NOVEMBER 15, 2010

Werner Vogels weblog on building scalable and robust distributed systems. Starting today Amazon EMR can take advantage of the Cluster Compute and Cluster GPU instances, giving customers ever more powerful components to base the large scale data processing and analysis on. Driving Storage Costs Down for AWS Customers.

AWS

AWS Latency Programming Architecture

The Winds of Architecture Changes at the USENIX ATC 2019

ACM Sigarch

NOVEMBER 1, 2019

Alongside more traditional sessions such as Real-World Deployed Systems and Big Data Programming Frameworks, there were many papers focusing on emerging hardware architectures, including embedded multi-accelerator SoCs, in-network and in-storage computing, FPGAs, GPUs, and low-power devices. Heterogeneous ISA.

Architecture

Architecture Hardware Cache Storage

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

However, ClickHouse is super efficient for timeseries and provides “sharding” out of the box (scalability beyond one node). In a partitioned massively parallel database system, the storage format and sorting algorithm may not be optimized for that operation as we are reading multiple partitions in parallel.

Database

Database Analytics Blockchain Healthcare

Technology Performance Pulse

Advancing Application Performance With NVMe Storage, Part 2

Kubernetes for Big Data Workloads

Trending Sources

In-Stream Big Data Processing

What is container orchestration?

What is a Distributed Storage System

The Need for Real-Time Device Tracking

Expanding the Cloud ? Managing Cold Storage with Amazon Glacier

What is Greenplum Database? Intro to the Big Data Database

Expanding the AWS Cloud: Introducing the AWS Canada (Central) Region

Optimizing data warehouse storage

Expanding the Cloud - AWS Import/Export Support for Amazon EBS.

Register for AWS re: Invent - All Things Distributed

Expanding the Cloud - Introducing the AWS Asia Pacific (Tokyo.

Redis vs Memcached in 2024

No Server Required - Jekyll & Amazon S3 - All Things Distributed

Dutch Enterprises and The Cloud

The AWS GovCloud (US) Region - All Things Distributed

Mastering Hybrid Cloud Strategy

USENIX LISA 2018: CFP Now Open

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

USENIX LISA 2018: CFP Now Open

What is cloud monitoring? How to improve your full-stack visibility

AWS Elastic Beanstalk: A Quick and Simple Way into the Cloud - All.

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Expanding the AWS Cloud: Introducing the AWS Europe (London) Region

Expanding the Cloud - Amazon S3 Reduced Redundancy Storage.

New AWS feature: Run your website from Amazon S3 - All Things.

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

Fast key-value stores: an idea whose time has come and gone

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

5 Terabyte Object Support in Amazon S3 - All Things Distributed

Amazon Cloudfront is Streaming Media 2010 Editor's pick - All.

Choosing Consistency - All Things Distributed

Amazon EC2 Cluster GPU Instances - All Things Distributed

The Winds of Architecture Changes at the USENIX ATC 2019

Should You Use ClickHouse as a Main Operational Database?

Stay Connected