Availability, Big Data, Data and Storage - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum can scale to multi-petabyte data workloads, and it gives access to a cluster of powerful servers that work together behind a single SQL interface where you can view all of the data. This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes.

Big Data

Big Data Database Artificial Intelligence Open Source

Introduction to Azure Data Lake Storage Gen2

DZone

FEBRUARY 1, 2023

Built on Azure Blob Storage, Azure Data Lake Storage Gen2 is a suite of features for big data analytics. Azure Data Lake Storage Gen1 and Azure Blob Storage's capabilities are combined in Data Lake Storage Gen2.

Azure

Azure Storage Big Data Analytics

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Data Engineers of Netflix — Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix — Interview with Pallavi Phadnis. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer on the Product Data Science and Engineering team.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next. Key challenges. Performance.

Big Data

Big Data Storage Benchmarking Hardware

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. This system has been designed to supplement and succeed the existing Hadoop-based system, whose data processing latency and maintenance costs were too high.

Big Data

Big Data Processing Lambda Database

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Several pain points have made it difficult for organizations to manage their data efficiently and create actual value. Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes.

Analytics

Analytics Artificial Intelligence Storage Serverless

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

How do you get more value from petabytes of exponentially exploding, increasingly heterogeneous data? The short answer: the three pillars of observability (logs, metrics, and traces) converging on a data lakehouse. To solve this problem, Dynatrace launched Grail, its causational data lakehouse, in 2022.

Analytics

Analytics Innovation Metrics Database

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.

Cache

Cache Storage Scalability Architecture

Advancing Application Performance with NVMe Storage, Part 3

DZone

JUNE 4, 2019

NVMe Storage Use Cases. NVMe storage's strong performance, combined with the capacity and data availability benefits of shared NVMe storage over local SSD, makes it a strong solution for AI/ML infrastructures of any size. There are several AI/ML focused use cases to highlight.

Storage

Storage FinTech Artificial Intelligence Performance

Advancing Application Performance With NVMe Storage, Part 2

DZone

JUNE 3, 2019

Using local SSDs inside of the GPU node delivers fast access to data during training, but introduces challenges that impact the overall solution in terms of scalability, data access, and data protection.

Storage

Storage Performance Network Scalability

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

With more organizations taking the multicloud plunge, monitoring cloud infrastructure is critical to ensure all components of the cloud computing stack are available, high-performing, and secure. With agent monitoring, third-party software collects data and reports from the component that’s attached to the agent.

Cloud

Cloud Monitoring Best Practices Infrastructure

Delta: A Data Synchronization and Enrichment Platform

The Netflix TechBlog

OCTOBER 15, 2019

Part I: Overview. Andreas Andreakis, Falguni Jhaveri, Ioannis Papapanagiotou, Mark Cho, Poorna Reddy, Tongliang Liu. Overview: It is a commonly observed pattern for applications to utilize multiple datastores where each is used to serve a specific need such as storing the canonical form of data (MySQL etc.), caching (Memcached etc.),

Transportation

Transportation Architecture Processing Storage

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

In today's world, data is generated in high volumes, and to make something of it, the extracted data needs to be transformed, stored, maintained, governed, and analyzed. These processes are only possible with the distributed architecture and parallel processing mechanisms that Big Data tools are based on.

Big Data

Big Data Government Open Source Storage

What is container orchestration?

Dynatrace

MARCH 24, 2023

This orchestration includes provisioning, scheduling, networking, ensuring availability, and monitoring container lifecycles. The configuration file directs the container orchestration tool on how to retrieve container images, how to create a network between containers, and where to store log data or mount storage volumes.

Infrastructure

Infrastructure Open Source Operating System Cloud

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

The study analyzes factual Kubernetes production data from thousands of organizations worldwide that are using the Dynatrace Software Intelligence Platform to keep their Kubernetes clusters secure, healthy, and high performing. Big data: To store, search, and analyze large datasets, 32% of organizations use Elasticsearch.

Open Source

Open Source Java Operating System Programming

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

Complex cloud computing environments are increasingly replacing traditional data centers. In fact, Gartner estimates that 80% of enterprises will shut down their on-premises data centers by 2025. Collect raw data in virtual and nonvirtual environments from multiple feeds, normalize and structure the data, and aggregate it for alerts.

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

Implementing a hybrid cloud solution involves careful decision-making regarding application and data placement, migration strategies, and choosing compatible cloud service providers while ensuring seamless integration and addressing security and compliance challenges. We will examine each of these elements in more detail.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

Migrating Critical Traffic At Scale with No Downtime — Part 1

The Netflix TechBlog

MAY 4, 2023

It can happen on an edge API system servicing customer devices, between the edge and mid-tier services, or from mid-tiers to data stores. It provides a good read on the availability and latency ranges under different production conditions. For instance, envision a response payload that delivers media streams for a playback session.

Traffic

Traffic Latency Tuning Systems

Expanding the Cloud – Managing Cold Storage with Amazon Glacier

All Things Distributed

AUGUST 20, 2012

Managing Cold Storage with Amazon Glacier. With the introduction of Amazon Glacier, IT organizations now have a solution that removes the headaches of digital archiving and provides extremely low cost storage. All Things Distributed. Werner Vogels' weblog on building scalable and robust distributed systems.

Storage

Storage Cloud AWS Media

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques.

Database

Database Ecommerce Efficiency Engineering

Why test data management is more important than you think

Testsigma

MAY 7, 2020

The IBM Big Data and Analytics Hub website cited a case study in which a US insurance company estimated that 15% of its testing effort went to test data collection alone for the backend and frontend systems. Test data management had become a big problem for the company and had to be solved.

Testing

Testing Storage Database Processing

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

With disks being faster nowadays and CPU and memory resources being cheaper, we could easily say MySQL can handle TBs of data with good performance. For instance, in Percona Managed Services, we have many clients with TBs worth of data that perform well. There are many compression tools and algorithms for data out there.

Open Source

Open Source Storage Database Big Data

Expanding the Cloud - Amazon S3 Reduced Redundancy Storage.

All Things Distributed

MAY 18, 2010

Expanding the Cloud - Amazon S3 Reduced Redundancy Storage. Today a new storage option for Amazon S3 has been launched: Amazon S3 Reduced Redundancy Storage (RRS). This new storage option enables customers to reduce their costs by storing non-critical, reproducible data at lower levels of redundancy.

Storage

Storage Cloud AWS Scalability

Expanding the AWS Cloud: Introducing the AWS Europe (London) Region

All Things Distributed

DECEMBER 13, 2016

Today, I'm happy to announce that the AWS Europe (London) Region, our 16th technology infrastructure region globally, is now generally available for use by customers worldwide. With AWS, GoSquared can process tens of billions of data points every day from four continents to provide customers with a single view.

AWS

AWS Cloud Artificial Intelligence IoT

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

We live in a world where massive volumes of data are generated from websites, connected devices and mobile apps. In such a data intensive environment, making key business decisions such as running marketing and sales campaigns, logistic planning, financial analysis and ad targeting requires deriving insights from these data.

Cloud

Cloud Big Data AWS Analytics

Expanding the AWS Cloud: Introducing the AWS Canada (Central) Region

All Things Distributed

DECEMBER 8, 2016

Today, I'm happy to share that the Canada (Central) Region is available for use by customers worldwide. The AWS Cloud now operates in 40 Availability Zones within 15 geographic regions around the world, with seven more Availability Zones and three more regions coming online in China, France, and the U.K. in the coming year.

AWS

AWS Cloud Lambda Innovation

Expanding the Cloud: Introducing the AWS Asia Pacific (Mumbai) Region

All Things Distributed

JUNE 26, 2016

Today, I’m happy to announce that the Asia Pacific (Mumbai) Region is generally available for use by customers worldwide. Seamless ingestion of large volumes of sensed data. AdiMap uses Amazon Kinesis to process real-time streaming online ad data and job feeds, and processes them for storage in petabyte-scale Amazon Redshift.

AWS

AWS Cloud Healthcare Blockchain

Track Thousands of Assets in a Time of Crisis Using Real-Time Digital Twins

ScaleOut Software

APRIL 3, 2020

For example, the parameters for a ventilator could include its identifier, make and model, current location, status (in use, in storage, broken), time in use, technical issues and repairs, and contact information. “Show me a list of currently available or soon to be available ventilators in my county right now.”

Logistics

Logistics Analytics Scalability Cloud

Expanding the AWS Cloud – Introducing the AWS Europe (Stockholm) Region

All Things Distributed

DECEMBER 12, 2018

Today, I'm happy to announce that the AWS Europe (Stockholm) Region, our 20th Region globally, is now generally available for use by customers. With this launch, AWS now provides 60 Availability Zones, with another 12 zones and four Regions expected to come online by 2020 in Bahrain, Cape Town, Hong Kong, and Milan.

AWS

AWS Cloud Games Serverless

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

To our shareowners: Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks. The storage systems we've pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost.

Technology

Technology Technology AWS Storage

New AWS feature: Run your website from Amazon S3 - All Things.

All Things Distributed

FEBRUARY 17, 2011

Since a few days ago this weblog serves 100% of its content directly out of the Amazon Simple Storage Service (S3) without the need for a web server to be involved. been running at a traditional hosting site for many years until this preferred simple solution became available: today marks that day and I couldn't be happier about it.

AWS

AWS Website Storage Servers

The AWS GovCloud (US) Region - All Things Distributed

All Things Distributed

AUGUST 16, 2011

For example, a number of our European customers are subject to data residency requirements when it comes to PII data and they use the EU Region to meet those requirements. Government and Big Data. One particular early use case for AWS GovCloud (US) will be massive data processing and analytics. More information.

AWS

AWS Government Big Data Cloud

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

Their dataset has about 7B edges… Meanwhile, AnalyticDB is Alibaba’s real-time OLAP RDBMS handling 10PB of data (in excess of 100 trillion rows!). Autoscaling tiered cloud storage in Anna. AliGraph covers Alibaba’s distributed graph engine supporting the development of new GNN applications. Research papers. (In random order!).

Blockchain

Blockchain Hardware Google Analytics

Expanding the Cloud - Introducing the AWS Asia Pacific (Tokyo.

All Things Distributed

MARCH 2, 2011

Japanese companies and consumers have become used to low latency and high-speed networking available between their businesses, residences, and mobile devices.

AWS

AWS Cloud Games Latency

5 Terabyte Object Support in Amazon S3 - All Things Distributed

All Things Distributed

DECEMBER 9, 2010

Amazon S3 has always been a scalable, durable and available data repository for almost any customer workload. This is especially true for customers managing HD video or data-intensive instruments such as genomic sequencers. Who has files larger than 5GB? For example, a 2-hour movie on Blu-ray can be 50 gigabytes.

AWS

AWS Big Data Scalability Storage

Expanding the Cloud - Introducing Amazon ElastiCache - All Things.

All Things Distributed

AUGUST 22, 2011

Please note that Amazon ElastiCache is currently available in the US East (Virginia) Region. It will be available in other AWS Regions in the coming months.

Cloud

Cloud Cache AWS Storage

Spot Instances - Increased Control - All Things Distributed

All Things Distributed

JULY 11, 2011

With this change, we will improve the granularity of pricing information you receive by introducing a Spot Instance price per Availability Zone rather than a Spot Instance price per Region. Customers whose bids exceed the Spot price gain access to the available Spot Instances and run as long as the bid exceeds the Spot Price.

AWS

AWS Storage Cloud Big Data

Software Testing Trends 2021 – What can we expect?

Testsigma

FEBRUARY 12, 2021

As more companies transition into digital-first projects, there must be an expanded number of processes and IT data departments to keep IT teams on track. Testers and organisations use newly available technologies such as AI, NLP, ML, and autonomous products to address problems with test cases and the time consumed in testing.

Artificial Intelligence

Artificial Intelligence Software Software IoT

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

As a big music fan with well over 100 GB in digital music, I am particularly excited that I now have access to all my digital music anywhere I go. What used to be only available in physical formats now often has digital equivalents, and this digitalization is driving great new innovations.

AWS

AWS Cloud Storage Internet

Simplifying IT - Create Your Application with AWS CloudFormation.

All Things Distributed

FEBRUARY 25, 2011

When a new customer is onboarded, the ISV has to spin up a collection of AWS resources to run their web-servers, app-servers and databases in a multi-AZ (availability zone) setting to achieve high-availability.

AWS

AWS Cloud Scalability Storage
