Architecture, Big Data and Storage - Technology Performance Pulse

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Storage provisioning.

Big Data

Big Data Storage Benchmarking Hardware

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The pipelines can be stateful and the engine’s middleware should provide persistent storage to enable state checkpointing. Interoperability with Hadoop.

Big Data

Big Data Processing Lambda Database

What Should You Know About Graph Database’s Scalability?

DZone

JANUARY 20, 2023

It has been a norm to perceive that distributed databases use the method of adding cheap PC(s) to achieve scalability (storage and computing) and attempt to store data once and for all on demand. Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios.

Scalability

Scalability Big Data Hardware Internet

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks … The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Big Data

Big Data Latency Transportation Traffic

Databook: Turning Big Data into Knowledge with Metadata at Uber

Uber Engineering

AUGUST 3, 2018

From driver and rider locations and destinations, to restaurant orders and payment transactions, every interaction on Uber’s transportation platform is driven by data.

Big Data

Big Data Transportation Engineering Storage

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

In today's world, data is generated in high volumes, and to make something out of it, the extracted data needs to be transformed, stored, maintained, governed and analyzed. These processes are only possible with a distributed architecture and parallel processing mechanisms that Big Data tools are based on.

Big Data

Big Data Government Open Source Storage

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

What is container orchestration?

Dynatrace

MARCH 24, 2023

Problems include provisioning and deployment; load balancing; securing interactions between containers; configuration and allocation of resources such as networking and storage; and deprovisioning containers that are no longer needed. How does container orchestration work? The post What is container orchestration?

Infrastructure

Infrastructure Open Source Operating System Cloud

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

Today’s streaming analytics architectures are not equipped to make sense of this rapidly changing information and react to it as it arrives. Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention.

IoT

IoT Analytics Big Data Architecture

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

When undertaking system migrations, one of the main challenges is establishing confidence and seamlessly transitioning the traffic to the upgraded architecture without adversely impacting the customer experience. This blog series will examine the tools, techniques, and strategies we have utilized to achieve this goal.

Traffic

Traffic Latency Tuning Systems

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Redis is an in-memory key-value store and cache that simplifies processing, storage, and interaction with data in Kubernetes environments. Specifically, they provide asynchronous communications within microservices architectures and high-throughput distributed systems. Databases: Among databases, Redis is the most used at 60%.

Open Source

Open Source Java Operating System Programming

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure. AIOps (artificial intelligence for IT operations) combines big data, AI algorithms, and machine learning for actionable, real-time insights that help ITOps continuously improve operations.

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

In this blog post, we explain what Greenplum is, and break down the Greenplum architecture, advantages, major use cases, and how to get started. Its architecture was specially designed to manage large-scale data warehouses and business intelligence workloads by giving you the ability to spread your data out across a multitude of servers.

Big Data

Big Data Database Artificial Intelligence Open Source

Scaling Uber’s Apache Hadoop Distributed File System for Growth

Uber Engineering

APRIL 5, 2018

Three years ago, Uber Engineering adopted Hadoop as the storage (HDFS) and compute (YARN) infrastructure for our organization’s big data analysis.

Systems

Systems Big Data Storage Infrastructure

Delta: A Data Synchronization and Enrichment Platform

The Netflix TechBlog

OCTOBER 15, 2019

Thus, ensuring the atomicity of writes across different storage technologies remains a challenging problem for applications [3]. Delta has been developed to address the limitations of existing solutions for data synchronization, and also allows enriching data on the fly. Deal Service, Talent Service and Vendor Service).

Transportation

Transportation Architecture Storage Processing

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

If CPU usage is not a bottleneck in your setup, you can leverage compression as it can improve performance which means that less data needs to be read from disk and written to memory, and indexes are compressed too. It can help us to save costs on storage and backup times. It is available under a paid subscription.

Open Source

Open Source Storage Database Big Data

Expanding the Cloud: Introducing the AWS Asia Pacific (Mumbai) Region

All Things Distributed

JUNE 26, 2016

AdiMap uses Amazon Kinesis to process real-time streaming online ad data and job feeds, and processes them for storage in petabyte-scale Amazon Redshift. Advanced problem solving that connects big data with machine learning. warehouses to glean business insights for jobs, ad spend, or financials for mobile apps.

AWS

AWS Cloud Healthcare Blockchain

Expanding the AWS Cloud: Introducing the AWS Canada (Central) Region

All Things Distributed

DECEMBER 8, 2016

It adopted Amazon Redshift, Amazon EMR and AWS Lambda to power its data warehouse, big data, and data science applications, supporting the development of product features at a fraction of the cost of competing solutions. Kik Interactive is a Canadian chat platform with hundreds of millions of users around the globe.

AWS

AWS Cloud Lambda Innovation

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud coupled with the emergence of new big data frameworks and tools are forcing us to rethink the whole ETL and data warehousing architecture. If the majority of your data is unstructured such as text, images, documents, etc. Classic ETL. Late transformation.

Big Data

Big Data Retail Storage Google

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

Key Takeaways Redis offers complex data structures and additional features for versatile data handling, while Memcached excels in simplicity with a fast, multi-threaded architecture for basic caching needs. With these goals in mind, two in-memory data stores, Redis and Memcached, have emerged as the top contenders.

Cache

Cache Storage Scalability Architecture

Register for AWS re:Invent - All Things Distributed

All Things Distributed

JULY 16, 2012

There are sessions in many different categories: Architecture, Big Data, HPC, Computer & Networking, Storage, Databases, Security, Tools & Languages, Media Sharing & Content Delivery, Managing AWS Resources, Enterprise IT, Mobile, Start-up, and more.

AWS

AWS Big Data Media Storage

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

Defining Hybrid Cloud Strategy The decision-making process about where to situate data and applications is vital to any hybrid cloud solution.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

AUGUST 14, 2019

Maintaining Uber’s large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. Once identified, … The post Less is More: Engineering Data Warehouse Efficiency with Minimalist Design appeared first on Uber Engineering Blog.

Efficiency

Efficiency Engineering Design Storage

Job Openings in AWS - Senior Leader in Database Services - All.

All Things Distributed

AUGUST 19, 2011

AWS Database Services is responsible for setting the database strategy and delivering distributed structured storage services to our AWS customers. This team is constantly rethinking the assumptions behind how traditional databases were built and constantly working on building the right database architectures suited for the Cloud environment.

AWS

AWS Database Storage Scalability

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Logs highlight observability challenges Ingesting, storing, and processing the unprecedented explosion of data from sources such as software as a service, multicloud environments, containers, and serverless architectures can be overwhelming for today’s organizations. Seamless integration.

Analytics

Analytics Infrastructure Storage Efficiency

Track Thousands of Assets in a Time of Crisis Using Real-Time Digital Twins

ScaleOut Software

APRIL 3, 2020

For example, the parameters for a ventilator could include its identifier, make and model, current location, status (in use, in storage, broken), time in use, technical issues and repairs, and contact information. This allows quick answers to questions such as: “Show me the percentage shortfall in ventilators by state.”

Logistics

Logistics Analytics Scalability Cloud

Dutch Enterprises and The Cloud

All Things Distributed

SEPTEMBER 6, 2013

Shell leverages AWS for big data analytics to help achieve these goals. Due to the exponential growth of the biology and informatics fields, Unilever needs to maintain this new program within a highly-scalable environment that supports parallel computation and heavy data storage demands.

Cloud

Cloud Energy AWS Healthcare

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

While data lakes and data warehousing architectures are commonly used modes for storing and analyzing data, a data lakehouse is an efficient third way to store and analyze data that unifies the two architectures while preserving the benefits of both. What is a data lakehouse? Data management.

Artificial Intelligence

Artificial Intelligence Analytics Storage Government

This week in review: GPUs, Zombies, Biomimicry and Tom Waits.

All Things Distributed

NOVEMBER 19, 2010

Understanding Throughput-Oriented Architectures - background article in CACM on massively parallel and throughput vs latency oriented architectures. The Big Idea: Biomimetic Architecture - The National Geographic came in the mail this week with a beautiful pull-out of Gaudí's Sagrada Família, the online version is only a summary.

AWS

AWS Cloud Benchmarking Storage

Around the World in 28 Days - All Things Distributed

All Things Distributed

SEPTEMBER 30, 2010

Visiting future customers is equally exciting as you get a chance to understand their current architecture, if it is a migration, and how they plan to exploit cloud services in their new setup. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway. Driving down the cost of Big-Data analytics.

AWS

AWS Storage Cloud Best Practices

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes. Teams have introduced workarounds to reduce storage costs. Dynatrace discovers logs automatically at scale.

Analytics

Analytics Artificial Intelligence Storage Serverless

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

LISA originally stood for "Large Installation System Administration," where "large" meant systems with more than a gigabyte of storage, or with more than 100 users. In fact, we’d link to the first LISA conference website for reference, but this conference not only predates the Wayback Machine – it also predates the World Wide Web!

DevOps

DevOps Network Best Practices Programming

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

As cloud and big data complexity scales beyond the ability of traditional monitoring tools to handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. Cloud storage monitoring. Hybrid cloud combines an on-premises or private data center with public cloud infrastructure.

Cloud

Cloud Monitoring Best Practices Infrastructure

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedup by exploiting the specific processor architecture. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway. Driving down the cost of Big-Data analytics.

Cloud

Cloud AWS Automotive Latency

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 29, 2018

LISA originally stood for "Large Installation System Administration," where "large" meant systems with more than a gigabyte of storage, or with more than 100 users. In fact, we’d link to the first LISA conference website for reference, but this conference not only predates the Wayback Machine – it also predates the World Wide Web!

DevOps

DevOps Network Best Practices Programming

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

As teams try to gain insight into this data deluge, they have to balance the need for speed, data fidelity, and scale with capacity constraints and cost. To solve this problem, Dynatrace launched Grail, its causational data lakehouse, in 2022. Logs on Grail Log data is foundational for any IT analytics.

Analytics

Analytics Innovation Metrics Database

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

All Things Distributed

DECEMBER 5, 2010

While registrars manage the namespace in the DNS naming architecture, DNS servers are used to provide the mapping between names and the addresses used to identify an access point. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway. Driving down the cost of Big-Data analytics.

Cloud

Cloud Internet AWS

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3.

Latency

Latency Storage Big Data Tuning

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. These two narratives of reference architecture and ingestion/indexing system are interwoven throughout the paper. Why do we need a new reference architecture?

Cloud

Cloud Big Data Latency Architecture

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.

Big Data

Big Data Artificial Intelligence Storage Hardware

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

To our shareowners: Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks. Look inside a current textbook on software architecture, and you'll find few patterns that we don't apply at Amazon. At werner.ly

Technology

Technology AWS Storage

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

The scalability, reliability and durability requirements for Cloud Drive are very high which is why they decided to make use of the Amazon Simple Storage Service (S3) as the core component of their service. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway. At werner.ly Syndication.

AWS

AWS Cloud Storage Internet
