Big Data, Hardware and Storage - Technology Performance Pulse

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Performance. Native frameworks.

Big Data

Big Data Storage Benchmarking Hardware

What Should You Know About Graph Database’s Scalability?

DZone

JANUARY 20, 2023

Countless enterprises, particularly Internet giants, have explored ways to make graph data processing scalable. It has become the norm to assume that distributed databases achieve scalability (storage and computing) by adding cheap PCs and attempt to store data once and for all on demand.

Scalability

Scalability Big Data Hardware Internet

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The pipelines can be stateful and the engine’s middleware should provide persistent storage to enable state checkpointing. Interoperability with Hadoop.

Big Data

Big Data Processing Lambda Database

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure. Although modern cloud systems simplify tasks, such as deploying apps and provisioning new hardware and servers, hybrid cloud and multicloud environments are often complex.

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

On-premises data centers invest in higher capacity servers since they provide more flexibility in the long run, while the procurement price of hardware is only one of many cost factors. Redis is an in-memory key-value store and cache that simplifies processing, storage, and interaction with data in Kubernetes environments.

Open Source

Open Source Java Operating System Programming

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum Database is an open-source, hardware-agnostic MPP database for analytics, based on PostgreSQL and developed by Pivotal, which was later acquired by VMware. High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages. What Exactly is Greenplum? Greenplum Advantages.

Big Data

Big Data Database Artificial Intelligence Open Source

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

If CPU usage is not a bottleneck in your setup, you can leverage compression, which can improve performance because less data needs to be read from disk and written to memory; indexes are compressed too. It can also help save on storage costs and backup times. MyRocks is shipped in Percona Server for MySQL.

Open Source

Open Source Storage Database Big Data

Dutch Enterprises and The Cloud

All Things Distributed

SEPTEMBER 6, 2013

Shell leverages AWS for big data analytics to help achieve these goals. Due to the exponential growth of the biology and informatics fields, Unilever needs to maintain this new program within a highly-scalable environment that supports parallel computation and heavy data storage demands.

Cloud

Cloud Energy AWS Healthcare

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedup by exploiting the specific processor architecture. Given the specialized nature of these platforms, they require dedicated resources to maintain and operate and put a big burden on the IT organization.

Cloud

Cloud AWS Automotive Latency

Structural Evolutions in Data

O'Reilly

SEPTEMBER 19, 2023

It progressed from “raw compute and storage” to “reimplementing key services in push-button fashion” to “becoming the backbone of AI work”—all under the umbrella of “renting time and storage on someone else’s computers.” Cloud computing? ” scenarios at industrial scale.

Hardware

Hardware Storage Big Data Blockchain

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

However, the data infrastructure to collect, store and process data is geared toward developers (e.g., In AWS’ quest to enable the best data storage options for engineers, we have built several innovative database solutions like Amazon RDS, Amazon RDS for Aurora, Amazon DynamoDB, and Amazon Redshift. Big data challenges.

Cloud

Cloud Big Data AWS Analytics

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

More importantly, UDM utilizes a single storage backend with the benefits of multiple storage systems, which avoids moving data across systems and hence avoids data duplication and data consistency issues. In contrast, Alluxio is a middleware for data access: think of the Alluxio storage layer as a fast cache.

Big Data

Big Data Artificial Intelligence Storage Hardware

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

Autoscaling tiered cloud storage in Anna. Could it be Analyzing efficient stream processing on modern hardware? Hyper Dimension Shuffle describes how Microsoft improved the cost of data shuffling, one of the most costly operations, in their petabyte-scale internal big data analytics platform, SCOPE.

Blockchain

Blockchain Hardware Google Analytics

Expanding the AWS Cloud – Introducing the AWS Europe (Stockholm) Region

All Things Distributed

DECEMBER 12, 2018

The first platform is a real time, big data platform being used for analyzing traffic usage patterns to identify congestion and connectivity issues. The second platform is a managed IoT cloud with customer-facing applications and data management, which went live in 2016. Telenor Connexion is all-in on AWS.

AWS

AWS Cloud Games Serverless

Amazon EC2 Cluster GPU Instances - All Things Distributed

All Things Distributed

NOVEMBER 15, 2010

This led to the birth of the Graphics Processing Unit (GPU), which was focused on providing a very fine-grained parallel model, with processing organized in multiple stages through which the data would flow.

AWS

AWS Latency Programming Architecture

The Winds of Architecture Changes at the USENIX ATC 2019

ACM Sigarch

NOVEMBER 1, 2019

This blog post gives a glimpse of the computer systems research papers presented at the USENIX Annual Technical Conference (ATC) 2019, with an emphasis on systems that use new hardware architectures. Intel Quick Assist Technology (QAT) was the focus of the QZFS paper which used this new hardware device to speed up file system compression.

Architecture

Architecture Hardware Cache Storage
