Big Data, Database, Engineering and Storage - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum Database is a massively parallel processing (MPP) SQL database built on PostgreSQL. It can scale to multi-petabyte data workloads, and it gives access to a cluster of powerful servers that work together behind a single SQL interface where you can view all of the data.

Big Data

Big Data Database Artificial Intelligence Open Source

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain significant performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands in our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The design of the in-stream processing engine itself was driven by the following requirements: SQL-like functionality. Strict fault-tolerance is a principal requirement for the engine.

Big Data

Big Data Processing Lambda Database

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

With these goals in mind, two in-memory data stores, Redis and Memcached, have emerged as the top contenders. This article will explore how they handle data storage and scalability, perform in different scenarios, and, most importantly, how these factors influence your choice.

Cache

Cache Storage Scalability Architecture

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Causal AI, which brings AI-enabled actionable insights to IT operations, and a data lakehouse, such as Dynatrace Grail, can help break down silos among ITOps, DevSecOps, site reliability engineering, and business analytics teams. “The weakness of a data lake is they fail when you need to access them fast,” Pawlowski said.

Analytics

Analytics Infrastructure Storage Efficiency

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

The strongest Kubernetes growth areas are security, databases, and CI/CD technologies. Most Kubernetes clusters in the cloud (73%) are built on top of managed distributions from the hyperscalers like AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). Java, Go, and Node.js

Open Source

Open Source Java Operating System Programming

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

Public Cloud Infrastructure Third-party providers run public cloud services, delivering a broad array of offerings like computing power, storage solutions, and network capabilities that enhance the functionality of a hybrid cloud architecture. We will examine each of these elements in more detail.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.

Big Data

Big Data Analytics AWS Cloud

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

Some startups, such as Facebook, Uber, Pinterest, and many more, adopted MySQL in their early days. They are now big and successful companies, which shows that MySQL can run on large databases and on heavily used sites. It can help us to save costs on storage and backup times.

Open Source

Open Source Storage Database Big Data

Expanding the Cloud - Managing Cold Storage with Amazon Glacier

All Things Distributed

AUGUST 20, 2012

With the introduction of Amazon Glacier, IT organizations now have a solution that removes the headaches of digital archiving and provides extremely low cost storage. Data is retrieved by scheduling a job, which typically completes within 3 to 5 hours.

Storage

Storage Cloud AWS Media

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention. The best they can usually do in real-time using general purpose tools is to filter and look for patterns of interest.

IoT

IoT Analytics Big Data Architecture

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. What follows is a discussion of where big data systems might be heading, heavily inspired by the remarks in this paper, but with several of my own thoughts mixed in.

Cloud

Cloud Big Data Latency Architecture

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

ETL stands for extract, transform, load, and it is generally used for data warehousing and data integration. ETL is a product of the relational database era and it has not evolved much in the last decade. There are several emerging data trends that will define the future of ETL in 2018. Machine learning meets data integration.

Big Data

Big Data Artificial Intelligence Storage Hardware

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

What if we use ClickHouse (which is a columnar analytical database) as our main datastore? Well, typically, an analytical database is not a replacement for a transactional or key/value datastore. Although such databases can be very efficient with counts and averages, some queries will be slow or simply nonexistent. Processed 8.19

Database

Database Analytics Blockchain Healthcare

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. Document databases advance the BigTable model offering two significant improvements.

Database

Database Ecommerce Efficiency Engineering

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

MongoDB is an important database, and this paper explains the tunable (per-operation) consistency models that MongoDB provides and how they are implemented under the covers. AliGraph covers Alibaba’s distributed graph engine supporting the development of new GNN applications. Autoscaling tiered cloud storage in Anna.

Blockchain

Blockchain Hardware Google Analytics

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud coupled with the emergence of new big data frameworks and tools are forcing us to rethink the whole ETL and data warehousing architecture. Then we perform frequent batch ETL from application databases to a data warehouse. Classic ETL. Late transformation.

Big Data

Big Data Retail Storage Google

No Server Required - Jekyll & Amazon S3 - All Things Distributed

All Things Distributed

AUGUST 17, 2011

As some of you may remember, I was pretty excited when Amazon Simple Storage Service (S3) released its website feature such that I could serve this weblog completely from S3. Jekyll is written in Ruby, uses YAML for metadata management, and uses the Liquid template engine to manipulate the content.

Servers

Servers Social Media AWS Website

Top Benefits of Data-Driven Test Automation

Testsigma

JULY 14, 2020

Test data can be stored using any of the options below: Database tables. Tools/frameworks for data-driven automation testing. If there is any update required either in test scripts or test data, it is hassle-free because both test data and test scripts are placed separately with no dependency on each other.

Testing

Testing Artificial Intelligence DevOps Big Data

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

USENIX’s LISA conference is the premier event for topics in production system engineering. LISA originally stood for "Large Installation System Administration," where "large" meant systems with more than a gigabyte of storage, or with more than 100 users. Join us for 3 days in Nashville at LISA'18.

DevOps

DevOps Network Best Practices Programming

AWS Elastic Beanstalk: A Quick and Simple Way into the Cloud - All.

All Things Distributed

JANUARY 19, 2011

Flexibility is one of the key principles of Amazon Web Services - developers can select any programming language and software package, any operating system, any middleware and any database to build systems and applications that meet their requirements. and Engine Yard, Springsource users have CloudFoundry.

AWS

AWS Cloud Java Operating System

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 29, 2018

USENIX’s LISA conference is the premier event for topics in production system engineering. LISA originally stood for "Large Installation System Administration," where "large" meant systems with more than a gigabyte of storage, or with more than 100 users. Join us for 3 days in Nashville at LISA'18.

DevOps

DevOps Network Best Practices Programming

This week in review: GPUs, Zombies, Biomimicry and Tom Waits.

All Things Distributed

NOVEMBER 19, 2010

Science & Engineering. an engineering adventure to break the 1,000 mph barrier in a car. The Big Idea: Biomimetic Architecture - The National Geographic came in the mail this week with a beautiful pull-out of Gaudí's Sagrada Família; the online version is only a summary.

AWS

AWS Cloud Benchmarking Storage

Around the World in 28 Days - All Things Distributed

All Things Distributed

SEPTEMBER 30, 2010

There is huge variety in existing architectures and I am often impressed by the ingenuity of the engineers in how to best transform the application if "Lift & Shift" is not an option.

AWS

AWS Storage Cloud Best Practices

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Cluster Compute Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high performance compute and networking.

Cloud

Cloud AWS Automotive Latency

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

All Things Distributed

DECEMBER 5, 2010

There is more than one Werner Vogels in this world and although I never get emails, snail mail or phone calls for any of my peers, I am sure they are somewhat frustrated if they type in our name in a search engine :-).

Cloud

Cloud Internet Internet AWS

Expanding the Cloud - Opening the AWS Asia Pacific (Singapore.

All Things Distributed

APRIL 28, 2010

Currently, each AWS Region contains multiple Availability Zones, which are distinct locations that are engineered to be insulated from failures in other Availability Zones.

AWS

AWS Cloud Latency Storage

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

However, the data infrastructure to collect, store and process data is geared toward developers (e.g., In AWS’ quest to enable the best data storage options for engineers, we have built several innovative database solutions like Amazon RDS, Amazon RDS for Aurora, Amazon DynamoDB, and Amazon Redshift.

Cloud

Cloud Big Data AWS Analytics

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

We use high-performance transactions systems, complex rendering and object caching, workflow and queuing systems, business intelligence and data analytics, machine learning and pattern recognition, neural networks and probabilistic decision making, and a wide variety of other techniques.

Technology

Technology Technology AWS Storage

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

The scalability, reliability and durability requirements for Cloud Drive are very high, which is why they decided to make use of the Amazon Simple Storage Service (S3) as the core component of their service. More details at [link].

AWS

AWS Cloud Storage Internet

Amazon EC2 Cluster GPU Instances - All Things Distributed

All Things Distributed

NOVEMBER 15, 2010

A third generation of APIs, however, left the graphics-specific interfaces behind and instead focused on exposing the pipeline as a generic highly parallel engine supporting task and data parallelism.

AWS

AWS Latency Programming Architecture
