Big Data, Data, Performance and Storage - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

It can scale towards a multi-petabyte level data workload without a single issue, and it allows access to a cluster of powerful servers that will work together within a single SQL interface where you can view all of the data. This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes.

Big Data

Big Data Database Artificial Intelligence Open Source

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

While data lakes and data warehousing architectures are commonly used modes for storing and analyzing data, a data lakehouse is an efficient third way to store and analyze data that unifies the two architectures while preserving the benefits of both. What is a data lakehouse? How does a data lakehouse work?

Artificial Intelligence

Artificial Intelligence Analytics Storage Government

Data Engineers of Netflix - Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer on the Product Data Science and Engineering team.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. This system has been designed to supplement and succeed the existing Hadoop-based system that had too high latency of data processing and too high maintenance costs.

Big Data

Big Data Processing Lambda Database

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Performance.

Big Data

Big Data Storage Benchmarking Hardware

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Modern organizations ingest petabytes of data daily, but legacy approaches to log analysis and management cannot accommodate this volume of data. At Dynatrace Perform 2023, Maciej Pawlowski, senior director of product management for infrastructure monitoring at Dynatrace, and a senior software engineer at a U.K.

Analytics

Analytics Infrastructure Storage Efficiency

Advancing Application Performance With NVMe Storage, Part 2

DZone

JUNE 3, 2019

Using local SSDs inside of the GPU node delivers fast access to data during training, but introduces challenges that impact the overall solution in terms of scalability, data access, and data protection.

Storage

Storage Performance Network Scalability

Advancing Application Performance with NVMe Storage, Part 3

DZone

JUNE 4, 2019

NVMe Storage Use Cases. NVMe storage's strong performance, combined with the capacity and data availability benefits of shared NVMe storage over local SSD, makes it a strong solution for AI/ML infrastructures of any size. There are several AI/ML focused use cases to highlight.

Storage

Storage FinTech Artificial Intelligence Performance

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

How do you get more value from petabytes of exponentially exploding, increasingly heterogeneous data? The short answer: The three pillars of observability—logs, metrics, and traces—converging on a data lakehouse. To solve this problem, Dynatrace launched Grail, its causational data lakehouse, in 2022.

Analytics

Analytics Innovation Metrics Database

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

In today's world, data is generated in high volumes, and to make something of it, the extracted data needs to be transformed, stored, maintained, governed, and analyzed. These processes are only possible with the distributed architecture and parallel processing mechanisms that Big Data tools are based on.

Big Data

Big Data Government Open Source Storage

Advancing Application Performance with NVMe Storage, Part 1

DZone

MAY 30, 2019

With big data on the rise and data algorithms advancing, the ways in which technology has been applied to real-world challenges have grown more automated and autonomous. Financial analysis with real-time analytics is used for predicting investments and drives the FinTech industry's needs for high-performance computing.

Artificial Intelligence

Artificial Intelligence Social Media FinTech Storage

What Should You Know About Graph Database’s Scalability?

DZone

JANUARY 20, 2023

Countless enterprises, particularly Internet giants, have explored ways to make graph data processing scalable. It has become the norm to assume that distributed databases achieve scalability (storage and computing) by adding cheap PCs and attempt to store data once and for all on demand.

Scalability

Scalability Big Data Hardware Internet

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Several pain points have made it difficult for organizations to manage their data efficiently and create actual value. Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes.

Analytics

Analytics Artificial Intelligence Storage Serverless

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis and Memcached both provide high performance with sub-millisecond response times. Managed DBaaS solutions like ScaleGrid.io

Cache

Cache Storage Scalability Architecture

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

With more organizations taking the multicloud plunge, monitoring cloud infrastructure is critical to ensure all components of the cloud computing stack are available, high-performing, and secure. These next-generation cloud monitoring tools present reports — including metrics, performance, and incident detection — visually via dashboards.

Cloud

Cloud Monitoring Best Practices Infrastructure

When Performance Matters, Think NVMe

DZone

MAY 21, 2019

The demand for more IT resource-intensive applications has significantly increased today, whether it is to process quicker transactions, gain real-time insight, crunch big data sets, or to meet customer expectations. That’s because NVMe provides 6x higher bandwidth and IOPS advantage compared to SAS/SATA SSD.

Performance

Performance Big Data Storage Processing

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. However, this cannot be done without efficient, scalable data analytics.

Big Data

Big Data Analytics AWS Cloud

Delta: A Data Synchronization and Enrichment Platform

The Netflix TechBlog

OCTOBER 15, 2019

Part I: Overview Andreas Andreakis, Falguni Jhaveri, Ioannis Papapanagiotou, Mark Cho, Poorna Reddy, Tongliang Liu Overview It is a commonly observed pattern for applications to utilize multiple datastores where each is used to serve a specific need such as storing the canonical form of data (MySQL etc.), caching (Memcached etc.),

Transportation

Transportation Architecture Processing Storage

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

Implementing a hybrid cloud solution involves careful decision-making regarding application and data placement, migration strategies, and choosing compatible cloud service providers while ensuring seamless integration and addressing security and compliance challenges. We will examine each of these elements in more detail.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

Complex cloud computing environments are increasingly replacing traditional data centers. In fact, Gartner estimates that 80% of enterprises will shut down their on-premises data centers by 2025. The primary goal of ITOps is to provide a high-performing, consistent IT environment. Performance. What does IT operations do?

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

What is container orchestration?

Dynatrace

MARCH 24, 2023

Problems include provisioning and deployment; load balancing; securing interactions between containers; configuration and allocation of resources such as networking and storage; and deprovisioning containers that are no longer needed. How does container orchestration work?

Infrastructure

Infrastructure Open Source Operating System Cloud

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

The study analyzes factual Kubernetes production data from thousands of organizations worldwide that are using the Dynatrace Software Intelligence Platform to keep their Kubernetes clusters secure, healthy, and high performing. Big data: To store, search, and analyze large datasets, 32% of organizations use Elasticsearch.

Open Source

Open Source Java Operating System Programming

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

It can happen on an edge API system servicing customer devices, between the edge and mid-tier services, or from mid-tiers to data stores. The first phase involves validating functional correctness, scalability, and performance concerns and ensuring the new systems’ resilience before the migration.

Traffic

Traffic Latency Tuning Systems

Reduce RPO, Encrypt Backups, and More in 1.15.0 Release of Percona Operator for MongoDB

Percona

OCTOBER 18, 2023

release, we added support for physical backups and restores to significantly reduce Recovery Time Objective (RTO), especially for big data sets. However, the problem of losing data between backups – in other words, Recovery Point Objective (RPO) – for physical backups was not solved. spec: backup: enabled: true.

Best Practices

Best Practices Storage AWS Big Data

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam Background At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3, and each day we ingest and create additional Petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention. The following diagram illustrates a typical workflow. What’s missing in this picture?

IoT

IoT Analytics Big Data Architecture

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. Many techniques that are described below are perfectly applicable to this model.

Database

Database Ecommerce Efficiency Engineering

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

While the technologies have evolved and matured enough, there are still some people thinking that MySQL is only for small projects or that it can’t perform well with large tables. With disks being faster nowadays and CPU and memory resources being cheaper, we could easily say MySQL can handle TBs of data with good performance.

Open Source

Open Source Storage Database Big Data

Why test data management is more important than you think

Testsigma

MAY 7, 2020

The IBM Big Data and Analytics Hub website cited a case study in which a US insurance company estimated that 15% of its testing effort went to test data collection alone, for the backend and frontend systems. Test data management had become a big problem for the company and had to be solved.

Testing

Testing Storage Database Processing

Probabilistic Data Structures for Web Analytics and Data Mining

Highly Scalable

MAY 1, 2012

Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce.

Analytics

Analytics Traffic Big Data Efficiency

Top Benefits of Data-Driven Test Automation

Testsigma

JULY 14, 2020

According to Wikipedia, Data-Driven Testing (DDT) is a software testing methodology that is used in the testing of computer software to describe testing done using a table of conditions directly as test inputs and verifiable outputs as well as the process where test environment settings and control are not hard-coded. Database tables.

Testing

Testing Artificial Intelligence DevOps Big Data

Expanding the AWS Cloud: Introducing the AWS Canada (Central) Region

All Things Distributed

DECEMBER 8, 2016

AWS data centers in Canada will draw from a regional electricity grid that is 99 percent powered by hydropower. It adopted Amazon Redshift, Amazon EMR and AWS Lambda to power its data warehouse, big data, and data science applications, supporting the development of product features at a fraction of the cost of competing solutions.

AWS

AWS Cloud Lambda Innovation

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud coupled with the emergence of new big data frameworks and tools are forcing us to rethink the whole ETL and data warehousing architecture. Then we perform frequent batch ETL from application databases to a data warehouse. Classic ETL. Late transformation.

Big Data

Big Data Retail Storage Google

Track Thousands of Assets in a Time of Crisis Using Real-Time Digital Twins

ScaleOut Software

APRIL 3, 2020

For example, the parameters for a ventilator could include its identifier, make and model, current location, status (in use, in storage, broken), time in use, technical issues and repairs, and contact information. These questions can be answered using the latest data as it streams in from the field.

Logistics

Logistics Analytics Scalability Cloud

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Structural Evolutions in Data

O'Reilly

SEPTEMBER 19, 2023

It progressed from “raw compute and storage” to “reimplementing key services in push-button fashion” to “becoming the backbone of AI work”—all under the umbrella of “renting time and storage on someone else’s computers.” Cloud computing? And Hadoop rolled in. Until it wasn’t.

Hardware

Hardware Storage Big Data Blockchain

Fast key-value stores: an idea whose time has come and gone

The Morning Paper

JUNE 23, 2019

Coupled with stateless application servers to execute business logic and a database-like system to provide persistent storage, they form a core component of popular data center service architectures. If you want to store time-expiring data that should be shared across application processes, use Memcached or Redis.

Cache

Cache Latency Google Lambda

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

Their dataset has about 7B edges… Meanwhile, AnalyticDB is Alibaba’s real-time OLAP RDBMS handling 10PB of data (in excess of 100 trillion rows!). Autoscaling tiered cloud storage in Anna. The authors claim a two orders-of-magnitude performance improvement. speedup over the best performing existing method.

Blockchain

Blockchain Hardware Google Analytics

Introducing the AWS South America - All Things Distributed

All Things Distributed

DECEMBER 14, 2011

The new Sao Paulo Region provides better latency to South America, which enables AWS customers to deliver higher performance services to their South American end-users. Additionally, it allows them to keep their data inside of Brazil. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.

AWS

AWS Latency Storage Big Data

Expanding the Cloud - Introducing Amazon ElastiCache - All Things.

All Things Distributed

AUGUST 22, 2011

There are many success stories about the effectiveness of caching in many different scenarios; next to helping applications achieve fast and predictable performance, it often protects databases from request bursts and brownouts under overload conditions. Driving Storage Costs Down for AWS Customers.

Cloud

Cloud Cache AWS Storage

Driving Bandwidth Cost Down for AWS Customers. - All Things.

All Things Distributed

JUNE 29, 2011

In Amazon Web Services there are similar dimensions that are forever important to our customers: scale, reliability, security, performance, ease of use, and of course pricing. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway. Driving down the cost of Big-Data analytics.

AWS

AWS Retail Innovation Strategy

Software Testing Trends 2021 – What can we expect?

Testsigma

FEBRUARY 12, 2021

As more companies transition into digital-first projects, there must be an expanded number of processes and IT data departments to keep IT teams on track. The latest trend is hyperautomation – Gartner projects that by the year 2023, organisations will be able to perform 25% of their tasks autonomously, thereby reducing cost.

Artificial Intelligence

Artificial Intelligence Software Software IoT
