Processing and Storage - Technology Performance Pulse

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

Cutting Big Data Costs: Effective Data Processing With Apache Spark

DZone

SEPTEMBER 14, 2023

In today's data-driven world, efficient data processing plays a pivotal role in the success of any project. Apache Spark, a robust open-source data processing framework, has emerged as a game-changer in this domain. Optimizing data input: make use of the data format. In most cases, the data being processed is stored in a columnar format.

Big Data

Big Data Processing Games Open Source

Medallion Architecture: Efficient Batch and Stream Processing Data Pipelines With Azure Databricks and Delta Lake

DZone

JULY 13, 2023

In today's data-driven world, organizations need efficient and scalable data pipelines to process and analyze large volumes of data. Medallion Architecture provides a framework for organizing data processing workflows into different zones, enabling optimized batch and stream processing.

Azure

Azure Architecture Efficiency Processing
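
To make the zone idea concrete, here is a minimal sketch in plain Python. It assumes the conventional medallion zone names: bronze (raw), silver (cleaned and deduplicated), and gold (aggregated). The sample records, field names, and functions are hypothetical. In Azure Databricks, each zone would typically be a Delta Lake table processed by Spark rather than an in-memory list.

```python
from collections import defaultdict

# Bronze zone: raw records exactly as ingested (hypothetical sample events).
RAW_EVENTS = [
    {"order_id": "1", "amount": "19.99", "region": "eu"},
    {"order_id": "2", "amount": "bad", "region": "us"},    # malformed amount
    {"order_id": "1", "amount": "19.99", "region": "eu"},  # duplicate delivery
    {"order_id": "3", "amount": "5.01", "region": "us"},
]


def to_silver(bronze):
    """Silver zone: parse types, drop malformed rows, deduplicate by key."""
    seen, silver = set(), []
    for rec in bronze:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        if rec["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(rec["order_id"])
        silver.append({"order_id": rec["order_id"], "amount": amount, "region": rec["region"]})
    return silver


def to_gold(silver):
    """Gold zone: a business-level aggregate (revenue per region)."""
    totals = defaultdict(float)
    for rec in silver:
        totals[rec["region"]] += rec["amount"]
    return {region: round(total, 2) for region, total in totals.items()}


print(to_gold(to_silver(RAW_EVENTS)))  # {'eu': 19.99, 'us': 5.01}
```

Because each zone only reads from the previous one, batch and streaming jobs can share the same silver and gold logic.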

Dynatrace OpenPipeline: Stream processing data ingestion converges observability, security, and business data at massive scale for analytics and automation in context

Dynatrace

JANUARY 31, 2024

Organizations choose data-driven approaches to maximize the value of their data, achieve better business outcomes, and realize cost savings by improving their products, services, and processes. With OpenPipeline, ingested data is then dynamically routed into pipelines for further processing.

Analytics

Analytics Processing Transportation Storage

Storage handling improvements increase retention of transaction data for Dynatrace Managed

Dynatrace

JULY 29, 2021

Using existing storage resources optimally is key to capturing the right data over time. Increased storage space availability: compressing transaction data older than three days can free up to 50% more storage space in your Dynatrace Managed cluster. Data compression is completed on June 12.

Storage

Storage Virtualization Infrastructure Availability

Storage Autoscaling With Percona Operator for MongoDB

Percona

FEBRUARY 10, 2023

Today, we will see how pvc-autoresizer can automate storage scaling for MongoDB clusters on Kubernetes. Our goal is to automate storage scaling when a disk reaches a certain usage threshold and, at the same time, reduce the alert noise related to it. For example, the following command annotates every PVC in the current namespace with a 100Gi storage limit:

kubectl annotate pvc --all resize.topolvm.io/storage_limit="100Gi"

Storage

Storage Blockchain AWS Cloud
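
The threshold-based scaling decision can be sketched as a pure function. This is a simplified model, not pvc-autoresizer's code. It assumes that a resize triggers when free space drops below a percentage threshold, that the volume grows by a percentage of its current size, and that the result is capped at the storage limit set by the annotation above. The 10% defaults and the function name `next_pvc_size` are assumptions for illustration; check the pvc-autoresizer documentation for the actual annotations and defaults.

```python
GIB = 1024 ** 3


def next_pvc_size(capacity, used, limit, threshold_pct=10.0, increase_pct=10.0):
    """Return the new requested size in bytes, or None if no resize is needed.

    Simplified model (assumption): resize when free space falls below
    threshold_pct of capacity, grow by increase_pct of the current capacity,
    and never exceed limit (the storage_limit annotation).
    """
    free_pct = 100.0 * (capacity - used) / capacity
    if free_pct >= threshold_pct or capacity >= limit:
        return None  # enough headroom, or already at the limit
    grown = capacity + int(capacity * increase_pct / 100.0)
    return min(grown, limit)


# A 50 GiB volume holding 46 GiB has 8% free, which is below a 10% threshold.
print(next_pvc_size(50 * GIB, 46 * GIB, limit=100 * GIB) // GIB)  # 55
```

Capping at the limit is what keeps an automated resizer from growing a volume indefinitely when usage keeps climbing.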

MySQL General Tablespaces: A Powerful Storage Option for Your Data

Percona

JANUARY 4, 2024

Managing storage and performance efficiently in your MySQL database is crucial, and general tablespaces offer flexibility in achieving this. In contrast to the single system tablespace that holds system tables by default, general tablespaces are user-defined storage containers for multiple InnoDB tables.

Storage

Storage Engineering Database Hardware

Advancing Application Performance with NVMe Storage, Part 3

DZone

JUNE 4, 2019

NVMe storage use cases: NVMe storage's strong performance, combined with the capacity and data availability benefits of shared NVMe storage over local SSDs, makes it a strong solution for AI/ML infrastructure of any size. Several AI/ML-focused use cases stand out.

Storage

Storage FinTech Artificial Intelligence Performance

Process more with less using smarter cluster overload prevention for Dynatrace Managed

Dynatrace

MAY 14, 2020

If the number of PurePaths processed by a Dynatrace Managed cluster increases vastly, your initial sizing for Dynatrace Managed nodes and clusters may prove inadequate for that volume. A Dynatrace Managed cluster may lack the hardware needed to process all the additional incoming data.

Processing

Processing Hardware Traffic Storage

Unlocking the Secrets of TOAST: How To Optimize Large Column Storage in PostgreSQL for Top Performance and Scalability

Percona

FEBRUARY 1, 2023

This post looks at using The Oversized-Attribute Storage Technique (TOAST) to improve performance and scalability. TOAST works automatically and does not significantly change how the database is used. It applies to variable-length data types (e.g., text, bytea), and each such column's “strategy” is one of the four TOAST storage strategies (PLAIN, EXTENDED, EXTERNAL, MAIN).

Storage

Storage Scalability Strategy Performance
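
A rough model of how the four strategies treat a single wide value follows. It assumes the commonly cited threshold of about 2 kB for a default 8 kB page and collapses PostgreSQL's row-level algorithm (compress, then move values out of line until the row fits) into a per-value decision, so treat it as an illustration rather than a description of PostgreSQL internals. The function `toast_action` and its return labels are hypothetical.

```python
# Approximate TOAST threshold in bytes for a default 8 kB page (assumption).
TOAST_THRESHOLD = 2032

STRATEGIES = ("PLAIN", "EXTENDED", "EXTERNAL", "MAIN")


def toast_action(strategy, raw_size, compressed_size, threshold=TOAST_THRESHOLD):
    """Simplified per-value model of what TOAST does with one wide value."""
    strategy = strategy.upper()
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown TOAST strategy: {strategy}")
    # PLAIN never compresses or moves data out of line (an oversized
    # PLAIN row makes the insert fail in real PostgreSQL).
    if raw_size <= threshold or strategy == "PLAIN":
        return "inline"
    # EXTENDED and MAIN allow compression; EXTERNAL does not.
    compresses = strategy in ("EXTENDED", "MAIN") and compressed_size < raw_size
    if compresses and compressed_size <= threshold:
        return "inline-compressed"
    # EXTENDED/EXTERNAL move out of line readily; MAIN only as a last resort,
    # which is the same outcome when this is the row's only wide value.
    return "out-of-line-compressed" if compresses else "out-of-line"


print(toast_action("EXTERNAL", 10_000, 1_500))  # out-of-line
```

EXTERNAL skipping compression is why it is often suggested for large text or bytea columns that are read with substring operations.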

Building an elastic query engine on disaggregated storage

The Morning Paper

MARCH 8, 2020

Building an elastic query engine on disaggregated storage, Vuppalapati et al., NSDI’20. Snowflake is a data warehouse designed to overcome these limitations, and the fundamental mechanism by which it achieves this is the decoupling (disaggregation) of compute and storage.

Storage

Storage Engineering Cache Serverless

Beyond uptime: Unveiling the improved Dynatrace SLA

Dynatrace

APRIL 24, 2024

To transparently manage expectations and maintain trust with our customers, we expanded the Dynatrace SLA beyond accessing the user interface to cover the full range of relevant product categories, such as processing and retaining incoming data, accessing and working with data, and triggering automations.

Azure

Azure Infrastructure Metrics AWS

Transforming Business Outcomes Through Strategic NoSQL Database Selection

DZone

NOVEMBER 25, 2023

We often dwell on the technical aspects of database selection, focusing on performance metrics, storage capacity, and querying capabilities. But if your application primarily revolves around batch processing of large datasets, then focusing on write speed could mislead your selection process.

Database

Database Latency Speed Metrics

Push Zone Supports Image Processing

KeyCDN

FEBRUARY 13, 2020

Push Zones now seamlessly support Image Processing! The complete Image Processing feature set is now also available for Push Zones. Our edge servers are directly linked to our global storage cluster, which ensures faster image loading times. Uploading and transforming images takes only a few steps.

Processing

Processing Storage Latency Servers

How To Deploy the ELK Stack on Kubernetes

DZone

OCTOBER 24, 2023

Logstash: a log-processing tool that collects logs from various sources, parses them, and sends them to Elasticsearch for storage and analysis. Kibana: a powerful visualization tool that lets you explore and analyze the data stored in Elasticsearch using interactive charts, graphs, and dashboards.

Analytics

Analytics Storage Infrastructure Scalability

Narrowing the gap between serverless and its state with storage functions

The Morning Paper

JANUARY 28, 2020

Narrowing the gap between serverless and its state with storage functions, Zhang et al. Shredder is "a low-latency multi-tenant cloud store that allows small units of computation to be performed directly within storage nodes." In front of them is a networking layer, and the in-memory storage layer holds the actual data.

Serverless

Serverless Storage Latency Hardware

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum Database is a massively parallel processing (MPP) SQL database based on PostgreSQL. Its major advantages are high performance, query optimization, being open source, and polymorphic data storage. Greenplum's MPP database design can help you build a scalable, high-performance deployment.

Big Data

Big Data Database Artificial Intelligence Open Source
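
The MPP idea behind Greenplum can be sketched in plain Python: rows are hash-distributed across segments, each segment aggregates its own slice in parallel, and a coordinator (master) node merges the partial results. Threads stand in for segment hosts, and `zlib.crc32` stands in for Greenplum's distribution hash. The function names and the `customer`/`amount` fields are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, key, n_segments):
    """Hash-distribute rows across segments on the distribution key."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        seg = zlib.crc32(str(row[key]).encode("utf-8")) % n_segments
        segments[seg].append(row)
    return segments


def partial_sum(segment_rows):
    """Work each segment does locally: a partial GROUP BY customer, SUM(amount)."""
    partial = Counter()
    for row in segment_rows:
        partial[row["customer"]] += row["amount"]
    return partial


def mpp_sum_by_customer(rows, n_segments=4):
    segments = distribute(rows, "customer", n_segments)
    # Each segment scans and aggregates its own slice in parallel.
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(partial_sum, segments))
    total = Counter()
    for p in partials:  # the coordinator merges the partial aggregates
        total.update(p)
    return dict(total)


rows = [
    {"customer": "a", "amount": 3},
    {"customer": "b", "amount": 4},
    {"customer": "a", "amount": 5},
]
print(mpp_sum_by_customer(rows))
```

Distributing on the grouping key means all rows for a customer land on one segment, so the merge step never has to combine the same key from different segments; choosing a good distribution key is a central tuning decision in MPP databases.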

Tuning EMQX To Scale to One Million Concurrent Connection on Kubernetes

DZone

JUNE 6, 2023

When dealing with IoT, one of the first things that come to mind is the limited processing, networking, and storage capabilities these devices operate with. A messaging protocol is a set of rules and formats that are agreed upon among entities that want to communicate with each other.

IoT

IoT Tuning Storage Network

Overcoming Challenges and Best Practices for Data Migration From On-Premise to Cloud

DZone

MARCH 29, 2023

Data migration is the process of moving data from one location to another, which is an essential aspect of cloud migration. Data migration involves transferring data from on-premise storage to the cloud. With the rapid adoption of cloud computing , businesses are moving their IT infrastructure to the cloud.

Best Practices

Best Practices Cloud Storage Data Engineering

Microsoft Azure Event Hubs

DZone

FEBRUARY 23, 2023

With Azure Event Hubs, a big data streaming platform and event ingestion service, millions of events can be received and processed every second. Any real-time analytics provider or batching/storage adapter can transform and store the data sent to an event hub.

Azure

Azure Big Data Analytics Storage

Designing Instagram

High Scalability

JANUARY 11, 2022

Two major processes are executed when a user posts a photo on Instagram. The first is the synchronous process, which is responsible for uploading the image content to file storage, persisting the media metadata in the graph data store, returning a confirmation message to the user, and triggering the process that updates the user activity.

Design

Design Media Storage Logistics
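
The synchronous path can be modeled as a single request handler. In this sketch, dictionaries stand in for file storage and the graph data store, and a `queue.Queue` stands in for whatever mechanism triggers the asynchronous activity update. The class and method names are hypothetical; a real system would involve separate services and a durable queue.

```python
import queue
import uuid


class PhotoUploadService:
    """Toy model of the synchronous upload path; all stores are in-memory stand-ins."""

    def __init__(self):
        self.file_storage = {}               # stands in for blob/file storage
        self.graph_store = {}                # stands in for the graph metadata store
        self.activity_queue = queue.Queue()  # hand-off to the asynchronous process

    def post_photo(self, user_id, image_bytes, caption=""):
        media_id = str(uuid.uuid4())
        # 1. Upload the image content to file storage.
        self.file_storage[media_id] = image_bytes
        # 2. Persist the media metadata in the graph store.
        self.graph_store[media_id] = {"owner": user_id, "caption": caption}
        # 3. Trigger the asynchronous user-activity update (enqueued before
        #    returning, because the handler cannot act after it returns).
        self.activity_queue.put({"user": user_id, "media": media_id})
        # 4. Return the confirmation to the user.
        return {"status": "ok", "media_id": media_id}


svc = PhotoUploadService()
print(svc.post_photo("user-1", b"...jpeg bytes...")["status"])  # ok
```

Only cheap, must-succeed steps stay on the synchronous path; anything fan-out heavy, such as activity or feed updates, is deferred to the queue consumer.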

Platform engineering: Empowering key Kubernetes use cases with Dynatrace

Dynatrace

OCTOBER 30, 2023

Unified observability and security are crucial to defining ownership in the development process. Dynatrace AutomationEngine fully automates this entire process. AutomationEngine links these processes to your team’s progressive delivery pipelines. This context arrives automatically.

Engineering

Engineering DevOps Innovation Storage

The Power of Caching: Boosting API Performance and Scalability

DZone

AUGUST 16, 2023

Caching is the process of storing frequently accessed data or resources in a temporary storage location, such as memory or disk, to improve retrieval speed and reduce the need for repetitive processing.

Cache

Cache Scalability Performance Latency
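
A minimal time-to-live (TTL) cache shows the core trade-off. A repeated lookup within the TTL skips the expensive call, but the caller may receive data up to `ttl` seconds stale. The `TTLCache` class is a hypothetical illustration. It is single-threaded and unbounded, whereas production caches also bound memory with an eviction policy such as LRU.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire ttl seconds after being stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._data = {}     # key -> (value, expires_at)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # fresh entry: skip the expensive call
        value = compute()  # miss or expired: recompute and store
        self._data[key] = (value, now + self.ttl)
        return value


cache = TTLCache(ttl=30)
profile = cache.get_or_compute("user:42", lambda: {"id": 42, "name": "Ada"})
```

The TTL is the knob that trades freshness for load reduction: a longer TTL offloads more requests from the backend but widens the staleness window.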

Building an Optimized Data Pipeline on Azure Using Spark, Data Factory, Databricks, and Synapse Analytics

DZone

APRIL 11, 2023

Data processing in the cloud has become increasingly popular due to its scalability, flexibility, and cost-effectiveness. This article will explore how these technologies can be used together to create an optimized data pipeline for data processing in the cloud.

Azure

Azure Analytics Storage Cloud

MySQL Backups: Methods & Best Practices

Scalegrid

FEBRUARY 1, 2024

Having MySQL backups for your database can speed up and simplify the recovery process. The biggest drawbacks of a full backup are that it can be time-consuming and requires a significant amount of storage space. An incremental backup can save storage space and reduce backup time by capturing only the changes made since the last backup.

Best Practices

Best Practices Storage Strategy Database
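
The difference between full and incremental backups can be modeled with a key-value snapshot. This is a conceptual sketch only. Real MySQL tools work at a lower level (for example, Percona XtraBackup identifies changed InnoDB pages by LSN), and the functions here are hypothetical. Note that a restore needs the full backup plus every incremental backup, applied in order.

```python
def full_backup(data):
    """Copy everything."""
    return {"type": "full", "snapshot": dict(data)}


def incremental_backup(data, previous_state):
    """Capture only what changed since the state covered by the previous backup."""
    changed = {k: v for k, v in data.items() if previous_state.get(k) != v}
    deleted = [k for k in previous_state if k not in data]
    return {"type": "incremental", "changed": changed, "deleted": deleted}


def restore(backups):
    """Apply the full backup, then each incremental backup in order."""
    state = {}
    for b in backups:
        if b["type"] == "full":
            state = dict(b["snapshot"])
        else:
            state.update(b["changed"])
            for k in b["deleted"]:
                state.pop(k, None)
    return state


day1 = {"a": 1, "b": 2}
day2 = {"a": 1, "b": 3, "c": 4}
chain = [full_backup(day1), incremental_backup(day2, day1)]
print(restore(chain))  # {'a': 1, 'b': 3, 'c': 4}
```

Each incremental backup is small, but losing any link in the chain breaks every later restore, which is why incremental schedules are usually paired with periodic full backups.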

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

Data warehouses offer a single storage repository for structured data and provide a source of truth for organizations. However, organizations must structure and store data inputs in a specific format to enable extract, transform, and load (ETL) processes and to query this data efficiently. Key data warehouse capabilities include massively parallel processing and a query language.

Artificial Intelligence

Artificial Intelligence Analytics Storage Government

The history of Grail: Why you need a data lakehouse

Dynatrace

OCTOBER 4, 2022

This architecture offers rich data management and analytics features (taken from the data warehouse model) on top of low-cost cloud storage systems (which are used by data lakes). This decoupling ensures the openness of data and storage formats, while also preserving data in context. Ingest and process with Grail. Retain data.

Artificial Intelligence

Artificial Intelligence Analytics Storage Architecture

Pioneering customer-centric pricing models: Decoding ingest-centric vs. answer-centric pricing

Dynatrace

OCTOBER 17, 2023

As a result, IT organizations are overwhelmed as they strive to balance cost control processes with ensuring that their respective organizations have access to all the data required for their various use cases. All data is readily accessible without storage tiers, such as costly solid-state drives (SSDs). Ingest and process.

Retail

Retail Storage Best Practices Architecture

Apache Kafka + Apache Flink = Match Made in Heaven

DZone

MAY 5, 2023

Apache Kafka and Apache Flink are increasingly joining forces to build innovative real-time stream processing applications. The core of Kafka is messaging at any scale in combination with a distributed storage (= commit log) for reliable durability, decoupling of applications, and replayability of historical data.

Open Source

Open Source Storage Innovation Engineering
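
The decoupling and replayability the excerpt credits to Kafka's commit log can be illustrated with a toy in-memory log. The `CommitLog` and `Consumer` classes are hypothetical and ignore partitions, persistence, and replication. They only show why giving each consumer its own offset lets producers and consumers move independently and lets any consumer re-read history.

```python
class CommitLog:
    """Append-only log: each record gets a monotonically increasing offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=100):
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position, so consumers do not affect each other."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.position = start_offset

    def poll(self):
        batch = self.log.read(self.position)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        self.position = offset  # rewind to replay historical data


log = CommitLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)
consumer = Consumer(log)
print(consumer.poll())  # ['order-created', 'order-paid', 'order-shipped']
```

Reading does not remove records, which is the key difference from a traditional queue: a stream processor such as Flink can restart from an earlier offset and recompute its state.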

Low Overhead Continuous Contextual Production Profiling

DZone

JUNE 15, 2023

It is worth noting that this data collection process does not impact the performance of the application. In general, however, collecting profiles can introduce overhead during application runtime and requires storing and visualizing significantly large datasets.

Latency

Latency Storage Strategy Metrics

Key Advantages of DBMS for Efficient Data Management

Scalegrid

JANUARY 5, 2024

Despite initial investment costs, DBMS presents long-term savings and improved efficiency through automated processes, efficient query optimizations, and scalability, contributing to enhanced decision-making and end-user productivity. It provides tools for organizing and retrieving data efficiently.

Efficiency

Efficiency Storage Database Scalability

AWS serverless services: Exploring your options

Dynatrace

OCTOBER 7, 2021

This means you no longer have to provision, scale, and maintain servers to run your applications, databases, and storage systems. Speed is next; serverless solutions are quick to spin up or down as needed, and there are no delays due to limited storage or resource access. AWS offers four serverless offerings for storage.

Serverless

Serverless AWS Lambda Storage

Privacy spotlight: Retain data in Grail with 1-day precision, for up to 10 years

Dynatrace

DECEMBER 5, 2023

Streamline privacy requirements with flexible retention periods. Data retention is a critical aspect of data handling, and it’s not just about privacy compliance; it’s about having the flexibility to optimize data storage times in Grail for your Dynatrace use cases. Other data types will be available soon.

Storage

Storage Healthcare Best Practices Speed

Metadata Synchronization in Alluxio: Design, Implementation, and Optimization

DZone

DECEMBER 14, 2021

Metadata synchronization (sync) is a core feature in Alluxio that keeps files and directories consistent with their source of truth in under-storage systems, making it simple for users to reason about the data retrieved from Alluxio. Understanding this internal process is also important for tuning performance.

Design

Design Storage Tuning Systems
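
Keeping cached metadata consistent with the under storage can be sketched as a cache that re-lists the source of truth once its sync interval has elapsed. This is a simplified model, not Alluxio's implementation. The class, the listing callable, and the interval parameter are hypothetical; Alluxio exposes a comparable sync-interval setting, but consult its documentation for the exact property.

```python
import time


class MetadataCache:
    """Toy model: cached file metadata re-synced against an under-storage listing."""

    def __init__(self, list_under_storage, sync_interval_s, clock=time.monotonic):
        self.list_ufs = list_under_storage  # callable returning {path: (size, mtime)}
        self.sync_interval_s = sync_interval_s
        self.clock = clock
        self.entries = {}
        self.last_sync = None

    def _sync(self):
        source = self.list_ufs()  # the under storage is the source of truth
        for path in list(self.entries):
            if path not in source:
                del self.entries[path]  # deleted in under storage
        self.entries.update(source)     # new or changed entries
        self.last_sync = self.clock()

    def list_status(self):
        now = self.clock()
        stale = self.last_sync is None or now - self.last_sync >= self.sync_interval_s
        if stale:
            self._sync()  # pay the listing cost only when the interval has elapsed
        return dict(self.entries)
```

The interval is the tuning trade-off: a short interval keeps metadata fresh but repeatedly lists the under storage, which can be expensive on object stores, while a long one serves faster but possibly stale listings.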

Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

Uber Engineering

MARCH 12, 2017

With the evolution of storage formats like Apache Parquet and Apache ORC and query engines like Presto and Apache Impala, the Hadoop ecosystem has the potential to become a general-purpose, unified serving layer for workloads that can tolerate latencies …

Processing

Processing Latency Storage Engineering

Optimizing the Storage of Large Volumes of Metrics for a Long Time in VictoriaMetrics

Percona Community

JUNE 1, 2022

Introduction: nowadays, the main tools for monitoring the operation of any application are metrics and logs. How long these metrics are retained plays an important role. Often, to understand certain processes and predict how they will develop, we need to analyze metrics over a fairly long period of time.

Storage

Storage Metrics Monitoring Processing

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

This article will explore how they handle data storage and scalability, how they perform in different scenarios, and, most importantly, how these factors influence your choice. It uses a hash table to manage these pairs, divided into fixed-size buckets with linked lists for key-value storage.

Cache

Cache Storage Scalability Architecture
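
The bucket-plus-linked-list layout described above is a chained hash table. A minimal Python version follows. The class names are hypothetical, and unlike production caches, it never resizes its bucket array, so chains grow longer as the table fills.

```python
class Node:
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, next_node=None):
        self.key, self.value, self.next = key, value, next_node


class ChainedHashTable:
    """Fixed number of buckets; colliding keys are chained in singly linked lists."""

    def __init__(self, n_buckets=16):
        self.buckets = [None] * n_buckets

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def set(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:
            if node.key == key:
                node.value = value  # existing key: update in place
                return
            node = node.next
        self.buckets[i] = Node(key, value, self.buckets[i])  # prepend to the chain

    def get(self, key, default=None):
        node = self.buckets[self._index(key)]
        while node is not None:  # walk the chain for this bucket
            if node.key == key:
                return node.value
            node = node.next
        return default


table = ChainedHashTable()
table.set("session:1", "alice")
print(table.get("session:1"))  # alice
```

Lookups cost O(1) plus the chain length, so keeping chains short (by resizing or choosing enough buckets) is what keeps such a store fast under load.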

Optimize your environment: Unveiling Dynatrace Hyper-V extension for enhanced performance and efficient troubleshooting

Dynatrace

OCTOBER 23, 2023

Firstly, managing virtual networks can be complex, as networking in a virtual environment differs significantly from traditional networking. Secondly, determining the correct allocation of resources (CPU, memory, storage) for each virtual machine to ensure optimal performance without over-provisioning can be difficult.

Efficiency

Efficiency Virtualization Hardware Performance

Measuring the importance of data quality to causal AI success

Dynatrace

JANUARY 4, 2024

Improving data quality is a strategic process that involves all organizational members who create and use data. It starts with implementing data governance practices, which set standards and policies for data use and management in areas such as quality, security, compliance, storage, stewardship, and integration.

Government

Government Analytics Benchmarking Storage

The state of observability in 2024: Accelerating transformation with AI, analytics, and automation

Dynatrace

MARCH 6, 2024

Manual monitoring processes are also too time-consuming, which distracts teams from tasks that create new value for customers and the business. The report shows that log analytics are a particular challenge, as the long-term storage cost of all this data has begun to overshadow the value organizations can unlock from querying it.

Analytics

Analytics Innovation Strategy Storage

Leveraging Infrastructure as Code for Data Engineering Projects: A Comprehensive Guide

DZone

JULY 3, 2023

Data engineering projects often require the setup and management of complex infrastructures that support data processing, storage, and analysis. Traditionally, this process involved manual configuration, leading to potential inconsistencies, human errors, and time-consuming deployments.

Data Engineering

Data Engineering Infrastructure Engineering Code

What is log analytics? How a modern observability approach provides critical business insight

Dynatrace

JULY 29, 2022

Log analytics is the process of viewing, interpreting, and querying log data so developers and IT teams can quickly detect and resolve application and system issues. Cold storage and rehydration: data that organizations may need to access only once a quarter or year can reside in cold storage.

Analytics

Analytics Storage Retail DevOps
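
Age-based tiering with rehydration can be sketched as follows. The sketch assumes days are plain integers and that data untouched for `cold_after_days` moves to a cheaper tier. `TieredLogStore` is a hypothetical illustration, not a Dynatrace API, and real cold tiers usually keep the archived copy rather than moving it back.

```python
class TieredLogStore:
    """Toy hot/cold tiering: stale data moves to cold storage, rehydrated on access."""

    def __init__(self, cold_after_days):
        self.cold_after_days = cold_after_days
        self.hot = {}   # key -> (data, last_access_day): fast, expensive tier
        self.cold = {}  # key -> data: cheap, slow tier stand-in

    def put(self, key, data, day):
        self.hot[key] = (data, day)

    def tier_down(self, today):
        """Move entries not accessed for cold_after_days into cold storage."""
        for key, (data, last_access) in list(self.hot.items()):
            if today - last_access >= self.cold_after_days:
                self.cold[key] = data
                del self.hot[key]

    def get(self, key, today):
        if key in self.hot:
            data, _ = self.hot[key]
        else:
            data = self.cold.pop(key)  # rehydrate: bring the data back to the hot tier
        self.hot[key] = (data, today)  # record the access
        return data


store = TieredLogStore(cold_after_days=90)
store.put("q1-audit-logs", "...", day=0)
store.tier_down(today=100)  # untouched for 100 days: moved to cold storage
```

Only the rare quarterly or yearly query pays the rehydration cost, while day-to-day storage cost follows the cheaper cold tier.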

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are an immediate need in many practical applications. Topics include fault tolerance and interoperability with Hadoop.

Big Data

Big Data Processing Lambda Database

Master MySQL Point in Time Recovery

Scalegrid

MARCH 8, 2024

Executing PITR requires restoring from the full backup and then applying binary log events in sequence up to the desired point in time, with advanced techniques and third-party tools available to optimize large dataset handling and automate the recovery process. Each caters to specific needs.

Database

Database Strategy Servers Best Practices
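
The restore-then-replay sequence can be sketched on a toy key-value state. The event tuple format and the function name are hypothetical. With MySQL, you would normally replay real binary logs using `mysqlbinlog`, which supports options such as `--start-position` and `--stop-datetime` to bound the replay.

```python
from datetime import datetime


def point_in_time_restore(full_backup, binlog_events, target_time):
    """Restore a full backup, then replay binlog events in order up to target_time.

    full_backup: {"taken_at": datetime, "state": dict}
    binlog_events: list of (timestamp, op, key, value); op is "set" or "delete".
    """
    state = dict(full_backup["state"])
    for ts, op, key, value in sorted(binlog_events, key=lambda e: e[0]):
        if ts <= full_backup["taken_at"]:
            continue  # already contained in the full backup
        if ts > target_time:
            break     # stop at the desired point in time
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


backup = {"taken_at": datetime(2024, 3, 1, 0, 0), "state": {"a": 1}}
events = [
    (datetime(2024, 3, 1, 1, 0), "set", "b", 2),
    (datetime(2024, 3, 1, 2, 0), "delete", "a", None),
    (datetime(2024, 3, 1, 3, 0), "set", "c", 3),  # e.g., the bad change to skip
]
print(point_in_time_restore(backup, events, datetime(2024, 3, 1, 2, 30)))  # {'b': 2}
```

Choosing a target just before an erroneous statement is the typical use: everything committed up to that moment is recovered, and the bad change is never applied.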
