Data - Technology Performance Pulse

Essential Guidelines for Building Optimized ETL Data Pipelines in the Cloud With Azure Data Factory

DZone

AUGUST 14, 2024

When building ETL data pipelines using Azure Data Factory (ADF) to process huge amounts of data from different sources, you may often run into performance and design-related challenges. This article will serve as a guide in building high-performance ETL pipelines that are both efficient and scalable.

Azure

Azure Cloud Scalability Efficiency

Data Storage Formats for Big Data Analytics: Performance and Cost Implications of Parquet, Avro, and ORC

DZone

SEPTEMBER 9, 2024

Efficient data processing is crucial for businesses and organizations that rely on big data analytics to make informed decisions. One key factor that significantly affects the performance of data processing is the storage format of the data.

Big Data

Big Data Storage Analytics Benchmarking

Ensuring Data Integrity Through Anomaly Detection: Essential Tools for Data Engineers

DZone

JULY 31, 2024

However, amidst this rapid evolution, ensuring a robust data universe characterized by high quality and integrity is indispensable. While much emphasis is often placed on refining AI models, the significance of pristine datasets can sometimes be overshadowed.

Data Engineering

Data Engineering FinTech Engineering Analytics

Data Observability: Better Insights Through Reliable Data Practices

DZone

OCTOBER 3, 2023

This is an article from DZone's 2023 Data Pipelines Trend Report. Organizations today rely on data to make decisions, innovate, and stay competitive. That data must be reliable and trustworthy to be useful.

Innovation

Our First Netflix Data Engineering Summit

The Netflix TechBlog

DECEMBER 14, 2023

Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the Data Engineering community!

Data Engineering

Data Engineering Engineering Software Engineering Best Practices

Cutting Big Data Costs: Effective Data Processing With Apache Spark

DZone

SEPTEMBER 14, 2023

In today's data-driven world, efficient data processing plays a pivotal role in the success of any project. Apache Spark, a robust open-source data processing framework, has emerged as a game-changer in this domain.

Big Data

Big Data Processing Games Open Source

Dynatrace completed Data Privacy Framework self-certification

Dynatrace

APRIL 22, 2024

To enable participating organizations to meet the EU requirements for transferring personal data to the U.S., the Data Privacy Framework (DPF) is designed to serve as an adequate data transfer mechanism under the GDPR.

Government

Government Programming Analytics Efficiency

Financial Data Engineering in SAS

DZone

JANUARY 8, 2024

Financial data engineering in SAS involves the management, processing, and analysis of financial data using the various tools and techniques provided by the SAS software suite.

Data Engineering

Data Engineering Engineering Database Software

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Dynatrace

OCTOBER 3, 2024

Considering the latest State of Observability 2024 report, it’s evident that multicloud environments come with an explosion of data beyond humans’ ability to manage it. It’s increasingly difficult to ingest, manage, store, and sort through this amount of data.

Performance

Performance Architecture Innovation Latency

Advanced Strategies for Building Modern Data Pipelines

DZone

SEPTEMBER 26, 2024

In today's data-driven world, organizations increasingly rely on sophisticated data pipelines to manage vast volumes of data generated daily. Let’s dive into the key steps to building out your data pipelines.

Strategy

Strategy Scalability Efficiency Technology

Phased Approach to Data Warehouse Modernization

DZone

JULY 4, 2024

Based on the scale of your existing data warehouse processes or jobs, it can be an enormous task to modernize. A modernized database will help you focus on building innovative solutions rather than investing your time and effort in managing these legacy systems.

Innovation

Innovation Database Processing Systems

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. We expect complete and accurate data at the end of each run.

Data Engineering

Data Engineering Engineering Processing Games

Navigating the Divide: Distinctions Between Time Series Data and Relational Data

DZone

MAY 26, 2023

Therefore, I found it important to write a piece based on my understanding of time series data versus relational data as someone with a unique understanding of both. I have coded many applications, both client and web, over my career, and I understand the importance of building a well-developed application from the ground up.

Database

Database Code Development

Master the Art of Querying Data on Amazon S3

DZone

JUNE 3, 2024

In an era where data is the new oil, effectively utilizing data is crucial for the growth of every organization. It is not enough to store these data durably; they must also be queried and analyzed effectively. Without a querying capability, the data stored in S3 would not be of any benefit.

Big Data

Big Data AWS Storage Analytics

Privacy Spotlight: Easily comply with data subject rights in Dynatrace

Dynatrace

MAY 2, 2024

Across the globe, privacy laws grant individuals data subject rights, such as the right to access and delete personal data processed about them. Successful compliance with privacy rights requests involves tracking and verifying requests across the entire data ecosystem, including third-party services.

Tuning

Tuning Scalability Efficiency Processing

Mastering PUE for Unmatched Data Center Performance

DZone

AUGUST 20, 2024

Data centers use a lot of electricity. Learning to measure data center power usage effectiveness (PUE) is a crucial part of that goal. What Is Data Center Power Usage Effectiveness? As its name implies, PUE measures how efficiently a data center uses the energy it consumes. The closer the ratio of total facility energy to IT equipment energy is to 1.0, the better.

Energy

Energy Performance Efficiency
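
As a rough illustration of the PUE ratio mentioned in the teaser above, the sketch below divides total facility energy by IT equipment energy. The function name and the sample figures are hypothetical and are not taken from the linked article.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return PUE = total facility energy / IT equipment energy.

    A value of 1.0 would mean every kilowatt-hour reaches the IT equipment;
    higher values mean more overhead (cooling, lighting, power conversion).
    """
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly readings, for illustration only.
example_pue = power_usage_effectiveness(total_facility_kwh=1500.0, it_equipment_kwh=1000.0)
print(f"PUE: {example_pue:.2f}")
```

In this made-up case, the facility draws half again as much energy as its IT load, so a third of the total goes to overhead.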

Best Practices for Building the Data Pipelines

DZone

DECEMBER 16, 2023

In my previous article ‘Data Validation to Improve Data Quality’, I shared the importance of data quality and a checklist of validation rules to achieve it. Those validation rules alone may not guarantee the best data quality. Most pipelines are automated and run on a fixed schedule.

Best Practices

Viking Enterprise Solutions: Empowering Modern Data Infrastructure

DZone

JULY 10, 2024

In today's rapidly evolving technological landscape, developers, engineers, and architects face unprecedented challenges in managing, processing, and deriving value from vast amounts of data.

Infrastructure

Infrastructure Hardware Innovation Efficiency

Enhancing Performance With Data Modeling: Techniques and Best Practices for Optimization in Snowflake

DZone

OCTOBER 8, 2024

Snowflake is a powerful cloud-based data warehousing platform known for its scalability and flexibility. To fully leverage its capabilities and make data processing more efficient, it's crucial to optimize query performance.

Best Practices

Best Practices Performance Architecture Scalability

Efficient Data Management With Offset and Cursor-Based Pagination in Modern Applications

DZone

JUNE 19, 2024

Pagination is a core technique used to manage data effectively. Leveraging Jakarta Data, this exploration integrates these pagination techniques into a REST API developed with Quarkus and MongoDB. Retrieval strategies play a crucial role in improving performance and scalability, especially when response times are critical.

Efficiency

Efficiency Strategy Scalability Technology
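
To make the two pagination styles named in the teaser above concrete, here is a minimal in-memory sketch. It is not the Jakarta Data API and not code from the linked article; the `ROWS` data and both helper functions are hypothetical, and rows are assumed to be sorted by a unique `id`.

```python
from typing import Optional

# Hypothetical data set, sorted by a unique, increasing id.
ROWS = [{"id": i, "name": f"item-{i}"} for i in range(1, 11)]


def offset_page(rows: list, offset: int, limit: int) -> list:
    """Offset pagination: skip `offset` rows, then return up to `limit` rows."""
    return rows[offset:offset + limit]


def cursor_page(rows: list, after_id: Optional[int], limit: int) -> tuple:
    """Cursor pagination: return up to `limit` rows whose id is greater than the cursor.

    Also returns the id of the last row in the page, to be used as the next cursor
    (None when the page is empty).
    """
    candidates = [r for r in rows if after_id is None or r["id"] > after_id]
    page = candidates[:limit]
    next_cursor = page[-1]["id"] if page else None
    return page, next_cursor


first_page, cursor = cursor_page(ROWS, after_id=None, limit=3)
second_page, cursor = cursor_page(ROWS, after_id=cursor, limit=3)
print([r["id"] for r in offset_page(ROWS, offset=3, limit=3)])
print([r["id"] for r in second_page])
```

Both calls select the same rows here. The practical difference is that a cursor stays stable when earlier rows are inserted or deleted between requests, whereas an offset shifts with them.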

Data Integration in Real-Time Systems

DZone

NOVEMBER 7, 2023

In the rapidly evolving digital landscape, the role of data has shifted from being merely a byproduct of business to becoming its lifeblood. With businesses constantly in the race to stay ahead, the process of integrating this data becomes crucial. However, it's no longer enough to assimilate data in isolated, batch-oriented processes.

Systems

Systems Analytics Architecture Engineering

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

The jobs executing such workloads are usually required to operate indefinitely on unbounded streams of continuous data and exhibit heterogeneous modes of failure as they run over long periods. Summary Ensuring fault tolerance in data-intensive, event-driven applications is crucial for successful industry deployments.

Engineering

Engineering Tuning Latency Open Source

2. Diving Deeper into Psyberg: Stateless vs Stateful Data Processing

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. In the inaugural blog post of this series, we introduced you to the state of our pipelines before Psyberg and the challenges with incremental processing that led us to create the Psyberg framework within Netflix’s Membership and Finance data engineering team.

Processing

Processing Data Engineering Efficiency Analytics

Batch Processing for Data Integration

DZone

NOVEMBER 7, 2023

In the labyrinth of data-driven architectures, the challenge of data integration, fusing data from disparate sources into a coherent, usable form, stands as one of the cornerstones. As businesses amass data at an unprecedented pace, the question of how to integrate this data effectively comes to the fore.

Processing

Processing Architecture Technology

Measuring the importance of data quality to causal AI success

Dynatrace

JANUARY 4, 2024

While this approach can be effective if the model is trained with a large amount of data, even in the best-case scenarios, it amounts to an informed guess, rather than a certainty. But to be successful, data quality is critical. Teams need to ensure the data is accurate and correctly represents real-world scenarios.

Government

Government Analytics Benchmarking Storage

Effective Log Data Analysis With Amazon CloudWatch: Harnessing Machine Learning

DZone

FEBRUARY 5, 2024

In today's cloud computing world, all types of logging data are extremely valuable. Logs can include a wide variety of data, including system events, transaction data, user activities, web browser logs, errors, and performance metrics. This innovative service is transforming the way organizations handle their log data.

Analytics

Analytics Innovation Strategy Efficiency

Best Practices for Picking PostgreSQL Data Types

DZone

OCTOBER 16, 2023

When creating applications that store and analyze large amounts of data, such as time series, log data, or event-storing ones, developing a good and future-proof data model can be a difficult task. Choosing the right data types in PostgreSQL can significantly impact your database's performance and efficiency.

Best Practices

Best Practices Speed Database Efficiency

Data Reprocessing Pipeline in Asset Management Platform @Netflix

The Netflix TechBlog

MARCH 10, 2023

This platform has evolved from supporting studio applications to supporting data science applications and machine-learning applications that discover asset metadata and build various data facts. Hence, we built a data pipeline that can extract the existing asset metadata and process it specifically for each new use case.

Media

Media Traffic Processing Design

Dynatrace Opportunity Insights uses AI prediction and real-user data to optimize business outcomes

Dynatrace

AUGUST 12, 2024

As a result, it’s challenging to get business and resources focused on performance and error optimization without supporting data that shows how those optimizations will impact your organization’s financial outcomes. The addition of more and more metrics over time has only made this increasingly complex.

Airlines

Airlines Metrics Speed Internet

Building an Optimized Data Pipeline on Azure Using Spark, Data Factory, Databricks, and Synapse Analytics

DZone

APRIL 11, 2023

Data processing in the cloud has become increasingly popular due to its scalability, flexibility, and cost-effectiveness. This article will explore how these technologies can be used together to create an optimized data pipeline for data processing in the cloud.

Azure

Azure Analytics Storage Cloud

Salesforce Bulk API 2.0: Streamlining Large-Scale Data Operations

DZone

JULY 11, 2024

Have you ever faced the challenge of managing large data operations within Salesforce, such as updating, inserting, deleting, or querying records? These operations might arise from one-time data migration projects or ongoing data integration needs with external systems. In such scenarios, Salesforce Bulk API 2.0

Efficiency

Efficiency Design Processing Systems

Stream logs to Dynatrace with Amazon Data Firehose to boost your cloud-native journey

Dynatrace

MAY 3, 2024

Log data—the most verbose form of observability data, complementing other standardized signals like metrics and traces—is especially critical. As cloud complexity grows, it brings more volume, velocity, and variety of log data. When trying to address this challenge, your cloud architects will likely choose Amazon Data Firehose.

Cloud

Cloud Lambda AWS Analytics

Edge Data Platforms, Real-Time Services, and Modern Data Trends

DZone

AUGUST 18, 2023

We all know that data is being generated at an unprecedented rate. You may also know that this has led to an increase in the demand for efficient and secure data storage solutions that won’t break the bank. This article will explore what edge data platforms and real-time services are, why they are important, and how they can be used.

IoT

IoT Media Latency Storage

Breaking Down Data Silos With a Unified Data Warehouse: An Apache Doris-Based CDP

DZone

MARCH 18, 2024

The data silos problem is like arthritis for online businesses because almost everyone gets it as they grow old. For one reason or another, it is tricky to integrate the data from all these sources. Data stays where it is and cannot be interrelated for further analysis. That's how data silos come to form.

Mobile

Mobile Website

Bringing Software Engineering Rigor to Data

DZone

FEBRUARY 20, 2023

The data community is striving to incorporate the core concepts of engineering rigor found in software communities but still has further to go. This talk covers ways to leverage software engineering practices for data engineering and demonstrates how measuring key performance metrics could help build more robust and reliable data pipelines.

Software Engineering

Software Engineering Engineering Software

5 truths about zero trust, data, and the federal government

Dynatrace

OCTOBER 24, 2023

The session focused on how agencies are protecting data within the context of zero trust. Reflecting on their discussion, I identified five emerging “truths” about zero trust and data. “Identity is not necessarily as important as data because at the end of the day, we steal identities to get to the data,” said Williamson.

Government

Government Artificial Intelligence Best Practices Infrastructure

How AI and Data Science in 2024 Will Shape Tomorrow's World

DZone

DECEMBER 29, 2023

In the ever-evolving landscape of technology, the tandem growth of Artificial Intelligence (AI) and Data Science has emerged as a beacon of hope, promising unparalleled advancements that will significantly impact and enhance various aspects of our lives.

Artificial Intelligence

Artificial Intelligence Healthcare Technology

A Hands-On Guide to OpenTelemetry: Exploring Telemetry Data With Jaeger

DZone

AUGUST 23, 2024

Are you ready to start your journey on the road to collecting telemetry data from your applications? Great observability begins with great instrumentation! In this series, you'll explore how to adopt OpenTelemetry (OTel) and how to instrument an application to collect tracing telemetry.

Data Observability: Reliability in the AI Era

DZone

DECEMBER 2, 2023

When we introduced the concept of data observability four years ago, it resonated with organizations that had unlocked new value…and new problems thanks to the modern data stack. Now, four years later, we are seeing organizations grapple with the tremendous potential…and tremendous challenges posed by generative AI.

Introduction to Azure Data Lake Storage Gen2

DZone

FEBRUARY 1, 2023

Built on Azure Blob Storage, Azure Data Lake Storage Gen2 is a suite of features for big data analytics. Azure Data Lake Storage Gen1 and Azure Blob Storage's capabilities are combined in Data Lake Storage Gen2. For instance, Data Lake Storage Gen2 offers scale, file-level security, and file system semantics.

Azure

Azure Storage Big Data Analytics

Boost DevOps maturity with observability and a data lakehouse

Dynatrace

JUNE 9, 2023

In a world driven by macroeconomic uncertainty, businesses increasingly turn to data-driven decision-making to stay agile. They’re unleashing the power of cloud-based analytics on large data sets to unlock the insights they and the business need to make smarter decisions. All of these factors challenge DevOps maturity.

DevOps

DevOps Analytics Storage Metrics

Managing Data Residency: Concepts and Theory

DZone

MAY 12, 2023

I believe that chief among them is data residency or data location: Data localization or data residency law requires data about a nation's citizens or residents to be collected, processed, and/or stored inside the country, often before being transferred internationally.

Cloud

Cloud Processing Systems

Keeping data in India with AI-powered observability operated on AWS Mumbai

Dynatrace

NOVEMBER 6, 2023

Recently, the Parliament of India released the Digital Personal Data Protection Act 2023, which regulates the processing of digital personal data in India and recognizes the right of individuals to protect their data in India. Dynatrace is constantly evaluating further expansion of the regional presence of Dynatrace SaaS.

AWS

AWS Azure Tuning Analytics
