Big Data - Technology Performance Pulse

3 Performance Tricks for Dealing With Big Data Sets

DZone

AUGUST 21, 2021

This article describes three tricks that I used when dealing with big data sets (on the order of 10 million records) and that improved performance dramatically. The first trick is to use a CLOB instead of a result set.

Big Data

Big Data Performance Tuning Mobile

Write Optimized Spark Code for Big Data Applications

DZone

MARCH 7, 2023

Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. PySpark is the Python API for Apache Spark, which allows Python developers to write Spark applications in Python instead of Scala or Java.

Big Data

Big Data Code Tuning Open Source

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

When handling large amounts of complex data, or big data, your main machine may start to buckle under all the data it has to process to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale big data workloads.

Big Data

Big Data Database Artificial Intelligence Open Source

Cutting Big Data Costs: Effective Data Processing With Apache Spark

DZone

SEPTEMBER 14, 2023

In today's data-driven world, efficient data processing plays a pivotal role in the success of any project. Apache Spark, a robust open-source data processing framework, has emerged as a game-changer in this domain.

Big Data

Big Data Processing Games Open Source

DataCentral: Uber’s Big Data Observability and Chargeback Platform

Uber Engineering

MARCH 21, 2024

Discover real-time query analytics and governance with DataCentral: Uber’s big data observability powerhouse, tackling millions of queries in petabyte-scale environments.

Big Data

Big Data Government Analytics

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

NOVEMBER 25, 2019

Google Cloud does offer its own wide-column store and big data database, Bigtable, which is ranked #111 on DB-Engines, one place below ScyllaDB at #110. Google Cloud Platform (GCP) was the second most popular cloud provider for ScyllaDB, accounting for 30.4% of all cloud deployments.

Big Data

Big Data Database Open Source Azure

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. Other flows are more sophisticated: one Storm topology can pass the data to another topology via Kafka or Cassandra. Later sections move towards unified big data processing, including Apache Spark [10].

Big Data

Big Data Processing Lambda Database

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks …

Big Data

Big Data Latency Transportation Traffic

How Amazon is solving big-data challenges with data lakes

All Things Distributed

JANUARY 20, 2020

Back when Jeff Bezos filled orders in his garage and drove packages to the post office himself, crunching the numbers on costs, tracking inventory, and forecasting future demand was relatively simple.

Big Data

Big Data Logistics Retail Government

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next. The key challenges discussed include performance.

Big Data

Big Data Storage Benchmarking Hardware

Databook: Turning Big Data into Knowledge with Metadata at Uber

Uber Engineering

AUGUST 3, 2018

From driver and rider locations and destinations, to restaurant orders and payment transactions, every interaction on Uber’s transportation platform is driven by data.

Big Data

Big Data Transportation Engineering Storage

An overview of end-to-end entity resolution for big data

The Morning Paper

DECEMBER 13, 2020

An overview of end-to-end entity resolution for big data, Christophides et al., ACM Computing Surveys, Dec. 2020, Article No. Entity resolution is an important part of many modern data workflows, and an area I've been wrestling with in one of my own projects.

Big Data

Big Data Open Source Processing Analytics
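End-to-end entity resolution pipelines commonly pair blocking (grouping records so that only plausible candidates are compared, instead of every pair in the data set) with pairwise matching. The sketch below illustrates that structure under loudly stated assumptions: the records are toy data, the blocking key (first three characters of the lowercased name), the token Jaccard similarity, and the 0.5 threshold are all hypothetical choices, not the methods surveyed in the paper.

```python
from collections import defaultdict
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens of a field value."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def resolve(records: dict[int, str], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return candidate duplicate pairs (id1, id2) with id1 < id2.

    Blocking: only records sharing the hypothetical blocking key
    (first three characters of the lowercased name) are compared,
    which avoids the quadratic all-pairs comparison.
    """
    blocks: dict[str, list[int]] = defaultdict(list)
    for rid, name in records.items():
        blocks[name.lower()[:3]].append(rid)

    # Matching: compare pairs within each block only.
    matches = []
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            if jaccard(tokens(records[a]), tokens(records[b])) >= threshold:
                matches.append((a, b))
    return sorted(matches)


records = {
    1: "Acme Corp Ltd",
    2: "ACME Corp",
    3: "Globex Corporation",
    4: "Acme Industries",
}
print(resolve(records))
```

Records 1, 2, and 4 land in the same block, but only records 1 and 2 share enough tokens to pass the threshold; record 3 is never compared with the others at all. A poor blocking key silently drops true matches, which is why real pipelines tune it carefully.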

Performance Monitoring Dashboards in the Age of Big Data Pollution

Rigor

MAY 22, 2019

Big data is like the pollution of the information age. As the big data era brings in multiple options for visualization, it has become apparent that not all solutions are created equal.

Big Data

Big Data Monitoring Performance Metrics

Experiences with approximating queries in Microsoft’s production big-data clusters

The Morning Paper

SEPTEMBER 8, 2019

Experiences with approximating queries in Microsoft's production big-data clusters, Kandula et al., VLDB'19. Microsoft's big data clusters have tens of thousands of machines and are used by thousands of users to run some pretty complex queries. For the larger, more production-like query analysed in §4.2.1, …

Big Data

Big Data Analytics Latency Azure
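Approximate query processing generally answers an aggregate from a sample rather than a full scan. As a generic illustration (not the samplers or system evaluated by Kandula et al.), the sketch below assumes uniform Bernoulli sampling: each row is kept with probability `rate`, and the sampled sum is scaled by `1 / rate` to give an unbiased estimate of `SUM`. The data set, rate, and seed are hypothetical.

```python
import random


def approx_sum(values: list[float], rate: float, seed: int = 42) -> float:
    """Estimate sum(values) from a Bernoulli sample.

    Each row is kept independently with probability `rate`; dividing the
    sampled sum by `rate` yields an unbiased (Horvitz-Thompson) estimate.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sampled = sum(v for v in values if rng.random() < rate)
    return sampled / rate


data = [float(i % 100) for i in range(1_000_000)]
exact = sum(data)
estimate = approx_sum(data, rate=0.01)  # reads about 1% of the rows
print(exact, estimate, abs(estimate - exact) / exact)
```

The trade-off is the usual one: a 1% sample touches roughly 10,000 of the million rows, at the cost of a relative error that shrinks as the sample grows. With `rate=1.0` every row is kept and the estimate equals the exact sum.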

Master the Art of Querying Data on Amazon S3

DZone

JUNE 3, 2024

This is especially the case when it comes to taking advantage of vast amounts of data stored in cloud platforms like Amazon S3 (Simple Storage Service), which has become a central repository for data types ranging from web application content to big data analytics.

Big Data

Big Data AWS Analytics Storage

Snowflake Workload Optimization

DZone

AUGUST 23, 2023

In the era of big data, efficient data management and query performance are critical for organizations that want to get the best operational performance from their data investments.

Big Data

Big Data Analytics Innovation Efficiency

Scaling for Success: Why Scalability Is the Forefront of Modern Applications

DZone

JUNE 13, 2023

The reason is straightforward: today, applications generate enormous amounts of data. As we embrace new technologies like cloud computing, big data analysis, and the Internet of Things (IoT), there is a noticeable spike in the amount of data generated by different applications.

Scalability

Scalability IoT Big Data Internet

Microsoft Azure Event Hubs

DZone

FEBRUARY 23, 2023

With Azure Event Hubs, a big data streaming platform and event ingestion service, millions of events can be received and processed per second. Any real-time analytics provider or batching/storage adapter can transform and store data sent to an event hub.

Azure

Azure Big Data Analytics Storage

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.

Big Data

Big Data Analytics AWS Cloud

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. A typical ITOA process has six steps, the first of which is to define the data infrastructure strategy.

Analytics

Analytics Artificial Intelligence Big Data Open Source

Apache Doris for Log and Time Series Data Analysis

DZone

MAY 25, 2024

As NetEase expands its business horizons, the volume of logs and time series data it receives has exploded, bringing problems such as surging storage costs and declining stability. Chosen by NetEase from among all big data components for its platform upgrades, Apache Doris fits both scenarios and delivers much faster query performance.

Best Practices

Best Practices Big Data Games Analytics

Introduction to Azure Data Lake Storage Gen2

DZone

FEBRUARY 1, 2023

Built on Azure Blob Storage, Azure Data Lake Storage Gen2 is a suite of features for big data analytics. Azure Data Lake Storage Gen1 and Azure Blob Storage's capabilities are combined in Data Lake Storage Gen2. For instance, Data Lake Storage Gen2 offers scale, file-level security, and file system semantics.

Azure

Azure Storage Big Data Analytics

Introduction to Grafana, Prometheus, and Zabbix

DZone

FEBRUARY 6, 2024

If a data source is not supported out of the box, a custom plugin can be developed to integrate it. Grafana is widely used these days to monitor and visualize metrics for hundreds or thousands of servers, Kubernetes platforms, virtual machines, big data platforms, etc.

Big Data

Big Data Open Source Virtualization Metrics

Turbocharge Your Apache Spark Jobs for Unmatched Performance

DZone

JULY 17, 2023

Apache Spark is a leading platform in the field of big data processing, known for its speed, versatility, and ease of use. Apache Spark is a unified computing engine designed for large-scale data processing. However, getting the most out of Spark often involves fine-tuning and optimization.

Big Data

Big Data Performance Open Source Tuning

Understanding gRPC Concepts, Use Cases, and Best Practices

DZone

JANUARY 19, 2023

With the advent of cloud providers, we worry less about managing data centers, and this has also led to an increase in the size of data. Big data is generated and transported through various mediums in single requests, and everything is available on demand within seconds.

Best Practices

Best Practices Transportation Big Data Latency

Data Engineers of Netflix - Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Netflix’s unique work culture and petabyte-scale data problems are what drew me to Netflix. During earlier years of my career, I primarily worked as a backend software engineer, designing and building the backend systems that enable big data analytics.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Data Engineers of Netflix - Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

I was later hired into my first purely data gig where I was able to deepen my knowledge of big data. After that, I joined MySpace back at its peak as a data engineer and got my first taste of data warehousing at internet-scale. In the data engineering space, very little of the same technology remains.

Data Engineering

Data Engineering Engineering Entertainment Big Data

Stream Processing vs. Batch Processing: What to Know

DZone

JANUARY 31, 2023

Big data is at the center of all business decisions these days. It refers to large volumes of data generated through different sources, and this data then provides the foundation for business decisions. There are different ways through which we can process data.

Processing

Processing Big Data Systems
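The difference can be illustrated by computing the same metric both ways. A batch job sees a complete, bounded data set and produces one answer at the end, while a stream processor keeps state and updates it incrementally as each event arrives, so a current answer exists at every point. The sketch below is a minimal illustration assuming a per-key mean; the event data and the `StreamingMean` class are hypothetical.

```python
from dataclasses import dataclass, field


def batch_mean_by_key(events: list[tuple[str, float]]) -> dict[str, float]:
    """Batch: the complete, bounded data set is available up front."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for key, value in events:
        totals[key] = totals.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}


@dataclass
class StreamingMean:
    """Streaming: state is updated one event at a time, so a current
    answer exists after every event without revisiting old data."""
    totals: dict[str, float] = field(default_factory=dict)
    counts: dict[str, int] = field(default_factory=dict)

    def update(self, key: str, value: float) -> float:
        self.totals[key] = self.totals.get(key, 0.0) + value
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.totals[key] / self.counts[key]


# Hypothetical (region, latency_ms) events.
events = [("eu", 120.0), ("us", 80.0), ("eu", 100.0), ("us", 40.0)]

stream = StreamingMean()
for key, latency in events:
    print(key, stream.update(key, latency))  # running mean after each event

print(batch_mean_by_key(events))  # one answer once all data is in
```

Both approaches agree once every event has been seen; the streaming version simply exposes intermediate results earlier, at the cost of keeping per-key state in memory.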

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

This post explores these two systems and how they perform auto-diagnosis and remediation across our Big Data Platform and Real-time infrastructure. The streaming platform recently added Data Mesh, and we need to expand Streaming Pensive to cover it.

Big Data

Big Data Infrastructure Metrics Games

What is software automation? Optimize the software lifecycle with intelligent automation

Dynatrace

JUNE 26, 2023

Software analytics offers the ability to gain and share insights from data emitted by software systems and related operational processes to develop higher-quality software faster while operating it efficiently and securely. This involves big data analytics and applying advanced AI and machine learning techniques, such as causal AI.

Software

Software Software Analytics Big Data

Offline Data Pipeline Best Practices Part 1: Optimizing Airflow Job Parameters for Apache Hive

DZone

DECEMBER 27, 2023

Optimizing offline data pipelines has become a necessity with the growing complexity and scale of modern data pipelines. In this kickoff post, we delve into the intricacies of Apache Airflow and AWS EMR, a managed cluster platform for big data processing.

Best Practices

Best Practices Data Engineering Big Data Games

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

The Morning Paper

MAY 14, 2019

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices Gan et al., ASPLOS’19.

Big Data

Big Data Cloud Performance Hardware

Moving HPC to the Cloud: A Guide for 2020

High Scalability

SEPTEMBER 14, 2020

This is a guest post by Limor Maayan-Wainstein , a senior technical writer with 10 years of experience writing about cybersecurity, big data, cloud computing, web development, and more. High performance computing (HPC) enables you to solve complex problems which cannot be solved by regular computing.

Cloud

Cloud Big Data Virtualization Efficiency

Revolutionizing System Testing With AI and ML

DZone

JUNE 6, 2023

This can include the use of cloud computing, artificial intelligence, big data analytics, the Internet of Things (IoT), and other digital tools. The digital transformation of businesses involves the adoption of digital technologies to change the way companies operate and deliver value to their customers.

Artificial Intelligence

Artificial Intelligence Systems IoT Testing

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

Within Amazon S3’s offerings are features like metadata tagging, different classes of data movement and storage options, configuring control over access permissions, and ensuring safety against disasters through data replication mechanisms.

Storage

Storage Systems Big Data Azure

EDI and API: Which Trends Are Transforming the Modern Supply Chain Management?

DZone

JULY 22, 2022

Honestly, these two terms have recently been making the rounds in the big data world. These technologies specialize in transmitting large amounts of data between different trading partners and companies.

Big Data

Big Data Technology Technology Systems

Big / Bug Data: Analyzing the Apache Flink Source Code

DZone

DECEMBER 21, 2020

Applications used in the field of Big Data process huge amounts of information, often in real time. Naturally, such applications must be highly reliable so that no error in the code can interfere with data processing. Apache Flink, the subject of this analysis, is an open-source framework for distributed processing of large amounts of data.

Code

Code Java Big Data Open Source

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

In today's world, data is generated in high volumes, and to make something of it, the extracted data needs to be transformed, stored, maintained, governed, and analyzed. These processes are only possible with the distributed architecture and parallel processing mechanisms that Big Data tools are built on.

Big Data

Big Data Government Open Source Storage

What Should You Know About Graph Database’s Scalability?

DZone

JANUARY 20, 2023

Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios.

Scalability

Scalability Big Data Hardware Internet

Path to NoOps part 1: How modern AIOps brings NoOps within reach

Dynatrace

OCTOBER 25, 2022

“AIOps platforms address IT leaders’ need for operations support by combining big data and machine learning functionality to analyze the ever-increasing volume, variety and velocity of data generated by IT in response to digital transformation.” – Gartner Market Guide for AIOps platforms.

DevOps

DevOps Big Data Cloud Innovation

Optimizing dbt and Google’s BigQuery

DZone

DECEMBER 21, 2020

Setting up a data warehouse is the first step towards fully utilizing big data analysis. Still, it is only one of many steps that must be taken before you can generate value from the data you gather. An important step in that chain is data modeling and transformation.

Big Data

Big Data Google Scalability Processing

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

To handle errors efficiently, Netflix developed a rule-based classifier for error classification called “Pensive.”

Data Engineering

Data Engineering Engineering Entertainment Software Engineering
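A rule-based error classifier of this general kind can be sketched as an ordered list of pattern rules, each mapping a matching error message to a category and a suggested remediation. The sketch below is a hypothetical illustration: the regular expressions, category names, and actions are invented for the example and are not Pensive's actual rules or interface.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    pattern: re.Pattern
    category: str
    action: str


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    Rule(re.compile(r"OutOfMemoryError|Container killed .* memory"),
         "memory", "retry with more memory"),
    Rule(re.compile(r"Connection (refused|reset)|timed out", re.IGNORECASE),
         "transient", "retry"),
    Rule(re.compile(r"Table .* not found|SyntaxError", re.IGNORECASE),
         "user_error", "notify owner"),
]


def classify(message: str) -> tuple[str, str]:
    """Return (category, action) for an error message."""
    for rule in RULES:
        if rule.pattern.search(message):
            return rule.category, rule.action
    # Nothing matched: fall back to a human.
    return "unknown", "escalate to on-call"


print(classify("java.lang.OutOfMemoryError: Java heap space"))
print(classify("Connection reset by peer"))
```

Rule order matters because the first match wins; the fall-through `unknown` category is where a system like this accumulates cases that need new rules.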

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation (including, but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing) is key to the success of modern data platforms.

Tuning

Tuning Efficiency Big Data Engineering

What is RabbitMQ Used For

Scalegrid

JUNE 28, 2024

Can RabbitMQ handle the high-throughput needs of big data applications? For high-throughput big data applications, RabbitMQ may fall short of expectations. RabbitMQ’s flexible routing is adept at dealing with intricate transaction situations.

IoT

IoT Healthcare Programming Open Source
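Part of RabbitMQ's routing flexibility comes from topic exchanges, where a binding key made of dot-separated words can use `*` to match exactly one word and `#` to match zero or more words. The sketch below re-implements that matching rule from scratch for illustration only; `topic_matches` is a hypothetical helper, not RabbitMQ code or a client-library API.

```python
from functools import lru_cache


def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if routing_key matches binding_key under topic-exchange
    wildcard rules: '*' = exactly one word, '#' = zero or more words."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        # i indexes the pattern, j indexes the routing-key words.
        if i == len(pattern):
            return j == len(words)
        if pattern[i] == "#":
            # '#' consumes zero words, or one word and stays on '#'.
            return match(i + 1, j) or (j < len(words) and match(i, j + 1))
        if j == len(words):
            return False
        return (pattern[i] in ("*", words[j])) and match(i + 1, j + 1)

    return match(0, 0)


print(topic_matches("*.orange.*", "quick.orange.rabbit"))
print(topic_matches("lazy.#", "lazy.orange.male.rabbit"))
print(topic_matches("*.*.rabbit", "quick.orange.new.rabbit"))
```

Memoizing on the (pattern position, word position) pair keeps the `#` backtracking from becoming exponential on keys with several `#` segments.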
