article thumbnail

3 Performance Tricks for Dealing With Big Data Sets

DZone

This article describes 3 different tricks that I used in dealing with big data sets (order of 10 million records) and that proved to enhance performance dramatically. Trick 1: CLOB Instead of Result Set.

Big Data 246
article thumbnail

Write Optimized Spark Code for Big Data Applications

DZone

Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. In this article, we will discuss some tips and techniques for tuning PySpark applications.

Big Data 161
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Cutting Big Data Costs: Effective Data Processing With Apache Spark

DZone

In today's data-driven world, efficient data processing plays a pivotal role in the success of any project. Apache Spark , a robust open-source data processing framework, has emerged as a game-changer in this domain.

Big Data 269
article thumbnail

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale, big data workloads. Query Optimization.

Big Data 321
article thumbnail

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

There are dozens of quality articles on ScyllaDB vs. Cassandra, so we’ll stop short here so we can get to the real purpose of this article, breaking down the ScyllaDB user data. It does, but they claim in this report that it’s a 2.5X ScyllaDB Cloud vs. ScyllaDB On-Premises. of all cloud deployments.

Big Data 187
article thumbnail

In-Stream Big Data Processing

Highly Scalable

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. All these topics will be discussed in the later sections of the article. The article is based on a research project developed at Grid Dynamics Labs. Interoperability with Hadoop.

Big Data 154
article thumbnail

Kubernetes for Big Data Workloads

Abhishek Tiwari

Kubernetes has emerged as go to container orchestration platform for data engineering teams. In 2018, a widespread adaptation of Kubernetes for big data processing is anitcipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges. Performance.