Article and Big Data - Technology Performance Pulse

3 Performance Tricks for Dealing With Big Data Sets

DZone

AUGUST 21, 2021

This article describes 3 different tricks that I used in dealing with big data sets (order of 10 million records) and that proved to enhance performance dramatically. Trick 1: CLOB Instead of Result Set.

Big Data

Big Data Performance Tuning Mobile

Write Optimized Spark Code for Big Data Applications

DZone

MARCH 7, 2023

Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. In this article, we will discuss some tips and techniques for tuning PySpark applications.

Big Data

Big Data Code Tuning Open Source

Cutting Big Data Costs: Effective Data Processing With Apache Spark

DZone

SEPTEMBER 14, 2023

In today's data-driven world, efficient data processing plays a pivotal role in the success of any project. Apache Spark, a robust open-source data processing framework, has emerged as a game-changer in this domain.

Big Data

Big Data Processing Games Open Source

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale, big data workloads. Query Optimization.

Big Data

Big Data Database Artificial Intelligence Open Source

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

NOVEMBER 25, 2019

There are dozens of quality articles on ScyllaDB vs. Cassandra, so we’ll stop short here so we can get to the real purpose of this article, breaking down the ScyllaDB user data. It does, but they claim in this report that it’s a 2.5X ScyllaDB Cloud vs. ScyllaDB On-Premises. of all cloud deployments.

Big Data

Big Data Database Open Source Azure

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. All these topics will be discussed in the later sections of the article. The article is based on a research project developed at Grid Dynamics Labs. Interoperability with Hadoop.

Big Data

Big Data Processing Lambda Database

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges. Performance.

Big Data

Big Data Storage Benchmarking Hardware

An overview of end-to-end entity resolution for big data

The Morning Paper

DECEMBER 13, 2020

An overview of end-to-end entity resolution for big data, Christophides et al., 2020, Article No. It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. ACM Computing Surveys, Dec.

Big Data

Big Data Open Source Processing Analytics

Turbocharge Your Apache Spark Jobs for Unmatched Performance

DZone

JULY 17, 2023

Apache Spark is a leading platform in the field of big data processing, known for its speed, versatility, and ease of use. This article delves into various techniques that can be employed to optimize your Apache Spark jobs for maximum performance.

Big Data

Big Data Performance Open Source Tuning

Optimizing dbt and Google’s BigQuery

DZone

DECEMBER 21, 2020

Setting up a data warehouse is the first step towards fully utilizing big data analysis. Still, it is one of many that need to be taken before you can generate value from the data you gather. An important step in that chain of the process is data modeling and transformation.

Big Data

Big Data Google Scalability Processing

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

In today's world, data is generated in high volumes, and to make something of it, the extracted data needs to be transformed, stored, maintained, governed, and analyzed. These processes are only possible with the distributed architecture and parallel processing mechanisms that Big Data tools are based on.

Big Data

Big Data Government Open Source Storage

RSA Guide 2023: Cloud application security remains core challenge for organizations

Dynatrace

APRIL 11, 2023

Context is king: Converging security and observability to protect a digital-first world – Forbes.com article Many tools also don’t see deep enough inside cloud-native architectures to quickly pinpoint newly discovered zero-days for patching. If something doesn’t change, organizations will be unprepared when the next Log4Shell emerges.

Cloud

Cloud DevOps Open Source Retail

When Performance Matters, Think NVMe

DZone

MAY 21, 2019

The demand for more IT resource-intensive applications has significantly increased today, whether it is to process quicker transactions, gain real-time insight, crunch big data sets, or to meet customer expectations. That’s because NVMe provides 6x higher bandwidth and IOPS advantage compared to SAS/SATA SSD.

Performance

Performance Big Data Storage Processing

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

This article will explore hybrid cloud benefits and steps to craft a plan that aligns with your unique business challenges. Workloads from web content, big data analytics, and artificial intelligence stand out as particularly well-suited for hybrid cloud infrastructure owing to their fluctuating computational needs and scalability demands.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

With these goals in mind, two in-memory data stores, Redis and Memcached, have emerged as the top contenders. This article will explore how they handle data storage and scalability, perform in different scenarios, and, most importantly, how these factors influence your choice.

Cache

Cache Storage Scalability Architecture

Automating Physical Backups of MongoDB on Kubernetes

Percona

MARCH 15, 2023

Physical backups in PBM were introduced a few months ago, and, with those, we saw significant improvement in recovery time (read more in this blog post, Physical Backup Support in Percona Backup for MongoDB): So if we want to reduce the Recovery Time Objective (RTO) for big data sets, we must provide physical backups and restores in the Operator.

Database

Database Big Data Processing Servers

Comparing Apache Ignite In-Memory Cache Performance With Hazelcast In-Memory Cache and Java Native Hashmap

DZone

MARCH 16, 2020

This article compares different options for the in-memory maps and their performances in order for an application to move away from traditional RDBMS tables for frequently accessed data.

Cache

Cache Java Performance Database

Benchmarking the AWS Graviton2 with KeyDB

DZone

MAY 14, 2020

This article compares KeyDB running on several different M5 & M6g EC2 instances to get some insight into cost, performance, and use case benefits. We are, of course, referring to the Amazon EC2 M6g instances powered by AWS Graviton2 processors.

AWS

AWS Benchmarking Database Performance

Why You Should Spend More Time Thinking About Phone Call Tracking App

Tech News Gather

OCTOBER 7, 2023

This article sheds light on the often-underestimated capabilities of phone call tracking apps and why they deserve your undivided attention. By optimizing your marketing and customer service based on call data, you can outperform competitors who rely solely on digital analytics.

Strategy

Strategy Big Data Scalability Games

Optimizing anomaly detection and noise

Dynatrace

MARCH 11, 2021

It’s been a while since my last blog article on managing Dynatrace Managed at scale. In the fourth part of the series, I’ll show you how I used Dynatrace’s raw problem and event data to find the best fit for optimized anomaly detection settings. I took a big-data-analysis approach, which started with another problem visualization.

Tuning

Tuning Architecture Monitoring Big Data

Current status, needs, and challenges in Heterogeneous and Composable Memory from the HCM workshop (HPCA’23)

ACM Sigarch

MAY 31, 2023

Heterogeneous and Composable Memory (HCM) offers a feasible solution for terabyte- or petabyte-scale systems, addressing the performance and efficiency demands of emerging big-data applications. This article lays out the ideas and discussions shared at the workshop. Figure 1: Heterogeneous memory with CXL (source: Maruf et al.,

Latency

Latency Hardware Cache Architecture

40+ Best Web Development Blogs of 2018

KeyCDN

OCTOBER 2, 2018

Its videos and blog articles address issues such as web performance, extensible component development and the intersection of CSS with other technologies, like HTML and JavaScript. features collected articles that concentrate on what makes JavaScript work and what doesn’t.

Development

Development Website Design Code

Web Performance Bookshelf

Rigor

JANUARY 13, 2020

Why share the library of the web performance books while there’s a substantial collection of fantastic websites and articles on the net? A collection of practical articles on front-end website performance for front-end developers. Web Performance Collection. Building Progressive Apps.

Performance

Performance Social Media Website Website Performance

Cloud-Based Testing – A tester’s perspective

Testsigma

MAY 14, 2021

Here is an article that will help you ascertain if you need to implement cloud-based test automation in your organization: 6 signs you need to invest in a cloud-based test automation tool. Examples are DevOps, AWS, Big Data, Testing as Service, testing environments. How is cloud-based testing different from traditional testing.

Cloud

Cloud Testing Testing Tools Internet

Rethinking the 'production' of data

All Things Distributed

DECEMBER 20, 2017

This article titled "Daten müssen strategischer Teil des Geschäfts werden" appeared in German last week in the "IT und Datenproduktion" column of Wirtschaftswoche. How companies can use ideas from mass production to create business with data. Strategically, IT doesn't matter.

Artificial Intelligence

Artificial Intelligence Social Media Logistics AWS

Most Popular Tools For Cloud Automation Testing

Testsigma

SEPTEMBER 8, 2021

In this article, we will focus on tools that make this happen in real-time on the cloud. AppPerfect is a versatile tool on this list, useful not only for testers but also for developers and big data operations. Cloud-based automation testing is a simple concept. AppPerfect.

Cloud

Cloud Testing AWS Testing Tools

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques. General Notes on NoSQL Data Modeling.

Database

Database Ecommerce Efficiency Engineering

Probabilistic Data Structures for Web Analytics and Data Mining

Highly Scalable

MAY 1, 2012

In this article, I provide an overview of probabilistic data structures that allow one to estimate these and many other metrics and trade precision of the estimations for the memory consumption. I would like to thank Mikhail Khludnev and Kirill Uvaev, who reviewed this article and provided valuable suggestions. Case Study.

Analytics

Analytics Traffic Big Data Efficiency

World’s Top Web Performance Leaders To Watch

Rigor

SEPTEMBER 11, 2019

Follow @tameverts to stay updated on all her upcoming speaking engagements and articles. His most recent book, Hacking Web Performance, walks the reader through improving performance from initial load and data transfer to resource loading and the overall user experience. His articles for WebFundamentals are not to be missed.

Performance

Performance Education Google Website

Data Mining Problems in Retail

Highly Scalable

MARCH 10, 2015

The rise of omni-channel retail that integrates marketing, customer relationship management, and inventory management across all online and offline channels has produced a plethora of correlated data which increases both the importance and capabilities of data-driven decisions. However, many of these models are highly parametric (i.e.

Retail

Retail C++ Analytics Metrics

This week in review: GPUs, Zombies, Biomimicry and Tom Waits.

All Things Distributed

NOVEMBER 19, 2010

In an in-depth article on Streaming Media, Dan Rayburn analyzed the impact of Amazon CloudFront's move to GA: Amazons CDN Gets More Competitive, Adds SLA, New Edge Locations, Lower Pricing. Understanding Throughput-Oriented Architectures - background article in CACM on massively parallel and throughput vs latency oriented architectures.

AWS

AWS Cloud Benchmarking Storage

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

Problem Statement The purpose of this article is to give insights into analyzing and predicting “out of memory” or OOM kills on the Netflix App. We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities/characteristics and runtime data in our big data platform.

Big Data

Big Data Cache Engineering Data Engineering

I Used The Web For A Day On A 50 MB Budget

Smashing Magazine

JULY 29, 2019

This article is part of a series in which I attempt to use the web under various constraints, representing a given demographic of user. In Amazon’s case, there is room to make some big data savings on the desktop site and we shouldn’t get complacent just because the screen size suggests I’m not on a mobile device. Chris Ashton.

Cache

Cache Google Mobile Network

MapReduce Patterns, Algorithms, and Use Cases

Highly Scalable

JANUARY 31, 2012

In this article I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Chu et al. provide an excellent description of machine learning algorithms for MapReduce in the article Map-Reduce for Machine Learning on Multicore.

C++

C++ Network Ecommerce Processing

Fast Intersection of Sorted Lists Using SSE Instructions

Highly Scalable

JUNE 5, 2012

In this article I describe several useful techniques that are based on SSE instructions and provide results of performance testing for Lucene, Java, and C implementations. in this article. From a functional point of view, we needed mainly a standard boolean query processing, so it was possible to use Solr/Lucene as a platform.

C++

C++ Java Performance Testing Efficiency

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3.

Latency

Latency Storage Big Data Tuning

Utilities, Strategic Investments, and the CIO

The Agile Manager

FEBRUARY 27, 2012

The rise of Big Data - the ability to store and analyze large volumes of structured and unstructured, internal and external data - promises to let companies react more nimbly than ever before. As the FT articles make clear, that debate rages on.

Ecommerce

Ecommerce Social Media Retail Airlines

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

Currently, an issue has been opened to make the “tailing” based on the primary key much faster: slow order by primary key with small limit on big data. To do that I’m using the ClickHouse function alphaTokens (body) which will split the “body” field into words.

Database

Database Analytics Blockchain Healthcare

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

The Netflix TechBlog

FEBRUARY 16, 2021

These expressions (rules) are evaluated in the current request session context and can access data such as A/B test assignments, necessary member information, customized input, etc. We’ll skip over Hendrix’s specific details and focus on the SKU platform adoption in this article for brevity.

Mobile

Mobile Engineering Infrastructure Scalability

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.

Storage

Storage Latency Efficiency Data Engineering

Data Pipelines: The Hammer for Every Nail

Abhishek Tiwari

JULY 7, 2023

In the era of big data and complex data processing, data pipelines have emerged as a popular solution for managing and manipulating data. They provide a systematic approach to extract, transform, and load (ETL) data from various sources, enabling organizations to derive valuable insights.

Logistics

Logistics Transportation Scalability Data Engineering

The workplace of the future

All Things Distributed

MAY 21, 2018

This article titled "Die Arbeitswelt der Zukunft" appeared in German last week in the "Digitalisierung" column of Wirtschaftswoche. The workplace of the future.

Artificial Intelligence

Artificial Intelligence Technology Technology IoT

Choosing Consistency - All Things Distributed

All Things Distributed

FEBRUARY 24, 2010

I laid out some of these challenges in an article explaining the concept of eventual consistency. If you need to achieve high-availability and scalable performance, you will need to resort to data replication techniques. Driving down the cost of Big-Data analytics. Introducing the AWS South America (Sao Paulo) Region.

AWS

AWS Latency Database Scalability

Smashing Podcast Episode 41 With Eva PenzeyMoog: Designing For Safety

Smashing Magazine

AUGUST 9, 2021

But if you just want something and more immediately, you can go to Theinclusivesafetyproject.com and there’s a resources page there that has different sort of articles or studies to look at different people working in related spaces to follow on Twitter, books to read, things like that. Eva: I have been learning about data.

Design

Design Education Network Google
