Big Data and Database - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum Database is a massively parallel processing (MPP) SQL database built on PostgreSQL. It can scale to multi-petabyte data workloads, and it gives access to a cluster of powerful servers that work together behind a single SQL interface where you can view all of the data.

Big Data

Big Data Database Artificial Intelligence Open Source

3 Performance Tricks for Dealing With Big Data Sets

DZone

AUGUST 21, 2021

This article describes 3 different tricks that I used in dealing with big data sets (on the order of 10 million records) and that proved to enhance performance dramatically. Trick 1: CLOB Instead of Result Set.

Big Data

Big Data Performance Tuning Mobile

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

NOVEMBER 25, 2019

ScyllaDB is an open-source distributed NoSQL data store, reimplemented from the popular Apache Cassandra database. We’ve heard a lot about this rising database from the DBA community and our users, and decided to become a sponsor for this year’s Scylla Summit to learn more about the deployment trends from its users.

Big Data

Big Data Database Open Source Azure

Understanding the Database Connection Pool (DBCP) Properties

DZone

APRIL 15, 2022

Recently, I faced an issue related to a very high load on the database layer. The database had too many connections open in parallel. I had to review my application’s database connection pool (DBCP) properties very closely.

Database

Database Code Big Data Performance

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It is clear that distributed in-stream data processing has something to do with query processing in distributed relational databases. Basics of Distributed Query Processing.

Big Data

Big Data Processing Lambda Database

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. Here are the six steps of a typical ITOA process: Define the data infrastructure strategy. NoSQL database. Why use a data lakehouse for causal AI? Apache Spark.

Analytics

Analytics Artificial Intelligence Big Data Open Source

What Should You Know About Graph Database’s Scalability?

DZone

JANUARY 20, 2023

Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios. Do Not Be Misled: Designing and implementing a scalable graph database system has never been a trivial task.

Scalability

Scalability Big Data Hardware Internet

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

At its core, a distributed storage system comprises three main components: a controller for managing the system’s operations, an internal datastore where information is held, and databases geared towards ensuring scalability, partitioning capabilities, and high availability for all types of data.

Storage

Storage Systems Big Data Azure

Data Engineers of Netflix – Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

I was later hired into my first purely data gig where I was able to deepen my knowledge of big data. After that, I joined MySpace back at its peak as a data engineer and got my first taste of data warehousing at internet-scale. Both were appliances located in our own data center. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Entertainment Big Data

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

The strongest Kubernetes growth areas are security, databases, and CI/CD technologies. Of the organizations in the Kubernetes survey, 71% run databases and caches in Kubernetes, representing a +48% year-over-year increase. Java, Go, and Node.js

Open Source

Open Source Java Operating System Programming

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications.

Big Data

Big Data Analytics AWS Scalability

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

As cloud and big data complexity scales beyond the ability of traditional monitoring tools to handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. Database monitoring. This ensures the database queries are performant, while also identifying host problems. Website monitoring.

Cloud

Cloud Monitoring Best Practices Infrastructure

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

Advanced Redis Features Showdown. Additionally, it provides robust native support for geospatial data, enhancing applications like maps and location services.

Cache

Cache Storage Scalability Architecture

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

In addition to providing visibility for core Azure services like virtual machines, load balancers, databases, and application services, we’re happy to announce support for the following 10 new Azure services, with many more to come soon: Virtual Machines (classic ones). Effortlessly optimize Azure database performance.

Azure

Azure Cloud Big Data Virtualization

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

The Morning Paper

MAY 14, 2019

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices, Gan et al., ASPLOS’19. Seer uses a lightweight RPC-level tracing system to collect request traces and aggregate them in a Cassandra database.

Big Data

Big Data Cloud Performance Hardware

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

As teams try to gain insight into this data deluge, they have to balance the need for speed, data fidelity, and scale with capacity constraints and cost. To solve this problem, Dynatrace launched Grail, its causational data lakehouse, in 2022. And without the encumbrances of traditional databases, Grail performs fast.

Analytics

Analytics Innovation Metrics Database

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Using Grail to heal observability pains: Grail logs not only store big data, but also map out dependencies to enable fast analytics and data reasoning. Weighing the value and cost of indexed databases vs. Grail: With standard index databases, teams must choose relevant indexes before data ingestion.

Analytics

Analytics Infrastructure Storage Efficiency

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention. The best they can usually do in real-time using general purpose tools is to filter and look for patterns of interest.

IoT

IoT Analytics Big Data Architecture

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

Workloads from web content, big data analytics, and artificial intelligence stand out as particularly well-suited for hybrid cloud infrastructure owing to their fluctuating computational needs and scalability demands. Ready to take your database management to the next level with ScaleGrid’s cutting-edge solutions?

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

Reduce RPO, Encrypt Backups, and More in 1.15.0 Release of Percona Operator for MongoDB

Percona

OCTOBER 18, 2023

release, we added support for physical backups and restores to significantly reduce Recovery Time Objective (RTO), especially for big data sets. However, the problem of losing data between backups – in other words, Recovery Point Objective (RPO) – for physical backups was not solved.

Best Practices

Best Practices Storage AWS Big Data

Automating Physical Backups of MongoDB on Kubernetes

Percona

MARCH 15, 2023

We at Percona talk a lot about how Kubernetes Operators automate the deployment and management of databases. Operators seamlessly handle lots of Kubernetes primitives and database configuration bits and pieces, all to remove toil from operation teams and provide a self-service experience for developers.

Database

Database Big Data Processing Servers

A guide to Autonomous Performance Optimization

Dynatrace

SEPTEMBER 15, 2020

Stefano started his presentation by showing how much cost and performance optimization is possible when knowing how to properly configure your application runtimes, databases, or cloud environments: Correct configuration of JVM parameters can save up to 75% resource utilization while delivering the same or better performance!

Performance

Performance Java Metrics Cloud

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. What follows is a discussion of where big data systems might be heading, heavily inspired by the remarks in this paper, but with several of my own thoughts mixed in.

Cloud

Cloud Big Data Latency Architecture

Job Openings in AWS - Senior Leader in Database Services - All.

All Things Distributed

AUGUST 19, 2011

Job Openings in AWS - Senior Leader in Database Services. This week it is an opening for senior leaders with AWS Database Services. AWS Database Services is responsible for setting the database strategy and delivering distributed structured storage services to our AWS customers. Werner Vogels.

AWS

AWS Database Storage Scalability

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

Some startups that adopted MySQL in its early days, such as Facebook, Uber, Pinterest, and many more, are now big and successful companies, which shows that MySQL can run on large databases and on heavily used sites. It was developed for optimizing data storage and access for big data sets.

Open Source

Open Source Storage Database Big Data

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

What if we use ClickHouse (which is a columnar analytical database) as our main datastore? Well, typically, an analytical database is not a replacement for a transactional or key/value datastore. Although such databases can be very efficient with counts and averages, some queries will be slow or simply nonexistent. Processed 4.15

Database

Database Analytics Blockchain Healthcare

RSA Guide 2023: Cloud application security remains core challenge for organizations

Dynatrace

APRIL 11, 2023

Meanwhile, traditional databases have demonstrated limitations in increasingly complex and distributed cloud-native environments. The schema and index-dependent approach of traditional databases can’t keep pace or provide adequate analytics of these hyperscale environments.

Cloud

Cloud DevOps Open Source Retail

Hybrid cloud infrastructure explained: Weighing the pros, cons, and complexities

Dynatrace

JUNE 29, 2022

A hybrid cloud, however, combines public infrastructure and services with on-premises resources or a private data center to create a flexible, interconnected IT environment. Hybrid environments provide more options for storing and analyzing ever-growing volumes of big data and for deploying digital services.

Infrastructure

Infrastructure Cloud Azure AWS

What is APM?

Dynatrace

JUNE 1, 2020

The variables that can impact the performance of an application vary, from coding errors or ‘bugs’ in the software, database slowdowns, and hosting and network performance to operating system and device type support. And I’m sure we’ve all experienced frustration when an application crashes, is slow to load, or doesn’t load at all.

Artificial Intelligence

Artificial Intelligence Social Media Monitoring IoT

What is Application Performance Monitoring?

Dynatrace

JUNE 1, 2020

The variables that can impact the performance of an application vary, from coding errors or ‘bugs’ in the software, database slowdowns, and hosting and network performance to operating system and device type support. And I’m sure we’ve all experienced frustration when an application crashes, is slow to load, or doesn’t load at all.

Monitoring

Monitoring Performance Social Media Artificial Intelligence

Comparing Apache Ignite In-Memory Cache Performance With Hazelcast In-Memory Cache and Java Native Hashmap

DZone

MARCH 16, 2020

In this case, for the sake of demonstration, I have taken 2 million dummy physician records that reside in the database table and migrated them to in-memory maps. The migration will enable the application to quickly look up and vet the physician in the map rather than querying the database table for vetting.

Cache

Cache Java Performance Database

Benchmarking the AWS Graviton2 with KeyDB

DZone

MAY 14, 2020

The performance claims made and the hype surrounding the Graviton2 had us itching to see how our high-performance database would perform. We are, of course, referring to the Amazon EC2 M6g instances powered by AWS Graviton2 processors. The numbers were quite exciting, with the AWS Graviton2 living up to the hype. We hope you enjoy!

AWS

AWS Benchmarking Database Performance

DynamoDB for Location Data: Geospatial querying on DynamoDB datasets

All Things Distributed

SEPTEMBER 5, 2013

Over the past few years, two important trends that have been disrupting the database industry are mobile applications and big data. The explosive growth in mobile devices and mobile apps is generating a huge amount of data, which has fueled the demand for big data services and for high scale databases.

Big Data

Big Data Mobile Latency Database

How LinkedIn Serves Over 4.8 Million Member Profiles per Second

InfoQ

JULY 3, 2023

LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that has outgrown their existing database cluster. The new solution achieved over 99% hit rate, helped reduce tail latencies by more than 60% and costs by 10% annually. By Rafal Gancarz

Cache

Cache Latency Traffic Database

Optimizing anomaly detection and noise

Dynatrace

MARCH 11, 2021

I took a big-data-analysis approach, which started with another problem visualization. Using the consolidated API, I started to pull events and problems from all environments and store them in a time series database (InfluxDB). The raw event and problem data from Dynatrace for analysis stored in InfluxDB.

Tuning

Tuning Architecture Monitoring Big Data

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. Document databases advance the BigTable model offering two significant improvements.

Database

Database Ecommerce Efficiency Engineering

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud coupled with the emergence of new big data frameworks and tools are forcing us to rethink the whole ETL and data warehousing architecture. Then we perform frequent batch ETL from application databases to a data warehouse. Classic ETL. Late transformation.

Big Data

Big Data Retail Storage Google

Expanding the Cloud: Introducing the AWS Asia Pacific (Seoul) Region

All Things Distributed

JANUARY 6, 2016

Mirae Asset Global Investments improved its web service environment and reduced annual management costs by 50% by consolidating the management of all web services, including servers, network, database, and security. Many of these enterprises are assisted by our extensive partner ecosystem in Korea.

AWS

AWS Cloud Games Latency

Expanding the Cloud – An AWS Region is coming to Hong Kong

All Things Distributed

JUNE 20, 2017

Beyond running their web properties and applications, Next Digital also uses Amazon RDS (database), Amazon ElastiCache (caching), and Amazon Redshift (data warehousing). Next Digital operates on AWS in a more highly available and fault-tolerant environment than their previous colocation solution.

AWS

AWS Logistics Cloud Social Media

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

MongoDB is an important database, and this paper explains the tunable (per-operation) consistency models that MongoDB provides and how they are implemented under the covers. Microsoft have a paper describing their new recovery mechanism in Azure SQL Database, the key feature being that it can recover in constant time.

Blockchain

Blockchain Hardware Google Analytics

Use Digital Twins for the Next Generation in Telematics

ScaleOut Software

NOVEMBER 24, 2020

Every few seconds, the application servers collect batches of snapshots and write them to the database where they can be queried by dispatchers managing the fleet. At the same time, telemetry snapshots are stored in a data lake, such as HDFS, for offline batch analysis and visualization using big data tools like Spark.

Analytics

Analytics Architecture Scalability Software Architecture

The AWS GovCloud (US) Region - All Things Distributed

All Things Distributed

AUGUST 16, 2011

Government and Big Data. One particular early use case for AWS GovCloud (US) will be massive data processing and analytics. Several agencies of very different parts of the government have needs for data analytics that really put the Big in Big-Data, sometimes several orders of magnitude larger than commonly found in industry.

AWS

AWS Government Big Data Cloud

Why test data management is more important than you think

Testsigma

MAY 7, 2020

IBM Big Data and Analytics Hub website cited a case study, where a US insurance company was estimating 15% of their testing efforts to be just test data collection for the backend system and the frontend system. For testing purposes, usually, a mix of static and dynamic data is needed. Copy production data i.

Testing

Testing Storage Database Processing

Register for AWS re:Invent - All Things Distributed

All Things Distributed

JULY 16, 2012

There are sessions in many different categories: Architecture, Big Data, HPC, Computer & Networking, Storage, Databases, Security, Tools & Languages, Media Sharing & Content Delivery, Managing AWS Resources, Enterprise IT, Mobile, Start-up, and more.

AWS

AWS Big Data Media Storage
