Big Data and Open Source - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum Database is an open-source, hardware-agnostic MPP database for analytics, based on PostgreSQL and developed by Pivotal, which was later acquired by VMware. High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages. Open Source.

Big Data

Big Data Database Artificial Intelligence Open Source

Write Optimized Spark Code for Big Data Applications

DZone

MARCH 7, 2023

Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. PySpark is the Python API for Apache Spark, which allows Python developers to write Spark applications using Python instead of Scala or Java.

Big Data

Big Data Code Tuning Open Source

Cutting Big Data Costs: Effective Data Processing With Apache Spark

DZone

SEPTEMBER 14, 2023

In today's data-driven world, efficient data processing plays a pivotal role in the success of any project. Apache Spark, a robust open-source data processing framework, has emerged as a game-changer in this domain.

Big Data

Big Data Processing Games Open Source

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

NOVEMBER 25, 2019

ScyllaDB is an open-source distributed NoSQL data store, reimplemented from the popular Apache Cassandra database. You can run both the free open source ScyllaDB and ScyllaDB Enterprise in the cloud or on-premises, and the ScyllaDB Enterprise license starts at $28.8k/year for a total of 48 cores.

Big Data

Big Data Database Open Source Azure

Introduction to Grafana, Prometheus, and Zabbix

DZone

FEBRUARY 6, 2024

Grafana is an open-source tool to visualize the metrics and logs from different data sources. If the data sources are not available, then customized plugins can be developed to integrate these data sources. What Is Grafana?

Big Data

Big Data Open Source Virtualization Metrics

An overview of end-to-end entity resolution for big data

The Morning Paper

DECEMBER 13, 2020

An overview of end-to-end entity resolution for big data, Christophides et al. It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. Open source ER systems. The survey includes an assessment of open source tools for ER, summarised in the table below.

Big Data

Big Data Open Source Processing Analytics

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, a widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges. Performance.

Big Data

Big Data Storage Benchmarking Hardware

Turbocharge Your Apache Spark Jobs for Unmatched Performance

DZone

JULY 17, 2023

Apache Spark is a leading platform in the field of big data processing, known for its speed, versatility, and ease of use. Understanding Apache Spark. Apache Spark is a unified computing engine designed for large-scale data processing. However, getting the most out of Spark often involves fine-tuning and optimization.

Big Data

Big Data Performance Open Source Tuning

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. Here are the six steps of a typical ITOA process: Define the data infrastructure strategy. Why use a data lakehouse for causal AI? Why is ITOA important? Apache Spark.

Analytics

Analytics Artificial Intelligence Big Data Open Source

Data Engineers of Netflix: Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

During earlier years of my career, I primarily worked as a backend software engineer, designing and building the backend systems that enable big data analytics. I developed many batch and real-time data pipelines using open source technologies for AOL Advertising and eBay.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Open source software drives a vibrant Kubernetes ecosystem. Across all categories in the Kubernetes survey, open source projects rank among the most frequently used solutions. Dynatrace’s investment in open source technologies keeps growing.

Open Source

Open Source Java Operating System Programming

Big / Bug Data: Analyzing the Apache Flink Source Code

DZone

DECEMBER 21, 2020

Applications used in the field of Big Data process huge amounts of information, and this often happens in real time. Naturally, such applications must be highly reliable so that no error in the code can interfere with data processing. Apache Flink is an open-source framework for distributed processing of large amounts of data.

Code

Code Java Big Data Open Source

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

In today's world, data is generated in high volumes, and to make something of it, the extracted data needs to be transformed, stored, maintained, governed and analyzed. These processes are only possible with the distributed architecture and parallel processing mechanisms that Big Data tools are based on.

Big Data

Big Data Government Open Source Storage

What is container orchestration?

Dynatrace

MARCH 24, 2023

Originally created by Google, Kubernetes was donated to the CNCF as an open source project. Originally developed as a research project at the University of California, Berkeley, in 2009, Mesos launched formally as a mature product in 2016 under the auspices of the Apache Software Foundation, a decentralized open source community.

Infrastructure

Infrastructure Open Source Operating System Cloud

RSA Guide 2023: Cloud application security remains core challenge for organizations

Dynatrace

APRIL 11, 2023

Open source software is likewise playing a larger role in cloud computing, which brings benefits and dilemmas: bad actors have ready access to open source software and can identify new vulnerabilities to exploit. This means that attackers may have already gained access to sensitive information or compromised the system.

Cloud

Cloud DevOps Open Source Retail

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

This article will explore how they handle data storage and scalability, perform in different scenarios, and, most importantly, how these factors influence your choice. Redis Revealed: An Overview. Redis, a renowned open-source, in-memory remote dictionary server, stands out for its diverse data structures and advanced features.

Cache

Cache Storage Scalability Architecture

Seven benefits of AIOps to transform your business operations

Dynatrace

JULY 5, 2022

AIOps combines big data and machine learning to automate key IT operations processes, including anomaly detection and identification, event correlation, and root-cause analysis. As cloud-native technologies evolve, organizations layer in more tools and open source solutions to solve specific problems and provide specific benefits.

Artificial Intelligence

Artificial Intelligence Cloud Innovation Strategy

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Our customers have frequently requested support for this first new batch of services, which cover databases, big data, networks, and computing. See the health of your big data resources at a glance. It enables you to use popular open-source frameworks such as Hadoop, Spark, and Kafka in Azure cloud environments.

Azure

Azure Cloud Big Data Virtualization

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

As teams try to gain insight into this data deluge, they have to balance the need for speed, data fidelity, and scale with capacity constraints and cost. To solve this problem, Dynatrace launched Grail, its causational data lakehouse, in 2022. Open source solutions are also making tracing harder.

Analytics

Analytics Innovation Metrics Database

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. What follows is a discussion of where big data systems might be heading, heavily inspired by the remarks in this paper, but with several of my own thoughts mixed in.

Cloud

Cloud Big Data Latency Architecture

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

ProxySQL: It is a feature-rich open-source MySQL proxy solution that allows query routing for the most common MySQL architectures (PXC/Galera, Replication, Group Replication, etc.). Note that it requires some handling in the application, as it doesn’t support merging and retrieving data from multiple shards.

Open Source

Open Source Storage Database Big Data

Tackling the Pipeline Problem in the Architecture Research Community

ACM Sigarch

APRIL 8, 2019

big-data processing, machine learning, quantum computing, and so on). Communities in other fields have made a substantial effort towards “democratization,” where tools are not just made open source, but also accessible to everyone. For those of us who pursued computer architecture as a career, this is well understood.

Architecture

Architecture Open Source Hardware Software Engineering

The next generation of developer productivity

O'Reilly

AUGUST 15, 2023

Back in the early 2000s, a widely quoted survey reported that CIOs almost unanimously said that their IT organizations weren’t making use of open source. We suspect this estimate is lowballing Copilot’s actual usage. How little they knew! Actual usage of Copilot, ChatGPT, and similar tools is likely to be much higher than 33%.

Development

Development Programming Speed Open Source

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud, coupled with the emergence of new big data frameworks and tools, are forcing us to rethink the whole ETL and data warehousing architecture. This type of analysis is greatly eased by open source tools such as RStudio, Jupyter, and Zeppelin, along with the scripting languages R and Python.

Big Data

Big Data Retail Storage Google

World’s Top Web Performance Leaders To Watch

Rigor

SEPTEMBER 11, 2019

Sergey is an open source developer, tireless educator on performance topics, and author of many web performance-related tools, including ShowSlow, SVN Assets, drop-in .htaccess and more. He tweets about Chrome initiatives, open source tools, and performance news @paul_irish. Scott Jehl. Yoav Weiss.

Performance

Performance Education Google Website

Dutch Enterprises and The Cloud

All Things Distributed

SEPTEMBER 6, 2013

Shell leverages AWS for big data analytics to help achieve these goals. The company has used AWS to build an IT innovation zone, based upon open source products, which is being used to launch new innovations for customers like E-Mobility and E-thermostat products with a very fast time-to-market.

Cloud

Cloud Energy AWS Healthcare

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

We use and contribute to many open-source Python packages, some of which are mentioned below. We’ve had a number of successful Python open sources, including Security Monkey (our team’s most active open source project). If any of this interests you, check out the jobs site or find us at PyCon.

Open Source

Open Source Network Infrastructure Big Data

Data Movement in Netflix Studio via Data Mesh

The Netflix TechBlog

JULY 26, 2021

Instead, they just need to configure the pipeline topology in the UI while getting other features like schema evolution and secure data access out of the box. Operational Reporting Pipeline Example. Iceberg Sink. Apache Iceberg is an open source table format for huge analytics datasets. Currently the Iceberg sink is append-only.

Big Data

Big Data Government Analytics Processing

40+ Best Web Development Blogs of 2018

KeyCDN

OCTOBER 2, 2018

It’s awesome for discovering how grid systems, CSS animation, Big Data, etc. all play roles in real-world web design. Ariya: Ariya Hidayat, the author of this blog, maintains two well-known open source projects (PhantomJS and Esprima). Be sure to check it out if your dev process needs a creative kick in the pants.

Development

Development Website Design Code

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

by Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.

Java

Java Scalability Traffic Architecture

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

The Netflix TechBlog

FEBRUARY 16, 2021

After evaluating multiple open-source and commercial rule evaluation frameworks, we chose our internal Rules Management and Evaluation Framework, Hendrix. Building a scalable SKU catalog platform that allowed for rapid changes with minimal intervention was challenging.

Mobile

Mobile Engineering Infrastructure Scalability

Structural Evolutions in Data

O'Reilly

SEPTEMBER 19, 2023

Each time, the underlying implementation changed a bit while still staying true to the larger phenomenon of “Analyzing Data for Fun and Profit.” They weren’t quite sure what this “data” substance was, but they’d convinced themselves that they had tons of it that they could monetize.

Hardware

Hardware Storage Big Data Blockchain
