Big Data, Efficiency, Scalability and Tuning - Technology Performance Pulse

Write Optimized Spark Code for Big Data Applications

DZone

MARCH 7, 2023

Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. In addition, PySpark applications can be tuned to optimize performance and achieve better execution time, scalability, and resource utilization.

Big Data

Big Data Code Tuning Open Source

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Summary: Providing network insight into the cloud network infrastructure at scale is made possible with eBPF flow logs and a highly scalable and efficient flow collection pipeline. After several iterations of the architecture and some tuning, the solution has proven able to scale.

Network

Network Transportation AWS Cloud

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

With more automated approaches to log monitoring and log analysis, however, organizations can gain visibility into their applications and infrastructure efficiently and with greater precision—even as cloud environments grow. “The weakness of a data lake is they fail when you need to access them fast,” Pawlowski said.

Analytics

Analytics Infrastructure Storage Efficiency

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

We will show how we are building a clean and efficient incremental processing solution (IPS) by using Netflix Maestro and Apache Iceberg. IPS provides incremental processing support with data accuracy, data freshness, and backfill for users and addresses many of the challenges in workflows.

Processing

Processing Big Data Efficiency Engineering

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By: Di Lin, Girish Lingappa, Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard, about to make a critical business decision but pausing to ask a question: “Can…

Infrastructure

Infrastructure Big Data Transportation Architecture

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3. Moving data with Bulldozer at Netflix.

Latency

Latency Storage Big Data Tuning

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.

Storage

Storage Latency Efficiency Data Engineering

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

As big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestrating ecosystem have become increasingly important for our data scientists and the company. Another dimension of scalability to consider is the size of the workflow.

Java

Java Scalability Traffic Architecture

Streaming SQL in Data Mesh

The Netflix TechBlog

NOVEMBER 3, 2023

For more efficient schema management and evolution, the platform will automatically infer the output schema based on the fields selected by the SQL query. Once the user is done editing their query, it is saved to the Data Mesh Pipeline, which will then be deployed as a long-running streaming SQL job. Stay tuned for more updates!

Processing

Processing Engineering Infrastructure Latency

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

However, ClickHouse is super efficient for time series and provides “sharding” out of the box (scalability beyond one node). Although such databases can be very efficient with counts and averages, some queries will be slow or simply nonexistent. Inserts are efficient for bulk inserts only.

Database

Database Analytics Blockchain Healthcare
