Architecture, Big Data, Engineering and Tuning - Technology Performance Pulse

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation, including but not limited to auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing, is key to the success of modern data platforms. We have also noted great potential for further improvement through model tuning (see the Rollout in Production section).

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

By Vikram Srivastava and Marcelo Mayworm

Netflix has one of the most complex data platforms in the cloud, on which our data scientists and engineers run batch and streaming workloads. Pensive, the auto-diagnosis and remediation system, collects logs for the failed jobs launched by a failed workflow step from the relevant data platform components and then extracts the stack traces.

What is IT automation?

Dynatrace

JULY 6, 2022

As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations to ensure organizations can enforce their policies and architecture principles. Big data automation tools. Monitoring automation is ongoing.

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Causal AI (which brings AI-enabled, actionable insights to IT operations) and a data lakehouse such as Dynatrace Grail can help break down silos among ITOps, DevSecOps, site reliability engineering, and business analytics teams. Logs are automatically produced, time-stamped records of events relevant to cloud architectures.

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, and Pawan Dixit

At Netflix, data and machine learning (ML) pipelines are widely used and have become central to the business, representing diverse use cases that go beyond recommendations, predictions, and data transformations.

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

To gain visibility into these logs, we need to ingest and enrich this data. It is easier to tune a large Spark job for a consistent volume of data, so we ensure that our Spark app does not “eat” more data than it was tuned to handle. We named this library Sqooby.

Streaming SQL in Data Mesh

The Netflix TechBlog

NOVEMBER 3, 2023

The Data Mesh SQL Processor is a platform-managed, parameterized Flink Job that takes schematized sources and a Flink SQL query that will be executed against those sources. By leveraging Flink SQL within a Data Mesh Processor, we were able to support the streaming SQL functionality without changing the architecture of Data Mesh.

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Now, imagine yourself in the role of a software engineer responsible for a micro-service that publishes data consumed by a few critical customer-facing services. You are about to make structural changes to the data and want to know who and what downstream of your service will be impacted.

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

The Flow Exporter is a sidecar that uses eBPF tracepoints to capture TCP flows in near real time on instances that power the Netflix microservices architecture. After several iterations of the architecture and some tuning, the solution has proven able to scale.

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou

Netflix has more than 195 million subscribers who generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

It also improves engineering productivity by simplifying existing pipelines and unlocking new patterns. For example, a job would reprocess aggregates for the past 3 days because it assumes that there would be late-arriving data, but data older than 3 days isn’t worth the cost of reprocessing. (… past 3 hours or 10 days)

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. Some of the optimizations are prerequisites for a high-performance data warehouse. Orient: Gather tuning parameters for a particular table that changed.

Microsoft Engineering loves SQLBits

SQL Server According to Bob

FEBRUARY 15, 2018

Microsoft engineering is actually sending quite a few folks over the Atlantic to come talk about SQL Server 2017, SQL Server on Linux, GDPR, Performance, Security, Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse, and Azure CosmosDB. Best practices on Building a Big Data Analytics Solution – Michael Rys.

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

ProxySQL: It is a feature-rich, open-source MySQL proxy solution that allows query routing for the most common MySQL architectures (PXC/Galera, Replication, Group Replication, etc.). MyRocks: MyRocks is a storage engine developed by Facebook and made open source. It supports native sharding that is transparent to the application.

Web Performance Bookshelf

Rigor

JANUARY 13, 2020

When it comes to web content, you can easily find what you need through many different paths, from search engines and social media to playlists and blogs, jumping from one source to another with just a tap of a finger. Information Architecture. Web Performance Tuning. Web Performance Daybook Volume 2. Website Optimization.

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

USENIX’s LISA conference is the premier event for topics in production system engineering. And we believe that everyone has valuable stories to share: Your tales of trial and error, analysis and workarounds, or architecture migrations, can happen at any company. Submit Your Work! Join us for 3 days in Nashville at LISA'18.

QCon London: Lessons Learned From Building LinkedIn’s AI/ML Data Platform

InfoQ

APRIL 15, 2024

By Rafal Gancarz

At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence.

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

We see that with our Amazon customers: when they hear a great tune on the radio, they may identify it using the Shazam or Soundhound apps on their mobile phone and buy that song instantly from the Amazon MP3 store. Driving down the cost of Big-Data analytics. More details at [link].
