Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

Operational automation, including but not limited to auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing, is key to the success of modern data platforms. We have also noted great potential for further improvement through model tuning (see the Rollout in Production section).

Tuning 210

Why MySQL Could Be Slow With Large Tables

Percona

It was developed for optimizing data storage and access for big data sets. There is a cool blog post from Vadim covering big data sets in MyRocks: "MyRocks Use Case: Big Dataset". Query tuning: It is common to find applications that perform very well at the beginning, but as data grows the performance starts to decrease.

Trending Sources

Data Movement in Netflix Studio via Data Mesh

The Netflix TechBlog

However, it is paramount that we validate the complete set of identifiers, such as a list of movie IDs, across producers and consumers for higher overall confidence in the data transport layer of choice. Genesis Data Source and Input definition example. Genesis is a stateless CLI written in Node.js. Please stay tuned!

Big Data 253

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central to the business, representing diverse use cases that go beyond recommendations, predictions, and data transformations.

Java 202

Conducting log analysis with an observability platform and full data context

Dynatrace

With the extent of observability data going beyond human capacity to manage, Grail is the first purpose-built causational data lakehouse that allows for immediate answers with cost-efficient, scalable storage. Business leaders can decide which logs they want to use and tune storage to their data needs.

Analytics 186

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

In addition, we derive lineage information from scheduled ETL jobs by extracting workflow definitions and runtime metadata using Meson scheduler APIs. Netflix’s diverse data landscape made it challenging to capture all the right data and conform it to a common data model.

Optimizing data warehouse storage

The Netflix TechBlog

Orient: Gather tuning parameters for a particular table that changed. AutoAnalyze: In short, AutoAnalyze finds the best tuning/configuration parameters for a table. At the snapshot scan stage, we get a commit definition containing the list of files and their metadata (like size, number of records, etc.).

Storage 203