article thumbnail

Kubernetes for Big Data Workloads

Abhishek Tiwari

Kubernetes has emerged as go to container orchestration platform for data engineering teams. In 2018, a widespread adaptation of Kubernetes for big data processing is anitcipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges.

article thumbnail

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. Upon further profiling, we found that most of the latency came from the candidate generated step (i.e.,

Tuning 210
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

How LinkedIn Serves Over 4.8 Million Member Profiles per Second

InfoQ

The new solution achieved over 99% hit rate, helped reduce tail latencies by more than 60% and costs by 10% annually. LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that has outgrown their existing database cluster. By Rafal Gancarz

Cache 83
article thumbnail

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

A unified data management (UDM) system combines the best of data warehouses, data lakes, and streaming without expensive and error-prone ETL. It offers reliability and performance of a data warehouse, real-time and low-latency characteristics of a streaming system, and scale and cost-efficiency of a data lake.

article thumbnail

QCon London: Lessons Learned From Building LinkedIn’s AI/ML Data Platform

InfoQ

He specifically delved into Venice DB, the NoSQL data store used for feature persistence. At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. By Rafal Gancarz

article thumbnail

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well structured and accurate data is foundational. Backfill: Backfilling datasets is a common operation in big data processing. past 3 hours or 10 days).

article thumbnail

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities/characteristics and runtime data in our big data platform. With large data, comes the opportunity to leverage the data for predictive and classification based analysis.

Big Data 179