Presentation: Modern Compute Stack for Scaling Large AI/ML/LLM Workloads

InfoQ

Jules Damji discusses which infrastructure should be used for distributed fine-tuning and training, how to scale ML workloads, how to accommodate large models, and how CPUs and GPUs can be utilized. By Jules Damji

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

Operational automation (including, but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing) is key to the success of modern data platforms. We have also noted great potential for further improvement through model tuning (see the Rollout in Production section).

Trending Sources

What is IT automation?

Dynatrace

Expect to spend time fine-tuning automation scripts as you find the right balance between automated and manual processing. AI that is based on machine learning needs to be trained. This requires significant data engineering efforts, as well as work to build machine-learning models.

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, and Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central to the business, representing diverse use cases that go beyond recommendations, predictions, and data transformations.

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

IPS enables users to continue to use the data processing patterns with minimal changes. Introduction: Netflix relies on data to power its business in all phases. As our business scales globally, the demand for data is growing and the need for scalable, low-latency incremental processing is beginning to emerge. […] past 3 hours or 10 days).

Organise your engineering teams around the work by reteaming

Abhishek Tiwari

Luckily, aircraft operating manuals and training procedures are so formalised and well established that there is no scope for performance degradation even if one or more crew members are replaced. Because you are changing team composition, you need robust norms of conduct and engineering practices in place.