
Sequential A/B Testing Keeps the World Streaming Netflix Part 1: Continuous Data

The Netflix TechBlog

These observations come from a particular type of A/B test that Netflix runs, called a software canary or regression-driven experiment: a software A/B test between the current and a newer version of the software. Such tests must strictly control false positive (false alarm) probabilities.
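A quick way to see why that control matters: if you run an ordinary fixed-horizon test but peek at the p-value repeatedly as data arrives, stopping at the first p < 0.05, the false alarm rate climbs far above the nominal 5%. Below is a minimal simulation of this peeking problem (an illustration of the motivation, not Netflix's actual sequential procedure):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_obs = 2000, 1000
false_alarms = 0

for _ in range(n_sims):
    # A/A test: both groups draw from the same distribution,
    # so any "significant" result is a false positive.
    a = rng.normal(0, 1, n_obs)
    b = rng.normal(0, 1, n_obs)
    # Peek at the running t-test after every 100 observations
    # and stop at the first nominally significant result.
    for n in range(100, n_obs + 1, 100):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            false_alarms += 1
            break

print(f"False alarm rate with peeking: {false_alarms / n_sims:.3f}")
# Roughly 0.15-0.20 with ten peeks -- far above the nominal 0.05.
# Sequential tests are designed so the rate stays at 5% no matter
# how often you look.
```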


Randomness in Software Estimates

Professor Beekums

(..)


Trending Sources


Experimentation is a major focus of Data Science across Netflix

The Netflix TechBlog

A Type-S error occurs when the estimated metric movement has the opposite sign of the true effect; a Type-M error occurs when, given that we observe a statistically significant result, the size of the estimated metric movement is magnified (or exaggerated) relative to the truth. A Type-M error means that we are over-estimating the impact of the treatment. Combined, these two effects reduce the risk of Type-S and Type-M errors.
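The exaggeration is easy to reproduce: in an underpowered experiment, the estimates that happen to clear the significance bar are precisely the ones that overshoot. A minimal simulation (illustrative values, not Netflix data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect, sigma, n = 0.1, 1.0, 50   # small effect, underpowered test
estimates = []

for _ in range(20_000):
    control = rng.normal(0.0, sigma, n)
    treatment = rng.normal(true_effect, sigma, n)
    # Keep only the effect estimates that reach significance.
    if stats.ttest_ind(treatment, control).pvalue < 0.05:
        estimates.append(treatment.mean() - control.mean())

sig = np.array(estimates)
print(f"True effect:                 {true_effect}")
print(f"Mean significant estimate:   {sig.mean():.3f}")  # well above 0.1
print(f"Exaggeration ratio (Type-M): {sig.mean() / true_effect:.1f}x")
print(f"Wrong-sign share (Type-S):   {(sig < 0).mean():.3f}")
```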


Percentiles don’t work: Analyzing the distribution of response times for web services

Adrian Cockcroft

[Plot: the final result of fitting multiple normal distributions to a response time curve]

Most people have figured out that the average response time for a web service is a very poor estimate of its behavior: responses are usually much faster than the average, but there is a long tail of much slower responses.
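A small simulation makes the point concrete. The workload below is hypothetical (a fast mode plus a slow mode from cache misses, GC pauses, or retries), but the pattern matches typical web services:

```python
import numpy as np

rng = np.random.default_rng(7)
# 95% of requests come from a fast mode (~20 ms), 5% from a
# slow mode (~400 ms); together they form a long-tailed mixture.
fast = rng.lognormal(mean=np.log(20), sigma=0.3, size=95_000)
slow = rng.lognormal(mean=np.log(400), sigma=0.5, size=5_000)
latencies = np.concatenate([fast, slow])

print(f"mean:   {latencies.mean():7.1f} ms")
print(f"median: {np.percentile(latencies, 50):7.1f} ms")
print(f"p99:    {np.percentile(latencies, 99):7.1f} ms")
# The mean lands roughly twice the median: most responses are much
# faster than "average", while the p99 is dominated by the slow mode.
```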


Building confidence in a decision

The Netflix TechBlog

Even if results are statistically significant (p-value < 0.05), the estimated metric movements may be so small that they are immaterial to the Netflix member experience, and we are better off investing our innovation efforts in other areas. Similar considerations are relevant when interpreting results: do the results repeat?
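One way to operationalize this distinction is to check the confidence interval against a minimum movement worth acting on, not just against zero. A sketch, where `min_effect` is a hypothetical smallest metric movement that would matter (not a Netflix-published threshold):

```python
from scipy import stats

def assess(diff, se, alpha=0.05, min_effect=0.5):
    """Check a metric movement for both statistical significance
    (CI excludes zero) and practical materiality (movement at
    least min_effect, in the metric's own units)."""
    z = stats.norm.ppf(1 - alpha / 2)
    lo, hi = diff - z * se, diff + z * se
    stat_sig = lo > 0 or hi < 0
    material = abs(diff) >= min_effect
    return lo, hi, stat_sig, material

# With a large sample, a tiny movement is statistically significant...
lo, hi, s, m = assess(diff=0.08, se=0.03)
print(f"CI=({lo:.2f}, {hi:.2f}) significant={s} material={m}")
# ...but immaterial, so the innovation effort may be better spent elsewhere.
```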


PlanAlyzer: assessing threats to the validity of online experiments

The Morning Paper

Our checks are based on well-known problems that arise in experimental design and causal inference… PlanAlyzer checks PlanOut programs for a variety of threats to internal validity, including failures of randomization, treatment assignment, and causal sufficiency. PlanOut itself has been ported to many programming languages at this point.
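For context, here is what a PlanOut program looks like in the Python reference implementation, adapted from the project's introductory examples (the experiment class and parameter names are illustrative):

```python
from planout.experiment import SimpleExperiment
from planout.ops.random import UniformChoice, WeightedChoice

class ButtonExperiment(SimpleExperiment):
    def assign(self, params, userid):
        # Assignment is a deterministic hash of the unit (userid),
        # so the same user always receives the same parameters.
        params.button_color = UniformChoice(
            choices=['#3c539a', '#5f9647'], unit=userid)
        params.button_text = WeightedChoice(
            choices=['Join now!', 'Sign up.'],
            weights=[0.3, 0.7], unit=userid)

exp = ButtonExperiment(userid=42)
print(exp.get('button_color'), exp.get('button_text'))
```

It is programs of this shape that PlanAlyzer analyzes statically for randomization and assignment failures.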


Ensuring Performance, Efficiency, and Scalability of Digital Transformation

Alex Podelko

So here is the list of 21 sessions on my “to attend” list, in the same random order they appear in the session list (check the full agenda, as you may be interested in other topics and technologies, and there are many more great sessions there). How is DevOps changing the Modern Software Development Landscape?