Engineering, Latency and Storage - Technology Performance Pulse

Designing Instagram

High Scalability

JANUARY 11, 2022

This is a guest post by Ankit Sirmorya, a Machine Learning Engineer at Amazon who has led several machine-learning initiatives across the Amazon ecosystem. Fun fact: in this talk, Rodrigo Schmidt, director of engineering at Instagram, talks about the challenges they have faced in scaling the data infrastructure at Instagram.

Design

Design Media Storage Logistics

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks …

Big Data

Big Data Latency Transportation Traffic

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

APRIL 27, 2023

Engineers want their alerting system to be realtime, reliable, and actionable. A few years ago, we were paged by our SRE team due to our Metrics Alerting System falling behind — critical application health alerts reached engineers 45 minutes late! It opens doors to support more exciting use-cases.

Storage

Storage Cache Metrics Database

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

… a Netflix member via Twitter. This is an example of a question our on-call engineers need to answer to help resolve a member issue - which … We needed to increase engineering productivity via distributed request tracing. That is the first question our engineering teams asked us when integrating the tracer library.

Infrastructure

Infrastructure Transportation Storage Open Source

What is a Site Reliability Engineer (SRE)?

Dotcom-Monitor

OCTOBER 6, 2021

A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. The term site reliability engineering first came into existence at Google in 2003 when a site reliability team was created. At that time, the team was made up of software engineers.

Engineering

Engineering DevOps Monitoring Google

USENIX LISA2021 Computing Performance: On the Horizon

Brendan Gregg

JULY 4, 2021

AWS Graviton2); for memory with the arrival of DDR5 and High Bandwidth Memory (HBM) on-processor; for storage including new uses for 3D Xpoint as a 3D NAND accelerator; for networking with the rise of QUIC and eXpress Data Path (XDP); and so on. Ford, et al., “TCP

Performance

Performance Latency Hardware Storage

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

These workflows also utilize Davis®, the Dynatrace causal AI engine, and all your observability and security data across all platforms, in context, at scale, and in real-time. Storing frequently accessed data in faster storage, usually in-memory caching, improves data retrieval speed and overall system performance. Beyond

AWS

AWS Efficiency Azure Cloud

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure. This includes response time, accuracy, speed, throughput, uptime, CPU utilization, and latency. The primary goal of ITOps is to provide a high-performing, consistent IT environment.

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

USENIX SREcon APAC 2022: Computing Performance: What's on the Horizon

Brendan Gregg

FEBRUARY 28, 2023

My personal opinion is that I don't see a widespread need for more capacity given horizontal scaling and servers that can already exceed 1 Tbyte of DRAM; bandwidth is also helpful, but I'd be concerned about the increased latency for adding a hop to more memory. Ford, et al., “TCP

Performance

Performance Latency Cache Virtualization

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

by Shefali Vyas Dalal AWS re:Invent is a couple weeks away and our engineers & leaders are thrilled to be in attendance yet again this year! Please stop by our “Living Room” for an opportunity to connect or reconnect with Netflixers. We’ve compiled our speaking events below so you know what we’ve been working on. Wednesday - December

AWS

AWS Entertainment Open Source Benchmarking

Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

Uber Engineering

MARCH 12, 2017

With the evolution of storage formats like Apache Parquet and Apache ORC and query engines like Presto and Apache Impala, the Hadoop ecosystem has the potential to become a general-purpose, unified serving layer for workloads that can tolerate latencies …

Processing

Processing Latency Storage Engineering

MySQL Key Performance Indicators (KPI) With PMM

Percona

JUNE 22, 2023

Another related variable, innodb_buffer_pool_instances, determines the number of buffer pool instances for the InnoDB storage engine, which can improve the performance of multi-core systems by reducing contention on the buffer pool latch.

Performance

Performance Monitoring Traffic Database

How digital experience monitoring helps deliver business observability

Dynatrace

APRIL 26, 2022

STM generates traffic that replicates the typical path or behavior of a user on a network to measure performance (for example, response times, availability, packet loss, latency, jitter, and other variables). One use case for STM is to model the behavior of a customer in the form of a flow of transactions along the buyer’s journey.

Monitoring

Monitoring Social Media IoT Metrics

Seamless offloading of web app computations from mobile device to edge clouds via HTML5 Web Worker migration

The Morning Paper

JANUARY 30, 2020

Edge servers are the middle ground – more compute power than a mobile device, but with latency of just a few ms. … physics engine that simulates 3D cubes falling from the air. Why would we want to live migrate web workers? The kind of edge server envisaged here might, for example, be integrated with your WiFi access point.

Mobile

Mobile Cloud Latency Games

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention. The best they can usually do in real-time using general purpose tools is to filter and look for patterns of interest.

IoT

IoT Analytics Big Data Architecture

Expanding the Cloud - Provisioned IOPS for Amazon RDS

All Things Distributed

SEPTEMBER 25, 2012

Following the huge success of being able to provision a consistent, user-requested I/O rate for DynamoDB and Elastic Block Store (EBS), the AWS Database Services team has now released Provisioned IOPS, a new high performance storage option for the Amazon Relational Database Service (Amazon RDS). Provisioned IOPS storage in RDS.

Cloud

Cloud AWS Storage Database

Testing MySQL 8.0.16 on Skylake with innodb_spin_wait_pause_multiplier

HammerDB

MAY 5, 2019

However, in the Skylake microarchitecture (you can see a list of CPUs here) the PAUSE instruction changed and in the documentation it says “the latency of the PAUSE instruction in prior generation microarchitectures is about 10 cycles, whereas in Skylake microarchitecture it has been extended to as many as 140 cycles.”

Testing

Testing Tuning Latency Storage

A case for managed and model-less inference serving

The Morning Paper

JUNE 13, 2019

Making queries to an inference engine has many of the same throughput, latency, and cost considerations as making queries to a datastore, and more and more applications are coming to depend on such queries. The following figure highlights how just one of these variables, batch size, impacts throughput and latency on ResNet50.

Hardware

Hardware Latency Serverless Energy

MySQL Performance Tuning 101: Key Tips to Improve MySQL Database Performance

Percona

SEPTEMBER 1, 2023

This reduction in latency ensures that applications and websites provide a more rapid and responsive user experience. By analyzing disk I/O metrics, you can optimize queries to reduce disk reads or upgrade to faster storage solutions. Avoid over-indexing, which can bloat storage and slow writes.

Tuning

Tuning Database Performance Hardware

Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook

The Morning Paper

MARCH 10, 2020

This benchmark can synthetically generate more precise key-value queries that represent the reads and writes of key-value stores to the underlying storage system. The paper examines three different uses of RocksDB at Facebook: UDB, the underlying storage engine for the MySQL databases storing the social graph data.

Benchmarking

Benchmarking Storage Cache Open Source

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. … faster access to external storage and data locality (I/O, bandwidth). Heron is a real-time, distributed stream processing engine developed at Twitter. Storage provisioning. But Kubernetes storage is evolving quite quickly.

Big Data

Big Data Storage Benchmarking Hardware

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

These data pipelines can process data at petabyte scale and, to some extent, their success can be attributed to an army of engineers devoted to building and maintaining internal data pipelines. Not everyone operates a data engineering function at Netflix or Spotify scale. In a nutshell, a data pipeline is a distributed system.

Latency

Latency Analytics Scalability Engineering

Five Data-Loading Patterns To Improve Frontend Performance

Smashing Magazine

SEPTEMBER 28, 2022

An SSR application will generally have templating engines that inject the variables into an HTML document when it is given to the client. Caching partially stores your data and is not used as permanent storage. Using the cache as permanent storage is an anti-pattern. … has a whole section talking about SEO optimizations on their framework.

Cache

Cache Performance Servers Social Media

Growth Engineering at Netflix - Automated Imagery Generation

The Netflix TechBlog

FEBRUARY 9, 2021

In the Growth Engineering team, we refer to this as the top of the signup funnel. For more background on the signup funnel and Growth Engineering’s role in the signup funnel, please read our initial post on the topic: Growth Engineering at Netflix - Accelerating Innovation.

Engineering

Engineering Storage Latency Entertainment

AnyLog: a grand unification of the Internet of things

The Morning Paper

FEBRUARY 23, 2020

AnyLog wants to do for structured (relational) data what the Web has done for unstructured data, with coordinators playing the role of search engines. Coordinators are servers that receive queries and return results (search engines). This comes in the form of micropayments. An embodiment for structured data for IoT.

Blockchain

Blockchain Internet Internet IoT

Redis® Monitoring Strategies for 2024

Scalegrid

DECEMBER 21, 2023

Identifying key Redis® metrics such as latency, CPU usage, and memory metrics is crucial for effective Redis monitoring. To monitor Redis® instances effectively, collect Redis metrics focusing on cache hit ratio, memory allocated, and latency threshold.

Strategy

Strategy Monitoring Latency DevOps
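
The cache hit ratio mentioned in the Redis monitoring teaser above can be sketched as follows. This is a minimal illustration, not the article's own code: it assumes you have already read the `keyspace_hits` and `keyspace_misses` counters (both are fields reported by the Redis `INFO stats` section) into a dictionary. The function name `cache_hit_ratio` and the sample numbers are hypothetical.

```python
def cache_hit_ratio(stats: dict) -> float:
    """Return hits / (hits + misses), or 0.0 when no lookups were recorded.

    `stats` is assumed to hold the `keyspace_hits` and `keyspace_misses`
    counters, for example as parsed from the output of `INFO stats`.
    """
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical sample values for illustration, not real measurements.
sample = {"keyspace_hits": 900, "keyspace_misses": 100}
print(f"hit ratio: {cache_hit_ratio(sample):.2%}")
```

Because both counters are cumulative since server start, a monitoring loop would typically compute the ratio over the difference between two successive samples rather than over the raw totals.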

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

by Shefali Vyas Dalal AWS re:Invent is a couple weeks away and our engineers & leaders are thrilled to be in attendance yet again this year! Over the years, this platform took on support for both elastic online services and fully featured batch workloads supporting use cases across Netflix engineering.

AWS

AWS Entertainment Open Source Benchmarking

Scalable MicroService Architecture

VoltDB

JULY 10, 2018

In these use cases, data processing usually has a latency budget of less than 5 milliseconds. With the stored procedures framework and the in-memory data storage engine, VoltDB runs the most complex business logic at the lowest latency in a scalable manner, even in a virtualized environment like VMs and containers.

Architecture

Architecture Scalability Ecommerce Latency

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

This article will explore how they handle data storage and scalability, perform in different scenarios, and, most importantly, how these factors influence your choice. It uses a hash table to manage these pairs, divided into fixed-size buckets with linked lists for key-value storage.

Cache

Cache Storage Scalability Architecture
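
The structure described in the teaser above, a hash table divided into a fixed number of buckets with a linked list per bucket, can be sketched as below. The teaser does not say which system "it" refers to, and this is not the implementation used by Redis or Memcached; the class name `ChainedHashTable`, its methods, and the default bucket count are all hypothetical and chosen only to illustrate separate chaining.

```python
class _Node:
    """One key-value pair in a bucket's singly linked list."""
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    """Hash table with a fixed bucket array and linked-list chaining."""

    def __init__(self, num_buckets: int = 8):
        if num_buckets <= 0:
            raise ValueError("num_buckets must be positive")
        # The bucket count never changes in this sketch (no resizing).
        self._buckets = [None] * num_buckets

    def _index(self, key) -> int:
        return hash(key) % len(self._buckets)

    def set(self, key, value) -> None:
        i = self._index(key)
        node = self._buckets[i]
        while node is not None:
            if node.key == key:
                node.value = value  # overwrite an existing key in place
                return
            node = node.next
        # Key not found: prepend a new node to this bucket's chain.
        self._buckets[i] = _Node(key, value, self._buckets[i])

    def get(self, key, default=None):
        node = self._buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return default


table = ChainedHashTable(num_buckets=2)  # few buckets forces collisions
for name in ("a", "b", "c"):
    table.set(name, name.upper())
print(table.get("b"), table.get("missing"))
```

With a fixed bucket count, lookups degrade toward a linear scan as chains grow, which is why production stores usually resize or rehash; that step is omitted here.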

Edgar: Solving Mysteries Faster with Observability

The Netflix TechBlog

SEPTEMBER 2, 2020

For engineers, instead of whodunit, the question is often “what failed and why?” An engineer can find herself digging through logs, poring over traces, and staring at dozens of dashboards. Edgar provides a powerful and consumable user experience to both engineers and non-engineers alike.

Latency

Latency Transportation Engineering Traffic

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

In particular this has been true for applications based on algorithms - often MPI-based - that depend on frequent low-latency communication and/or require significant cross sectional bandwidth.

Cloud

Cloud AWS Automotive Latency

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

From chunk encoding to assembly and packaging, the result of each previous processing step must be uploaded to cloud storage and then downloaded by the next processing step. Uploading and downloading data always come with a penalty, namely latency.

Cloud

Cloud Media Storage Cache

Compression Methods in MongoDB: Snappy vs. Zstd

Percona

MARCH 29, 2023

Compression in any database is necessary as it has many advantages, like storage reduction, reduced data transmission time, etc. Storage reduction alone results in significant cost savings, and we can save more data in the same space. By default, MongoDB provides the Snappy block compression method for storage and network communication.

Storage

Storage Network Open Source Latency

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The data warehouse is not designed to serve point requests from microservices with low latency.

Latency

Latency Storage Big Data Tuning

Service level objectives: 5 SLOs to get started

Dynatrace

JUNE 1, 2023

Note: you might hear the term latency used instead of response time. Both latency and response time are critical to ensure reliability. Latency typically refers to the time it takes for a single request to travel from its source to its destination. Latency primarily focuses on the time spent in transit.

Latency

Latency Website Traffic Virtualization
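
A response-time SLO of the kind this entry discusses is usually stated as "at least some share of requests complete within a threshold". The sketch below shows one way to check that against a list of measured response times. It is an illustration under stated assumptions, not Dynatrace's method: the function names, the 300 ms threshold, and the 75% target are hypothetical.

```python
def slo_compliance(response_times_ms, threshold_ms: float) -> float:
    """Return the fraction of requests that finished within threshold_ms."""
    if not response_times_ms:
        raise ValueError("need at least one measurement")
    within = sum(1 for t in response_times_ms if t <= threshold_ms)
    return within / len(response_times_ms)


def meets_slo(response_times_ms, threshold_ms: float, target: float) -> bool:
    """True when the compliant fraction reaches the target (e.g. 0.95)."""
    return slo_compliance(response_times_ms, threshold_ms) >= target


# Hypothetical measurements in milliseconds, for illustration only.
samples = [100, 200, 300, 400]
print(slo_compliance(samples, threshold_ms=300))
print(meets_slo(samples, threshold_ms=300, target=0.75))
```

Note that these are end-to-end response times. Following the distinction drawn in the teaser, transit latency would be only one component of each measurement.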

Expanding the Cloud with DNS - Introducing Amazon Route 53

All Things Distributed

DECEMBER 5, 2010

There is more than one Werner Vogels in this world and although I never get emails, snail mail or phone calls for any of my peers, I am sure they are somewhat frustrated if they type in our name in a search engine :-). This achieves very low latency for queries, which is crucial for the overall performance of internet applications.

Cloud

Cloud Internet Internet AWS

Service level objective examples: 5 SLO examples for faster, more reliable apps

Dynatrace

JUNE 1, 2023

Note: you might hear the term latency used instead of response time. Both latency and response time are critical to ensure reliability. Latency typically refers to the time it takes for a single request to travel from its source to its destination. Latency primarily focuses on the time spent in transit.

Traffic

Traffic Latency Website Virtualization

This week in review: GPUs, Zombies, Biomimicry and Tom Waits.

All Things Distributed

NOVEMBER 19, 2010

Science & Engineering. Understanding Throughput-Oriented Architectures - background article in CACM on massively parallel and throughput vs latency oriented architectures. an engineering adventure to break the 1,000 mph barrier in a car. Driving Storage Costs Down for AWS Customers. From Airships to Waterslides.

AWS

AWS Cloud Benchmarking Storage

Expanding the Cloud - Opening the AWS Asia Pacific (Singapore.

All Things Distributed

APRIL 28, 2010

There are four main reasons to do so: Performance - For many applications and services, data access latency to end users is important. The new Singapore Region offers customers in APAC lower-latency access to AWS services.

AWS

AWS Cloud Latency Storage
