Blog, Engineering, Latency and Storage - Technology Performance Pulse

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

APRIL 27, 2023

Engineers want their alerting system to be realtime, reliable, and actionable. A few years ago, we were paged by our SRE team due to our Metrics Alerting System falling behind — critical application health alerts reached engineers 45 minutes late! The internals here are outside the scope of this blog post.

Storage

Storage Cache Metrics Database

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks … The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Big Data

Big Data Latency Transportation Traffic

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

… a Netflix member via Twitter. This is an example of a question our on-call engineers need to answer to help resolve a member issue … In our previous blog post, we introduced Edgar, our troubleshooting tool for streaming sessions. We needed to increase engineering productivity via distributed request tracing.

Infrastructure

Infrastructure Transportation Storage Open Source

Compression Methods in MongoDB: Snappy vs. Zstd

Percona

MARCH 29, 2023

Compression in any database is necessary because it has many advantages, such as reduced storage and shorter data transmission times. Storage reduction alone results in significant cost savings, and we can store more data in the same space. In this blog, we will discuss both the data-level and network-level compression offered in MongoDB.
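
To make "storage reduction" concrete, the sketch below measures a compression ratio on a batch of hypothetical JSON documents. It is only an illustration under stated assumptions: Snappy and zstd are not in the Python standard library, so zlib stands in for them here, and the ratios it prints will not match what MongoDB's WiredTiger reports. The two zlib levels mimic the usual trade-off between a fast, lighter compressor and a slower, stronger one; `compression_report` is a hypothetical helper.

```python
import json
import zlib


def compression_report(payload: bytes, level: int = 6) -> dict:
    """Compress payload with zlib (a stand-in for Snappy/zstd) and report the saving."""
    compressed = zlib.compress(payload, level)
    # Compression must be lossless: the round trip has to reproduce the input.
    if zlib.decompress(compressed) != payload:
        raise ValueError("round trip failed")
    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(payload) / len(compressed),
    }


# Hypothetical documents with repetitive field names and values, as in many collections.
docs = [{"user_id": i, "status": "active", "region": "us-east-1"} for i in range(1000)]
payload = json.dumps(docs).encode("utf-8")

for level in (1, 9):  # 1 = fastest, 9 = smallest output
    print(level, compression_report(payload, level))
```

Repetitive field names are exactly what block compressors exploit, which is why document stores tend to benefit from compression.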

Storage

Storage Network Open Source Latency

What is a Site Reliability Engineer (SRE)?

Dotcom-Monitor

OCTOBER 6, 2021

A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. The term site reliability engineering first came into existence at Google in 2003 when a site reliability team was created. At that time, the team was made up of software engineers.

Engineering

Engineering DevOps Monitoring Google

Maximizing Performance of AWS RDS for MySQL with Dedicated Log Volumes

Percona

DECEMBER 11, 2023

A Dedicated Log Volume (DLV) is a specialized storage volume designed to house database transaction logs separately from the volume containing the database tables. DLVs are particularly advantageous for databases with large allocated storage, high I/O per second (IOPS) requirements, or latency-sensitive workloads.

AWS

AWS Benchmarking Performance Traffic

USENIX LISA2021 Computing Performance: On the Horizon

Brendan Gregg

JULY 4, 2021

… AWS Graviton2); for memory with the arrival of DDR5 and High Bandwidth Memory (HBM) on-processor; for storage including new uses for 3D XPoint as a 3D NAND accelerator; for networking with the rise of QUIC and eXpress Data Path (XDP); and so on. Ford, et al., “TCP …”

Performance

Performance Latency Hardware Storage

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

The network latency between cluster nodes should be around 10 ms or less. – A Dynatrace customer, Head of Performance Engineering. For Premium HA, this has been extended from 10 ms latency (in the same network region) to around 100 ms network latency due to asynchronous data replication between regions.

Availability

Availability Hardware Latency Traffic

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

In this blog post, we’ll demonstrate how Dynatrace automation and the Dynatrace Site Reliability Guardian can help you implement your applications according to all six AWS Well-Architected pillars by integrating them into your software development lifecycle (SDLC).

AWS

AWS Efficiency Azure Cloud

InnoDB Performance Optimization Basics

Percona

MARCH 23, 2023

This blog post follows up on our previous ‘InnoDB Performance Optimization Basics’ posts from 2007 and 2013. Although there have been many blog posts about adjusting MySQL variables for better performance since then, I think this topic deserves an update, since the last one was a decade ago, and MySQL 5.7 …

Performance

Performance Hardware Tuning Storage

Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

Uber Engineering

MARCH 12, 2017

With the evolution of storage formats like Apache Parquet and Apache ORC and query engines like Presto and Apache Impala, the Hadoop ecosystem has the potential to become a general-purpose, unified serving layer for workloads that can tolerate latencies … The post Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop appeared (..)

Processing

Processing Latency Storage Engineering

MySQL Key Performance Indicators (KPI) With PMM

Percona

JUNE 22, 2023

In this blog, we will explore various MySQL KPIs that are basic and essential to track using monitoring tools like PMM. Replication lag can occur due to various factors such as network latency, system resource limitations, complex transactions, or heavy write loads on the primary/master database.

Performance

Performance Monitoring Traffic Database

Aurora vs RDS: How to Choose the Right AWS Database Solution

Percona

JULY 1, 2023

In this blog, we will answer all of these important questions and provide a general overview comparing the two database services, Aurora vs RDS. What we should really compare is the MySQL and Aurora database engines provided by Amazon RDS. What are the differences between Aurora and RDS? How do I choose which one to use?

AWS

AWS Database Serverless Storage

How digital experience monitoring helps deliver business observability

Dynatrace

APRIL 26, 2022

STM generates traffic that replicates the typical path or behavior of a user on a network to measure performance (for example, response times, availability, packet loss, latency, jitter, and other variables). The post How digital experience monitoring helps deliver business observability appeared first on Dynatrace blog.
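
Several of the variables listed above can be derived from one series of probe results. The sketch below is a minimal illustration, not Dynatrace's method: it assumes each synthetic probe yields a round-trip latency in milliseconds, or `None` on timeout. It defines jitter as the mean absolute difference between consecutive successful samples, which is one simple definition among several (RFC 3550, for instance, uses a smoothed estimator). `probe_summary` and the sample data are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def probe_summary(latencies_ms: Sequence[Optional[float]]) -> dict:
    """Summarize synthetic probe results.

    Each entry is a round-trip latency in ms, or None when the probe got no
    response (counted as lost). Jitter is the mean absolute difference between
    consecutive successful samples.
    """
    if not latencies_ms:
        raise ValueError("need at least one probe result")
    received = [x for x in latencies_ms if x is not None]
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "packet_loss": 1 - len(received) / len(latencies_ms),
        "avg_latency_ms": mean(received) if received else None,
        "jitter_ms": mean(diffs) if diffs else 0.0,
    }


# Hypothetical results from five synthetic probes; None marks a timeout.
print(probe_summary([10.0, 12.0, None, 11.0, 15.0]))
```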

Monitoring

Monitoring Social Media IoT Metrics

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

An Enterprise-Grade MongoDB Alternative Without Licensing or Lock-in

Percona

JULY 17, 2023

In this blog, we’ll examine the reasons why people would seek an alternative to MongoDB Enterprise, and we’ll identify some of the most popular NoSQL alternatives. First, some stage-setting for this blog article. It ranks No. 1 among non-relational/document-based systems (DB-Engines, July 2023).

Open Source

Open Source Database Scalability Software

Seamless offloading of web app computations from mobile device to edge clouds via HTML5 Web Worker migration

The Morning Paper

JANUARY 30, 2020

Edge servers are the middle ground – more compute power than a mobile device, but with latency of just a few ms. physics engine that simulates 3D cubes falling from the air. Why would we want to live migrate web workers? The kind of edge server envisaged here might, for example, be integrated with your WiFi access point.

Mobile

Mobile Cloud Latency Games

Expanding the Cloud - Provisioned IOPS for Amazon RDS

All Things Distributed

SEPTEMBER 25, 2012

Following the huge success of being able to provision a consistent, user-requested I/O rate for DynamoDB and Elastic Block Store (EBS), the AWS Database Services team has now released Provisioned IOPS, a new high performance storage option for the Amazon Relational Database Service (Amazon RDS). Read more about it on their blog.

Cloud

Cloud AWS Storage Database

The Most Important MySQL Setting

Percona

APRIL 7, 2023

Here’s how the same test performed when running Percona Distribution for PostgreSQL 14 on these same servers:

Server | Queries: reads | Queries: writes | Queries: other | Queries: total | Transactions | Latency (95th)
MySQL (A) | 1584986 | 1645000 | 245322 | 3475308 | 122277 | 20137.61
MySQL (B) | 2517529 | 2610323 | 389048 | 5516900 | 194140 | 11523.48
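
A quick way to read the two MySQL rows is to check that the per-type query counts add up and then express the difference between servers as relative changes. The sketch below only reuses the figures quoted above; the excerpt does not state the latency unit, so the comparison is kept unit-free. The variable names are our own.

```python
# Figures copied from the two MySQL rows of the benchmark results above.
results = {
    "MySQL (A)": {"reads": 1584986, "writes": 1645000, "other": 245322,
                  "total": 3475308, "transactions": 122277, "p95_latency": 20137.61},
    "MySQL (B)": {"reads": 2517529, "writes": 2610323, "other": 389048,
                  "total": 5516900, "transactions": 194140, "p95_latency": 11523.48},
}

# Sanity check: reads + writes + other should equal the total column.
for name, r in results.items():
    assert r["reads"] + r["writes"] + r["other"] == r["total"], name

a, b = results["MySQL (A)"], results["MySQL (B)"]
tx_gain = b["transactions"] / a["transactions"] - 1  # relative throughput change, B vs. A
p95_drop = 1 - b["p95_latency"] / a["p95_latency"]   # relative 95th-percentile latency reduction
print(f"B: {tx_gain:.1%} more transactions, {p95_drop:.1%} lower p95 latency than A")
```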

Tuning

Tuning Cache Servers Benchmarking

The Future in Visual Computing: Research Challenges

ACM Sigarch

DECEMBER 6, 2018

Each of these categories opens up challenging problems in AI/visual algorithms, high-density computing, bandwidth/latency, distributed systems. Artists, researchers, and engineers are already starting to harness the power of deep learning based generative models to create content. Generative and Interactive Visual Workloads.

Wireless

Wireless IoT Analytics Architecture

MongoDB Best Practices: Security, Data Modeling, & Schema Design

Percona

APRIL 17, 2023

In this blog post, we will discuss best practices in the MongoDB ecosystem, applied at the Operating System (OS) and MongoDB levels. CFQ works well for many general use cases but lacks latency guarantees. The deadline scheduler excels at latency-sensitive use cases (like databases), and noop is close to no scheduling at all.
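
On Linux, the active I/O scheduler for a block device can be read from `/sys/block/<device>/queue/scheduler`, where the kernel lists the available schedulers and puts the active one in square brackets. The sketch below parses that line; `active_scheduler` and `scheduler_for` are hypothetical helpers, not part of MongoDB or any Percona tool. On newer kernels that use the multi-queue block layer, the options are typically `mq-deadline`, `kyber`, `bfq`, and `none` rather than CFQ, deadline, and noop.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the active I/O scheduler from a sysfs scheduler line.

    The kernel marks the active scheduler with square brackets,
    e.g. "noop deadline [cfq]".
    """
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler marked in {sysfs_text!r}")


def scheduler_for(device: str) -> str:
    """Read the active scheduler for a block device such as 'sda' (Linux only)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


# Older single-queue kernel vs. newer multi-queue (blk-mq) kernel:
print(active_scheduler("noop deadline [cfq]"))           # cfq
print(active_scheduler("[mq-deadline] kyber bfq none"))  # mq-deadline
```

As root, writing a scheduler name into the same file (for example `deadline`) switches it for that device until reboot.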

Best Practices

Best Practices Design Tuning Database

MongoDB Database Backup: Best Practices & Expert Tips

Percona

MAY 2, 2023

This blog was originally published in September 2020 and was updated in May 2023. In this blog, we will be discussing different MongoDB database backup strategies and their use cases, along with pros and cons and a few other useful tips. Especially if going into or out of storage types that may throttle bandwidth/network traffic.

Best Practices

Best Practices Database Storage Servers

Growth Engineering at Netflix - Automated Imagery Generation

The Netflix TechBlog

FEBRUARY 9, 2021

… In the Growth Engineering team, we refer to this as the top of the signup funnel. For more background on the signup funnel and Growth Engineering’s role in it, please read our initial post on the topic: Growth Engineering at Netflix - Accelerating Innovation.

Engineering

Engineering Storage Latency Entertainment

Software Testing Trends 2021 – What can we expect?

Testsigma

FEBRUARY 12, 2021

According to Gartner, the greatest technological developments in 2021 will influence the future from technology affecting how people operate, to AI engineering and hyperautomation. This obligated QA engineers, in particular, to pay more attention to the user interface. appeared first on Testsigma Blog.

Artificial Intelligence

Artificial Intelligence Software IoT

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

In particular this has been true for applications based on algorithms - often MPI-based - that depend on frequent low-latency communication and/or require significant cross-sectional bandwidth.

Cloud

Cloud AWS Automotive Latency

Progress Delayed Is Progress Denied

Alex Russell

APRIL 29, 2021

Apple forces developers of competing browsers to use their engine for all browsers on iOS , restricting their ability to deliver a better version of the web platform. They are, pound for pound, some of the best engine developers globally and genuinely want good things for the web. So is speedy resolution and agreement.

Media

Media Games Education Engineering

This week in review: GPUs, Zombies, Biomimicry and Tom Waits.

All Things Distributed

NOVEMBER 19, 2010

There were blog posts by Jeff Barr The Cluster GPU Instance and James Hamilton HPC in the Cloud with GPGPUs , as well as my background posting: Expanding the Cloud - Adding the Incredible Power of the Amazon EC2 Cluster GPU Instances. Science & Engineering. an engineering adventure to break the 1,000 mph barrier in a car.

AWS

AWS Cloud Benchmarking Storage

Expanding the Cloud with DNS - Introducing Amazon Route 53

All Things Distributed

DECEMBER 5, 2010

There is more than one Werner Vogels in this world and although I never get emails, snail mail or phone calls for any of my peers, I am sure they are somewhat frustrated if they type in our name in a search engine :-). This achieves very low latency for queries, which is crucial for the overall performance of internet applications.

Cloud

Cloud Internet AWS

Hobson's Browser

Alex Russell

JULY 14, 2021

The 85% global-share OS (Android) has historically facilitated browser choice and diversity in browser engines. Engine diversity is essential, as it is the mechanism that causes competition to deliver better performance, capability, privacy, security, and user controls. If so, it's a browser regardless of the underlying engine.

Google

Google Mobile Engineering Internet

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

Our previous tech blog Packaging award-winning shows with award-winning technology detailed our packaging technology deployed on the streaming side. From chunk encoding to assembly and packaging, the result of each previous processing step must be uploaded to cloud storage and then downloaded by the next processing step.

Cloud

Cloud Media Storage Cache

Edgar: Solving Mysteries Faster with Observability

The Netflix TechBlog

SEPTEMBER 2, 2020

For engineers, instead of whodunit, the question is often “what failed and why?” An engineer can find herself digging through logs, poring over traces, and staring at dozens of dashboards. In an earlier blog post, we discussed Telltale , our health monitoring system. The more complex a system, the more places to look for clues.

Latency

Latency Transportation Engineering Traffic

Expanding the Cloud - Opening the AWS Asia Pacific (Singapore) Region

All Things Distributed

APRIL 28, 2010

There are four main reasons to do so: Performance - For many applications and services, data access latency to end users is important. The new Singapore Region offers customers in APAC lower-latency access to AWS services. You can also find more information on the AWS developer blog.

AWS

AWS Cloud Latency Storage

Optimizing Web Performance: Understanding Waterfall Charts

Dotcom-Monitor

MARCH 6, 2022

Waterfall charts are diagrams that show, on a timeline, how website resources are downloaded and parsed by the browser engine, revealing the order in which resources load and the dependencies between them. A fast website increases conversion rates and helps you perform well on search engines. Do You Need a CDN?
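
The idea behind a waterfall chart fits in a few lines: one row per resource, sorted by request start, with a bar spanning from request start to download end. The sketch below renders that as ASCII from hypothetical timings. Real tools draw from browser timing data and split each bar into phases such as DNS, connect, and download, which this minimal version does not attempt. `Resource` and `waterfall` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    name: str
    start_ms: float  # when the request was issued, relative to navigation start
    end_ms: float    # when the response finished downloading


def waterfall(resources: List[Resource], width: int = 40) -> str:
    """Render resources as ASCII bars, one row per resource, ordered by start time."""
    ordered = sorted(resources, key=lambda r: r.start_ms)
    total = max(r.end_ms for r in ordered)
    scale = width / total if total else 0
    rows = []
    for r in ordered:
        offset = int(r.start_ms * scale)
        length = max(1, int((r.end_ms - r.start_ms) * scale))  # keep short requests visible
        rows.append(f"{r.name:<12}|{' ' * offset}{'#' * length}")
    return "\n".join(rows)


# Hypothetical page load: the CSS and JS requests start only after the HTML arrives,
# and the image is discovered later still, which shows up as a staggered "waterfall".
page = [
    Resource("index.html", 0, 120),
    Resource("app.css", 130, 210),
    Resource("app.js", 130, 300),
    Resource("hero.jpg", 220, 400),
]
print(waterfall(page))
```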

Performance

Performance Cache Website Speed

Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

JUNE 13, 2023

Our previous blog post presented replay traffic testing — a crucial instrument in our toolkit that allows us to implement these transformations with precision and reliability. This blog post will delve into the techniques leveraged at Netflix to introduce these changes to production.

Traffic

Traffic Metrics Systems Strategy

Boosted race trees for low energy classification

The Morning Paper

MAY 28, 2019

We don’t talk about energy as often as we probably should on this blog, but it’s certainly true that our data centres and various IT systems consume an awful lot of it. Together this set of four operations allow us to deliberately engineer “race conditions” in a circuit to perform computation at low energy. ASPLOS’19.

Energy

Energy Hardware Efficiency Architecture

Optimizing Web Performance: Understanding Waterfall Charts

Dotcom-Monitor

MAY 6, 2020

Waterfall charts are diagrams that show, on a timeline, how website resources are downloaded and parsed by the browser engine, revealing the order in which resources load and the dependencies between them. A fast website increases conversion rates and helps you perform well on search engines. Do You Need a CDN?

Performance

Performance Cache Website Speed

Solaris to Linux Migration 2017

Brendan Gregg

SEPTEMBER 5, 2017

This includes many great engineers who I'm sure will excel in whatever they choose to work on next. Here's some output from my zfsdist tool, in bcc/BPF, which measures ZFS latency as a histogram on Linux: # zfsdist. Tracing ZFS operation latency. Many new tools can now be written, and the main toolkit we're working on is [bcc].
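
zfsdist summarizes latency as a power-of-two histogram: each bucket covers twice the range of the one before it, so microsecond and multi-millisecond operations fit on one screen. The sketch below shows only that bucketing and rendering idea in plain Python over a list of samples. It does not use BPF or bcc, and its bucket boundaries and column layout are our own approximation rather than bcc's exact output. The function names and sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def log2_bucket(latency_us: int) -> int:
    """Index of the power-of-two bucket for a latency in microseconds.

    Bucket 0 holds 0 and 1; bucket i (i >= 1) holds [2**i, 2**(i + 1) - 1].
    """
    if latency_us < 0:
        raise ValueError("latency cannot be negative")
    return max(latency_us.bit_length() - 1, 0)


def log2_histogram(samples: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (low, high, count) rows from bucket 0 up to the highest non-empty bucket."""
    counts = Counter(log2_bucket(s) for s in samples)
    if not counts:
        return []
    rows = []
    for i in range(max(counts) + 1):
        low = 0 if i == 0 else 1 << i
        high = (1 << (i + 1)) - 1
        rows.append((low, high, counts.get(i, 0)))
    return rows


def render(rows: List[Tuple[int, int, int]], width: int = 40) -> str:
    """Draw rows as text bars scaled to the most populated bucket."""
    peak = max((count for _, _, count in rows), default=0)
    lines = ["     usecs               : count    distribution"]
    for low, high, count in rows:
        bar = "*" * (count * width // peak) if peak else ""
        lines.append(f"{low:>10} -> {high:<10} : {count:<8} |{bar:<{width}}|")
    return "\n".join(lines)


# Hypothetical per-operation latencies (microseconds) gathered by some tracer.
samples = [3, 5, 6, 12, 14, 15, 15, 33, 40, 700, 1500]
print(render(log2_histogram(samples)))
```

Power-of-two buckets keep the histogram small and cheap to update in kernel context, at the cost of coarse resolution in the upper buckets.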

Virtualization

Virtualization AWS Engineering Hardware

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

This blog post presents how our current iteration of Titus deals with high API call volumes by scaling out horizontally. When a new leader is elected it loads all data from external storage. We started seeing increased response latencies and leader servers running at dangerously high utilization.

Cache

Cache Latency Traffic Systems

Netflix Drive

The Netflix TechBlog

MAY 5, 2021

Netflix Drive relies on a data store that will be the persistent storage layer for assets, and a metadata store which will provide a relevant mapping from the file system hierarchy to the data store entities. We will cover the different namespaces of Netflix Drive in more detail in a subsequent blog post. A sample manifest file.

Media

Media Storage Architecture Cloud

Evolution of ML Fact Store

The Netflix TechBlog

APRIL 26, 2022

The first version of our logger library optimized for storage by deduplicating facts and optimized for network I/O by using different compression methods for each fact. Since we were optimizing at the logging level for storage and performance, we had less data and metadata to play with to optimize the query performance.

Storage

Storage Design Scalability Latency

Netflix Video Quality at Scale with Cosmos Microservices

The Netflix TechBlog

NOVEMBER 2, 2021

Such coupling problems abound with our Reloaded architecture, and hence the Media Cloud Engineering and Encoding Technologies teams have been working together to develop a solution that addresses many of the concerns with our previous architecture. This enables us to use our scale to increase throughput and reduce latencies.

Media

Media Innovation Metrics Latency

Amazon DynamoDB - a Fast and Scalable NoSQL Database

All Things Distributed

JANUARY 18, 2012

Amazon DynamoDB offers low, predictable latencies at any scale. In response, we began to develop a collection of storage and database technologies to address the demanding scalability and reliability requirements of the Amazon.com ecommerce platform. Customers can typically achieve average service-side latencies in the single-digit milliseconds.

Scalability

Scalability Database Ecommerce Latency

A one size fits all database doesn't fit anyone

All Things Distributed

JUNE 21, 2018

That learning is at the heart of this blog post—databases are built for a purpose and matching the use case with the database will help you write high-performance, scalable, and more functional applications faster. The purpose of DynamoDB is to provide consistent single-digit millisecond latency for any scale of workloads.

Database

Database AWS Games Latency
