AWS, Java and Latency - Technology Performance Pulse

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. We chose Open-Zipkin because it had better integrations with our Spring Boot based Java runtime environment.

Infrastructure

Infrastructure Transportation Storage Open Source

Expanding the Cloud - The Amazon Simple Workflow Service - All.

All Things Distributed

FEBRUARY 22, 2012

Today AWS launched an exciting new service for developers: the Amazon Simple Workflow Service. They must deal with the increased latency and unreliability inherent in remote communication. Tasks can be long-running, may fail, may time out, and may complete with varying throughputs and latencies.

Cloud

Cloud AWS Java Scalability

Elastic Beanstalk a la Node - All Things Distributed

All Things Distributed

MARCH 11, 2013

I spent a lot of time talking to AWS developers, many working in the gaming and mobile space, and most of them have been finding that Node.js allows them to handle a large number of concurrent connections with low latencies. Today, AWS Elastic Beanstalk just added support for Node.js. Who is using Elastic Beanstalk?

AWS

AWS Mobile Games Java

Analyzing a High Rate of Paging

Brendan Gregg

AUGUST 29, 2021

[Terminal output excerpt: avg-cpu statistics (%user, %nice, %system, %iowait, %steal, %idle) from a 16-CPU x86_64 AWS instance, dated 12/18/2018 and 12/19/2018.] biolatency: From [bcc], this eBPF tool shows a latency histogram of disk I/O.

Cache

Cache C++ AWS Latency

Seeing through hardware counters: a journey to threefold performance increase

The Netflix TechBlog

NOVEMBER 9, 2022

It started off as a routine migration. We decided to move one of our Java microservices (let’s call it GS2) to a larger AWS instance size, from m5.4xl (16 vCPUs) to m5.12xl (48 vCPUs). What’s worse, average latency degraded by more than 50%, with both CPU and latency patterns becoming more “choppy.”

Hardware

Hardware Cache Performance Latency

Improving the Cloud - More Efficient Queuing with SQS - All Things.

All Things Distributed

NOVEMBER 8, 2012

For example, AWS customers use SQS for asynchronous communication pipelines, buffer queues for databases, asynchronous work queues, and moving latency out of highly responsive request paths. In addition to Long Polling, we are also launching richer client functionality in the Java SDK.

Efficiency

Efficiency Cloud Games Scalability

Design Patterns: Queue-Based Load Leveling Pattern

cdemi

DECEMBER 4, 2016

Apache Kafka - High-Throughput, Low-Latency, Uses Apache ZooKeeper for Distribution, Written in Scala and Java. Amazon Simple Queue Service - The Go-To choice if you're already on AWS, Reliable, Simple, Flexible, Scalable, Secure, Inexpensive.

Design

Design Azure Scalability Latency

The Speed of Time

Brendan Gregg

SEPTEMBER 25, 2021

A Cassandra database cluster had switched to Ubuntu and noticed write latency increased by over 30%. There's no Java stack—there should be a tower of green Java methods—instead there's only a single green frame or two. This is how Java flame graphs looked at the time. 30.14% in the middle of the flame graph.

Speed

Speed Java AWS Virtualization

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

A single API team maintained both the Java implementation of the Falcor framework and the API Server. To determine customer impact, we could compare various metrics such as error rates, latencies, and time to render. Watch our Chaos Engineering talk from AWS re:Invent to learn more about Sticky Canaries.

Traffic

Traffic Latency Cache Metrics

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Dynatrace

MAY 17, 2023

It is available for the major OS and cloud platforms (for example, Windows, Linux, Solaris, AWS, Azure, and more) and only requires the deployment of a single service to monitor its environment. Garbage collection count: Garbage collection is JVM related and indicates how often the Java GC ran.

Metrics

Metrics Monitoring Database Network

Applying Netflix DevOps Patterns to Windows

The Netflix TechBlog

AUGUST 22, 2019

Packer requires specific information for your baking environment and extensive AWS IAM permissions. In order to simplify the use of Packer for our software developers, we bundled Netflix-specific AWS environment information and helper scripts. This means changes can be tracked and reviewed like any other code change.

DevOps

DevOps AWS Tuning Infrastructure

Extending Dynatrace

Dynatrace

JULY 10, 2019

With insights from Dynatrace into network latency and utilization of your cloud resources, you can design your scaling mechanisms and save on costly CPU hours. Dynatrace provides out-of-the-box support for VMware, AWS, Azure, Pivotal Cloud Foundry, and Kubernetes. OneAgent & application traces.

Java

Java Best Practices Metrics Azure

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3, and each day we ingest and create additional Petabytes. These principles reduce resource usage by being more efficient and effective while lowering the end-to-end latency in data processing.

Storage

Storage Latency Efficiency Data Engineering

Stuff The Internet Says On Scalability For July 20th, 2018

High Scalability

JULY 20, 2018

crabbone: This is the prism through which Java programmers view the world. The truth about it is that Java only gets you a good bang for your buck just a wee bit before it hits OOM. MRAM works in consumer applications, but it’s still unclear if it will ever meet the temperature requirements for automotive.

Internet

Internet Scalability Automotive

Zero Configuration Service Mesh with On-Demand Cluster Discovery

The Netflix TechBlog

AUGUST 29, 2023

A brief history of IPC at Netflix: Netflix was early to the cloud, particularly for large-scale companies: we began the migration in 2008, and by 2010, Netflix streaming was fully run on AWS. There is a downside to fetching this data on-demand: this adds latency to the first request to a cluster.

Traffic

Traffic Latency Cloud C++

Embrace event-driven computing: Amazon expands DynamoDB with streams, cross-region replication, and database triggers

All Things Distributed

JULY 14, 2015

DynamoDB Streams is the enabling technology behind two other features announced today: cross-region replication maintains identical copies of DynamoDB tables across AWS regions with push-button ease, and triggers execute AWS Lambda functions on streams, allowing you to respond to changing data conditions. DynamoDB Streams.

Database

Database Lambda AWS IoT

Expanding the Cloud: Amazon Machine Learning Service, the Amazon Elastic Filesystem and more

All Things Distributed

APRIL 9, 2015

Details on the AWS Blog. AWS has been offering a range of storage solutions: objects, block storage, databases, archiving, etc. When we designed Amazon EFS we decided to build along the AWS principles: elastic, scalable, highly available, consistent performance, secure, and cost-effective.

Lambda

Lambda Cloud IoT AWS

Millions of tiny databases

The Morning Paper

MARCH 3, 2020

It takes you through the thinking processes and engineering practices behind the design of a key part of the control plane for AWS Elastic Block Storage (EBS): the Physalia database that stores configuration information. For Physalia, and for AWS more generally, the guiding principle is minimise the blast radius. NSDI’20.

Database

Database AWS Network Design

Amazon DynamoDB Accelerator (DAX): Speed Up DynamoDB Response Times from Milliseconds to Microseconds without Application Rewrite.

All Things Distributed

JUNE 21, 2017

You can add DAX to your existing DynamoDB applications with just a few clicks in the AWS Management Console – no application rewrites required. DynamoDB was the first service at AWS to use SSD storage. These high-throughput, low-latency requirements need caching, not as a consideration, but as a best practice.

Speed

Speed Cache Latency AWS

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

The Morning Paper

MAY 12, 2019

The suite is built using popular OSS applications and representative technologies, deliberately using a mix of languages (C/C++, Java, Javascript, node.js, Python, Ruby, Go, Scala, …) and both RESTful and RPC (Thrift, gRPC) style service interfaces. The bottom line shows the tail latency impact in the microservices-based applications.

Open Source

Open Source Hardware Benchmarking Systems

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

DECEMBER 17, 2019

There are services at Netflix that use RDBMS kind of databases such as MySQL or PostgreSQL via AWS RDS. Passive instances across regions are also possible, though it is recommended to operate in the same region as the database host in order to keep the change capture latencies low. The destination may be a datastore or an external API.

Database

Database Traffic Transportation Open Source

Transforming enterprise integration with reactive streams

O'Reilly Software

MARCH 7, 2018

Created in 2007, and described as "a versatile open source integration framework based on known enterprise integration patterns," it is a very popular Java library for system integration, offering implementations of most (if not all) of the standard enterprise integration patterns (EIP). AWS, Kafka, Google Cloud, Spring, ElasticSearch).

Transportation

Transportation Java Programming Architecture
