Remove Blog Remove Engineering Remove Latency Remove Storage
article thumbnail

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

Engineers want their alerting system to be realtime, reliable, and actionable. A few years ago, we were paged by our SRE team due to our Metrics Alerting System falling behind — critical application health alerts reached engineers 45 minutes late! The internals here are outside the scope of this blog post.

Storage 288
article thumbnail

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks … The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Big Data 109
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

a Netflix member via Twitter This is an example of a question our on-call engineers need to answer to help resolve a member issue?—?which In our previous blog post we introduced Edgar, our troubleshooting tool for streaming sessions. We needed to increase engineering productivity via distributed request tracing.

article thumbnail

Compression Methods in MongoDB: Snappy vs. Zstd

Percona

Compression in any database is necessary as it has many advantages, like storage reduction, data transmission time, etc. Storage reduction alone results in significant cost savings, and we can save more data in the same space. In this blog, we will discuss both data and network-level compression offered in MongoDB.

Storage 105
article thumbnail

What is a Site Reliability Engineer (SRE)?

Dotcom-Montior

A site reliability engineer, or SRE, is a role that that encompasses aspects of both software engineering and operations/infrastructure. The term site reliability engineering first came into existence at Google in 2003 when a site reliability team was created. At that time, the team was made up of software engineers.

article thumbnail

Maximizing Performance of AWS RDS for MySQL with Dedicated Log Volumes

Percona

A Dedicated Log Volume (DLV) is a specialized storage volume designed to house database transaction logs separately from the volume containing the database tables. DLVs are particularly advantageous for databases with large allocated storage, high I/O per second (IOPS) requirements, or latency-sensitive workloads.

AWS 94
article thumbnail

USENIX LISA2021 Computing Performance: On the Horizon

Brendan Gregg

AWS Graviton2); for memory with the arrival of DDR5 and High Bandwidth Memory (HBM) on-processor; for storage including new uses for 3D Xpoint as a 3D NAND accelerator; for networking with the rise of QUIC and eXpress Data Path (XDP); and so on. Ford, et al., “TCP