Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.


Efficient SLO event integration powers successful AIOps

Dynatrace

The first part of this blog post briefly explores the integration of SLO events with AI. Consequently, the AI is founded upon the related events, and due to the detection parameters (threshold, period, analysis interval, frequent detection, etc.), an issue arose. By analogy, envision an apple tree where an apple drops.



Datadog Creates Scalable Data Ingestion Architecture

InfoQ

Datadog created a dedicated data ingestion architecture offering exactly-once semantics for their third-generation event store, Husky. The event-driven architecture (EDA) can accommodate bursts in traffic in the multi-tenant platform with reasonable ingestion latency and acceptable operational costs.


Towards a Reliable Device Management Platform

The Netflix TechBlog

In the Device Management Platform, this is achieved by having device updates be event-sourced through the control plane to the cloud so that NTS will always have the most up-to-date information about the devices available for testing. The RAE is configured to be effectively a router that devices under test (DUTs) are connected to.


Breaking data silos: Liquid Reply’s journey to custom API observability with OpenTelemetry and Dynatrace

Dynatrace

Moreover, distributed microservices architecture and data silos mean teams don’t have access to the context that’s critical to make sense of this data and all its different data formats. The organization needed to ensure the correlation of all events in a complete end-to-end trace.


Automated observability, security, and reliability at scale

Dynatrace

This is especially crucial in microservice architectures, where the number of components can be overwhelming. The screenshot below displays a workflow that listens for a deployment event of the easytrade service in the production stage.


Bringing IT automation to life at Dynatrace Innovate Barcelona

Dynatrace

By unifying all relevant events in Grail, teams could identify suspicious activity, then have the platform automatically trigger the steps to analyze those activities. As a result, the team found that cloud architecture had resulted in overprovisioning of resources. “There are way over 30 availability zones.