Engineering, Infrastructure and Tuning - Technology Performance Pulse

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Stream processing enables software engineers to model their applications’ business logic as high-level representations in a directed acyclic graph without explicitly defining a physical execution plan. Failures can occur unpredictably across various levels, from physical infrastructure to software layers.

Engineering

Engineering Tuning Latency Open Source

Site Reliability Engineering

DZone

JANUARY 19, 2024

In the dynamic world of online services, the concept of site reliability engineering (SRE) has risen as a pivotal discipline, ensuring that large-scale systems maintain their performance and reliability.

Engineering

Engineering Tuning Software Engineering Internet

Speed Trino Queries With These Performance-Tuning Tips

DZone

NOVEMBER 27, 2023

An open-source distributed SQL query engine, Trino is widely used for data analytics on distributed data storage. Optimizing Trino to make it faster can help organizations achieve quicker insights and better user experiences, as well as cut costs and improve infrastructure efficiency and scalability. But how do we do that?

Tuning

Tuning Speed Performance Open Source

Enhancing Kubernetes cluster management key to platform engineering success

Dynatrace

MARCH 29, 2024

Five of the most common include cluster instability, resource and cost management, security, observability, and stress on engineering teams. “Engineering teams are overwhelmed with stuff to do.”

Engineering

Engineering DevOps Operating System Open Source

What is chaos engineering?

Dynatrace

OCTOBER 28, 2021

Chaos engineering answers this need so organizations can deliver robust, resilient cloud-native applications that can stand up under any conditions. What is chaos engineering? Chaos engineers ask why. As chaos engineers grow confident in their testing, they change more variables and broaden the scope of the disaster.

Engineering

Engineering Entertainment Cloud Infrastructure

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

a Netflix member via Twitter. This is an example of a question our on-call engineers need to answer to help resolve a member issue, which … Now let’s look at how we designed the tracing infrastructure that powers Edgar. We needed to increase engineering productivity via distributed request tracing.

Infrastructure

Infrastructure Transportation Storage Open Source

Observability engineering: Getting Prometheus metrics right for Kubernetes with Dynatrace and Kepler

Dynatrace

DECEMBER 18, 2023

For busy site reliability engineers, ensuring system reliability, scalability, and overall health is an imperative that’s getting harder to achieve in ever-expanding, cloud-native, container-based environments. Because of its adaptability, Prometheus has become an essential tool for observability engineering. Jolly good!

Metrics

Metrics Engineering Energy Tuning

How Netflix Content Engineering makes a federated graph searchable

The Netflix TechBlog

APRIL 12, 2022

By Alex Hutter, Falguni Jhaveri, and Senthil Sayeebaba. Over the past few years Content Engineering at Netflix has been transitioning many of its services to use a federated GraphQL platform. It began to power a significant portion of the user experience for many applications within Content Engineering.

Engineering

Engineering Architecture Java Infrastructure

Bring syslog into Dynatrace using OpenTelemetry to get open source value with enterprise support

Dynatrace

MARCH 15, 2024

Getting insights into the health and disruptions of your networking or infrastructure is fundamental to enterprise observability. Even for a supported component, delivering logs from applications and infrastructure to DevSecBizOps workflows requires significant manual configuration.

Open Source

Open Source Infrastructure Network Government

Unlock log analytics: Seamless insights without writing queries

Dynatrace

MAY 28, 2024

You can easily pivot between a hot Kubernetes cluster and the log file related to the issue in 2-3 clicks in these Dynatrace® Apps: Infrastructure & Observability (I&O), Databases, Clouds, and Kubernetes. Finding answers begins with opening the right app for your use case. A sudden drop in received log data?

Analytics

Analytics Infrastructure Database Monitoring

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By: Di Lin, Girish Lingappa, Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question: “Can

Infrastructure

Infrastructure Big Data Transportation Architecture

MySQL Performance Tuning 101: Key Tips to Improve MySQL Database Performance

Percona

SEPTEMBER 1, 2023

While there is no magic bullet for MySQL performance tuning, there are a few areas that can be focused on upfront that can dramatically improve the performance of your MySQL installation. What are the Benefits of MySQL Performance Tuning? A finely tuned database processes queries more efficiently, leading to swifter results.

Tuning

Tuning Database Performance Hardware

Enhance data collection with Dynatrace OpenTelemetry Collector distribution

Dynatrace

MARCH 15, 2024

Developers and operators can gain insights into their applications and infrastructure without fear of vendor lock-in because OpenTelemetry is fully open source and owned by CNCF. So, stay tuned. The OpenTelemetry project is supported and maintained by representatives from Microsoft, Google, Amazon, and many others, including Dynatrace.

Open Source

Open Source Best Practices Infrastructure Tuning

Growth Engineering at Netflix- Creating a Scalable Offers Platform

The Netflix TechBlog

FEBRUARY 9, 2021

The Growth Engineering team is responsible for executing growth initiatives that help us anticipate and adapt to this change. For more background on Growth Engineering and the signup funnel, please have a look at our previous blog post that covers the basics. We need to be constantly adapting and innovating as a result of this change.

Engineering

Engineering Scalability Architecture Innovation

Ensure safe and secure releases at scale by providing Golden Paths

Dynatrace

NOVEMBER 14, 2023

To bring these practices to life within an organization at scale, the discipline of platform engineering has gained popularity. From a high-level point of view, platform engineering aims to: Reduce the cognitive load on development teams. Improve reliability and resiliency of products that rely on platform capabilities.

Government

Government Best Practices DevOps Engineering

What is IT automation?

Dynatrace

JULY 6, 2022

As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations to ensure organizations can enforce their policies and architecture principles. This requires significant data engineering efforts, as well as work to build machine-learning models.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

With more automated approaches to log monitoring and log analysis, however, organizations can gain visibility into their applications and infrastructure efficiently and with greater precision—even as cloud environments grow. They enable IT teams to identify and address the precise cause of application and infrastructure issues.

Analytics

Analytics Infrastructure Storage Efficiency

Design Principles for Mathematical Engineering in Experimentation Platform

The Netflix TechBlog

MARCH 7, 2019

To unlock these innovations we are making a strategic choice that our focus should be geared towards developing the surrounding infrastructure so that scientists’ work can be easily absorbed into the wider Netflix Experimentation Platform. Graduation We need a process for graduating new research into the experimentation platform.

Engineering

Engineering Design Innovation Java

Optimizing your Kubernetes clusters without breaking the bank

Dynatrace

JANUARY 14, 2022

Its ability to densely schedule containers into the underlying machines translates to low infrastructure costs. Tuning thousands of parameters has become an impossible task to achieve via a manual and time-consuming approach. SREcon21 – Automating Performance Tuning with Machine Learning. Additional resources.

Latency

Latency Tuning Efficiency AWS

Presentation: Modern Compute Stack for Scaling Large AI/ML/LLM Workloads

InfoQ

MAY 8, 2024

Jules Damji discusses which infrastructure should be used for distributed fine-tuning and training, how to scale ML workloads, how to accommodate large models, and how CPUs and GPUs can be utilized. By Jules Damji

Tuning

Tuning Infrastructure Artificial Intelligence Data Engineering

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Challenges: The cloud network infrastructure that Netflix utilizes today consists of AWS services such as VPC, DirectConnect, VPC Peering, Transit Gateways, NAT Gateways, etc., and Netflix-owned devices. These metrics are visualized using Lumen, a self-service dashboarding infrastructure. What is BPF?

Network

Network Transportation AWS Cloud

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

While infrastructure has historically been treated as a bottleneck where proper scaling and compute power are applied to improve performance, these aspects are now typically addressed by hyperscalers that offer cloud-based infrastructure and infrastructure as a service.

Best Practices

Best Practices Code Infrastructure Latency

Streaming SQL in Data Mesh

The Netflix TechBlog

NOVEMBER 3, 2023

On the Data Platform team, we build the infrastructure used across the company to process data at scale. Based on the sources that the processor is connected to, the SQL Processor will automatically convert the upstream sources as tables within Flink’s SQL engine. Stay tuned for more updates!

Processing

Processing Engineering Infrastructure Latency

Kubernetes made simple? Kelsey Hightower and Andreas Grabner discuss the future of cloud-native technologies

Dynatrace

FEBRUARY 17, 2022

Principal engineer at Google and co-founder of KubeCon, Hightower advocates simplicity and automation. With automatic and intelligent observability of all their infrastructure, apps, services, and workloads and their dependencies, Dynatrace pinpoints exactly where something is going wrong.

Technology

Technology Technology Cloud Infrastructure

Dynatrace Named a Leader and Positioned Furthest for Vision and Highest in Execution in the 2023 Gartner® Magic Quadrant™ for Application Performance Monitoring and Observability

Dynatrace

JULY 10, 2023

Here is what a few of these customers say about Dynatrace: “Dynatrace has been a game changer in our ability to respond to incidents, identify areas for performance tuning, and gain meaningful data from user behavior analysis.” Director of infrastructure, software sector. “Strong technology and stronger people.

Monitoring

Monitoring Retail Performance Government

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

By Vikram Srivastava and Marcelo Mayworm Netflix has one of the most complex data platforms in the cloud on which our data scientists and engineers run batch and streaming workloads. Pensive infrastructure comprises two separate systems to support batch and streaming workloads. What’s Next? But our job is nowhere near done.

Big Data

Big Data Infrastructure Metrics Hardware

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

Cloud Network Insight is a suite of solutions that provides both operational and analytical insight into the Cloud Network Infrastructure to address the identified problems. It is easier to tune a large Spark job for a consistent volume of data. As with any sustainable engineering design, focusing on simplicity is very important.

Network

Network Tuning AWS Big Data

Building In-Video Search

The Netflix TechBlog

NOVEMBER 6, 2023

Building in-video search To build such a visual search engine, we needed a machine learning system that can understand visual elements. To train these parameters as well as fine-tune the pretrained image-text model weights, we leverage in-house datasets that pair shots of varying durations with rich textual descriptions of their content.

Media

Media Social Media Tuning Efficiency

Seamless AI-powered observability for multicloud serverless applications

Dynatrace

FEBRUARY 9, 2022

Engineers often choose best-of-breed services from multiple sources to create a single application. 2 Automatic detected queues anomaly by AI engine Davis. Stay tuned for updates. This enables proactive AI-driven analysis and easy troubleshooting in serverless scenarios. 3 End-to-end distributed trace including Azure Functions.

Serverless

Serverless Azure Lambda AWS

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

By the summer of 2020, many UI engineers were ready to move to GraphQL. The GraphQL shim enabled client engineers to move quickly onto GraphQL, figure out client-side concerns like cache normalization, experiment with different GraphQL clients, and investigate client performance without being blocked by server-side migrations.

Traffic

Traffic Latency Cache Metrics

Software intelligence as code enables tailored observability, AIOps, and application security at scale

Dynatrace

FEBRUARY 9, 2022

More recently, teams have begun to apply DevOps best practices to infrastructure automation, giving developers a more active role with GitOps as an operational framework. Key components of GitOps are declarative infrastructure as code, orchestration, and observability.

Code

Code Software Software DevOps

Software engineering for machine learning: a case study

The Morning Paper

JULY 7, 2019

Software engineering for machine learning: a case study, Amershi et al., ICSE’19. More specifically, we’ll be looking at the results of an internal study with over 500 participants designed to figure out how product development and software engineering is changing at Microsoft with the rise of AI and ML. A general process.

Software Engineering

Software Engineering Engineering Software Software

Enhance data collection with Dynatrace OTel Collector distribution

Dynatrace

MARCH 15, 2024

Developers and operators can gain insights into their applications and infrastructure without fear of vendor lock-in because OpenTelemetry is fully open source and owned by CNCF. So, stay tuned. The OpenTelemetry project is supported and maintained by representatives from Microsoft, Google, Amazon, and many others, including Dynatrace.

Open Source

Open Source Best Practices Infrastructure Tuning

Applying Netflix DevOps Patterns to Windows

The Netflix TechBlog

AUGUST 22, 2019

Artisan Crafted Images In the Netflix full cycle DevOps culture the team responsible for building a service is also responsible for deploying, testing, infrastructure, and operation of that service. A key responsibility of Netflix engineers is identifying gaps and pain points in the development and operation of services.

DevOps

DevOps AWS Tuning Infrastructure

Dynatrace extends automatic and intelligent observability to cloud and Kubernetes logs for smarter automation at scale

Dynatrace

FEBRUARY 8, 2021

Putting logs into context with metrics, traces, and the broader application topology enables and improves how companies manage their cloud architectures, platforms, and infrastructure, optimize applications, and remediate incidents in a highly efficient way. Now, Dynatrace applies Davis, its AI engine, to monitor the new log sources.

Cloud

Cloud Azure Analytics AWS

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

Reloaded was created as a single monolithic system, where developers from various media teams in ET and our platform partner team Content Infrastructure and Solutions (CIS)¹ worked on the same codebase, building a single system that handled all media assets. The service also provides options that allow fine-tuning latency, throughput, etc.,

Processing

Processing Media Latency Innovation

DevOps automation: From event-driven automation to answer-driven automation [with causal AI]

Dynatrace

JULY 24, 2023

Imagine having an AI engine that comprehends the complete context of the transaction and intelligently determines whether to send a discount code—and which one to send. Full contextual awareness helps the AI engine make informed decisions. Has the user purchased this product before? But it doesn’t stop there.

DevOps

DevOps Traffic Efficiency Servers

Flexible, scalable, self-service Kubernetes native observability now in General Availability

Dynatrace

MAY 17, 2022

To solve this problem, Dynatrace offers a fully automated approach to infrastructure and application observability including Kubernetes control plane, deployments, pods, nodes, and a wide array of cloud-native technologies. None of this complexity is exposed to application and infrastructure teams. A look to the future.

Availability

Availability Scalability Cloud Metrics

PostgreSQL vs. Oracle: Difference in Costs, Ease of Use & Functionality

Scalegrid

JULY 13, 2020

Compare ease of use across compatibility, extensions, tuning, operating systems, languages and support providers. Recognized as the fastest growing database by popularity, PostgreSQL was named the DBMS of the year in both 2018 and 2017 by DB-Engines, and continues to grow in popularity in 2019. Compatibility. Extensions.

Open Source

Open Source Tuning C++ Database

Kubernetes vs Docker: What’s the difference?

Dynatrace

SEPTEMBER 29, 2021

Think of containers as the packaging for microservices that separate the content from its environment – the underlying operating system and infrastructure. Running containers: Docker Engine is a container runtime that runs in almost any environment: Mac and Windows PCs, Linux and Windows servers, the cloud, and on edge devices.

Open Source

Open Source Traffic DevOps Cloud

Dynatrace simplifies StatsD, Telegraf, and Prometheus observability with Davis AI

Dynatrace

OCTOBER 7, 2020

With this announcement, Dynatrace brings the value of its AI engine, the scale, security, and automation of Dynatrace OneAgent and the scale of our platform (which can handle 50,000 hosts) to open source technologies so that you get the best of both worlds. Dynatrace unlocks over 200 new technology integrations.

Open Source

Open Source Metrics Analytics Tuning

5 SRE best practices you can implement today

Dynatrace

JULY 6, 2022

Organizations everywhere are adopting site reliability engineering (SRE) to cope with the growing complexity of hybrid and cloud-native environments. These reduce the need to hire specialists while providing a unified view of processes and infrastructure that make SRE more focused and effective.

Best Practices

Best Practices Open Source Tuning Infrastructure

Snaring the Bad Folks

The Netflix TechBlog

DECEMBER 8, 2021

Project by Netflix’s Cloud Infrastructure Security team (Alex Bainbridge, Mike Grima, Nick Siow). Cloud security is a hard problem, but an even harder one is cloud security at scale. We knew that given our scale, we needed to rely heavily on automations and that we needed to build our solutions using battle-tested scalable infrastructure.

AWS

AWS Cloud Infrastructure Scalability

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

You’re half awake and wondering, “Is there really a problem or is this just an alert that needs tuning?” Over the years we’ve learned from on-call engineers about the pain points of application monitoring: too many alerts, too many dashboards to scroll through, and too much configuration and maintenance. Infrastructure change events.

Monitoring

Monitoring Tuning Traffic Metrics
