Engineering, Infrastructure and Systems - Technology Performance Pulse

How observability, application security, and AI enhance DevOps and platform engineering maturity

Dynatrace

APRIL 18, 2024

DevOps and platform engineering are essential disciplines that provide immense value in the realm of cloud-native technology and software delivery. Observability of applications and infrastructure serves as a critical foundation for DevOps and platform engineering, offering a comprehensive view into system performance and behavior.

DevOps

DevOps Engineering Artificial Intelligence Infrastructure

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Berg, Romain Cledat, Kayla Seeley, Shashank Srikanth, Chaoying Wang, Darin Yu. Netflix uses data science and machine learning across all facets of the company, powering a wide range of business applications from our internal infrastructure and content demand modeling to media understanding.

Systems

Systems Media Cache Open Source

Site Reliability Engineering

DZone

JANUARY 19, 2024

In the dynamic world of online services, the concept of site reliability engineering (SRE) has risen as a pivotal discipline, ensuring that large-scale systems maintain their performance and reliability.

Engineering

Engineering Tuning Software Engineering Internet

Enhancing Kubernetes cluster management key to platform engineering success

Dynatrace

MARCH 29, 2024

As organizations continue to modernize their technology stacks, many turn to Kubernetes, an open source container orchestration system for automating software deployment, scaling, and management. Five of the most common challenges include cluster instability, resource and cost management, security, observability, and stress on engineering teams.

Engineering

Engineering DevOps Operating System Open Source

Key Elements of Site Reliability Engineering (SRE)

DZone

MARCH 14, 2023

Site Reliability Engineering (SRE) is a systematic and data-driven approach to improving the reliability, scalability, and efficiency of systems. It combines principles of software engineering, operations, and quality assurance to ensure that systems meet performance goals and business objectives.

Engineering

Engineering Software Engineering Scalability Efficiency

What is chaos engineering?

Dynatrace

OCTOBER 28, 2021

Chaos engineering answers this need so organizations can deliver robust, resilient cloud-native applications that can stand up under any conditions. What is chaos engineering? Chaos engineers ask why. As chaos engineers grow confident in their testing, they change more variables and broaden the scope of the disaster.

Engineering

Engineering Entertainment Cloud Infrastructure

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

a Netflix member via Twitter. This is an example of a question our on-call engineers need to answer to help resolve a member issue, which is difficult when troubleshooting distributed systems. Now let’s look at how we designed the tracing infrastructure that powers Edgar.

Infrastructure

Infrastructure Transportation Storage Open Source

The platform engineer role: A game-changer or just hype?

Dynatrace

SEPTEMBER 21, 2023

Site reliability engineering first emerged to address cloud computing’s new performance needs. Today, the platform engineer role is gaining speed as the newest byproduct of scaling DevOps in the emerging but complex cloud-native world. Understanding the platform engineer role: DevOps is a constantly evolving discipline.

Games

Games Engineering DevOps Education

DevOps engineer tools: Deploy, test, evaluate, repeat

Dynatrace

DECEMBER 8, 2022

As cloud-native, distributed architectures proliferate, the need for DevOps technologies and DevOps platform engineers has increased as well. DevOps engineer tools can help ease the pressure as environment complexity grows. What does a DevOps platform engineer do? What are DevOps engineer tools and platforms?

DevOps

DevOps Engineering Testing Open Source

Demystifying Interviewing for Backend Engineers @ Netflix

The Netflix TechBlog

FEBRUARY 1, 2022

By Karen Casella, Director of Engineering, Access & Identity Management. Have you ever experienced one of the following scenarios while looking for your next role? Most backend engineering teams follow a process very similar to what is shown below. If so, we invite you to begin the interview process.

Engineering

Engineering Games Entertainment Innovation

Site reliability engineering: 5 things you need to know

Dynatrace

FEBRUARY 4, 2021

What is site reliability engineering? Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. SRE bridges the gap between Dev and Ops teams.

Engineering

Engineering DevOps Government Latency

Path to NoOps part 2: How infrastructure as code makes cloud automation attainable—and repeatable—at scale

Dynatrace

NOVEMBER 29, 2022

Infrastructure as code is a way to automate infrastructure provisioning and management. In this blog, I explore how Dynatrace has made cloud automation attainable—and repeatable—at scale by embracing the principles of infrastructure as code. Infrastructure-as-code. But how does it work in practice?

Infrastructure

Infrastructure Code Cloud DevOps

Vulnerability assessment: key to protecting applications and infrastructure

Dynatrace

OCTOBER 13, 2021

Protecting IT infrastructure, applications, and data requires that you understand security weaknesses attackers can exploit. Vulnerability assessment is the process of identifying, quantifying, and prioritizing the cybersecurity vulnerabilities in a given IT system. Identify vulnerabilities. Assess risk.

Infrastructure

Infrastructure Open Source Virtualization Operating System

AI-powered infrastructure monitoring for your SAP HANA database (Preview)

Dynatrace

DECEMBER 9, 2020

If you’re running SAP, you’re likely already familiar with the HANA relational database management system. However, if you’re an operations engineer who’s been tasked with migrating to HANA from a legacy database system, you’ll need to get up to speed quickly.

Infrastructure

Infrastructure Database Monitoring Metrics

Chaos Mesh — A Solution for System Resiliency on Kubernetes

DZone

APRIL 22, 2020

Traditionally we use unit tests and integration tests that guarantee a system is production-ready. To better identify system vulnerabilities and improve resilience, Netflix invented Chaos Monkey, which injects various types of faults into the infrastructure and business systems. This is how Chaos Engineering began.

Systems

Systems Infrastructure Engineering Testing

Observability engineering: Getting Prometheus metrics right for Kubernetes with Dynatrace and Kepler

Dynatrace

DECEMBER 18, 2023

For busy site reliability engineers, ensuring system reliability, scalability, and overall health is an imperative that’s getting harder to achieve in ever-expanding, cloud-native, container-based environments. Because of its adaptability, Prometheus has become an essential tool for observability engineering.

Metrics

Metrics Engineering Energy Tuning

Achieving High Availability in CI/CD With Observability

DZone

MARCH 5, 2024

Forbes estimates that cloud budgets will break all previous records as businesses will spend over $1 trillion on cloud computing infrastructure in 2024. By integrating observability tools in CI/CD pipelines, organizations can increase deployment frequency, minimize risks, and build highly available systems.

Availability

Availability DevOps Infrastructure Scalability

Data Engineers of Netflix — Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Site reliability engineering: 5 things you need to know

Dynatrace

FEBRUARY 4, 2021

Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. SRE bridges the gap between Dev and Ops teams. SRE focuses on automation.

Engineering

Engineering DevOps Government Latency

What Is Load Testing? Ensuring Robust System Performance Under Pressure

DZone

JULY 5, 2023

While load testing may sound like an esoteric domain exclusive to software engineers or network administrators, it is, in fact, a silent superhero in our increasingly digital world. It's the silent force keeping the digital infrastructure wheel rotating smoothly, even during peak usage times.

Systems

Systems Testing Software Engineering Performance

Site reliability engineering: Six SRE trends to unleash DevOps innovation

Dynatrace

JUNE 2, 2022

Site reliability engineering (SRE) continues to gain popularity as organizations embrace hybrid cloud strategies and IT automation at scale. By applying software engineering principles to operations and infrastructure practices, SRE enables organizations to streamline and automate IT processes.

DevOps

DevOps Innovation Engineering Benchmarking

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

Netflix’s engineering culture is predicated on Freedom & Responsibility, the idea that everyone (and every team) at Netflix is entrusted with a core responsibility and they are free to operate with freedom to satisfy their mission. All these micro-services are currently operated in AWS cloud infrastructure.

Infrastructure

Infrastructure Cloud Scalability AWS

Mastering Prometheus: Unlocking Actionable Insights and Enhanced Monitoring in Kubernetes Environments

DZone

FEBRUARY 15, 2024

Prometheus, a powerful open-source monitoring system, emerges as a perfect fit for this role, especially when integrated with Kubernetes. Prometheus excels at providing actionable insights into the health and performance of applications and infrastructure.

Monitoring

Monitoring Open Source Metrics Scalability

Who will watch the watchers? Extended infrastructure observability for WSO2 API Manager

Dynatrace

SEPTEMBER 18, 2020

Sure, cloud infrastructure requires comprehensive performance visibility, as Dynatrace provides, but the services that leverage cloud infrastructures also require close attention. Extend infrastructure observability to WSO2 API Manager. Cloud-based application architectures commonly leverage microservices. What’s next?

Infrastructure

Infrastructure Latency Metrics Analytics

Bring syslog into Dynatrace using OpenTelemetry to get open source value with enterprise support

Dynatrace

MARCH 15, 2024

Getting insights into the health and disruptions of your networking or infrastructure is fundamental to enterprise observability. This is needed to collect messages across your systems because many different types of devices and applications can produce logs in the syslog format.

Open Source

Open Source Infrastructure Network Government

Free Google Book: Building Secure and Reliable Systems

High Scalability

APRIL 9, 2020

Google added another book into their excellent SRE series: Building Secure and Reliable Systems. Copy/pasting a few paragraphs: "In this book we talk generally about systems, which is a conceptual way of thinking about the groups of components that cooperate to perform some function. It's free to download, so don't be shy.

Google

Google Systems Best Practices Strategy

What is predictive AI? How this data-driven technique gives foresight to IT teams

Dynatrace

SEPTEMBER 5, 2023

Technology and operations teams work to ensure that applications and digital systems work seamlessly and securely. They handle complex infrastructure, maintain service availability, and respond swiftly to incidents. Through predictive analytics, SREs and DevOps engineers can accurately forecast resource needs based on historical data.

Artificial Intelligence

Artificial Intelligence DevOps Analytics Engineering

Platform Engineering Teams Done Right…

Adrian Cockcroft

FEBRUARY 9, 2023

There are three current underlying reasons for the platform engineering meme today. The next layer is operating system platforms, what flavor of Linux, what version of Windows etc. We used this model effectively at Netflix when I was their cloud architect from 2010 through 2013.

Engineering

Engineering Serverless Lambda AWS

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

JUNE 1, 2023

This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. Replay traffic enabled us to test our new systems and algorithms at scale before launch, while also making the traffic as realistic as possible.

Traffic

Traffic Best Practices Systems Testing

Mastering Kubernetes deployments with Keptn: a comprehensive guide to enhanced visibility

Dynatrace

MARCH 20, 2024

Infrastructure health: The underlying infrastructure’s health directly impacts application availability and performance. This is the problem domain on which Keptn can help Platform Engineers provide a solution and guard rails for teams to deploy software that works perfectly.

Open Source

Open Source Hardware DevOps Infrastructure

How We Unified Configuration Distribution Across Systems at Uber

Uber Engineering

MARCH 9, 2023

Uber’s configuration platform team talks about how they consolidated the infrastructure for multiple configuration systems into a unified, next-gen distribution platform, reducing CPU usage by an order of magnitude.

Systems

Systems Infrastructure

What is container orchestration?

Dynatrace

MARCH 24, 2023

Containers enable developers to package microservices or applications with the libraries, configuration files, and dependencies needed to run on any infrastructure, regardless of the target system environment. This means organizations are increasingly using Kubernetes not just for running applications, but also as an operating system.

Infrastructure

Infrastructure Open Source Operating System Cloud

What is a Site Reliability Engineer (SRE)?

Dotcom-Monitor

OCTOBER 6, 2021

A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. The term site reliability engineering first came into existence at Google in 2003 when a site reliability team was created. At that time, the team was made up of software engineers.

Engineering

Engineering DevOps Monitoring Google

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Findings provide insights into Kubernetes practitioners’ infrastructure preferences and how they use advanced Kubernetes platform technologies. As Kubernetes adoption increases and it continues to advance technologically, Kubernetes has emerged as the “operating system” of the cloud. Kubernetes moved to the cloud in 2022.

Open Source

Open Source Java Operating System Programming

Dynatrace accelerates business transformation with new AI observability solution

Dynatrace

JANUARY 31, 2024

GenAI is prone to erratic behavior due to unforeseen data scenarios or underlying system issues. Data dependencies and framework intricacies require observing the lifecycle of an AI-powered application end to end, from infrastructure and model performance to semantic caches and workflow orchestration.

Cache

Cache Azure Infrastructure Monitoring

Build automated self-healing systems with xMatters and Dynatrace (Part 2 of 3)

Dynatrace

AUGUST 27, 2019

Step 1 – Let Dynatrace analyze your infrastructure health in real-time. The Dynatrace all-in-one software intelligence platform gives your team real-time visibility into your underlying infrastructure, be it on bare metal, VMware, OpenStack, AWS, Azure, or a hybrid solution. xMatters creates and updates Jira issues.

Systems

Systems Latency DevOps Azure

Dynatrace and Google unleash cloud-native observability for GKE Autopilot

Dynatrace

AUGUST 30, 2023

GKE Autopilot empowers organizations to invest in creating elegant digital experiences for their customers in lieu of expensive infrastructure management. These CSI pods provide a unique way of solving a handful of infrastructure problems. The CSI pod is mounted to application pods using an overlay file system.

Google

Google Cloud Innovation Infrastructure

Driving your FinOps strategy with observability best practices

Dynatrace

MARCH 18, 2024

Following FinOps practices, engineering, finance, and business teams take responsibility for their cloud usage, making data-driven spending decisions in a scalable and sustainable manner. This awareness is important when the goal is to drive cost-conscious engineering.

Best Practices

Best Practices Strategy Cloud AWS

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

Site reliability engineering (SRE) has become a critical discipline in recent years as the world has shifted in favor of web-based interactions. This shift is leading more organizations to hire site reliability engineers to guarantee the reliability and resiliency of their services. Mobile retail e-commerce spending in the U.

Best Practices

Best Practices DevOps Latency Metrics

DevOps Infrastructure as Code: An A-Z IaC Implementation Guide

Simform

JULY 28, 2022

Even if one server experienced downtime, the entire system collapsed and finding out the issue required piecing together every misstep and miscalculation. The post DevOps Infrastructure as Code: An A-Z IaC Implementation Guide appeared first on Simform - Product Engineering Company.

DevOps

DevOps Infrastructure Code Servers

Scaling Uber’s Apache Hadoop Distributed File System for Growth

Uber Engineering

APRIL 5, 2018

Three years ago, Uber Engineering adopted Hadoop as the storage (HDFS) and compute (YARN) infrastructure for our organization’s big data analysis.

Systems

Systems Big Data Storage Infrastructure

The Ultimate Guide to Three Types of Observability (Infrastructure, Data, ML)

Simform

NOVEMBER 7, 2022

Observability defines an automated and proactive approach to monitoring modern and complex IT systems. This article discusses the three types of observability - infrastructure, data, and ML for beneficiaries such as DevOps, data, and ML engineers. It covers critical components, tools, and real-time examples.

Infrastructure

Infrastructure DevOps Monitoring Engineering

SIEM Volume Spike Alerts Using ML

DZone

JANUARY 31, 2024

SIEM platforms offer centralized management of security operations, making it easier for organizations to monitor, manage, and secure their IT infrastructure. SIEM systems enable early detection of security threats and suspicious activities by analyzing vast amounts of log data in real time.

Storage

Storage Data Engineering Network Infrastructure

How to Prepare for Your DevOps Interview

DZone

SEPTEMBER 5, 2019

Over the past decade, DevOps has emerged as a new tech culture and career that marries the rapid iteration desired by software development with the rock-solid stability of the infrastructure operations team. As of August 2019, there are currently over 50,000 LinkedIn DevOps job listings in the United States alone.

DevOps

DevOps Software Engineering Infrastructure Engineering
