Cloud, Latency and Software Engineering - Technology Performance Pulse

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

How site reliability engineering affects organizations’ bottom line: SRE applies the disciplines of software engineering to infrastructure management, both on-premises and in the cloud. However, cloud complexity has made software delivery challenging.

Best Practices

Best Practices DevOps Latency Metrics

Software engineering for machine learning: a case study

The Morning Paper

JULY 7, 2019

Software engineering for machine learning: a case study, Amershi et al., ICSE’19. More specifically, we’ll be looking at the results of an internal study with over 500 participants designed to figure out how product development and software engineering are changing at Microsoft with the rise of AI and ML.

Software Engineering

Software Engineering Engineering Software Software

Application observability meets developer observability: Unlock a 360º view of your environment

Dynatrace

NOVEMBER 6, 2023

Cloud complexity and data proliferation are two of the most significant challenges that IT teams are facing today. Modern cloud complexity is becoming nearly impossible for human beings to manage without AI and automation. The challenges that developers face with modern cloud environments are myriad.

Development

Development DevOps Programming Cloud

SRE vs DevOps: What you need to know

Dynatrace

FEBRUARY 24, 2021

The events of 2020 accelerated the trend of organizations shifting to cloud-native technologies in response to the dramatic increase in demand for online services. Cloud-native environments bring speed and agility to software development and operations (DevOps) practices. Reduced latency. Dynatrace news. SRE vs DevOps?

DevOps

DevOps Software Engineering Speed Google

Site reliability engineering: 5 things you need to know

Dynatrace

FEBRUARY 4, 2021

Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. According to Google, “SRE is what you get when you treat operations as a software problem.”

Engineering

Engineering DevOps Government Latency

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

By Tomasz Bak and Fabio Kung. Introduction: Titus is the Netflix cloud container runtime that runs and manages containers at scale. In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms.

Cache

Cache Latency Traffic Systems

Designing Instagram

High Scalability

JANUARY 11, 2022

When a user requests their feed, two parallel threads fetch the feed data to optimize for latency. Fun fact: in this talk, Dikang Gu, a software engineer on the Instagram core infra team, describes how they use Cassandra to serve critical use cases and high-scalability requirements, along with some pain points.
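The two-parallel-threads idea can be sketched roughly as follows. This is a minimal illustration, not Instagram's implementation: the teaser does not say what each thread fetches, so `fetch_followed_posts`, `fetch_suggested_posts`, and `build_feed` are hypothetical names, and the split into "followed" and "suggested" posts is an assumption made only for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_followed_posts(user_id: str) -> list[str]:
    # Hypothetical data source: stands in for a real storage call.
    return [f"{user_id}:followed:1", f"{user_id}:followed:2"]


def fetch_suggested_posts(user_id: str) -> list[str]:
    # Hypothetical second data source, assumed independent of the first.
    return [f"{user_id}:suggested:1"]


def build_feed(user_id: str) -> list[str]:
    # Run both fetches on separate threads so total latency is close to
    # the slower of the two calls rather than their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        followed = pool.submit(fetch_followed_posts, user_id)
        suggested = pool.submit(fetch_suggested_posts, user_id)
        return followed.result() + suggested.result()


feed = build_feed("u42")
```

The benefit only appears when the two fetches are I/O-bound and independent of each other; the ordering of the merged feed here is simply concatenation.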

Design

Design Media Storage Logistics

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

While infrastructure has historically been treated as a bottleneck where proper scaling and compute power are applied to improve performance, these aspects are now typically addressed by hyperscalers that offer cloud-based infrastructure and infrastructure as a service.

Best Practices

Best Practices Code Infrastructure Latency

DevOps observability: A guide for DevOps and DevSecOps teams

Dynatrace

JANUARY 18, 2023

Site reliability engineering (SRE) is a software operations methodology that enables organizations to create highly reliable and scalable applications. SRE applies software engineering principles to operations and infrastructure processes. Site reliability engineers, or SREs, lead these efforts.

DevOps

DevOps Best Practices Innovation Strategy

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 202 A day in the life of a Netflix Engineer. Dave Hahn, SRE Engineering Manager. Abstract: Netflix is a large, ever-changing ecosystem serving millions of customers across the globe through cloud-based systems and a globally distributed CDN. Wednesday - December Thursday - December

AWS

AWS Entertainment Open Source Benchmarking

Edge Authentication and Token-Agnostic Identity Propagation

The Netflix TechBlog

FEBRUARY 9, 2021

At a high level, Zuul (cloud gateway) was to become the termination point for token inspection and payload encryption/decryption. And, we’re hiring Senior Software Engineers! The following examples of these gains are from the primary API service. Reach out on LinkedIn if you are interested.

Architecture

Architecture Latency Servers Website

Evolution of ML Fact Store

The Netflix TechBlog

APRIL 26, 2022

We use Keystone as it is easy to use, reliable, scalable, and provides aggregation of facts from different cloud regions into a single AWS region. While “keep the design simple” is a frequently shared learning in software engineering, it is not always easy to achieve.

Storage

Storage Design Scalability Latency

O’Reilly serverless survey 2019: Concerns, what works, and what to expect

O'Reilly

NOVEMBER 12, 2019

More than a fifth of the respondents work in the software industry—skewing results toward the concerns of software companies, and helping explain the preponderance of those with software engineering roles. As noted earlier, the majority of survey respondents are software engineers. 1 in tools used.

Serverless

Serverless Architecture FinTech Infrastructure

Microservices – What CSPs can Learn From IT

VoltDB

DECEMBER 8, 2017

As vendors and CSPs are faced with building these virtualized systems, it’s imperative to look at the software engineering methodologies that the IT industry has successfully applied to challenges at comparable scale. Is there an initiative to define a consensus on what “cloud native” means when evaluating virtual network functions?

Latency

Latency Virtualization Cloud Software Engineering

Open Source at AWS re:Invent

Adrian Cockcroft

NOVEMBER 18, 2019

OPN220 Build robotic cloud simulations with ROS and AWS RoboMaker Join Camilo Buscaron, AWS Principal Open Source Technologist, and Katherine Scott, Developer Advocate, Open Robotics in this workshop to use Gazebo, a 3D simulator, and Robot Operating System (ROS) on AWS RoboMaker and learn how to spin up robotic simulations.

Open Source

Open Source AWS Lambda Serverless

Millions of tiny databases

The Morning Paper

MARCH 3, 2020

The core algorithms (chain-replication, Paxos-based consensus) aren’t the stars of the show here, instead the paper focuses on how these algorithms are deployed, and the software engineering practices behind the creation of a mission-critical production system employing them. A guiding principle. Cells have seven nodes.

Database

Database AWS Network Design

Automating chaos experiments in production

The Morning Paper

JULY 4, 2019

Netflix’s system is deployed on the public cloud as a complex set of interacting microservices. degraded hardware, transient networking problem) or, more often, because of some change deployed by Netflix engineers that did not have the intended effect. On error rates.

Latency

Latency Engineering Metrics Traffic

Curbing Connection Churn in Zuul

The Netflix TechBlog

AUGUST 16, 2023

System Metrics Given the significant reduction in connections, we saw reduced CPU utilization (~4%), heap usage (~15%), and latency (~3%) on Zuul, as well. In this case, we went from a subset size of 100 for 400 servers (a division of 4) to 50 (a division of 8).
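The subset arithmetic quoted above can be checked with a short sketch. It assumes, as the teaser's wording suggests, that the subset size is the server count divided by a divisor; `subset_size` is a hypothetical helper for illustration and not Zuul's actual subsetting logic.

```python
def subset_size(total_servers: int, divisor: int) -> int:
    # Assumed relationship: each client connects to total / divisor servers.
    return total_servers // divisor


before = subset_size(400, 4)  # subset size with a division of 4
after = subset_size(400, 8)   # subset size with a division of 8

# Fraction by which the per-client subset shrinks.
reduction = 1 - after / before
```

Under this assumption, doubling the divisor halves the number of origin servers each client holds connections to.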

Traffic

Traffic Servers Google Metrics

Reverb: speculative debugging for web applications

The Morning Paper

JANUARY 26, 2020

This week we’ll be looking at a selection of papers from the 2019 edition of the ACM Symposium on Cloud Computing (SoCC). Reverb: speculative debugging for web applications, Netravali & Mickens, SoCC’19. candidate bug-fixes) during replay.

Programming

Programming Servers Network Latency

Content Management Systems of the Future: Headless, JAMstack, ADN and Functions at the Edge

Abhishek Tiwari

NOVEMBER 3, 2018

Although some vendors have added support for APIs and cloud services, most have not even bothered to adapt to the changing technology landscape. In addition, traditional CMS solutions lack integration with the modern software stack, cloud services, and software delivery pipelines.

Systems

Systems Cache Website Network

Communal Computing’s Many Problems

O'Reilly

JULY 20, 2021

While techniques like federated learning are on the horizon, to avoid latency issues and mass data collection, it remains to be seen whether those techniques are satisfactory for companies that collect data. Until we acknowledge that hardware put in a home is different from a cloud service, we will never get it right.

Google

Google Games Technology Technology
