Infrastructure, Presentation and Tuning - Technology Performance Pulse

Presentation: Modern Compute Stack for Scaling Large AI/ML/LLM Workloads

InfoQ

MAY 8, 2024

Jules Damji discusses which infrastructure to use for distributed fine-tuning and training, how to scale ML workloads, how to accommodate large models, and how CPUs and GPUs can be utilized. By Jules Damji

Tuning

Tuning Infrastructure Artificial Intelligence Data Engineering

What is IT automation?

Dynatrace

JULY 6, 2022

With ever-evolving infrastructure, services, and business objectives, IT teams can’t keep up with routine tasks that require human intervention. Automating IT practices without integrated AIOps presents several challenges. By tuning workflows, you can increase their efficiency and effectiveness.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

Dynatrace

APRIL 25, 2023

These functions are executed by a serverless platform or provider (such as AWS Lambda, Azure Functions or Google Cloud Functions) that manages the underlying infrastructure, scaling and billing. Enable faster development and deployment cycles by abstracting away the infrastructure complexity.

Serverless

Serverless Lambda Azure AWS

Best Practices in Cloud Security Monitoring

Scalegrid

JANUARY 11, 2024

This article strips away the complexities, walking you through best practices, top tools, and strategies you’ll need for a well-defended cloud infrastructure. Cloud security monitoring is key—identifying threats in real-time and mitigating risks before they escalate.

Best Practices

Best Practices Cloud Monitoring Strategy

New Prometheus-based extensions enable intelligent observability for more than 200 additional technologies

Dynatrace

FEBRUARY 14, 2022

Among these, you can find essential elements of application and infrastructure stacks, from app gateways (like HAProxy), through app fabric (like RabbitMQ), to databases (like MongoDB) and storage systems (like NetApp, Consul, Memcached, and InfluxDB, just to name a few). Many technologies expose their metrics in the Prometheus data format.
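As a rough illustration of what "the Prometheus data format" looks like to a consumer, the sketch below parses a few lines of the text exposition format into (name, labels, value) tuples. The function name `parse_samples` and the sample metrics are hypothetical, and the parser is deliberately minimal: it skips comment lines, ignores optional timestamps, and does not unescape label values.

```python
import re

# Matches simple sample lines such as: name{label="value"} 12.5
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair inside the braces: key="value"
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and "# HELP" / "# TYPE" comment lines carry no samples.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # Not a sample line this sketch understands.
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output, for illustration only.
EXAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
queue_depth 3
"""

if __name__ == "__main__":
    for sample in parse_samples(EXAMPLE):
        print(sample)
```

A production consumer would normally rely on an existing client library rather than hand-rolled regular expressions; the point here is only the shape of the data.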

Technology

Technology Technology Metrics Infrastructure

Flexible, scalable, self-service Kubernetes native observability now in General Availability

Dynatrace

MAY 17, 2022

To solve this problem, Dynatrace offers a fully automated approach to infrastructure and application observability including Kubernetes control plane, deployments, pods, nodes, and a wide array of cloud-native technologies. None of this complexity is exposed to application and infrastructure teams. A look to the future.

Availability

Availability Scalability Cloud Metrics

Optimizing anomaly detection and noise

Dynatrace

MARCH 11, 2021

During the implementation of the real-time visualization I presented in part three, I had an idea for another visualization; I wanted to visualize the number of detected problems globally, for a longer timeframe. I wanted to understand how I could tune Dynatrace’s problem detection, but to do that I needed to understand the situation first.

Tuning

Tuning Architecture Monitoring Big Data

DevSecOps: Recent experiences in field of Federal & Government

Dynatrace

MAY 15, 2020

Overcoming the barriers presented by legacy security practices, which are typically manually intensive and slow, requires a DevSecOps mindset where security is architected and planned from project conception and automated for speed and scale throughout where possible. And this poses a significant risk.

Government

Government DevOps Infrastructure Network

Keep track of thousands of environments from 20,000 feet

Dynatrace

SEPTEMBER 28, 2020

I wanted to present as much information as possible. This is where the consolidated API, which I presented in my last post, comes into play. Stay tuned for my next part of this series where I will cover another visualization and how it helped me optimize the Dynatrace Anomaly Detection settings and our operations processes!

Storage

Storage Architecture Tuning Efficiency

Running the OpenTelemetry demo application with Dynatrace

Dynatrace

OCTOBER 6, 2022

Both methods ingest data, but by using the Dynatrace OneAgent, users can automatically discover additional insights about their infrastructure, applications, processes, services and databases. Support for metrics will be extended in the near future, so stay tuned.

Open Source

Open Source Metrics Tuning Technology

Improving our video encodes for legacy devices

The Netflix TechBlog

AUGUST 10, 2020

Continuing to innovate on this family has tremendous advantages across the whole delivery infrastructure: reducing footprint at our Content Delivery Network (CDN), Open Connect (OC), the load on our partner ISPs’ networks and the bandwidth usage for our members. Further tuning of pre-defined encoding parameters.

Innovation

Innovation Traffic Network Efficiency

No need to compromise visibility in public clouds with new Azure services supported by Dynatrace (Part 2)

Dynatrace

AUGUST 28, 2020

The other perspective that’s presented on the Azure Automation dashboard is the state of your deployment runs. As with any integration service, there are many moving parts, which increases the probability of failed runs caused by infrastructure problems, data not arriving on time, or code issues in your pipelines. What’s next?

Azure

Azure Cloud Tuning Monitoring

MySQL Interview Questions: Wrong Answers Only

Percona

DECEMBER 6, 2023

Additionally, read Mike’s blog on How to Find and Tune a Slow SQL Query. Q: What is your disaster recovery (DR) strategy? That said, if the delayed replica is hosted on the same infrastructure/data center, it is vulnerable to the same disaster affecting the primary. A: We have a replica under our primary database.

Strategy

Strategy Database Best Practices Tuning

A look behind the scenes of AWS Lambda and our new Lambda monitoring extension

Dynatrace

FEBRUARY 25, 2021

At AWS re:Invent in 2018, the Lambda team presented an excellent talk. Distributing accounts across the infrastructure is an architectural decision, as a given account often has similar usage patterns, languages, and sizes for their Lambda functions. Stay tuned for A look under the hood of AWS Lambda.

Lambda

Lambda AWS Monitoring Serverless

Easy SLA and SLO reporting for all your API endpoints with public synthetic HTTP monitors

Dynatrace

JUNE 26, 2020

And, as before, you can always use private Synthetic locations that are located within your network infrastructure to measure complex internal applications and APIs. Additionally, you can also query all metrics captured by an HTTP monitor to build reports and present data to various stakeholders. So stay tuned!

Monitoring

Monitoring Azure AWS Traffic

Sustainability Talks and Updates from AWS re:Invent 2023

Adrian Cockcroft

NOVEMBER 26, 2023

SUS101: Sustainability innovation in AWS Global Infrastructure. AWS is determined to make the cloud the cleanest and most energy-efficient way to run customers’ infrastructure and business. This includes providing the efficient, resilient services AWS customers expect, while minimizing their environmental footprint.

AWS

AWS Energy IoT Best Practices

Meet Hydrogen: A React Framework For Dynamic, Contextual And Personalized E-Commerce

Smashing Magazine

NOVEMBER 8, 2021

Of course, Hydrogen also comes with a set of pre-built and optimized components that know how to speak to the Shopify Storefront API and allow you to focus on presentation — the differentiated merchant value — instead of plumbing. Stay tuned for more in 2022! Curious to give it a try?

Cache

Cache Best Practices Strategy Servers

OneAgent for Windows—Enhancements to *.msi-based deployment

Dynatrace

MAY 9, 2019

Some time ago, we decided to take a stab at a number of architectural challenges present in the OneAgent installer for Windows. This storage space was consumed not only on our own infrastructure but also on each of the Dynatrace cluster nodes in the case of Managed deployments. “So what did you change, exactly?”

Storage

Storage Tuning Traffic Architecture

ML Platform Meetup: Infra for Contextual Bandits and Reinforcement Learning

The Netflix TechBlog

OCTOBER 18, 2019

Faisal Siddiqi: Infrastructure for Contextual Bandits and Reinforcement Learning. As with other traditional machine learning and deep learning paths, a lot of what the core algorithms can do depends upon the support they get from the surrounding infrastructure and the tooling that the ML platform provides.

Infrastructure

Infrastructure Metrics Architecture Efficiency

Accelerate Machine Learning with Amazon SageMaker

All Things Distributed

NOVEMBER 29, 2017

As there are few individuals with this expertise, an easier process presents a significant opportunity for companies who want to accelerate their ML usage. After this, there is often a long process of training that includes tuning the knobs and levers, called hyperparameters, that control the different aspects of the training algorithm.

Tuning

Tuning AWS Scalability Infrastructure

Data validation for machine learning

The Morning Paper

JUNE 4, 2019

The pipeline ingests the training data, validates it, sends it to a training algorithm to generate a model, and then pushes the trained model to a serving infrastructure for inference. Scoring/serving skew occurs when the way results are presented to the user can feed back into the training data.

Google

Google Code Infrastructure Best Practices

MongoDB Best Practices: Security, Data Modeling, & Schema Design

Percona

APRIL 17, 2023

The main objective of this post is to share my experience over the past years tuning MongoDB and to gather the diverse sources I came across along the way in a single place.
$ systemctl stop tuned
$ systemctl disable tuned
Dirty ratio: The dirty_ratio is the percentage of total system memory that can hold dirty pages.

Best Practices

Best Practices Design Tuning Database

PMM Access Control: A Comprehensive Guide With Use Cases and Examples

Percona

FEBRUARY 24, 2023

Your company operates a massive distributed architecture that spans multiple data centers and cloud providers, and your databases are a crucial part of this infrastructure. It’s currently part of our roadmap, so please stay tuned for upcoming releases of PMM and keep an eye on the release notes.

Metrics

Metrics Monitoring Database Engineering

Software engineering for machine learning: a case study

The Morning Paper

JULY 7, 2019

ML-centric software also sees frequent revisions initiated by model changes, parameter tuning, and data updates, the combination of which can have a significant impact on system performance. In large scale systems with more than a single model, each model’s results will affect one another’s training and tuning processes.

Software Engineering

Software Engineering Engineering Software Software

From Proprietary to Open Source: The Complete Guide to Database Migration

Percona

OCTOBER 18, 2023

Look closely at your current infrastructure (hardware, storage, networks, etc.). This is where you will fine-tune authentication mechanisms, storage paths, security policies, and memory allocation settings to optimize them for your specific use case(s). Should I be bringing in external experts to help out?

Open Source

Open Source Database Hardware Strategy

Dynatrace Application Security protects your applications in complex cloud environments

Dynatrace

DECEMBER 8, 2020

More modern tools can provide runtime insights into certain platforms, like Kubernetes or containers, but are still limited in their ability to detect which libraries are actually used vs. those that are present, but unused. Stay tuned – this is only the start. They also can’t provide deep insights unless you have source code access.

Cloud

Cloud Open Source Internet Internet

Towards a Reliable Device Management Platform

The Netflix TechBlog

AUGUST 30, 2021

The Partner Infrastructure team at Netflix provides solutions to support these two significant efforts by enabling device management at scale. Together, they form the Device Management Platform, which is the infrastructural foundation for Netflix Test Studio (NTS).

Latency

Latency Traffic Transportation Hardware

What I learned at GlueCon 2023 - Tipping Points and Generative AI

Adrian Cockcroft

JUNE 2, 2023

I’ve presented at GlueCon many times over the last decade or so. Rob Hirschfeld of RackN had this perspective on the impact of AI on his domain of infrastructure automation.

Social Media

Social Media Innovation Open Source Entertainment

5 tips for architecting fast data applications

O'Reilly Software

APRIL 4, 2018

Fast forward to the present day and we find ourselves in a world where the number of connected devices is constantly increasing. The data shape will dictate capacity planning, tuning of the backbone, and scalability analysis for individual components. What message process warranty level do we require? At least once? At most once?

Architecture

Architecture Scalability Google Operating System

Is MongoDB Open Source? Is Planet Earth Flat?

Percona

APRIL 12, 2023

So if they can’t beat ‘em in the DBaaS space, they often feel like they have to join ‘em — to the tune of total stack sharing or some proprietary arrangement. More detailed answers are presented in the article above. The effects have hit cloud vendors who can’t possibly compete with MongoDB.

Open Source

Open Source Programming Database Servers

Understanding Execution Plan Operator Timings

SQL Performance

MARCH 8, 2021

This can make it difficult to draw sound performance-tuning conclusions. More importantly for the present discussion, the row mode operator elapsed and CPU times represent the time used by the current operator and all its children. Multiple separate adjustments may be present in a single execution plan. CQScanProfileNew.
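To make the cumulative-timing point concrete, here is a minimal sketch of how one might estimate an operator's own elapsed time when each row mode operator reports a figure that includes all of its children. The `Operator` class, the operator names, and the millisecond values are hypothetical, and the subtraction assumes a simple serial plan; it does not model the separate adjustments the article says may be present in a real execution plan.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """A plan operator with a cumulative (self + children) elapsed time."""
    name: str
    elapsed_ms: int
    children: list = field(default_factory=list)


def self_elapsed_ms(op):
    """Estimate time spent in the operator itself.

    Assumes a serial row mode plan where the reported elapsed time
    includes every child, so the children are subtracted out.
    """
    return op.elapsed_ms - sum(child.elapsed_ms for child in op.children)


if __name__ == "__main__":
    # Hypothetical plan: a Sort reading from a Clustered Index Scan.
    scan = Operator("Clustered Index Scan", elapsed_ms=120)
    sort = Operator("Sort", elapsed_ms=200, children=[scan])
    for op in (sort, scan):
        print(op.name, self_elapsed_ms(op))
```

Reading the raw 200 ms on the Sort as its own cost would overstate it; under these assumptions most of that time belongs to the scan beneath it.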

Servers

Servers Tuning Architecture Processing

Inclusion, Diversity, Equity and Awareness at Tasktop

Tasktop

OCTOBER 23, 2020

We can talk and share our thoughts and interests through multiple channels including Slack, the Tasktop Forum (an initiative where Tasktopians present on topics of interest inside and outside the company) and participate in open office hours with executive managers. Learn more about these roles and our company culture on our careers page.

Tuning

Tuning Engineering Programming Infrastructure

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

Some topics are still present at LISA, such as network management and uptime (reliability), but many others have been updated over the years. Have something to say on the present & future of #ops? We hope you'll consider submitting a talk or tutorial, and plan to attend LISA 2018. Hope to see you in Nashville!

DevOps

DevOps Network Best Practices Programming

World’s Top Web Performance Leaders To Watch

Rigor

SEPTEMBER 11, 2019

Presented here alphabetically by last name: Rachel Andrew. You’ll find his articles, initiatives, and speaking engagements on his website, and you simply must read his writing about web optimization for Google and watch nearly 50 of his presentations on YouTube. Here is our (ever-growing!) Rachel Andrew. Jason Grigsby.

Performance

Performance Education Google Website

A case for managed and model-less inference serving

The Morning Paper

JUNE 13, 2019

HotOS’19 is presenting me with something of a problem as there are so many interesting looking papers in the proceedings this year it’s going to be hard to cover them all! We’d like to pack models as efficiently as possible on the underlying infrastructure. A case for managed and model-less inference serving Yadwadkar et al.,

Hardware

Hardware Latency Serverless Energy

23 Useful PHP Tools for the Everyday Web Developer

KeyCDN

OCTOBER 31, 2018

Eclipse is open source and has all of the features you’d expect out of an IDE such as PHP profiling, syntax highlighting and unit testing; however, it requires a lot of resources to run, which could present an issue for smaller development teams. ScriptCase With ScriptCase you can create web-enabled applications in a matter of minutes.

Development

Development Open Source Java AWS

Execution Plan Impact on ASYNC_NETWORK_IO Waits – Part 1

SQL Performance

MARCH 5, 2020

To summarize: Focusing on ASYNC_NETWORK_IO waits alone as a tuning metric is a mistake. Since I use SQLCMD a lot for demos while presenting, I created a testscript.sql file with the following contents: PRINT 'Minimize Screen'; GO. (Also see Greg's recent post about focusing on waits alone in general.)

Education

Education Servers Testing Tuning

10 Steps to Prepare Your Website for High-Load Days: Are You Ready for Black Friday?

Rigor

SEPTEMBER 3, 2019

Remember that just because your servers are active doesn’t mean your site is being delivered quickly to your users or even presenting back the information or providing the functionality necessary for them to complete their task. STEP 7: Tune Your CDN Performance.

Website

Website Cache Ecommerce Traffic

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

Reloaded was created as a single monolithic system, where developers from various media teams in ET and our platform partner team Content Infrastructure and Solutions (CIS)¹ worked on the same codebase, building a single system that handled all media assets. The service also provides options that allow fine-tuning latency, throughput, etc.,

Processing

Processing Media Latency Innovation

How Netflix Content Engineering makes a federated graph searchable

The Netflix TechBlog

APRIL 12, 2022

Using the type information present in the GraphQL query template and the user specified index configuration we were able to create an index template with a set of custom Elasticsearch text analyzers that generalized well across domains. Reverse lookups You may have noticed something missing in the above explanation.

Engineering

Engineering Architecture Java Infrastructure

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

For how our machine learning recommendation systems leverage our key-value stores, please see more details on this presentation. Bulldozer abstracts the underlying infrastructure on how the data moves. Please share your thoughts and experience by posting your comments below and stay tuned for more on data movement work at Netflix.

Latency

Latency Storage Big Data Tuning

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. Orient: Gather tuning parameters for a particular table that changed. AutoAnalyze In short, AutoAnalyze finds the best tuning/configuration parameters for a table.

Storage

Storage Latency Efficiency Data Engineering
