Presentation: Modern Compute Stack for Scaling Large AI/ML/LLM Workloads

InfoQ

Jules Damji discusses which infrastructure to use for distributed fine-tuning and training, how to scale ML workloads, how to accommodate large models, and how to utilize CPUs and GPUs. By Jules Damji


What is IT automation?

Dynatrace

With ever-evolving infrastructure, services, and business objectives, IT teams can’t keep up with routine tasks that require human intervention. Automating IT practices without integrated AIOps presents several challenges. By tuning workflows, you can increase their efficiency and effectiveness.

Trending Sources

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

Dynatrace

These functions are executed by a serverless platform or provider (such as AWS Lambda, Azure Functions or Google Cloud Functions) that manages the underlying infrastructure, scaling and billing. Enable faster development and deployment cycles by abstracting away the infrastructure complexity.

Best Practices in Cloud Security Monitoring

Scalegrid

This article strips away the complexities, walking you through the best practices, top tools, and strategies you’ll need for a well-defended cloud infrastructure. Cloud security monitoring is key: it identifies threats in real time and mitigates risks before they escalate.

New Prometheus-based extensions enable intelligent observability for more than 200 additional technologies

Dynatrace

Among these, you can find essential elements of application and infrastructure stacks, from app gateways (like HAProxy), through app fabric (like RabbitMQ), to databases (like MongoDB) and storage systems (like NetApp, Consul, Memcached, and InfluxDB, just to name a few). Many technologies expose their metrics in the Prometheus data format.
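To make "the Prometheus data format" concrete, here is a minimal sketch of reading the Prometheus text exposition format in Python. The sample payload, the metric names, and the `parse_metrics` helper are all illustrative assumptions, not part of any Dynatrace extension or of the Prometheus client libraries; a production scraper would more likely rely on an existing parser.

```python
import re

# Hypothetical sample payload, as a service might expose it on /metrics.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_open_fds 12
"""

# metric_name{label="value",...} value [timestamp]
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples (hypothetical helper)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified sketch cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

The sketch leaves escaped label values undecoded and discards timestamps, so it is only suitable for illustrating the line structure.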

Flexible, scalable, self-service Kubernetes native observability now in General Availability

Dynatrace

To solve this problem, Dynatrace offers a fully automated approach to infrastructure and application observability including Kubernetes control plane, deployments, pods, nodes, and a wide array of cloud-native technologies. None of this complexity is exposed to application and infrastructure teams. A look to the future.

Optimizing anomaly detection and noise

Dynatrace

During the implementation of the real-time visualization I presented in part three, I had an idea for another visualization: I wanted to visualize the number of detected problems globally, over a longer timeframe. I wanted to understand how I could tune Dynatrace’s problem detection, but to do that I needed to understand the situation first.