2023

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience.

Traffic 350

Implementing a Self-Healing Infrastructure With Kubernetes and Prometheus

DZone

In today's world, the need for highly available and fault-tolerant systems is more important than ever. Furthermore, with the increased adoption of microservices and containerization, the need for a reliable infrastructure that can automatically detect and recover from failures has become critical. Kubernetes, an open-source container orchestration platform, and Prometheus, a popular monitoring and alerting toolkit, are two tools that can be used to implement such a self-healing infrastructure.

Kubernetes in the wild report 2023

Dynatrace

Kubernetes adoption survey executive summary. Modern, cloud-native computing is impossible to separate from containers and Kubernetes adoption. While Kubernetes is still a relatively young technology, a large majority of global enterprises use it to run business-critical applications in production. The rapid adoption is driven—and challenged by—an ever-growing ecosystem of Kubernetes technologies that add advanced platform features, such as security, microservice communications, observability, s

Java 343

Consistent hashing algorithm

High Scalability

This is a guest article by NK. You can view the original article Consistent hashing explained on systemdesign.one website. How does consistent hashing work?

How AI coding companions will change the way developers work

All Things Distributed

Developer tools are one area where generative AI is already having a tangible impact on productivity and speed, and it's the reason I'm excited about Amazon CodeWhisperer.

So many bad takes - What is there to learn from the Prime Video microservices to monolith story

Adrian Cockcroft

Excerpt from Serverless First deck, first published in 2019. The Prime Video team published this story: Scaling up the audio/video monitoring service and reducing costs by 90%, and the internet piled in with opinions and bad takes, mostly missing the point.

Understanding Linux IOWait

Percona

I have seen many Linux performance engineers treat the “IOWait” portion of CPU usage as an indicator of whether the system is I/O-bound. In this blog post, I will explain why this approach is unreliable and what better indicators you can use. Let’s start by running a little experiment – generating heavy I/O usage on the system:
sysbench --threads=8 --time=0 --max-requests=0 fileio --file-num=1 --file-total-size=10G --file-io-mode=sync --file-extra-flags=direct

Cache 143

More Trending

Article: Design Pattern Proposal for Autoscaling Stateful Systems

InfoQ

In this article, Rogerio Robetti discusses the challenges in auto-scaling stateful storage systems and proposes an opinionated design solution to automatically scale up (vertical) and scale out (horizontal) from a single node up to several nodes in a cluster with minimum configuration and interference of the operator.

Systems 145

Writing a tiny tRPC client

tRPC

Ever wondered how tRPC works? Maybe you want to start contributing to the project but you're frightened by the internals? The aim of this post is to familiarize you with the internals of tRPC by writing a minimal client that covers the big parts of how tRPC works. Info: It's recommended that you understand some of the core concepts in TypeScript such as generics, conditional types, the extends keyword and recursion.

Servers 138

Why you need to know your site's performance poverty line (and how to find it)

Speed Curve

"I made my pages faster, but my business and user engagement metrics didn't change. WHY???" "How do I know how fast my pages should be?" "How can I demonstrate the business value of performance to people in my organization?" If you've ever asked yourself any of these questions, then you could find the answers in identifying and understanding the performance poverty line for your site.

Scaling Media Machine Learning at Netflix

The Netflix TechBlog

By Gustavo Carmo, Elliot Chow, Nagendra Kamath, Akshay Modi, Jason Ge, Wenbing Bai, Jackson de Campos, Lingyi Liu, Pablo Delgado, Meenakshi Jindal, Boris Chen, Vi Iyengar, Kelli Griggs, Amir Ziai, Prasanna Padmanabhan, and Hossein Taghavi. Figure 1 - Media Machine Learning Infrastructure. Introduction: In 2007, Netflix started offering streaming alongside its DVD shipping services.

Media 299

How to Handle Secrets in Kubernetes

DZone

Kubernetes has become the de facto standard for container orchestration, enabling organizations to build, deploy, and scale modern applications with efficiency and agility. As more organizations adopt Kubernetes, the need for proper security and management of sensitive data within these environments becomes paramount. One crucial aspect of ensuring a secure Kubernetes infrastructure is the effective management of secrets, such as API keys, passwords, and tokens.

Dynatrace supports Amazon Linux 2023 as an AWS launch partner

Dynatrace

Dynatrace is proud to be an AWS launch partner in support of Amazon Linux 2023 (AL2023). Amazon’s new general-purpose Linux for AWS is designed to provide a secure, stable, and high-performance execution environment to develop and run cloud applications. The Dynatrace Software Intelligence Platform accelerates cloud operations, helping organizations achieve service-level objectives (SLOs) with automated intelligence and unmatched scalability.

AWS 282

C++23 “Pandemic Edition” is complete (Trip report: Winter ISO C++ standards meeting, Issaquah, WA, USA)

Sutter's Mill

On Saturday, the ISO C++ committee completed technical work on C++23 in Issaquah, WA, USA! We resolved the remaining international comments on the C++23 draft, and are now producing the final document to be sent out for its international approval ballot (Draft International Standard, or DIS) and final editorial work, to be published later in 2023. Our hosts, the Standard C++ Foundation, WorldQuant, and Edison Design Group, arranged for high-quality facilities for our six-day meeting from Monday

C++ 125

An introduction to generative AI with Swami Sivasubramanian

All Things Distributed

The VP of database, analytics and machine learning services at AWS, Swami Sivasubramanian, walks me through the broad landscape of generative AI, what we’re doing at Amazon to make large language and foundation models more accessible, and how custom silicon can help to bring down costs, speed up training, and increase energy efficiency for our customers.

AWS 163

Automating the Automators: Shift Change in the Robot Factory

O'Reilly

What would you say is the job of a software developer? A layperson, an entry-level developer, or even someone who hires developers will tell you that job is to … well … write software. Pretty simple. An experienced practitioner will tell you something very different. They’d say that the job involves writing some software, sure. But deep down it’s about the purpose of software.

Code 123

The Most Important MySQL Setting

Percona

If we were to select the most important MySQL setting, if we were given a freshly installed MySQL or Percona Server for MySQL and could only tune a single MySQL variable, which one would it be? It has always bothered me that “out-of-the-box” MySQL performance is subpar: if you install MySQL or Percona Server for MySQL in a new server and do not “tune it” (as in change default values for configuration settings), it just won’t be able to make the best use of the serve

Cache 135

The Market for Lemons

Alex Russell

For most of the past decade, I have spent a considerable fraction of my professional life consulting with teams building on the web. It is not going well. Not only are new services being built to a self-defeatingly low UX and performance standard, existing experiences are pervasively re-developed on unspeakably slow, JS-taxed stacks. At a business level, this is a disaster, raising the question: "why are new teams buying into stacks that have failed so often before?

Latency 121

Article: Magic Pocket: Dropbox’s Exabyte-Scale Blob Storage System

InfoQ

A horizontally scalable exabyte-scale blob storage system which operates out of multiple regions, Magic Pocket is used to store all of Dropbox’s data. Adopting SMR technology and erasure codes, the system has extremely high durability guarantees but is cheaper than operating in the cloud.

Storage 124

Web Development Trends in 2023

KeyCDN

Smart developers are always looking ahead for ways to adapt to the ever-changing world of web development. Twenty years ago, no one could have imagined what the web would look like today, so who knows what the coming decades will hold? As trends emerge, new opportunities will arise. Staying on top of the latest web development trends could eventually help you land a job that doesn't exist yet.

Some Notable Bugfixes in MySQL 8.0.32

Percona Community

MySQL 8.0.32 came out recently and had some important bugfixes contributed by Perconians. Here is a brief overview of the work done. Inconsistent data and GTIDs with mysqldump: Marcelo Altmann (Senior Software Engineer) fixed a bug where data and GTIDs backed up by mysqldump were inconsistent. It happened when the options --single-transaction and --set-gtid-purged=ON were both used because GTIDs on the server could have already increased between the start of the transaction by mysqldum

NTS: Reliable Device Testing at Scale

The Netflix TechBlog

By Benson Ma and ZZ Zimmerman, with contributions from Alok Ahuja, Shravan Heroor, Michael Krasnow, Todor Minchev, and Inder Singh. Introduction: At Netflix, we test hundreds of different device types every day, ranging from streaming sticks to smart TVs, to ensure that new version releases of the Netflix SDK continue to provide the exceptional Netflix experience that our customers expect.

Testing 294

How To Collect and Ship Windows Events Logs With OpenTelemetry

DZone

If you use Windows, you will want to monitor Windows Events. A recent contribution of a distribution of the OpenTelemetry (OTel) Collector makes it much easier to monitor Windows Events with OTel. You can use this receiver with any OTel collector, including the OpenTelemetry Collector. In this article, we will be using observIQ’s distribution of the collector.

Dynatrace simplifies OpenTelemetry metric collection for context-aware AI analytics

Dynatrace

The release candidate of OpenTelemetry metrics was announced earlier this year at KubeCon in Valencia, Spain. Since then, organizations have embraced OTLP as an all-in-one protocol for observability signals, including metrics, traces, and logs, which will also gain Dynatrace support in early 2023. Realizing the promise of OpenTelemetry is a challenge for most organizations.

Metrics 281

Using anti-requirements to find system boundaries

Particular Software

We all love building greenfield projects. But inevitably, starting a new project involves lots of meetings with business stakeholders to hash out initial requirements and canonical data models. Those are…not so fun. When one of those meetings occurs after a carb-heavy lunch, it’s easy for your mind to drift away…back to those university lectures about entity design.

Systems 98

Demystifying LLMs with Amazon distinguished scientists

All Things Distributed

To learn more about large language models (LLMs), foundation models, and other advances in ML, I sat with two of Amazon’s distinguished scientists, Sudipta Sengupta and Dan Roth.

141

No Start Menu for You

Random ASCII

I tend to launch most programs on my Windows 10 laptop by typing the <Win> key, then a few letters of the program name, and then hitting enter. On my powerful laptop (SSD and 32 GB of RAM) this process usually takes as long as it takes me to type these characters, just a fraction of a second. Usually. Sometimes, however, it takes longer. A lot longer.

PostgreSQL Indexes Can Hurt You: Negative Effects and the Costs Involved

Percona

Indexes are generally considered to be the panacea when it comes to SQL performance tuning, and PostgreSQL supports different types of indexes catering to different use cases. I keep seeing many articles and talks on “tuning” discussing how creating new indexes speeds up SQL but rarely ones discussing removing them. The urge to create more and more indexes is found to be causing severe damage in many systems.

Cache 127

Real World Programming with ChatGPT

O'Reilly

This post is a brief commentary on Martin Fowler’s post, An Example of LLM Prompting for Programming. If all I do is get you to read that post, I’ve done my job. So go ahead–click the link, and come back here if you want. There’s a lot of excitement about how the GPT models and their successors will change programming. That excitement is merited. But what’s also clear is that the process of programming doesn’t become “ChatGPT, please build me an enterprise application to sell shoes.

Real-Time Messaging Architecture at Slack

InfoQ

Slack recently described how it sends millions of messages daily in real-time across the globe. The company provides a comprehensive insight into its architecture, designed to manage real-time messages at scale. It highlights the unique challenges posed by delivering real-time messages across different time zones and regions and how Slack's engineers designed the infrastructure to handle them.

The Performance Golden Rule Revisited

Tim Kadlec

There was a comment on Twitter today from Rafael Gonzaga expressing disappointment in what he sees as a tendency to focus on the frontend solely in performance discussions, while neglecting the server-side aspect. In the discussion that followed, the Golden Rule of Performance (popularized by Steve Souders) was brought up: 80-90% of the end-user response time is spent on the frontend.

cppfront: Spring update

Sutter's Mill

Since the year-end mini-update, progress has continued on cppfront. (If you don’t know what this personal project is, please see the CppCon 2022 talk on YouTube.) This update covers Acknowledgments, and highlights of what’s new in the compiler and language since last time, including: simple, mathematically safe, and efficient chained comparisons; named break and continue; “simple and safe” starts with. main; user-defined type, including unifying all special member functions as o

C++ 106

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

By Ruchir Jha, Brian Harrington, and Yingwu Zhao. TL;DR: Streaming alert evaluation scales much better than the traditional approach of polling time-series databases. It allows us to overcome high dimensionality/cardinality limitations of the time-series database. It opens doors to support more exciting use-cases. Engineers want their alerting system to be realtime, reliable, and actionable.

Cache 253

5 DNS Troubleshooting Tips for Network Teams

DZone

“Set it and forget it” is the approach that most network teams follow with their authoritative Domain Name System (DNS). If the system is working and end-users find network connections to revenue-generating applications, services, and content, then administrators will generally say that you shouldn’t mess with success. Unfortunately, the reliability of DNS often causes us to take it for granted.

Network 342

Perform 2023 Guide: Organizations mine efficiencies with automation, causal AI

Dynatrace

Digital transformation shows no signs of slowing down. As a result, the complexity of modern multicloud ecosystems continues to increase. Data proliferation—as well as a growing need for data analysis—has accelerated. Increasingly, organizations are turning to modern observability platforms to address the complexity of, and gain visibility into, cloud environments.

Disambiguating Arm, Arm ARM, Armv9, ARM9, ARM64, Aarch64, A64, A78,

Nick Desaulniers

If you’re new to the Arm ecosystem, consider this a quick primer on terms you likely have seen before but might have questions about. The Arm architecture is a family of Reduced Instruction Set Architectures (RISC) with simple addressing modes. Data processing is done on register operands otherwise relying on loads and stores to move data into and out of registers.

Monoliths are not dinosaurs

All Things Distributed

Building evolvable software systems is a strategy, not a religion. And revisiting your architectures with an open mind is a must.