Hawkins: Diving into the Reasoning Behind our Design System

The Netflix TechBlog

Hawkins is the name of the design system used across the Netflix Studio ecosystem. A design system, such as the one we developed for the Netflix Studio, can help alleviate most of these headaches. What is a design system?

Reinventing virtualization with the AWS Nitro System

All Things Distributed

Running a business at the scale of Amazon, we often have to solve problems that no other company has faced before. A great example of this approach to innovation and problem solving is the creation of the AWS Nitro System, the underlying platform for our EC2 instances.

Chaos Mesh — A Solution for System Resiliency on Kubernetes

DZone

Traditionally, we use unit tests and integration tests to check that a system is production-ready. To better identify system vulnerabilities and improve resilience, Netflix invented Chaos Monkey, which injects various types of faults into the infrastructure and business systems.
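To make the idea of fault injection concrete, here is a minimal Python sketch. It is not how Chaos Monkey works (Chaos Monkey terminates real instances). It only wraps a call so that it fails with a configurable probability, which lets a test check that the caller degrades gracefully instead of crashing. The helpers `inject_faults` and `call_with_fallback` and the `InjectedFault` exception are hypothetical names chosen for illustration.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised instead of performing the real call, to simulate a failure."""


def inject_faults(call: Callable[[], T], failure_rate: float, rng: random.Random) -> T:
    """Run `call`, but fail with probability `failure_rate` (hypothetical helper)."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be in [0, 1]")
    # rng.random() is in [0, 1), so a rate of 1.0 always fails and 0.0 never does.
    if rng.random() < failure_rate:
        raise InjectedFault("chaos: injected failure")
    return call()


def call_with_fallback(call: Callable[[], T], fallback: T,
                       failure_rate: float, rng: random.Random) -> T:
    """Code under test: it should serve a fallback instead of crashing."""
    try:
        return inject_faults(call, failure_rate, rng)
    except InjectedFault:
        return fallback


if __name__ == "__main__":
    rng = random.Random(42)  # seeded, so the experiment is reproducible
    results = [call_with_fallback(lambda: "live", "cached", 0.3, rng) for _ in range(1000)]
    print(results.count("cached"), "of 1000 calls fell back to the cache")
```

Seeding the random generator is a deliberate choice. A failed chaos experiment can then be replayed exactly while the fallback logic is debugged.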

Desupport of monitoring for legacy 32-bit operating systems

Dynatrace

We’re continuously working to support the most popular operating systems with high-quality OneAgent deployment options. Of course, it’s possible to run a legacy 32-bit application on a 64-bit operating system.

Design Systems and Testability With Applitools

DZone

In May 2020, Applitools had the pleasure of hosting Tyler Krupicka from Intuit for an hour-long webinar discussing design systems and testability. At Intuit, Tyler works on the "Player/Design Systems" team, where he focuses on design systems.

It’s time to upgrade the PTC System Monitor (PSM)!

Dynatrace

As a PSM system administrator, you’ve relied on AppMon as a preconfigured APM tool for detecting, diagnosing, and repairing problems that impact the operational health of your Windchill application suite.

Understanding When to Use a Test Tool vs. a Test System

DZone

Testing is a mission-critical aspect of the software development lifecycle (SDLC). Yet, for all the importance that testing has in the SDLC, many people misunderstand the difference between a testing tool and a testing system.

How Do You Test A Design System? — Advanced Topics

DZone

How do you test a design system? You got here because you either have a design system or know you need one. Marie Drake, Principal Test Automation Engineer at News UK, presented her webinar, "Roadmap To Testing A Design System", where she discussed this topic in some detail.

Engineering dependability and fault tolerance in a distributed system

High Scalability

In large-scale systems, the assumption has to be that component failures will happen sooner or later. Dependability therefore means a system that is not merely available but is also engineered with extensive redundancy, so that it continues to work as its users expect.
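One basic way to act on the assumption that components will fail is redundancy: keep several interchangeable replicas of a component and fail over when one of them breaks. The Python sketch below illustrates only that pattern. `call_redundant` and `AllReplicasFailed` are hypothetical names, and the sketch assumes the operation is idempotent, so retrying it on another replica is safe.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every redundant replica has failed (hypothetical exception)."""


def call_redundant(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in order and return the first successful result.

    Assumes the operation is idempotent, so retrying on another replica is safe.
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # any error counts as a component failure
            errors.append(exc)
    raise AllReplicasFailed(f"all {len(errors)} replicas failed: {errors!r}")


if __name__ == "__main__":
    def broken() -> str:
        raise ConnectionError("replica down")

    print(call_redundant([broken, lambda: "served by replica 2"]))
```

A real system would also need timeouts, health checks, and backoff between attempts. They are left out here to keep the failover logic visible.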

Free Google Book: Building Secure and Reliable Systems

High Scalability

Google added another book to their excellent SRE series: Building Secure and Reliable Systems. Copying a few paragraphs: "In this book we talk generally about systems, which is a conceptual way of thinking about the groups of components that cooperate to perform some function."

Understanding, detecting and localizing partial failures in large system software

The Morning Paper

Understanding, detecting and localizing partial failures in large system software, Lou et al. Partial failures (gray failures) occur when some but not all of the functionalities of a system are broken. … overhead in terms of system throughput.

Benchmarking spreadsheet systems

The Morning Paper

Benchmarking spreadsheet systems, Rahman et al. Spreadsheets often freeze during computation and are unable to import datasets well below the size limits posed by current spreadsheet systems. The other systems avoid this recomputation, but are slower than Excel for value-only datasets.

Benefits of Using an Online Bug Tracking System

DZone

When a software program or application does not work the way it was designed to, the flaw is called a software bug. In most cases, these errors are introduced by developers or designers.

Deployment challenges with large enterprise systems

Dynatrace

For small deployments this isn’t a problem; however, when scaling up to hundreds or even thousands of systems, things can become complicated. Even when all the systems are mapped correctly by Dynatrace, identifying these systems is a real challenge. When deploying on multiple machines, OneAgent will group all the instances of the same system together. Dynatrace will automatically group both systems. (Systems: Tibco, API-gateway, Weblogic, shared-middleTier.)

Unlocking Enterprise systems using voice

All Things Distributed

The interfaces to our digital systems have been dictated by the capabilities of our computer systems: keyboards, mice, graphical interfaces, remotes, and touch screens. As a result, they fail to deliver a truly seamless and customer-centric experience that integrates our digital systems into our analog lives. All of these benefits make voice a game changer for interacting with all kinds of digital systems.

Wireless attacks on aircraft instrument landing systems

The Morning Paper

Wireless attacks on aircraft instrument landing systems, Sathaye et al., USENIX Security Symposium 2019. Today’s paper is a good reminder of just how important it is becoming to consider cyber threat models in what are primarily physical systems, especially if you happen to be flying on an aeroplane – which I am right now as I write this! The first fully operational Instrument Landing System (ILS) for planes was deployed in 1932.

MySQL Memory Management, Memory Allocators, and Operating System

DZone

When users experience memory usage issues with any software, including MySQL, their first response is to think that it’s a symptom of a memory leak. As this story will show, this is not always the case. This story is about a bug.

Mutation Testing Systems: Improving the Quality of Tests

DZone

Professionally, I label myself as a developer, although I don’t like labels very much; I prefer to say that the reason for my work is to create quality software. But what is quality software? I like to define it as follows: quality software is software that meets the user's needs.

What Are Design Systems And How They Help Building Frontend Architectures

Simform

Design systems have turned out to be a boon for such teams as they lend coherency to user experiences across frontends. In this article, you'll learn about the variations of design systems employed by organizations worldwide and their specific characteristics.

Dynatrace and AWS Systems Manager – Automate OneAgent distribution securely, centrally and at scale

Dynatrace

We’re pleased to announce that Dynatrace is among the first set of partners to offer support for AWS Distributor, a capability of AWS Systems Manager that allows you to select from popular third-party agents to install and manage. What is AWS Systems Manager Distributor?

Systems Performance: Enterprise and the Cloud, 2nd Edition

Brendan Gregg

Eight years ago I wrote _Systems Performance: Enterprise and the Cloud_ (aka the "sysperf" book) on the performance of computing systems, and this year I'm excited to be releasing the second edition. In a way, Systems Performance is volume 1 and BPF Performance Tools is volume 2.

LISA2019 Linux Systems Performance

Brendan Gregg

Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. I've been working on Systems Performance 2nd Edition, now that the BPF book is done.

How to Trace Linux System Calls in Production (Without Breaking Performance)

DZone

If you need to dynamically trace Linux process system calls, you might first consider strace. strace is simple to use and works well for issues such as "Why can't the software run on this machine?" However, strace is built on ptrace and can slow the traced process down considerably, which makes it risky to run in production. So are there any tools that excel at tracing system calls in a production environment? This blog post introduces perf and traceloop, two commonly used command-line tools, to help you trace system calls in a production environment.

Orbital edge computing: nanosatellite constellations as a new class of computer system

The Morning Paper

Orbital edge computing: nanosatellite constellations as a new class of computer system, Denby & Lucia, ASPLOS’20. Only space system architects don’t call it request-response; they call it a ‘bent-pipe architecture’. Nanosatellite systems have a ground sample distance (GSD) of around 3.0 m/px.

Fleet Management System: Top 5 Benefits You Should Know About

Simform

Gone are the days when fleet managers had to maintain traditional, bulky logbooks and employee records in a single computer system periodically. Today, fleet management systems have brought more transparency and efficiency for organizations, drivers, fleet managers, and customers.

Towards federated learning at scale: system design

The Morning Paper

Towards federated learning at scale: system design, Bonawitz et al. This is a high-level paper describing Google’s production system for federated learning. At the core of the system is a federated learning approach called Federated Averaging, with an optional extension for Secure Aggregation. The FL system contains a number of privacy-enhancing building blocks, but the privacy guarantees of any end-to-end system will always depend on how they are used.
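Federated Averaging, as introduced by McMahan et al., aggregates a round of training by averaging the model parameters returned by clients. Each client is weighted by the number of local examples it trained on. The Python sketch below shows only that server-side weighted average, using plain lists of floats. It is not Google's production code: it omits local client training and leaves out Secure Aggregation entirely. `federated_average` is a hypothetical name.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Server-side FedAvg step: average client models weighted by local example count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client models must have the same number of parameters")
        share = n / total  # client k contributes n_k / n of the new global model
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients: the second trained on three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Weighting by example count keeps a client with very little data from pulling the global model as strongly as a client with much more.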

Checksums in Storage Systems and Why the Enterprise Should Care

DZone

Let’s assume for a moment that your data survives its many passes through a system’s DRAM and emerges intact. That data must then be safely transported over a network to the storage system where it is written to disk. Random bit flips are far more common than most people, even IT professionals, think. Surprisingly, the problem isn’t widely discussed, even though it is silently causing data corruption that can directly impact our jobs, our businesses, and our security.
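The usual defense is an end-to-end checksum: compute it when the data is written, carry it with the data, and verify it on every read. The Python sketch below uses CRC-32 from the standard-library `zlib` module as an illustrative choice. CRC-32 detects every single-bit flip, but it does not protect against deliberate tampering. `store`, `load`, `flip_bit`, and `ChecksumError` are hypothetical helpers, not any storage system's API.

```python
import zlib
from typing import Tuple

Record = Tuple[bytes, int]  # (payload, CRC-32 of payload)


class ChecksumError(Exception):
    """Raised when stored data no longer matches its checksum."""


def store(data: bytes) -> Record:
    """Compute the checksum at write time and keep it alongside the payload."""
    return data, zlib.crc32(data)


def load(record: Record) -> bytes:
    """Recompute the checksum on read and refuse to return corrupted data."""
    data, expected = record
    if zlib.crc32(data) != expected:
        raise ChecksumError("checksum mismatch: data corrupted in transit or at rest")
    return data


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a random bit flip, e.g. in DRAM or on the network."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


if __name__ == "__main__":
    payload, crc = store(b"quarterly results")
    corrupted = (flip_bit(payload, 13), crc)
    try:
        load(corrupted)
    except ChecksumError as exc:
        print("detected:", exc)
```

Verifying on read, rather than only at write time, catches corruption introduced anywhere along the path between the two.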

Build automated self-healing systems with xMatters and Dynatrace (Part 1 of 3)

Dynatrace

In this three-part blog series, we’ll share the following three common problem scenarios that you can easily solve by building an automated self-healing system with Dynatrace and xMatters Flow Designer: Process crash. Depending on the type of Dynatrace issue, xMatters prompts on-call resources with response option buttons that launch workflows across your systems to start the automated self-healing process, and to keep stakeholders and customers updated.

Teaching rigorous distributed systems with efficient model checking

The Morning Paper

Teaching rigorous distributed systems with efficient model checking, Michael et al. The paper describes the labs environment, DSLabs, developed at the University of Washington to accompany a course in distributed systems. During the ten-week course, students implement four different assignments: an exactly-once RPC protocol; a primary-backup system; Paxos; and a scalable, transactional key-value storage system. The environment also provides a visual debugger/system explorer.
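The exactly-once RPC assignment mentioned in this summary is commonly built from two pieces. First, the client retransmits a request until it gets a reply, which gives at-least-once delivery. Second, the server remembers the last sequence number it executed for each client and answers retransmissions from a cache instead of re-executing them, which gives at-most-once execution. The Python sketch below shows only the server-side deduplication, and it assumes each client has at most one outstanding request. DSLabs itself is a Java framework; `Request`, `Server`, and the counter command are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Request:
    client_id: str
    seq: int      # per-client sequence number, starting at 0 and incremented per new command
    amount: int   # hypothetical command: add `amount` to a shared counter


@dataclass
class Server:
    counter: int = 0
    # Highest sequence number executed per client, and the reply sent for it.
    last_seq: Dict[str, int] = field(default_factory=dict)
    last_reply: Dict[str, int] = field(default_factory=dict)

    def handle(self, req: Request) -> int:
        if req.seq <= self.last_seq.get(req.client_id, -1):
            # Retransmitted (or stale) request: do not execute it again.
            # With one outstanding request per client, a stale reply is ignored by the client.
            return self.last_reply[req.client_id]
        self.counter += req.amount  # execute the command exactly once
        self.last_seq[req.client_id] = req.seq
        self.last_reply[req.client_id] = self.counter
        return self.counter


if __name__ == "__main__":
    server = Server()
    first = Request("client-1", seq=0, amount=5)
    print(server.handle(first))  # 5
    print(server.handle(first))  # 5 again: the retry is answered from the cache
    print(server.counter)        # 5, not 10
```

Storing only the latest sequence number and reply per client keeps the server's memory bounded. This relies on the one-outstanding-request assumption stated above.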

Scaling Uber’s Apache Hadoop Distributed File System for Growth

Uber Engineering

Three years ago, Uber Engineering adopted Hadoop as the storage (HDFS) and compute (YARN) infrastructure for our organization’s big data analysis. This analysis powers our services and enables the delivery of more seamless and reliable user …

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

The Morning Paper

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems, Gan et al. Systems built with lots of microservices have different operational characteristics from those built from a small number of monoliths, and we’d like to study and better understand those differences. In this paper we explore the implications microservices have across the cloud system stack. Operating system and network implications.

Machine learning systems are stuck in a rut

The Morning Paper

Machine learning systems are stuck in a rut, Barham & Isard, HotOS’19. In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it harder to explore innovative machine learning research ideas.

Build automated self-healing systems with xMatters and Dynatrace (Part 2 of 3)

Dynatrace

In this alert, xMatters includes all the important incident information from Dynatrace, so there’s no need for you to visit additional system dashboards. Based on this contextual data, resources are prompted with their pre-configured response options, each of which kicks off a workflow across systems (based on the severity of the issue). Depending on the type of the issue, xMatters launches workflows across your systems to start the automated self-healing process.

Software Systems Will Fail

Professor Beekums

GitLab had a very public outage last month. Most companies provide some kind of explanation when their services are interrupted. Those are usually sanitized (or seem sanitized) to make things seem better than they actually are. GitLab instead provided an extremely detailed report of the incident as well as all the things they know they could be better at. Some of my friends were extremely troubled by the report.

Amazon Aurora development team wins the 2019 ACM SIGMOD Systems Award

All Things Distributed

This week, the developers of Amazon Aurora have won the 2019 Association for Computing Machinery's (ACM) Special Interest Group on Management of Data (SIGMOD) Systems Award. The award recognizes "an individual or set of individuals for the development of a software or hardware system whose technical contributions have had significant impact on the theory or practice of large-scale data management systems."

Watching you watch: the tracking ecosystem of over-the-top TV streaming devices

The Morning Paper

Watching you watch: the tracking ecosystem of over-the-top TV streaming devices, Moghaddam et al., CCS’19. The results from this paper are all too predictable: channels on Over-The-Top (OTT) streaming devices are insecure and riddled with privacy leaks.

Migrating a privacy-safe information extraction system to a Software 2.0 design

The Morning Paper

Migrating a privacy-safe information extraction system to a Software 2.0 design. This is a comparatively short (7 pages) but very interesting paper detailing the migration of a software system to a ‘Software 2.0’ system.

How to Kill Processes in Unix/Linux

DZone

There are different options to terminate a process in Unix/Linux flavors of operating systems, and this article intends to list and provide examples of each option. You can use the kill command to terminate a process by passing its process ID (PID), the ID of the process that you want to terminate.
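The kill-based workflow described here can also be scripted. This Python sketch first sends SIGTERM, which is the signal `kill PID` sends by default. It then waits a grace period and escalates to SIGKILL (the equivalent of `kill -9 PID`) only if the process is still alive. The helper names, the 5-second default grace period, and the escalation policy are illustrative assumptions. The sketch relies on POSIX signals and `os.waitpid` flags, so it will not run on Windows.

```python
import os
import signal
import subprocess
import sys
import time


def _is_gone(pid: int) -> bool:
    """Return True once `pid` no longer refers to a live process."""
    try:
        # If it is our own child, reap it so that a zombie does not look alive.
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return True
    except ChildProcessError:
        pass  # not our child; fall back to probing the PID
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without delivering anything
    except ProcessLookupError:
        return True
    return False


def terminate(pid: int, grace_seconds: float = 5.0) -> bool:
    """Send SIGTERM (what `kill PID` sends by default), then SIGKILL (`kill -9 PID`)
    if the process is still alive after `grace_seconds`. POSIX only.

    Returns True if the process exited without needing SIGKILL.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if _is_gone(pid):
            return True
        time.sleep(0.05)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return True  # it exited just before the escalation
    return False


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("terminated with SIGTERM:", terminate(child.pid))
```

Trying SIGTERM first gives the process a chance to clean up, for example by flushing files or releasing locks. SIGKILL cannot be caught, so it is kept as the last resort.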

Aligning superhuman AI with human behaviour: chess as a model system

The Morning Paper

Aligning superhuman AI with human behavior: chess as a model system , McIlroy-Young et al., Maia thus succeeds at capturing granular human behavior in a tunable way that is qualitatively beyond both traditional engines and self-play neural network systems.

Partitioned Hive Table Across Storage Systems Using Alluxio

DZone

This is where Alluxio comes in: it interfaces with applications like Hive as a distributed virtual file system, allowing tables to be created with partitions spread across different storage systems. In this setup, data always resides in the under-storage system as the source of truth and may also reside temporarily in the Alluxio file system.

Who monitors the monitoring systems?

Adrian Cockcroft

In reality, in any non-trivial installation, there are multiple tools collecting, storing and displaying overlapping sets of metrics from many types of systems and different levels of abstraction. These monitoring systems provide critical observability capabilities that are needed to successfully configure, deploy, debug and troubleshoot installations and applications. What if your monitoring systems fail? How do you even know when a monitoring system has failed?
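One common answer to the last question is a dead man's switch, offered here as an illustration rather than as the article's prescription. The monitoring system emits a heartbeat, and an independent watchdog, ideally running on separate infrastructure, raises an alert when the heartbeats stop. The Python sketch below models that check with an injectable clock so it can be tested deterministically. `HeartbeatWatchdog` and its methods are hypothetical.

```python
import time
from typing import Callable, Optional


class HeartbeatWatchdog:
    """Dead man's switch for a monitoring system (hypothetical class).

    The monitor calls `beat()` periodically; if no heartbeat arrives within
    `timeout` seconds, `is_healthy()` returns False and a separate alert path
    should fire.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so tests can control time
        self.last_beat: Optional[float] = None

    def beat(self) -> None:
        self.last_beat = self.clock()

    def is_healthy(self) -> bool:
        if self.last_beat is None:
            return False  # never heard from the monitor: treat it as failed
        return self.clock() - self.last_beat <= self.timeout


if __name__ == "__main__":
    now = [0.0]
    watchdog = HeartbeatWatchdog(timeout=60.0, clock=lambda: now[0])
    watchdog.beat()
    now[0] = 90.0  # 90 s without a heartbeat
    print("monitor healthy:", watchdog.is_healthy())  # monitor healthy: False
```

Treating silence as failure is the key design choice. A monitor that has crashed cannot report its own failure, so the alert must come from the absence of its heartbeats.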

Article: Using the Plan-Do-Check-Act Framework to Produce Performant and Highly Available Systems

InfoQ Articles

The PDCA (plan-do-check-act) framework can be used to plan for performance, availability, and monitoring, enabling teams to deliver performant and highly available applications.