SQL Server Hardware Optimization

SQL Server Performance

An important concern in optimizing the hardware platform is hardware components that restrict performance, known as bottlenecks. Quite often, the problem isn’t correcting performance bottlenecks as much as it is identifying them in the first place. Start by obtaining a performance baseline: monitor the server over time so that you can determine server average […].
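The baseline idea in the excerpt above can be sketched in a few lines. This is only an illustration, not a SQL Server feature: the sample readings, the `build_baseline` and `is_anomalous` helper names, and the three-standard-deviation threshold are all assumptions made for the example.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize historical readings (e.g. CPU %) as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, k=3.0):
    """Flag a reading that sits more than k standard deviations from the mean."""
    avg, sd = baseline
    return abs(value - avg) > k * sd

# Hypothetical CPU utilization readings collected over time.
history = [32.0, 35.5, 30.1, 33.8, 31.2, 34.4, 29.9, 36.0]
baseline = build_baseline(history)

print(is_anomalous(34.0, baseline))  # a reading close to the historical average
print(is_anomalous(92.0, baseline))  # a reading far above the historical average
```

A real baseline would usually be kept per metric and per time window (weekday peak versus overnight batch, for instance), since a single global average can hide normal daily variation.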

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

The Morning Paper

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems, Gan et al., ASPLOS’19. Hardware implications.

Using hardware performance counters to determine how often both logical processors are active on an Intel CPU

John McCalpin

Most Intel microprocessors support “HyperThreading” (Intel’s trademark for their implementation of “simultaneous multithreading”) — which allows the hardware to support (typically) two “Logical Processors” for each physical core. Last year I was trying to diagnose a mild slowdown in a code, and wanted to be able to use the hardware performance counters to divide processor activity into four categories: Neither Logical Processor active.

C&B Session: atomic<> Weapons – The C++11 Memory Model and Modern Hardware

Sutter's Mill

atomic<> Weapons: The C++11 Memory Model and Modern Hardware. We’ll include clear answers to several FAQs: “how do the compiler and hardware cooperate to remember how to respect these rules?”, “what is a race condition?”

A Brief Guide of xPU for AI Accelerators

ACM Sigarch

HPU: Holographic Processing Unit (HPU) is the specific hardware of Microsoft’s Hololens. SPU: Stream Processing Unit (SPU) is related to the specialized hardware to process the data streams of video.

Compress objects, not cache lines: an object-based compressed memory hierarchy

The Morning Paper

… to realize these insights, hardware needs to access data at object granularity and must have control over pointers between objects. Hotpads is a hardware-managed hierarchy of scratchpad-like memories called pads.

Boosted race trees for low energy classification

The Morning Paper

The goal is to produce a low-energy hardware classifier for embedded applications doing local processing of sensor data. Race logic has four primary operations that are easy to implement in hardware: MAX, MIN, ADD-CONSTANT, and INHIBIT.
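As a rough illustration of the four operations named above: race logic encodes a value as the arrival time of a signal edge, so each primitive becomes a simple function of arrival times. The sketch below models that in software. The function names, the use of `math.inf` for a signal that never arrives, and the strict tie-breaking rule in `inhibit` are assumptions made for illustration, not details taken from the paper.

```python
import math

NEVER = math.inf  # models a signal whose edge never arrives

def race_min(a, b):
    """First arrival wins (an OR gate on rising edges)."""
    return min(a, b)

def race_max(a, b):
    """Last arrival wins (an AND gate on rising edges)."""
    return max(a, b)

def add_constant(a, delay):
    """Delaying a signal adds a constant to its encoded value."""
    return a + delay

def inhibit(inhibitor, data):
    """Pass the data edge only if it arrives strictly before the inhibitor edge."""
    return data if data < inhibitor else NEVER

print(race_min(3, 5), race_max(3, 5), add_constant(3, 2))
print(inhibit(4, 2), inhibit(2, 4))
```

Because every operation is just a comparison or a delay on edges, none of them needs a multi-bit arithmetic unit, which is consistent with the excerpt's claim that they are easy to implement in hardware.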

Approaches to System Security: Using Cryptographic Techniques to Minimize Trust

ACM Sigarch

This is the first post in a series of posts on different approaches to systems security especially as they apply to hardware and architectural security. Naively securing this system would require a large amount of trust; “guns and guards”, trusted personnel, and trusted software and hardware.

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Uber Engineering

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware.

Why I hate MPI (from a performance analysis perspective)

John McCalpin

According to Dr. Bandwidth, performance analysis has two recurring themes: How fast should this code (or “simple” variations on this code) run on this hardware? This can start with either a “top-down” or “bottom-up” approach, but in complex codes running on complex hardware, what is really required is both approaches — iterated until the interactions between all the components are understood. The networking hardware.

From bare-metal to Kubernetes

High Scalability

Hardware infrastructure. This is a guest post by Hugues Alary, Lead Engineer at Betabrand, a retail clothing company and crowdfunding platform, based in San Francisco. This article was originally published here. Early infrastructure. Rackspace. The scalability and maintainability issue.

James Hamilton on reliability

Sutter's Mill

Don’t trust hardware or software; then you can build trustworthy hardware and software. James Hamilton on how to write reliable software in a world where anything that can fail, will fail.

Invited Talk at SuperComputing 2016!

John McCalpin

“Memory Bandwidth and System Balance in HPC Systems”. If you are planning to attend the SuperComputing 2016 conference in Salt Lake City next month, be sure to reserve a spot on your calendar for my talk on Wednesday afternoon (4:15pm-5:00pm).

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing)

John McCalpin

Hardware performance counter results for a simple benchmark code calling Intel’s optimized DGEMM implementation for this processor (from the Intel MKL library) show that about 20% of the dynamic instruction count consists of instructions that are not packed SIMD operations (i.e., […]) […] This is an uninspiring fraction of peak performance that would normally suggest significant inefficiencies in either the hardware or software.

Intel discloses “vector+SIMD” instructions for future processors

John McCalpin

It seems very likely that the hardware has to be able to merge these two load operations into a single L1 Data Cache access to keep the rate of cache accesses from being the performance bottleneck. But two 32-bit loads are only 1/8 of a natural 512-bit cache access, and it seems unlikely that the hardware can merge cache accesses across multiple cycles.

Memory Latency on the Intel Xeon Phi x200 “Knights Landing” processor

John McCalpin

The Xeon Phi x200 (Knights Landing) has a lot of modes of operation (selected at boot time), and the latency and bandwidth characteristics are slightly different for each mode.

Database Metrics

SQL Shack

This data can include hardware statistics, such as measures of CPU or memory consumed over time. Summary: there is a multitude of database metrics that we can collect and use to help us understand database and server resource consumption, as well as overall usage.

The future of synthetic testing is in the cloud

Dynatrace

When we wanted to add a location, we had to ship hardware and get someone to install that hardware in a rack with power and network. Hardware was outdated. Fixed hardware is a single point of failure – even when we had redundant machines.

Welcome to the Jungle

Sutter's Mill

Now welcome to the hardware jungle. For the first time in the history of computing, mainstream hardware is no longer a single-processor von Neumann machine, and never will be again.

Talk Video: Welcome to the Jungle

Sutter's Mill

Now welcome to the hardware jungle. Last month in Kansas City I gave a talk on “Welcome to the Jungle,” based on my recent essay of the same name (sequel to “The Free Lunch Is Over”) concerning the turn to mainstream heterogeneous distributed computing and the end of Moore’s Law.

Talk Video: Welcome to the Jungle (60 min version + Q&A)

Sutter's Mill

Now welcome to the hardware jungle. While visiting Facebook earlier this month, I gave a shorter version of my “Welcome to the Jungle” talk, based on the eponymous WttJ article.

Keynote at the AMD Fusion Developer Summit

Sutter's Mill

We know that getting full computational performance out of most machines—nearly all desktops and laptops, most game consoles, and the newest smartphones—already means harnessing local parallel hardware, mainly in the form of multicore CPU processing. You can expect the above keynote to be, well, keynote-y… oriented toward software product features and of course AMD’s hardware, with plenty of forward-looking industry vision style material.

Time to First Byte: What It Is and Why It Matters

CSS Wizardry

Two Sessions: C++ Concurrency and Parallelism – 2012 State of the Art (and Standard)

Sutter's Mill

Mainstream hardware – many kinds of parallelism: What’s the relationship among multi-core CPUs, hardware threads, SIMD vector units (Intel SSE and AVX, ARM Neon), and GPGPU (general-purpose computation on GPUs, which I covered at C++ and Beyond 2011)? Task and data parallelism: What’s the difference between task parallelism and data parallelism, which kind of hardware does each allow you to exploit, and why?

C++ AMP keynote is online

Sutter's Mill

Portable: It allows shipping a single EXE that can use any combination of GPU vendors’ hardware. The initial implementation uses DirectCompute and supports all devices that are DX11 capable; DirectCompute is just an implementation detail of the first release, and the model can (and I expect will) be implemented to directly talk to any interesting hardware. More to come…

QA Mentor Helps Clients Optimize Apps and Websites

QAMentor

The technology industry has made leaps and bounds in the last decade; in fact, so much so that it’s hard to make sure all the new software and hardware available is safe and of good quality.

Why IT Needs to Look at the Network Through a 4-D Lens

DZone

This includes retiring legacy hardware and rethinking network architectures from the top-down to help facilitate a new wave of agile, cloud-delivered solutions and workflows.

Faster remainders when the divisor is a constant: beating compilers and libdivide

Daniel Lemire

The division by a power of two (/ 2^N) can be implemented as a right shift if we are working with unsigned integers, which compiles to a single instruction: that is possible because the underlying hardware uses base 2. Not all instructions on modern processors cost the same. Additions and subtractions are cheaper than multiplications, which are themselves cheaper than divisions. For this reason, compilers frequently replace division instructions by multiplications.
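To make the two ideas in this excerpt concrete, the sketch below first shows unsigned division and remainder by a power of two as a shift and a mask, then computes a remainder by a constant divisor using only multiplications and a shift. The second part is a multiply-and-shift scheme for 32-bit unsigned operands with a precomputed 64-bit constant; the excerpt itself does not spell out the method, so treat the details as an assumption. Python integers are unbounded, so 64-bit wraparound is emulated with a mask, and all helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # emulates wraparound of a 64-bit unsigned register

def div_pow2(n, k):
    """Unsigned n / 2**k as a right shift."""
    return n >> k

def mod_pow2(n, k):
    """Unsigned n % 2**k as a bit mask."""
    return n & ((1 << k) - 1)

def magic(d):
    """Precompute M = floor((2**64 - 1) / d) + 1, truncated to 64 bits."""
    return (MASK64 // d + 1) & MASK64

def fastmod_u32(n, d, m):
    """n % d for 32-bit unsigned n and d, using two multiplications and a shift."""
    lowbits = (m * n) & MASK64   # low 64 bits of M * n
    return (lowbits * d) >> 64   # high 64 bits of the 128-bit product

m = magic(3)
print(div_pow2(40, 3), mod_pow2(45, 3), fastmod_u32(10, 3, m))
```

The constant from `magic` depends only on the divisor, so it is computed once; every later remainder by the same divisor avoids the division instruction entirely.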

A case for managed and model-less inference serving

The Morning Paper

As we saw with the SOAP paper last time out, even with a fixed model variant and hardware there are a lot of different ways to map a training workload over the available hardware. Expectation 2: The choice of hardware should be hidden behind the same high level API for users.

Scaling Benchmarks With More Robust UseNUMA Flag in OpenJDK

DZone

What happens when you run a Java application without checking your hardware configuration? Obviously, your application lags in terms of performance.

Two More C&B Sessions: C++0x Memory Model (Scott) and Exceptional C++0x (me)

Sutter's Mill

Scott Meyers, Andrei Alexandrescu and I are continuing to craft and announce the technical program for C++ and Beyond (C&B) 2011, and two more sessions are now posted. All talks are brand-new material created specifically for C&B 2011. Here are short blurbs; follow the links for longer descriptions.

Application Scalability — How To Do Efficient Scaling

DZone

It’s not just a simple tweak you can turn on/off; it’s a long-time process that touches almost every single item in your stack, including both hardware and software sides of the system.

Synthetic monitoring of internal applications extended to Windows-based ActiveGates!

Dynatrace

Compliance with hardware requirements.

How to maximize CPU performance for PostgreSQL 12.0 benchmarks on Linux

HammerDB

cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 1000 MHz - 4.00

RPCValet: NI-driven tail-aware balancing of µs-scale RPCs

The Morning Paper

It’s designed for “emerging architectures featuring fully integrated NIs and hardware-terminated transport protocols.” The key hardware feature is that the network interface has direct access to the server’s memory hierarchy, eliminating round trips over e.g. PCIe.

Time protection: the missing OS abstraction

The Morning Paper

The paper sets out what we can do in software given today’s hardware, and along the way also highlights areas where cooperation from hardware will be needed in the future. The most obvious weakness of current hardware in this regard is in the interconnects.

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

The idea: CFS operates by very frequently (every few microseconds) applying a set of heuristics which encapsulate a general concept of best practices around CPU hardware use. By Benoit Rostykus, Gabriel Hartmann. Noisy Neighbors: We’ve all had noisy neighbors at one point in our life.

How Do You Improve Network Agility?

DZone

Network agility is the volume of change in the network over a period of time: the capability for software and hardware components to automatically configure and control themselves in a complex networking ecosystem. What Is Network Agility? The rise of innovative efforts made by several vendors to expand and modernize network device interfaces is improving network agility and is seen with emerging technologies such as SD-WAN, SDN, NFV, and intent-based networking.

Stuff The Internet Says On Scalability For February 22nd, 2019

High Scalability

Wake up! It's HighScalability time: Isn't inetd a better comp? Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. Know anyone who needs cloud? I wrote Explain the Cloud Like I'm 10 just for them. It has 39 mostly 5 star reviews.

Why Do We Need the Volatile Keyword?

DZone

Even if my application runs in the cloud on the JVM, despite all of those software layers abstracting away the underlying hardware, the volatile keyword is still needed due to the cache of the processor that my software runs on. What fascinates me most about the volatile keyword is that it is still necessary, for me, because my software still runs on a silicon chip.

Amazon Aurora development team wins the 2019 ACM SIGMOD Systems Award

All Things Distributed

The award recognizes "an individual or set of individuals for the development of a software or hardware system whose technical contributions have had significant impact on the theory or practice of large-scale data management systems."

How Do You Improve Network Agility?

DZone

Network agility is represented by the volume of change in the network over a period of time and is defined as the capability for software and hardware components to automatically configure and control themselves in a complex networking ecosystem. Organizations are in search of improving network agility, but what exactly does this mean?

The Three Types of Performance Testing

CSS Wizardry

You’re out on the world wide web—you have no idea who is turning up to the site, what their context is, what hardware, software, or infrastructure they’re using, or anything.