Big Data, Engineering and Hardware - Technology Performance Pulse

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The design of the in-stream processing engine itself was driven by the following requirements: SQL-like functionality. Strict fault-tolerance is a principal requirement for the engine.

Big Data

Big Data Processing Lambda Database

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges.

Big Data

Big Data Storage Benchmarking Hardware

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

By Vikram Srivastava and Marcelo Mayworm. Netflix has one of the most complex data platforms in the cloud on which our data scientists and engineers run batch and streaming workloads. Pensive collects logs for the failed jobs launched by the step from the relevant data platform components and then extracts the stack traces.

Big Data

Big Data Infrastructure Metrics Hardware

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure. Although modern cloud systems simplify tasks, such as deploying apps and provisioning new hardware and servers, hybrid cloud and multicloud environments are often complex.

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Most Kubernetes clusters in the cloud (73%) are built on top of managed distributions from the hyperscalers like AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). Big data: To store, search, and analyze large datasets, 32% of organizations use Elasticsearch.

Open Source

Open Source Java Operating System Programming

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum Database is an open-source, hardware-agnostic MPP database for analytics, based on PostgreSQL and developed by Pivotal, which was later acquired by VMware. Greenplum’s high performance eliminates the challenge most RDBMSs have scaling to petabyte levels of data, as it can scale linearly to efficiently process data.

Big Data

Big Data Database Artificial Intelligence Open Source

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Uber Engineering

OCTOBER 30, 2018

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware.

Hardware

Hardware Infrastructure Engineering Technology

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

You can find more information in this good blog from Marco: MySQL Sharding with ProxySQL. Vitess: It is an open source database clustering solution created by PlanetScale that is compatible with the MySQL engine. MyRocks: MyRocks is a storage engine developed by Facebook and made open source. It supports native sharding.

Open Source

Open Source Storage Database Big Data

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

Additionally, ITOA gathers and processes information from applications, services, networks, operating systems, and cloud infrastructure hardware logs in real time. Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information.

Analytics

Analytics Artificial Intelligence Big Data Open Source

Dutch Enterprises and The Cloud

All Things Distributed

SEPTEMBER 6, 2013

Shell leverages AWS for big data analytics to help achieve these goals. Shell's scientists, especially the geophysicists and drilling engineers, frequently use cloud computing to run models.

Cloud

Cloud Energy AWS Healthcare

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedup by exploiting the specific processor architecture. Cluster Compute Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high-performance compute and networking.

Cloud

Cloud AWS Automotive Latency

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

Such applications track the inventory of our network gear: what devices, of which models, with which hardware components, located in which sites. Python has long been a popular programming language in the networking space because it’s an intuitive language that allows engineers to quickly solve networking problems.

Open Source

Open Source Network Infrastructure Big Data

Hybrid cloud infrastructure explained: Weighing the pros, cons, and complexities

Dynatrace

JUNE 29, 2022

A hybrid cloud, however, combines public infrastructure and services with on-premises resources or a private data center to create a flexible, interconnected IT environment. Hybrid environments provide more options for storing and analyzing ever-growing volumes of big data and for deploying digital services.

Infrastructure

Infrastructure Cloud Azure AWS

Structural Evolutions in Data

O'Reilly

SEPTEMBER 19, 2023

Each time, the underlying implementation changed a bit while still staying true to the larger phenomenon of “Analyzing Data for Fun and Profit.” They weren’t quite sure what this “data” substance was, but they’d convinced themselves that they had tons of it that they could monetize.

Hardware

Hardware Storage Big Data Blockchain

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

However, the data infrastructure to collect, store and process data is geared toward developers (e.g., In AWS’ quest to enable the best data storage options for engineers, we have built several innovative database solutions like Amazon RDS, Amazon RDS for Aurora, Amazon DynamoDB, and Amazon Redshift. Big data challenges.

Cloud

Cloud Big Data AWS Analytics

Välkommen till Stockholm – An AWS Region is coming to the Nordics

All Things Distributed

APRIL 4, 2017

After finding it cost prohibitive to use colocation centers in local markets where their users are based, iZettle decided to give up hardware. Scania, a world leading manufacturer of commercial vehicles, is using AWS to bring advanced technologies to their trucks, buses, coaches, and diesel engines.

AWS

AWS Airlines Latency Games

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

In 2018, we will see new data integration patterns that rely either on a shared high-performance distributed storage interface (Alluxio) or a common data format (Apache Arrow) sitting between compute and storage. For instance, Alluxio, originally known as Tachyon, can potentially use Arrow as its in-memory data structure.

Big Data

Big Data Artificial Intelligence Storage Hardware

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

AliGraph covers Alibaba’s distributed graph engine supporting the development of new GNN applications. Their dataset has about 7B edges… Meanwhile, AnalyticDB is Alibaba’s real-time OLAP RDBMS handling 10PB of data (in excess of 100 trillion rows!). Could it be Analyzing efficient stream processing on modern hardware?

Blockchain

Blockchain Hardware Google Analytics

Expanding the AWS Cloud – Introducing the AWS Europe (Stockholm) Region

All Things Distributed

DECEMBER 12, 2018

They support more than 3.3 million vehicles in more than 75 countries with services like car locator, engine remote start, driving journal, heater start, and stolen vehicle tracking. The first platform is a real-time big data platform being used for analyzing traffic usage patterns to identify congestion and connectivity issues.

AWS

AWS Cloud Games Serverless

Amazon EC2 Cluster GPU Instances - All Things Distributed

All Things Distributed

NOVEMBER 15, 2010

This led to the birth of the Graphics Processing Unit (GPU), which was focused on providing a very fine-grained parallel model, with processing organized in multiple stages through which the data would flow. Driving down the cost of Big-Data analytics. Introducing the AWS South America (Sao Paulo) Region.

AWS

AWS Latency Programming Architecture

The Winds of Architecture Changes at the USENIX ATC 2019

ACM Sigarch

NOVEMBER 1, 2019

This blog post gives a glimpse of the computer systems research papers presented at the USENIX Annual Technical Conference (ATC) 2019, with an emphasis on systems that use new hardware architectures. Intel Quick Assist Technology (QAT) was the focus of the QZFS paper which used this new hardware device to speed up file system compression.

Architecture

Architecture Hardware Cache Storage

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

NOVEMBER 15, 2016

They require companies to provision and maintain complex hardware infrastructure and invest in expensive software licenses, maintenance fees, and support fees that cost upwards of thousands of dollars per user per year. It is the underlying engine that allows QuickSight to deliver blazing fast response times on large data sets.

Analytics

Analytics Availability Media Social Media

Tackling the Pipeline Problem in the Architecture Research Community

ACM Sigarch

APRIL 8, 2019

big-data processing, machine learning, quantum computing, and so on). This is arguably a fundamentally hard problem for computer architecture, but efforts towards open source hardware (eg. Her current work focuses on hardware/software co-design for extremely large-scale deep learning training. Lack of Diversity.

Architecture

Architecture Open Source Hardware Software Engineering

Bringing the Magic of Amazon AI and Alexa to Apps on AWS.

All Things Distributed

NOVEMBER 30, 2016

Around 20 years ago, we used machine learning in our recommendation engine to generate personalized recommendations for our customers. The same conversational engine that powers Alexa is now available to any developer, making it easy to bring sophisticated, natural language 'chatbots' to new and existing applications.

AWS

AWS Lambda Artificial Intelligence Mobile
