Architecture, Big Data, Blog and Engineering - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

It can scale to multi-petabyte data workloads, and it gives you access to a cluster of powerful servers that work together behind a single SQL interface through which you can view all of the data. This feature-packed database provides fast analytics on data volumes of up to petabyte scale.
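The idea of many servers answering one SQL query can be illustrated with a toy model of how an MPP database such as Greenplum works. Rows are spread across segment hosts by hashing a distribution key, each segment aggregates only its own rows, and a coordinator combines the partial results. This is a minimal sketch, not Greenplum's implementation. The segment count, the use of `zlib.crc32` as the hash, and the names `segment_for`, `distribute`, and `coordinator_sum` are hypothetical choices made for illustration.

```python
"""Toy model of MPP-style hash distribution (illustrative, not Greenplum's code)."""
from collections import defaultdict
import zlib

NUM_SEGMENTS = 4  # hypothetical number of segment hosts


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    # A deterministic hash of the distribution key picks the owning segment,
    # so all rows with the same key land on the same segment.
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field):
    # Route each row to its segment based on the distribution key.
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field])].append(row)
    return segments


def segment_partial_sum(rows, value_field):
    # Each segment aggregates only the rows it stores locally.
    return sum(r[value_field] for r in rows)


def coordinator_sum(segments, value_field):
    # The coordinator merges the per-segment partial results into one answer.
    return sum(segment_partial_sum(rows, value_field) for rows in segments.values())


if __name__ == "__main__":
    rows = [{"customer": f"c{i}", "amount": i} for i in range(10)]
    segs = distribute(rows, "customer")
    print(coordinator_sum(segs, "amount"))
```

Because every segment does its share of the scan and aggregation in parallel, adding segments spreads the work further. The coordinator only has to merge a handful of partial results.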

Big Data

Big Data Database Artificial Intelligence Open Source

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks …

Big Data

Big Data Latency Transportation Traffic

Databook: Turning Big Data into Knowledge with Metadata at Uber

Uber Engineering

AUGUST 3, 2018

Data powers Uber’s global marketplace, enabling more reliable and seamless user experiences across our products for riders, …

Big Data

Big Data Transportation Engineering Storage

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

By Vikram Srivastava and Marcelo Mayworm. Netflix has one of the most complex data platforms in the cloud, on which our data scientists and engineers run batch and streaming workloads. Pensive collects logs for the failed jobs launched by the step from the relevant data platform components and then extracts the stack traces.

Big Data

Big Data Infrastructure Metrics Hardware

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation (including but not limited to auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing) is key to the success of modern data platforms. The Rule Execution Engine is responsible for matching the collected logs against a set of predefined rules.
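A rule-based classifier of the kind described here can be sketched as an ordered list of regular-expression rules checked against collected log text. The first rule that matches determines the error category and a suggested action. This is a hypothetical sketch, not Netflix's Rule Execution Engine. The `Rule` fields, the two example patterns, the categories, and the action names are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str             # hypothetical rule identifier
    pattern: re.Pattern   # regex matched against collected log text
    category: str         # hypothetical error category, e.g. "platform" or "user"
    action: str           # hypothetical remediation hint


# Hypothetical example rules; a real system would load many more from config.
RULES = [
    Rule("oom", re.compile(r"java\.lang\.OutOfMemoryError"), "platform", "retry_with_more_memory"),
    Rule("missing_table", re.compile(r"Table or view not found"), "user", "notify_owner"),
]


def classify(log_text: str, rules=RULES) -> Optional[Rule]:
    # Rules are checked in order, so list order encodes priority; first match wins.
    for rule in rules:
        if rule.pattern.search(log_text):
            return rule
    return None  # no rule matched; the failure stays unclassified
```

A fixed rule list like this is easy to audit, but it only recognizes failures someone has already written a pattern for. That limitation is what motivates the move toward machine-learning-based classification mentioned in the post title.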

Tuning

Tuning Efficiency Big Data Engineering

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Limited data availability constrains value creation. Modern IT environments, whether multicloud, on-premises, or hybrid-cloud architectures, generate exponentially increasing data volumes. Grail addresses today’s challenges of big data and cloud everywhere: Grail is highly scalable, cost-effective, and super-fast.

Analytics

Analytics Artificial Intelligence Storage Serverless

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

It also improves engineering productivity by simplifying existing pipelines and unlocking new patterns. In this blog post, we talk about the landscape and the challenges in workflows at Netflix. Backfill: backfilling datasets is a common operation in big data processing. … data arrives too late to be useful.

Processing

Processing Big Data Efficiency Engineering

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Now, imagine yourself in the role of a software engineer responsible for a microservice that publishes data consumed by a few critical customer-facing services (e.g. …). You are about to make structural changes to the data and want to know who and what downstream of your service will be impacted.
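Asking who and what downstream will be impacted is a reachability query over a lineage graph. The sketch below assumes a hypothetical in-memory graph (`LINEAGE`) that maps each dataset or service to its direct consumers, and it runs a breadth-first traversal over that graph. It is not Netflix's lineage system, whose storage and APIs differ. All dataset names are made up.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes that directly consume it.
LINEAGE = {
    "my_service.events": ["etl.daily_events"],
    "etl.daily_events": ["dashboard.engagement", "ml.recs_features"],
    "ml.recs_features": ["service.recommendations"],
}


def downstream(node: str, graph: dict) -> set:
    """Return every node reachable from `node`, i.e. everything a change could impact."""
    seen = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen
```

The `seen` set keeps the traversal finite even if the lineage graph contains cycles. A production system would also need to keep the graph itself fresh as pipelines change.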

Infrastructure

Infrastructure Big Data Transportation Architecture

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

As big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestration ecosystem became increasingly important for our data scientists and the company. Meson was based on a single-leader architecture with high availability.

Java

Java Scalability Traffic Architecture

Hybrid cloud infrastructure explained: Weighing the pros, cons, and complexities

Dynatrace

JUNE 29, 2022

To drive better outcomes using hybrid cloud architectures, it helps to understand their benefits and how to orchestrate them seamlessly. What is hybrid cloud architecture? Hybrid cloud architecture is a computing environment that shares data and applications across a combination of public clouds and on-premises private clouds.

Infrastructure

Infrastructure Cloud Azure AWS

What is IT automation?

Dynatrace

JULY 6, 2022

As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations to ensure organizations can enforce their policies and architecture principles.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Engineering SQL Support on Apache Pinot at Uber

Uber Engineering

JANUARY 15, 2020

Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform.

Engineering

Engineering Analytics Big Data Database

Tackling the Pipeline Problem in the Architecture Research Community

ACM Sigarch

APRIL 8, 2019

Computer architecture is an important and exciting field of computer science, which enables many other fields (e.g., big data processing, machine learning, and quantum computing). For those of us who pursued computer architecture as a career, this is well understood. Why is that? Should we be alarmed as a community?

Architecture

Architecture Open Source Hardware Software Engineering

Streaming SQL in Data Mesh

The Netflix TechBlog

NOVEMBER 3, 2023

Democratizing Stream Processing @ Netflix. By Guil Pires, Mark Cho, Mingliang Liu, Sujay Jain. Data powers much of what we do at Netflix. On the Data Platform team, we build the infrastructure used across the company to process data at scale.

Processing

Processing Engineering Infrastructure Latency

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

AUGUST 14, 2019

Once identified, … In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost outweigh the utility?

Efficiency

Efficiency Engineering Design Storage

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Our customers have frequently requested support for this first new batch of services, which cover databases, big data, networks, and computing. See the health of your big data resources at a glance. Azure HDInsight supports a broad range of use cases including data warehousing, machine learning, and IoT analytics.

Azure

Azure Cloud Big Data Virtualization

Scaling Uber’s Apache Hadoop Distributed File System for Growth

Uber Engineering

APRIL 5, 2018

Three years ago, Uber Engineering adopted Hadoop as the storage (HDFS) and compute (YARN) infrastructure for our organization’s big data analysis.

Systems

Systems Big Data Storage Infrastructure

AIOps observability adoption ascends in healthcare

Dynatrace

MARCH 14, 2022

AIOps (or “AI for IT operations”) uses artificial intelligence so that big data can help IT teams work faster and more effectively. Over the past few years, the company has undergone a digital transformation, migrating to a hybrid, cloud-native environment built on Amazon Web Services and a microservices architecture.

Healthcare

Healthcare Artificial Intelligence Innovation Strategy

What is AIOps? Everything you wanted to know

Dynatrace

OCTOBER 14, 2021

Gartner defines AIOps as the combination of “big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.” But what is AIOps, exactly? And how can it support your organization?

Artificial Intelligence

Artificial Intelligence DevOps Innovation Metrics

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

Sixteen years ago, our founder Peter Zaitsev covered this topic. Some of the points he described are still valid, and we will cover more in this blog. With disks being faster nowadays and CPU and memory resources being cheaper, we could easily say MySQL can handle TBs of data with good performance.

Open Source

Open Source Storage Database Big Data

Uber’s Data Platform in 2019: Transforming Information to Intelligence

Uber Engineering

DECEMBER 17, 2019

Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the …

Engineering

Engineering Big Data Infrastructure Analytics

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

NOVEMBER 15, 2016

The reality is that many traditional BI solutions are built on top of legacy desktop and on-premises architectures that are decades old. They require teams of data engineers to spend months building complex data models and synthesizing the data before they can generate their first report. Enter Amazon QuickSight.

Analytics

Analytics Availability Media Social Media

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. Some of the optimizations are prerequisites for a high-performance data warehouse. We will publish a follow-up blog post about AutoAnalyze in the future.

Storage

Storage Latency Efficiency Data Engineering

Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber

Uber Engineering

DECEMBER 10, 2019

Michelangelo, Uber’s machine learning (ML) platform, powers machine learning model training across various use cases at Uber, such as forecasting rider demand, fraud detection, food discovery and recommendation for Uber Eats, and improving the accuracy of …

Engineering

Engineering Big Data Architecture

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

Given that I have frequently written about many of these technologies on this blog, I asked investor relations for permission to reprint it here. To our shareowners: Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks.

Technology

Technology AWS Storage

Web Performance Bookshelf

Rigor

JANUARY 13, 2020

When it comes to web content, you can easily find what you need through many different paths, from search engines and social media to playlists and blogs, jumping from one source to another with just a tap of a finger. Information Architecture. Web Performance Daybook Volume 2. Professional Website Performance.

Performance

Performance Social Media Website Website Performance

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Uber Engineering

OCTOBER 30, 2018

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware.

Hardware

Hardware Infrastructure Engineering Technology

The Winds of Architecture Changes at the USENIX ATC 2019

ACM Sigarch

NOVEMBER 1, 2019

This blog post gives a glimpse of the computer systems research papers presented at the USENIX Annual Technical Conference (ATC) 2019, with an emphasis on systems that use new hardware architectures. As a consequence, the vast majority of papers in the past have focused on conventional x86 or GPU-accelerated architectures.

Architecture

Architecture Hardware Cache Storage

Expanding the AWS Cloud: Introducing the AWS Canada (Central) Region

All Things Distributed

DECEMBER 8, 2016

It adopted Amazon Redshift, Amazon EMR and AWS Lambda to power its data warehouse, big data, and data science applications, supporting the development of product features at a fraction of the cost of competing solutions. Kik Interactive is a Canadian chat platform with hundreds of millions of users around the globe.

AWS

AWS Cloud Lambda Innovation

This week in review: GPUs, Zombies, Biomimicry and Tom Waits.

All Things Distributed

NOVEMBER 19, 2010

Big news this week was of course the launch of Cluster GPU instances for Amazon EC2. There were blog posts by Jeff Barr (The Cluster GPU Instance) and James Hamilton (HPC in the Cloud with GPGPUs), as well as my background posting: Expanding the Cloud - Adding the Incredible Power of the Amazon EC2 Cluster GPU Instances.

AWS

AWS Cloud Benchmarking Storage

Amazon EC2 Cluster GPU Instances

All Things Distributed

NOVEMBER 15, 2010

Building general-purpose architectures has always been hard; there are often so many conflicting requirements that you cannot derive an architecture that will serve all, so we have often ended up focusing on one side of the requirements, which allows you to serve that area really well. From CPU to GPU.

AWS

AWS Latency Programming Architecture

Around the World in 28 Days

All Things Distributed

SEPTEMBER 30, 2010

Visiting future customers is equally exciting, as you get a chance to understand their current architecture (if it is a migration) and how they plan to exploit cloud services in their new setup.

AWS

AWS Storage Cloud Best Practices

Music to my Ears

All Things Distributed

MARCH 28, 2011

A key part of the Cloud Drive architecture is a Metadata Service that allows customers to quickly search and organize their digital collections within Cloud Drive. If you are an engineer interested in working on Amazon Cloud Drive and related technologies, the team has a number of openings and would love to talk to you!

AWS

AWS Cloud Storage Internet

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedups by exploiting the specific processor architecture. Cluster Compute Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high-performance compute and networking.

Cloud

Cloud AWS Automotive Latency

Expanding the Cloud with DNS - Introducing Amazon Route 53

All Things Distributed

DECEMBER 5, 2010

There is more than one Werner Vogels in this world, and although I never get emails, snail mail, or phone calls for any of my peers, I am sure they are somewhat frustrated when they type our name into a search engine :-). If you want to learn more about Route 53, visit [link] and read the blog post at the AWS Developer weblog.

Cloud

Cloud Internet AWS
