Architecture, Big Data and Engineering - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

In this blog post, we explain what Greenplum is and break down the Greenplum architecture, advantages, major use cases, and how to get started. Its architecture was specially designed to manage large-scale data warehouses and business intelligence workloads by giving you the ability to spread your data out across a multitude of servers.

Big Data

Big Data Database Artificial Intelligence Open Source

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The design of the in-stream processing engine itself was driven by the following requirements: SQL-like functionality. Strict fault-tolerance is a principal requirement for the engine.

Big Data

Big Data Processing Lambda Database

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks … The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Big Data

Big Data Latency Transportation Traffic

Data Engineers of Netflix — Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix — Interview with Samuel Setegne. This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Databook: Turning Big Data into Knowledge with Metadata at Uber

Uber Engineering

AUGUST 3, 2018

Data powers Uber’s global marketplace, enabling more reliable and seamless user experiences across our products for riders, … The post Databook: Turning Big Data into Knowledge with Metadata at Uber appeared first on Uber Engineering Blog.

Big Data

Big Data Transportation Engineering Storage

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges.

Big Data

Big Data Storage Benchmarking Hardware

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. Here are the six steps of a typical ITOA process: Define the data infrastructure strategy. Why use a data lakehouse for causal AI? Why is ITOA important? Apache Spark.

Analytics

Analytics Artificial Intelligence Big Data Open Source

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

By Vikram Srivastava and Marcelo Mayworm. Netflix has one of the most complex data platforms in the cloud on which our data scientists and engineers run batch and streaming workloads. Pensive collects logs for the failed jobs launched by the step from the relevant data platform components and then extracts the stack traces.

Big Data

Big Data Infrastructure Metrics Hardware

What is software automation? Optimize the software lifecycle with intelligent automation

Dynatrace

JUNE 26, 2023

Software analytics offers the ability to gain and share insights from data emitted by software systems and related operational processes to develop higher-quality software faster while operating it efficiently and securely. This involves big data analytics and applying advanced AI and machine learning techniques, such as causal AI.

Software

Software Software Analytics Big Data

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. Rule Execution Engine is responsible for matching the collected logs against a set of predefined rules.

Tuning

Tuning Efficiency Big Data Engineering

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Causal AI—which brings AI-enabled actionable insights to IT operations—and a data lakehouse, such as Dynatrace Grail, can help break down silos among ITOps, DevSecOps, site reliability engineering, and business analytics teams. Logs are automatically produced and time-stamped documentation of events relevant to cloud architectures.

Analytics

Analytics Infrastructure Storage Efficiency

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

Defining Hybrid Cloud Strategy: The decision-making process about where to situate data and applications is vital to any hybrid cloud solution.

Strategy

Strategy Cloud Artificial Intelligence Infrastructure

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes. Grail addresses today’s challenges of big data and cloud everywhere: Grail is highly scalable, cost-effective, and super-fast.

Analytics

Analytics Artificial Intelligence Storage Serverless

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

These processes are only possible with the distributed architecture and parallel processing mechanisms that Big Data tools are based on. One of the top trending open-source data stores that covers most of these use cases is Elasticsearch.

Big Data

Big Data Government Open Source Storage

Hybrid cloud infrastructure explained: Weighing the pros, cons, and complexities

Dynatrace

JUNE 29, 2022

To drive better outcomes using hybrid cloud architectures, it helps to understand their benefits—and how to orchestrate them seamlessly. What is hybrid cloud architecture? Hybrid cloud architecture is a computing environment that shares data and applications on a combination of public clouds and on-premises private clouds.

Infrastructure

Infrastructure Cloud Azure AWS

Engineering SQL Support on Apache Pinot at Uber

Uber Engineering

JANUARY 15, 2020

Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform.

Engineering

Engineering Analytics Big Data Database

What is container orchestration?

Dynatrace

MARCH 24, 2023

Docker Swarm First introduced in 2014 by Docker, Docker Swarm is an orchestration engine that popularized the use of containers with developers. The Docker file format is used broadly for orchestration engines, and Docker Engine ships with Docker Swarm and Kubernetes frameworks included. The post What is container orchestration?

Infrastructure

Infrastructure Open Source Operating System Cloud

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

Key Takeaways: Redis offers complex data structures and additional features for versatile data handling, while Memcached excels in simplicity with a fast, multi-threaded architecture for basic caching needs. Memcached shines in scenarios where a simple, fast, and efficient caching solution is required without data persistence.

Cache

Cache Storage Scalability Architecture

What is IT automation?

Dynatrace

JULY 6, 2022

As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations to ensure organizations can enforce their policies and architecture principles. Big data automation tools. Batch process automation.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Tackling the Pipeline Problem in the Architecture Research Community

ACM Sigarch

APRIL 8, 2019

Computer architecture is an important and exciting field of computer science, which enables many other fields (e.g., big-data processing, machine learning, quantum computing, and so on). For those of us who pursued computer architecture as a career, this is well understood. Why is that? Should we be alarmed as a community?

Architecture

Architecture Open Source Hardware Software Engineering

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. These two narratives of reference architecture and ingestion/indexing system are interwoven throughout the paper. Why do we need a new reference architecture?

Cloud

Cloud Big Data Latency Architecture

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

AUGUST 14, 2019

Once identified, … The post Less is More: Engineering Data Warehouse Efficiency with Minimalist Design appeared first on Uber Engineering Blog. In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost supersede utility?

Efficiency

Efficiency Engineering Design Storage

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Most Kubernetes clusters in the cloud (73%) are built on top of managed distributions from the hyperscalers like AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). Big data : To store, search, and analyze large datasets, 32% of organizations use Elasticsearch.

Open Source

Open Source Java Operating System Programming

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

AIOps (artificial intelligence for IT operations) combines big data, AI algorithms, and machine learning for actionable, real-time insights that help ITOps continuously improve operations. ITOps vs. AIOps. The three core components of an AIOps solution are the following: 1. ” The post What is ITOps?

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Flow Exporter: The Flow Exporter is a sidecar that uses eBPF tracepoints to capture TCP flows at near real time on instances that power the Netflix microservices architecture. After several iterations of the architecture and some tuning, the solution has proven to be able to scale. What is BPF?

Network

Network Transportation AWS Cloud

Exploratory analytics and collaborative analytics capabilities democratize insights across teams

Dynatrace

APRIL 25, 2023

Having access to large data sets can be helpful, but only if organizations are able to leverage insights from the information. These analytics can help teams understand the stories hidden within the data and share valuable insights. and only they have access.”

Analytics

Analytics Big Data Media Operating System

The Need for Real-Time Device Tracking

ScaleOut Software

JULY 19, 2021

Today’s streaming analytics architectures are not equipped to make sense of this rapidly changing information and react to it as it arrives. This data is also periodically uploaded to a data lake for offline batch analysis that calculates key statistics and looks for big trends that can help optimize operations.

IoT

IoT Analytics Big Data Architecture

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Our customers have frequently requested support for this first new batch of services, which cover databases, big data, networks, and computing. See the health of your big data resources at a glance. Azure HDInsight supports a broad range of use cases including data warehousing, machine learning, and IoT analytics.

Azure

Azure Cloud Big Data Virtualization

Scaling Uber’s Apache Hadoop Distributed File System for Growth

Uber Engineering

APRIL 5, 2018

Three years ago, Uber Engineering adopted Hadoop as the storage (HDFS) and compute (YARN) infrastructure for our organization’s big data analysis.

Systems

Systems Big Data Storage Infrastructure

Microsoft Engineering loves SQLBits

SQL Server According to Bob

FEBRUARY 15, 2018

Microsoft engineering is actually sending quite a few folks over the Atlantic to come talk about SQL Server 2017, SQL Server on Linux, GDPR, Performance, Security, Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse, and Azure CosmosDB. Best practices on Building a Big Data Analytics Solution – Michael Rys.

Engineering

Engineering Azure Best Practices Servers

AIOps observability adoption ascends in healthcare

Dynatrace

MARCH 14, 2022

AIOps (or “AI for IT operations”) uses artificial intelligence so that big data can help IT teams work faster and more effectively. Over the past few years, the company has undergone a digital transformation, migrating to a hybrid, cloud-native environment built on Amazon Web Services and a microservices architecture.

Healthcare

Healthcare Artificial Intelligence Innovation Strategy

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). This is going to be a challenging journey for any backend engineer! Learn to balance architecture trade-offs and design scalable enterprise-level software. Please apply here.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). This is going to be a challenging journey for any backend engineer! Learn to balance architecture trade-offs and design scalable enterprise-level software. Please apply here.

Education

Education Software Engineering Scalability Engineering

What is AIOps? Everything you wanted to know

Dynatrace

OCTOBER 14, 2021

Gartner defines AIOps as the combination of “big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.” But what is AIOps, exactly? And how can it support your organization? What is AIOps? Challenges of traditional AIOps. AIOps use cases.

Artificial Intelligence

Artificial Intelligence DevOps Innovation Metrics

Applying real-world AIOps use cases to your operations

Dynatrace

OCTOBER 17, 2022

Artificial intelligence for IT operations, or AIOps, combines big data and machine learning to provide actionable insight for IT teams to shape and automate their operational strategy. The four stages of data processing. AIOps supports that with the ability to assess applications during development, delivery, and deployment.

DevOps

DevOps Artificial Intelligence Healthcare Innovation

Use Digital Twins for the Next Generation in Telematics

ScaleOut Software

NOVEMBER 24, 2020

However, telematics architectures face challenges in responding to telemetry in real time. Current Telematics Architecture. The volume of incoming telemetry challenges current telematics systems to keep up and quickly make sense of all the data. Challenges for Current Architectures.

Analytics

Analytics Architecture Scalability Software Architecture

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.

Big Data

Big Data Artificial Intelligence Storage Hardware

Why MySQL Could Be Slow With Large Tables

Percona

JANUARY 19, 2023

ProxySQL: It is a feature-rich open-source MySQL proxy solution that allows query routing for the most common MySQL architectures (PXC/Galera, Replication, Group Replication, etc.). MyRocks: MyRocks is a storage engine developed by Facebook and made open source. It supports native sharding that is transparent to the application.

Open Source

Open Source Storage Database Big Data

Expanding the Cloud: Introducing the AWS Asia Pacific (Mumbai) Region

All Things Distributed

JUNE 26, 2016

AdiMap uses Amazon Kinesis to process real-time streaming online ad data and job feeds, and processes them for storage in petabyte-scale Amazon Redshift. Advanced problem solving that connects big data with machine learning. A workflow engine to drive business decisions. We are at the cusp of a dramatic age of technology.

AWS

AWS Cloud Healthcare Blockchain

Uber’s Data Platform in 2019: Transforming Information to Intelligence

Uber Engineering

DECEMBER 17, 2019

Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the … The post Uber’s Data Platform in 2019: Transforming Information to Intelligence appeared first on Uber Engineering (..)

Engineering

Engineering Big Data Infrastructure Analytics

Web Performance Bookshelf

Rigor

JANUARY 13, 2020

When it comes to web content, you can easily find what you need through many different paths, from search engines and social media to playlists and blogs, jumping from one source to another with just a tap of a finger. Information Architecture. Web Performance Daybook-Volume-2. Professional Website Performance. Web Performance Tuning.

Performance

Performance Social Media Website Website Performance

Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber

Uber Engineering

DECEMBER 10, 2019

Michelangelo, Uber’s machine learning (ML) platform, powers machine learning model training across various use cases at Uber, such as forecasting rider demand, fraud detection, food discovery and recommendation for Uber Eats, and improving the accuracy of … The post Productionizing Distributed XGBoost to Train Deep Tree Models with Large (..)

Engineering

Engineering Big Data Architecture
