article thumbnail

An overview of end-to-end entity resolution for big data

The Morning Paper

An overview of end-to-end entity resolution for big data , Christophides et al., It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. Learning-based methods train classifiers for pruning. ACM Computing Surveys, Dec. 2020, Article No.

article thumbnail

Experiences with approximating queries in Microsoft’s production big-data clusters

The Morning Paper

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of users to run some pretty complex queries. Creating training datasets for machine learning ! VLDB’19.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber

Uber Engineering

Michelangelo , Uber’s machine learning (ML) platform, powers machine learning model training across various use cases at Uber, such as forecasting rider demand , fraud detection , food discovery and recommendation for Uber Eats , and improving the accuracy of … The post Productionizing Distributed XGBoost to Train Deep Tree Models with Large (..)

article thumbnail

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. In particular, we use a simple Feedforward Multilayer Perceptron (MLP) with two heads, one to predict each outcome.

Tuning 210
article thumbnail

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

The Morning Paper

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices Gan et al., A DNN model is trained to recognise patterns in space and time that lead to QoS violations. This retraining uses transfer learning with weights from previous training rounds stored on disk as a starting point.

article thumbnail

What is IT automation?

Dynatrace

AI that is based on machine learning needs to be trained. This requires significant data engineering efforts, as well as work to build machine-learning models. This kind of automation can support key IT operations, such as infrastructure, digital processes, business processes, and big-data automation.

article thumbnail

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale, big data workloads. Query Optimization.

Big Data 321