article thumbnail

An overview of end-to-end entity resolution for big data

The Morning Paper

An overview of end-to-end entity resolution for big data , Christophides et al., It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. Learning-based methods train classifiers for pruning. Open source ER systems. ACM Computing Surveys, Dec.

article thumbnail

Experiences with approximating queries in Microsoft’s production big-data clusters

The Morning Paper

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of users to run some pretty complex queries. Creating training datasets for machine learning ! VLDB’19.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. In particular, we use a simple Feedforward Multilayer Perceptron (MLP) with two heads, one to predict each outcome.

Tuning 210
article thumbnail

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

The Morning Paper

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices Gan et al., Seer is an online system that observes the behaviour of cloud applications (using the DeathStarBench microservices for the evaluation) and predicts when QoS violations may be about to occur. ASPLOS’19.

article thumbnail

What is IT automation?

Dynatrace

While automating IT practices can save administrators a lot of time, without AIOps, the system is only as intelligent as the humans who program it. AI that is based on machine learning needs to be trained. This requires significant data engineering efforts, as well as work to build machine-learning models.

article thumbnail

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). Check out Educative.io's bestselling new 4-course learning track: Scalability and System Design for Developers. Who's Hiring? InterviewCamp.io Try now !

Education 105
article thumbnail

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). Check out Educative.io's bestselling new 4-course learning track: Scalability and System Design for Developers. Who's Hiring? InterviewCamp.io Try now !

Education 102