article thumbnail

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes. Greenplum uses an MPP database design that can help you develop a scalable, high performance deployment. High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages.

Big Data 321
article thumbnail

An overview of end-to-end entity resolution for big data

The Morning Paper

An overview of end-to-end entity resolution for big data , Christophides et al., It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. Learning-based methods train classifiers for pruning. ACM Computing Surveys, Dec. 2020, Article No.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. Auto Remediation generates recommendations by considering both performance (i.e., Multi-objective optimizations.

Tuning 210
article thumbnail

Experiences with approximating queries in Microsoft’s production big-data clusters

The Morning Paper

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of users to run some pretty complex queries. Creating training datasets for machine learning ! VLDB’19.

article thumbnail

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

The Morning Paper

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices Gan et al., Finally, we show that Seer can identify application level design bugs, and provide insights on how to better architect microservices to achieve predictable performance. ASPLOS’19.

article thumbnail

Advancing Application Performance With NVMe Storage, Part 2

DZone

Using local SSDs inside of the GPU node delivers fast access to data during training, but introduces challenges that impact the overall solution in terms of scalability, data access, and data protection.

Storage 100
article thumbnail

What is IT automation?

Dynatrace

AI that is based on machine learning needs to be trained. This requires significant data engineering efforts, as well as work to build machine-learning models. This kind of automation can support key IT operations, such as infrastructure, digital processes, business processes, and big-data automation.