Experiences with approximating queries in Microsoft’s production big-data clusters

The Morning Paper

Experiences with approximating queries in Microsoft’s production big-data clusters, Kandula et al., VLDB’19. Microsoft’s big data clusters have tens of thousands of machines, and are used by thousands of users to run some pretty complex queries. For the larger more production-like query analysed in §4.2.1,

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale, big data workloads. Query Optimization.

The Need for Real-Time Device Tracking

ScaleOut Software

And it can maintain contextual information about every data source (like the medical history of a device wearer or the maintenance history of a refrigeration system) and keep it immediately at hand to enhance the analysis.

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

Let’s explore what constitutes a data lakehouse, how it works, its pros and cons, and how it differs from data lakes and data warehouses. What is a data lakehouse? Data warehouses offer a single storage repository for structured data and provide a source of truth for organizations. Data management.

Mastering Hybrid Cloud Strategy

Scalegrid

It provides significant advantages that include: offering scalability to support business expansion; speeding up the execution of business plans; stimulating innovation throughout the company; and boosting organizational flexibility, enabling quick adaptation to changing market conditions and competitive pressures.

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

Microsoft have a paper describing their new recovery mechanism in Azure SQL Database, the key feature being that it can recover in constant time. Hyper Dimension Shuffle describes how Microsoft reduced the cost of data shuffling, one of the most costly operations, in their petabyte-scale internal big data analytics platform, SCOPE.