Remove Data Engineering Remove Database Remove Development Remove Infrastructure
article thumbnail

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency By: Di Lin , Girish Lingappa , Jitender Aswani Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question?—?“Can

article thumbnail

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

Central engineering teams enable this operational model by reducing the cognitive burden on innovation teams through solutions related to securing, scaling and strengthening (resilience) the infrastructure. All these micro-services are currently operated in AWS cloud infrastructure.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Ready-to-go sample data pipelines with Dataflow

The Netflix TechBlog

Dataflow Dataflow is a command line utility built to improve experience and to streamline the data pipeline development at Netflix. The most commonly used one is dataflow project , which helps folks in managing their data pipeline repositories through creation, testing, deployment and few other activities. test_sparksql_write.py

article thumbnail

5 key areas for tech leaders to watch in 2020

O'Reilly

This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers. Software architecture, infrastructure, and operations are each changing rapidly. Also: infrastructure and operations is trending up, while DevOps is trending down.

article thumbnail

Optimizing data warehouse storage

The Netflix TechBlog

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. Some of the optimizations are prerequisites for a high-performance data warehouse. Both automatic (event-driven) as well as manual (ad-hoc) optimization.

Storage 203
article thumbnail

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

As I mentioned, we live in a world where massive volumes of data are being generated, every day, from connected devices, websites, mobile apps, and customer applications running on top of AWS infrastructure. Put simply, data is not always readily available and accessible to organizational end users.

Analytics 152
article thumbnail

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

However, the data infrastructure to collect, store and process data is geared toward developers (e.g., Amazon Redshift, DynamoDB, Amazon EMR) whereas insights need to be derived by not just developers but also non-technical business users. SPICE automatically replicates data for high availability and performance.

Cloud 137