Data / ML, Engineering

Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber

December 10, 2019 / Global
Figure 1. A typical, high-level XGBoost training workflow consists of Feature Transformation and Cross-Validation Split phases, as well as steps that turn raw data into actionable insights through model training.
Figure 2. While the Apache Spark SparseVector leverages a tiered structure, the DenseVector features a sequential storage structure.
Figure 3. Michelangelo computes model feature importance scores to better understand which features are most valuable to a given ML model’s performance.
Figure 4. A typical Michelangelo model training workflow consists of feature transformation, training, and post-training stages. Over time, we have scaled and optimized the Michelangelo workflow to support different Apache Spark settings for these stages, which makes the model training process more adaptable and flexible and helps deliver more accurate results.
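
The storage difference named in the Figure 2 caption can be illustrated with a minimal sketch in plain Python. This is not Apache Spark's implementation: the class names `DenseVec` and `SparseVec` and the helper `to_sparse` are hypothetical, chosen only to show the general idea that a dense vector keeps one value per position in order, while a sparse vector keeps a size plus parallel arrays of indices and values for the entries it stores explicitly.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass
class DenseVec:
    """Hypothetical dense vector: one stored value per position, in order."""
    values: List[float]

    def get(self, i: int) -> float:
        return self.values[i]

    def stored_slots(self) -> int:
        # Every position occupies a slot, zero or not.
        return len(self.values)


@dataclass
class SparseVec:
    """Hypothetical sparse vector: a size plus parallel index/value arrays."""
    size: int
    indices: List[int]   # sorted positions of the explicitly stored entries
    values: List[float]  # values[k] is the entry at position indices[k]

    def get(self, i: int) -> float:
        if not 0 <= i < self.size:
            raise IndexError(i)
        # Binary search over the sorted index array.
        k = bisect_left(self.indices, i)
        if k < len(self.indices) and self.indices[k] == i:
            return self.values[k]
        # Positions that are not stored read back as 0.0 in this sketch.
        return 0.0

    def stored_slots(self) -> int:
        # One index slot and one value slot per stored entry.
        return len(self.indices) + len(self.values)


def to_sparse(dense: DenseVec) -> SparseVec:
    """Keep only the non-zero entries of a dense vector."""
    pairs = [(i, v) for i, v in enumerate(dense.values) if v != 0.0]
    return SparseVec(
        size=len(dense.values),
        indices=[i for i, _ in pairs],
        values=[v for _, v in pairs],
    )
```

In this sketch the sparse form pays for two slots per stored entry, so it only saves space when most positions are zero; lookups also go through a search of the index array rather than a direct offset.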
Joseph Wang

Joseph Wang serves as a Principal Software Engineer on the AI Platform team at Uber, based in San Francisco. His notable achievements encompass designing the Feature Store, expanding the capacity of the real-time prediction service, developing a robust model platform, and improving the performance of key models. Presently, Wang is focusing his expertise on advancing the domain of generative AI.

Anne Holler

Anne Holler is a former staff TLM for machine learning framework on Uber's Machine Learning Platform team. She was based in Sunnyvale, CA. She worked on ML model representation and management, along with training and offline serving reliability, scalability, and tuning.

Mingshi Wang

Mingshi Wang is a senior software engineer on Uber's Machine Learning Platform team.

Michael Mui

Michael Mui is a Staff Software Engineer on Uber AI's Machine Learning Platform team. He works on the distributed training infrastructure, hyperparameter optimization, model representation, and evaluation. He also co-leads Uber’s internal ML Education initiatives.

Posted by Joseph Wang, Anne Holler, Mingshi Wang, Michael Mui