Profile
Machine Learning Engineer with 10+ years of experience building scalable ML platforms and infrastructure. Currently Tech Lead of the ML Data team at Adobe Firefly GenAI, specializing in large-scale dataset creation and AI platform development for text-to-image and text-to-video model training. Proven track record of optimizing ML training pipelines, achieving significant cost savings, and leading cross-functional teams. Expert in PyTorch, distributed systems, and NLP platforms, with extensive open-source contributions.
Experience

Machine Learning Engineer (P5.5) at Adobe

May 2024 to Current at Adobe, Seattle, WA

Tech Lead of the ML Data team in Firefly GenAI. Work closely with the training team and researchers, leading the creation of commercially safe datasets and building a scalable AI platform for text-to-image and text-to-video model training.

  • Created a Data Hub to consolidate and manage Firefly's internal datasets.
  • Optimized existing data pipelines, rewrote research-style code to production grade, and generated the largest image and video ML datasets at Adobe.
  • Initiated a new paradigm of just-in-time embedding computation with caching during training. It enabled foundation model training to scale to thousands of GPUs and 4K+ resolutions, and cut training cost by 30%+. To be presented at Ray Summit 2025.

Staff Machine Learning Engineer at Cruise

Jan 2023 to May 2024 at Cruise, Bellevue, WA

Tech Lead of Training Scalability team. Led the training and eval optimization workstreams for core perception and behavior models for self-driving cars.

  • Optimized the largest perception model at Cruise. Enabled the migration to modern GPUs and transformer architectures. Achieved a ~22X speedup and $3M+ in year-over-year cost savings.
  • Scaled up behavior models with larger network architectures and datasets. Delivered a 25X speedup with optimizations across the data processing, training, eval, and orchestration stack.
  • Built debugging tools that visualize profiler traces for any ML code (PyTorch, Python, C++) in a single holistic view, substantially accelerating performance analysis and debugging.

Software Engineer at Meta

May 2018 to Jan 2023 at Meta, Menlo Park, CA

  • Tech Lead of the PyTorch Text team. Initiated and led NLP platform consolidation efforts in collaboration with more than 10 research and product teams. Built a new NLP platform, migrated key customers, and drove adoption to ~200 ML engineers.
  • Co-founded the TextRay (text embedding as a service) team. Shipped SOTA Transformer models (RoBERTa & XLM-R) to production with FAIR researchers. It was the largest production model and internal service at Meta, which required resolving numerous scaling challenges.
  • Core contributor and admin for several PyTorch repos: PyText & torchtext (NLP frameworks), Torch Recipes (end-to-end training workflows), and pytorch/examples.

Software Engineer at IBM

Nov 2015 to May 2018 at IBM, San Jose, CA

  • Tech Lead for the Watson Health team. Built ~10 NLP services to process patient safety records.
  • Built the NLP Platform, supporting tasks such as classification, NER, and semantic parsing.

Education

MS in Computer and Information Science

Sep 2012 to Dec 2015 at University of Michigan - Dearborn

BS in Computer Science

Sep 2009 to Aug 2013 at Xi'an Jiaotong University, Shaanxi, China

Skills & Technologies
  • PyTorch
  • Python
  • Machine Learning
  • Deep Learning
  • NLP
  • Computer Vision
  • Distributed Systems
  • Data Pipelines
  • Model Training
  • Performance Optimization
  • Ray
  • Transformers
  • MLOps
  • GPU Computing
  • GenAI
  • Text-to-Image
  • Text-to-Video
  • C++