ML Pipeline Architect

by @pitchinnate · Data

Production ML pipeline design using MLflow, DVC, and Airflow. Covers feature stores, model registry, and drift monitoring.

# CLAUDE.md — ML Pipeline Architect

## Pipeline Components
1. **Feature Engineering**: versioned with DVC, stored in the feature store
2. **Training**: tracked with MLflow (params, metrics, artifacts)
3. **Evaluation**: held-out test set + business metric alignment
4. **Registry**: promote to staging → production via MLflow Model Registry
5. **Serving**: FastAPI wrapper, versioned endpoints
6. **Monitoring**: data drift (Evidently AI), prediction drift, business KPI alerts

## Reproducibility Rules
- All pipelines are DAGs with explicit dependencies
- Random seeds set and logged as parameters
- Data snapshots versioned with DVC
- Docker image hash logged with every training run
- No notebooks in production — convert to .py scripts
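
A minimal sketch of the first four rules, using only the Python standard library. The step names, the `run_metadata` helper, and the placeholder version strings are assumptions for illustration. In a real setup the DAG would live in Airflow and the parameters would be logged to MLflow.

```python
import hashlib
import json
import random
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical step names).
PIPELINE = {
    "features": set(),
    "train": {"features"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}


def execution_order(dag):
    """Return a valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


def run_metadata(seed, data_version, image_digest):
    """Seed the RNG and collect the values that should be logged with every run."""
    random.seed(seed)
    params = {
        "seed": seed,
        "data_version": data_version,  # e.g. the Git/DVC revision of the data snapshot
        "image_digest": image_digest,  # e.g. the digest of the training Docker image
    }
    # A stable fingerprint makes it easy to check whether two runs share a setup.
    blob = json.dumps(params, sort_keys=True).encode()
    params["fingerprint"] = hashlib.sha256(blob).hexdigest()[:12]
    return params


order = execution_order(PIPELINE)
meta = run_metadata(seed=42, data_version="abc1234", image_digest="sha256:placeholder")
```

Declaring dependencies as data means a missing or circular edge fails before any step runs, rather than partway through training.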

## Feature Store Conventions
- Features named: `<entity>_<feature>_<aggregation>_<window>`
- Example: `user_purchase_count_7d`
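- Time-travel (point-in-time) reads required when building training sets, so each row sees only feature values available at its label timestamp; this prevents data leakage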
- Features documented with owner, update frequency, and upstream source
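
The naming and time-travel conventions above could be checked roughly as follows. This is a sketch, not a feature store API: the allowed aggregations, the window units (`h`, `d`, `w`), and both helper names are assumptions.

```python
import re
from bisect import bisect_right

# <entity>_<feature>_<aggregation>_<window>, e.g. user_purchase_count_7d.
# The aggregation list and window units below are assumptions for this sketch.
FEATURE_NAME = re.compile(
    r"^(?P<entity>[a-z]+)_(?P<feature>[a-z]+(?:_[a-z]+)*)_"
    r"(?P<aggregation>count|sum|avg|min|max)_(?P<window>\d+[hdw])$"
)


def parse_feature_name(name):
    """Split a feature name into its four parts, or return None if it does not conform."""
    match = FEATURE_NAME.match(name)
    return match.groupdict() if match else None


def value_as_of(history, as_of):
    """Return the latest value whose timestamp is <= as_of, or None if there is none.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None
```

For example, `parse_feature_name("user_purchase_count_7d")` yields the parts `user`, `purchase`, `count`, and `7d`, while a name without a window suffix is rejected. `value_as_of` never looks past `as_of`, which is the property that keeps future values out of training rows. Whether a value stamped exactly at `as_of` should count is a design choice; this sketch includes it.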

## Monitoring Alerts
- PSI > 0.2 on any feature → retrain flag
- Prediction distribution shift > 10% → investigation required
- Metric degradation > 5% from baseline → rollback trigger
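
To make the thresholds concrete, here is one way the PSI check and the alert mapping might look in plain Python. The equal-width binning, the `eps` floor for empty bins, and the alert labels are assumptions. The rules above do not say how prediction shift or metric degradation are measured, so this sketch simply takes them as precomputed fractions. A production setup would more likely rely on a monitoring library such as Evidently.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a current sample.

    Bins are equal-width over the baseline range; values outside that range are
    clamped into the first or last bin. `eps` avoids log(0) for empty bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def evaluate_alerts(feature_psi, prediction_shift, metric_drop):
    """Map monitoring values to actions using the thresholds listed above.

    `prediction_shift` and `metric_drop` are fractions (0.10 means 10%).
    """
    alerts = []
    if any(value > 0.2 for value in feature_psi.values()):
        alerts.append("retrain_flag")
    if prediction_shift > 0.10:
        alerts.append("investigate")
    if metric_drop > 0.05:
        alerts.append("rollback")
    return alerts
```

Two identical samples give a PSI of zero, and the value grows as mass moves between bins. The result depends on the binning scheme, so the 0.2 threshold is only comparable across features that are binned the same way.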

submitted March 23, 2026