ML Pipeline Architect

by @pitchinnate · 📚 Data · 11d ago · 28 views

Production ML pipeline design using MLflow, DVC, and Airflow. Covers feature stores, model registry, and drift monitoring.

# CLAUDE.md: ML Pipeline Architect

## Pipeline Components
1. **Feature Engineering**: versioned with DVC, stored in feature store
2. **Training**: tracked with MLflow (params, metrics, artifacts)
3. **Evaluation**: held-out test set + business metric alignment
4. **Registry**: promote to staging → production via MLflow Model Registry
5. **Serving**: FastAPI wrapper, versioned endpoints
6. **Monitoring**: data drift (Evidently AI), prediction drift, business KPI alerts
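
As a rough illustration, the six components can be treated as stages of a dependency graph. The sketch below uses only the standard-library `graphlib` module. The stage names and the strictly linear edges are assumptions inferred from the numbering of the list, not something the list states outright, and a real deployment would express this in the orchestrator rather than in a plain dict.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
# The edges are an assumption: the list above only implies an order.
PIPELINE_DAG = {
    "feature_engineering": set(),
    "training": {"feature_engineering"},
    "evaluation": {"training"},
    "registry": {"evaluation"},
    "serving": {"registry"},
    "monitoring": {"serving"},
}


def execution_order(dag):
    """Return the stages in an order that respects every dependency.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(execution_order(PIPELINE_DAG))
```

Declaring dependencies as data makes a cycle a hard error before anything runs, instead of a surprise halfway through a pipeline run.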

## Reproducibility Rules
- All pipelines are DAGs with explicit dependencies
- Random seeds set and logged as parameters
- Data snapshots versioned with DVC
- Docker image hash logged with every training run
- No notebooks in production; convert them to .py scripts
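
One possible way to apply the seed, data-snapshot, and image-hash rules is to build the run parameters in a single place. This is a minimal sketch: `build_run_params`, the `data_rev` argument, and the `IMAGE_DIGEST` environment variable are hypothetical names chosen for illustration, and only the standard-library RNG is seeded here.

```python
import os
import random


def build_run_params(seed, data_rev, env=None):
    """Seed the RNG and return the parameters to log with the training run."""
    env = os.environ if env is None else env
    # Only the stdlib RNG is seeded here; numpy or an ML framework
    # would need its own seeding call.
    random.seed(seed)
    return {
        "seed": seed,
        # Assumed to be the Git commit that pins the .dvc files.
        "data_rev": data_rev,
        # IMAGE_DIGEST is a hypothetical variable set by the build system.
        "docker_image": env.get("IMAGE_DIGEST", "unknown"),
    }


if __name__ == "__main__":
    params = build_run_params(42, "rev-a", env={"IMAGE_DIGEST": "sha256:example"})
    print(params)
```

In a tracked run the resulting dict could be passed to `mlflow.log_params(params)`, so the seed, data revision, and image hash are stored with the run's metrics and artifacts.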

## Feature Store Conventions
- Features named: `<entity>_<feature>_<aggregation>_<window>`
- Example: `user_purchase_count_7d`
- Time-travel API required for training to prevent data leakage
- Features documented with owner, update frequency, and upstream source
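
The naming rule and the time-travel requirement can both be sketched in a few lines. This is an illustration under stated assumptions, not a feature store API: the allowed aggregation names, the window units (`h`, `d`, `w`), and the helpers `parse_feature_name` and `value_as_of` are all hypothetical.

```python
import re
from bisect import bisect_right

# The aggregation names and window units are assumptions for illustration.
FEATURE_NAME = re.compile(
    r"(?P<entity>[a-z]+)_(?P<feature>[a-z]+(?:_[a-z]+)*)"
    r"_(?P<aggregation>count|sum|avg|min|max)_(?P<window>\d+[hdw])"
)


def parse_feature_name(name):
    """Split a feature name into its parts, or return None if it does not conform."""
    match = FEATURE_NAME.fullmatch(name)
    return match.groupdict() if match else None


def value_as_of(history, as_of):
    """Return the latest value recorded at or before `as_of`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None when no value existed yet, so values recorded after the
    label time never leak into a training row.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


if __name__ == "__main__":
    print(parse_feature_name("user_purchase_count_7d"))
    print(value_as_of([(1, 10), (5, 12), (9, 20)], as_of=6))
```

Building a training row for a label observed at time 6 uses the value written at time 5 and ignores the one written at time 9. That is the leakage a point-in-time lookup is meant to prevent.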

## Monitoring Alerts
- PSI > 0.2 on any feature → retrain flag
- Prediction distribution shift > 10% → investigation required
- Metric degradation > 5% from baseline → rollback trigger
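
For reference, the Population Stability Index over binned proportions is the sum of `(actual - expected) * ln(actual / expected)` across bins. The sketch below computes it in plain Python and applies the three thresholds. Several details are assumptions not fixed by the rules above: equal-width bins over the baseline range, a small `eps` floor for empty bins, prediction shift expressed as a fraction, and a metric where higher is better with degradation measured relative to the baseline. A monitoring library would normally supply the drift numbers.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a current sample.

    Bins are equal-width over the baseline range; eps avoids log(0).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def alerts(feature_psi, prediction_shift, metric, baseline_metric):
    """Apply the three thresholds and return the triggered actions."""
    actions = []
    if any(v > 0.2 for v in feature_psi.values()):
        actions.append("retrain_flag")
    if prediction_shift > 0.10:
        actions.append("investigate")
    # Assumes higher is better and a relative drop from baseline.
    if (baseline_metric - metric) / baseline_metric > 0.05:
        actions.append("rollback")
    return actions


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    current = [v + 0.5 for v in baseline]
    print(alerts({"example_feature": psi(baseline, current)}, 0.02, 0.90, 0.90))
```
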
submitted March 23, 2026