ML Pipeline Architect
by @pitchinnate · Data · 11d ago · 28 views
Production ML pipeline design using MLflow, DVC, and Airflow. Covers feature stores, model registry, and drift monitoring.
# CLAUDE.md: ML Pipeline Architect

## Pipeline Components
1. **Feature Engineering**: versioned with DVC, stored in feature store
2. **Training**: tracked with MLflow (params, metrics, artifacts)
3. **Evaluation**: held-out test set + business metric alignment
4. **Registry**: promote to staging → production via MLflow Model Registry
5. **Serving**: FastAPI wrapper, versioned endpoints
6. **Monitoring**: data drift (Evidently AI), prediction drift, business KPI alerts

## Reproducibility Rules
- All pipelines are DAGs with explicit dependencies
- Random seeds set and logged as parameters
- Data snapshots versioned with DVC
- Docker image hash logged with every training run
- No notebooks in production; convert to .py scripts

## Feature Store Conventions
- Features named: `<entity>_<feature>_<aggregation>_<window>`
- Example: `user_purchase_count_7d`
- Time-travel API required for training to prevent data leakage
- Features documented with owner, update frequency, and upstream source

## Monitoring Alerts
- PSI > 0.2 on any feature → retrain flag
- Prediction distribution shift > 10% → investigation required
- Metric degradation > 5% from baseline → rollback trigger
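
### Example: PSI retrain check (illustrative sketch)

To make the PSI rule under Monitoring Alerts concrete, here is a minimal standard-library sketch of a Population Stability Index check against the 0.2 threshold. The function names `psi` and `needs_retrain`, the equal-width binning over the baseline range, the 10-bin default, and the `eps` smoothing for empty bins are all assumptions made for illustration. In a real pipeline the drift score would more likely come from the monitoring tool itself (Evidently AI is the one named above).

```python
import math
from bisect import bisect_right

PSI_RETRAIN_THRESHOLD = 0.2  # threshold taken from the Monitoring Alerts rule


def psi(baseline, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Assumption: bins are equal-width over the baseline range.
    Quantile-based bins are another common choice.
    """
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread; PSI is undefined")
    width = (hi - lo) / n_bins
    # Interior edges only: values outside the baseline range
    # fall into the first or last bin.
    edges = [lo + i * width for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        total = len(values)
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    p = proportions(baseline)  # expected (training-time) distribution
    q = proportions(current)   # observed (serving-time) distribution
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def needs_retrain(baseline, current):
    """Return True when the feature's PSI exceeds the retrain threshold."""
    return psi(baseline, current) > PSI_RETRAIN_THRESHOLD
```

A scheduled monitoring task could call `needs_retrain(training_values, recent_values)` once per feature and raise the retrain flag if any feature returns `True`. The result depends on the binning choice, so the same binning should be used for every comparison against a given baseline.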
submitted March 23, 2026