Data Quality Framework
by @pitchinnate · 📚 Data · 13d ago · 35 views
Great Expectations and dbt test setup for automated data quality checks at ingestion, transformation, and serving layers.
# CLAUDE.md — Data Quality Engineer

## Quality Dimensions

For each dataset, validate across:

1. **Completeness**: no unexpected nulls, required fields present
2. **Uniqueness**: primary keys are unique, no duplicates in entity tables
3. **Validity**: values within expected ranges, correct formats
4. **Consistency**: referential integrity, values match across joined tables
5. **Timeliness**: freshness SLA met (data not older than N hours)
6. **Accuracy**: sampled spot-checks against source system

## Great Expectations Suite Structure

- One expectation suite per data asset
- Suites versioned in source control
- Critical expectations (data pipeline stops on failure) vs. warning expectations
- Run suites in CI on every pipeline change

## Alerting Thresholds

| Severity | Action |
|----------|--------|
| Critical | Pipeline halts, on-call paged |
| Warning  | Ticket created, pipeline continues |
| Info     | Logged, reviewed weekly |

## Data Contract

Every dataset published to downstream consumers must have:

- Schema definition (column names, types, nullability)
- SLA (freshness guarantee)
- Breaking change policy (30-day notice)
- Owner and escalation path
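On the dbt side, the same dimensions map onto built-in generic tests declared in a model's schema file. The model and column names below are hypothetical; the test keys (`unique`, `not_null`, `accepted_values`, `relationships`) are dbt's standard generic tests:

```yaml
version: 2

models:
  - name: orders            # illustrative model name
    columns:
      - name: order_id
        tests:
          - unique          # uniqueness
          - not_null        # completeness
      - name: status
        tests:
          - accepted_values:          # validity
              values: ['placed', 'shipped', 'delivered']
      - name: customer_id
        tests:
          - relationships:            # consistency (referential integrity)
              to: ref('customers')
              field: customer_id
```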
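As a minimal, library-agnostic sketch, the first quality dimensions (completeness, uniqueness, validity, timeliness) can be expressed as plain-Python checks over rows; the function and field names below are illustrative, not part of any existing tool:

```python
from datetime import datetime, timedelta, timezone

def check_completeness(rows, required_fields):
    """Return indices of rows missing any required field (None or absent)."""
    return [i for i, r in enumerate(rows)
            if any(r.get(f) is None for f in required_fields)]

def check_uniqueness(rows, key):
    """Return the set of primary-key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        k = r[key]
        if k in seen:
            dupes.add(k)
        else:
            seen.add(k)
    return dupes

def check_validity(rows, field, lo, hi):
    """Return indices of rows whose value falls outside [lo, hi]."""
    return [i for i, r in enumerate(rows) if not (lo <= r[field] <= hi)]

def check_timeliness(latest_ts, max_age_hours):
    """True if the newest record meets the freshness SLA."""
    age = datetime.now(timezone.utc) - latest_ts
    return age <= timedelta(hours=max_age_hours)
```

In a real pipeline these become declarative expectations (e.g. `expect_column_values_to_not_be_null` in Great Expectations) rather than hand-rolled functions, but the validation logic per dimension is the same.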
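The alerting-threshold table above can be sketched as a small dispatch function; the callback parameters (`page_oncall`, `create_ticket`, `log`) are assumed hooks into whatever paging/ticketing system is in use:

```python
from enum import Enum

class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFO = "info"

def route_failure(severity, page_oncall, create_ticket, log):
    """Dispatch a failed check per the alerting-threshold table.

    Returns True if the pipeline may continue, False if it must halt.
    """
    if severity is Severity.CRITICAL:
        page_oncall()      # pipeline halts, on-call paged
        return False
    if severity is Severity.WARNING:
        create_ticket()    # ticket created, pipeline continues
        return True
    log()                  # INFO: logged, reviewed weekly
    return True
```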
submitted March 21, 2026