Data Quality Framework

by @pitchinnate · 📚 Data · 13d ago · 35 views

Great Expectations and dbt test setup for automated data quality checks at ingestion, transformation, and serving layers.

# CLAUDE.md — Data Quality Engineer

## Quality Dimensions
For each dataset, validate across:
1. **Completeness**: no unexpected nulls, required fields present
2. **Uniqueness**: primary keys are unique, no duplicates in entity tables
3. **Validity**: values within expected ranges, correct formats
4. **Consistency**: referential integrity, values match across joined tables
5. **Timeliness**: freshness SLA met (data not older than N hours)
6. **Accuracy**: sampled spot-checks against source system
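The first four dimensions above can be expressed as simple row-level predicates. A minimal stdlib-only sketch (field names like `id`, `email`, `age`, and `loaded_at` are illustrative, not from this document; a real setup would encode these as Great Expectations expectations):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample rows; in practice these come from the ingested dataset.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34, "loaded_at": datetime.now(timezone.utc)},
    {"id": 2, "email": "b@example.com", "age": 27, "loaded_at": datetime.now(timezone.utc)},
]

def check_completeness(rows, required):
    """Completeness: every required field is present and non-null."""
    return all(r.get(f) is not None for r in rows for f in required)

def check_uniqueness(rows, key):
    """Uniqueness: primary-key values are distinct across rows."""
    keys = [r[key] for r in rows]
    return len(keys) == len(set(keys))

def check_validity(rows, field, lo, hi):
    """Validity: values fall inside the expected range."""
    return all(lo <= r[field] <= hi for r in rows)

def check_timeliness(rows, ts_field, max_age_hours):
    """Timeliness: no row is older than the freshness SLA."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    return all(r[ts_field] >= cutoff for r in rows)
```

Consistency and accuracy don't reduce to single-table predicates: the former needs a join against the referenced table, and the latter needs sampled comparisons against the source system.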

## Great Expectations Suite Structure
- One expectation suite per data asset
- Suites versioned in source control
- Tag each expectation as critical (pipeline halts on failure) or warning (failure logged, pipeline continues)
- Run suites in CI on every pipeline change
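The critical-vs-warning split can be sketched as a suite runner that only halts on critical failures. This is a hypothetical stand-in, not the Great Expectations API; in GE the same idea is usually expressed through expectation metadata and checkpoint actions:

```python
# Each expectation is (name, check_fn, severity); severity is "critical" or "warning".
def run_suite(dataset, expectations):
    """Run every expectation; return (halt, failures).

    halt is True only if a *critical* expectation failed --
    warning failures are reported but do not stop the pipeline.
    """
    failures = [(name, sev) for name, check, sev in expectations
                if not check(dataset)]
    halt = any(sev == "critical" for _, sev in failures)
    return halt, failures

# Illustrative suite over rows of dicts (field names are assumptions).
suite = [
    ("id_not_null",  lambda d: all(r.get("id") is not None for r in d), "critical"),
    ("age_in_range", lambda d: all(0 <= r["age"] <= 120 for r in d),    "warning"),
]
data = [{"id": 1, "age": 34}, {"id": 2, "age": 150}]
halt, failures = run_suite(data, suite)
# age_in_range fails at warning level only, so the pipeline continues.
```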

## Alerting Thresholds
| Severity | Action |
|----------|--------|
| Critical | Pipeline halts, on-call paged |
| Warning | Ticket created, pipeline continues |
| Info | Logged, reviewed weekly |
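The table above amounts to a static severity-to-action mapping; a minimal sketch (action names are illustrative placeholders for the real paging/ticketing integrations):

```python
def route_alert(severity):
    """Map a severity level to its operational actions (per the thresholds table)."""
    actions = {
        "critical": ["halt_pipeline", "page_oncall"],
        "warning":  ["create_ticket", "continue_pipeline"],
        "info":     ["log_only"],
    }
    return actions[severity]
```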

## Data Contract
Every dataset published to downstream consumers must have:
- Schema definition (column names, types, nullability)
- SLA (freshness guarantee)
- Breaking change policy (30-day notice)
- Owner and escalation path
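The contract fields above can be carried as a small typed object, with the schema portion checked mechanically. A sketch under assumed names (`Column`, `DataContract`, and the example `users` dataset are all hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    dtype: type
    nullable: bool = False

@dataclass
class DataContract:
    dataset: str
    columns: list               # schema definition: names, types, nullability
    freshness_sla_hours: int    # freshness guarantee
    owner: str                  # owner / escalation path
    breaking_change_notice_days: int = 30

def validate_schema(rows, contract):
    """Return a list of schema violations for rows checked against the contract."""
    errors = []
    for col in contract.columns:
        for i, row in enumerate(rows):
            val = row.get(col.name)
            if val is None:
                if not col.nullable:
                    errors.append(f"row {i}: {col.name} is null")
            elif not isinstance(val, col.dtype):
                errors.append(f"row {i}: {col.name} expected {col.dtype.__name__}")
    return errors

# Illustrative contract for a hypothetical "users" dataset.
contract = DataContract(
    dataset="users",
    columns=[Column("id", int), Column("email", str, nullable=True)],
    freshness_sla_hours=6,
    owner="data-platform@example.com",
)
```

A conforming row passes with no errors; a type mismatch or unexpected null is reported with its row index, which makes the violation actionable for the publishing team.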
submitted March 21, 2026