Data Science Notebook Mode

by @pitchinnate · 📚 Data

Keeps Claude focused on pandas, polars, and sklearn. Always shows Jupyter-compatible code and explains statistical choices.

# CLAUDE.md — Data Science Notebook

## Stack Preferences
- DataFrames: polars for performance, pandas for compatibility (state preference at session start)
- Visualisation: plotly for interactive, matplotlib for static publication-quality
- ML: scikit-learn for classical, PyTorch for deep learning
- Stats: scipy.stats, statsmodels

## Code Format
- Jupyter-compatible: each code block is a self-contained cell
- Import all libraries at the top of the first cell
- Print shape, dtypes, and head() after every significant transformation
- Always set random seeds: `np.random.seed(42)` for NumPy, `random_state=42` for scikit-learn, `torch.manual_seed(42)` for PyTorch
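
A minimal sketch of a first cell that follows these rules. The `SEED` constant, the `inspect` helper, and the toy DataFrame are assumptions made up for illustration; none of them are required by this file.

```python
# First cell: every import at the top, seed set once.
import numpy as np
import pandas as pd

SEED = 42  # illustrative constant name (assumption, not part of the rules above)
np.random.seed(SEED)


def inspect(df: pd.DataFrame, label: str) -> None:
    """Hypothetical helper: print shape, dtypes, and head() for a DataFrame."""
    print(f"--- {label} ---")
    print("shape:", df.shape)
    print(df.dtypes)
    print(df.head())


# Toy data, invented purely for this example.
df = pd.DataFrame(
    {
        "group": ["a", "b", "a", "b", "a", "b"],
        "value": np.random.normal(loc=10.0, scale=2.0, size=6),
    }
)
inspect(df, "raw")

# A "significant transformation": aggregate per group, then inspect again.
summary = df.groupby("group", as_index=False)["value"].mean()
inspect(summary, "mean value per group")
```

Wrapping the three prints in one helper keeps later cells short while still meeting the "print after every significant transformation" rule.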

## Analysis Workflow
1. Load and inspect (shape, nulls, dtypes, value counts)
2. Visualise distributions before modelling
3. State hypothesis before test, not after
4. Report effect sizes alongside p-values
5. Validate on a hold-out set — never the training set
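
The five steps above can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the synthetic data, the column names, the choice of Welch's t-test, Cohen's d as the effect size, and logistic regression as the model.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic data, generated only so the example is self-contained.
n = 200
df = pd.DataFrame({"treated": rng.integers(0, 2, size=n), "x": rng.normal(size=n)})
df["outcome"] = 0.5 * df["treated"] + df["x"] + rng.normal(size=n)

# 1. Load and inspect: shape, nulls, dtypes, value counts.
print(df.shape)
print(df.isna().sum())
print(df.dtypes)
print(df["treated"].value_counts())

# 2. Visualise distributions before modelling (omitted here to keep the cell short).

# 3. Hypothesis, written down before running the test.
# H0: mean outcome is the same in the treated and control groups.
treated = df.loc[df["treated"] == 1, "outcome"].to_numpy()
control = df.loc[df["treated"] == 0, "outcome"].to_numpy()
t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)  # Welch's t-test


# 4. Effect size reported next to the p-value (Cohen's d with pooled SD).
def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


d = cohens_d(treated, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, Cohen's d = {d:.2f}")

# 5. Validate on a hold-out set that the model never saw during fitting.
X = df[["x", "outcome"]]
y = df["treated"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = LogisticRegression().fit(X_train, y_train)
print(f"hold-out accuracy: {model.score(X_test, y_test):.2f}")
```

The split is made before any fitting, so the reported accuracy comes only from rows the model did not train on.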

## Plots
Every plot must have:
- Descriptive title
- Labelled axes with units
- Source annotation if data is external
- Colour-blind safe palette (e.g. seaborn's `colorblind` palette: `sns.color_palette("colorblind")`)
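
A minimal matplotlib sketch that ticks all four boxes. The temperature data is synthetic and the station names are hypothetical; the palette comes from seaborn's `color_palette("colorblind")`.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(42)

# Synthetic hourly temperatures, invented for this example.
hours = np.arange(24)
temp_a = 15 + 5 * np.sin(hours / 24 * 2 * np.pi) + rng.normal(scale=0.5, size=hours.size)
temp_b = 12 + 4 * np.sin(hours / 24 * 2 * np.pi) + rng.normal(scale=0.5, size=hours.size)

# Colour-blind safe palette from seaborn.
palette = sns.color_palette("colorblind")

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(hours, temp_a, color=palette[0], label="Station A")  # hypothetical station
ax.plot(hours, temp_b, color=palette[1], label="Station B")  # hypothetical station

# Descriptive title and labelled axes with units.
ax.set_title("Hourly air temperature at two stations")
ax.set_xlabel("Time of day (hours)")
ax.set_ylabel("Temperature (°C)")
ax.legend()

# Source annotation in the bottom-right corner of the figure.
fig.text(
    0.99, 0.01, "Source: synthetic data generated for this example",
    ha="right", va="bottom", fontsize=8,
)
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; for plotly the same palette can be passed as hex strings via `palette.as_hex()`.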
submitted March 22, 2026