@modules/pandas.md @modules/numpy.md @modules/jax.md @modules/optimagic.md @modules/project-structure.md @modules/pytask.md @modules/plotting.md @modules/ml-econometrics.md @modules/dags.md
Guidelines for AI agents, mostly derived from Effective Programming Practices for Economists.
Always use type hints in all function signatures. This is mandatory.
```python
def calculate_utility(consumption: float, gamma: float = 1.5) -> float:
    return consumption ** (1 - gamma) / (1 - gamma)

def clean_data(raw: pd.DataFrame) -> pd.DataFrame: ...

def load_config(path: Path) -> dict[str, Any]: ...
```

- Do NOT use `from __future__ import annotations` in Python 3.14+ projects — PEP 649 deferred evaluation makes it unnecessary, and it changes runtime annotation semantics. For projects supporting < 3.14, use it for forward references.
- Prefer `X | None` over `Optional[X]` in Python 3.10+
- Use `collections.abc` for abstract types: `Sequence`, `Mapping`, `Iterable`
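A minimal sketch combining these preferences (the function and parameter names are hypothetical, for illustration only):

```python
from collections.abc import Mapping, Sequence

def label_periods(
    periods: Sequence[int],
    names: Mapping[int, str] | None = None,  # X | None, not Optional[X]
) -> list[str]:
    """Return a display label for each period, falling back to its number."""
    if names is None:
        return [str(p) for p in periods]
    return [names.get(p, str(p)) for p in periods]
```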
Prefer immutable data structures throughout. This prevents bugs and enables safer concurrent code.
Use `@dataclass(frozen=True)` for all configuration and state objects:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    n_periods: int
    """Number of time periods."""

    n_states: int
    """Number of discrete states per period."""

    discount_factor: float = 0.95
    """Subjective discount factor."""

    @property
    def n_total(self) -> int:
        return self.n_periods * self.n_states
```

- Use `tuple` instead of `list` for sequences
- Use `MappingProxyType` instead of `dict`
- Use `frozenset` instead of `set`
```python
from types import MappingProxyType

@dataclass(frozen=True)
class Labels:
    factors: tuple[str, ...]  # Not list[str]
    mappings: MappingProxyType[str, int]  # Not dict[str, int]

# For read-only dict views
def ensure_immutable[K, V](d: dict[K, V]) -> MappingProxyType[K, V]:
    return MappingProxyType(d)
```

Use `with_*` methods or `dataclasses.replace()` to create modified copies:
```python
from dataclasses import replace
from typing import Self

@dataclass(frozen=True)
class Config:
    alpha: float
    beta: float

    def with_alpha(self, alpha: float) -> Self:
        return replace(self, alpha=alpha)

# Usage
new_config = config.with_alpha(0.5)
```

Use `NewType` to distinguish semantically different values of the same type:
```python
from typing import NewType

Period = NewType("Period", int)
Age = NewType("Age", int)

def get_state(period: Period, age: Age) -> State: ...
```

Use `Enum` instead of string literals or boolean flags:
```python
from enum import Enum, auto

class FactorType(Enum):
    STATE = auto()
    ENDOGENOUS = auto()
    CONTROL = auto()
```

Always use `pathlib.Path` - never string paths.
```python
from pathlib import Path

root = Path(__file__).parent.parent
data_path = root / "datasets" / "data.csv"
```

Three rules:

- Always use `pathlib.Path` objects instead of strings
- Never hardcode absolute paths outside the project directory
- Concatenate paths with the `/` operator
Never use `==` for floats. Use tolerance-based comparison:
```python
# With NumPy/JAX
if np.isclose(result, 0.3):
    ...
```

Minimum Python version is 3.14 unless a project specifies otherwise. Use 3.14+ features freely, including:

- `except ValueError, TypeError:` without parentheses (PEP 758) — this is not Python 2 syntax. It is valid when there is no `as` clause.
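For example, a minimal sketch reusing the `load_config` stub from above (the file name and the empty-dict fallback are illustrative):

```python
from pathlib import Path

try:
    config = load_config(Path("config.yaml"))
except ValueError, TypeError:  # PEP 758: no parentheses needed without an `as` clause
    config = {}
```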
Pixi is the required package and environment manager.
DO:
- `pixi run python script.py` - execute Python scripts
- `pixi run pytest` - run tests
- `pixi run pytask` - run task pipelines
- `pixi add <package>` - add conda-forge dependencies
- `pixi add --pypi <package>` - add PyPI-only packages
- Commit `pixi.lock` for reproducibility
DON'T:
- Never use `pip install` or `conda install` directly
- Never run `python script.py` without the `pixi run` prefix
- Never use the `defaults` conda channel
Use src layout:
```
project/
├── src/
│   └── package/
│       ├── __init__.py
│       └── module.py
├── tests/
└── pyproject.toml
```
- `lowercase_with_underscores` - functions, methods, variables
- `UPPERCASE_WITH_UNDERSCORES` - constants
- `CamelCase` - classes
- Function names start with a verb: `create_`, `calculate_`, `convert_`, `get_`
- Private functions: `_underscore` prefix
- Use `func`, not `fn`, when abbreviating "function" (e.g., `apply_func`)
- Avoid: abbreviations, single letters (`n`, `c`, `s`, `u` conflict with debugger commands), built-in names (`list`, `dict`, `type`)
Write "deep" modules: important public function(s) at the top, private helpers below. Readers should see the API first without scrolling past implementation details.
Never add decorative section-separator comments like:
```python
# ---------------------------------------------------------------------------
# Section name
# ---------------------------------------------------------------------------
```

Code structure should be self-evident from function names and ordering.
Use the Google convention (`pydocstyle.convention = "google"`). Use MyST syntax (not reStructuredText) for markup inside docstrings: `code`, $math$, markdown links.
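A hedged sketch of MyST markup inside a docstring (hypothetical function; the linked path is a placeholder):

```python
def discount_sum(beta: float, payoff: float) -> float:
    r"""Compute the present value of a constant payoff stream.

    Returns $\text{payoff} / (1 - \beta)$ and requires `beta` < 1; see
    [the model notes](docs/model.md) for the derivation.
    """
    return payoff / (1 - beta)
```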
- Imperative mood in summary lines ("Calculate utility", not "Calculates utility")
- Use inline field docstrings (PEP 257) for dataclass attributes (see Frozen Dataclasses example above)
```python
def calculate_utility(consumption: float, gamma: float = 1.5) -> float:
    """Calculate CRRA utility.

    Args:
        consumption: Consumption level (must be positive).
        gamma: Coefficient of relative risk aversion.

    Returns:
        Utility value.
    """
    ...
```

Docstrings and inline comments describe the code's current state in user-facing terms. The 9-month-without-PR-context reader is the audience: a docstring that survives that test stays useful; one that rehearses the diff or the prior implementation rots immediately.
This applies to all docstrings and comments — source and tests. For tests specifically, see also the "Test docstrings — describe behavior, not history" subsection in the Testing section.
State what is true now. Don't reference prior designs, removed code, or what was changed. Words like "earlier", "previously", "now", "formerly", "the old", "before the fix" are red flags.
```python
# Good — forward-looking constraint
class _DiagnosticRow:
    """Metadata captured during the backward-induction loop.

    Holds only Python-scalar metadata — no device-array references —
    so every (regime, period) row stays at a few bytes regardless of
    grid size.
    """
```
```python
# Bad — rehearses prior design
class _DiagnosticRow:
    """Metadata captured during the backward-induction loop.

    Holds only Python-scalar metadata. The earlier design captured
    state_action_space and a closure directly on each row, which
    pinned every period's V template in device memory until the
    post-loop flush.
    """
```

PR references ("#334 removed the host stalls", "the bug was fixed in #42") rot as the codebase evolves and provide no useful signal to a reader who isn't already in context. Magic numbers tied to a specific model size or hardware ("~2 MB at production grid sizes", "fits on a 16 GB device") imply a fixed scale that's only true on whichever model/box the comment was written against. State the qualitative dependency instead.
```python
# Good — qualitative dependency
# Frees per-period intermediate buffers (V_arr-shaped, so
# model-dependent) so they don't stack up across the loop.

# Bad — PR reference + magic number
# Frees per-period intermediate buffers (~2 MB each at production
# grid sizes) so we don't re-introduce the host stalls that #334
# removed.
```

When describing a fixed set of cases (log levels, regime kinds, parameter types, dispatch strategies), use one bullet per case rather than running prose. Bullets scan; prose hides cases.
```python
# Good — scannable
# Gate falls out of the public log level:
# - `"off"` ⇒ nothing (skips even the NaN fail-fast)
# - `"warning"` / `"progress"` ⇒ NaN/Inf only
# - `"debug"` ⇒ adds the min/max/mean trio

# Bad — buried in prose
# Gate falls out of the public log level: `"off"` ⇒ nothing,
# `"warning"` / `"progress"` ⇒ NaN/Inf only, `"debug"` ⇒ adds the
# min/max/mean trio. `"off"` skips even the NaN fail-fast.
```

Write pure functions whenever possible:
- Same inputs → same outputs
- No side effects
```python
# Good: Separate I/O from logic
def task_example(path_in: Path, path_out: Path) -> None:
    data = pd.read_csv(path_in)  # I/O at boundary
    result = process_data(data)  # Pure logic
    result.to_pickle(path_out)  # I/O at boundary

def process_data(df: pd.DataFrame) -> pd.DataFrame:
    """Pure function - all logic here."""
    ...
```

- Raise errors early with descriptive messages
- `TypeError` for wrong types, `ValueError` for wrong values
- Use `_fail_if_...` helper functions for validation
```python
def _fail_if_not_list(data: Any) -> None:
    if not isinstance(data, list):
        msg = f"data must be a list, not {type(data).__name__}"
        raise TypeError(msg)
```

Always write the test first, watch it fail, then implement. No exceptions for new behavior or bug fixes. Tests are not an afterthought; they are the spec.
The cycle:
- Red. Write a failing test that asserts the desired behavior in user-facing terms. Run it. Confirm it fails for the right reason (the missing behavior — not a typo, not an import error).
- Green. Write the smallest amount of code that makes the test pass.
- Refactor. Clean up while keeping the test green.
Apply per case:
- New feature → red-green-refactor.
- Bug fix → reproduce as a failing test before writing the fix. The test then prevents regression.
- Refactor (no behavior change) → existing tests are the spec. Keep them green before, during, and after. No new test needed if behavior is unchanged; if you find a behavior gap, fill it with a new test before refactoring.
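A minimal red-green sketch, reusing the `calculate_utility` example from the top of this file (the import path and the `gamma == 1` branch are hypothetical):

```python
import math

import pytest

from package.utility import calculate_utility  # hypothetical import path

# Red: assert the desired behavior (log utility in the gamma == 1 limit).
# Against the current implementation this fails with ZeroDivisionError,
# i.e. for the right reason: the behavior is missing.
def test_calculate_utility_equals_log_utility_when_gamma_is_one() -> None:
    assert calculate_utility(consumption=2.0, gamma=1.0) == pytest.approx(math.log(2.0))

# Green: make the smallest change that passes (a `gamma == 1` branch returning
# math.log(consumption)), then refactor while keeping the test green.
```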
Test docstrings state what should be true, in user-facing terms. Pretend the reader has never seen the PR. They should not need to.
```python
# Good — behavior, in plain language
def test_simulate_with_chained_transitions_yields_expected_next_wealth():
    """`next_wealth_t = wealth_t - c_t + 0.1 * next_aime_t` holds in simulation."""

# Bad — rehearses the prior bug or implementation history
def test_solve_resolves_chain_via_dags():
    """Before the fix, `_resolve_fixed_params` raised
    `InvalidParamsError: Missing required parameter: ...` because
    `create_regime_params_template` classified ..."""
```

Rule of thumb: would the docstring still make sense in 9 months without the PR context? If not, rewrite it.
Assert what the result is, not just that it didn't crash.
```python
# Good — analytical value with explicit tolerance
np.testing.assert_allclose(curr["wealth"], expected_next_wealth, atol=1e-6)

# Bad — passes whether the math is right or not
assert not jnp.any(jnp.isnan(V_arr))
assert df["wealth"].notna().all()
```

`not isnan` and "no exception raised" belong in CI smoke tests, not in the unit tests for the feature itself.
- Test files: `test_<module>.py`
- Test functions: `test_<function>_<behavior>`
- One assertion per test
- Use `@pytest.mark.parametrize` for multiple inputs
```python
@pytest.mark.parametrize("invalid_input", [-77, "typo"])
def test_clean_scale_raises_on_invalid(invalid_input: Any) -> None:
    with pytest.raises(ValueError):
        clean_scale(pd.Series([invalid_input]))
```

Use `ty` (not mypy, not pyright) for type checking.
- Run via `pixi run ty`
- Suppress errors with `# ty: ignore[rule-name]` (not `# type: ignore`)
- Always specify the rule name in ignore comments
```python
# Good
x = some_call()  # ty: ignore[unresolved-reference]

# Bad - don't use type: ignore
x = some_call()  # type: ignore
```

Run these checks after making code changes. Skip any that don't apply to the project.
- Pre-commit: Stage new files, then `pixi run prek run --all-files` (or `prek run --all-files` if globally installed). Fix any failures.
- Tests: `pixi run tests` (or the project's test task).
- Type checking: `pixi run ty`.
- Notebook diffs: If `.ipynb` files changed:
  - Verify the diff looks like clean cell-content changes, not JSON noise (cell metadata, execution counts, output blobs). If the diff is bloated, the notebook was not properly stripped — run `nbstripout` before committing.
  - Make sure notebook cells are properly formatted (each line in a cell is a new JSON line, not one cell = one line).
- Use actual UTF-8 characters everywhere — in markdown cells, Python strings, and f-strings. Never write unicode escapes like `\u2014` or `\u03bc`; write `—` and `μ` directly.
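For example (the variable `mean` is hypothetical):

```python
mean = 0.42

label = f"μ = {mean:.2f} — baseline"            # good: literal characters
label = f"\u03bc = {mean:.2f} \u2014 baseline"  # bad: unicode escapes
```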