Add knowledgebase guide for custom PyMC models by drbenvincent · Pull Request #741 · pymc-labs/CausalPy

drbenvincent · 2026-02-27T14:55:52Z

Summary

Closes #740

Adds a knowledgebase notebook (docs/source/knowledgebase/custom_pymc_models.ipynb) documenting how to customize PyMC models in CausalPy
Documents two levels of customization:
1. Prior-based (no subclassing) — change parameter priors and the likelihood distribution via the Prior class. References the existing Student-t example in its_pymc_comparative.ipynb.
2. Subclassing PyMCModel — for structural changes like link functions, latent variables, or custom error structure
Documents the PyMCModel naming contract (required variable and dimension names) as a reference table — this was previously undocumented outside docstring examples
Includes a worked LogLinearRegression example (mu = exp(X @ beta)) for positive-valued outcomes, demonstrating the subclassing pattern with an ITS analysis on simulated revenue data

Rough edges noted for follow-up

While researching this, I identified some rough edges that could be addressed in future PRs:

Variable names ("X", "y", "mu", "y_hat", "beta") are hardcoded strings — could be class attributes to make them overridable
No validation after build_model() — violating the naming contract produces cryptic KeyErrors rather than helpful messages
print_coefficients() assumes the model has "beta" — custom models with different parameterizations silently break

Test plan

Notebook executes successfully (MCMC sampling converges, no divergences)
pre-commit run --all-files passes
Summary, coefficients, and plots render correctly

Made with Cursor

review-notebook-app · 2026-02-27T14:55:58Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

codecov · 2026-02-27T15:02:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.63%. Comparing base (b5e8d6b) to head (4ff296a).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #741      +/-   ##
==========================================
+ Coverage   94.60%   94.63%   +0.02%     
==========================================
  Files          46       46              
  Lines        7602     7602              
  Branches      462      462              
==========================================
+ Hits         7192     7194       +2     
+ Misses        249      248       -1     
+ Partials      161      160       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

read-the-docs-community · 2026-02-27T15:14:05Z

Documentation build overview

📚 causalpy | 🛠️ Build #31599742 | 📁 Comparing 4ff296a against latest (b5e8d6b)

🔍 Preview build

Show files changed (3 files in total): 📝 2 modified | ➕ 1 added | ➖ 0 deleted

File	Status
404.html	📝 modified
knowledgebase/custom_pymc_models.html	➕ added
knowledgebase/index.html	📝 modified

drbenvincent · 2026-02-27T17:58:11Z

🤖 Automated Agent Review

Note: This review was generated by an AI coding assistant (Cursor agent), not a human reviewer. Findings should be validated by a human before acting on them.

Overall Assessment

This is a well-structured documentation PR that fills a genuine gap. The PyMCModel naming contract was previously undocumented outside of code, and the two-level customization framing (Prior system → subclassing) is pedagogically sound. The worked LogLinearRegression example is a good choice — simple enough to follow but demonstrates a real-world need (positive-valued outcomes).

I verified the contract table against the source in pymc_models.py — all variable names, dimension names, and "Used by" references are accurate.

Issues and Suggestions

1. Missing mention of priors_from_data() — omitted customization lever

The base class provides a priors_from_data(X, y) method that allows data-adaptive priors computed during fit(). This is a meaningful customization point that sits between "pass custom priors" and "subclass everything." The notebook should at least mention it exists, especially since it's relevant when subclassing (e.g., scaling priors based on log(y).std() for the log-link model).

2. The issue's concern about model misuse is unaddressed

Issue #740 explicitly warns: "People who reach with this concern usually are doing very crazy wrong models... we'll open the door to do wrong models and because output looks nice, they might think [they] are correct." The notebook has no warning about model validation. Consider adding a :::{warning} callout advising users to validate custom models against known DGPs and check convergence diagnostics before interpreting causal effects.

3. Notebook outputs bloat the repo (~845 KB)

The diff is ~845 KB, almost entirely from embedded plot outputs (~98K + ~342K + ~404K chars). Check whether the project convention is to strip outputs from knowledgebase notebooks (e.g., via nbstripout) and regenerate them during docs builds. If so, stripping outputs would reduce the PR size by ~99%.

4. No minimal sample_kwargs for CI/docs-build efficiency

The two MCMC cells use default chains=4, draws=1000, tune=1000. For a documentation notebook, consider reducing to e.g. chains=2, draws=500, tune=500 to speed up docs builds without meaningfully affecting the pedagogical output.

5. Contract table scope is unclear

The contract applies to models used with the standard experiment workflow (InterruptedTimeSeries, DifferenceInDifferences, SyntheticControl). However, InstrumentalVariableRegression and PropensityScore in the same module use entirely different naming conventions ("beta_t", "beta_z", "mu_y", etc., no "treated_units" dimension). The notebook should clarify that the contract applies to experiment-compatible models specifically.

6. No glossary term linking

Per project conventions, glossary terms should be linked on first mention using {term} syntax. Terms like "counterfactual" and "credible interval" likely have glossary entries worth linking.

7. treated_units boilerplate could be a base-class concern

Both LinearRegression and the new LogLinearRegression include defensive boilerplate for adding a default "treated_units" coordinate. Rather than documenting this as something users must remember, consider moving it into the base class fit() method as a follow-up improvement.

8. Does this fully close #740?

The PR documents existing capability (subclassing already works). The issue also envisions a simpler interface and mentions PyMC-Marketing integration. Documenting the current approach is a valid step, but "Closes #740" may be premature — consider "Partially addresses #740" or keeping the issue open for API-level improvements.

Minor / Cosmetic

%load_ext autoreload / %autoreload 2 is useful during development but unnecessary in a committed knowledgebase notebook.
The DGP uses only sin but the model fits sin + cos — a one-line comment explaining why would help readers who notice the mismatch.
Branch naming: fix/740-... → docs/740-... would better reflect the PR type.

Summary Table

Dimension	Rating
Technical accuracy	✅ Strong — contract verified against source
Pedagogical quality	✅ Good — clear progression, effective comparison
Completeness	⚠️ Moderate — missing `priors_from_data()`, misuse warning, scope clarification
Repo hygiene	⚠️ Needs work — large outputs, default sampling, dev magics
Issue coverage	⚠️ Partial — documents existing capability, doesn't address API improvements

The core content is valuable and mostly ready. Main action items before merge: (1) add a model validation warning, (2) address notebook output size, (3) mention priors_from_data(), and (4) clarify the contract's scope.

Document the two levels of model customization in CausalPy: 1. Prior-based (changing priors and likelihood via the Prior class) 2. Subclassing PyMCModel (for structural changes like link functions) Includes the PyMCModel naming contract, a worked LogLinearRegression example, and an ITS demonstration with simulated revenue data. Closes #740 Made-with: Cursor

Replace the revenue trend data with a sine-wave DGP where values oscillate close to zero at seasonal troughs. Add a side-by-side comparison with the default LinearRegression to show that its HDI bands extend into negative territory — the core motivation for the log-link model. Made-with: Cursor

Bump log-scale amplitude from 1.5 to 2.2 and add a -0.3 baseline shift. Trough values now reach ~0.07, making the linear model's negative HDI bands visually obvious. Made-with: Cursor

drbenvincent requested a review from cetagostini February 27, 2026 14:55

drbenvincent added the documentation Improvements or additions to documentation label Feb 27, 2026

drbenvincent added 3 commits February 28, 2026 09:52

Increase seasonal amplitude so troughs dip closer to zero

4ff296a

Bump log-scale amplitude from 1.5 to 2.2 and add a -0.3 baseline shift. Trough values now reach ~0.07, making the linear model's negative HDI bands visually obvious. Made-with: Cursor

drbenvincent force-pushed the fix/740-custom-pymc-model-docs branch from 642cced to 4ff296a Compare February 28, 2026 09:54

drbenvincent mentioned this pull request Feb 28, 2026

Feature parity with Google's CausalImpact #758

Open

24 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add knowledgebase guide for custom PyMC models#741

Add knowledgebase guide for custom PyMC models#741
drbenvincent wants to merge 3 commits intomainfrom
fix/740-custom-pymc-model-docs

drbenvincent commented Feb 27, 2026

Uh oh!

review-notebook-app bot commented Feb 27, 2026

Uh oh!

codecov bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

read-the-docs-community bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

drbenvincent commented Feb 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

drbenvincent commented Feb 27, 2026

Summary

Rough edges noted for follow-up

Test plan

Uh oh!

review-notebook-app bot commented Feb 27, 2026

Uh oh!

codecov bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

read-the-docs-community bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Documentation build overview

Uh oh!

drbenvincent commented Feb 27, 2026

🤖 Automated Agent Review

Overall Assessment

Issues and Suggestions

Minor / Cosmetic

Summary Table

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov bot commented Feb 27, 2026 •

edited

Loading

read-the-docs-community bot commented Feb 27, 2026 •

edited

Loading