
API change for the SyntheticControl experiment class #460


Draft: wants to merge 28 commits into base: main


@drbenvincent (Collaborator) commented Apr 21, 2025

  • Towards #456 (Upgrade synthetic control to model multiple treated units).
  • This does not yet enable multiple treated units in synthetic control experiments, but it implements important prep work that will enable it. The changes focus on:
    • API changes.
    • Bit of a spaghetti situation, but I had to switch from storing pandas DataFrames to xarray.DataArray objects, which helps with broadcasting. The model functions receive different inputs depending on the situation (e.g. the experiment type), so things were getting complicated. Xarray simplifies broadcasting in functions like PyMCModel.calculate_impact. (A fair amount of time went into trying different solutions before it became clear that the xarray approach was the easiest.)
    • Because the API is changing, I elected to remove some legacy backward-compatibility/deprecation handling. So the next release will include breaking API changes and will drop backward compatibility with the old API. I'm not fussed about this: we are currently at version 0.x, so people should expect the API to change until we reach 1.x.
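The benefit of moving to xarray described above is that arithmetic aligns arrays on dimension names rather than on axis positions. The following is a minimal sketch of that behaviour, assuming hypothetical dimension names (`chain`, `draw`, `obs_ind`, `treated_units`) and a hypothetical stand-alone `calculate_impact` function. It is not CausalPy's `PyMCModel.calculate_impact`, only an illustration of why named-dimension broadcasting removes the need for per-experiment reshaping.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_chains, n_draws, n_obs = 2, 50, 10
obs = np.arange(n_obs)
units = ["actual"]  # one treated unit; a list leaves room for several

# Observed outcome (hypothetical data): dims (obs_ind, treated_units).
y = xr.DataArray(
    rng.normal(size=(n_obs, len(units))),
    dims=("obs_ind", "treated_units"),
    coords={"obs_ind": obs, "treated_units": units},
)

# Posterior predictive counterfactual: dims (chain, draw, obs_ind, treated_units).
mu = xr.DataArray(
    rng.normal(size=(n_chains, n_draws, n_obs, len(units))),
    dims=("chain", "draw", "obs_ind", "treated_units"),
    coords={"obs_ind": obs, "treated_units": units},
)


def calculate_impact(y_true: xr.DataArray, y_pred: xr.DataArray) -> xr.DataArray:
    """Hypothetical sketch: impact = observed - counterfactual.

    xarray broadcasts by dimension name, so chain/draw are added
    automatically and the axis order of the inputs does not matter.
    """
    return y_true - y_pred


impact = calculate_impact(y, mu)
print(dict(impact.sizes))

# Same result if the posterior arrives with its axes in a different order.
mu_reordered = mu.transpose("treated_units", "draw", "obs_ind", "chain")
impact_t = calculate_impact(y, mu_reordered)
print(float(abs(impact_t - impact).max()))  # 0.0

# Posterior mean impact per time point and treated unit.
print(impact.mean(("chain", "draw")).dims)
```

With plain NumPy arrays the reordered posterior would either raise a shape error or, worse, broadcast silently into the wrong shape; with named dimensions the same one-line subtraction works for one or many treated units.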

Remaining tasks:

  • Check that the multi-cell geolift notebook is working as expected
  • Fix failing doctest
  • Resolve the error below

Remaining bug, not captured by tests

In the scikit-learn synthetic control notebook, we are getting an error in the second part where we call

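```python
import causalpy as cp
from sklearn.linear_model import LinearRegression

# df and treatment_time are defined earlier in the notebook
result = cp.SyntheticControl(
    df,
    treatment_time,
    control_units=["a", "b", "c", "d", "e", "f", "g"],
    treated_units=["actual"],
    model=LinearRegression(positive=True),
)
```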

There is a broadcasting issue resulting in this:
[Screenshot 2025-04-23 at 10 34 27: the resulting error]

So I need to think about whether doing this (a linear model in a synthetic control experiment) even makes sense. If it does, I need to add a test to catch this error, because it is not currently covered by tests, and then fix it.
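The traceback is only in the screenshot, so the cause is not confirmed here. One common way a broadcasting error arises when scikit-learn predictions meet per-unit arrays is a 1-D vs 2-D shape mismatch: for example, `LinearRegression.predict` returns shape `(n_samples,)` when the model was fitted on a 1-D target, and NumPy silently broadcasts `(n, 1) - (n,)` to `(n, n)`. The sketch below shows that trap and a hypothetical helper, `as_obs_by_unit`, that coerces predictions to `(obs_ind, treated_units)` shape. It is an assumption about the kind of check a fix or regression test might use, not CausalPy code.

```python
import numpy as np

n_obs, n_units = 5, 1

# Observed outcome for the treated unit(s), laid out as (obs_ind, treated_units).
y_obs = np.linspace(0.0, 4.0, n_obs).reshape(n_obs, n_units)

# A 1-D prediction vector, e.g. from a regressor fitted on a 1-D target.
y_pred = np.linspace(0.0, 4.0, n_obs)

# The trap: (5, 1) - (5,) broadcasts to (5, 5) without raising an error.
naive_impact = y_obs - y_pred
print(naive_impact.shape)  # (5, 5)


def as_obs_by_unit(pred, n_units):
    """Hypothetical helper: coerce predictions to shape (n_obs, n_units)."""
    pred = np.asarray(pred, dtype=float)
    if pred.ndim == 1:
        pred = pred[:, np.newaxis]  # (n_obs,) -> (n_obs, 1)
    if pred.ndim != 2 or pred.shape[1] != n_units:
        raise ValueError(f"expected (n_obs, {n_units}) predictions, got {pred.shape}")
    return pred


impact = y_obs - as_obs_by_unit(y_pred, n_units)
print(impact.shape)  # (5, 1)
```

A regression test along these lines would assert that the impact has one column per treated unit, which is exactly the property the silent `(n, n)` broadcast violates.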


📚 Documentation preview 📚: https://causalpy--460.org.readthedocs.build/en/460/

@drbenvincent marked this pull request as draft on April 21, 2025, 10:08


codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 97.43590% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.40%. Comparing base (273daa2) to head (1f753e9).

| File with missing lines | Patch % | Lines |
|---|---|---|
| causalpy/experiments/synthetic_control.py | 95.65% | 2 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #460      +/-   ##
==========================================
- Coverage   94.67%   94.40%   -0.27%     
==========================================
  Files          32       29       -3     
  Lines        2196     2075     -121     
==========================================
- Hits         2079     1959     -120     
+ Misses        117      116       -1     
```
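The Codecov figures can be cross-checked from the coverage diff: coverage is hits divided by lines, and Codecov's documented default is to round down to two decimal places. This small sketch reproduces the project-level numbers; it also shows that "97.43590% with 2 lines missing" corresponds to 76 of 78 changed lines covered, a figure derived here rather than stated in the report.

```python
import math


def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    """Coverage as a percentage, rounded down (Codecov's default rounding)."""
    factor = 10 ** precision
    return math.floor(hits / lines * 100 * factor) / factor


# Figures from the coverage diff.
base = coverage_pct(2079, 2196)  # main
head = coverage_pct(1959, 2075)  # this PR
print(base, head, round(head - base, 2))  # 94.67 94.4 -0.27

# Patch coverage: with 2 missing lines, 1 - 2/n = 0.9743590 gives n = 78 changed lines.
n_patch = round(2 / (1 - 0.9743590))
print(n_patch, round((n_patch - 2) / n_patch * 100, 5))  # 78 97.4359
```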

