Conversation

@tristan-f-r
Collaborator

@tristan-f-r tristan-f-r commented Jul 14, 2025

Note

This also contains changes that close #297 to resolve a dependency diamond. I debated splitting that into another PR, but that 'containers' PR would depend on #292 (giving this two degrees of dependency), and the types in PRA#run actually make that change easier to follow and motivate.

This also borrows (though not as a direct dependency) from #286. #286 should be merged after #329 is merged, as there's a good chance that we can just auto-generate documentation from the documentation here instead.

Closes #321, closes #296, and closes #297.

Arguments are now specified as a pydantic BaseModel with attached documentation:

from typing import Optional

from pydantic import BaseModel, ConfigDict

class DominoParams(BaseModel):
    module_threshold: Optional[float] = None
    "the p-value threshold for considering a slice as relevant (optional)"

    slice_threshold: Optional[float] = None
    "the p-value threshold for considering a putative module as final module (optional)"

    model_config = ConfigDict(use_attribute_docstrings=True)
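To make the validation behavior concrete, here is a minimal, self-contained sketch of how pydantic handles such a model. This assumes pydantic v2.7+ (for `use_attribute_docstrings`) and gives the fields `= None` defaults so they are genuinely optional; it is an illustration, not SPRAS's actual model file:

```python
# Self-contained sketch of how pydantic (v2) validates such a parameter model.
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError

class DominoParams(BaseModel):
    module_threshold: Optional[float] = None
    "the p-value threshold for considering a slice as relevant (optional)"

    slice_threshold: Optional[float] = None
    "the p-value threshold for considering a putative module as final module (optional)"

    model_config = ConfigDict(use_attribute_docstrings=True)

params = DominoParams(module_threshold=0.05)
print(params.slice_threshold)  # None (omitted, so the default applies)

try:
    # lax coercion still rejects strings that are not parseable as floats
    DominoParams(module_threshold="not-a-number")
except ValidationError:
    print("rejected: module_threshold must be a float")
```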

When constructing a PRM, this is passed in as a generic:

class DOMINO(PRM[DominoParams]):
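For illustration, a hypothetical sketch of how such a generic base class can be wired up. The `PRM` and `PathLinker` bodies below are invented for this example and are not SPRAS's actual implementations:

```python
# Hypothetical sketch: a generic PRM base class parameterized by its
# pydantic argument model, so run() receives a typed `args` object.
from typing import Generic, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

class PRM(Generic[T]):
    @classmethod
    def run(cls, args: T) -> str:
        # A real implementation would launch the algorithm; here we only
        # show that `args` arrives as a validated model instance.
        return f"running {cls.__name__} with {args!r}"

class PathLinkerParams(BaseModel):
    k: int = 100

class PathLinker(PRM[PathLinkerParams]):
    pass

print(PathLinker.run(PathLinkerParams(k=5)))
```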

For algorithms that don't specify parameters, the Empty type is preferred instead (whose signature is the empty BaseModel with an attached model_config that doesn't allow any other parameters).
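A sketch of what such an Empty model could look like, assuming pydantic v2's `extra="forbid"` is the mechanism used to reject stray parameters:

```python
# Sketch of an Empty parameter model: no fields, and extra="forbid"
# so any unexpected keyword is rejected at validation time.
from pydantic import BaseModel, ConfigDict, ValidationError

class Empty(BaseModel):
    model_config = ConfigDict(extra="forbid")

Empty()  # fine: the algorithm takes no parameters

try:
    Empty(k=100)
except ValidationError as e:
    print("rejected unexpected parameter:", e.error_count(), "error")
```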

This also changes the signature of PRA#run (reflecting the PR title) to be:

# Where T is the TypeVar, which must be bound to pydantic.BaseModel
def run(inputs: dict[str, str | os.PathLike], output_file: str | os.PathLike, args: T, container_settings: ProcessedContainerOptions):

This has the disadvantage that inputs no longer has code completion, but that was probably not something to hide from the developer-user anyway, since we were passing inputs in via kwargs. See:

# This PR is more verbose when passing in arguments, which is a problem for people who
# are directly using PRA#run. However, I don't care about this audience.
PathLinker.run({"nodetypes": TEST_DIR + 'input/sample-in-nodetypes.txt',
                "network": TEST_DIR + 'input/sample-in-net.txt'},
               output_file=OUT_FILE_100,
               args=PathLinkerParams(k=100))

All of this gives us:

  • Encouraged, easily parsable PRM argument documentation
  • Parameter validation
  • A fully specified JSON schema (feat: json schema #358)
  • Default factories, which are used to fix nondeterminism (feat: seeds #335)
  • Parameter types, to be used for parameter tuning to automatically determine which parameters to select
    • (and AST-based range parsing so we can easily divide the step size for parameter tuning)
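As one concrete example of the default-factory point above, a hedged sketch of pinning down nondeterminism with pydantic's `default_factory`. The `seed` field name here is illustrative, not necessarily what #335 uses:

```python
# Illustrative use of default_factory: if the user omits a seed, one is
# drawn and stored on the model, so a run can be reproduced later.
import random

from pydantic import BaseModel, Field

class SeededParams(BaseModel):
    seed: int = Field(default_factory=lambda: random.randrange(2**32))

a = SeededParams()          # seed drawn automatically, but recorded on the model
b = SeededParams(seed=42)   # or pinned explicitly
print(b.seed)  # 42
```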

For reviewers

algorithms.py is the only "dense code" in this PR. That file extends our previous eval system and includes some hard-to-follow workflow code that enables type checking for our config file.

Otherwise, most of this is just refactoring all of the PRA#run calls to meet the new signature, specifying the new parameter models, and adding more parameter documentation.

@tristan-f-r tristan-f-r changed the title Config args refactor!: typed PRA#run Jul 14, 2025
@tristan-f-r tristan-f-r changed the title refactor!: typed PRA#run feat!: typed PRA#run Jul 14, 2025
@tristan-f-r tristan-f-r added enhancement New feature or request needed for benchmarking Priority PRs needed for the benchmarking paper labels Jul 14, 2025
@tristan-f-r tristan-f-r changed the title feat!: typed PRA#run feat!: schema & typed PRA#run Jul 15, 2025
@ntalluri ntalluri requested review from agitter and ntalluri November 19, 2025 21:43
Collaborator

@ntalluri ntalluri left a comment


This is my first pass of this PR.

Could you add an example demonstrating how these files and components fit together for an example algorithm and its configuration file, and show how the new files are used for validation and execution?

Also, for someone looking to integrate a new algorithm into SPRAS, what details or requirements should they be aware of? I’m assuming the new key piece would involve the Pydantic models.

Co-authored-by: Neha Talluri <[email protected]>
@tristan-f-r
Collaborator Author

Could you add an example demonstrating how these files and components fit together for an example algorithm and its configuration file, and show how the new files are used for validation and execution?

I'm confused by this question: we already do this with the present algorithms.

@ntalluri
Collaborator

Could you add an example demonstrating how these files and components fit together for an example algorithm and its configuration file, and show how the new files are used for validation and execution?

I'm confused by this question: we do this with the present algorithms.

What I mean is writing down, in a paragraph, what happens for one example in this PR: spelling out step by step how everything works together.

@tristan-f-r
Collaborator Author

tristan-f-r commented Nov 21, 2025

We have example usage in the PR description 👍

As for implementation details, the important part is the schema:

Algorithm files (e.g. pathlinker.py) do not depend on the schema, but rather depend on schema objects imported by algorithms.py, which is why we also need to separate out containers.py to avoid the dependency diamond.

Collaborator

@agitter agitter left a comment


I can follow the overall design goals. My big picture takeaway is that I can see why we need these changes, but the typing adds indirection and hurts code readability. I don't have a solution for that.

I haven't looked at every pathway reconstruction algorithm and test case update yet, so I'll need to take at least one more pass. I wanted to leave some initial comments.

algorithms.py has some sophisticated Python. I am fairly sure I understand it when reviewing it today. I'm not sure about my ability to troubleshoot things if/when they break.

Collaborator

@agitter agitter left a comment


My only new comments are small.

In the spirit of helping new contributors who encounter this code base in the future, I'm wondering where to capture some of the information about the overall SPRAS design, especially the part that is changing here. Some of the useful information in the original message of this PR, e.g.

Arguments are now specified as a pydantic BaseModel with attached documentation:

provides a guide to how SPRAS works and where to find things. Is there a place to retain that knowledge in the repo? Scrolling through individual files to reconstruct it is going to get harder and harder.

@tristan-f-r
Collaborator Author

provides a guide to how SPRAS works and where to find things. Is there a place to retain that knowledge in the repo? Scrolling through individual files to reconstruct it is going to get harder and harder.

The best place to document that would be the contributing guide. We could increase the sophistication of our AllPairs wrapping example to take in an optional argument, but I'm not sure what that argument would be.


Labels

enhancement: New feature or request
needed for benchmarking: Priority PRs needed for the benchmarking paper
P-high: This is a blocker for many PRs/issues/features
tuning: Workflow-spanning algorithm tuning


Development

Successfully merging this pull request may close these issues.

  • Typed PRA#run
  • [config] nested runs
  • [config] containers

3 participants