Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CWL job chaining tests #3696

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

Conversation

douglowe
Copy link
Contributor

@douglowe douglowe commented Jul 2, 2021

This PR is a prototype for the test suite to determine if Toil is chaining CWL jobs as we expect / would like.

I'm using a list of Toil tasks, each containing a list of the CWL jobs completed within that task, to test if chaining has occurred or not. The list of tasks is compiled from the stats outputs (requiring the --stats and --logDebug flags) - if a more robust method for getting this information could be suggested I'd be very happy to change this code.

To show the expected outputs I've included an example of two CWL workflows, and a test which one fails and the other passes (because of the subworkflow wrapping the first 2 CWL jobs). While we're working out the best way to write these tests I'll work on more substantial tests for the final setup.

Changelog Entry

To be copied to the draft changelog by merger:

  • PR submitter writes their recommendation for a changelog entry here

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/cliOptions.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

The md_list_reduced*cwl workflows are added.
md_list_reduced_2nests.cwl will chain 2 jobs,
while for md_list_reduced.cwl (currently) no
jobs are chained.

The test uses the stats logfiles (so toilcwl
must be run with --stats and --logDebug),
pulling out the lists of CWL jobs which are
run in each TOIL task.
@mr-c
Copy link
Contributor

mr-c commented Jul 3, 2021

@douglowe Are both tests supposed to pass?

@douglowe
Copy link
Contributor Author

douglowe commented Jul 3, 2021

@douglowe Are both tests supposed to pass?

The second test should fail. Really I should have made the test not equal, but for this prototype I wanted it to display what was being tested, so made it fail the test, rather than testing for failure.

@mr-c
Copy link
Contributor

mr-c commented Jul 4, 2021

@douglowe Are both tests supposed to pass?

The second test should fail. Really I should have made the test not equal, but for this prototype I wanted it to display what was being tested, so made it fail the test, rather than testing for failure.

Okay, I added 72df69d so your new test would get run as part of the CI

mr-c and others added 5 commits July 4, 2021 17:54
The parent thread now pickle's toilState, passing
it to the worker thread, where it is unpickled again.

A test for completed predecessors for jobs with multiple
predecessors has been added to the worker, so that some
basic job chaining can be performed for jobs whose remaining
predecessor completes within that task.

This change does result in the parent thread not correctly
noting jobs which had multiple predecessors have now been
completed correctly. So some of the checks at the end of
the internal job routine have had to be switched off for the
moment.
Copy link
Contributor

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some complaints from the type checker:

src/toil/worker.py:59: error: Function is missing a type annotation
src/toil/worker.py:101: error: Function is missing a type annotation for one or more arguments
src/toil/worker.py:185: error: Function is missing a type annotation for one or more arguments
src/toil/utils/toilDebugJob.py:49: error: Missing positional argument "parentToilState" in call to "workerScript"

https://ucsc-ci.com/databiosphere/toil/-/jobs/91029#L1518

Copy link
Contributor

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @douglowe

As of https://github.com/DataBiosphere/toil/pull/3776/files#diff-86e13719d8c065346eadea6d6e3735e45c07874bea3be19d5f7396a22858f18fL83 jobsToBeScheduledWithMultiplePredecessors is no longer a Dict but a Set so I think your code needs some changes to catch up.

@mr-c
Copy link
Contributor

mr-c commented Sep 26, 2021

Just to see if the recent changes in Toil 5.5.0 fixed this, I ran the tests without the other code changes and test_biobb_fail still fails:

AssertionError: Lists differ: [['editconf'], ['genion'], ['grompp'], ['pdb2gmx'], ['solvate']] != [['genion', 'grompp', 'pdb2gmx', 'editconf', 'solvate']]

First differing element 0:
['editconf']
['genion', 'grompp', 'pdb2gmx', 'editconf', 'solvate']

First list contains 4 additional elements.
First extra element 1:
['genion']

- [['editconf'], ['genion'], ['grompp'], ['pdb2gmx'], ['solvate']]
+ [['genion', 'grompp', 'pdb2gmx', 'editconf', 'solvate']]

@mr-c
Copy link
Contributor

mr-c commented Nov 30, 2021

I pushed this to our repo as issues/3696-cwl-job-chaining-tests so that GitLab CI will run

Comment on lines +90 to +92
successor = toilState.jobsToBeScheduledWithMultiplePredecessors[
successor.jobStoreID
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @douglowe , it has been clarified that jobsToBeScheduledWithMultiplePredecessors is a set[str], not a dictionary. Could you refresh this test in light of changes made since you started this PR?

Thanks again!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh - yes, I will correct that now!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - my brain is slowly getting back up to speed with this. Not quite the quick fix I was thinking of - as my code was using the dictionary of job information cached here to check what successors the current job had.

I think that a discussion will be needed with the Toil devs, to see where the main Toil development arc for the JobStore is heading. IIRC the intent was to have a mutable central JobStore database, which could be accessed & changed by all jobs (not just the parent job) - which would make the hashed jobstore that my hacky code passes around redundant.

Perhaps we should cut out the test code that I wrote (though this also needs some work, as the regex doesn't match the current output stored in the Stats directory), and save that - as it still shows what kind of workflow in which I'd hope full job chaining would occur. Then, separately, we can work on correcting the information flow and logic for working out what jobs can be chained and which can't?

@adamnovak
Copy link
Member

This is supposed to fix #3697.

@adamnovak
Copy link
Member

@douglowe It seems like it might make sense for us to schedule a meeting with us and @DailyDreaming to talk about getting this feature actually merged, and how it changes the scheduling design. Is that something you would be interested in?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants