Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

family add and prune fix #6589

Open
wants to merge 4 commits into
base: 8.4.x
Choose a base branch
from

Conversation

dwsutherland
Copy link
Member

@dwsutherland dwsutherland commented Feb 4, 2025

Related to discussion in cylc/cylc-ui#1999

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests included.
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@dwsutherland dwsutherland added bug Something is wrong :( small labels Feb 4, 2025
@dwsutherland dwsutherland self-assigned this Feb 4, 2025
@MetRonnie MetRonnie added this to the 8.4.1 milestone Feb 5, 2025
@MetRonnie MetRonnie self-requested a review February 5, 2025 10:01
@dwsutherland dwsutherland force-pushed the clean-old-families branch 2 times, most recently from 6858c7a to 6d27d42 Compare February 10, 2025 08:39
@oliver-sanders oliver-sanders requested a review from wxtim February 13, 2025 14:34
Copy link
Member

@wxtim wxtim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like a reproducible workflow example.

Must Have

  • Duplicate test test_update_data_structure.
  • test_update_data_structure failing.

Nice to have

  • More comments

Take on board or ignore

(as your time and energy permit)

  • Wider refactoring of test_data_store_mgr.py

@dwsutherland dwsutherland force-pushed the clean-old-families branch 2 times, most recently from 90f69e9 to ef5a78a Compare February 17, 2025 00:17
@dwsutherland
Copy link
Member Author

dwsutherland commented Feb 17, 2025

Thanks for the review Tim

I'd really like a reproducible workflow example.

Here's another example in the ticket..
cylc/cylc-ui#1999 (comment)

I'm not sure the criteria for the families being visible in the UI, but graphql will show family proxies that accumulate every cycle.. (due to how suicide triggers add and remove items)

Must Have

  • Duplicate test test_update_data_structure.
  • test_update_data_structure failing.

Yeah, duplicate removed.

Nice to have

  • More comments

More comments added

Take on board or ignore

(as your time and energy permit)

  • Wider refactoring of test_data_store_mgr.py

I've refactored the code around the tests I added.. Will leave a wider refactor to a bigger change (which there may be in the near future)

Note:
If I undo my fix, the new tests fail as expected.

@wxtim
Copy link
Member

wxtim commented Feb 17, 2025

@dwsutherland - Unfortunately I couldn't replicate the bug with your example. 😢

@dwsutherland
Copy link
Member Author

@dwsutherland - Unfortunately I couldn't replicate the bug with your example. 😢

Did you use graphiql to try view the accumulation of family proxies?

query {
  workflows(ids:["<id>"]) {
    familyProxies {
      id
    }
  }
}

with that example workflow, it was 100% reproducible ... (even though the UI doesn't show them)

@dwsutherland
Copy link
Member Author

Note poll_exist has a hard coded workflow.. but should just be set the itself (strigger/run1 in this case):

[scheduling]
    initial cycle point = 20250204T2240Z
    sequential xtriggers = True
    [[xtriggers]]
        poll_exist = workflow_state(strigger/run1//%(point)s/spawn)
        poll_non_exist = workflow_state(this/workflow//%(point)s/doesnt_exist)
    [[graph]]
        PT1M = """
@wall_clock => spawn
spawn => foo & purge
@poll_non_exist => foo => fiz => fuz
foo => !purge
@poll_exist => purge
purge => !FOOS & !FIZS
"""
[runtime]
    [[root]]
        script = sleep $((1 + $RANDOM % 10))
    [[PURGEFOO]]
    [[spawn]]
        inherit = PURGEFOO
    [[FIZS]]
        inherit = PURGEFOO
    [[fiz]]
        inherit = FIZS
    [[FOOS]]
        inherit = PURGEFOO
    [[foo,fuz]]
        inherit = FOOS
    [[PURGES]]
        inherit = PURGEFOO
    [[purge]]
        inherit = PURGES

@wxtim
Copy link
Member

wxtim commented Feb 18, 2025

@dwsutherland - Unfortunately I couldn't replicate the bug with your example. 😢

Did you use graphiql to try view the accumulation of family proxies?

query {
  workflows(ids:["<id>"]) {
    familyProxies {
      id
    }
  }
}

with that example workflow, it was 100% reproducible ... (even though the UI doesn't show them)

No. I was looking for the bug as reported. Why do the proxies accumulate but not show up in the GUI the same way.

(Yes, I saw the hardcoded inter-workflow xtrigger)

@MetRonnie MetRonnie linked an issue Feb 18, 2025 that may be closed by this pull request
@MetRonnie
Copy link
Member

MetRonnie commented Feb 18, 2025

I cannot reproduce the GraphiQL family proxies accumulating on 8.4.x (21d18ba). I tried both the query and the same thing as a subscription. I assumed strigger is the name of the workflow in your example.

Also, I'm not sure if this would be the same bug as cylc/cylc-ui#1999, which seems to be an accumulation of jobs from Tom's GraphiQL output

@oliver-sanders
Copy link
Member

oliver-sanders commented Feb 18, 2025

I tried to replicate using the above example and this diff:

diff --git a/flow.cylc b/flow.cylc
index 40b932b..2c80133 100644
--- a/flow.cylc
+++ b/flow.cylc
@@ -2,7 +2,7 @@
     initial cycle point = 20250204T2240Z
     sequential xtriggers = True
     [[xtriggers]]
-        poll_exist = workflow_state(strigger/run1//%(point)s/spawn)
+        poll_exist = workflow_state(%(workflow)s//%(point)s/spawn)
         poll_non_exist = workflow_state(this/workflow//%(point)s/doesnt_exist)
     [[graph]]
         PT1M = """

Unfortunately, I couldn't replicate families piling up in GraphiQL OR the empty cycles in the GUI as originally reported.

@oliver-sanders
Copy link
Member

Have any of us managed to reproduce the "empty cycles in the GUI" issue so far?

@oliver-sanders oliver-sanders modified the milestones: 8.4.1, 8.4.x Feb 19, 2025
@oliver-sanders oliver-sanders added the needs reproducing A bug report that does not yet have a reproducible example label Feb 20, 2025
@dwsutherland
Copy link
Member Author

dwsutherland commented Feb 25, 2025

Have any of us managed to reproduce the "empty cycles in the GUI" issue so far?

I'm able to easily reproduce the accumulation of families, and the tests show this is fixed (if you undo the fix, but keep the tests, the tests fail)..

I'm not sure what mechanism produced a stuck UI representation, however, I'm certain that the non-pruning of families (addressed here) is to blame.. (along with some combination of w/e state/criteria the UI requires to display a node)

@dwsutherland
Copy link
Member Author

Also, I'm not sure if this would be the same bug as cylc/cylc-ui#1999, which seems to be an accumulation of jobs from Tom's GraphiQL output

@MetRonnie - the main issue (in the title) is the accumulation of old cycles, which is addressed here..

The job accumulation is (I assume) a separate issue, where he may be suiciding running (or soon to be) jobs.. It's a Separate issue from cylc/cylc-ui#1999 ... He should create a new issue for job accumulation..

@MetRonnie
Copy link
Member

I don't think the job accumulation is a separate issue to old cycles being shown in the UI. It seems the old cycles are showing as a result of the old jobs being retained in the data store. Annoyingly I have not been able to reproduce that issue either, using the tc_in_domain:noTC example he gave

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having worked on #6656, I think this makes sense, even if none of us were able to reproduce it outside of the added integration test

@dwsutherland
Copy link
Member Author

Having worked on #6656, I think this makes sense, even if none of us were able to reproduce it outside of the added integration test

Yeah, still a little surprised about this .. Maybe you didn't set n=0? I was only able doing that:
cylc/cylc-ui#1999 (comment)

Because, the corresponding task/family wouldn't be pruned in the same batch as it was added otherwise..

But thanks all the same..

@oliver-sanders oliver-sanders removed the needs reproducing A bug report that does not yet have a reproducible example label Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :( small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants