
Running family displayed without any tasks #2095

Open
MetRonnie opened this issue Mar 11, 2025 · 10 comments
Labels
bug Something isn't working needs reproducing A bug report that does not yet have a reproducible example


@MetRonnie
Member

MetRonnie commented Mar 11, 2025

I've now seen a case where a workflow thinks it has a running task, but none are actually running (none are submitted, etc., either; it's not expected to run again for another 6 hours).


Originally posted by @ColemanTom in #1999

GraphQL

Query:

{
  workflows(ids: ["access_g4_pp_grp11"]) {
    taskProxies(ids: "//20250310T1200Z/*") {
      id
      state
    }
    familyProxies(ids: "//20250310T1200Z/*") {
      id
      state
    }
    jobs(ids: "//20250310T1200Z/*") {
      id
      state
    }
  }
}

Response:

{
  "data": {
    "workflows": [
      {
        "taskProxies": [
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_005",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_004",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_001",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_007",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_009",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_006",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_002",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_008",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_003",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/archive_log",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/housekeep_remote",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/housekeep_local",
            "state": "succeeded"
          }
        ],
        "familyProxies": [
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_005",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_005",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/root",
            "state": "running"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_004",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_004",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_001",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_001",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_007",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_007",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_009",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_009",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_006",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_006",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_002",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_002",
            "state": "running"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_008",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_008",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HHH_000_003",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_003",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/HOUSEKEEP",
            "state": "succeeded"
          }
        ],
        "jobs": [
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_005/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_002/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_001/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_007/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_004/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_009/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_006/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_003/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/wait_000_008/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/archive_log/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/housekeep_remote/01",
            "state": "succeeded"
          },
          {
            "id": "~user/access_g4_pp_grp11//20250310T1200Z/housekeep_local/01",
            "state": "succeeded"
          }
        ]
      }
    ]
  }
}
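The inconsistency in the response can also be checked mechanically. The following is a minimal Python sketch (not part of Cylc) that parses a trimmed excerpt of the response above and lists families reported as active while no task proxy or job in the same result is active. The `ACTIVE` state set and the `stale_families` helper are hypothetical names chosen for illustration. The check is deliberately coarse: the query does not request family membership, so it cannot say which tasks belong to RUN_ARCHIVE_002, only that nothing in the cycle point is active.

```python
import json

# Trimmed excerpt of the GraphQL response above (same field names as the query).
RESPONSE = """
{"data": {"workflows": [{
  "taskProxies": [
    {"id": "~user/access_g4_pp_grp11//20250310T1200Z/archive_log", "state": "succeeded"},
    {"id": "~user/access_g4_pp_grp11//20250310T1200Z/housekeep_local", "state": "succeeded"}
  ],
  "familyProxies": [
    {"id": "~user/access_g4_pp_grp11//20250310T1200Z/root", "state": "running"},
    {"id": "~user/access_g4_pp_grp11//20250310T1200Z/RUN_ARCHIVE_002", "state": "running"},
    {"id": "~user/access_g4_pp_grp11//20250310T1200Z/HOUSEKEEP", "state": "succeeded"}
  ],
  "jobs": [
    {"id": "~user/access_g4_pp_grp11//20250310T1200Z/archive_log/01", "state": "succeeded"}
  ]
}]}}
"""

# Assumption: states treated as "active" for the purpose of this check.
ACTIVE = {"preparing", "submitted", "running"}


def stale_families(response):
    """Return IDs of families reported active although no task proxy or job
    in the same workflow result is active (hypothetical helper)."""
    stale = []
    for wf in response["data"]["workflows"]:
        any_active_task = any(t["state"] in ACTIVE for t in wf["taskProxies"])
        any_active_job = any(j["state"] in ACTIVE for j in wf["jobs"])
        if any_active_task or any_active_job:
            # Something really is active, so an active family may be legitimate.
            continue
        stale.extend(f["id"] for f in wf["familyProxies"] if f["state"] in ACTIVE)
    return stale


if __name__ == "__main__":
    for fam_id in stale_families(json.loads(RESPONSE)):
        print("inconsistent:", fam_id)
```

Applied to the full response, this flags `root` and `RUN_ARCHIVE_002`, the two families reported as running even though every task proxy and job returned for 20250310T1200Z has succeeded.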

@MetRonnie MetRonnie added bug Something isn't working needs reproducing A bug report that does not yet have a reproducible example labels Mar 11, 2025
@MetRonnie MetRonnie added this to the 2.x milestone Mar 11, 2025
@MetRonnie MetRonnie changed the title Running task displayed without job Running family displayed without any tasks Mar 11, 2025
@MetRonnie
Member Author

@ColemanTom what tasks are under the RUN_ARCHIVE_002 family? And does this go away if you refresh the browser?

@ColemanTom

I should mention: UI 2.5.0, I think, and cylc-flow 8.3.6. This is the first time I've seen it, after months of running many workflows. I'm not able to update versions, as we are essentially in a freeze in preparation for a release to operations.

@ColemanTom
Copy link

@ColemanTom what tasks are under the RUN_ARCHIVE_002 family? And does this go away if you refresh the browser?

I don't have a complete list, but at a guess, 100+. No, refreshing does not change anything.

@MetRonnie MetRonnie removed this from the 2.x milestone Mar 11, 2025
@MetRonnie
Member Author

MetRonnie commented Mar 11, 2025

It's possible this is already fixed in 8.4.x, or it might be fixed by cylc/cylc-flow#6589.

@ColemanTom

How does GraphQL get its information? When I look at the DB, I can't see any mention of running tasks in the task_states table. I'm guessing it is something stored internally in the running workflow process on the VM?

@hjoliver
Member

It's probably due to a bug in constructing the datastore that feeds the UI, and probably already fixed as noted above. At least, I think a suspiciously similar problem was fixed.

Have you tried stopping and restarting the workflow, to force reconstruction of the datastore?

@ColemanTom

Have you tried stopping and restarting the workflow, to force reconstruction of the datastore?

Not intentionally, but a colleague did happen to stop and update it today, and there is no longer a permanently running task.

@hjoliver
Member

Sorry, does that mean the problem remains, after the restart?

@ColemanTom

Sorry, does that mean the problem remains, after the restart?

After stop/play, the problem is not present.

I am always hesitant to stop/play things in case there is debug information you want extracted. I do have the DB, scheduler logs and job logs saved for reference purposes, but it sounds like they wouldn't be that useful if it is a transient data store in memory.

@hjoliver
Member

Yes, I don't think any of the routine stored info would help much to debug this (unless perhaps it happened as a result of a series of logged interventions). Good to know that the restart fixed it - that pretty much confirms it's the datastore (perhaps with a small chance of it being a bug in how the UI applies the data feed).

There's probably not much we can do unless we have a reproducible case to examine, and then the first thing would be to run it with the latest Cylc code to check if it's fixed already.
