Watcher failure - null status issue

2025-10-24 13:40:58,654 ERROR: Error occurred during cycle: 'NoneType' object has no attribute 'lower'
Traceback (most recent call last):
  File "/global/homes/n/nmdcda/nmdc_automation/prod/nmdc_automation/nmdc_automation/workflow_automation/watch_nmdc.py", line 477, in watch
    self.cycle()
  File "/global/homes/n/nmdcda/nmdc_automation/prod/nmdc_automation/nmdc_automation/workflow_automation/watch_nmdc.py", line 434, in cycle
    successful_jobs, failed_jobs = self.job_manager.get_finished_jobs()
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/n/nmdcda/nmdc_automation/prod/nmdc_automation/nmdc_automation/workflow_automation/watch_nmdc.py", line 235, in get_finished_jobs
    if status.lower() == "succeded":
       ^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'lower'

I believe this comes from JAWS returning `"status": "done"` together with `"result": null`. This is an edge case where JAWS purged our inputs.json before the runs were started. See https://code.jgi.doe.gov/dsi/advanced-analysis/jaws/jaws-support/-/issues/338

example:

jaws status 137411
{
  "compute_site_id": "nmdc",
  "cpu_hours": null,
  "cromwell_run_id": null,
  "id": 137411,
  "input_site_id": "nmdc",
  "json_file": "/tmp/tmpkw0k3ebi.json",
  "output_dir": null,
  "result": null,
  "status": "done",
  "status_detail": "The run is complete.",
  "submitted": "2025-10-01 03:05:42",
  "tag": "nmdc:omprc-11-weea4z31/nmdc:wfmag-11-pp2xkf68.1",
  "team_id": "nmdc",
  "updated": "2025-10-16 15:05:56",
  "user_id": "nmdcda",
  "wdl_file": "/tmp/tmptx3y8ksc/tmp7lqghtzf.wdl",
  "workflow_name": null,
  "workflow_root": null
}


jaws log 137411
#STATUS_FROM       STATUS_TO          TIMESTAMP            COMMENT
created            upload queued      2025-10-01 03:06:00
upload queued      upload complete    2025-10-01 03:06:00
upload complete    ready              2025-10-01 03:06:28
ready              submission failed  2025-10-16 15:05:25  File not found: /pscratch/sd/n/nmjaws/nmdc-prod/inputs/c09f5922-4549-43d1-b997-5aede600c913.json
submission failed  slack succeeded    2025-10-16 15:05:44
slack succeeded    done               2025-10-16 15:05:56

The immediate fix would be to count these as 'failed' wrt nmdc_automation, increment the failure count, and run `jaws submit`. `jaws resubmit` should not be used in this case b/c something went wrong with the original submission (the log shows `submission failed`).
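
A minimal sketch of that null-safe check, assuming the watcher has the parsed `jaws status` JSON as a dict. The function name `classify_finished_job` is hypothetical and is not the existing `get_finished_jobs` code; the sketch also assumes a successful run reports `"result": "succeeded"`.

```python
from typing import Optional


def classify_finished_job(jaws_status: dict) -> Optional[str]:
    """Return 'succeeded', 'failed', or None if the run is not finished.

    Hypothetical helper: `jaws_status` is the dict parsed from `jaws status <id>`.
    """
    status = jaws_status.get("status")
    result = jaws_status.get("result")

    # Anything other than "done" is still in flight from our point of view.
    if status != "done":
        return None

    # Edge case from this issue: done + null result means the submission
    # never ran, so treat it as a failure instead of calling .lower() on None.
    if result is None:
        return "failed"

    # Assumption: JAWS reports "succeeded" for a successful run.
    return "succeeded" if result.lower() == "succeeded" else "failed"
```

With the `jaws status 137411` output above, this would return `"failed"`, so the job would go through the normal failure-count path and be submitted again with `jaws submit`.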

A longer-term fix would be to add support for checking `jaws log` so we can debug job failures in more detail.
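
As a rough sketch of what that could look like, the function below scans captured `jaws log` text for a `submission failed` transition and returns its comment. The name `find_submission_failure` is hypothetical, and the parsing assumes the column layout shown in the example above (columns separated by two or more spaces), which may not hold for every JAWS version.

```python
import re
from typing import Optional


def find_submission_failure(log_text: str) -> Optional[str]:
    """Return the comment of the first 'submission failed' transition, if any.

    Hypothetical helper: `log_text` is the captured stdout of `jaws log <id>`.
    Returns None when no such transition is present.
    """
    for line in log_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#STATUS_FROM ...' header row.
        if not line or line.startswith("#"):
            continue
        # Assumption: columns are separated by 2+ spaces; maxsplit keeps
        # the free-text comment in one piece.
        fields = re.split(r"\s{2,}", line, maxsplit=3)
        if len(fields) >= 3 and fields[1] == "submission failed":
            return fields[3] if len(fields) > 3 else ""
    return None
```

For the log of run 137411 this would return the `File not found: ...` comment, which is enough to tell a purged-inputs failure apart from a workflow failure.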

We have 518 JAWS submissions in this state (done + null), so we need to implement something quickly. My suggestion is to hotfix production.

Based on the logs, this doesn't kill the watcher, but it does kill the polling cycle, so also consider exception handling.
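
One possible shape for that exception handling is to guard each job individually, so a single bad record is logged and skipped instead of aborting the whole cycle. This is only a sketch: `partition_finished_jobs` and the `classify` callable are hypothetical names, not the current `watch_nmdc.py` API.

```python
import logging
from typing import Callable, Iterable, List, Optional, Tuple

logger = logging.getLogger(__name__)


def partition_finished_jobs(
    jobs: Iterable[dict],
    classify: Callable[[dict], Optional[str]],
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split jobs into (successful, failed, errored) without aborting the loop.

    Hypothetical helper: `classify` returns 'succeeded', 'failed', or None
    (not finished) for one job record, and may raise on malformed input.
    """
    successful, failed, errored = [], [], []
    for job in jobs:
        try:
            outcome = classify(job)
        except Exception:
            # Log the traceback and keep going so the other jobs in this
            # cycle are still processed.
            logger.exception("Could not evaluate job %r; skipping this cycle", job)
            errored.append(job)
            continue
        if outcome == "succeeded":
            successful.append(job)
        elif outcome == "failed":
            failed.append(job)
    return successful, failed, errored
```

Returning the errored jobs separately keeps them visible to the caller, which could then decide whether to retry them or count them as failures.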

Watcher failure - null status issue #618
