Investigate the use case of slurm sacct #55

Open
khsrali opened this issue Aug 6, 2024 · 1 comment
Labels
enhancement (New feature or request), upstream (Changes that need to be made to aiida-core)

Comments


khsrali commented Aug 6, 2024

It seems aiida-core uses sacct to get the exit code in a roundabout way:

tasks.py and calcjob.py both expect a dictionary with three keys ('retval', 'stdout', 'stderr') from:
scheduler.get_detailed_job_info()
calcjob.py then passes this dictionary, along with two other files, to scheduler.parse_output to get the exit code.
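To make the flow above concrete, here is a minimal, standalone sketch. It does not import aiida-core and is not its implementation. The function name `exit_label_from_job_info`, the state mapping, and the sample output are all hypothetical, written only for illustration. It assumes the 'stdout' entry holds pipe-delimited sacct output with a header row (as produced with sacct's --parsable option) that includes a State column.

```python
from typing import Optional

# Hypothetical mapping from Slurm job states to the exit-code labels
# mentioned in this issue. This is an assumption for illustration only.
STATE_TO_EXIT_LABEL = {
    "OUT_OF_MEMORY": "ERROR_SCHEDULER_OUT_OF_MEMORY",
    "TIMEOUT": "ERROR_SCHEDULER_OUT_OF_WALLTIME",
    "NODE_FAIL": "ERROR_SCHEDULER_NODE_FAILURE",
}


def exit_label_from_job_info(detailed_job_info: dict) -> Optional[str]:
    """Return a scheduler error label, or None if no known failure is found.

    `detailed_job_info` is expected to carry the three keys described above:
    'retval', 'stdout' and 'stderr'.
    """
    for key in ("retval", "stdout", "stderr"):
        if key not in detailed_job_info:
            raise ValueError(f"missing key in detailed job info: {key!r}")

    # If the accounting command itself failed, nothing can be inferred.
    if detailed_job_info["retval"] != 0:
        return None

    lines = [ln for ln in detailed_job_info["stdout"].splitlines() if ln.strip()]
    if len(lines) < 2:
        return None  # need a header row plus at least one record

    # Pair each header field with the value in the first record.
    header = lines[0].split("|")
    values = lines[1].split("|")
    record = dict(zip(header, values))

    state = record.get("State", "").strip()
    if not state:
        return None

    # States may carry a suffix (e.g. "CANCELLED by 1234"); keep the first word.
    return STATE_TO_EXIT_LABEL.get(state.split()[0])


# Made-up sample output, not taken from a real cluster.
sample = {
    "retval": 0,
    "stdout": "JobID|State|ExitCode|\n123|TIMEOUT|0:0|\n",
    "stderr": "",
}
print(exit_label_from_job_info(sample))
```

A scheduler plugin that cannot provide such a dictionary has no state to map, which is why the failure cause stays unknown.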

khsrali added the enhancement and upstream labels Aug 6, 2024

khsrali commented Aug 6, 2024

Without this feature, monitoring cannot be done.
For now, when a job fails, it is not possible to know which of these caused the failure:
ERROR_SCHEDULER_OUT_OF_MEMORY
ERROR_SCHEDULER_OUT_OF_WALLTIME
ERROR_SCHEDULER_NODE_FAILURE, etc.
