Description
I am running some workflows on Crusher. The stage with the largest number of tasks runs 64 of them, each using 1 CPU core. The performance analysis plots suggest, however, that around 1000 cores were reserved for this workflow. With 64 CPU cores and 4 GPUs per node, you only reach that number if the allocation corresponds to 1 GPU per task, i.e. 16 nodes (1024 cores) for 64 single-core tasks. I hope that the code isn't actually doing that and that just the plotting is off.
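For reference, a small sketch of the arithmetic behind that suspicion (the node parameters are Crusher's published specs; the one-GPU-per-task sizing is the hypothesis being tested, not confirmed behavior):

CORES_PER_NODE = 64   # Crusher node: 64 CPU cores
GPUS_PER_NODE = 4     # Crusher node: 4 GPUs

n_tasks = 64          # largest stage: 64 single-core tasks

# Hypothesis: the allocation sizes nodes as if each task needed one GPU.
nodes_if_gpu_per_task = -(-n_tasks // GPUS_PER_NODE)     # ceil(64 / 4) = 16
cores_reserved = nodes_if_gpu_per_task * CORES_PER_NODE  # 16 * 64 = 1024

print(nodes_if_gpu_per_task, cores_reserved)  # 16 1024 -- matches the ~1000 cores plotted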
The performance data is stored at
/lustre/orion/world-shared/chm136/re.session.login2.hjjvd.019706.0000
I have copied the performance plots into the same directory.
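One way to check whether the plots reflect the actual reservation would be to read the pilot description back out of the stored session. A minimal sketch, assuming the radical.analytics Session API; the .description attribute and its keys are my assumption and may differ between versions:

import radical.analytics as ra

# Load the stored session (path from above); 'radical.pilot' selects the RP schema.
src = '/lustre/orion/world-shared/chm136/re.session.login2.hjjvd.019706.0000'
session = ra.Session(src, 'radical.pilot')

# Pull out the pilot entities and report what was actually requested.
for pilot in session.get(etype='pilot'):
    desc = pilot.description  # assumed: the original PilotDescription as a dict
    print('cores requested:', desc.get('cores'))
    print('gpus  requested:', desc.get('gpus'))
    print('nodes requested:', desc.get('nodes'))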
The versions of the RADICAL Cybertools packages are:
(pydeepdrivemd) [hjjvd@login2.crusher test]$ pip list | grep radical
radical.analytics 1.43.0
radical.entk 1.43.0
radical.gtod 1.43.0
radical.pilot 1.43.0
radical.saga 1.43.0
radical.utils 1.44.0
The code I am running lives at git@github.com:hjjvandam/DeepDriveMD-pipeline.git, in the branch feature/nwchem. The job I am running is specified in https://github.com/hjjvandam/DeepDriveMD-pipeline/blob/feature/nwchem/test/bba/molecular_dynamics_workflow_nwchem_test/config.yaml. Please let me know if you need any further information.
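For context on where an over-reservation could enter: in radical.entk, each task declares its resource needs through cpu_reqs and gpu_reqs. A minimal sketch of what a single-core, no-GPU task should look like; this is a generic EnTK example with a placeholder executable, not the actual task definition behind config.yaml:

from radical.entk import Task

t = Task()
t.executable = '/usr/bin/env'  # placeholder executable

# One CPU core, no GPU: what each of the 64 tasks in the big stage needs.
t.cpu_reqs = {'cpu_processes':    1,
              'cpu_process_type': None,
              'cpu_threads':      1,
              'cpu_thread_type':  None}

# If gpu_reqs instead requested 1 GPU per task, 64 tasks would force
# ceil(64 / 4) = 16 Crusher nodes, i.e. the ~1000 cores seen in the plots.
t.gpu_reqs = {'gpu_processes':    0,
              'gpu_process_type': None,
              'gpu_threads':      0,
              'gpu_thread_type':  None}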