Recently, I've been running some hefty depletion simulations on my HPC. They usually finish within the max runtime, but this time they took longer and got killed mid-simulation. I'm now in the business of restarting the simulation to finish the job.
One issue I have with the way we do restarts is that timesteps is required to be provided for a depletion restart. While the user should be able to figure out what timesteps will reproduce the intended depletion steps, I would really appreciate the option for the restart to just know what was initially requested and seamlessly pick up where it left off. I don't think it would be hard to store the initially requested time steps, though I'm not sure how much refactoring is required to allow this. Mainly, I think it is easy to provide restart timesteps that differ from what you actually wanted.
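For context, a restart today looks something like the sketch below, where the user has to hand-compute the remaining steps. (A rough sketch only: exact class names vary by OpenMC version, e.g. Operator vs. CoupledOperator, and the power level and step values here are illustrative.)

```python
import openmc
import openmc.deplete

# Load prior results and attach them to the operator for a restart
results = openmc.deplete.Results("depletion_results.h5")
model = openmc.Model.from_xml()
op = openmc.deplete.CoupledOperator(model, prev_results=results)

# Status quo: the user must work out by hand that, say, 2 of the
# original 5 one-day steps remain (values are illustrative)
integrator = openmc.deplete.PredictorIntegrator(
    op, [1.0, 1.0], power=174.0, timestep_units='d')
integrator.integrate()
```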
Quick anecdote as to why I think this would be beneficial:
I was playing around with this pincell depletion example and was able to create a situation where the simulation was killed while writing the openmc_simulation_n3.h5 file. The original simulation requests time_steps = [1.0, 1.0, 1.0, 1.0, 1.0] # days. When I restarted the simulation, I told it to restart with time_steps = [1.0, 1.0] # days. Because the simulation had been killed mid-write, the restart re-did the transport + depletion for that step and overwrote the faulty openmc_simulation_n3.h5 file, and that redo counted as one of the timesteps I provided.
As a result, I was off by one: I thought the restart would begin with openmc_simulation_n4.h5, but it actually needed to redo n3. OpenMC never ran the final eigenvalue sim or wrote openmc_simulation_n5.h5, as I had intended.
On HPC, it is very possible for a job to get killed while writing one of the openmc_simulation_n<N>.h5 files. In a more costly simulation, I would just launch the restart and come back expecting it to be finished. I'd be disappointed to come back to the HPC and find that I actually needed to restart one more time, and I might doubt that I had set everything up correctly and burn time verifying what happened.
Alternatives
Status quo: requiring the user to figure out what timesteps to provide with a restart.
Compatibility
The API would change so that timesteps is no longer a required argument to openmc.deplete.abc.Integrator. When it is not provided, the simulation should figure out what was initially requested and finish that set of timesteps. Storing this info in depletion_results.h5 should take up very little space.
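A minimal sketch of what the restart logic could look like, assuming the originally requested steps were stored in the results file. Note that both the initial_timesteps dataset and the remaining_timesteps helper below are hypothetical; neither is part of the current file format or API.

```python
import h5py
import numpy as np
import openmc.deplete

def remaining_timesteps(results_file="depletion_results.h5"):
    """Hypothetical helper: derive the timesteps still left to run."""
    # Steps already completed, inferred from the times in the results
    results = openmc.deplete.Results(results_file)
    completed = np.diff(results.get_times(time_units='d'))

    # Hypothetical new dataset holding the originally requested steps
    with h5py.File(results_file, 'r') as fh:
        requested = fh['initial_timesteps'][()]

    # Guard against a results file that doesn't match the original request
    if not np.allclose(requested[:len(completed)], completed):
        raise ValueError("completed steps don't match the original request")

    return list(requested[len(completed):])
```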
I think it would be easy to write a test for this API change as well.
As discussed in person, an appropriate option for simpler continuation runs might be to add a flag to the Integrator classes indicating that the timesteps provided are the original timesteps for the run, and that the integrator should pick up at the next timestep based on the timesteps present in the depletion_results.h5 file provided to the Operator.
Validation of the original timesteps should include:
checking the provided timesteps and powers against those already recorded in the results file
if previous results don't exist, I suppose we'd simply start the depletion run from the beginning
In the absence of this flag, the timesteps provided to the Integrator would be treated as additional time steps in the calculation.
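Usage of the proposed flag might look something like this. (The continue_timesteps name is purely illustrative of the proposal, not an existing argument, and the operator/power setup mirrors the hedged sketch above.)

```python
import openmc
import openmc.deplete

results = openmc.deplete.Results("depletion_results.h5")
model = openmc.Model.from_xml()
op = openmc.deplete.CoupledOperator(model, prev_results=results)

# Pass the *original* schedule; with the proposed flag, the integrator
# would validate the completed steps/powers against the results file
# and pick up at the next step (flag name is hypothetical)
integrator = openmc.deplete.PredictorIntegrator(
    op, [1.0] * 5, power=174.0, timestep_units='d',
    continue_timesteps=True)
integrator.integrate()
```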