added timeout handling for docker container removal
neel04 committed Apr 28, 2024
1 parent a91424a commit f07a984
Showing 2 changed files with 7 additions and 2 deletions.
7 changes: 6 additions & 1 deletion run.sh
@@ -14,7 +14,12 @@ TRAIN_ARGS="--save_dir ./ReAct/outputs/ --dataset 'minipile' --group 'minipile'

# Stop all running Docker containers
echo "Stopping all running Docker containers..."
-sudo docker rm -f $CONTAINER_NAME
+
+if ! timeout 300 sudo docker rm -f $CONTAINER_NAME; then
+echo "Command timed out. Restarting Docker daemon & retrying..."
+sudo systemctl restart docker
+sleep 10s; sudo docker rm -f $CONTAINER_NAME
+fi

# Git stuff
git clone -b $BRANCH https://github.com/neel04/ReAct_Jax.git
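
The new block in run.sh treats any non-zero exit from `timeout 300 sudo docker rm -f` as a timeout, so an unrelated failure would also restart the Docker daemon. GNU `timeout` exits with status 124 when the time limit is reached, which lets the two cases be told apart. A minimal sketch, assuming GNU coreutils `timeout` and a Bash-compatible shell; `run_with_timeout` and `remove_container` are hypothetical helper names, not part of this commit:

```shell
#!/usr/bin/env bash

# Hypothetical helper, not part of the commit: run a command under a time
# limit and report on stderr whether it timed out or failed on its own.
run_with_timeout() {
    local limit="$1"
    shift
    local status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit is reached.
        echo "Timed out after ${limit}s: $*" >&2
    elif [ "$status" -ne 0 ]; then
        echo "Failed with exit status ${status}: $*" >&2
    fi
    return "$status"
}

# Hypothetical wrapper: restart the Docker daemon only after a real timeout.
remove_container() {
    local name="$1"
    local status=0
    run_with_timeout 300 sudo docker rm -f "$name" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "Restarting Docker daemon & retrying..."
        sudo systemctl restart docker
        sleep 10
        sudo docker rm -f "$name"
        return $?
    fi
    return "$status"
}
```

With these definitions, `remove_container "$CONTAINER_NAME"` could stand in for the `if ! timeout ...` block. Quoting the variable also keeps the command intact if the container name is empty or contains spaces.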
2 changes: 1 addition & 1 deletion train_model.py
@@ -91,7 +91,7 @@ def main(key: PRNGKeyArray):
)

# enqueue a few handpicked hyperparams for trials
-_: None = [study.enqueue_trial(hyperparams) for hyperparams in init_hyperparams]
+[study.enqueue_trial(hyperparams) for hyperparams in init_hyperparams]

study.optimize(
lambda trial: kickoff_optuna(trial=trial, **trainer_kwargs),