The logging of the pre-train performance in on_policy.jl and off_policy.jl is causing some issues. If we log the pre-train performance again when resuming a solver that has already trained for some iterations, the logged steps are no longer monotonically increasing, which leads to plotting errors when we run plot_learning. The logging of the pre-train performance should be made optional or conditional.
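
One way this could look is a guard that only logs the pre-train performance on a fresh solver, plus a keyword to turn it off entirely. This is just a minimal sketch of the idea, not the package's actual API: `StubSolver`, `i`, `evaluate_and_log!`, and `log_pretrain` are hypothetical names used only to illustrate the conditional.

```julia
# Sketch only: StubSolver, i, evaluate_and_log!, and log_pretrain are
# hypothetical names, not the real solver fields/functions.
mutable struct StubSolver
    i::Int   # steps already trained before this call
end

# Stand-in for the existing pre-train evaluation/logging
evaluate_and_log!(solver; step) = println("logged pre-train performance at step $step")

# Only log pre-train performance when the solver has not trained yet,
# so resumed runs keep the logged steps monotonically increasing.
function maybe_log_pretrain!(solver; log_pretrain::Bool = true)
    if log_pretrain && solver.i == 0
        evaluate_and_log!(solver, step = 0)
    end
end

maybe_log_pretrain!(StubSolver(0))     # fresh solver: logs at step 0
maybe_log_pretrain!(StubSolver(1000))  # resumed solver: skipped
```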