Weights of Inner Optimizers Not Saved #2094

@BinyanHu

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04 & Windows 10
  • TensorFlow version and how it was installed (source or binary): 2.3.0 from source
  • TensorFlow-Addons version and how it was installed (source or binary): 0.11.1 from source
  • Python version: 3.7
  • Is GPU used? (yes/no): yes

Describe the bug

Resuming a training process requires restoring the optimizer state so that training continues exactly from the previous state without any loss of accuracy. Currently, the Keras interface for saving a model, keras.Model.save_weights, checkpoints both the network parameters and the optimizer weights. However, when an optimizer is wrapped inside another, the inner optimizer's weights cannot be saved by this means.

For example, when I was trying to use the Ranger optimizer, which is constructed by wrapping RAdam with Lookahead:

import tensorflow_addons as tfa

optimizer = tfa.optimizers.Lookahead(
    tfa.optimizers.RectifiedAdam()
)

I noticed a performance drop on resuming training. I found that the weights of the inner RAdam were not saved into the checkpoint. (I checked the .index file in the checkpoint folder and there are no variable names like "m" and "v", only "slow", which holds the weights of Lookahead.) Therefore, after loading the weights from file and restarting fitting, the weights of RAdam are reinitialized instead of being restored. This could be because the weights of the inner optimizer are not automatically tracked.
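The check described above can be scripted instead of reading the .index file by hand. The following is a minimal sketch, not part of the original report: the helper names `checkpoint_variable_names` and `slots_found` are hypothetical, and the example keys only illustrate the general shape of TF2 checkpoint keys (assumed here, not copied from a real checkpoint). `tf.train.list_variables` is the only TensorFlow call used, and it is imported lazily so the pure-Python part runs without TensorFlow installed.

```python
def checkpoint_variable_names(ckpt_prefix):
    """List the variable keys stored in a checkpoint (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily; only needed for this helper

    # tf.train.list_variables returns (name, shape) pairs for a checkpoint.
    return [name for name, _ in tf.train.list_variables(ckpt_prefix)]


def slots_found(variable_names, slot_names=("m", "v", "slow")):
    """Report which optimizer slot names occur as a path component of any key.

    Assumption: slot names appear as whole '/'-separated components of the
    checkpoint keys, as in the illustrative keys below.
    """
    found = {slot: False for slot in slot_names}
    for name in variable_names:
        parts = name.split("/")
        for slot in slot_names:
            if slot in parts:
                found[slot] = True
    return found


# Illustrative (made-up) keys resembling a checkpoint that holds only the
# Lookahead "slow" slot and no RAdam "m"/"v" slots.
example_keys = [
    "layer_with_weights-0/kernel/.ATTRIBUTES/VARIABLE_VALUE",
    "layer_with_weights-0/kernel/.OPTIMIZER_SLOT/optimizer/slow/.ATTRIBUTES/VARIABLE_VALUE",
]
report = slots_found(example_keys)
print(report)
```

With a real checkpoint, `slots_found(checkpoint_variable_names(prefix))` would take the place of the made-up keys, where `prefix` is the path passed to save_weights. If "m" and "v" come back False while "slow" is True, that matches the symptom described here.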

Experiments

I trained two LeNets on the FashionMNIST dataset. All the configurations are the same except for the optimizers. Both training runs were interrupted in the middle and then resumed.

[Figure] TensorBoard. Blue: Ranger (Lookahead+RAdam), orange: RAdam.

Note the "bump" of the Ranger curve caused by the reinitialization of RAdam weights. Apparently, the weights of the inner optimizer are not correctly saved.
