Open
Labels
enhancement, help wanted
Description
The paper says, "We follow classifier-free guidance and train our models with conditioning dropout: conditional inputs are set to 0 for 10% of training time."
This means that for 10% of the training time only the Person UNet is trained, with no cross-attention, self-attention, or any other signal coming from the conditional inputs.
Maybe I am not familiar with the concept, but how would this work without the RGB-agnostic image, and how would the 6-channel input be passed? Do we set the RGB-agnostic image to all zeros as well? Any comments are welcome.
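To make the question concrete, here is a minimal sketch of one possible reading, not the paper's confirmed implementation. It assumes the RGB-agnostic image counts as a conditional input, that the 6 channels are the noisy image concatenated with the RGB-agnostic image, and that dropout is decided per sample rather than per training step. The function name `conditioning_dropout` and its parameters are hypothetical. The point it illustrates is that zeroing keeps the tensor shape, so the network still receives 6 channels.

```python
import numpy as np


def conditioning_dropout(noisy, agnostic, p_drop=0.1, rng=None):
    """Zero the conditioning half of the input for a random subset of samples.

    noisy:    (B, 3, H, W) noisy target image
    agnostic: (B, 3, H, W) RGB-agnostic image (ASSUMED to be a conditional input)
    Returns the (B, 6, H, W) network input and a boolean drop mask of shape (B,).
    """
    rng = np.random.default_rng() if rng is None else rng
    # One independent coin flip per sample; True means "drop the conditioning".
    drop = rng.random(noisy.shape[0]) < p_drop
    # Broadcastable multiplier: 0.0 for dropped samples, 1.0 otherwise.
    keep = (~drop).astype(noisy.dtype)[:, None, None, None]
    # The channel count stays at 6; dropped samples just carry zeros in channels 3..5.
    x = np.concatenate([noisy, agnostic * keep], axis=1)
    return x, drop


rng = np.random.default_rng(0)
noisy = rng.standard_normal((8, 3, 4, 4))
agnostic = np.ones((8, 3, 4, 4))
x, drop = conditioning_dropout(noisy, agnostic, p_drop=0.1, rng=rng)
print(x.shape)  # (8, 6, 4, 4)
```

Under this reading, any other conditional inputs (for example the features feeding cross-attention) would be zeroed with the same mask, so the layers still run but see an all-zero signal. Whether the original authors zero the RGB-agnostic image too is exactly the open question here.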