Hello, I have a question about training the musicgen-melody model.
The text condition controls the result by being concatenated (prepended) to the input sequence as an embedding.
I tried to print the return value of the ConditionFuser (model.lm.fuser) with the code below. The length of the `input_` tensor, which has shape (Batch, Length, 1536), seems to change depending on the maximum length of the text embeddings in a batch.
Is this Length also variable during training? If not, what is the fixed prefix length of the text embedding during training?
import torch
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
from audiocraft.modules.conditioners import (
    ConditionFuser,
    ClassifierFreeGuidanceDropout,
    AttributeDropout,
    ConditioningProvider,
    ConditioningAttributes,
    ConditionType,
)
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8)
input_text = ['text_1', 'text_2', 'text_3']
attributes, prompt_tokens = model._prepare_tokens_and_attributes(input_text, None)
conditions = attributes
# prepare unconditional generation for cfg
null_conditions = ClassifierFreeGuidanceDropout(p=1.0)(conditions)
if conditions:
    conditions = conditions + null_conditions
tokenized = model.lm.condition_provider.tokenize(conditions)
cfg_conditions = model.lm.condition_provider(tokenized)
# Empty prompt (no audio tokens yet); the batch size must match the number of texts
# so that, after doubling for CFG, it lines up with the conditional + null conditions.
prompt = torch.zeros((len(input_text), 4, 0), dtype=torch.long, device=model.device)
prompt = torch.cat([prompt, prompt], dim=0)  # duplicate batch for conditional / unconditional
# Sum the per-codebook embeddings, then let the fuser add the conditioning
input_ = sum([model.lm.emb[k](prompt[:, k]) for k in range(4)])
input_, cross_attention_input = model.lm.fuser(input_, cfg_conditions)
print(input_.shape)
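For reference, here is a minimal sketch of what I think is happening, assuming the T5-based condition provider pads each batch of prompts to the longest tokenized prompt in that batch, so the conditioning length (and hence the fused prefix) varies per batch. The prompt strings here are just placeholders:

# Hypothetical check: run two batches whose prompts differ in length and compare
# the conditioning shapes. 'description' is the text attribute key used by MusicGen.
short_texts = ['jazz', 'rock', 'pop']
long_texts = ['a long, very detailed description of an upbeat orchestral piece'] * 3

for texts in (short_texts, long_texts):
    attrs, _ = model._prepare_tokens_and_attributes(texts, None)
    tokenized = model.lm.condition_provider.tokenize(attrs)
    conds = model.lm.condition_provider(tokenized)
    emb, mask = conds['description']  # (B, T_text, dim), (B, T_text)
    print(emb.shape)                  # T_text follows the longest prompt in the batch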