Is DPO fine-tuning supported for qwen2-audio? #7072

cy565025164 · 2025-02-26T04:02:30Z

Reminder

I have read the above rules and searched the existing issues.

Description

What is the data format for DPO fine-tuning of qwen2-audio?

Pull Request

No response

BUAADreamer · 2025-02-26T04:37:28Z

Refer to this image dataset and replace the image parts with audio:

https://huggingface.co/datasets/llamafactory/RLHF-V

Example:

{
    "conversations": [ { "from": "human", "value": "<audio>What are the key features you observe in the audio?" } ],
    "chosen": "",
    "rejected": "",
    "audios": ["1.wav"]
}

cy565025164 · 2025-02-26T05:39:00Z

@BUAADreamer OK, thanks.

cy565025164 · 2025-02-26T09:58:08Z

@BUAADreamer Hi, DPO fine-tuning with qwen2-audio fails with this error: ValueError: The number of audios does not match the number of tokens.

This is the entry I added to dataset_info.json:

"qwen2_audio_dpo": {
    "file_name": "qwen_audio_train_data.json",
    "ranking": true,
    "formatting": "sharegpt",
    "columns": {
        "messages": "conversations",
        "chosen": "chosen",
        "rejected": "rejected",
        "audios": "audios"
    }
}

Each record in qwen_audio_train_data.json has this format:

{
    "conversations": [ { "from": "human", "value": "<audio>What are the key features you observe in the audio?" } ],
    "chosen": {"from": "gpt", "value": "x"},
    "rejected": {"from": "gpt", "value": "xx"},
    "audios": ["1.wav"]
}
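Hand-written records like the one above easily pick up trailing commas, which strict JSON parsers reject. A minimal sketch that generates the file content with `json` instead is shown here. The helper name `make_dpo_record` is hypothetical, and wrapping the records in a top-level list is an assumption about what the loader expects.

```python
import json


def make_dpo_record(prompt, chosen, rejected, audio_paths):
    """Build one preference record in the layout shown above (hypothetical helper)."""
    # Prepend one "<audio>" placeholder per audio file to the prompt.
    placeholders = "<audio>" * len(audio_paths)
    return {
        "conversations": [{"from": "human", "value": placeholders + prompt}],
        "chosen": {"from": "gpt", "value": chosen},
        "rejected": {"from": "gpt", "value": rejected},
        "audios": list(audio_paths),
    }


records = [
    make_dpo_record(
        "What are the key features you observe in the audio?",
        "x",
        "xx",
        ["1.wav"],
    )
]

# json.dumps never emits trailing commas, so the result is always valid JSON.
serialized = json.dumps(records, ensure_ascii=False, indent=2)
print(serialized)
```

Writing `serialized` to qwen_audio_train_data.json would then give a file that at least parses cleanly, which rules out one class of formatting problems.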

BUAADreamer · 2025-02-26T10:04:57Z

@cy565025164 Got it, I'll take care of it.

cy565025164 · 2025-02-26T10:14:07Z

@BUAADreamer OK, thanks!

The error is raised in this file: LLaMA-Factory/src/llamafactory/data/mm_plugin.py

num_audio_tokens = 0
for message in messages:
    content = message["content"]
    while AUDIO_PLACEHOLDER in content:
        if self.expand_mm_tokens:
            audio_length = audio_lengths.pop(0)
            input_length = (audio_length - 1) // 2 + 1
            audio_seqlen = (input_length - 2) // 2 + 1
        else:
            audio_seqlen = 1

        content = content.replace(
            AUDIO_PLACEHOLDER, f"{bos_token}{self.audio_token * audio_seqlen}{eos_token}", 1
        )
        num_audio_tokens += 1

    message["content"] = content

if len(audios) != num_audio_tokens:
    raise ValueError(f"The number of audios does not match the number of {AUDIO_PLACEHOLDER} tokens.")
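For reference, the two integer divisions in the excerpt roughly quarter the audio length when expanding each placeholder. A small sketch of just that arithmetic follows. The function name `audio_token_count` is hypothetical, and the unit of `audio_length` is whatever `audio_lengths` holds in the plugin, which the excerpt does not show.

```python
def audio_token_count(audio_length: int) -> int:
    """Mirror the two-step length reduction from the excerpt above."""
    # First reduction: roughly halves the length.
    input_length = (audio_length - 1) // 2 + 1
    # Second reduction: roughly halves it again.
    return (input_length - 2) // 2 + 1


for length in (100, 3000):
    print(length, audio_token_count(length))
# 100 -> 25, 3000 -> 750
```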

It is probably a data format problem. The expected format looks something like this:

{
    "messages": [
        {
            "content": "<audio>What's that sound?",
            "role": "user"
        },
        {
            "content": "It is the sound of glass shattering.",
            "role": "assistant"
        }
    ],
    "audios": [
        "mllm_demo_data/1.mp3"
    ]
}
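To rule the dataset in or out before training, the same count comparison can be run offline. This is a minimal sketch, not the library's own check: it assumes the placeholder string is `<audio>`, accepts both the `conversations`/`value` and `messages`/`content` layouts shown in this thread, and ignores the `chosen` and `rejected` fields. The function names are hypothetical.

```python
AUDIO_PLACEHOLDER = "<audio>"  # assumed value of the constant used in mm_plugin.py


def count_placeholders(sample: dict) -> int:
    """Count audio placeholders across all turns of one sample."""
    turns = sample.get("conversations") or sample.get("messages") or []
    total = 0
    for turn in turns:
        # sharegpt-style turns use "value"; role/content-style turns use "content".
        text = turn.get("value", turn.get("content", ""))
        total += text.count(AUDIO_PLACEHOLDER)
    return total


def find_mismatches(samples: list) -> list:
    """Return (index, placeholders_found, audios_listed) for inconsistent samples."""
    mismatches = []
    for index, sample in enumerate(samples):
        listed = len(sample.get("audios", []))
        found = count_placeholders(sample)
        if found != listed:
            mismatches.append((index, found, listed))
    return mismatches


samples = [
    {
        "conversations": [{"from": "human", "value": "<audio>What's that sound?"}],
        "audios": ["1.wav"],
    },
    {
        "messages": [{"role": "user", "content": "What's that sound?"}],
        "audios": ["1.wav"],
    },
]
print(find_mismatches(samples))
```

If this reports no mismatches for the training file, the counts in the data are consistent and the error more likely originates in how the samples are processed.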

cy565025164 added the enhancement and pending labels on Feb 26, 2025

hiyouga added the solved label and removed the enhancement and pending labels on Feb 26, 2025

hiyouga closed this as completed Feb 26, 2025

hiyouga reopened this Feb 26, 2025
