Is DPO fine-tuning supported for qwen2-audio? #7072

Open
1 task done
cy565025164 opened this issue Feb 26, 2025 · 5 comments
Labels
solved This problem has been already solved

Comments

@cy565025164

Reminder

  • I have read the above rules and searched the existing issues.

Description

What data format is expected for DPO fine-tuning of qwen2-audio?

Pull Request

No response

cy565025164 added the enhancement (New feature or request) and pending (This problem is yet to be addressed) labels on Feb 26, 2025
@BUAADreamer
Collaborator

Use this image dataset as a reference and replace the image parts with audio:

https://huggingface.co/datasets/llamafactory/RLHF-V

Example:

{
    "conversations": [ { "from": "human", "value": "<audio>What are the key features you observe in the audio?" } ],
    "chosen": "",
    "rejected": "",
    "audios": ["1.wav"]
}
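A minimal sketch of producing such a record programmatically, so that the file is always valid JSON (for instance, no trailing commas). The helper name `build_dpo_record` and the answer strings are hypothetical, and the object form of `chosen` and `rejected` is an assumption modeled on the record posted later in this thread, not a confirmed requirement.

```python
import json


def build_dpo_record(question, chosen, rejected, audio_paths):
    """Build one preference record (hypothetical helper, not part of LLaMA-Factory)."""
    # One "<audio>" placeholder per audio file, placed before the question text.
    prompt = "<audio>" * len(audio_paths) + question
    return {
        "conversations": [{"from": "human", "value": prompt}],
        # Assumption: chosen/rejected are single assistant turns in object form.
        "chosen": {"from": "gpt", "value": chosen},
        "rejected": {"from": "gpt", "value": rejected},
        "audios": list(audio_paths),
    }


record = build_dpo_record(
    "What are the key features you observe in the audio?",
    "preferred answer text",      # placeholder text
    "dispreferred answer text",   # placeholder text
    ["1.wav"],
)

# json.dumps always emits valid JSON; the dataset file is assumed to be a list of records.
serialized = json.dumps([record], ensure_ascii=False, indent=2)
print(serialized)
```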

@cy565025164
Author

@BUAADreamer OK, thanks.

hiyouga added the solved (This problem has been already solved) label and removed the enhancement and pending labels on Feb 26, 2025
hiyouga closed this as completed on Feb 26, 2025
@cy565025164
Author

cy565025164 commented Feb 26, 2025

@BUAADreamer Hi, DPO fine-tuning based on qwen2-audio fails with: ValueError: The number of audios does not match the number of tokens.

Here is the entry I added to dataset_info.json:
"qwen2_audio_dpo": {
    "file_name": "qwen_audio_train_data.json",
    "ranking": true,
    "formatting": "sharegpt",
    "columns": {
        "messages": "conversations",
        "chosen": "chosen",
        "rejected": "rejected",
        "audios": "audios"
    }
}

The records in qwen_audio_train_data.json have this format:

{
    "conversations": [ { "from": "human", "value": "<audio>What are the key features you observe in the audio?" } ],
    "chosen": {"from": "gpt", "value": "x"},
    "rejected": {"from": "gpt", "value": "xx"},
    "audios": ["1.wav"]
}
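As a quick sanity check before training, one could count the `<audio>` placeholders in each record and compare that with the length of its `audios` list. This is only a sketch under assumptions: the function names are hypothetical, only the `conversations` turns are inspected, and the dataset is assumed to be a list of records. Note that the record above has one placeholder and one audio, so it passes this check; the check rules out one cause of the error, it does not explain the failure reported here.

```python
AUDIO_PLACEHOLDER = "<audio>"  # placeholder string as it appears in the data above


def count_audio_placeholders(record, messages_key="conversations", text_key="value"):
    """Count placeholders across all turns of one record (hypothetical helper)."""
    return sum(turn[text_key].count(AUDIO_PLACEHOLDER) for turn in record[messages_key])


def find_mismatched_records(records):
    """Return (index, placeholders_found, audios_listed) for each inconsistent record."""
    mismatches = []
    for index, record in enumerate(records):
        listed = len(record.get("audios", []))
        found = count_audio_placeholders(record)
        if found != listed:
            mismatches.append((index, found, listed))
    return mismatches


records = [
    {   # consistent: one placeholder, one audio file
        "conversations": [{"from": "human", "value": "<audio>What are the key features you observe in the audio?"}],
        "chosen": {"from": "gpt", "value": "x"},
        "rejected": {"from": "gpt", "value": "xx"},
        "audios": ["1.wav"],
    },
    {   # inconsistent: the placeholder is missing from the prompt
        "conversations": [{"from": "human", "value": "What are the key features you observe in the audio?"}],
        "chosen": {"from": "gpt", "value": "x"},
        "rejected": {"from": "gpt", "value": "xx"},
        "audios": ["2.wav"],
    },
]

print(find_mismatched_records(records))  # [(1, 0, 1)]
```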

hiyouga reopened this on Feb 26, 2025
@BUAADreamer
Collaborator

BUAADreamer commented Feb 26, 2025

@cy565025164 Got it, I'll look into it.

@cy565025164
Author

cy565025164 commented Feb 26, 2025

@BUAADreamer OK, thanks!

The error is raised in this file: LLaMA-Factory/src/llamafactory/data/mm_plugin.py

num_audio_tokens = 0
for message in messages:
    content = message["content"]
    while AUDIO_PLACEHOLDER in content:
        if self.expand_mm_tokens:
            audio_length = audio_lengths.pop(0)
            input_length = (audio_length - 1) // 2 + 1
            audio_seqlen = (input_length - 2) // 2 + 1
        else:
            audio_seqlen = 1

        content = content.replace(
            AUDIO_PLACEHOLDER, f"{bos_token}{self.audio_token * audio_seqlen}{eos_token}", 1
        )
        num_audio_tokens += 1

    message["content"] = content

if len(audios) != num_audio_tokens:
    raise ValueError(f"The number of audios does not match the number of {AUDIO_PLACEHOLDER} tokens.")
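To make the excerpt easier to follow, here is a self-contained sketch of the same placeholder expansion and count check. It is a simplified stand-in, not the library's code: the function names are hypothetical, the token strings `<B>`, `<A>` and `<E>` are made up for illustration, and the final check compares against the number of audio lengths (assumed to be one per audio) rather than `len(audios)`.

```python
AUDIO_PLACEHOLDER = "<audio>"


def audio_seqlen_from_length(audio_length):
    """Apply the two halving steps from the excerpt to get the audio token count."""
    input_length = (audio_length - 1) // 2 + 1
    return (input_length - 2) // 2 + 1


def expand_audio_placeholders(messages, audio_lengths, audio_token="<A>",
                              bos_token="<B>", eos_token="<E>", expand_mm_tokens=True):
    """Replace each placeholder with bos + repeated audio tokens + eos (illustrative)."""
    remaining = list(audio_lengths)  # copy so the caller's list is not consumed
    num_audio_tokens = 0
    expanded = []
    for message in messages:
        content = message["content"]
        while AUDIO_PLACEHOLDER in content:
            if expand_mm_tokens:
                if not remaining:
                    raise ValueError("More placeholders than audios.")
                seqlen = audio_seqlen_from_length(remaining.pop(0))
            else:
                seqlen = 1
            content = content.replace(
                AUDIO_PLACEHOLDER, f"{bos_token}{audio_token * seqlen}{eos_token}", 1
            )
            num_audio_tokens += 1
        expanded.append({**message, "content": content})

    # Same consistency check as in the excerpt: one placeholder per audio.
    if len(audio_lengths) != num_audio_tokens:
        raise ValueError(
            f"The number of audios does not match the number of {AUDIO_PLACEHOLDER} tokens."
        )
    return expanded


result = expand_audio_placeholders(
    [{"role": "user", "content": "<audio>What's that sound?"}], audio_lengths=[10]
)
print(result[0]["content"])  # <B><A><A><E>What's that sound?
```

With this sketch, a message list that contains no `<audio>` placeholder but is paired with one audio raises the same `ValueError` as in the report, which is the situation the check is meant to catch.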

This is probably a data format problem; the expected format looks like this:

{
    "messages": [
        {
            "content": "<audio>What's that sound?",
            "role": "user"
        },
        {
            "content": "It is the sound of glass shattering.",
            "role": "assistant"
        }
    ],
    "audios": [
        "mllm_demo_data/1.mp3"
    ]
}


3 participants