
llama3-llava-next-8b full-parameter fine-tuning error #7102

Open · 1 task done
liboaccn opened this issue Feb 27, 2025 · 3 comments
Labels: bug (Something isn't working), pending (This problem is yet to be addressed)

Comments

@liboaccn

Reminder

  • I have read the above rules and searched the existing issues.

System Info

llamafactory has already been updated to the latest version

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.10.0-1.0.0.32-x86_64-with-glibc2.17
  • Python version: 3.9.18
  • PyTorch version: 2.1.0+cu118 (GPU)
  • Transformers version: 4.45.0
  • Datasets version: 2.19.2
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-SXM4-40GB
  • GPU number: 8
  • GPU memory: 39.39GB
  • DeepSpeed version: 0.15.4

Reproduction

An error occurs during full-parameter fine-tuning. There are two errors in total.

The first: in image_sizes = iter(mm_inputs["image_sizes"].tolist()), mm_inputs has no image_sizes key, yet image_sizes is consumed later by orig_height, orig_width = next(image_sizes).
The second: in height, width = get_image_size(to_numpy_array(mm_inputs["pixel_values"][0][0])), shouldn't mm_inputs["pixel_values"][0][0] be mm_inputs["pixel_values"][0]?
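
To separate a processor problem from a bad image input, one can run a dummy image through the LLaVA-NeXT processor directly. The sketch below is a debugging aid, not code from the issue; the repo id is an assumption, so substitute the local checkpoint actually used for fine-tuning.

from PIL import Image
from transformers import AutoProcessor

# Assumed model id; replace with the local path of llama3-llava-next-8b.
processor = AutoProcessor.from_pretrained("llava-hf/llama3-llava-next-8b-hf")
image = Image.new("RGB", (640, 480))  # dummy RGB image standing in for a dataset sample

inputs = processor(images=image, text="<image>\ndescribe the image", return_tensors="pt")
print(list(inputs.keys()))            # expect pixel_values, image_sizes, input_ids, ...
print(inputs["pixel_values"].shape)   # expect (1, num_patches, 3, H, W)
print(inputs.get("image_sizes"))      # the original (height, width) of each image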

Traceback (most recent call last):
  File "/home/users/code/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
    launch()
  File "/home/users/code/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
    run_exp()
  File "/home/users/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 93, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/home/users/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 67, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/users/code/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
    dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/loader.py", line 325, in get_dataset
    dataset = _get_preprocessed_dataset(
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/loader.py", line 258, in _get_preprocessed_dataset
    dataset = dataset.map(
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3156, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3547, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
    input_ids, labels = self._encode_data_example(
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
    messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 397, in process_messages
    height, width = get_image_size(to_numpy_array(mm_inputs["pixel_values"][0][0]))
  File "/home/users/miniconda3/lib/python3.9/site-packages/transformers/image_utils.py", line 295, in get_image_size
    channel_dim = infer_channel_dimension_format(image)
  File "/home/users/miniconda3/lib/python3.9/site-packages/transformers/image_utils.py", line 243, in infer_channel_dimension_format
    raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")
ValueError: Unsupported number of image dimensions: 2
[2025-02-27 19:49:42,031] torch.distributed.elastic.mu

The corresponding code:

class LlavaNextPlugin(BasePlugin):
    @override
    def process_messages(
        self,
        messages: Sequence[Dict[str, str]],
        images: Sequence["ImageInput"],
        videos: Sequence["VideoInput"],
        audios: Sequence["AudioInput"],
        processor: Optional["ProcessorMixin"],
    ) -> List[Dict[str, str]]:
        self._validate_input(processor, images, videos, audios)
        num_image_tokens = 0
        messages = deepcopy(messages)
        mm_inputs = self._get_mm_inputs(images, videos, audios, processor)
        if "pixel_values" in mm_inputs:
            image_sizes = iter(mm_inputs["image_sizes"].tolist())  # <--- image_sizes does not seem to exist in mm_inputs, so this line raises the KeyError
            height, width = get_image_size(to_numpy_array(mm_inputs["pixel_values"][0][0]))  # <--- shouldn't this be mm_inputs["pixel_values"][0] instead of mm_inputs["pixel_values"][0][0]? Otherwise the error above is raised: ValueError: Unsupported number of image dimensions: 2
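
For debugging, a hedged sketch of a defensive guard around these two lines is shown below. It reuses the get_image_size and to_numpy_array helpers already imported in mm_plugin.py and assumes the standard LlavaNextImageProcessor contract (pixel_values shaped (num_images, num_patches, 3, H, W) plus an image_sizes key); it is a diagnostic aid, not the upstream fix.

        # Debugging sketch (assumption: standard LlavaNextImageProcessor output),
        # reusing the helpers already imported in mm_plugin.py.
        if "pixel_values" in mm_inputs:
            if "image_sizes" not in mm_inputs:
                raise ValueError(
                    f"expected 'image_sizes' from the processor, got keys {list(mm_inputs.keys())}; "
                    "check that the dataset images resolve to valid RGB files"
                )
            image_sizes = iter(mm_inputs["image_sizes"].tolist())
            first_patch = to_numpy_array(mm_inputs["pixel_values"][0][0])
            if first_patch.ndim != 3:
                raise ValueError(
                    f"expected a (3, H, W) patch, got shape {first_patch.shape}; "
                    "the image input is probably malformed"
                )
            height, width = get_image_size(first_patch)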




Others

_No response_
liboaccn added the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on Feb 27, 2025
@liboaccn (Author)

The error is not raised in transformers/models/llava_next/processing_llava_next.py but in LLaMA-Factory's own code, namely src/llamafactory/data/mm_plugin.py, line 396:

Traceback (most recent call last):
  File "/home/users/code/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
    launch()
  File "/home/users/code/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
    run_exp()
  File "/home/users/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 93, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/home/users/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 67, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/users/code/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
    dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/loader.py", line 325, in get_dataset
    dataset = _get_preprocessed_dataset(
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/loader.py", line 258, in _get_preprocessed_dataset
    dataset = dataset.map(
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3156, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3547, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
    input_ids, labels = self._encode_data_example(
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
    messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
  File "/home/users/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 396, in process_messages
    image_sizes = iter(mm_inputs["image_sizes"].tolist())
KeyError: 'image_sizes'

@Kuangdd01 (Contributor)

https://github.com/huggingface/transformers/blob/51083d1bac7905aa8316b75f7897bdd4e5302044/src/transformers/models/llava_next/image_processing_llava_next.py#L726C9-L728C10

  return BatchFeature(
      data={"pixel_values": processed_images, "image_sizes": image_sizes}, tensor_type=return_tensors
  )

After the llava-next image_processor has run, the image_sizes key should be present. Is the image input correct?
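
One way to answer that question is to check every image referenced by the dataset before training. The sketch below is built on assumptions: the file name and the "images" field follow LLaMA-Factory's multimodal (sharegpt-style) dataset layout, so adjust them to the actual dataset_info.json entry.

import json
from PIL import Image

# Assumed dataset file; replace with the file registered in dataset_info.json.
with open("data/mllm_demo.json", encoding="utf-8") as f:
    samples = json.load(f)

for sample in samples:
    for path in sample.get("images", []):    # assumed field name for image paths
        with Image.open(path) as img:
            rgb = img.convert("RGB")
            print(path, rgb.size, rgb.mode)  # expect a nonzero (width, height) and mode RGB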
