I tried to start training with bash scripts/finetune.sh #22
Comments
I think your installed CUDA version and the one your environment was built for are a bit different. Can you reinstall torch for your current CUDA version and retry?
Can you tell me the CUDA and torch versions?
I use
The error message says you have CUDA 12.0. That's odd.
@MahmoudElsayedMahmoud It might be a problem with gcc.
OK, I will try and let you know. Thanks for trying to help me.
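For the version question raised in the comments above, a minimal sketch of how the CUDA version torch was built with could be compared against the installed toolkit. The function name `report_versions` and the returned keys are hypothetical; it assumes `nvcc` is on `PATH` and simply leaves a value as `None` when torch or `nvcc` cannot be found.

```python
import re
import shutil
import subprocess


def report_versions():
    """Collect the torch version, the CUDA version torch was built with,
    and the CUDA release reported by the local nvcc (if any)."""
    info = {"torch": None, "torch_cuda": None, "nvcc_cuda": None}

    try:
        import torch  # optional: left as None when torch is not installed
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass

    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, timeout=30
        )
        # nvcc prints a line such as "Cuda compilation tools, release 12.0, ..."
        match = re.search(r"release (\d+\.\d+)", result.stdout)
        if match:
            info["nvcc_cuda"] = match.group(1)

    return info
```

Calling `print(report_versions())` inside the training environment shows whether `torch_cuda` and `nvcc_cuda` differ, which is the situation the warning in the log describes.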
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 5/5 [00:06<00:00, 1.20s/it]
Installed CUDA version 12.0 does not match the version torch was compiled with 12.4 but since the APIs are compatible, accepting this combination
Using /home/mahmoud/.cache/torch_extensions/py310_cu124 as PyTorch extensions root...
Emitting ninja build file /home/mahmoud/.cache/torch_extensions/py310_cu124/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
[rank0]: Traceback (most recent call last):
[rank0]: File "/media/mahmoud/새 볼륨/Llama3.2-Vision-Finetune/src/training/train.py", line 225, in <module>
[rank0]: train()
[rank0]: File "/media/mahmoud/새 볼륨/Llama3.2-Vision-Finetune/src/training/train.py", line 200, in train
[rank0]: trainer.train()
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/transformers/trainer.py", line 2122, in train
[rank0]: return inner_training_loop(
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/transformers/trainer.py", line 2277, in _inner_training_loop
[rank0]: model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/accelerate/accelerator.py", line 1318, in prepare
[rank0]: result = self._prepare_deepspeed(*args)
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/accelerate/accelerator.py", line 1815, in _prepare_deepspeed
[rank0]: engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/deepspeed/__init__.py", line 193, in initialize
[rank0]: engine = DeepSpeedEngine(args=args,
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 313, in __init__
[rank0]: self._configure_optimizer(optimizer, model_parameters)
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1276, in _configure_optimizer
[rank0]: basic_optimizer = self._configure_basic_optimizer(model_parameters)
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1347, in _configure_basic_optimizer
[rank0]: optimizer = DeepSpeedCPUAdam(model_parameters,
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
[rank0]: self.ds_opt_adam = CPUAdamBuilder().load()
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 531, in load
[rank0]: return self.jit_load(verbose)
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 578, in jit_load
[rank0]: op_module = load(name=self.name,
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1314, in load
[rank0]: return _jit_compile(
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1746, in _jit_compile
[rank0]: return _import_module_from_library(name, build_directory, is_python_module)
[rank0]: File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2140, in _import_module_from_library
[rank0]: module = importlib.util.module_from_spec(spec)
[rank0]: File "<frozen importlib._bootstrap>", line 571, in module_from_spec
[rank0]: File "<frozen importlib._bootstrap_external>", line 1176, in create_module
[rank0]: File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
[rank0]: ImportError: /home/mahmoud/anaconda3/envs/llama3/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /home/mahmoud/.cache/torch_extensions/py310_cu124/cpu_adam/cpu_adam.so)
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7a8af0d48d30>
Traceback (most recent call last):
File "/home/mahmoud/anaconda3/envs/llama3/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
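
The `ImportError` in the log says that the `libstdc++.so.6` inside the conda environment does not export `GLIBCXX_3.4.32`, which the freshly built `cpu_adam.so` requires. That often means the extension was compiled with a system compiler newer than the libstdc++ shipped in the environment, which would fit the gcc suggestion in the comments. Below is a minimal sketch for listing the version tags a library file exposes, roughly what `strings libstdc++.so.6 | grep GLIBCXX` would show. The helper names `glibcxx_versions` and `provides` are hypothetical, and scanning raw bytes is a heuristic rather than a proper ELF parse.

```python
import re
from pathlib import Path


def glibcxx_versions(lib_path):
    """Return the sorted GLIBCXX version tags found in a shared library,
    each as a tuple of ints, e.g. (3, 4, 32)."""
    data = Path(lib_path).read_bytes()
    # Version tags are stored as plain ASCII strings such as "GLIBCXX_3.4.32".
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(tuple(int(part) for part in tag.decode().split(".")) for tag in tags)


def provides(lib_path, required):
    """Check whether the library at lib_path contains the tag GLIBCXX_<required>."""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in glibcxx_versions(lib_path)
```

For the environment in the log, `provides("/home/mahmoud/anaconda3/envs/llama3/lib/libstdc++.so.6", "3.4.32")` would be expected to return `False`. Running the same check against the system copy of `libstdc++.so.6` (its location varies by distribution) shows whether the system runtime is the newer one.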