I'm trying to run the sample fine-tuning test from the README, but no matter how I configure the accelerate tool, it keeps trying to use two GPUs. My most recent config is:
```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
enable_cpu_affinity: false
gpu_ids: '0'
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
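For what it's worth, the symptom looks consistent with a launcher that indexes into the configured device list by local rank. The sketch below is hypothetical (it is not accelerate's actual code, and `pick_device` is an invented helper), but it shows how a stray `LOCAL_RANK=1` would end up requesting device 1 even with `gpu_ids: '0'`:

```python
import os

def pick_device(gpu_ids: str, default_rank: int = 0) -> int:
    """Pick a CUDA device index the way a naive launcher might:
    split the configured gpu_ids list and index into it by LOCAL_RANK.
    Hypothetical helper, for illustration only."""
    visible = [int(g) for g in gpu_ids.split(",")]
    rank = int(os.environ.get("LOCAL_RANK", default_rank))
    if rank >= len(visible):
        # With gpu_ids='0' and LOCAL_RANK=1 this is exactly the
        # "requesting GPU 1 on a single-GPU box" failure mode.
        raise IndexError(f"local rank {rank} has no entry in gpu_ids={gpu_ids!r}")
    return visible[rank]

# With the env var unset, gpu_ids='0' resolves to device 0:
os.environ.pop("LOCAL_RANK", None)
print(pick_device("0"))  # 0

# A stale LOCAL_RANK=1 makes the same config fail:
os.environ["LOCAL_RANK"] = "1"
try:
    pick_device("0")
except IndexError as e:
    print("failed:", e)
```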
I'm running on Ubuntu 24.04 LTS, Python 3.12.11, with the following installed devices and CUDA versions:

```
$ nvidia-smi
Tue Dec 2 02:05:45 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08        Driver Version: 575.57.08        CUDA Version: 12.9         |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name             Persistence-M     | Bus-Id        Disp.A   | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap          | Memory-Usage           | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:65:00.0  Off  |                  Off |
|  0%  40C  P8  19W / 450W                | 1MiB / 24564MiB        |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI   PID   Type   Process name                                  GPU Memory  |
|        ID   ID                                                              Usage       |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
The tail of the log is:

```
============================================================
sft/train_sft.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-12-02_02:06:19
  host      : beast
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 31116)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```
The full log can be found here: https://gist.github.com/arpieb/d03351dbc74212b1e92de77962e61926
Stepping through the exception in a debugger shows it is requesting a GPU with an ID of 1, which would be the second GPU if one were installed, even though I have explicitly set the GPU device list to "0" in the config.
It looks like the LOCAL_RANK environment variable may be involved: it defaults to 0 but is getting set to 1 for some reason...?
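As a sanity check before launching, one workaround is to pin device visibility at the driver level and clear any rank variables left over from a previous distributed run. `CUDA_VISIBLE_DEVICES`, `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` are standard CUDA/PyTorch environment variables; clearing them here is only a suggested diagnostic, not a confirmed fix:

```python
import os

# Mask all but the first GPU at the driver level; any device the
# launcher requests is then remapped within this visible set.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Remove rank variables that a previous distributed launch may have
# left behind, so single-process code falls back to rank 0.
for var in ("LOCAL_RANK", "RANK", "WORLD_SIZE"):
    os.environ.pop(var, None)

print(os.environ.get("LOCAL_RANK", "0"))  # falls back to "0"
```

The same check can be done from the shell by prefixing the launch command with `CUDA_VISIBLE_DEVICES=0` and inspecting `env | grep RANK` first.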