Commit 6bffe05
fix(libfabric): Use PCI bus ID for GPU-to-EFA mapping
Fix incorrect EFA device selection when CUDA_VISIBLE_DEVICES is set by
identifying GPUs by PCI bus ID instead of by enumeration order. Query the
physical GPU backing a memory pointer via cuPointerGetAttributes(), map it
to its hwloc topology index, and select the EFA devices closest to it on
the PCIe hierarchy.

This fixes a GPU device ID mismatch between CUDA and hwloc enumeration that
caused the wrong EFA rails to be selected in vLLM and other multi-GPU
workloads.

Parent commit: 0f64414
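To illustrate the failure mode and the fix, here is a minimal, self-contained C++ sketch. It does not call CUDA or hwloc. It assumes the PCI bus ID of the GPU in use has already been obtained from CUDA, and it models the PCIe tree with a hypothetical `PciDevice` struct that lists each device's upstream bridges. It also assumes the two sources may format bus IDs differently (hex case and domain width), so both sides are normalized before comparison. `PciDevice`, `normalizeBusId`, `hwlocIndexForBusId`, `pcieDistance`, and `selectNearestEfa` are illustrative names, not the plugin's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical model of a PCIe device as a topology library such as hwloc
// might expose it: a bus ID plus its upstream bridges/switches, nearest
// first, ending at the root complex.
struct PciDevice {
    std::string busId;
    std::vector<std::string> ancestors;
};

// Sentinel distance for devices that share no upstream bridge.
constexpr int kNoPath = std::numeric_limits<int>::max();

// Canonicalize "DDDD:bb:dd.f" strings: lowercase hex and a 4-digit domain.
// Assumption: the two sources differ only in case and domain width.
std::string normalizeBusId(std::string id) {
    std::transform(id.begin(), id.end(), id.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    const std::size_t colon = id.find(':');
    if (colon == std::string::npos) return id;        // not a bus ID; leave as-is
    if (colon > 4) id.erase(0, colon - 4);             // "00000000:3b:.." -> "0000:3b:.."
    else if (colon < 4) id.insert(0, 4 - colon, '0');  // "0:3b:.." -> "0000:3b:.."
    return id;
}

// Find the GPU's position in topology (hwloc-style) order by matching bus
// IDs, rather than assuming the CUDA ordinal equals that position.
int hwlocIndexForBusId(const std::vector<PciDevice>& gpus, const std::string& busId) {
    const std::string want = normalizeBusId(busId);
    for (std::size_t i = 0; i < gpus.size(); ++i)
        if (normalizeBusId(gpus[i].busId) == want) return static_cast<int>(i);
    return -1;  // bus ID not present in the topology
}

// Hops from each device up to their lowest common ancestor. Each ancestor
// list is a path toward the root, so the first match found while walking
// up from `a` is the lowest common ancestor.
int pcieDistance(const PciDevice& a, const PciDevice& b) {
    for (std::size_t i = 0; i < a.ancestors.size(); ++i)
        for (std::size_t j = 0; j < b.ancestors.size(); ++j)
            if (a.ancestors[i] == b.ancestors[j]) return static_cast<int>(i + j);
    return kNoPath;
}

// Return the bus IDs of all EFA devices tied for the shortest PCIe distance
// to the GPU. If none shares an upstream bridge, all of them tie at kNoPath.
std::vector<std::string> selectNearestEfa(const PciDevice& gpu,
                                          const std::vector<PciDevice>& efas) {
    int best = kNoPath;
    std::vector<std::string> picked;
    for (const auto& efa : efas) {
        const int d = pcieDistance(gpu, efa);
        if (d < best) { best = d; picked.clear(); }
        if (d == best) picked.push_back(efa.busId);
    }
    return picked;
}
```

As a worked case, suppose four GPUs sit at bus IDs `0000:10:00.0`, `0000:20:00.0`, `0000:90:00.0`, and `0000:a0:00.0`, each sharing a switch with one EFA device. With `CUDA_VISIBLE_DEVICES=2,3`, and assuming CUDA enumerates the visible GPUs in PCI order, CUDA ordinal 0 is the third GPU in topology order. Treating ordinal 0 as hwloc index 0 would pick the EFA device beside the first GPU. Matching on the bus ID (`00000000:90:00.0` as CUDA might report it) resolves to index 2 and picks the EFA device under that GPU's switch.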
6 files changed (+141, -58 lines) under:
- src/plugins/libfabric
- src/utils/libfabric
- test/unit/utils/libfabric