@bjtuwjx commented Sep 2, 2025

This RFC proposal aims to decouple the CUDA-related code from the PyTorch main codebase and refactor it into an independent and modularized directory hierarchy with the help of a build optimization toolkit. Specifically, the proposal covers the following work:

  • Decouple CUDA-related code from the main codebase at both inter-file and intra-file levels, reducing the PyTorch core framework's direct dependency on CUDA.
  • Propose a modularized and standardized directory hierarchy and consolidate all CUDA-related code within it, serving as a reference for integrating other third-party backends.
  • Redesign the build system to support standalone compilation of the CUDA backend, and develop a wrapped CMake toolkit to support and streamline the build process.

Click here for a preview of this RFC.
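To make the intended structure concrete, one possible shape of the modularized hierarchy is sketched below; the directory names are illustrative placeholders, not the RFC's actual layout:

```
torch/
├── csrc/                    # backend-agnostic core (no direct CUDA dependency)
└── backends/
    └── cuda/                # all CUDA-related code consolidated here
        ├── aten/            # CUDA kernels and ATen operator registrations
        ├── csrc/            # runtime glue: allocator, stream/event bindings
        └── CMakeLists.txt   # standalone build entry for the wrapped CMake toolkit
```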


meta-cla bot commented Sep 2, 2025

Hi @bjtuwjx!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!


meta-cla bot commented Sep 2, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

## **Motivation**

For a long time, NVIDIA GPUs and the CUDA architecture have dominated the PyTorch ecosystem. However, as an increasing number of vendors introduce their own high-performance AI chips, the current ecosystem is revealing the following key issues:
- *Code coupling*. CUDA code is too tightly coupled with the PyTorch codebase, resulting in poor modularity and high maintenance costs.
Contributor commented:

Can you give some details on this? What is the more concrete impact on modularity? And how would this reduce maintenance costs?


- *Integration effort*. Different hardware backends currently adopt varying methods of integrating into PyTorch. The integration approaches and code lack standardization and consistency, leading to a significant amount of repetitive code and substantial integration effort.
Contributor commented:

We have done a lot of work with the PrivateUse1 backend extension point. In particular, we added all the asked-for extension points, built OpenReg as an in-tree testing backend for this extension, added autoload, are actively working on more documentation, and updated many tools to use the accelerator API to enable a smooth transition for the end user.
What would the proposed RFC provide on top of this? And why would we prefer investing in this refactor rather than continue improving the PrivateUse1-related extension points?
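For reference, a minimal sketch of the PrivateUse1 flow this comment refers to, written against hooks that exist in PyTorch today. The backend name "my_backend" and the empty device module are hypothetical placeholders; a real integration would back them with C++ device hooks, and the package can be autoloaded via PyTorch's entry-point mechanism:

```python
import types
import torch

# Rename the reserved PrivateUse1 dispatch key to a vendor-visible name.
torch.utils.rename_privateuse1_backend("my_backend")

# Register a device module so torch.my_backend.* resolves for users.
torch._register_device_module("my_backend", types.ModuleType("my_backend"))

# Generate convenience methods such as Tensor.my_backend().
torch.utils.generate_methods_for_privateuse1_backend()
```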

- *Code migration*. Because integration code is not standardized, different hardware backends expose APIs with varying names and styles, resulting in high code-migration costs for PyTorch users.
Contributor commented:

We are building the accelerator API to address these particular points (relatively independently of how the backend itself is implemented, in or out of tree).
How would the proposal here help compared to continuing to extend the accelerator API?
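As a point of comparison, here is a minimal sketch of device-generic user code written against the accelerator API available in recent PyTorch releases; the point is that nothing in it names a specific vendor backend:

```python
import torch

if torch.accelerator.is_available():
    dev = torch.accelerator.current_accelerator()  # e.g. device(type='cuda')
    print(f"{torch.accelerator.device_count()} {dev.type} device(s) visible")
    x = torch.ones(4, device=dev)
else:
    x = torch.ones(4)  # CPU fallback
```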
