Possible solution of using DLPack to handle TVM FFI #181

Open

haoran35-jpg wants to merge 3 commits into HazyResearch:main from haoran35-jpg:main

Conversation

@haoran35-jpg

Hey! I was researching the possibility of lowering JAX to ThunderKittens and ran into the same issue mentioned in [TVM FFI support #176]. After some searching I found that DLPack borrows much of the idea behind TVM FFI, though there are certain differences: https://tvm.apache.org/ffi/concepts/tensor.html. I implemented the DLPack methods and tested both routes, getting the following difference in VRAM use:

| Without DLPack | Using DLPack |
|---|---|
| 100.000 MB | 50.000 MB |

The two routes are respectively: `tensor.clone() -> TKParallelTensor(tensor, ...)` and JAX → `__dlpack__()` → `TKParallelTensor(capsule)`. The VRAM numbers come out as round integers because I'm only testing with a single fixed tensor size (50 MB), i.e. 50×1024×1024/4 float32 elements, so the copy path allocates exactly two buffers (100 MB) and the DLPack path only one (50 MB). Not sure whether this PR is in good shape yet? Thank you!
