Liger Kernel is currently incompatible with encoder-only transformer architectures such as BERT, DistilBERT, RoBERTa, XLM-R, and DeBERTa.
Given the continued importance of these models in research and industry use cases, adding support would help further decrease memory requirements and increase training throughput.
OxxoCodes changed the title from "Add support for encoder-only transformers (e.g. BERT)" to "[feat] Add support for encoder-only transformers (e.g. BERT)" on Aug 27, 2024.
## Summary
- Added Embedding forward/backward kernels and a LigerEmbedding class that maps to nn.Embedding
- nn.Embedding is useful for encoder-only models such as BERT
- ref: #131
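For context, an embedding layer's forward pass is a row gather and its backward pass is a scatter-add of output gradients into the weight gradient. A minimal NumPy sketch of the semantics such kernels must reproduce (illustrative only, not the actual Triton code):

```python
import numpy as np

def embedding_forward(weight, indices):
    # Forward: gather one weight row per token index.
    return weight[indices]

def embedding_backward(grad_output, indices, num_embeddings):
    # Backward: scatter-add each output gradient into the gradient
    # row of the embedding it was gathered from. np.add.at performs
    # an unbuffered scatter-add, so repeated indices accumulate.
    grad_weight = np.zeros((num_embeddings, grad_output.shape[-1]))
    np.add.at(grad_weight, indices, grad_output)
    return grad_weight

# Example: vocab of 4, embedding dim 3, index 2 repeated twice.
weight = np.arange(12, dtype=np.float64).reshape(4, 3)
idx = np.array([2, 0, 2])
out = embedding_forward(weight, idx)                  # shape (3, 3)
grad = embedding_backward(np.ones_like(out), idx, 4)  # row 2 accumulates to 2
```

Repeated indices are the subtle part: on a GPU the accumulation must be done atomically, which is what a fused kernel handles.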
## Testing Done
- Tested against nn.Embedding for correctness on various inputs
- Tested with and without padding_idx
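The padding_idx case matters because nn.Embedding excludes the row at padding_idx from gradient updates, so a matching kernel must leave that gradient row zero. A hedged NumPy sketch of the property being checked (helper name is illustrative, not Liger Kernel API):

```python
import numpy as np

def embedding_backward_padded(grad_output, indices, num_embeddings,
                              padding_idx=None):
    # Scatter-add gradients, then zero the padding row: tokens at
    # padding_idx must not update the embedding table, matching
    # torch.nn.Embedding's padding_idx semantics.
    grad_weight = np.zeros((num_embeddings, grad_output.shape[-1]))
    np.add.at(grad_weight, indices, grad_output)
    if padding_idx is not None:
        grad_weight[padding_idx] = 0.0
    return grad_weight

idx = np.array([0, 3, 0, 1])   # index 0 is the pad token here
grad_out = np.ones((4, 2))
g = embedding_backward_padded(grad_out, idx, num_embeddings=5,
                              padding_idx=0)
# Row 0 is zeroed despite appearing twice; rows 1 and 3 accumulate once.
```

A correctness test can then compare this gradient row-by-row against the autograd gradient of nn.Embedding on the same inputs.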
- Hardware Type: RTX 3090 + RTX 4090
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence
---------
Co-authored-by: Shao Tang <[email protected]>