Commit 4086a79

aaronj0 authored and vgvassilev committed
Add project on Clad LLM Training
1 parent d690e22 commit 4086a79

File tree

1 file changed: +39 -2 lines changed


_data/openprojectlist.yml

+39 -2
@@ -80,8 +80,45 @@
     * Extended: To be able to execute on GPU using CUDA or OpenMP
     * Optional: Extend the magics for the wasm use case (xeus-cpp-lite)
     * Present the work at the relevant meetings and conferences
-
-- name: "Integrate Clad to PyTorch and compare the gradient execution times"
+
+- name: "Enhancing LLM Training with Clad for efficient differentiation"
+  description: |
+    This project aims to leverage Clad, an automatic differentiation (AD)
+    plugin for Clang, to optimize large language model (LLM) training primarily
+    in C++. Automatic differentiation is a crucial component of deep learning
+    training, enabling efficient computation of gradients for optimization
+    algorithms such as stochastic gradient descent (SGD). While most modern LLM
+    frameworks rely on Python-based ecosystems, their heavy reliance on
+    interpreted code and dynamic computation graphs can introduce performance
+    bottlenecks. By integrating Clad into C++-based deep learning pipelines,
+    we can enable high-performance differentiation at the compiler level,
+    reducing computational overhead and improving memory efficiency. This will
+    allow developers to build more optimized training workflows without
+    sacrificing flexibility or precision.
+
+    Beyond performance improvements, integrating Clad with LLM training in C++
+    opens new possibilities for deploying AI models in resource-constrained
+    environments, such as embedded systems and HPC clusters, where minimizing
+    memory footprint and maximizing computational efficiency are critical.
+    Additionally, this work will bridge the gap between modern deep learning
+    research and traditional scientific computing by providing a more robust
+    and scalable AD solution for physics-informed machine learning models. By
+    optimizing the differentiation process at the compiler level, this project
+    has the potential to enhance both research and production-level AI
+    applications, aligning with compiler-research.org's broader goal of
+    advancing computational techniques for scientific discovery.
+
+  tasks: |
+    * Develop a simplified LLM setup in C++
+    * Apply Clad to compute gradients for selected layers and loss functions
+    * Enhance Clad to support it if necessary, and prepare performance benchmarks
+    * Enhance the LLM complexity to cover larger projects such as llama
+    * Repeat bug fixing and benchmarks
+    * Develop tests to ensure correctness, numerical stability, and efficiency
+    * Document the approach, implementation details, and performance gains
+    * Present progress and findings at relevant meetings and conferences
+
+- name: "Integrate Clad in PyTorch and compare the gradient execution times"
   description: |
     PyTorch is a popular machine learning framework that includes its own
     automatic differentiation engine, while Clad is a Clang plugin for
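The tasks in the new entry revolve around Clad's compiler-level gradient generation. As a rough illustration of that workflow, the following minimal sketch (not part of the commit; the toy loss function, parameter values, and build flags are illustrative assumptions) uses Clad's reverse-mode clad::gradient API to differentiate a scalar squared-error loss in C++:

// Minimal sketch (illustrative, not from the commit): differentiate a toy
// squared-error loss with Clad's reverse-mode gradient API. Building this
// requires Clang with the Clad plugin loaded (e.g. via -fplugin=<path>/clad.so).
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

// Toy "model": a single weight and bias with a squared-error loss for one sample.
double loss(double w, double b, double x, double y) {
  double pred = w * x + b;  // linear prediction
  double err = pred - y;    // residual
  return err * err;         // squared error
}

int main() {
  // Ask Clad to generate, at compile time, the gradient of `loss` with
  // respect to the parameters w and b; x and y are treated as constants.
  auto dloss = clad::gradient(loss, "w, b");

  double dw = 0.0, db = 0.0;
  // Evaluate the generated gradient at w = 0.5, b = 0.1 for the sample (x = 2, y = 1).
  dloss.execute(0.5, 0.1, 2.0, 1.0, &dw, &db);

  std::printf("dL/dw = %f, dL/db = %f\n", dw, db);
  return 0;
}

For the values above the generated gradient should return dL/dw = 0.4 and dL/db = 0.2, matching the analytic derivatives 2*(w*x + b - y)*x and 2*(w*x + b - y).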
