|
80 | 80 | * Extended: To be able to execute on GPU using CUDA or OpenMP
|
81 | 81 | * Optional: Extend the magics for the wasm use case (xeus-cpp-lite)
|
82 | 82 | * Present the work at the relevant meetings and conferences
|
83 | | -
84 | | -- name: "Integrate Clad to PyTorch and compare the gradient execution times"
| 83 | +
|
| 84 | +- name: "Enhancing LLM Training with Clad for efficient differentiation"
| 85 | + description: |
| 86 | + This project aims to leverage Clad, an automatic differentiation (AD)
| 87 | + plugin for Clang, to optimize large language model (LLM) training primarily
| 88 | + in C++. Automatic differentiation is a crucial component of deep learning
| 89 | + training, enabling efficient computation of gradients for optimization
| 90 | + algorithms such as stochastic gradient descent (SGD). While most modern LLM
| 91 | + frameworks rely on Python-based ecosystems, their heavy reliance on
| 92 | + interpreted code and dynamic computation graphs can introduce performance
| 93 | + bottlenecks. By integrating Clad into C++-based deep learning pipelines,
| 94 | + we can enable high-performance differentiation at the compiler level,
| 95 | + reducing computational overhead and improving memory efficiency. This will
| 96 | + allow developers to build more optimized training workflows without
| 97 | + sacrificing flexibility or precision.
| 98 | +
|
| 99 | + Beyond performance improvements, integrating Clad with LLM training in C++
| 100 | + opens new possibilities for deploying AI models in resource-constrained
| 101 | + environments, such as embedded systems and HPC clusters, where minimizing
| 102 | + memory footprint and maximizing computational efficiency are critical.
| 103 | + Additionally, this work will bridge the gap between modern deep learning
| 104 | + research and traditional scientific computing by providing a more robust
| 105 | + and scalable AD solution for physics-informed machine learning models. By
| 106 | + optimizing the differentiation process at the compiler level, this project
| 107 | + has the potential to enhance both research and production-level AI
| 108 | + applications, aligning with compiler-research.org's broader goal of
| 109 | + advancing computational techniques for scientific discovery.
| 110 | +
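To make the compiler-level differentiation described above concrete, here is a minimal sketch (an editorial illustration, not part of the proposal) that differentiates a toy squared-error loss with Clad's documented clad::gradient API and uses the result in a plain SGD loop. The loss function, parameter names, learning rate, and file/plugin paths are all placeholder assumptions; the translation unit is assumed to be compiled with the Clad plugin loaded.

// Sketch only: a toy loss, differentiated with Clad and used for SGD updates.
// Assumed build (paths are placeholders):
//   clang++ -std=c++14 -fplugin=/path/to/clad.so -I/path/to/clad/include sgd.cpp
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

// Toy one-neuron model with a squared-error loss: (w*x + b - y)^2.
double loss(double w, double b, double x, double y) {
  double err = w * x + b - y;
  return err * err;
}

int main() {
  // Reverse-mode AD: Clad generates the gradient w.r.t. w and b at compile time.
  auto grad = clad::gradient(loss, "w, b");

  double w = 0.0, b = 0.0;
  const double lr = 0.05;          // learning rate (arbitrary)
  const double x = 2.0, y = 3.0;   // one synthetic "training sample"

  for (int step = 0; step < 200; ++step) {
    double dw = 0.0, db = 0.0;
    grad.execute(w, b, x, y, &dw, &db);  // fills dw and db
    w -= lr * dw;                        // plain SGD update
    b -= lr * db;
  }
  std::printf("w = %f, b = %f, loss = %f\n", w, b, loss(w, b, x, y));
  return 0;
}

A real LLM training loop would replace the scalar parameters with tensors and batched data, but the structure (Clad-generated gradient feeding an optimizer update) stays the same.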
| 111 | + tasks: |
| 112 | + * Develop a simplified LLM setup in C++
| 113 | + * Apply Clad to compute gradients for selected layers and loss functions
| 114 | + * Enhance Clad to support these use cases where necessary, and prepare performance benchmarks
| 115 | + * Increase the model's complexity to cover larger projects such as llama
| 116 | + * Iterate on bug fixes and benchmarks
| 117 | + * Develop tests to ensure correctness, numerical stability, and efficiency
| 118 | + * Document the approach, implementation details, and performance gains
| 119 | + * Present progress and findings at relevant meetings and conferences
| 120 | +
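As a hedged illustration of the correctness-testing task in the list above, the sketch below compares a Clad-generated derivative against a central finite-difference estimate. The loss function, sample values, and step size are placeholder choices, and the same Clad plugin build assumption as in the previous sketch applies.

// Sketch only: check a Clad-generated derivative against a finite-difference
// estimate, in the spirit of the correctness tests listed above.
#include "clad/Differentiator/Differentiator.h"
#include <cmath>
#include <cstdio>

double loss(double w, double x, double y) {
  double err = w * x - y;
  return err * err;
}

int main() {
  // Differentiate only w.r.t. the trainable parameter w.
  auto grad = clad::gradient(loss, "w");

  const double w = 2.0, x = 3.0, y = 1.0;
  double dw = 0.0;
  grad.execute(w, x, y, &dw);

  // Central finite difference as an independent reference.
  const double h = 1e-6;
  const double fd = (loss(w + h, x, y) - loss(w - h, x, y)) / (2.0 * h);

  std::printf("clad: %.8f  finite-diff: %.8f  abs diff: %.2e\n",
              dw, fd, std::fabs(dw - fd));
  // Analytic value for these inputs: 2 * (w*x - y) * x = 30.
  return 0;
}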
| 121 | +- name: "Integrate Clad in PyTorch and compare the gradient execution times"
85 | 122 | description: |
|
86 | 123 | PyTorch is a popular machine learning framework that includes its own
|
87 | 124 | automatic differentiation engine, while Clad is a Clang plugin for
|
|