Improve Documentation

ZuseZ4 · ZuseZ4 · commit 3089d70abe7f · 2024-02-04T18:45:30.000-05:00
diff --git a/src/Debugging.md b/src/Debugging.md
@@ -0,0 +1,3 @@
+# How to Debug AD?
+
+
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -4,12 +4,15 @@
 - [What is Autodiff?](./chapter_1.md)
 - [Motivation](./motivation.md)
 - [Prior Art](./prior_art.md)
+- [Current Limitations](./limitations.md)
+  - [Safety](./limitations/safety.md)
+  - [Runtime Performance](./limitations/runtime.md)
+  - [Compile Times](./limitations/comptime.md)
+  - [Higher Order Derivatives](./limitations/higher.md)
+- [How to Debug](./Debugging.md)
+# Reference Guide
+- [Other Enzyme frontends](./other_Frontends.md)
 - [User facing design](./user_design.md)
 - [rustc internal design](./rustc_design.md)
-- [Other Enzyme frontends](./other_Frontends.md)
-# Reference Guide
-- [Forward Mode](./fwd.md)
-- [Reverse Mode](./rev.md)
-- [Current Limitations](./limitations.md)
 
 - [Acknowledgments](./acknowledgments.md)
diff --git a/src/acknowledgments.md b/src/acknowledgments.md
@@ -1 +1,3 @@
 # Acknowledgments
+
+
diff --git a/src/fwd.md b/src/fwd.md
@@ -1,7 +1,8 @@
 # Forward Mode
 
-In Forward mode we are only allowed to mark input arguments 
-The return value of forward mode with a Duplicated return is a tuple containing as the first value the primal return value and as the second value the derivative.
+In Forward mode we are only allowed to mark input arguments with Dual or Const.
+With a Dual return, forward mode returns a tuple whose first value is the primal return value and whose second value is the derivative.
+
+In forward mode Dual(x, 0.0) is equivalent to Const(x), except that we can perform more optimizations for Const.
 
-In forward mode Duplicated(x, 0.0) is equivalent to Const(x), except that we can perform more optimizations for Const.
 
diff --git a/src/limitations.md b/src/limitations.md
@@ -1,23 +1,10 @@
 # Current limitations
 
 1) Enzyme currently does only support freestanding functions. We added some support for `self`, but don't link pieces together correctly in all cases. `self` does not exist on llvm-ir level, so it's just a matter of fixing our macro and should be easy to solve.
-
-2) Soundness: Enzyme currently does assume that the user passes shadow arguments (`dx`, `dy`, ...) of appropriate size. That's a current research project of Manuel, so we hope to check at least basic DST (vectors, enums) till end of November. If we remember the backprop function from above, there is no way for the type system to guarantee that `dweights` is at least as large as the `weights` vector. Adding length checks for vectors and making sure that in case of enums primal and shadow are of the same variant should get us a large step towards soundness. Once implemented, we can still evaluate how many more checks we can insert automatically and where we want to fall back to unsafe. We also consider to allow the user to (unsafely) implement a safety check for his own types which we would then insert. Concretely, here we would add the following check at the top of backprop (above the code generated by enzyme) `assert!(dweights.len() >= weights.len())`
-```rust
-fn backprop(images: &[f32], weights: &[f32], dweights: &mut [f32]) { ... }
-```
-    
-3) Computing higher order derivatives (hessians, ...) can be done with Enzyme by differentiating functions that compute lower order derivatives. [This example](https://github.com/EnzymeAD/rust/blob/master/library/autodiff/examples/hessian_sin.rs) requires that rustc first uses Enzyme to fill the implementation of the `jac` function, before it uses Enzyme to fill the implementation of `hess`, by differentiatng `jac`. This is currently not guaranteed. It should be comparably easy to fix. Enzyme also considers adding helper function to directly compute common higher order derivatives.
-
-
+ 
 4) Parallelism: Enzyme currently does not handle Rust parallelism (rayon). Enzyme does (partly) support various parallel paradigms: OpenMP, MPI, CUDA, Rocm, Julia tasks. Enzyme only does need to support the lowest level of parallelism for each language, so adding support for Rust is not hard, but also not a high priority.
 
 
-5) Compile Times: Enzyme can create the TypeTrees it requires for each variable based on its LLVM-IR reads/writes/dereferences/usages. This is slow for large types and programms. A (real-world) examples which updates a vector of length 50k element by element, line by line e.g. takes 6hrs to compile with C++ Enzyme due to TypeTree creation, while JAX only takes 1hr. Due to leveraging rustc information (see next point) Rust-Enzyme manages to compile the code in 5 minutes. Compiling the C++ code without AD however only takes ~1 second.
-
-6) Rust ABI optimizations: In order to improve the compile times mentioned above we create Enzyme TypeTrees based on the rustc type knowledge. These trees unfortunately are not correct anymore once Rust ABI optimization take place. E.g. `fn rosenbrock(x: &[f64; 2]) -> f64 {...}` will get lowered into an LLVM-IR function comparable to `fn rosenbrock(f64, f64) -> f64`. A TypeTree therefore needs to be updated on each applied optimization. For now, we just block optimizations on the outermost functions since they tend to have a small performance effect, so we want to focus on other parts first.
-
-7) FAT-LTO requirement: Rust-Enzyme currently requires fat-lto when AutoDiff is used. We technically only need lto if the function being differentiated calls functions in other compilation units. Other solutions are possible but this is the most simple one. Since the compile time overhead of lto is small compared to the compile time overhead of differentiating larger functions this is not a priority.
 
 Enzyme does support custom allocators, but Rust-Enzyme does not expose support for it yet. Low priority.
 
diff --git a/src/limitations/comptime.md b/src/limitations/comptime.md
@@ -0,0 +1,27 @@
+# Compile Times
+
+Enzyme will often achieve excellent runtime performance, but might increase your compile time by a large factor. 
+For Rust, we have already made significant improvements and have a list of further improvements planned. Please reach out if you have time to help here.
+
+## Type Analysis
+Most of the time, Type Analysis (TA) is the reason for large (>5x) compile time increases when using Enzyme.
+The bottom left part of this poster explains why we need to run Type Analysis: [Poster Link](https://c.wsmoses.com/posters/Enzyme-llvmdev.pdf).
+
+Enzyme's TA will create TypeTrees based on usage patterns in the code.
+Due to a suboptimal data structure, this process scales very poorly.
+Transferring the code (~1200 lines of C++) to a better-suited trie should remove most of this overhead. Please reach out if you can help.
+In the meantime, we initialize TypeTrees for the outermost functions (those to which you apply `#[autodiff(...)]`) based on the Rust types.
+In some real-world applications (50k LoC), this improved compile times by over 1000x, reducing them from hours to a few minutes.
+
+## Duplicated Optimizations
+The key reason Enzyme often offers excellent performance is that it differentiates already optimized LLVM-IR.
+However, we also (have to) run LLVM's optimization pipeline after differentiating, to make sure that the code which Enzyme generates is optimized properly. 
+This is currently done approximately, but in certain cases some code will be optimized too often, while other code is not getting optimized enough. Tuning this could allow both compile time and runtime improvements.
+
+
+## FAT-LTO 
+Using `#[autodiff(...)]` currently requires compiling your project with fat-lto.
+We technically only need lto if the function being differentiated calls functions in other compilation units.
+Other solutions are therefore possible, but this is the simplest one to get started.
+The compile time overhead of lto is small compared to the current compile time overhead of differentiating larger functions, so this limitation is currently not a priority.
+
diff --git a/src/limitations/higher.md b/src/limitations/higher.md
@@ -0,0 +1,9 @@
+# Higher Order Derivatives
+
+Computing higher order derivatives like hessians can be done with Enzyme by differentiating functions that compute lower order derivatives. 
+[This example](https://github.com/EnzymeAD/rust/blob/master/library/autodiff/examples/hessian_sin.rs) requires that rustc first uses Enzyme to fill the implementation of the `jac` function, before it uses Enzyme to fill the implementation of `hess`, by differentiating `jac`.
+This is currently not guaranteed and only works by coincidence in some cases. 
+This should be easy to fix, so please reach out if you would like to contribute and need some help to get started!
+
+Enzyme is also considering adding helper functions to directly compute common higher order derivatives in the future.
+
diff --git a/src/limitations/runtime.md b/src/limitations/runtime.md
@@ -0,0 +1,8 @@
+# Runtime Performance
+
+While Enzyme's performance should already be good in most cases, there are some optimizations left to apply. One is mentioned in the following compile time section.
+The other optimization left to apply is re-enabling Rust's ABI optimizations.
+The Rust compiler might change how Rust types are represented on a lower level, to allow faster function calls. These optimizations are mainly relevant when you call small functions many times.
+We don't expect this to be the main application of autodiff, where we assume that you will often differentiate math-heavy code that for example calls faer, ndarray, or nalgebra matrix operations.
+We therefore disabled this optimization for the outermost function (the one to which one applies `#[autodiff(...)]`), to enable compile time improvements.
+However, it would be nice to teach Enzyme about these Rust ABI optimizations so we can have the best of both worlds.
diff --git a/src/limitations/safety.md b/src/limitations/safety.md
@@ -0,0 +1,8 @@
+# Safety and Soundness
+
+Enzyme currently assumes that the user passes shadow arguments (`dx`, `dy`, ...) of appropriate size. Checking this is a current research project of Manuel, so we hope to check at least basic DST (vectors, enums) soon. If we remember the backprop function from above, there is no way for the type system to guarantee that `dweights` is at least as large as the `weights` vector. Adding length checks for vectors, and making sure that for enums the primal and shadow are of the same variant, should get us a large step towards soundness. Once implemented, we can still evaluate how many more checks we can insert automatically and where we want to fall back to unsafe. We are also considering allowing users to (unsafely) implement a safety check for their own types, which we would then insert. Concretely, here we would add the check `assert!(dweights.len() >= weights.len())` at the top of backprop (above the code generated by Enzyme):
+```rust
+fn backprop(images: &[f32], weights: &[f32], dweights: &mut [f32]) { ... }
+```
+
+This is an ongoing effort. Please use `cargo expand <FunctionName>` to get a feeling for which safety checks we currently insert.
diff --git a/src/safety.md b/src/safety.md
@@ -0,0 +1 @@
+# Safety