[WIP] mHC: Manifold-constrained Hyper Connection #1859
Draft
anhminhnguyenhoang wants to merge 44 commits into main from feat/mhc-deepseek
Commits
c35c8ac add initial implementation of projection mapping (anhminhnguyenhoang)
2a8325c Refactor mHC kernel and wrapper to include sigmoid activation in proj… (waqahmed-amd-fi)
f11244d Add Sinkhorn-Knopp log-domain kernel implementation (anhminhnguyenhoang)
87e5839 clean up sinkhorn-knopp tests (anhminhnguyenhoang)
152bb21 review invalid test case (anhminhnguyenhoang)
18a62a1 Refactor mHC kernel and wrapper to implement equations 14-18 as fused… (waqahmed-amd-fi)
36e36d6 Fix H dims (waqahmed-amd-fi)
6156c13 fix test_mhc_output_range (waqahmed-amd-fi)
80b8d34 Refactor test cases in mHC and Sinkhorn-Knopp implementations and sim… (anhminhnguyenhoang)
4801d63 Fix issues (#1878) (waqahmed-amd-fi)
d952829 optimization to loads x_tile once, reducing memory bandwidth (waqahmed-amd-fi)
30741ca Update mHC implementation to apply Sinkhorn-Knopp (Equation 19) to ma… (waqahmed-amd-fi)
686711f Refactor mHC implementation to separate projection (phi) matrices int… (waqahmed-amd-fi)
831572b Enhance mHC fused kernel to implement stream-aware processing (anhminhnguyenhoang)
ad198b3 Refactor mHC implementation (anhminhnguyenhoang)
05810eb Adjust tolerance levels in mHC tests based on input size to improve a… (anhminhnguyenhoang)
7d333a8 Add benchmark scripts for mHC kernel performance evaluation (waqahmed-amd-fi)
46d023a add modes to bench (waqahmed-amd-fi)
e8f4464 - Add naive configs for fused mHC and Sinkhorn-Knopp kernels (anhminhnguyenhoang)
68a2df8 switch to using exp2/log2 for sinkhorn-knopp for optimization (anhminhnguyenhoang)
df24377 Sort benchmark configurations by hidden dimension and refine FLOPs ca… (waqahmed-amd-fi)
a4a1793 Refactor Sinkhorn-Knopp kernel to support batch processing (anhminhnguyenhoang)
23609ed better tuned configs (anhminhnguyenhoang)
514f3f5 Refactor mHC fused kernel for improved arithmetic operations and clar… (waqahmed-amd-fi)
10b122a Add split-K support to mHC kernel by new split and reduce kernels to … (anhminhnguyenhoang)
9f6aba3 add better config with split reduce usage (anhminhnguyenhoang)
833c47a Apply optim in mhc_fused to split reduce kernels, rename functions fo… (anhminhnguyenhoang)
70490ad Add json config loading (anhminhnguyenhoang)
b01c00a Add tuned JSON configuration files for fused mhc kernels (anhminhnguyenhoang)
42c944d inittial implementation of zero-iteration Sinkhorn-Knopp (mHC-Lite). … (waqahmed-amd-fi)
190122e optimized zero-iteration Sinkhorn-Knopp (mHC-Lite) (waqahmed-amd-fi)
507d95a Removed Unused Projection Code (Wrapper Function) (waqahmed-amd-fi)
c2ccafc add config loading bug fix due to caching and better tuned configs (anhminhnguyenhoang)
7862631 2D grid parallelization. Key improvements: (waqahmed-amd-fi)
cd84075 remove _sinkhorn_knopp_lite, and implement mHC_lite i.e., non-iterati… (waqahmed-amd-fi)
3e85b9d add config loading bug fix due to caching and better tuned configs (anhminhnguyenhoang)
5374eb6 add mhc-lite (anhminhnguyenhoang)
4f4c272 revised mHC and mHC-Lite description for clarity (waqahmed-amd-fi)
57c75ac update comments and replace if-else with assert check (waqahmed-amd-fi)
68c76b0 revised _mhc_lite_fused_split_kernel kernel (waqahmed-amd-fi)
604426d revised _mhc_lite_fused_reduce_kernel (waqahmed-amd-fi)
e52ebdc add mhc-lite bench mode (anhminhnguyenhoang)
e70f3dd integrate mhc-lite into mhc_fused (anhminhnguyenhoang)
41b4908 update config loading for mode (anhminhnguyenhoang)
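Commits 10b122a and 833c47a mention adding split-K support to the mHC kernel through separate split and reduce kernels. As a general illustration of the split-K pattern (a NumPy sketch, not the PR's Triton kernels), the reduction dimension K is partitioned into chunks, each chunk produces an independent partial product, and a second stage sums the partials. Split-K is typically worthwhile when the output is small relative to K, since partitioning K exposes more parallel work than the output tiles alone provide. The function name `split_k_matmul` and the shapes below are hypothetical.

```python
import numpy as np


def split_k_matmul(x: np.ndarray, w: np.ndarray, num_splits: int) -> np.ndarray:
    """Compute x @ w by splitting the reduction dimension K into chunks.

    "Split" stage: each K chunk yields an independent partial product (on a GPU,
    each would be handled by its own program). "Reduce" stage: the partials are
    summed into the final output.
    """
    k = x.shape[1]
    if w.shape[0] != k:
        raise ValueError(f"inner dimensions differ: {k} vs {w.shape[0]}")
    chunks = np.array_split(np.arange(k), num_splits)  # near-equal K slices
    partials = np.stack([x[:, idx] @ w[idx, :] for idx in chunks])  # split stage
    return partials.sum(axis=0)  # reduce stage


# Hypothetical shapes: 8 tokens, large K = 4096, small output width 24.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4096))
w = rng.standard_normal((4096, 24))
y = split_k_matmul(x, w, num_splits=8)
print("max abs diff vs. x @ w:", np.abs(y - x @ w).max())
```

Because the partials are accumulated in a different order than a single matmul, the result generally matches the reference only to within floating-point tolerance, not bit for bit.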
Did you run into test failures because of this in similar tests, so that you needed to relax the tolerance?
Yes, mainly because of Sinkhorn-Knopp, which is an iterative process and produces larger differences because it runs only 10 iterations. Maybe we can try 20 for better results?
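The reasoning in the reply can be illustrated with a minimal NumPy sketch (not the PR's Triton kernel). Log-domain Sinkhorn-Knopp alternately normalizes rows and columns, so after a fixed number of iterations the output is only approximately doubly stochastic, and a comparison against a reference inherits that leftover marginal error. The function names, the 4x4 matrix size, and the random logits are hypothetical; per commit 68a2df8 the PR's kernel uses exp2/log2, whereas this sketch uses natural exp/log for readability.

```python
import numpy as np


def logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) along `axis`, keeping the reduced dim."""
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))


def sinkhorn_knopp_log(logits: np.ndarray, num_iters: int) -> np.ndarray:
    """Log-domain Sinkhorn-Knopp: alternately normalize the rows and columns of
    exp(logits) so the result approaches a doubly stochastic matrix."""
    log_p = logits.astype(np.float64)
    for _ in range(num_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # rows now sum to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # columns now sum to 1
    return np.exp(log_p)


def l1_marginal_error(p: np.ndarray) -> float:
    """Total L1 deviation of all row and column sums from 1."""
    return float(np.abs(p.sum(axis=1) - 1).sum() + np.abs(p.sum(axis=0) - 1).sum())


# Hypothetical 4x4 logits standing in for an n x n mixing matrix (n = 4 assumed).
rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 4))

err10 = l1_marginal_error(sinkhorn_knopp_log(logits, num_iters=10))
err20 = l1_marginal_error(sinkhorn_knopp_log(logits, num_iters=20))
print(f"L1 marginal error: 10 iters = {err10:.3e}, 20 iters = {err20:.3e}")
```

The L1 deviation of the marginals never increases from one Sinkhorn-Knopp normalization step to the next, so the 20-iteration error is at most the 10-iteration error; how much smaller it gets depends on the input.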