Skip to content

Add EKFAC tests and fix a couple of bugs#125

Merged
LouisYRYJ merged 3 commits intoEleutherAI:ekfac-finalfrom
smarter:ekfac-final+tests
Jan 16, 2026
Merged

Add EKFAC tests and fix a couple of bugs#125
LouisYRYJ merged 3 commits intoEleutherAI:ekfac-finalfrom
smarter:ekfac-final+tests

Conversation

@smarter
Copy link
Collaborator

@smarter smarter commented Jan 16, 2026

This includes the tests that were originally in #13 and the tests added in #68
This is still missing FSDP support and test_apply_ekfac.py from #68

Co-Authored-By: LouisYRYJ louis.yousif@yahoo.de

@smarter smarter force-pushed the ekfac-final+tests branch 3 times, most recently from f473c60 to 8c170a5 Compare January 16, 2026 00:21
@smarter
Copy link
Collaborator Author

smarter commented Jan 16, 2026

https://github.com/EleutherAI/bergson/blob/main/.github/workflows/build.yml#L7-L9 means the tests aren't run for this PR since it's targeting the ekfac-final branch currently.

smarter and others added 3 commits January 16, 2026 01:25
The backward_hook was using g.reshape(-1, O) which includes padding
positions in the covariance computation. This causes incorrect results
when batches have different sequence lengths.

Before this commit, the added test failed with:
> FAILED tests/ekfac_tests/test_batch_size_invariance.py::test_trace_batch_invariant[seq_lengths1-20] - AssertionError: Scalars are not close!
>
> Expected 1.231401894309304 but got 0.8983965093439276.
> Absolute difference: 0.33300538496537635 (up to 1e-4 allowed)
> Relative difference: 0.27042786478102654 (up to 0.01 allowed)
The condition `if not hessian_cfg.use_dataset_labels:` was inverted,
causing the empirical Fisher (with dataset labels) to use sampled
labels and vice versa.

Add test_fim_accuracy.py which verifies that KFAC approximates the
Fisher Information Matrix within tolerance for both empirical FIM
(dataset labels) and true FIM (sampled labels).
This is still missing FSDP support and test_apply_ekfac.py from
EleutherAI#68

Co-Authored-By: LouisYRYJ <louis.yousif@yahoo.de>
@LouisYRYJ LouisYRYJ merged commit 6eedb83 into EleutherAI:ekfac-final Jan 16, 2026
3 of 4 checks passed
LouisYRYJ added a commit that referenced this pull request Jan 26, 2026
* ekfac implementation done (untested)

* remove unnecessary squeeze

* add tkfac

* fix claude issues

* shampoo

* minor fix

* Add EKFAC tests and fix a couple of bugs (#125)

* Fix mask bug and add batch size invariance test wih toy model

The backward_hook was using g.reshape(-1, O) which includes padding
positions in the covariance computation. This causes incorrect results
when batches have different sequence lengths.

Before this commit, the added test failed with:
> FAILED tests/ekfac_tests/test_batch_size_invariance.py::test_trace_batch_invariant[seq_lengths1-20] - AssertionError: Scalars are not close!
>
> Expected 1.231401894309304 but got 0.8983965093439276.
> Absolute difference: 0.33300538496537635 (up to 1e-4 allowed)
> Relative difference: 0.27042786478102654 (up to 0.01 allowed)

* Fix use_dataset_labels condition and add FIM accuracy test

The condition `if not hessian_cfg.use_dataset_labels:` was inverted,
causing the empirical Fisher (with dataset labels) to use sampled
labels and vice versa.

Add test_fim_accuracy.py which verifies that KFAC approximates the
Fisher Information Matrix within tolerance for both empirical FIM
(dataset labels) and true FIM (sampled labels).

* Add ground truth ekfac tests

This is still missing FSDP support and test_apply_ekfac.py from
#68

Co-Authored-By: LouisYRYJ <louis.yousif@yahoo.de>

* ekfac_tests/test_batch_size_invariance.py: Fix error thresholds when running on CPU

* Cleanup EKFAC tests

- Replace set_all_seeds by existing setup_reproducibility
- Reuse approximate_hessians instead of doing something
  equivalent manually.

* Add --token_batch_size option to EKFAC tests

* Add --n_samples option to EKFAC tests

Allow configuring the number of samples from pile-10k dataset via
pytest command line option instead of hardcoding 100. The dataset
directory is now named dynamically (e.g., pile_100_examples).

* hessians: Fix distributed support and test it

Restore the calls to dist.barrier that existed in
#13, without them the process would
hang when running with world_size > 1.

For testing, we add _allocate_batches_world to compute the batches for the
ground truth. The tests don't pass due to numerical errors, this is handled in
the next commit by changing our comparison logic.

* ekfac_tests: Use appropriate metrics for each comparison

- Eigenvectors: Check |cosine_similarity| ≈ 1 per column, which naturally
  handles sign ambiguity (eigenvectors are only defined up to sign)
- Covariances: Check relative Frobenius norm since values should match exactly
- Eigenvalue corrections: Align signs based on eigenvector orientation, then
  check relative error (λ[i,j] transforms as sign_G[i] * sign_A[j])
  - Also reenable CPU tests which pass after this change.

* ekfac_tests: Relax thresholds for distributed runs

With world_size > 1, floating-point reduction order differs between ground
truth (single process) and distributed run, causing larger numerical
differences in some layers.

For eigenvectors, use average |cos_sim| instead of minimum - this tolerates
occasional outlier eigenvectors while maintaining a stricter threshold
(1e-3 vs 0.1 that would be needed for min).

For eigenvalue corrections, use atol=0.2 when world_size > 1.

* adjust test + normalize shampoo and tkfac

* minor fixes, correct tensor handling in shampoo and tkfac, introduce apply_hessian (WIP)

---------

Co-authored-by: Guillaume Martres <smarter@ubuntu.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants