Skip to content

Latest commit

 

History

History
32 lines (24 loc) · 1.49 KB

File metadata and controls

32 lines (24 loc) · 1.49 KB

Softer softcap

This record, by Braden Koszarsky, increases the degree of logit softcapping, yielding a 7% speedup. reproducible log

Previously, logits were softcapped (via tanh) to be at most 30. The new record lowers that to 15, which boosts performance such that the step count can be reduced from 1490 to 1390.

Lowering the tanh softcap can be understood as a form of extra structure which we are imposing on the network, which improves performance in the small-scale regime.

Running this new record 80 times yielded the following series of val losses:

accs = [3.2798, 3.2804, 3.2837, 3.2808, 3.2782, 3.2801, 3.283, 3.2825, 3.2777, 3.2769, 3.2834, 3.2832, 3.2753,
        3.2809, 3.2778, 3.2801, 3.2799, 3.2804, 3.2765, 3.2792, 3.2786, 3.2792, 3.2801, 3.2762, 3.2803, 3.2784,
        3.2792, 3.2791, 3.2769, 3.279, 3.2784, 3.2775, 3.283, 3.2785, 3.2753, 3.2805, 3.2766, 3.2766, 3.2781,
        3.2819, 3.2754, 3.2827, 3.2803, 3.2784, 3.2802, 3.2794, 3.2765, 3.278, 3.2782, 3.278, 3.2816, 3.279,
        3.2771, 3.2791, 3.2768, 3.2781, 3.2794, 3.2798, 3.2785, 3.2804, 3.2777, 3.2765, 3.2796, 3.278, 3.2803,
        3.2793, 3.2793, 3.2788, 3.2797, 3.278, 3.2799, 3.2813, 3.2803, 3.2768, 3.2803, 3.2796, 3.28, 3.2796,
        3.2783, 3.278]

import scipy.stats
print('p=%.4f' % scipy.stats.ttest_1samp(accs, 3.28, alternative='less').pvalue)
# p=0.0001

import torch
print(torch.std_mean(torch.tensor(accs)))
# (tensor(0.0019), tensor(3.2791))