Best sigma choice based on algorithmic complexity #868

Open
magisterbrown wants to merge 1 commit into flatironinstitute:master from magisterbrown:best-complexity-simd

@magisterbrown magisterbrown commented Jun 11, 2026

Contributor

This PR adds a new best_upsampling_factor_complexity function that chooses sigma when the user does not specify it. The function loops over sigma values in the range [1.25, 2.00] with a step size of 0.05 and estimates the number of flops for each value. The total flop count is the sum of the spreading and FFT steps. The predicted spreading complexity, which is quite accurate according to empirical results, equals the number of input points times the nspread of each point. The following images show the predictions and measurements for two different transforms.
[Figure: test_tol1e-06_Nd80x80x1_density1_type1]
[Figure: test_tol1e-06_Nd10000x1x1_density1_type1]
FFT complexity is estimated as n log(n), where n is the grid size. Because of the different backends and FFT algorithms, this estimate does not predict the FFT cost well. Total complexity equals spreadinterp + 1000*FFT, where 1000 is a somewhat arbitrary constant that still needs tuning.
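To make the cost model above concrete, here is a minimal C++ sketch of the sweep. It is not the code in this PR. The names Problem, estimate_cost, best_sigma_by_cost, kFftWeight and the width_for callback are all hypothetical. The sketch assumes that "nspread of each point" means ns^dim grid cells touched per point, that the grid size is the product of ceil(sigma * N_i) with no rounding to FFT-friendly sizes, and that log is the natural logarithm (the base only rescales the tunable constant).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>
#include <limits>

// Hypothetical inputs to the cost model; names are illustrative only.
struct Problem {
  std::int64_t M;     // number of nonuniform points
  std::int64_t N[3];  // modes per dimension (1 for unused dimensions)
  int dim;            // 1, 2 or 3
};

// Assumed weight of the FFT term relative to spreading (the tunable constant).
constexpr double kFftWeight = 1000.0;

// Estimated cost for one sigma: M * ns^dim + kFftWeight * n * log(n),
// where n is the product of the upsampled grid sizes ceil(sigma * N_i).
inline double estimate_cost(const Problem &p, double sigma, int ns) {
  assert(p.dim >= 1 && p.dim <= 3 && ns > 0 && sigma > 1.0);
  double n = 1.0;
  for (int d = 0; d < p.dim; ++d) n *= std::ceil(sigma * double(p.N[d]));
  const double spread = double(p.M) * std::pow(double(ns), p.dim);
  const double fft = n * std::log(n);
  return spread + kFftWeight * fft;
}

// Sweep sigma = 1.25, 1.30, ..., 2.00 and return the cheapest candidate.
// `width_for` maps sigma to a kernel width and is supplied by the caller.
inline double best_sigma_by_cost(const Problem &p,
                                 const std::function<int(double)> &width_for) {
  double best_sigma = 2.0;
  double best_cost = std::numeric_limits<double>::infinity();
  for (int k = 0; k <= 15; ++k) {  // integer counter avoids float drift
    const double sigma = 1.25 + 0.05 * k;
    const double cost = estimate_cost(p, sigma, width_for(sigma));
    if (cost < best_cost) {
      best_cost = cost;
      best_sigma = sigma;
    }
  }
  return best_sigma;
}
```

A caller would pass its own tolerance-dependent width rule as width_for. With a width that does not depend on sigma, only the FFT term varies, so the sweep returns the smallest candidate.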

Type 3 transforms use a different heuristic: the lowest sigma that satisfies the check_sigma function is chosen.
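The type 3 rule can be sketched as a search for the first accepted candidate. This is only an illustration: lowest_valid_sigma is a hypothetical name, the is_valid callback stands in for check_sigma (whose real signature is not shown in this thread), and reusing the same 1.25 to 2.00 candidate grid for type 3 is an assumption.

```cpp
#include <functional>
#include <optional>

// Return the smallest sigma in {1.25, 1.30, ..., 2.00} accepted by `is_valid`,
// or std::nullopt if no candidate is accepted. `is_valid` is a hypothetical
// stand-in for check_sigma; the candidate grid is assumed, not confirmed.
inline std::optional<double> lowest_valid_sigma(
    const std::function<bool(double)> &is_valid) {
  for (int k = 0; k <= 15; ++k) {  // candidates in increasing order
    const double sigma = 1.25 + 0.05 * k;
    if (is_valid(sigma)) return sigma;
  }
  return std::nullopt;
}
```

Returning std::optional leaves the fallback for the no-candidate case to the caller. This requires C++17.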

@github-actions


FFT backend: DUCC

Perftest plot

Numbers are advisory: GitHub-hosted runners have variable hardware. Treat <1.10× as noise.

CPU and compiler configuration

CPU name: AMD EPYC 7763 64-Core Processor.

Arch: X86_64.

Core count: 2.

ISA extensions: 3dnowext, 3dnowprefetch, abm, adx, aes, aperfmperf, apic, arat, avx, avx2, bmi1, bmi2, clflush, clflushopt, clwb, clzero, cmov, cmp_legacy, constant_tsc, cpuid, cr8_legacy, cx16, cx8, de, decodeassists, erms, extd_apicid, f16c, flushbyasid, fma, fpu, fsgsbase, fsrm, fxsr, fxsr_opt, ht, hypervisor, invpcid, lahf_lm, lm, mca, mce, misalignsse, mmx, mmxext, movbe, msr, mtrr, nonstop_tsc, nopl, npt, nrip_save, nx, osvw, osxsave, pae, pat, pausefilter, pcid, pclmulqdq, pdpe1gb, pfthreshold, pge, pni, popcnt, pse, pse36, rdpid, rdpru, rdrand, rdrnd, rdseed, rdtscp, rep_good, sep, sha, sha_ni, smap, smep, sse, sse2, sse4_1, sse4_2, sse4a, ssse3, svm, syscall, topoext, tsc, tsc_known_freq, tsc_reliable, tsc_scale, umip, user_shstk, v_vmsave_vmload, vaes, vmcb_clean, vme, vmmcall, vpclmulqdq, xgetbv1, xsave, xsavec, xsaveerptr, xsaveopt, xsaves.

Compiler version: c++ (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0.

Compiler flags: -march=native.

perftest commands
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.002 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.002 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.002 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=0 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=0 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=0 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=d --N1=192 --N2=192 --N3=128 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-07 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=d --N1=192 --N2=192 --N3=128 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-07 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=d --N1=192 --N2=192 --N3=128 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-07 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3

github-actions Bot added a commit that referenced this pull request Jun 11, 2026
DiamonDinoia commented Jun 12, 2026

Collaborator

It is good practice to open a PR once ctest and the other unit tests at least work locally. If they break, is the test wrong or the code wrong?
