
Conversation

@MaxenceGollier
Collaborator

#270 (comment)

@MohamedLaghdafHABIBOULLAH @d1Lab,

Can you fetch my branch and run some benchmarks? I think we should check whether it works, since it is very cheap.

@codecov

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 77.41935% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.60%. Comparing base (e0f214d) to head (4732ee7).
⚠️ Report is 259 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/utils.jl | 80.00% | 5 Missing ⚠️ |
| src/R2N.jl | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master     #275       +/-   ##
===========================================
+ Coverage   61.53%   84.60%   +23.07%     
===========================================
  Files          11       13        +2     
  Lines        1292     1650      +358     
===========================================
+ Hits          795     1396      +601     
+ Misses        497      254      -243     


return approx, !isnan(approx)
end

# Compute upper bounds for μ‖B‖₂, where μ ∈ (0, 1].
Collaborator

Suggested change:
- # Compute upper bounds for μ‖B‖₂, where μ ∈ (0, 1].
+ # Compute upper bounds for ‖B‖₂.
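For context, the opnorm_maxiter option discussed in this thread caps the number of power-method iterations used for this estimate. Below is a minimal sketch of that kind of estimate, assuming a symmetric operator B that supports multiplication; this is an illustration, not the package's actual implementation:

    using LinearAlgebra

    # Power-method estimate of ‖B‖₂ for a symmetric operator B, stopped
    # after `maxiter` iterations (the role played by `opnorm_maxiter`).
    function opnorm_power(B, n::Int; maxiter::Int = 10)
        v = normalize!(randn(n))   # random starting vector, ‖v‖ = 1
        approx = 0.0
        for _ in 1:maxiter
            w = B * v              # one operator application per iteration
            approx = norm(w)       # since ‖v‖ = 1, ‖Bv‖ → |λ_max| = ‖B‖₂
            approx == 0 && break   # Bv vanished; the estimate is 0
            v = w ./ approx
        end
        return approx, !isnan(approx)  # mirrors the (approx, success) convention above
    end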

@MohamedLaghdafHABIBOULLAH
Collaborator

MohamedLaghdafHABIBOULLAH commented Jan 8, 2026

@MaxenceGollier @dpo @d1Lab, I ran some benchmarks with the JOSS example, using a more constraining call to solve!:

  solve!(solver, reg_nlp, stats;
       atol = 1e-6,
       rtol = 1e-6,
       verbose = 1,
       sub_kwargs = (max_iter = 1000,),
       opnorm_maxiter = ...  # see below
   )

Below are my remarks concerning different operator-norm estimation
strategies when L-SR1 is used. This case is problematic because $B_k$ may
be indefinite with a very small norm. Tests were performed on Julia
1.11.7 and Julia 1.12.2.

  1. Behavior with opnorm_maxiter ∈ {5, 100}

The solver fails on both versions (with $$\xi < 0$$).

  2. Behavior with opnorm_maxiter = 10

It works on both versions, showing that the instability is not simply due to using too few or too many power-method iterations.

  3. Behavior with opnorm_maxiter = -1 (upper bound)

The solver fails on both versions (with $$\xi < 0$$).

  4. Behavior with ARPACK

Using ARPACK to compute the dominant eigenvalue leads to instability:

  • repeated runs do not produce identical results,
  • sometimes the method converges,
  • sometimes it fails with $$\xi < 0$$,
  • both Julia versions exhibit this behavior.

ARPACK introduces non-determinism and rounding differences that R2N does
not tolerate well.
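The run-to-run variation is consistent with ARPACK starting from a random vector. If we keep this option, one mitigation is to fix the starting vector so repeated runs are reproducible. A sketch, assuming Arpack.jl's eigs with its documented v0 keyword and a B that behaves like a symmetric matrix:

    using Arpack, LinearAlgebra

    # Deterministic dominant-eigenvalue estimate: fixing v0 removes the
    # run-to-run randomness of ARPACK's default random start.
    function opnorm_arpack(B, n::Int)
        vals, _ = eigs(B; nev = 1, which = :LM, v0 = ones(n) / sqrt(n))
        return abs(vals[1])   # for symmetric B, |λ_max| = ‖B‖₂
    end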

Even for a small value such as $\theta = 0.1$, all of the failures above persist. This suggests that choosing an appropriate $\theta$ is delicate: the solver succeeds with $\theta$ close to 1 when opnorm_maxiter = 10, but fails with $\theta = 0.1$.

L-BFGS showed no errors with any operator-norm option,
so the instabilities above might be caused by the potentially indefinite L-SR1 matrix rather than by the operator-norm estimation itself.
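That hypothesis is easy to confirm in isolation. A small check, assuming LinearOperators.jl's LSR1Operator, showing that an L-SR1 approximation can indeed be indefinite:

    using LinearOperators, LinearAlgebra

    n = 5
    B = LSR1Operator(n)
    s = randn(n); y = -s   # curvature pair with sᵀy < 0
    push!(B, s, y)         # the SR1 update accepts it (L-BFGS would skip it)
    @show dot(s, B * s)    # negative: B is indefinite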

@MaxenceGollier
Collaborator Author

MaxenceGollier commented Jan 8, 2026

Wouldn't it be more numerically stable to use $$\sigma\|s_{cp}\|$$ as the stationarity measure? It is also a generalization of the norm of the gradient in smooth optimization and does not involve the potential catastrophic cancellation that might be the cause of $$\xi < 0$$.
Moreover, the inequality $$\sigma\|s_{cp}\| \leq \sigma^{-1/2}\xi^{1/2}$$ keeps the convergence proof valid with that stationarity measure.
Just a suggestion... @dpo @d1Lab.
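To illustrate the cancellation concern (with made-up numbers, not values from the benchmarks): ξ is a difference of two nearly equal model values, so the true decrease can be absorbed entirely by rounding, and independent rounding of the two terms can even make the computed ξ slightly negative.

    # True decrease is 2.0, but the Float64 spacing at this magnitude is
    # 16.0, so the decrease is lost and the computed ξ is 0.0.
    m0 = 1.0e17 + 2.0   # model value at s = 0
    ms = 1.0e17         # model value at the candidate step
    ξ = m0 - ms
    println(ξ)          # 0.0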

@MohamedLaghdafHABIBOULLAH
Collaborator

MohamedLaghdafHABIBOULLAH commented Jan 9, 2026

Hey @MaxenceGollier, I apologize: most of the errors mentioned earlier
(specifically when opnorm_maxiter = -1) are actually due to R2, not R2N. The
core issue is:

    ERROR: LoadError: R2: prox-gradient step should produce a decrease but ξ = -7.653729494313895e288

In this situation, the proximal operator of RootNormLhalf (whose closed
form involves cos and arccos) is not computed accurately, which leads to
an incorrect (and extremely large negative) value of ξ, in contrast to
the L0 or L1 norms.
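One way to confirm the prox inaccuracy independently of the solver is to check whether the returned point actually minimizes the prox objective against random nearby perturbations. A sketch with a hypothetical helper ψ, assuming RootNormLhalf is $$\lambda \|x\|_{1/2}^{1/2} = \lambda \sum_i \sqrt{|x_i|}$$:

    using LinearAlgebra

    # hypothetical helper: ψ(x) = λ Σᵢ √|xᵢ|
    ψ(x, λ) = λ * sum(sqrt ∘ abs, x)

    # prox objective: ½‖p − q‖² + ψ(p)
    prox_obj(p, q, λ) = 0.5 * norm(p - q)^2 + ψ(p, λ)

    # returns false if some nearby point beats the candidate prox output p,
    # which would flag an inaccurately computed prox
    function check_prox(p, q, λ; ntrials = 1000, radius = 1e-6)
        f0 = prox_obj(p, q, λ)
        for _ in 1:ntrials
            prox_obj(p .+ radius .* randn(length(p)), q, λ) < f0 && return false
        end
        return true
    end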

I ran additional experiments using:

    sub_kwargs = (max_iter = 0,)
    atol = rtol = 1e-4

to force the selection of $$s_{k,cp}$$ exactly.
Under these conditions, if the norm of $B_k$ is not well approximated, we may still encounter an error.

Specifically:

  • When opnorm_maxiter = -1, we do not observe errors of the form
    ERROR: LoadError: R2N: failed to compute a step: ξ
    which confirms that R2N is robust in this case.

  • These errors do appear when opnorm_maxiter ∈ {10, 50, 100}, which
    suggests that inaccurate spectral-norm estimates can destabilize the
    step computation.

So overall, this is good news:

  • R2N is not the source of the problematic behavior when opnorm_maxiter = -1.
  • The main issue is linked to R2 combined with an inaccurate norm
    estimate for $B_k$ and with the proximal operator of RootNormLhalf;
    we should address this.

@MohamedLaghdafHABIBOULLAH
Collaborator

MohamedLaghdafHABIBOULLAH commented Jan 9, 2026

> Wouldn't it be more numerically stable to use $$\sigma\|s_{cp}\|$$ as the stationarity measure? It is also a generalization of the norm of the gradient in smooth optimization and does not involve the potential catastrophic cancellation that might be the cause of $$\xi < 0$$. Moreover, the inequality $$\sigma\|s_{cp}\| \leq \sigma^{-1/2}\xi^{1/2}$$ keeps the convergence proof valid with that stationarity measure. Just a suggestion... @dpo @d1Lab.

@MaxenceGollier

I am not entirely sure whether this will make the problem disappear, because it is not
$$\xi$$ that is used as the stationarity measure, but rather $$\xi_1$$, which is related to
the step $$s_{\mathrm{cp}}$$ as you mentioned above.

The check on the condition $$\xi < 0$$ is mainly there for debugging purposes and to keep the
implementation consistent with the theoretical framework (sufficient decrease).

In fact, as far as I can see in the benchmarks, $$\xi_1$$ is always positive as long as the prox is computed exactly.

@MaxenceGollier
Collaborator Author

OK. Still, can we compare the opnorm_maxiter options in terms of time and number of iterations?
Actually, it would be most interesting to compare the upper-bound approach against the ARPACK approach.
