Improvements to nls

Arkajyoti Bhattacharjee edited this page Apr 9, 2021 · 3 revisions

Background

nls() is the primary nonlinear modelling tool in base R. It has a great many features, but it is about two decades old and has a number of weaknesses, as well as some gaps in documentation. Recently, the proposing mentor for this project has submitted a patch that overcomes one of the deficiencies in nls() and, furthermore, does this in a way that allows legacy operation to continue as the default.

This project aims to provide documentation and possible patches that incorporate other improvements, including better diagnostics to help users recognise when the reported results may be inadequate.

Related work

Packages nlsr and minpack.lm both address the lack of Levenberg-Marquardt stabilization in nls(), which uses a plain Gauss-Newton solver to carry out the internal iterations to solve the underlying nonlinear least squares problem. nlsr also offers analytic/symbolic derivatives which can improve the solution or allow it to be found. However, users frequently do not discover these packages, or do not understand some of the details. Merging some of the advantages of these packages into nls() would likely give users better quality output.
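For reference, the difference between plain Gauss-Newton and its Levenberg-Marquardt stabilization can be written compactly. The notation is an assumption introduced here for illustration and is not taken from either package: r is the vector of residuals at the current parameters, J is its Jacobian, delta is the step, lambda >= 0 is a damping parameter, and D is a positive diagonal scaling matrix (for example the identity, or the diagonal of J'J).

```latex
\begin{align}
  J^\top J \, \delta_{\mathrm{GN}} &= -J^\top r
    && \text{(Gauss-Newton)} \\
  \left( J^\top J + \lambda D \right) \delta_{\mathrm{LM}} &= -J^\top r
    && \text{(Levenberg-Marquardt)}
\end{align}
```

With lambda = 0 the second system reduces to the first. Increasing lambda shortens the step and keeps the system solvable when J'J is nearly singular, which is the situation in which a plain Gauss-Newton iteration tends to fail.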

Package optimx offers access to a number of nonlinear optimization packages. These can be used to minimize (weighted) sum-of-squares objective functions, but they are generally less efficient at finding the solutions than methods designed specifically for nonlinear least squares.

Details of your coding project

Two particular tasks are to merge the analytic derivatives of nlsr into the model parsing (to compute the Jacobian used in the Gauss-Newton equations for nonlinear least squares) and to add a Levenberg-Marquardt stabilization of the solution of those equations.
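The two tasks can be illustrated with base R alone. The sketch below is not the nlsr or minpack.lm implementation: it uses stats::deriv() as a stand-in for symbolic differentiation, takes seven rows from the test data further down this page, and uses starting values and a damping value chosen only for illustration.

```r
# deriv() returns a function whose result carries the analytic Jacobian of the
# model in its "gradient" attribute.
model <- deriv(
  ~ Asym / (1 + exp((xmid - time) / scal)),
  namevec = c("Asym", "xmid", "scal"),
  function.arg = c("Asym", "xmid", "scal", "time")
)

# Seven rows taken from the test data on this page.
time <- c(5, 10, 15, 20, 25, 30, 35)
y    <- c(0.0074203, 0.2266773, 0.6144181, 5.0470456,
          8.4364991, 9.5348823, 9.6270209)

p <- c(Asym = 10, xmid = 20, scal = 2)   # illustrative starting values

fit <- model(p[["Asym"]], p[["xmid"]], p[["scal"]], time)
J   <- attr(fit, "gradient")             # n x 3 Jacobian (same for model and residuals)
r   <- as.numeric(fit) - y               # residuals, model minus data

JtJ <- crossprod(J)                      # J'J
Jtr <- crossprod(J, r)                   # J'r

# Gauss-Newton step: solve (J'J) delta = -J'r
delta_gn <- drop(solve(JtJ, -Jtr))

# Levenberg-Marquardt step with identity scaling:
# solve (J'J + lambda * I) delta = -J'r
lambda   <- 1                            # illustrative damping value
delta_lm <- drop(solve(JtJ + lambda * diag(ncol(J)), -Jtr))

rbind(gauss_newton = delta_gn, levenberg_marquardt = delta_lm)
```

The damped step is always shorter than the Gauss-Newton step when lambda is positive. A practical solver would also adjust lambda from one iteration to the next, which this sketch does not attempt.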

The first stage of the work would be to find ways to incorporate these ideas. The second stage is to work out how to allow the changes to be activated only by easily-executed user actions, so that legacy behaviour is retained, since nls() has a large number of reverse dependencies.

Clearly, any code patches require parallel documentation, and there should be a development vignette to allow for ongoing maintenance. (At the moment, nls() is not very well documented from this perspective.)

Tests can and probably should be simple extensions of existing tests for nls() and/or nlsr and minpack.lm.

Other potential improvements

nls() has the option of using

algorithm="plinear"

but the proposing mentor has at least one example where this choice gives a different model for the same formula than the default algorithm does. Clearly this could be problematic for users and should be corrected.

The goal of plinear -- partially linear models -- should be addressed.
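One documented way in which the same formula can mean different things is sketched below. It is not necessarily the mentor's example. With algorithm="plinear", the right-hand side is multiplied by a linear coefficient that nls() estimates and reports as .lin, whereas the default algorithm fits the formula exactly as written. The data are deterministic and invented purely for illustration.

```r
# Illustrative data: amplitude 2, decay rate 0.3, small deterministic perturbation.
t_obs <- 1:12
dat   <- data.frame(t = t_obs,
                    y = 2 * exp(-0.3 * t_obs) + 0.02 * cos(3 * t_obs))

# "plinear": the formula omits the amplitude, which is estimated as ".lin".
fit_pl <- nls(y ~ exp(-k * t), data = dat, start = list(k = 0.2),
              algorithm = "plinear")

# Default algorithm: the amplitude must be written out to fit the same model.
fit_def <- nls(y ~ A * exp(-k * t), data = dat,
               start = list(A = 1.5, k = 0.2))

coef(fit_pl)    # includes a coefficient named .lin
coef(fit_def)   # A plays the role of .lin
```

Passing the first formula to the default algorithm would instead fit a curve whose amplitude is fixed at 1, so a user who switches algorithms without changing the formula changes the model.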

Similarly, nls() can handle indexed parameters, that is, parameters that can be referenced by an integer so that a suite of related estimates can be stored in a table or array. This should be better documented, especially from the point of view of program maintenance and improvement, with the goal of extending the functionality to nlsr or minpack.lm.
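A minimal sketch of indexed parameters follows, using invented deterministic data in which two groups share one decay rate but have separate amplitudes. The variable names (g, a, k) are assumptions made for this illustration.

```r
# Two illustrative groups; g is the integer index that selects the amplitude.
dat <- data.frame(
  t = rep(1:8, times = 2),
  g = rep(1:2, each = 8)
)
amp   <- c(2, 3)                         # amplitudes used to generate the data
dat$y <- amp[dat$g] * exp(-0.3 * dat$t) + 0.02 * cos(3 * seq_len(nrow(dat)))

# a[g] is an indexed parameter: start supplies one value per index.
fit <- nls(y ~ a[g] * exp(-k * t), data = dat,
           start = list(a = c(1.5, 2.5), k = 0.2))

coef(fit)   # one estimate per element of a, plus the shared k
```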

Expected impact

If successful, the changes will modernize an important tool in base R. Furthermore, if well-organised programmer documentation can be provided, future maintainers will have an easier job.

Mentors

  • EVALUATING MENTOR: John C. Nash, [email protected]. I have been a mentor and also an Org Admin for R's Google Summer of Code for over a decade. One of the creators of packages nlsr and optimx among others, and author of several books on nonlinear optimization and numerical computing.
  • Other Mentors:
    • Hans W. Borchers, [email protected]. I have been a mentor and co-mentor for several R-GSoC projects in recent years.
    • Heather Turner, [email protected]. I am the lead developer of several statistical modelling packages, notably the gnm package for generalized nonlinear models.

Tests

Some data for the tests.

time          y
    5  0.0074203
    6  0.3188325
    7  0.2815891
    8 -0.3171173
    9 -0.0305409
   10  0.2266773
   11 -0.0216102
   12  0.2319695
   13 -0.1082007
   14  0.2246899
   15  0.6144181
   16  1.1655192
   17  1.8038330
   18  2.7644418
   19  4.1104270
   20  5.0470456
   21  6.1896092
   22  6.4128618
   23  7.2974793
   24  7.8965245
   25  8.4364991
   26  8.8252770
   27  8.9836204
   28  9.6607736
   29  9.1746182
   30  9.5348823
   31 10.0421165
   32  9.8477874
   33  9.2886090
   34  9.3169916
   35  9.6270209

Easy:

Fit, or try to fit, a logistic sigmoid growth curve to these data.
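One possible starting point is sketched below. It assumes the three-parameter form Asym / (1 + exp((xmid - time) / scal)), which the task does not prescribe, and uses starting values read roughly off the data (plateau near 10, midpoint near time 20).

```r
# The test data from this page.
time <- 5:35
y <- c(0.0074203, 0.3188325, 0.2815891, -0.3171173, -0.0305409, 0.2266773,
       -0.0216102, 0.2319695, -0.1082007, 0.2246899, 0.6144181, 1.1655192,
       1.8038330, 2.7644418, 4.1104270, 5.0470456, 6.1896092, 6.4128618,
       7.2974793, 7.8965245, 8.4364991, 8.8252770, 8.9836204, 9.6607736,
       9.1746182, 9.5348823, 10.0421165, 9.8477874, 9.2886090, 9.3169916,
       9.6270209)
growth <- data.frame(time = time, y = y)

# Rough starting values chosen by eye; these are assumptions, not part of the task.
start <- list(Asym = 10, xmid = 20, scal = 2)

fit <- nls(y ~ Asym / (1 + exp((xmid - time) / scal)),
           data = growth, start = start)
summary(fit)
```

The self-starting model SSlogis() in the stats package uses the same parameterization and can replace the explicit formula and starting values.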

Medium:

Estimate, or try to estimate, the alternative form of the three-parameter logistic growth curve. (The following LaTeX form may not render on GitHub.)

$$ y = \frac{a}{1 + b \exp(c \cdot \mathrm{time})} $$

Can you explain why this is more difficult to estimate?

Hard:

Convert the problem to one that uses a function for the residuals (and ideally the Jacobian) and solve the nonlinear least squares problem with a suitable tool from packages nlsr and minpack.lm.

Show how to do this with both analytic Jacobian and one or more approximations.
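The pieces such a solution needs can be sketched in base R: a residual function, an analytic Jacobian, and a forward-difference approximation to check it against. The function names (resfn, jacfn, jac_fd), the sign convention (model minus data), the evaluation point and the step size are all assumptions made for this illustration. The model is the a, b, c form given under Medium, and the data are seven rows from the table above.

```r
time <- c(5, 10, 15, 20, 25, 30, 35)
y    <- c(0.0074203, 0.2266773, 0.6144181, 5.0470456,
          8.4364991, 9.5348823, 9.6270209)

# Residuals (model minus data) for y = a / (1 + b * exp(c * time)).
resfn <- function(p, time, y) {
  p[["a"]] / (1 + p[["b"]] * exp(p[["c"]] * time)) - y
}

# Analytic Jacobian of the residuals with respect to (a, b, c).
jacfn <- function(p, time, y) {
  e   <- exp(p[["c"]] * time)
  den <- 1 + p[["b"]] * e
  cbind(a = 1 / den,
        b = -p[["a"]] * e / den^2,
        c = -p[["a"]] * p[["b"]] * time * e / den^2)
}

# Forward-difference approximation, one column per parameter.
jac_fd <- function(p, time, y, rel_step = 1e-7) {
  r0 <- resfn(p, time, y)
  J <- sapply(seq_along(p), function(j) {
    h  <- rel_step * abs(p[[j]])         # assumes no parameter is exactly zero
    pj <- p
    pj[[j]] <- pj[[j]] + h
    (resfn(pj, time, y) - r0) / h
  })
  colnames(J) <- names(p)
  J
}

p <- c(a = 10, b = 2e4, c = -0.5)        # illustrative evaluation point
max(abs(jacfn(p, time, y) - jac_fd(p, time, y)))
```

Functions of this shape should be usable with nlsr::nlfb() and minpack.lm::nls.lm(), both of which accept a residual function and an optional Jacobian function. The exact argument conventions should be checked against each package's documentation.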

The Evaluating Mentor has prepared solutions to each of the tests to verify that they are doable.

Solutions of tests

Students, please post a link to your test results here.

S No.  STUDENT NAME             GITHUB PROFILE  TEST RESULTS LINK
1      Aarnob Guha              KW781           https://github.com/KW781/nls-improvements-Tests
2      Arkajyoti Bhattacharjee  ArkaB-DS        https://github.com/ArkaB-DS/Improvements-to-nls--Solutions-to-Tests