
Adding Adaptive Bayesian SLOPE (ABSLOPE) to the SLOPE Package


Background

SLOPE [1] is a sparse generalized linear regression method that generalizes the well-known lasso [2]. SLOPE penalizes the regression problem with the sorted L1 norm, which sorts the coefficients in decreasing order by absolute value and then applies a non-increasing sequence of penalization weights to them. SLOPE is therefore more flexible than the lasso, which leads to better predictive performance, and it also induces clustering among predictors (features), which enables better handling of scenarios where predictors are heavily correlated. Disciplined choices of the sequence of penalization weights also enable control of the false discovery rate (FDR) under certain assumptions. ABSLOPE [3] is a recent extension of SLOPE that combines the original SLOPE formulation with the spike-and-slab lasso. ABSLOPE both provides better control of the FDR and handles missing data in a natural way.
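
To make the penalty concrete, here is a minimal R sketch of the sorted L1 norm (an illustration only, not code from the SLOPE package):

```r
# Sorted L1 norm: a non-increasing weight sequence is applied to the
# coefficients sorted in decreasing order by absolute value.
sorted_l1_norm <- function(beta, lambda) {
  stopifnot(length(beta) == length(lambda),
            !is.unsorted(rev(lambda)))  # lambda must be non-increasing
  sum(lambda * sort(abs(beta), decreasing = TRUE))
}

sorted_l1_norm(c(-3, 1, 2), c(1.5, 1, 0.5))  # 3*1.5 + 2*1 + 1*0.5 = 7
```

With all weights equal, the penalty reduces to the lasso's L1 norm; a decreasing sequence is what drives the clustering and FDR-control properties described above.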

SLOPE is available on CRAN in the SLOPE package (https://CRAN.R-project.org/package=SLOPE). For ABSLOPE, however, there exists only a rough implementation in the GitHub repository https://github.com/wjiang94/ABSLOPE, which lacks many of the features of the SLOPE package. This implementation will nevertheless serve as a basis for the new functionality to be added to the SLOPE package.

Related work

Software

Literature

[1] M. Bogdan, E. van den Berg, C. Sabatti, W. Su, and E. J. Candès, “SLOPE – adaptive variable selection via convex optimization,” The Annals of Applied Statistics, vol. 9, no. 3, pp. 1103–1140, 2015, doi: 10.1214/15-AOAS842.

[2] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996. [Online]. Available: http://www.jstor.org/stable/2346178.

[3] W. Jiang, M. Bogdan, J. Josse, B. Miasojedow, V. Rockova, and the Traumabase Group, “Adaptive Bayesian SLOPE – high-dimensional model selection with missing values,” arXiv:1909.06631 [stat], Nov. 2019. [Online]. Available: http://arxiv.org/abs/1909.06631.

[4] J. Larsson, M. Bogdan, and J. Wallin, “The strong screening rule for SLOPE,” in Advances in Neural Information Processing Systems 33, Virtual, Dec. 2020, vol. 33, pp. 14592–14603. [Online]. Available: https://proceedings.neurips.cc/paper/2020/file/a7d8ae4569120b5bec12e7b6e9648b86-Paper.pdf.

Details of your coding project

The main goal of this project is to implement ABSLOPE as part of the SLOPE package and, by the end of the project, submit an updated version of the SLOPE package to CRAN with this functionality added. The implementation will likely use https://github.com/wjiang94/ABSLOPE as a starting point, improving, extending, and integrating this work with the SLOPE package.

The student will write new tests for ABSLOPE and document the new functionality with manual entries and examples. The student is also expected to write a vignette to accompany the new functionality.

Objectives

  • Implement ABSLOPE into the SLOPE package
  • Write tests for ABSLOPE
  • Write documentation for ABSLOPE
  • Write a vignette for the new functionality
  • Prepare and submit an updated version of the SLOPE package to CRAN

Bonus Objectives

If there is additional time left in the project, the student will continue the work of last year's GSoC student by implementing additional solvers for the SLOPE package.

Expected impact

The R community will gain access to a new and promising method for dealing with high-dimensional data, with applications in many important fields of statistics and machine learning.

What You Will Learn or Improve Upon while Working on This Project

  • R-package development, including (automated) testing, documentation, and debugging
  • Version control and collaboration via Git and GitHub
  • Implementation of algorithms using object-oriented programming in C++ (Rcpp)
  • Optimization of C++ code by profiling

Useful (But Not Required) Skills

  • R programming experience
  • C++ programming experience (including knowledge of templates, objects, polymorphism)
  • R-package development experience
  • Some background in maths or statistics
  • Experience with version control through Git and GitHub

Mentors

Students, please contact the mentors below after completing at least one of the tests in the Tests section.

  • EVALUATING MENTOR: Johan Larsson ([email protected]) is a PhD student at the Department of Statistics, Lund University, and the maintainer of the SLOPE package as well as other R packages such as eulerr and qualpalr. Johan was a mentor for R GSoC in 2019 and 2020 and a student in GSoC 2018.
  • Jonas Wallin ([email protected]) is an assistant professor at the Department of Statistics, Lund University, with a PhD in mathematical statistics. Jonas was a mentor for R GSoC in 2020.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: Install SLOPE and fit SLOPE and lasso models (hint: see the lambda argument in SLOPE()) to the abalone data set that ships with the SLOPE package, using a Poisson model. Plot the results. How do the paths from SLOPE and the lasso compare? A starting sketch appears after this list.
  • Easy: Fork the SLOPE repository and build and check the package with devtools::check().
  • Medium: Write a function using RcppArmadillo that computes the proximal operator for SLOPE using Algorithm 3 (FastProxSL1) from Bogdan et al. (2015), “SLOPE – adaptive variable selection via convex optimization.” Compare the result with SLOPE:::prox_sorted_L1() (note that this function uses a different algorithm than the one you are asked to implement). A plain-R reference sketch of the algorithm follows this list.
  • Medium: Submit a pull request that fixes the issue at https://github.com/jolars/SLOPE/issues/10.
  • Hard: Write an R package using RcppArmadillo (as a backend) that uses proximal gradient descent or ADMM to solve SLOPE-penalized ordinary least squares regression. Make use of the proximal operator function that you implemented in the previous test; the basic iteration is sketched after this list. Compare your results to the SLOPE package (and make sure they match). Put the package on GitHub.
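
As a starting point for the first easy test, here is a minimal sketch. It assumes that the abalone data ships as a list with x and y components and that a constant lambda sequence makes SLOPE() behave like the lasso; check the package documentation to confirm both assumptions.

```r
library(SLOPE)

# Fit a SLOPE model with a Poisson likelihood to the abalone data
# (assumed here to be a list with design matrix `x` and response `y`).
fit_slope <- SLOPE(abalone$x, abalone$y, family = "poisson")

# With all penalty weights equal, the sorted L1 norm collapses to the
# ordinary L1 norm, so this fit should correspond to the lasso.
fit_lasso <- SLOPE(abalone$x, abalone$y, family = "poisson",
                   lambda = rep(1, ncol(abalone$x)))

plot(fit_slope)  # coefficient paths for SLOPE
plot(fit_lasso)  # coefficient paths for the lasso
```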
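
For the medium test, the following plain-R sketch illustrates the idea behind FastProxSL1 (the test itself asks for RcppArmadillo): subtract the weights from the sorted absolute values, then restore a non-increasing, non-negative solution by averaging adjacent blocks that violate the ordering. Treat it as an illustration of the algorithm in Bogdan et al. (2015), not as the package's implementation; its output can be sanity-checked against SLOPE:::prox_sorted_L1() as the test suggests.

```r
prox_sorted_l1 <- function(y, lambda) {
  # Proximal operator of the sorted L1 norm; `lambda` must be
  # non-negative and non-increasing.
  n <- length(y)
  sgn <- sign(y)
  ord <- order(abs(y), decreasing = TRUE)
  z <- abs(y)[ord] - lambda  # work on the sorted absolute values

  # Stack of blocks (start, end, value); merge adjacent blocks whose
  # averaged values violate the required non-increasing order.
  start <- integer(n); end <- integer(n); val <- numeric(n)
  top <- 0
  for (k in seq_len(n)) {
    top <- top + 1
    start[top] <- k; end[top] <- k; val[top] <- z[k]
    while (top > 1 && val[top - 1] <= val[top]) {
      len_prev <- end[top - 1] - start[top - 1] + 1
      len_cur <- end[top] - start[top] + 1
      val[top - 1] <- (len_prev * val[top - 1] + len_cur * val[top]) /
        (len_prev + len_cur)
      end[top - 1] <- end[top]
      top <- top - 1
    }
  }

  x <- numeric(n)
  for (b in seq_len(top)) {
    x[start[b]:end[b]] <- max(val[b], 0)  # clip negative blocks at zero
  }

  out <- numeric(n)
  out[ord] <- x  # undo the sorting
  out * sgn      # restore the original signs
}
```

With such a prox operator in hand, the hard test reduces to a standard proximal gradient loop; a hypothetical sketch for SLOPE-penalized least squares:

```r
prox_grad_slope <- function(X, y, lambda, maxit = 500) {
  beta <- rep(0, ncol(X))
  L <- norm(X, "2")^2  # Lipschitz constant of the least-squares gradient
  for (i in seq_len(maxit)) {
    grad <- as.numeric(crossprod(X, X %*% beta - y))
    # gradient step of length 1/L, then the sorted L1 proximal operator
    beta <- prox_sorted_l1(beta - grad / L, lambda / L)
  }
  beta
}
```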

Solutions to tests

Students, please post a link to your test results here.
