hyperSpec
Package hyperSpec provides "infrastructure" for working with spectroscopic data in R, e.g.
- import functions for various proprietary file formats
- plotting functions
- functions that allow seamless or almost seamless use of hyperSpec objects with models such as pls::plsr(), MASS::lda(), etc.
- arithmetic functions that allow typical preprocessing done with spectra such as intensity normalization.
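As a rough illustration of the arithmetic and plotting points above, here is a minimal sketch of intensity normalization. It assumes hyperSpec is installed from CRAN and uses flu, an example data set shipped with the package; the variable name flu_norm is made up for this example.

```r
library(hyperSpec)

# `flu` is an example data set (fluorescence spectra) shipped with hyperSpec
# Intensity normalization: divide each spectrum (row) by its mean intensity.
# hyperSpec's sweep() method accepts a function as STATS and applies it
# along the same margin before sweeping.
flu_norm <- sweep(flu, 1, mean, "/")

# `[[]]` extracts the spectra matrix; each row should now average to 1
rowMeans(flu_norm[[]])

# Plot the normalized spectra
plot(flu_norm)
```

Other normalizations (for example to the maximum intensity) follow the same pattern with a different summary function.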
Over the years, some parts of hyperSpec have seen steady growth, in particular the file import functions. Unfortunately, this has led to hyperSpec having many dependencies as well as a large base of test data (i.e. spectra files in a wide variety of proprietary [often binary] file formats), making hyperSpec not as easy to maintain as it could and should be.
Last year's GSoC started to move parts of hyperSpec into separate packages, which are found at r-hyperSpec/ and set up continuous integration and testing as well as pkgdown documentation.
This year, we'd like to continue this process.
We'd like to better integrate some of the following packages with hyperSpec:
- Packages providing preprocessing for spectra: baseline and EMSC. Claudia is in contact with their creator/maintainer Kristian.
- ggplot2 and tidyverse: hyperSpec has rudimentary qplot() functionality, and we recently started a hyperSpec.tidyverse package to fortify hyperSpec for use with dplyr and magrittr functions.
- File import: readJDX (maintained by Bryan)
There are a few packages that one may use instead of hyperSpec, but they are less extensive and are instead specialized for particular applications or particular types of spectroscopy. Bryan maintains a long list of FOSS packages for spectroscopy.
There are several possibilities from which the student can choose:
- Move further file import functions out of hyperSpec into new packages and possibly implement import filters for new file formats.
- Improve hyperSpec's "fortification" so that it integrates well with tidyverse (dplyr, magrittr).
- Provide spectroscopy-related functionality, i.e. integration with baseline and EMSC packages
- Provide integration with matrixStats
As this is quite modular, different parts can also be combined and/or the projects can take into account the more limited scope of this year's GSoC.
This project should produce several small packages which provide two enhancements to the spectroscopy community:
- Small packages are easier to install, use and maintain than one big hyperSpec with lots of dependencies: they "shield" hyperSpec from dependency changes.
- Enhanced functionality for packages that "bridge" hyperSpec with other packages such as baseline or EMSC.
Students, please contact us in the hyperSpec GSoC 2020 issue after completing at least one of the tests below.
- Claudia Beleites ([email protected]) - creator of hyperSpec, chemist/spectroscopist, mentored with R/GSoC several times;
- Vilmantas Gegzna ([email protected]) - future maintainer of hyperSpec, biophysicist, data analyst and spectroscopist as well as a lecturer in biostatistics and R at Life Sciences Center, Vilnius University;
- Bryan Hanson ([email protected]) - EVALUATING mentor, creator of readJDX and ChemoSpec, mentored with R/GSoC several times (incl. together with Claudia)
Please contact us if you are stuck with your task. These tests are unlike an exam in that there is no penalty for communicating with us mentors: on the contrary, good communication is one of the key aspects of a good Google Summer of Code.
- Install R, RStudio and the R packages hyperSpec, covr, styler and lintr from CRAN and hySpc.testthat from GitHub.
- In hyperSpec's GitHub repository, there are 3 issues that ask to delete 3 files with unused functions. Fork hyperSpec from GitHub, solve these issues, write a separate, informative Git commit message (follow these rules) for each of the 3 deletions and create a pull request. The pull request (PR) must contain "magical words" (such as fix) that close GitHub issues when the PR is accepted: make sure that the pull request is correctly linked to all 3 issues.
- Fork hyperSpec from GitHub, use tools from package lintr to detect R code style errors (use "tidyverse" style) in at least one "R Script" (.R) file, fix the errors (standardize code formatting) and submit a pull request with the improved code.
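The style-checking task might look roughly like the sketch below. It assumes lintr and styler are installed from CRAN; the one-line script and the temporary file path are made up for illustration, and in the actual task you would point lint() at a file in your fork. lintr's default linters are designed to follow the tidyverse style guide.

```r
library(lintr)
library(styler)

# Write a deliberately messy one-line script to a temporary file
# (hypothetical content, for illustration only)
path <- tempfile(fileext = ".R")
writeLines("x=c(1,2,3)", path)

# Detect style problems with lintr's default linters
lints <- lint(path)
print(lints)

# Let styler reformat the file in place using the tidyverse style
style_file(path)

# Inspect the restyled code and lint again to see what is left
readLines(path)
lint(path)
```

styler only fixes formatting; some lints (for example overly long object names) still need manual attention.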
- hyperSpec and the packages in the r-hyperspec/ organization have some issues marked as good first issue. Note in the issue thread that you'll tackle this, fork the repo and write code, documentation, unit test, and a brief explanation of how to use this in the vignette (if that package has one) and submit a pull request.
- Find some function that does not yet have unit tests. Fork the package from GitHub and write a unit test for one of these functions.
  The packages have their unit tests in the .R files after the respective function definition, using a custom function test<-() to attach them to the function in question.
  Unit tests for file import functions count as very hard, see below.
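The pattern of keeping a unit test right after the function definition can be sketched as follows. This is only an illustration: the local `test<-` definition is a stand-in written under the assumption that the packages' helper stores the test as an attribute of the function, and normalize_max() is a hypothetical function, not part of hyperSpec. The sketch also assumes testthat is installed.

```r
library(testthat)

# Stand-in for the packages' custom `test<-()` helper (assumption: it stores
# the test function as an attribute of the function under test)
`test<-` <- function(f, value) {
  attr(f, "test") <- value
  f
}

# Hypothetical function under test
normalize_max <- function(x) {
  x / max(x)
}

# The unit test lives directly after the function definition
test(normalize_max) <- function() {
  test_that("normalize_max() scales the maximum to 1", {
    expect_equal(max(normalize_max(c(2, 4, 8))), 1)
    expect_equal(normalize_max(c(2, 4, 8)), c(0.25, 0.5, 1))
  })
}

# Run the attached test by hand
attr(normalize_max, "test")()
```

Keeping test and definition side by side makes it easy to see at a glance which functions still lack tests.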
This requires close communication with us, mentors.
- Choose a file import function from the development version of hyperSpec package that should be moved to a separate package.
- Set up a GitHub repo that contains a new R package. There is a so-called package skeleton repository that could be used as a template. Copy file import code of the function you chose as well as its roxygen2 documentation and unit tests.
- Write one (additional) unit test.
- Fix the style of the R code (use the tidyverse style rules).
- Update the documentation.
- Build, check and test the package to make sure it installs and works properly.
- Properly deprecate the function in hyperSpec package.
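The last step above, deprecating the old function, can be sketched with base R's .Deprecated(). The function names read_txt_old() and read_txt_new() are hypothetical placeholders, and read_txt_new() here is only a stand-in for the function in its new package; the mentors may prefer a different deprecation mechanism.

```r
# Stand-in for the import function in its new home package
# (hypothetical name and behavior)
read_txt_new <- function(file) {
  read.table(file, header = TRUE)
}

# Old function kept for a transition period: it warns, then delegates
read_txt_old <- function(file) {
  .Deprecated(new = "read_txt_new")
  read_txt_new(file)
}

# Usage: write a tiny made-up spectrum to a temporary file and read it back
tmp <- tempfile(fileext = ".txt")
write.table(
  data.frame(wavelength = c(400, 500), intensity = c(1.5, 2.5)),
  tmp,
  row.names = FALSE
)

# Calling the old function still works but signals a deprecation warning
spc <- suppressWarnings(read_txt_old(tmp))
spc
```

Delegating to the new function keeps existing user code working while the warning points users to the replacement.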
Students, please post a link to your test results here.
Sang T. Truong, https://github.com/sangttruong, https://github.com/sangttruong/hyperSpec