
Productionizing `bddashboard`

Rahul Saxena edited this page Apr 8, 2021 · 2 revisions

Background

Aggregating publicly available biodiversity data from heterogeneous sources (e.g., scientific research, citizen science, and natural history collections) has the potential to answer a staggering variety of research questions. Yet biodiversity data are prone to various data quality issues and biases, which may invalidate their use in research. Furthermore, handling big biodiversity data requires complex technical and analytical skills. The bdverse is a family of R packages that form a general framework for facilitating biodiversity data science. It comprises various packages in a hierarchical structure, providing different R functionalities and a GUI (modular Shiny apps) that can easily be adopted by users with or without programming skills. Hopefully, the bdverse will serve as a sustainable and agile infrastructure that enhances the value of biodiversity data by allowing users to conveniently employ R for user-level data standardization, exploration, quality assessment, and cleaning.

Data quality issues may encompass missing, doubtful, or wrong information in one of a record's many attributes (e.g., taxonomic, spatial, or temporal), a formatting inconsistency, or a potential duplication caused by various aggregation mechanisms (most of which are untraceable). Another type of quality control is also vital: the identification and removal of data that are not necessarily erroneous, but rather unsuitable for a particular application or purpose (i.e., data fitness for use). These case-specific procedures derive from the user’s own research questions, the intended analysis and algorithms, the data being used, and the properties of the chosen species or taxonomic group. Hence, even before considering the challenges of developing a robust research analysis, building and performing a comprehensive data quality assessment is overwhelmingly demanding. Supplying users with a flexible, reproducible, and exceptionally user-friendly toolset is therefore the only practical course of action.

Diagnostic visualization can unveil hidden patterns and anomalies in the data, and allows quick and efficient exploration of massive datasets. An interactive and flexible dashboard that can be easily deployed locally or remotely would be a highly valuable biodiversity informatics tool.

Related work

To the best of our knowledge, no user-level biodiversity data dashboard exists. The closest project is the Rshiny LifeWatch Data Explorer, developed by the Flanders Marine Institute (VLIZ) in 2015-2016. This interactive online tool gives access only to sensor data collected in the framework of the Flemish LifeWatch project.

During GSoC 2020, Rahul Chauhan developed a set of modules with which anyone can create a custom visualization dashboard in a short time. The modules are reactive and can be interactive. Their main features include:

Field Selector: The field selector allows the user to change the X and Y axes on plots, the columns visible in the table, and some leaflet settings, all in real time without reloading the page or disturbing the reactivity.

Plot Navigation: Navigation allows the user to decide how many values to show on a single plot. For example, there may be 100 different bars on a bar plot. The user can use the slider to change how many bars are shown at once. If the user selects 10 on the slider, the plot is divided into 10 pages, each showing 10 bars. Navigation buttons pop up automatically when needed.
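The paging arithmetic described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `paginate_bars` is a hypothetical helper and the counts are made up; it is not taken from the bddashboard modules.

```r
# Hypothetical helper (not part of bddashboard): split the bars of a
# bar plot into consecutive pages of at most `bars_per_page` bars.
paginate_bars <- function(counts, bars_per_page) {
  stopifnot(bars_per_page >= 1)
  # Page index for each bar: 1, 1, ..., 2, 2, ...
  page_id <- ceiling(seq_along(counts) / bars_per_page)
  split(counts, page_id)
}

# 100 made-up categories with made-up record counts
set.seed(1)
counts <- setNames(sample(1:500, 100, replace = TRUE),
                   paste0("taxon_", 1:100))

# Slider set to 10 bars per page
pages <- paginate_bars(counts, bars_per_page = 10)
length(pages)   # number of pages
lengths(pages)  # number of bars on each page
```

With 100 bars and 10 bars per page this gives 10 pages of 10 bars each. When the number of bars is not a multiple of the slider value, the last page simply holds the remainder.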

You can learn more about those modules by looking at dashboard.experiment, a sample dashboard created with them. Last year’s project serves as a stepping stone for developing a production-ready, state-of-the-art biodiversity data dashboard.

Details of your coding project

During the last GSoC, we explored and compared various visualization packages. We also developed modules for field selection, interactivity, and reactivity, with drill-down capability.

The main goal of this project is to prepare the package for a CRAN release by developing sufficient tests, integrating CI/CD, and submitting it for software peer review.

Testing + CI/CD:

One of the most important tasks of this project is to develop a framework for comprehensive dashboard tests, as this is crucial for the production version. In order to identify bugs and failures, dashboard.demo will be tested with different datasets that vary in size, taxon, and data publisher. Bugs will be fixed and workarounds will be developed for failing features (features that remain unresolved will be omitted), coupled with the development of appropriate unit tests. Integration tests for reactivity and shinytest tests for the UI will also be developed. Once a sufficient suite of tests has been implemented, different CI/CD strategies will be evaluated, seeking a good balance between test sensitivity and test maintenance.
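As a rough illustration of what such unit tests might look like, the sketch below uses testthat to check a small data-preparation function against a regular dataset, an empty dataset, and a bad field name. The function `count_by_field` and the sample records are hypothetical and are not part of bddashboard.

```r
library(testthat)

# Hypothetical helper (not part of bddashboard): count records per
# value of a field, ignoring missing values.
count_by_field <- function(data, field) {
  if (!field %in% names(data)) {
    stop("Unknown field: ", field)
  }
  counts <- table(data[[field]], useNA = "no")
  data.frame(
    value = as.character(names(counts)),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

test_that("counts are correct and missing values are dropped", {
  records <- data.frame(
    class = c("Mammalia", "Aves", "Mammalia", NA),
    stringsAsFactors = FALSE
  )
  result <- count_by_field(records, "class")
  expect_equal(result$value, c("Aves", "Mammalia"))
  expect_equal(result$n, c(1L, 2L))
})

test_that("an empty dataset yields an empty summary", {
  records <- data.frame(class = character(0), stringsAsFactors = FALSE)
  expect_equal(nrow(count_by_field(records, "class")), 0)
})

test_that("an unknown field raises an error", {
  records <- data.frame(class = "Aves", stringsAsFactors = FALSE)
  expect_error(count_by_field(records, "genus"), "Unknown field")
})
```

Keeping data preparation in plain functions like this, separate from the Shiny server code, is one way to make modules testable without launching the app; UI behaviour would still need shinytest or a similar tool.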

Towards a CRAN release:

The bddashboard package will be submitted to rOpenSci for software peer review, and a short manuscript for JOSS will be written. After its hopefully smooth acceptance, we will submit it to CRAN. The package will be added to the bdverse family of packages and a new bdverse version will be pushed to CRAN.

Skills Required

R, Shiny (advanced level), data visualization, HTML, JavaScript, testing (testthat; shinytest), CI/CD.

Expected impact

Developing novel interactive visualizations, coupled with a modular dashboard system for biodiversity data that can easily be employed by R experts and novices alike, will promote biodiversity research. This project also has the potential to spark a practical scheme for encapsulating key interactivity and reactivity functionality, together with unit tests, within a bdvis object. Engineering such an object will significantly speed up our ability to develop a diverse collection of dashboards without compromising robustness and integrity.

Mentors

  • Rahul Chauhan [email protected] joined bdverse as a Google Summer of Code student developer in 2019. During GSoC he developed an interactive Shiny package that allows the user to visualize different aspects of biodiversity data, such as temporal, taxonomic, and spatial, without worrying about coding.
  • Thiloshon Nagarajah [email protected] is the Shiny lead of the bdverse development team. He is a past GSoC and GCI student for Fedora Project, Sahana Foundation, and R Language. Thiloshon joined bdverse as a Google Summer of Code student developer in 2017 and has been a student, contributor, mentor, and now a core member of the bdverse team. All things Shiny in bdverse are the magic of Thiloshon.
  • Vijay Barve [email protected] is the author and maintainer of bdvis and a key member of the bdverse development team. Vijay is a biodiversity data scientist and has been a GSoC student and mentor since 2012 with the R project organization. Vijay has contributed to several packages on CRAN.
  • Tomer Gueta [email protected] is leading the bdverse project, which was born out of his Ph.D. work. The bdverse development team/family was founded thanks to GSoC. Today, Tomer is dividing his time between establishing a national citizen science center and developing an IT infrastructure for Hamaarag - Israel's National Nature Assessment Program. Both projects are for the Steinhardt Museum of Natural History, Tel Aviv University.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: Download 10,000 GBIF occurrence records of mammals in the U.S. (georeferenced records only), using the ‘rgbif’ R package.
  • Easy: Create a simple Shiny dashboard. Add a plot made with ‘bdvis’ that most effectively summarizes the mammal data you downloaded.
  • Medium: Convert your dashboard into a Shiny app using Golem.
  • Hard: Add a few test cases, testing each module of your app.

Solutions of tests

Students, please post a link to your test results here.
