
ChatR: R Expert Chatbot

Gabe Becker edited this page Mar 28, 2025 · 6 revisions

Background

R users and developers commonly use LLMs for coding, as LLMs are proficient at generating R code and answering general questions about R. However, general-purpose foundation models are less accurate when it comes to specifics, such as the precise API of a third-party package or best practices for contributing to R itself. A chatbot customized to give accurate information about the R ecosystem and the contribution process would be useful to every R programmer, and especially to those looking for an on-ramp to becoming contributors to R itself. To be maximally inclusive, the bot should also run on local hardware, even without a GPU, while still supporting commercial models.

Related work

Proprietary platforms like chat.openai.com offer a number of R-oriented chatbots; however, none of them is freely available, which severely limits their reach and excludes many R users and potential contributors.

The closest prior work we are aware of is the Shiny Assistant, a freely available bot customized to answer questions about Shiny and even to generate entire Shiny applications.

There are also many R/LLM interfaces, which will likely be useful both for implementing our customizations and for providing a demonstration chat interface directly in the R and/or RStudio session.

Details of your coding project

The contributor will experiment with prompt engineering, tool calling and retrieval-augmented generation (RAG) to tailor a chatbot to R programming and contribution tasks. The output will be an R package that encapsulates these customizations and provides an interface to the bot. The package will rely on existing packages for capabilities such as communicating with the model, embedding a chat widget in a simple Shiny app, and indexing R-related documentation.
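Two of the approaches named above, the retrieval step of RAG and a tool the model could call, can be sketched in base R. This is a minimal sketch under stated assumptions, not the project's design: it ranks documentation chunks by keyword overlap as a simple stand-in for embedding similarity, the example chunks are made up, and `tokenize()`, `retrieve_chunks()`, `build_rag_prompt()` and `list_exported_functions()` are hypothetical names rather than functions from any existing package. Registering the tool with a chat object is left out, since the exact registration API depends on the ellmer version in use.

```r
# Split text into unique lowercase word tokens.
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z0-9_]+"), use.names = FALSE)
  unique(words[nzchar(words)])
}

# Rank chunks by how many distinct question words they contain
# (keyword overlap, standing in for embedding similarity) and keep
# the top k chunks that share at least one word with the question.
retrieve_chunks <- function(question, chunks, k = 2) {
  q <- tokenize(question)
  scores <- vapply(chunks, function(ch) sum(q %in% tokenize(ch)),
                   integer(1), USE.NAMES = FALSE)
  idx <- order(scores, decreasing = TRUE)
  idx <- idx[scores[idx] > 0]
  chunks[head(idx, k)]
}

# Prepend the retrieved context to the user's question.
build_rag_prompt <- function(question, chunks, k = 2) {
  context <- retrieve_chunks(question, chunks, k)
  paste0(
    "Answer using only the context below.\n\nContext:\n",
    paste(context, collapse = "\n---\n"),
    "\n\nQuestion: ", question
  )
}

# Hypothetical tool: the ground-truth exported functions of an installed
# package, which a model could call instead of guessing.
list_exported_functions <- function(package) {
  exports <- getNamespaceExports(package)
  is_fun <- vapply(exports,
                   function(x) is.function(getExportedValue(package, x)),
                   logical(1), USE.NAMES = FALSE)
  sort(exports[is_fun])
}
```

The string returned by `build_rag_prompt()` would be sent to the model in place of the bare question; a real index would be built from sources such as Writing R Extensions and package help pages.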

Expected impact

Every R user would benefit from a more accurate R-oriented chatbot. The bot will also help new contributors learn to work with the R codebase and collaborate with R core; new contributors are critical to the longevity of the project.

Mentors

  • EVALUATING mentor: Michael Lawrence [email protected]: Member of R core, experienced with LLM customization and former GSoC mentor.
  • Assisting mentor: Gabriel Becker [email protected]: Expert R programmer, committed advocate for new contributors to R and former GSoC mentor.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: Install the Ollama software, pull the Q4_K_M quantization of llama3.2:3b-instruct and ask it a question that could be answered by reading Writing R Extensions.
  • Medium: Use the ellmer package to perform the easy task above programmatically.
  • Hard: Create an R package that depends on ellmer and provides a function that takes the name of a package and returns a character vector of the functions exported by that package, using only ellmer and llama3.2:3b from Ollama. The list of functions does not need to be correct (making it correct is the whole point of this project).
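A minimal sketch of what Medium and Hard solutions might look like follows. It assumes Ollama is running locally on its default port, that the Q4_K_M build has been pulled under the tag `llama3.2:3b-instruct-q4_K_M` (verify the tag with `ollama list`), and that the package's DESCRIPTION lists ellmer under Imports. The names `ask_ollama()`, `parse_function_names()` and `llm_exported_functions()` are hypothetical; only `ellmer::chat_ollama()` and the chat object's `$chat()` method come from ellmer.

```r
# R/exports.R in a hypothetical package with `Imports: ellmer`.
# Assumes a local Ollama server with the model tag below already pulled.

# Medium test: send one question to a local Ollama model, return the reply.
ask_ollama <- function(question, model = "llama3.2:3b-instruct-q4_K_M",
                       system_prompt = NULL) {
  chat <- ellmer::chat_ollama(system_prompt = system_prompt, model = model)
  chat$chat(question)
}

# Turn a free-text reply into a vector of candidate function names.
parse_function_names <- function(reply) {
  lines <- unlist(strsplit(as.character(reply), "\n", fixed = TRUE),
                  use.names = FALSE)
  # Strip bullets and numbering the model may add despite instructions
  lines <- gsub("^\\s*(?:[-*]|\\d+[.)])\\s*", "", lines, perl = TRUE)
  # Strip inline-code backticks and a trailing "()"
  lines <- gsub("`", "", lines, fixed = TRUE)
  lines <- sub("\\(\\)$", "", trimws(lines))
  # Keep only strings that look like syntactic R names
  unique(lines[grepl("^[A-Za-z.][A-Za-z0-9._]*$", lines)])
}

#' List functions exported by a package, as guessed by a local LLM
#'
#' @param pkg Package name, e.g. "stats".
#' @param model Ollama model tag (assumed to be pulled already).
#' @return Character vector of names; not guaranteed to be correct.
#' @export
llm_exported_functions <- function(pkg,
                                   model = "llama3.2:3b-instruct-q4_K_M") {
  reply <- ask_ollama(
    sprintf("List the functions exported by the R package '%s'.", pkg),
    model = model,
    system_prompt = paste(
      "Reply with one R function name per line and nothing else:",
      "no numbering, no explanations, no code fences."
    )
  )
  parse_function_names(reply)
}
```

For example, `llm_exported_functions("stats")` returns whatever names the model produces after light cleanup. A 3B model will often list functions that do not exist, so the cleanup only fixes formatting, not accuracy.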

Solutions of tests

Contributors, please post a link to your test results here.