ChatR: R Expert Chatbot

Background

R users and developers commonly use LLMs for coding, as LLMs are proficient in generating R code and answering general questions about R. However, general foundational LLMs are less accurate when it comes to specifics, such as the precise API of a third-party package or best practices for contributing to R itself. A chatbot that has been customized to yield accurate information about the R ecosystem and the contribution process would be useful to every R programmer, and especially to those looking for an on-ramp to becoming a contributor to the core. To be maximally inclusive, it is also important for the bot to run on local hardware, even in the absence of a GPU, while also supporting commercial models.

Related work

Proprietary platforms like chat.openai.com offer a number of R-oriented chatbots; however, none of these are freely available, which strictly limits their reach and excludes many R users and potential contributors.

To our knowledge, the closest prior work is the Shiny Assistant, a freely available bot that has been customized to answer questions about Shiny and can even generate entire Shiny applications.

There are also many R/LLM interfaces, which will likely be useful for implementing our customizations and for providing a demonstrative chat interface directly in the R and/or RStudio session.

Details of your coding project

The contributor will experiment with prompt engineering, tool calling and RAG-based approaches to customizing a chatbot to R programming and contribution tasks. The output will be an R package that encapsulates the customizations and provides an interface to the bot. The package will rely on existing packages for capabilities like communicating with the model, embedding a chat widget in a simple Shiny app and indexing R-related documentation.
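As a rough illustration of the prompt-engineering direction, the sketch below grounds a system prompt in the objects a package actually exports, as read from the local R installation, before handing it to a local model. It is only one possible approach under stated assumptions, not the project's design: `build_grounded_prompt` and its `max_exports` argument are hypothetical names, and the final step assumes that ellmer is installed and that an Ollama server is running with `llama3.2:3b` already pulled.

```r
# Hypothetical helper (not part of ellmer or any existing package):
# build a system prompt that lists the real exports of an installed package.
build_grounded_prompt <- function(pkg, max_exports = 200) {
  # getNamespaceExports() loads the namespace if needed and returns export names
  exports <- sort(getNamespaceExports(pkg))
  # Cap the list so the prompt stays small enough for a small local model
  exports <- head(exports, max_exports)
  paste0(
    "You are an assistant for R programmers. ",
    "When discussing the package '", pkg, "', only refer to these exported ",
    "objects: ", paste(exports, collapse = ", "), "."
  )
}

prompt <- build_grounded_prompt("tools")

# Assumption: ellmer is installed and a local Ollama server is serving
# llama3.2:3b. Guarded so the rest of the script runs without them.
if (interactive() && requireNamespace("ellmer", quietly = TRUE)) {
  chat <- ellmer::chat_ollama(system_prompt = prompt, model = "llama3.2:3b")
  chat$chat("Which function in 'tools' checks a package's Rd files?")
}
```

Injecting the export list directly into the prompt is the simplest form of grounding; a RAG-based variant would instead retrieve only the documentation relevant to each question.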

Expected impact

Every R user would benefit from a more accurate R-oriented chatbot. The bot will also help new contributors learn how to work with the R codebase and collaborate with R core, and new contributors are critical for the longevity of the project.

Mentors

EVALUATING mentor: Michael Lawrence [email protected]: Member of R core, experienced with LLM customization and former GSOC mentor.
Assisting mentor: Gabriel Becker [email protected]: Expert R programmer, committed advocate for new contributors to R and former GSOC mentor.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

Easy: Install the Ollama software, pull the Q4_K_M quantization of llama3.2:3b-instruct and ask it a question that could be answered by reading Writing R Extensions.
Medium: Use the ellmer package to perform the easy task above programmatically.
Hard: Create an R package that depends on ellmer and provides a function that takes the name of a package and returns a character vector of the functions exported by that package, using only ellmer and llama3.2:3b from Ollama. The list of functions does not need to be correct (making it correct is the whole point of this project).

Solutions of tests

Contributors, please post a link to your test results here.

Dev Goel, Test Solutions
Afraaz Ali, GITHUB PROFILE, EASY TEST
Mayank Yadav, Github, Solutions to all tests
Jason Adika Tanuwijaya, Github, Solution Easy, Medium, Hard
Jegadit Sakthi Saravanan, Github, Solution Write-ups
David Baruch N. AKPOVI, Profile, Solutions to test
Muhammad Fatir, Test Solutions
Elabonga Atuo, Medium Test Solution
Mohammad Kazimuddin, https://github.com/kazimuddin

