Compar:IA is a tool for blindly comparing conversational AI models, both to raise awareness of the issues around generative AI (bias, environmental impact) and to build French-language preference datasets.
🌐 comparia.beta.gouv.fr · 📚 About · 🚀 Description of the State Startup
The comparator is based on Gradio and FastChat, the code behind LMSYS's Chatbot Arena (see Project architecture and rationale below).
- Rename register-api-endpoint-file.json.dist to register-api-endpoint-file.json and add valid API keys
cd docker/; docker compose up -d
Due to how Gradio's Custom Components work and because they haven't been published as Python packages, building them manually is a bit tedious. At the moment we use 4 custom components:
pip install -r requirements.txt
cd custom_components/frinput
gradio cc install; gradio cc build --no-generate-docs
cd ../../custom_components/customradiocard
gradio cc install; gradio cc build --no-generate-docs
cd ../../custom_components/customdropdown
gradio cc install; gradio cc build --no-generate-docs
cd ../../custom_components/customchatbot
gradio cc install; npm install @gouvfr/dsfr; gradio cc build --no-generate-docs
cd ../..
Then run export LANGUIA_DEBUG=True; uvicorn main:app --reload --timeout-graceful-shutdown 1
or simply uvicorn main:app
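The four component builds above repeat the same commands, so they could be scripted. The following is a minimal sketch, not part of the repository: the helper names build_plan and run_plan are hypothetical, and it assumes the commands are run from the repository root with gradio and npm available on the PATH.

```python
import subprocess
from pathlib import Path

# Component directory names, taken from the manual steps above.
COMPONENTS = ["frinput", "customradiocard", "customdropdown", "customchatbot"]

# Assumption: only customchatbot needs the extra DSFR npm package,
# as in the manual steps above.
EXTRA_STEPS = {"customchatbot": [["npm", "install", "@gouvfr/dsfr"]]}


def build_plan(root="custom_components"):
    """Return (working_directory, command) pairs in execution order."""
    plan = []
    for name in COMPONENTS:
        cwd = Path(root) / name
        plan.append((cwd, ["gradio", "cc", "install"]))
        for extra in EXTRA_STEPS.get(name, []):
            plan.append((cwd, extra))
        plan.append((cwd, ["gradio", "cc", "build", "--no-generate-docs"]))
    return plan


def run_plan(plan):
    """Run each command in its directory, stopping at the first failure."""
    for cwd, cmd in plan:
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling run_plan(build_plan()) from the repository root would then run the same sequence as the manual steps. Keeping the plan separate from its execution makes it possible to print or inspect the commands before running them.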
We initially forked LMSYS' FastChat codebase, used at https://lmarena.ai, to get an arena running immediately. Its architecture was composed of:
- the arena (a Gradio project with 2-3 Python files)
- a controller to register model workers
But as it was easier to run models in vLLM Docker containers or through external APIs, the controller / model workers architecture ended up as unused code. Furthermore, we needed a dashboard for the controller, so it got recoded.
Our main focus with compar:IA is to invest heavily in overall design and UX/UI. Thanks to Gradio's Custom Components, we can customize any Gradio component as a Svelte app and control the look and feel of the user interface.
We currently use 4 distinct (and sometimes poorly named) Custom Components:
- FrInput: the DSFR input component
- CustomDropdown: encompasses most of the first screen, with mode selection, model selection, and the initial textarea
- CustomRadioCard: used in the first screen for suggestions and later for voting
- CustomChatbot: a component crafted for the specific compar:IA experience, allowing you to compare two chatbots' responses to one user message and collect the user's feedback
Because we needed a static website as well, we used Gradio's mount_gradio_app feature, which lets you customize how FastAPI serves the Gradio app (Gradio is based on FastAPI) while using the underlying FastAPI app to serve other pages. This lives in main.py, while most of the Gradio code is split between languia/block_arena.py and languia/listeners.py.
The static site's pages are in the templates/ folder, which also hosts the complex Jinja2 template files needed in the arena (especially after the "reveal" step).
After 8 months of intensive development, the Gradio framework may show some limits, especially when it comes to fully custom CSS. Ugly CSS overrides are used heavily throughout this repo (especially in the infamous assets/custom-arena.css), while the integration of the French design system (DSFR) is made difficult by how Gradio adds a lot of Svelte-generated CSS everywhere.
Furthermore, since the app is now more stable, we don't need to iterate quickly anymore, which is what Gradio allowed, and we could gain some snappiness by using a Svelte SPA (Single Page App) and a lighter frontend-backend communication.
I feel there is a gradual path: extracting the Custom Components one by one into a basic Svelte app and replacing Gradio with a basic FastAPI endpoint, screen by screen, iterating as we go. If you have opinions on this, I warmly welcome you to open an issue on the matter 🙃