This repository hosts an experimental codebase for the CLEF 2025 JOKER Track on computational wordplay. It focuses primarily on Task 2 (Pun Translation EN→FR) while remaining extensible to Task 1 (Humour-aware IR) and Task 3 (Onomastic Wordplay Translation). It provides:
- Supervised fine-tuning (SFT) pipeline with LoRA
- Alternative alignment / preference optimization (ARPO-style CPO/SimPO) training
- Structured JSON config system for reproducibility
- Batched inference + submission packaging
- Optional Unsloth acceleration & 4/8-bit loading
- Integrated on-the-fly COMET (translation quality) evaluation callback (optional)
| Task | Description | Example Challenge |
|---|---|---|
| Task 1 | Humour-aware Information Retrieval | Retrieve jokes relevant to a semantic query ("physics", "dating", etc.) preserving humorous intent. |
| Task 2 | Pun Translation (EN→FR) | Preserve dual meanings + humor: I used to be a banker but I lost interest → J'ai été banquier mais j'en ai perdu tout l'intérêt. |
| Task 3 | Onomastic Wordplay Translation | Maintain name-based wordplay (proper nouns, famous figures) while retaining pun plausibility. |
- Unified training interfaces: `src/sft.py` (supervised) and `src/arpo.py` (preference / constrained policy optimization style)
- LoRA integration (PEFT) + optional Unsloth fast adapters
- Configurable generation defaults saved alongside model artifacts
- Completion-only loss mode with response template masking
- Mid-training NMT quality probing via COMET (optional callback)
- Reproducible, declarative experiment configs (JSON)
- Submission inference helper that auto-zips predictions for Task 2
```
src/
  sft.py                       # Supervised fine-tuning entry
  arpo.py                      # Alignment / preference optimization training
  run_submission_inference.py  # Batch generation + packaging for submissions
  utils/                       # Collators, callbacks, metrics, seeding, IO
  scripts/                     # Prompt templates, helpers
configs/                       # Experiment JSON configs (SFT + ARPO)
data/                          # Place your local datasets (not tracked)
runs/                          # Shell scripts & run outputs
Experiments.ipynb              # Exploratory notebook
```
Create an environment (for example with `uv`, `venv`, or conda). Dependencies are standard: `transformers`, `trl`, `datasets`, `accelerate`, `peft`, `wandb`, `unsloth` (optional), `tqdm`, and `unbabel-comet` (only if using the COMET callback).
```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip
# (Optional) create requirements.txt later; for now install the minimal stack:
pip install transformers accelerate trl peft datasets wandb tqdm unsloth
# Optional metrics (only if using the NMT callback)
pip install unbabel-comet
```

Log in to Hugging Face and Weights & Biases if pushing to the Hub or logging:

```bash
huggingface-cli login
wandb login
```

Expected raw JSON lists for training / evaluation:
- SFT format (chat-style) example item:

```json
{
  "messages": [
    {"role": "user", "content": "Translate this English pun into French: I used to be a banker but I lost interest"},
    {"role": "assistant", "content": "J'ai été banquier mais j'en ai perdu tout l'intérêt"}
  ]
}
```

- (If using the NMT callback) the script will derive `instruction` + `target` fields during evaluation formatting when absent.

Place curated files under `data/` and reference them in a config (see below).
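Before training, it can help to sanity-check the data shape. A minimal sketch, assuming the chat-style format above (the `load_sft_items` helper and file path are illustrative, not part of the repo):

```python
import json

def load_sft_items(path: str) -> list[dict]:
    """Load a JSON list of chat-style SFT items and check each one is well-formed."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    assert isinstance(items, list), "expected a top-level JSON list"
    for i, item in enumerate(items):
        messages = item.get("messages")
        assert isinstance(messages, list) and messages, f"item {i}: missing 'messages'"
        for msg in messages:
            assert msg.get("role") in {"system", "user", "assistant"}, f"item {i}: bad role"
            assert isinstance(msg.get("content"), str), f"item {i}: bad content"
    return items

items = load_sft_items("data/task2/train.json")
print(f"{len(items)} items look well-formed")
```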
Each JSON file in `configs/` fully describes a run. Core fields:

| Key | Purpose |
|---|---|
| `train_file` / `eval_file` | Paths to JSON lists of examples |
| `model_name` | Base HF model (chat / instruct style) |
| `lora` | PEFT LoRA block (omit or set `null` to disable) |
| `generation_config` | Saved inference defaults (temperature, beams, etc.) |
| `max_tokens_count` / `max_length` | Sequence length control |
| `completion_only` | If true, masks loss to the assistant response only |
| `response_template` | String token prefix marking the assistant region |
| `use_nmt_callback` | Enable COMET evaluation mid-training |
| `trainer` | Training hyperparameters passed to the TRL / custom trainer |
| `seed` | Reproducibility |
| `output_dir` | Where checkpoints & tokenizer are written |
Minimal SFT config skeleton:

```json
{
  "train_file": "data/task2/train.json",
  "eval_file": "data/task2/dev.json",
  "model_name": "croissantllm/CroissantLLMChat-v0.1",
  "max_tokens_count": 512,
  "completion_only": true,
  "response_template": "<|im_start|>assistant",
  "lora": {"r": 32, "lora_alpha": 32, "lora_dropout": 0.05, "bias": "none", "target_modules": ["q_proj", "v_proj"]},
  "trainer": {"num_train_epochs": 1, "per_device_train_batch_size": 8, "gradient_accumulation_steps": 4, "learning_rate": 5e-5, "eval_strategy": "steps", "eval_steps": 50, "save_steps": 200, "report_to": "wandb", "push_to_hub": true, "hub_model_id": "user/project-sft-v1"},
  "seed": 3407,
  "output_dir": "models/project-sft-v1"
}
```

Launch training:

```bash
python src/sft.py train \
    --config_file configs/skommarkhos_croissantllmchat_v0.1_1b_sft_v1.json \
    --output_dir models/sft_run_1
```

Notes:
- Set `use_unsloth=True` for faster adapter training (8-bit/4-bit)
- `completion_only` rewrites the internal collator to focus loss on assistant spans (see the sketch below)
- The generation config is saved for downstream evaluation
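For intuition, "completion-only" loss sets the label for every token before (and including) the response template to the ignore index, so cross-entropy is computed over the assistant span only. A minimal, framework-agnostic sketch of that masking (the helper name is illustrative; TRL also ships a `DataCollatorForCompletionOnlyLM` with similar behaviour):

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy ignores labels with this value

def mask_to_completion(input_ids: list[int], template_ids: list[int]) -> list[int]:
    """Return labels equal to input_ids, with everything up to and including
    the first occurrence of the response template masked out."""
    labels = list(input_ids)
    for start in range(len(input_ids) - len(template_ids) + 1):
        if input_ids[start:start + len(template_ids)] == template_ids:
            end = start + len(template_ids)
            labels[:end] = [IGNORE_INDEX] * end
            return labels
    # No template found: mask the whole example so it contributes no loss.
    return [IGNORE_INDEX] * len(labels)
```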
ARPO training mimics constrained policy optimization with a custom trainer (`CPOTrainer`). The invocation mirrors SFT:

```bash
python src/arpo.py train \
    --config_file configs/skommarkhos_croissantllmchat_v0.1_1b_arpo_v1.json \
    --output_dir models/arpo_run_1
```

Key deltas vs SFT:
- `loss_type` (e.g. `simpo`) inside `trainer` (sketched below)
- Separate prompt/completion length caps (`max_prompt_length`, `max_completion_length`)
- A lower learning rate is typical (`5e-7` in the example)
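SimPO, one of the available `loss_type` values, optimizes a reference-free, margin-shifted preference objective over length-normalized log-probabilities. A minimal PyTorch sketch (the `beta` / `gamma` names follow the SimPO paper and may not match this repo's config keys):

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # summed token log-probs of preferred completions, (B,)
    rejected_logps: torch.Tensor,  # summed token log-probs of dispreferred completions, (B,)
    chosen_lens: torch.Tensor,     # completion lengths in tokens, (B,)
    rejected_lens: torch.Tensor,
    beta: float = 2.0,             # reward scale
    gamma: float = 1.0,            # target reward margin
) -> torch.Tensor:
    # Length-normalized implicit rewards; SimPO needs no reference model.
    r_chosen = beta * chosen_logps / chosen_lens
    r_rejected = beta * rejected_logps / rejected_lens
    # Bradley-Terry preference loss with a margin.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```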
Enable by setting `"use_nmt_callback": true`. The callback:
- Derives instruction/target pairs if absent
- Generates translations using the saved `generation_config`
- Scores with COMET-22 (if `unbabel-comet` is installed)
- Logs metrics (to W&B if enabled)

It runs roughly twice per training run by dynamically spacing evaluation steps.
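The same metric can also be computed standalone, which is handy for post-hoc checks. A minimal sketch with the `unbabel-comet` package (the example triple reuses the banker pun; scoring a real run would use your system outputs):

```python
from comet import download_model, load_from_checkpoint

# Requires `pip install unbabel-comet`; downloads the COMET-22 checkpoint on first use.
model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

data = [{
    "src": "I used to be a banker but I lost interest",
    "mt": "J'ai été banquier mais j'en ai perdu tout l'intérêt",
    "ref": "J'ai été banquier mais j'en ai perdu tout l'intérêt",
}]
output = model.predict(data, batch_size=8, gpus=0)  # gpus=0 runs on CPU
print(output.system_score)  # corpus-level score, roughly in [0, 1]
```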
```bash
python src/run_submission_inference.py main \
    --model_path models/sft_run_1 \
    --test_data data/task2/joker_pun_translation_2025_test.json \
    --output_dir submissions/task2 \
    --batch_size 32
```

Outputs:
- JSON with fields: `run_id`, `manual`, `id_en`, `en`, `fr`
- An auto-generated ZIP containing `prediction.json` (ready for upload)

Temperature / sampling settings are currently defined inline; tune them as desired inside `run_submission_inference.py`.
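The packaging step itself is small: write `prediction.json` and wrap it in a ZIP. A minimal sketch of the idea (the function name and example record are illustrative, not the repo's actual code):

```python
import json
import zipfile
from pathlib import Path

def package_submission(predictions: list[dict], output_dir: str) -> Path:
    """Write predictions to prediction.json and zip it for upload."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / "prediction.json"
    json_path.write_text(json.dumps(predictions, ensure_ascii=False, indent=2),
                         encoding="utf-8")
    zip_path = out / "prediction.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(json_path, arcname="prediction.json")
    return zip_path

package_submission(
    [{"run_id": "team_sft_v1", "manual": 0, "id_en": "1",
      "en": "I used to be a banker but I lost interest",
      "fr": "J'ai été banquier mais j'en ai perdu tout l'intérêt"}],
    "submissions/task2",
)
```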
- Fixed seed in config (`seed`)
- Explicit tokenizer + special tokens saved to `output_dir`
- Generation parameters versioned alongside the model
- LoRA adapter weights are merged only if you export them explicitly (default: PEFT adapter format); see the sketch below
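To export merged weights rather than the default adapter format, standard PEFT calls suffice. A minimal sketch, assuming an SFT run saved under `models/sft_run_1` (paths are illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("croissantllm/CroissantLLMChat-v0.1")
model = PeftModel.from_pretrained(base, "models/sft_run_1")  # adapter directory
merged = model.merge_and_unload()  # fold LoRA deltas into the base weights
merged.save_pretrained("models/sft_run_1-merged")
AutoTokenizer.from_pretrained("models/sft_run_1").save_pretrained("models/sft_run_1-merged")
```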
| Goal | Where to Modify |
|---|---|
| New metric | `src/utils/metrics.py` |
| Alternate reward / loss | `utils/cpo_trainer.py` (custom trainer) |
| Prompt template logic | `src/scripts/prompt.py` |
| Custom collator | `utils/collators.py` |
- Add retrieval baseline for Task 1 (BM25 + reranker)
- Add name-entity augmentation patterns for Task 3
- Publish structured requirements file & lightweight Dockerfile
- Add evaluation harness for BLEU / chrF / pun-preservation score
- Merge LoRA weights export utility script
This project is licensed under the Apache License 2.0.
Copyright 2025 Igor Kuzmin
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
If you use this codebase or derivatives in academic work, please cite:
```bibtex
@inproceedings{kuzmin2025joker,
  author    = {Igor Kuzmin},
  title     = {{CLEF} 2025 {JOKER} Track: No Pun Left Behind},
  booktitle = {CLEF 2025 Labs and Workshops, Notebook Papers},
  series    = {CEUR Workshop Proceedings},
  volume    = {4038},
  publisher = {CEUR-WS.org},
  year      = {2025},
  url       = {https://ceur-ws.org/Vol-4038/paper_225.pdf},
  issn      = {1613-0073},
  note      = {Paper 225}
}
```