Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
example1.json	example1.json
example2.json	example2.json

Name

Last commit message

Last commit date

example1.json

example2.json

Subtask 2

Prompts
Dataset distribution
Metrics
Example

Prompts

Zero-shot ICL using LLMs

User prompt:


Dados los siguientes textos y preguntas, relacione cada pregunta con el texto más relevante y proporcione una respuesta concisa basada 
únicamente en la información de los textos. Devuelva el resultado estrictamente como un objeto JSON en el formato

 {"1": "A",
  "2": "B",
   ...},

donde las claves son los números de las preguntas y los valores son las respuestas correspondientes.
No incluyas ninguna explicación ni texto adicional.\n\n {textos}\n{preguntas}

Evaluation dataset

Proficiency level	A1	A2	B1	B2	C1	C2	Total
Number of exercises	7	4	8	5	4	2	30

Metrics

The evaluation metric used was accuracy (%).

Strategy	Model	A1	A2	B1	B2	C1	C2	Overall
Embedding similarity	paraphrase-multilingual-mpnet-base-v2	76,19%	65,79%	64,71%	36,17%	50,00%	56,25%	58,41%
Embedding similarity	Ada-002	90,48%	71,05%	68,63%	63,83%	56,25%	37,50%	68,14%
Embedding similarity	bge-m3	83,33%	55,26%	58,82%	57,45%	34,38%	50,00%	58,41%
Embedding similarity	text-embedding-3-small	88,10%	63,16%	60,78%	59,57%	53,12%	37,50%	63,27%
Ensemble, embedding similarity	ada-002 + bge-m3 + text-embbeding-3	90,48%	65,79%	68,63%	65,96%	59,38%	37,50%	68,14%
ICL (zero-shot)	gemini-2.0-flash-thinking-exp-01-21	100,00%	100,00%	100,00%	95,74%	95,31%	93,75%	98,11%
ICL (zero-shot)	GPT4-o	97,62%	93,18%	84,31%	95,74%	100,00%	87,50%	93,10%
ICL (zero-shot)	Llama 3.3 - 70B	100,00%	92,00%	68,63%	91,49%	96,88%	87,50%	88,66%
ICL (zero-shot)	DeepSeek R1	100,00%	100,00%	98,04%	95,74%	100,00%	93,75%	98,23%
ICL (zero-shot)	Claude Sonnet 3.7	100,00%	94,74%	86,27%	95,74%	100,00%	87,50%	94,25%
ICL (zero-shot)	Gemma 3 12B	97,62%	89,47%	90,20%	91,49%	90,62%	81,25%	91,15%
ICL (zero-shot)	Phi 4 14B	100,00%	89,47%	98,04%	85,11%	93,75%	81,25%	92,48%
ICL (zero-shot)	Qwen 2.5 14B Instruct 1M	94,29%	89,47%	90,20%	91,49%	90,62%	87,50%	90,87%
ICL (zero-shot)	Ministral 8B Instruct	71,43%	81,58%	88,24%	80,85%	75,00%	68,75%	79,20%
Ensemble, ICL (zero-shot)	Gemma3 + Phi4 + Qwen2.5	100,00%	89,47%	98,04%	85,11%	93,75%	81,25%	92,48%

Example

There are two different formats for these exercises:

Example 1, where there are more source texts than target texts.
Example 2, where there are more target texts than source texts.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Subtask 2

Prompts

Zero-shot ICL using LLMs

Evaluation dataset

Metrics

Example

FilesExpand file tree

subtask2

Directory actions

More options

Directory actions

More options

Latest commit

History

subtask2

Folders and files

parent directory

README.md

Subtask 2

Prompts

Zero-shot ICL using LLMs

Evaluation dataset

Metrics

Example