diff --git a/data/xml/2015.jeptalnrecital.xml b/data/xml/2015.jeptalnrecital.xml
index c7b7447739..4d26eb7f0a 100644
--- a/data/xml/2015.jeptalnrecital.xml
+++ b/data/xml/2015.jeptalnrecital.xml
@@ -711,9 +711,9 @@
     <paper id="43">
       <title>Etiquetage morpho-syntaxique de tweets avec des <fixed-case>CRF</fixed-case></title>
       <author><first>Tian</first><last>Tian</last></author>
-      <author><first>Dinarelli</first><last>Marco</last></author>
-      <author><first>Tellier</first><last>Isabelle</last></author>
-      <author><first>Cardoso</first><last>Pedro</last></author>
+      <author><first>Marco</first><last>Dinarelli</last></author>
+      <author><first>Isabelle</first><last>Tellier</last></author>
+      <author><first>Pedro</first><last>Cardoso</last></author>
       <pages>291–297</pages>
       <abstract>Nous nous intéressons dans cet article à l’apprentissage automatique d’un étiqueteur mopho-syntaxique pour les tweets en anglais. Nous proposons tout d’abord un jeu d’étiquettes réduit avec 17 étiquettes différentes, qui permet d’obtenir de meilleures performances en exactitude par rapport au jeu d’étiquettes traditionnel qui contient 45 étiquettes. Comme nous disposons de peu de tweets étiquetés, nous essayons ensuite de compenser ce handicap en ajoutant dans l’ensemble d’apprentissage des données issues de textes bien formés. Les modèles mixtes obtenus permettent d’améliorer les résultats par rapport aux modèles appris avec un seul corpus, qu’il soit issu de Twitter ou de textes journalistiques.</abstract>
       <url hash="926a32a5">2015.jeptalnrecital-court.43</url>
diff --git a/data/xml/2016.jeptalnrecital.xml b/data/xml/2016.jeptalnrecital.xml
index cb92b74be8..d96d7c9a99 100644
--- a/data/xml/2016.jeptalnrecital.xml
+++ b/data/xml/2016.jeptalnrecital.xml
@@ -278,13 +278,13 @@
     </paper>
     <paper id="23">
       <title>La distinction entre les paraphasies phonétiques et phonologiques dans l’aphasie : Etude de cas de deux patients aphasiques (The distinction between phonetic and phonological paraphasias in aphasia: A multiple casestudy of aphasic patients)</title>
-      <language>fra</language>
       <author><first>Clémence</first><last>Verhaegen</last></author>
       <author><first>Véronique</first><last>Delvaux</last></author>
       <author><first>Kathy</first><last>Huet</last></author>
-      <author><first>Fagniart</first><last>Sophie</last></author>
+      <author><first>Sophie</first><last>Fagniart</last></author>
       <author><first>Myriam</first><last>Piccaluga</last></author>
       <author><first>Bernard</first><last>Harmegnies</last></author>
+      <language>fra</language>
       <pages>200–210</pages>
       <abstract>La spécificité phonologique ou phonétique des erreurs de production orale observées chez les patients aphasiques reste débattue. Cependant, la distinction entre ces deux types d’erreurs est fréquemment basée sur des analyses perceptives qui peuvent être influencées par le système perceptif de l’expérimentateur. Afin de pallier ce biais, nous avons réalisé des analyses acoustiques des productions de deux patients aphasiques, dans une tâche de répétition de non-mots. Nous nous sommes centrés sur l’analyse de consonnes occlusives. Les résultats ont montré la présence de difficultés de gestion du voisement chez les deux patients, indiquant la présence de troubles phonétiques. En outre, les résultats montrent une grande diversité des manifestations des troubles langagiers des patients ainsi que l’intervention potentielle de stratégies de compensation de leurs difficultés. L’intérêt de procéder à des analyses acoustiques précises utilisant des indices multiples est discuté.</abstract>
       <url hash="d6d71199">2016.jeptalnrecital-jep.23</url>
@@ -722,12 +722,12 @@
     </paper>
     <paper id="63">
       <title>Que disents nos silences? Apport des données acoustiques, articulatoires et physiologiques pour l’étude des pauses silencieuses (What do our silences say? Contribution of acoustic, articulatory and physiological data to the study on silent pauses)</title>
+      <author><first>Muriel</first><last>Lalain</last></author>
+      <author><first>Thierry</first><last>Legou</last></author>
+      <author><first>Camille</first><last>Fauth</last></author>
+      <author><first>Fabrice</first><last>Hirsch</last></author>
+      <author><first>Ivana</first><last>Didirkova</last></author>
       <language>fra</language>
-      <author><first>Lalain</first><last>Muriel</last></author>
-      <author><first>Legou</first><last>Thierry</last></author>
-      <author><first>Fauth</first><last>Camille</last></author>
-      <author><first>Hirsch</first><last>Fabrice</last></author>
-      <author><first>Didirkova</first><last>Ivana</last></author>
       <pages>563–570</pages>
       <abstract>Si la rhétorique s’est intéressée très tôt à la pause, il a fallu attendre le XXème siècle pour que d’autres disciplines – la psycholinguistique, le traitement automatique des langues, la phonétique – accordent à ces moments de silence l’intérêt qu’ils méritent. Il a ainsi été montré que ces ruptures dans le signal acoustique, loin de signer une absence d’activité, constituaient en réalité le lieu d’une activité physiologique (la respiration) et/ou cognitive (planification du discours) qui participent tout autant au message que la parole elle-même. Dans cette étude pilote, nous proposons des observations et des pistes de réflexions à partir de l’analyse des pauses silencieuses dans un corpus de parole lue et semi dirigée. Nous mettons notamment en évidence l’apport de l’analyse conjointe de données acoustiques, articulatoires (EMA) et physiologiques (respiratoires) pour l’identification, parmi les pauses silencieuses, des pauses respiratoires, syntaxiques et d’hésitation.</abstract>
       <url hash="8baa6e4f">2016.jeptalnrecital-jep.63</url>
@@ -756,15 +756,15 @@
     </paper>
     <paper id="66">
       <title>Quels tests d’intelligibilité pour évaluer les troubles de production de la parole ? (What kind of intelligibility test to assess speech production disorders?)</title>
-      <language>fra</language>
       <author><first>Alain</first><last>Ghio</last></author>
       <author><first>Laurence</first><last>Giusti</last></author>
       <author><first>Emilie</first><last>Blanc</last></author>
       <author><first>Serge</first><last>Pinto</last></author>
-      <author><first>Lalain</first><last>Muriel</last></author>
+      <author><first>Muriel</first><last>Lalain</last></author>
       <author><first>Danièle</first><last>Robert</last></author>
       <author><first>Corine</first><last>Fredouille</last></author>
       <author><first>Virginie</first><last>Woisard</last></author>
+      <language>fra</language>
       <pages>589–596</pages>
       <abstract>L’intelligibilité de la parole se définit comme le degré de précision avec lequel un message est compris par un auditeur. A ce titre, la perte d’intelligibilité représente souvent une plainte importante pour les patients atteints de troubles de production de la parole, puisqu’elle participe à la diminution de la qualité de vie au niveau communicationnel. Plusieurs outils existent actuellement pour évaluer l’intelligibilité mais aucun ne satisfait pleinement les contraintes cliniques. Dans une première étude, nous avons adapté au français la version 2 du Frenchay Dysarthria Assessment, un test reconnu dans le milieu anglo-saxon pour l’évaluation de locuteurs dysarthriques. Nous avons créé le corpus de mots français en nous appuyant sur les critères définis dans le FDA-2 puis nous avons testé le protocole sur une cinquantaine de locuteurs. Les résultats sont satisfaisants mais divers biais méthodologiques nous ont conduits à poursuivre notre démarche en proposant des listes de pseudo-mots apparentant le test à du décodage acoustico-phonétique.</abstract>
       <url hash="20146cec">2016.jeptalnrecital-jep.66</url>
diff --git a/data/xml/2020.gamnlp.xml b/data/xml/2020.gamnlp.xml
index 272055ea90..84ee0871d2 100644
--- a/data/xml/2020.gamnlp.xml
+++ b/data/xml/2020.gamnlp.xml
@@ -48,7 +48,7 @@
     <paper id="4">
       <title>Game Design Evaluation of <fixed-case>GWAP</fixed-case>s for Collecting Word Associations</title>
       <author><first>Mathieu</first><last>Lafourcade</last></author>
-      <author><first>Le Brun</first><last>Nathalie</last></author>
+      <author><first>Nathalie</first><last>Le Brun</last></author>
       <pages>26–33</pages>
       <abstract>GWAP design might have a tremendous effect on its popularity of course but also on the quality of the data collected. In this paper, a comparison is undertaken between two GWAPs for building term association lists, namely JeuxDeMots and Quicky Goose. After comparing both game designs, the Cohen kappa of associative lists in various configurations is computed in order to assess likeness and differences of the data they provide.</abstract>
       <url hash="9e10add5">2020.gamnlp-1.4</url>
diff --git a/data/xml/2020.lrec.xml b/data/xml/2020.lrec.xml
index bdc6c53de3..b253968941 100644
--- a/data/xml/2020.lrec.xml
+++ b/data/xml/2020.lrec.xml
@@ -5399,7 +5399,7 @@
     <paper id="439">
       <title>Neural Disambiguation of Lemma and Part of Speech in Morphologically Rich Languages</title>
       <author><first>José María</first><last>Hoya Quecedo</last></author>
-      <author><first>Koppatz</first><last>Maximilian</last></author>
+      <author><first>Maximilian W.</first><last>Koppatz</last></author>
       <author><first>Roman</first><last>Yangarber</last></author>
       <pages>3573–3582</pages>
       <abstract>We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages. We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a morphological analyser—with no manual disambiguation or data annotation. We assume that the morphological analyser produces multiple analyses for ambiguous words. The idea is to train recurrent neural networks on the output that the morphological analyser produces for unambiguous words. We present performance on POS and lemma disambiguation that reaches or surpasses the state of the art—including supervised models—using no manually annotated data. We evaluate the method on several morphologically rich languages.</abstract>
diff --git a/data/xml/2020.nlpcovid19.xml b/data/xml/2020.nlpcovid19.xml
index cc9bf4a325..519bcfdfa6 100644
--- a/data/xml/2020.nlpcovid19.xml
+++ b/data/xml/2020.nlpcovid19.xml
@@ -635,7 +635,7 @@
       <title><fixed-case>A</fixed-case>sk<fixed-case>M</fixed-case>e: A <fixed-case>LAPPS</fixed-case> <fixed-case>G</fixed-case>rid-based <fixed-case>NLP</fixed-case> Query and Retrieval System for Covid-19 Literature</title>
       <author><first>Keith</first><last>Suderman</last></author>
       <author><first>Nancy</first><last>Ide</last></author>
-      <author><first>Verhagen</first><last>Marc</last></author>
+      <author><first>Marc</first><last>Verhagen</last></author>
       <author><first>Brent</first><last>Cochran</last></author>
       <author><first>James</first><last>Pustejovsky</last></author>
       <abstract>In a recent project, the Language Application Grid was augmented to support the mining of scientific publications. The results of that ef- fort have now been repurposed to focus on Covid-19 literature, including modification of the LAPPS Grid “AskMe” query and retrieval engine. We describe the AskMe system and discuss its functionality as compared to other query engines available to search covid-related publications.</abstract>
diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml
index fad41c1c9f..a11edf5681 100644
--- a/data/xml/2021.acl.xml
+++ b/data/xml/2021.acl.xml
@@ -9803,7 +9803,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
     <paper id="136">
       <title><fixed-case>S</fixed-case>a<fixed-case>R</fixed-case>o<fixed-case>C</fixed-case>o: Detecting Satire in a Novel <fixed-case>R</fixed-case>omanian Corpus of News Articles</title>
       <author><first>Ana-Cristina</first><last>Rogoz</last></author>
-      <author><first>Gaman</first><last>Mihaela</last></author>
+      <author><first>Mihaela</first><last>Găman</last></author>
       <author><first>Radu Tudor</first><last>Ionescu</last></author>
       <pages>1073–1079</pages>
       <abstract>In this work, we introduce a corpus for satire detection in Romanian news. We gathered 55,608 public news articles from multiple real and satirical news sources, composing one of the largest corpora for satire detection regardless of language and the only one for the Romanian language. We provide an official split of the text samples, such that training news articles belong to different sources than test news articles, thus ensuring that models do not achieve high performance simply due to overfitting. We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus. Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.</abstract>
diff --git a/data/xml/2021.eacl.xml b/data/xml/2021.eacl.xml
index 42e6cb24a8..889938feb9 100644
--- a/data/xml/2021.eacl.xml
+++ b/data/xml/2021.eacl.xml
@@ -968,8 +968,8 @@
     </paper>
     <paper id="81">
       <title>Clustering Word Embeddings with Self-Organizing Maps. Application on <fixed-case>L</fixed-case>a<fixed-case>R</fixed-case>o<fixed-case>S</fixed-case>e<fixed-case>D</fixed-case>a - A Large <fixed-case>R</fixed-case>omanian Sentiment Data Set</title>
-      <author><first>Anca</first><last>Tache</last></author>
-      <author><first>Gaman</first><last>Mihaela</last></author>
+      <author><first>Anca Maria</first><last>Tache</last></author>
+      <author><first>Mihaela</first><last>Găman</last></author>
       <author><first>Radu Tudor</first><last>Ionescu</last></author>
       <pages>949–956</pages>
       <abstract>Romanian is one of the understudied languages in computational linguistics, with few resources available for the development of natural language processing tools. In this paper, we introduce LaRoSeDa, a Large Romanian Sentiment Data Set, which is composed of 15,000 positive and negative reviews collected from the largest Romanian e-commerce platform. We employ two sentiment classification methods as baselines for our new data set, one based on low-level features (character n-grams) and one based on high-level features (bag-of-word-embeddings generated by clustering word embeddings with k-means). As an additional contribution, we replace the k-means clustering algorithm with self-organizing maps (SOMs), obtaining better results because the generated clusters of word embeddings are closer to the Zipf’s law distribution, which is known to govern natural language. We also demonstrate the generalization capacity of using SOMs for the clustering of word embeddings on another recently-introduced Romanian data set, for text categorization by topic.</abstract>
diff --git a/data/xml/2021.vardial.xml b/data/xml/2021.vardial.xml
index 19319a8dc4..cbecce338a 100644
--- a/data/xml/2021.vardial.xml
+++ b/data/xml/2021.vardial.xml
@@ -22,7 +22,7 @@
     <paper id="1">
       <title>Findings of the <fixed-case>V</fixed-case>ar<fixed-case>D</fixed-case>ial Evaluation Campaign 2021</title>
       <author><first>Bharathi Raja</first><last>Chakravarthi</last></author>
-      <author><first>Gaman</first><last>Mihaela</last></author>
+      <author><first>Mihaela</first><last>Găman</last></author>
       <author><first>Radu Tudor</first><last>Ionescu</last></author>
       <author><first>Heidi</first><last>Jauhiainen</last></author>
       <author><first>Tommi</first><last>Jauhiainen</last></author>
@@ -121,7 +121,7 @@
     </paper>
     <paper id="10">
       <title><fixed-case>U</fixed-case>nibuc<fixed-case>K</fixed-case>ernel: Geolocating <fixed-case>S</fixed-case>wiss <fixed-case>G</fixed-case>erman Jodels Using Ensemble Learning</title>
-      <author><first>Gaman</first><last>Mihaela</last></author>
+      <author><first>Mihaela</first><last>Găman</last></author>
       <author><first>Sebastian</first><last>Cojocariu</last></author>
       <author><first>Radu Tudor</first><last>Ionescu</last></author>
       <pages>84–95</pages>
diff --git a/data/xml/2022.starsem.xml b/data/xml/2022.starsem.xml
index 0d50a9be88..c0a446c8b2 100644
--- a/data/xml/2022.starsem.xml
+++ b/data/xml/2022.starsem.xml
@@ -291,12 +291,12 @@
     </paper>
     <paper id="25">
       <title>Speech acts and Communicative Intentions for Urgency Detection</title>
-      <author><first>Laurenti</first><last>Enzo</last></author>
-      <author><first>Bourgon</first><last>Nils</last></author>
+      <author><first>Enzo</first><last>Laurenti</last></author>
+      <author><first>Nils</first><last>Bourgon</last></author>
       <author><first>Farah</first><last>Benamara</last></author>
-      <author><first>Mari</first><last>Alda</last></author>
+      <author><first>Alda</first><last>Mari</last></author>
       <author><first>Véronique</first><last>Moriceau</last></author>
-      <author><first>Courgeon</first><last>Camille</last></author>
+      <author><first>Camille</first><last>Courgeon</last></author>
       <pages>289-298</pages>
       <abstract>Recognizing speech acts (SA) is crucial for capturing meaning beyond what is said, making communicative intentions particularly relevant to identify urgent messages. This paper attempts to measure for the first time the impact of SA on urgency detection during crises,006in tweets. We propose a new dataset annotated for both urgency and SA, and develop several deep learning architectures to inject SA into urgency detection while ensuring models generalisability. Our results show that taking speech acts into account in tweet analysis improves information type detection in an out-of-type configuration where models are evaluated in unseen event types during training. These results are encouraging and constitute a first step towards SA-aware disaster management in social media.</abstract>
       <url hash="b167524b">2022.starsem-1.25</url>
diff --git a/data/xml/2023.acl.xml b/data/xml/2023.acl.xml
index 5e46db6700..e144df20b8 100644
--- a/data/xml/2023.acl.xml
+++ b/data/xml/2023.acl.xml
@@ -4360,10 +4360,10 @@
       <video href="2023.acl-long.307.mp4"/>
     </paper>
     <paper id="308">
-      <title>No clues good clues: out of context Lexical Relation Classification</title>
-      <author orcid="0000-0002-6734-8808"><first>Lucia</first><last>Pitarch</last><affiliation>University of Zaragoza</affiliation></author>
-      <author orcid="0000-0001-8531-353X"><first>Jordi</first><last>Bernad</last><affiliation>University of Zaragoza</affiliation></author>
-      <author orcid="0000-0002-9169-5287"><first>Lacramioara</first><last>Dranca</last><affiliation>Centro Universitario de la Defensa</affiliation></author>
+      <title>No clues, good clues: Out of context Lexical Relation Classification</title>
+      <author orcid="0000-0002-6734-8808"><first>Lucía</first><last>Pitarch</last><affiliation>University of Zaragoza</affiliation></author>
+      <author orcid="0000-0001-8531-353X"><first>Jorge</first><last>Bernad</last><affiliation>University of Zaragoza</affiliation></author>
+      <author orcid="0000-0002-9169-5287"><first>Licri</first><last>Dranca</last><affiliation>Centro Universitario de la Defensa</affiliation></author>
       <author orcid="0000-0003-4239-8785"><first>Carlos</first><last>Bobed Lisbona</last><affiliation>University of Zaragoza, Spain</affiliation></author>
       <author orcid="0000-0001-6452-7627"><first>Jorge</first><last>Gracia</last><affiliation>University of Zaragoza</affiliation></author>
       <pages>5607-5625</pages>
diff --git a/data/xml/2023.clinicalnlp.xml b/data/xml/2023.clinicalnlp.xml
index c27f41e010..ebb1df98e7 100644
--- a/data/xml/2023.clinicalnlp.xml
+++ b/data/xml/2023.clinicalnlp.xml
@@ -521,11 +521,11 @@
       <author><first>Robert</first><last>Tinn</last></author>
       <author><first>Sid</first><last>Kiblawi</last><affiliation>Microsoft</affiliation></author>
       <author><first>Yu</first><last>Gu</last><affiliation>Microsoft</affiliation></author>
-      <author><first>Akshay</first><last>Chaudhari</last><affiliation>Stanford University and Subtle Medical</affiliation></author>
+      <author><first>Akshay S.</first><last>Chaudhari</last><affiliation>Stanford University and Subtle Medical</affiliation></author>
       <author><first>Hoifung</first><last>Poon</last><affiliation>Microsoft</affiliation></author>
       <author><first>Sheng</first><last>Zhang</last><affiliation>Microsoft</affiliation></author>
       <author><first>Mu</first><last>Wei</last><affiliation>Microsoft</affiliation></author>
-      <author orcid="0000-0002-5017-6042"><first>J.</first><last>Preston</last></author>
+      <author orcid="0000-0002-5017-6042"><first>Joseph S.</first><last>Preston</last></author>
       <pages>373-384</pages>
       <abstract>Motivated by the scarcity of high-quality labeled biomedical text, as well as the success of data programming, we introduce KRISS-Search. By leveraging the Unified Medical Language Systems (UMLS) ontology, KRISS-Search addresses an interactive few-shot span recommendation task that we propose. We first introduce unsupervised KRISS-Search and show that our method outperforms existing methods in identifying spans that are semantically similar to a given span of interest, with &gt;50% AUPRC improvement relative to PubMedBERT. We then introduce supervised KRISS-Search, which leverages human interaction to improve the notion of similarity used by unsupervised KRISS-Search. Through simulated human feedback, we demonstrate an enhanced F1 score of 0.68 in classifying spans as semantically similar or different in the low-label setting, outperforming PubMedBERT by 2 F1 points. Finally, supervised KRISS-Search demonstrates competitive or superior performance compared to PubMedBERT in few-shot biomedical named entity recognition (NER) across five benchmark datasets, with an average improvement of 5.6 F1 points. We envision KRISS-Search increasing the efficiency of programmatic data labeling and also providing broader utility as an interactive biomedical search engine.</abstract>
       <url hash="a3998099">2023.clinicalnlp-1.40</url>
diff --git a/data/xml/2023.jeptalnrecital.xml b/data/xml/2023.jeptalnrecital.xml
index 74ccf5db56..dfb92a98b0 100644
--- a/data/xml/2023.jeptalnrecital.xml
+++ b/data/xml/2023.jeptalnrecital.xml
@@ -160,7 +160,7 @@
     <paper id="14">
       <title>Augmentation des modèles de langage français par graphes de connaissances pour la reconnaissance des entités biomédicales</title>
       <author><first>Aidan</first><last>Mannion</last></author>
-      <author><first>Schwab</first><last>Didier</last></author>
+      <author><first>Didier</first><last>Schwab</last></author>
       <author><first>Lorraine</first><last>Goeuriot</last></author>
       <author><first>Thierry</first><last>Chevalier</last></author>
       <pages>177–189</pages>
@@ -387,7 +387,7 @@ In NLP, the automatic detection of logical contradictions between statements is
     <paper id="5">
       <title>Les textes cliniques français générés sont-ils dangereusement similaires à leur source ? Analyse par plongements de phrases</title>
       <author><first>Nicolas</first><last>Hiebel</last></author>
-      <author><first>Ferret</first><last>Olivier</last></author>
+      <author><first>Olivier</first><last>Ferret</last></author>
       <author><first>Karën</first><last>Fort</last></author>
       <author><first>Aurélie</first><last>Névéol</last></author>
       <pages>46–54</pages>
@@ -808,7 +808,7 @@ In NLP, the automatic detection of logical contradictions between statements is
     <paper id="5">
       <title>Recherche cross-modale pour répondre à des questions visuelles</title>
       <author><first>Paul</first><last>Lerner</last></author>
-      <author><first>Ferret</first><last>Olivier</last></author>
+      <author><first>Olivier</first><last>Ferret</last></author>
       <author><first>Camille</first><last>Guinaudeau</last></author>
       <pages>74–92</pages>
       <abstract>Répondre à des questions visuelles à propos d’entités nommées (KVQAE) est une tâche difficile qui demande de rechercher des informations dans une base de connaissances multimodale. Nous étudions ici comment traiter cette tâche avec une recherche cross-modale et sa combinaison avec une recherche mono-modale, en se focalisant sur le modèle CLIP, un modèle multimodal entraîné sur des images appareillées à leur légende textuelle. Nos résultats démontrent la supériorité de la recherche cross-modale, mais aussi la complémentarité des deux, qui peuvent être combinées facilement. Nous étudions également différentes manières d’ajuster CLIP et trouvons que l’optimisation cross-modale est la meilleure solution, étant en adéquation avec son pré-entraînement. Notre méthode surpasse les approches précédentes, tout en étant plus simple et moins coûteuse. Ces gains de performance sont étudiés intrinsèquement selon la pertinence des résultats de la recherche et extrinsèquement selon l’exactitude de la réponse extraite par un module externe. Nous discutons des différences entre ces métriques et de ses implications pour l’évaluation de la KVQAE.</abstract>
diff --git a/data/xml/2023.jlcl.xml b/data/xml/2023.jlcl.xml
index 69c09b0c3e..68c261dd45 100644
--- a/data/xml/2023.jlcl.xml
+++ b/data/xml/2023.jlcl.xml
@@ -4,7 +4,7 @@
     <meta>
       <booktitle>Journal for Language Technology and Computational Linguistics, Vol. 36 No. 1</booktitle>
       <editor><first>Roman</first><last>Schneider</last></editor>
-      <editor><first>Faaß</first><last>Gertrud</last></editor>
+      <editor><first>Gertrud</first><last>Faaß</last></editor>
       <publisher>German Society for Computational Lingustics and Language Technology</publisher>
       <address>unknown</address>
       <month>May</month>
@@ -18,7 +18,7 @@
     <paper id="1">
       <title>Computerlinguistische Herausforderungen, empirische Erforschung &amp; multidisziplinäres Potenzial deutschsprachiger Songtexte</title>
       <author><first>Roman</first><last>Schneider</last></author>
-      <author><first>Faaß</first><last>Gertrud</last></author>
+      <author><first>Gertrud</first><last>Faaß</last></author>
       <pages>iii-v</pages>
       <url hash="d35cfcbc">2023.jlcl-1.1</url>
       <doi>10.21248/jlcl.36.2023.234</doi>
diff --git a/data/xml/2023.paclic.xml b/data/xml/2023.paclic.xml
index cd54dc6122..ca7e759175 100644
--- a/data/xml/2023.paclic.xml
+++ b/data/xml/2023.paclic.xml
@@ -419,13 +419,13 @@
     </paper>
     <paper id="43">
       <title>An empirical, corpus-based, approach to <fixed-case>C</fixed-case>antonese nominal expressions</title>
-      <author><first>Gr ̈¦goire</first><last>Winterstein</last></author>
+      <author><first>Grégoire</first><last>Winterstein</last></author>
       <author><first>David</first><last>Vergnaud</last></author>
-      <author><first>Hannah Hoi Tung</first><last>Yu</last></author>
-      <author><first>J ̈¦r ̈¦mie</first><last>Lupien</last></author>
-      <author><first>Laperle</first><last>Samuel</last></author>
-      <author><first>Pei Sui</first><last>Luk</last></author>
+      <author><first>Jérémie</first><last>Lupien</last></author>
+      <author><first>Samuel</first><last>Laperle</last></author>
+      <author><first>Hannah</first><last>Yu</last></author>
       <author><first>Christopher</first><last>Davis</last></author>
+      <author><first>Zoe Pei Sui</first><last>Luk</last></author>
       <pages>436–445</pages>
       <url hash="ab9a4d34">2023.paclic-1.43</url>
       <bibkey>winterstein-etal-2023-empirical</bibkey>
diff --git a/data/xml/2024.dravidianlangtech.xml b/data/xml/2024.dravidianlangtech.xml
index f3fabcb616..97c5ca33c3 100644
--- a/data/xml/2024.dravidianlangtech.xml
+++ b/data/xml/2024.dravidianlangtech.xml
@@ -60,10 +60,10 @@
     </paper>
     <paper id="4">
       <title>Social Media Fake News Classification Using Machine Learning Algorithm</title>
-      <author orcid="0000-0002-1872-4555"><first>Girma</first><last>Bade</last></author>
+      <author orcid="0000-0002-1872-4555"><first>Girma Yohannis</first><last>Bade</last></author>
       <author><first>Olga</first><last>Kolesnikova</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
       <author orcid="0000-0003-3901-3522"><first>Grigori</first><last>Sidorov</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
-      <author><first>José</first><last>Oropeza</last></author>
+      <author><first>José Luis</first><last>Oropeza</last></author>
       <pages>24-29</pages>
       <abstract>The rise of social media has facilitated easier communication, information sharing, and current affairs updates. However, the prevalence of misleading and deceptive content, commonly referred to as fake news, poses a significant challenge. This paper focuses on the classification of fake news in Malayalam, a Dravidian language, utilizing natural language processing (NLP) techniques. To develop a model, we employed a random forest machine learning method on a dataset provided by a shared task(DravidianLangTech@EACL 2024)1. When evaluated by the separate test dataset, our developed model achieved a 0.71 macro F1 measure.</abstract>
       <url hash="e50b3179">2024.dravidianlangtech-1.4</url>
@@ -350,7 +350,7 @@
     </paper>
     <paper id="26">
       <title>Habesha@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024: Detecting Fake News Detection in <fixed-case>D</fixed-case>ravidian Languages using Deep Learning</title>
-      <author orcid="0000-0003-1913-2612"><first>Mesay</first><last>Yigezu</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
+      <author orcid="0000-0003-1913-2612"><first>Mesay Gemeda</first><last>Yigezu</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
       <author><first>Olga</first><last>Kolesnikova</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
       <author orcid="0000-0003-3901-3522"><first>Grigori</first><last>Sidorov</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
       <author orcid="0000-0001-7845-9039"><first>Alexander</first><last>Gelbukh</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
@@ -543,10 +543,10 @@
     </paper>
     <paper id="40">
       <title>Social Media Hate and Offensive Speech Detection Using Machine Learning method</title>
-      <author orcid="0000-0002-1872-4555"><first>Girma</first><last>Bade</last></author>
+      <author orcid="0000-0002-1872-4555"><first>Girma Yohannis</first><last>Bade</last></author>
       <author><first>Olga</first><last>Kolesnikova</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
       <author orcid="0000-0003-3901-3522"><first>Grigori</first><last>Sidorov</last><affiliation>Instituto Politécnico Nacional</affiliation></author>
-      <author><first>José</first><last>Oropeza</last></author>
+      <author><first>José Luis</first><last>Oropeza</last></author>
       <pages>240-244</pages>
       <abstract>Even though the improper use of social media is increasing nowadays, there is also technology that brings solutions. Here, improperness is posting hate and offensive speech that might harm an individual or group. Hate speech refers to an insult toward an individual or group based on their identities. Spreading it on social media platforms is a serious problem for society. The solution, on the other hand, is the availability of natural language processing(NLP) technology that is capable to detect and handle such problems. This paper presents the detection of social media’s hate and offensive speech in the code-mixed Telugu language. For this, the task and golden standard dataset were provided for us by the shared task organizer (DravidianLangTech@ EACL 2024)1. To this end, we have employed the TF-IDF technique for numeric feature extraction and used a random forest algorithm for modeling hate speech detection. Finally, the developed model was evaluated on the test dataset and achieved 0.492 macro-F1.</abstract>
       <url hash="a8e33719">2024.dravidianlangtech-1.40</url>
diff --git a/data/xml/2024.emnlp.xml b/data/xml/2024.emnlp.xml
index 5ce13a9639..0692dd95fe 100644
--- a/data/xml/2024.emnlp.xml
+++ b/data/xml/2024.emnlp.xml
@@ -3296,7 +3296,7 @@
       <author><first>Tyler A.</first><last>Chang</last><affiliation>Google and University of California, San Diego</affiliation></author>
       <author orcid="0000-0003-0448-5415"><first>Catherine</first><last>Arnett</last></author>
       <author><first>Zhuowen</first><last>Tu</last><affiliation>University of California, San Diego</affiliation></author>
-      <author orcid="0000-0002-9395-9151"><first>Ben</first><last>Bergen</last><affiliation>University of California, San Diego</affiliation></author>
+      <author orcid="0000-0002-9395-9151"><first>Benjamin K.</first><last>Bergen</last><affiliation>University of California, San Diego</affiliation></author>
       <pages>4074-4096</pages>
       <abstract>Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we pre-train over 10,000 monolingual and multilingual language models for over 250 languages, including multiple language families that are under-studied in NLP. We assess how language modeling performance in each language varies as a function of (1) monolingual dataset size, (2) added multilingual dataset size, (3) linguistic similarity of the added languages, and (4) model size (up to 45M parameters). We find that in moderation, adding multilingual data improves low-resource language modeling performance, similar to increasing low-resource dataset sizes by up to 33%. Improvements depend on the syntactic similarity of the added multilingual data, with marginal additional effects of vocabulary overlap. However, high-resource languages consistently perform worse in multilingual pre-training scenarios. As dataset sizes increase, adding multilingual data begins to hurt performance for both low-resource and high-resource languages, likely due to limited model capacity (the “curse of multilinguality”). These results suggest that massively multilingual pre-training may not be optimal for any languages involved, but that more targeted models can significantly improve performance.</abstract>
       <url hash="ecad32f4">2024.emnlp-main.236</url>
@@ -11864,8 +11864,8 @@
       <author orcid="0009-0002-6872-557X"><first>Chuang</first><last>Wang</last></author>
       <author orcid="0009-0007-8044-5518"><first>Jian</first><last>Yao</last></author>
       <author orcid="0000-0002-5582-650X"><first>Li</first><last>Liu</last><affiliation>Jiangnan University</affiliation></author>
-      <author><first>Fang</first><last>Wei</last><affiliation>Jiangnan University</affiliation></author>
-      <author orcid="0000-0002-5701-1080"><first>Eddie Y.k.</first><last>Eddie</last></author>
+      <author><first>Wei</first><last>Fang</last><affiliation>Jiangnan University</affiliation></author>
+      <author orcid="0000-0002-5701-1080"><first>Eddie-Yin-Kwee</first><last>Ng</last></author>
       <pages>15257-15269</pages>
       <abstract>Knowledge graph completion (KGC) aims to infer missing or incomplete parts in knowledge graph. The existing models are generally divided into structure-based and description-based models, among description-based models often require longer training and inference times as well as increased memory usage. In this paper, we propose Pre-Encoded Masked Language Model (PEMLM) to efficiently solve KGC problem. By encoding textual descriptions into semantic representations before training, the necessary resources are significantly reduced. Furthermore, we introduce a straightforward but effective fusion framework to integrate structural embedding with pre-encoded semantic description, which enhances the model’s prediction performance on 1-N relations. The experimental results demonstrate that our proposed strategy attains state-of-the-art performance on the WN18RR (MRR+5.4% and Hits@1+6.4%) and UMLS datasets. Compared to existing models, we have increased inference speed by 30x and reduced training memory by approximately 60%.</abstract>
       <url hash="6eb7a1de">2024.emnlp-main.851</url>
diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml
index 7403e63112..c064827178 100644
--- a/data/xml/2024.findings.xml
+++ b/data/xml/2024.findings.xml
@@ -28660,7 +28660,7 @@ and high variation in performance on the subset, suggesting our plausibility cri
       <author><first>Karen Jia-Hui</first><last>Li</last><affiliation>Charles University Prague</affiliation></author>
       <author><first>Rafael</first><last>Sargsyan</last></author>
       <author orcid="0000-0003-3958-4704"><first>Vivek</first><last>Kumar</last><affiliation>University of the Bundeswehr Munich</affiliation></author>
-      <author><first>Diego</first><last>Reforgiato</last></author>
+      <author><first>Diego</first><last>Reforgiato Recupero</last></author>
       <author><first>Daniele</first><last>Riboni</last><affiliation>University of Cagliari</affiliation></author>
       <author orcid="0000-0002-1415-1702"><first>Ondrej</first><last>Dusek</last><affiliation>Charles University, Prague</affiliation></author>
       <pages>11519-11545</pages>
diff --git a/data/xml/2024.jeptalnrecital.xml b/data/xml/2024.jeptalnrecital.xml
index d7742b51ec..d56b5a5dae 100644
--- a/data/xml/2024.jeptalnrecital.xml
+++ b/data/xml/2024.jeptalnrecital.xml
@@ -1444,8 +1444,8 @@
       <title>Jargon : Une suite de modèles de langues et de référentiels d’évaluation pour les domaines spécialisés du français</title>
       <author><first>Vincent</first><last>Segonne</last></author>
       <author><first>Aidan</first><last>Mannion</last></author>
-      <author><first>Laura</first><last>Alonzo-Canul</last></author>
-      <author><first>Audibert</first><last>Alexandre</last></author>
+      <author><first>Laura Cristina</first><last>Alonzo Canul</last></author>
+      <author><first>Alexandre</first><last>Audibert</last></author>
       <author><first>Xingyu</first><last>Liu</last></author>
       <author><first>Cécile</first><last>Macaire</last></author>
       <author><first>Adrien</first><last>Pupier</last></author>
@@ -1455,7 +1455,7 @@
       <author><first>Magali</first><last>Norré</last></author>
       <author><first>Massih-Reza</first><last>Amini</last></author>
       <author><first>Pierrette</first><last>Bouillon</last></author>
-      <author><first>Iris</first><last>Eshkol Taravella</last></author>
+      <author><first>Iris</first><last>Eshkol-Taravella</last></author>
       <author><first>Emmanuelle</first><last>Esparança-Rodier</last></author>
       <author><first>Thomas</first><last>François</last></author>
       <author><first>Lorraine</first><last>Goeuriot</last></author>
diff --git a/data/xml/2024.naacl.xml b/data/xml/2024.naacl.xml
index 10e2b9e6f7..5a49bd2507 100644
--- a/data/xml/2024.naacl.xml
+++ b/data/xml/2024.naacl.xml
@@ -4094,8 +4094,8 @@
     </paper>
     <paper id="290">
       <title>Does <fixed-case>GPT</fixed-case>-4 pass the <fixed-case>T</fixed-case>uring test?</title>
-      <author><first>Cameron</first><last>Jones</last><affiliation>University of California, San Diego</affiliation></author>
-      <author><first>Ben</first><last>Bergen</last></author>
+      <author><first>Cameron R.</first><last>Jones</last><affiliation>University of California, San Diego</affiliation></author>
+      <author><first>Benjamin K.</first><last>Bergen</last></author>
       <pages>5183-5210</pages>
       <abstract>We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). Participants’ decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient to pass the Turing test. Participant knowledge about LLMs and number of games played positively correlated with accuracy in detecting AI, suggesting learning and practice as possible strategies to mitigate deception. Despite known limitations as a test of intelligence, we argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.</abstract>
       <url hash="24e4d738">2024.naacl-long.290</url>
diff --git a/data/xml/2025.dravidianlangtech.xml b/data/xml/2025.dravidianlangtech.xml
index f2493406fb..5ebc93ad7c 100644
--- a/data/xml/2025.dravidianlangtech.xml
+++ b/data/xml/2025.dravidianlangtech.xml
@@ -1528,14 +1528,14 @@
     </paper>
     <paper id="124">
       <title>Overview of the Shared Task on Sentiment Analysis in <fixed-case>T</fixed-case>amil and <fixed-case>T</fixed-case>ulu</title>
-      <author orcid="0000-0003-0681-6628"><first>Thenmozhi</first><last>Durairaj</last></author>
+      <author orcid="0000-0003-0681-6628"><first>Durairaj</first><last>Thenmozhi</last></author>
       <author orcid="0000-0002-4575-7934"><first>Bharathi Raja</first><last>Chakravarthi</last><affiliation>University of Galway</affiliation></author>
       <author><first>Asha</first><last>Hegde</last><affiliation>Mangalore University</affiliation></author>
       <author><first>Hosahalli Lakshmaiah</first><last>Shashirekha</last><affiliation>Mangalore University</affiliation></author>
       <author><first>Rajeswari</first><last>Natarajan</last></author>
       <author><first>Sajeetha</first><last>Thavareesan</last></author>
       <author orcid="0000-0002-5689-6470"><first>Ratnasingam</first><last>Sakuntharaj</last><affiliation>Eastern University of Sri Lanka</affiliation></author>
-      <author orcid="0000-0001-9466-8121"><first>Krishnakumari</first><last>K</last></author>
+      <author orcid="0000-0001-9466-8121"><first>Krishnakumari</first><last>Kalyanasundaram</last></author>
       <author><first>Charmathi</first><last>Rajkumar</last></author>
       <author orcid="0009-0004-2243-5176"><first>Poorvi</first><last>Shetty</last></author>
       <author><first>Harshitha S</first><last>Kumar</last></author>
diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml
index 0ec705378b..49ae1799b3 100644
--- a/data/xml/2025.findings.xml
+++ b/data/xml/2025.findings.xml
@@ -4619,7 +4619,7 @@
     <paper id="344">
       <title><fixed-case>M</fixed-case>-<fixed-case>IFE</fixed-case>val: Multilingual Instruction-Following Evaluation</title>
       <author><first>Antoine</first><last>Dussolle</last></author>
-      <author orcid="0009-0001-7628-5618"><first>A.</first><last>Cardeña</last></author>
+      <author orcid="0009-0001-7628-5618"><first>Andrea</first><last>Cardeña Díaz</last></author>
       <author><first>Shota</first><last>Sato</last><affiliation>Lightblue</affiliation></author>
       <author orcid="0000-0002-8083-320X"><first>Peter</first><last>Devine</last></author>
       <pages>6161-6176</pages>
diff --git a/data/xml/2025.naacl.xml b/data/xml/2025.naacl.xml
index f008d30822..c784e7715f 100644
--- a/data/xml/2025.naacl.xml
+++ b/data/xml/2025.naacl.xml
@@ -10781,7 +10781,7 @@
       <author orcid="0009-0002-7653-8459"><first>Daniil</first><last>Grebenkin</last></author>
       <author><first>Oleg</first><last>Sedukhin</last><affiliation>Siberian Neuronets LLC</affiliation></author>
       <author><first>Mikhail</first><last>Klementev</last></author>
-      <author orcid="0009-0008-5894-7041"><first>Derunets</first><last>Roman</last><affiliation>Novosibirsk State University</affiliation></author>
+      <author orcid="0009-0008-5894-7041"><first>Roman</first><last>Derunets</last><affiliation>Novosibirsk State University</affiliation></author>
       <author><first>Lyudmila</first><last>Budneva</last><affiliation>Novosibirsk State University</affiliation></author>
       <pages>988-997</pages>
       <abstract>This work presents a speech-to-text system “Pisets” for scientists and journalists which is based on a three-component architecture aimed at improving speech recognition accuracy while minimizing errors and hallucinations associated with the Whisper model. The architecture comprises primary recognition using Wav2Vec2, false positive filtering via the Audio Spectrogram Transformer (AST), and final speech recognition through Whisper. The implementation of curriculum learning methods and the utilization of diverse Russian-language speech corpora significantly enhanced the system’s effectiveness. Additionally, advanced uncertainty modeling techniques were introduced, contributing to further improvements in transcription quality. The proposed approaches ensure robust transcribing of long audio data across various acoustic conditions compared to WhisperX and the usual Whisper model. The source code of “Pisets” system is publicly available at GitHub: https://github.com/bond005/pisets.</abstract>
@@ -11400,7 +11400,7 @@
       <title>Streamlining <fixed-case>LLM</fixed-case>s: Adaptive Knowledge Distillation for Tailored Language Models</title>
       <author><first>Prajvi</first><last>Saxena</last><affiliation>German Research Center for AI</affiliation></author>
       <author><first>Sabine</first><last>Janzen</last></author>
-      <author orcid="0000-0003-4057-0924"><first>Wolfgang</first><last>Maass</last><affiliation>Universität des Saarlandes</affiliation></author>
+      <author orcid="0000-0003-4057-0924"><first>Wolfgang</first><last>Maaß</last><affiliation>Universität des Saarlandes</affiliation></author>
       <pages>448-455</pages>
       <abstract>Large language models (LLMs) like GPT-4 and LLaMA-3 offer transformative potential across industries, e.g., enhancing customer service, revolutionizing medical diagnostics, or identifying crises in news articles. However, deploying LLMs faces challenges such as limited training data, high computational costs, and issues with transparency and explainability. Our research focuses on distilling compact, parameter-efficient tailored language models (TLMs) from LLMs for domain-specific tasks with comparable performance. Current approaches like knowledge distillation, fine-tuning, and model parallelism address computational efficiency but lack hybrid strategies to balance efficiency, adaptability, and accuracy. We present ANON - an adaptive knowledge distillation framework integrating knowledge distillation with adapters to generate computationally efficient TLMs without relying on labeled datasets. ANON uses cross-entropy loss to transfer knowledge from the teacher’s outputs and internal representations while employing adaptive prompt engineering and a progressive distillation strategy for phased knowledge transfer. We evaluated ANON’s performance in the crisis domain, where accuracy is critical and labeled data is scarce. Experiments showed that ANON outperforms recent approaches of knowledge distillation, both in terms of the resulting TLM performance and in reducing the computational costs for training and maintaining accuracy compared to LLMs for domain-specific applications.</abstract>
       <url hash="c397bb83">2025.naacl-srw.43</url>
diff --git a/data/xml/2025.nllp.xml b/data/xml/2025.nllp.xml
index a6128aa655..db7760f4d4 100644
--- a/data/xml/2025.nllp.xml
+++ b/data/xml/2025.nllp.xml
@@ -399,9 +399,9 @@
     <paper id="32">
       <title>Extract-Explain-Abstract: A Rhetorical Role-Driven Domain-Specific Summarisation Framework for <fixed-case>I</fixed-case>ndian Legal Documents</title>
       <author><first>Veer</first><last>Chheda</last><affiliation>Dwarkadas J. Sanghvi College Of Engineering</affiliation></author>
-      <author><first>Aaditya Uday</first><last>Ghaisas</last></author>
+      <author><first>Aaditya</first><last>Ghaisas</last></author>
       <author><first>Avantika</first><last>Sankhe</last></author>
-      <author orcid="0000-0002-2507-4140"><first>Dr. Narendra</first><last>Shekokar</last><affiliation>Dwarkadas J. Sanghvi College Of Engineering, Dhirubhai Ambani Institute Of Information and Communication Technology</affiliation></author>
+      <author orcid="0000-0002-2507-4140"><first>Narendra</first><last>Shekokar</last><affiliation>Dwarkadas J. Sanghvi College Of Engineering, Dhirubhai Ambani Institute Of Information and Communication Technology</affiliation></author>
       <pages>439-455</pages>
       <abstract>Legal documents are characterized by theirlength, intricacy, and dense use of jargon, making efficacious summarisation both paramountand challenging. Existing zero-shot methodologies in small language models struggle tosimplify this jargon and are prone to punts andhallucinations with longer prompts. This paperintroduces the Rhetorical Role-based Extract-Explain-Abstract (EEA) Framework, a novelthree-stage methodology for summarisation ofIndian legal documents in low-resource settings. The approach begins by segmenting legaltexts using rhetorical roles, such as facts, issues and arguments, through a domain-specificphrase corpus and extraction based on TF-IDF.In the explanation stage, the segmented output is enriched with logical connections to ensure coherence and legal fidelity. The final abstraction phase condenses these interlinked segments into cogent, high-level summaries thatpreserve critical legal reasoning. Experimentson Indian legal datasets show that the EEAframework typically outperforms in ROUGE,BERTScore, Flesch Reading Ease, Age of Acquisition, SummaC and human evaluations. Wealso employ InLegalBERTScore as a metric tocapture domain specific semantics of Indianlegal documents.</abstract>
       <url hash="574db858">2025.nllp-1.32</url>
diff --git a/data/xml/2025.semeval.xml b/data/xml/2025.semeval.xml
index 41a69e5d43..d3f682ab9b 100644
--- a/data/xml/2025.semeval.xml
+++ b/data/xml/2025.semeval.xml
@@ -1977,13 +1977,13 @@
     </paper>
     <paper id="185">
       <title>Amado at <fixed-case>S</fixed-case>em<fixed-case>E</fixed-case>val-2025 Task 11: Multi-label Emotion Detection in <fixed-case>A</fixed-case>mharic and <fixed-case>E</fixed-case>nglish Data</title>
-      <author><first>Girma</first><last>Bade</last><affiliation>CIC,IPN,MX</affiliation></author>
+      <author><first>Girma Yohannis</first><last>Bade</last><affiliation>CIC,IPN,MX</affiliation></author>
       <author><first>Olga</first><last>Kolesnikova</last><affiliation>CIC,IPN,MX</affiliation></author>
-      <author><first>Jose</first><last>Oropeza</last><affiliation>CIC,IPN,MX</affiliation></author>
+      <author><first>José Luis</first><last>Oropeza</last><affiliation>CIC,IPN,MX</affiliation></author>
       <author><first>Grigori</first><last>Sidorov</last><affiliation>CIC,IPN,MX</affiliation></author>
-      <author><first>Mesay</first><last>Yigezu</last><affiliation>CIC,IPN,Mx</affiliation></author>
+      <author><first>Mesay Gemeda</first><last>Yigezu</last><affiliation>CIC,IPN,Mx</affiliation></author>
       <pages>1406-1410</pages>
-      <abstract>Amado at SemEval-2025 Task 11: Multi-label Emotion Detection inAmharic and English DataGirma Yohannis Bade, Olga Kolesnikova, José Luis OropezaGrigori Sidorov, Mesay Gemeda Yigezua(Centro de Investigaciones en Computación(CIC),Instituto Politécnico Nacional(IPN), Miguel Othon de Mendizabal,Ciudad de México, 07320, México.)</abstract>
+      <abstract>Recently, social media has become a platform for different human emotions. Although most existing works treat the user’s opinions into a single emotion, the reality is that one user can have more than one emotion at a time, representing multiple emotions at the same time. Multi-label emotion detection is a more advanced and realistic approach, as it acknowledges the complexity of human emotions and their overlapping nature. This paper presents multi-label emotion detection in Amharic and English data. The work is part of SemEval2025 shared task 11, where tasks and datasets are offered by task organizers. To accomplish the aim of the given task, we fine-tune transformers base BERT model, passing through all different workflow pipelines. On unseen test data, the model evaluation achieved 0.6300 and 0.7025 an average macro F1-score for Amharic and English, respectively.</abstract>
       <url hash="237babc9">2025.semeval-1.185</url>
       <bibkey>bade-etal-2025-amado</bibkey>
     </paper>
diff --git a/data/xml/L16.xml b/data/xml/L16.xml
index db5973c623..d6495d2adb 100644
--- a/data/xml/L16.xml
+++ b/data/xml/L16.xml
@@ -2745,8 +2745,8 @@
     <paper id="257">
       <title>Towards a Corpus of Violence Acts in <fixed-case>A</fixed-case>rabic Social Media</title>
       <author><first>Ayman</first><last>Alhelbawy</last></author>
-      <author><first>Poesio</first><last>Massimo</last></author>
       <author><first>Udo</first><last>Kruschwitz</last></author>
+      <author><first>Massimo</first><last>Poesio</last></author>
       <pages>1627–1631</pages>
       <abstract>In this paper we present a new corpus of Arabic tweets that mention some form of violent event, developed to support the automatic identification of Human Rights Abuse. The dataset was manually labelled for seven classes of violence using crowdsourcing.</abstract>
       <url hash="bf427158">L16-1257</url>
@@ -4541,7 +4541,7 @@
     <paper id="424">
       <title>A Gold Standard for Scalar Adjectives</title>
       <author><first>Bryan</first><last>Wilkinson</last></author>
-      <author><first>Oates</first><last>Tim</last></author>
+      <author><first>Tim</first><last>Oates</last></author>
       <pages>2669–2675</pages>
       <abstract>We present a gold standard for evaluating scale membership and the order of scalar adjectives. In addition to evaluating existing methods of ordering adjectives, this knowledge will aid in studying the organization of adjectives in the lexicon. This resource is the result of two elicitation tasks conducted with informants from Amazon Mechanical Turk. The first task is notable for gathering open-ended lexical data from informants. The data is analyzed using Cultural Consensus Theory, a framework from anthropology, to not only determine scale membership but also the level of consensus among the informants (Romney et al., 1986). The second task gathers a culturally salient ordering of the words determined to be members. We use this method to produce 12 scales of adjectives for use in evaluation.</abstract>
       <url hash="526008a9">L16-1424</url>
diff --git a/data/xml/P95.xml b/data/xml/P95.xml
index 785d16d4ba..807885a8b1 100644
--- a/data/xml/P95.xml
+++ b/data/xml/P95.xml
@@ -161,7 +161,7 @@
     <paper id="17">
       <title>Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies</title>
       <author><first>Chinatsu</first><last>Aone</last></author>
-      <author><first>Scott</first><last>William</last></author>
+      <author><first>Scott</first><last>William Bennett</last></author>
       <doi>10.3115/981658.981675</doi>
       <pages>122–129</pages>
       <url hash="172e0dd8">P95-1017</url>
diff --git a/data/xml/W16.xml b/data/xml/W16.xml
index 36ac270b7c..a63833255f 100644
--- a/data/xml/W16.xml
+++ b/data/xml/W16.xml
@@ -8577,7 +8577,7 @@
     </paper>
     <paper id="3">
       <title><fixed-case>C</fixed-case>o<fixed-case>C</fixed-case>o<fixed-case>G</fixed-case>en - Complexity Contour Generator: Automatic Assessment of Linguistic Complexity Using a Sliding-Window Technique</title>
-      <author><first>Ströbel</first><last>Marcus</last></author>
+      <author><first>Marcus</first><last>Ströbel</last></author>
       <author><first>Elma</first><last>Kerz</last></author>
       <author><first>Daniel</first><last>Wiechmann</last></author>
       <author><first>Stella</first><last>Neumann</last></author>
diff --git a/data/xml/W19.xml b/data/xml/W19.xml
index 20aecc075f..1e0b653276 100644
--- a/data/xml/W19.xml
+++ b/data/xml/W19.xml
@@ -8458,9 +8458,10 @@ One of the references was wrong therefore it is corrected to cite the appropriat
     <paper id="51">
       <title>Modeling language learning using specialized Elo rating</title>
       <author><first>Jue</first><last>Hou</last></author>
-      <author><first>Koppatz</first><last>Maximilian</last></author>
+      <author><first>Maximilian W.</first><last>Koppatz</last></author>
       <author><first>José María</first><last>Hoya Quecedo</last></author>
       <author><first>Nataliya</first><last>Stoyanova</last></author>
+      <author><first>Mikhail</first><last>Kopotev</last></author>
       <author><first>Roman</first><last>Yangarber</last></author>
       <pages>494–506</pages>
       <abstract>Automatic assessment of the proficiency levels of the learner is a critical part of Intelligent Tutoring Systems. We present methods for assessment in the context of language learning. We use a specialized Elo formula used in conjunction with educational data mining. We simultaneously obtain ratings for the proficiency of the learners and for the difficulty of the linguistic concepts that the learners are trying to master. From the same data we also learn a graph structure representing a domain model capturing the relations among the concepts. This application of Elo provides ratings for learners and concepts which correlate well with subjective proficiency levels of the learners and difficulty levels of the concepts.</abstract>
@@ -16513,9 +16514,9 @@ In this tutorial on MT and post-editing we would like to continue sharing the la
     </paper>
     <paper id="26">
       <title>Character-level Annotation for <fixed-case>C</fixed-case>hinese Surface-Syntactic <fixed-case>U</fixed-case>niversal <fixed-case>D</fixed-case>ependencies</title>
+      <author><first>Chuanming</first><last>Dong</last></author>
       <author><first>Yixuan</first><last>Li</last></author>
-      <author><first>Gerdes</first><last>Kim</last></author>
-      <author><first>Dong</first><last>Chuanming</last></author>
+      <author><first>Kim</first><last>Gerdes</last></author>
       <pages>216–226</pages>
       <url hash="03a8daaf">W19-7726</url>
       <doi>10.18653/v1/W19-7726</doi>