diff --git a/data/xml/2015.jeptalnrecital.xml b/data/xml/2015.jeptalnrecital.xml
index c7b7447739..4d26eb7f0a 100644
--- a/data/xml/2015.jeptalnrecital.xml
+++ b/data/xml/2015.jeptalnrecital.xml
@@ -711,9 +711,9 @@
Etiquetage morpho-syntaxique de tweets avec des CRF
TianTian
- DinarelliMarco
- TellierIsabelle
- CardosoPedro
+ MarcoDinarelli
+ IsabelleTellier
+ PedroCardoso
291–297
Nous nous intéressons dans cet article à l’apprentissage automatique d’un étiqueteur morpho-syntaxique pour les tweets en anglais. Nous proposons tout d’abord un jeu d’étiquettes réduit avec 17 étiquettes différentes, qui permet d’obtenir de meilleures performances en exactitude par rapport au jeu d’étiquettes traditionnel qui contient 45 étiquettes. Comme nous disposons de peu de tweets étiquetés, nous essayons ensuite de compenser ce handicap en ajoutant dans l’ensemble d’apprentissage des données issues de textes bien formés. Les modèles mixtes obtenus permettent d’améliorer les résultats par rapport aux modèles appris avec un seul corpus, qu’il soit issu de Twitter ou de textes journalistiques.
2015.jeptalnrecital-court.43
diff --git a/data/xml/2016.jeptalnrecital.xml b/data/xml/2016.jeptalnrecital.xml
index cb92b74be8..d96d7c9a99 100644
--- a/data/xml/2016.jeptalnrecital.xml
+++ b/data/xml/2016.jeptalnrecital.xml
@@ -278,13 +278,13 @@
La distinction entre les paraphasies phonétiques et phonologiques dans l’aphasie : Etude de cas de deux patients aphasiques (The distinction between phonetic and phonological paraphasias in aphasia: A multiple case study of aphasic patients)
- fra
ClémenceVerhaegen
VéroniqueDelvaux
KathyHuet
- FagniartSophie
+ SophieFagniart
MyriamPiccaluga
BernardHarmegnies
+ fra
200–210
La spécificité phonologique ou phonétique des erreurs de production orale observées chez les patients aphasiques reste débattue. Cependant, la distinction entre ces deux types d’erreurs est fréquemment basée sur des analyses perceptives qui peuvent être influencées par le système perceptif de l’expérimentateur. Afin de pallier ce biais, nous avons réalisé des analyses acoustiques des productions de deux patients aphasiques, dans une tâche de répétition de non-mots. Nous nous sommes centrés sur l’analyse de consonnes occlusives. Les résultats ont montré la présence de difficultés de gestion du voisement chez les deux patients, indiquant la présence de troubles phonétiques. En outre, les résultats montrent une grande diversité des manifestations des troubles langagiers des patients ainsi que l’intervention potentielle de stratégies de compensation de leurs difficultés. L’intérêt de procéder à des analyses acoustiques précises utilisant des indices multiples est discuté.
2016.jeptalnrecital-jep.23
@@ -722,12 +722,12 @@
Que disent nos silences? Apport des données acoustiques, articulatoires et physiologiques pour l’étude des pauses silencieuses (What do our silences say? Contribution of acoustic, articulatory and physiological data to the study of silent pauses)
+ MurielLalain
+ ThierryLegou
+ CamilleFauth
+ FabriceHirsch
+ IvanaDidirkova
fra
- LalainMuriel
- LegouThierry
- FauthCamille
- HirschFabrice
- DidirkovaIvana
563–570
Si la rhétorique s’est intéressée très tôt à la pause, il a fallu attendre le XXème siècle pour que d’autres disciplines – la psycholinguistique, le traitement automatique des langues, la phonétique – accordent à ces moments de silence l’intérêt qu’ils méritent. Il a ainsi été montré que ces ruptures dans le signal acoustique, loin de signer une absence d’activité, constituaient en réalité le lieu d’une activité physiologique (la respiration) et/ou cognitive (planification du discours) qui participent tout autant au message que la parole elle-même. Dans cette étude pilote, nous proposons des observations et des pistes de réflexions à partir de l’analyse des pauses silencieuses dans un corpus de parole lue et semi dirigée. Nous mettons notamment en évidence l’apport de l’analyse conjointe de données acoustiques, articulatoires (EMA) et physiologiques (respiratoires) pour l’identification, parmi les pauses silencieuses, des pauses respiratoires, syntaxiques et d’hésitation.
2016.jeptalnrecital-jep.63
@@ -756,15 +756,15 @@
Quels tests d’intelligibilité pour évaluer les troubles de production de la parole ? (What kind of intelligibility test to assess speech production disorders?)
- fra
AlainGhio
LaurenceGiusti
EmilieBlanc
SergePinto
- LalainMuriel
+ MurielLalain
DanièleRobert
CorineFredouille
VirginieWoisard
+ fra
589–596
L’intelligibilité de la parole se définit comme le degré de précision avec lequel un message est compris par un auditeur. A ce titre, la perte d’intelligibilité représente souvent une plainte importante pour les patients atteints de troubles de production de la parole, puisqu’elle participe à la diminution de la qualité de vie au niveau communicationnel. Plusieurs outils existent actuellement pour évaluer l’intelligibilité mais aucun ne satisfait pleinement les contraintes cliniques. Dans une première étude, nous avons adapté au français la version 2 du Frenchay Dysarthria Assessment, un test reconnu dans le milieu anglo-saxon pour l’évaluation de locuteurs dysarthriques. Nous avons créé le corpus de mots français en nous appuyant sur les critères définis dans le FDA-2 puis nous avons testé le protocole sur une cinquantaine de locuteurs. Les résultats sont satisfaisants mais divers biais méthodologiques nous ont conduits à poursuivre notre démarche en proposant des listes de pseudo-mots apparentant le test à du décodage acoustico-phonétique.
2016.jeptalnrecital-jep.66
diff --git a/data/xml/2020.gamnlp.xml b/data/xml/2020.gamnlp.xml
index 272055ea90..84ee0871d2 100644
--- a/data/xml/2020.gamnlp.xml
+++ b/data/xml/2020.gamnlp.xml
@@ -48,7 +48,7 @@
Game Design Evaluation of GWAPs for Collecting Word Associations
MathieuLafourcade
- Le BrunNathalie
+ NathalieLe Brun
26–33
GWAP design might have a tremendous effect on its popularity of course but also on the quality of the data collected. In this paper, a comparison is undertaken between two GWAPs for building term association lists, namely JeuxDeMots and Quicky Goose. After comparing both game designs, the Cohen kappa of associative lists in various configurations is computed in order to assess likeness and differences of the data they provide.
2020.gamnlp-1.4
diff --git a/data/xml/2020.lrec.xml b/data/xml/2020.lrec.xml
index bdc6c53de3..b253968941 100644
--- a/data/xml/2020.lrec.xml
+++ b/data/xml/2020.lrec.xml
@@ -5399,7 +5399,7 @@
Neural Disambiguation of Lemma and Part of Speech in Morphologically Rich Languages
José MaríaHoya Quecedo
- KoppatzMaximilian
+ Maximilian W.Koppatz
RomanYangarber
3573–3582
We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages. We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a morphological analyser—with no manual disambiguation or data annotation. We assume that the morphological analyser produces multiple analyses for ambiguous words. The idea is to train recurrent neural networks on the output that the morphological analyser produces for unambiguous words. We present performance on POS and lemma disambiguation that reaches or surpasses the state of the art—including supervised models—using no manually annotated data. We evaluate the method on several morphologically rich languages.
diff --git a/data/xml/2020.nlpcovid19.xml b/data/xml/2020.nlpcovid19.xml
index cc9bf4a325..519bcfdfa6 100644
--- a/data/xml/2020.nlpcovid19.xml
+++ b/data/xml/2020.nlpcovid19.xml
@@ -635,7 +635,7 @@
AskMe: A LAPPS Grid-based NLP Query and Retrieval System for Covid-19 Literature
KeithSuderman
NancyIde
- VerhagenMarc
+ MarcVerhagen
BrentCochran
JamesPustejovsky
In a recent project, the Language Application Grid was augmented to support the mining of scientific publications. The results of that effort have now been repurposed to focus on Covid-19 literature, including modification of the LAPPS Grid “AskMe” query and retrieval engine. We describe the AskMe system and discuss its functionality as compared to other query engines available to search covid-related publications.
diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml
index fad41c1c9f..a11edf5681 100644
--- a/data/xml/2021.acl.xml
+++ b/data/xml/2021.acl.xml
@@ -9803,7 +9803,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles
Ana-CristinaRogoz
- GamanMihaela
+ MihaelaGăman
Radu TudorIonescu
1073–1079
In this work, we introduce a corpus for satire detection in Romanian news. We gathered 55,608 public news articles from multiple real and satirical news sources, composing one of the largest corpora for satire detection regardless of language and the only one for the Romanian language. We provide an official split of the text samples, such that training news articles belong to different sources than test news articles, thus ensuring that models do not achieve high performance simply due to overfitting. We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus. Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.
diff --git a/data/xml/2021.eacl.xml b/data/xml/2021.eacl.xml
index 42e6cb24a8..889938feb9 100644
--- a/data/xml/2021.eacl.xml
+++ b/data/xml/2021.eacl.xml
@@ -968,8 +968,8 @@
Clustering Word Embeddings with Self-Organizing Maps. Application on LaRoSeDa - A Large Romanian Sentiment Data Set
- AncaTache
- GamanMihaela
+ Anca MariaTache
+ MihaelaGăman
Radu TudorIonescu
949–956
Romanian is one of the understudied languages in computational linguistics, with few resources available for the development of natural language processing tools. In this paper, we introduce LaRoSeDa, a Large Romanian Sentiment Data Set, which is composed of 15,000 positive and negative reviews collected from the largest Romanian e-commerce platform. We employ two sentiment classification methods as baselines for our new data set, one based on low-level features (character n-grams) and one based on high-level features (bag-of-word-embeddings generated by clustering word embeddings with k-means). As an additional contribution, we replace the k-means clustering algorithm with self-organizing maps (SOMs), obtaining better results because the generated clusters of word embeddings are closer to the Zipf’s law distribution, which is known to govern natural language. We also demonstrate the generalization capacity of using SOMs for the clustering of word embeddings on another recently-introduced Romanian data set, for text categorization by topic.
diff --git a/data/xml/2021.vardial.xml b/data/xml/2021.vardial.xml
index 19319a8dc4..cbecce338a 100644
--- a/data/xml/2021.vardial.xml
+++ b/data/xml/2021.vardial.xml
@@ -22,7 +22,7 @@
Findings of the VarDial Evaluation Campaign 2021
Bharathi RajaChakravarthi
- GamanMihaela
+ MihaelaGăman
Radu TudorIonescu
HeidiJauhiainen
TommiJauhiainen
@@ -121,7 +121,7 @@
UnibucKernel: Geolocating Swiss German Jodels Using Ensemble Learning
- GamanMihaela
+ MihaelaGăman
SebastianCojocariu
Radu TudorIonescu
84–95
diff --git a/data/xml/2022.starsem.xml b/data/xml/2022.starsem.xml
index 0d50a9be88..c0a446c8b2 100644
--- a/data/xml/2022.starsem.xml
+++ b/data/xml/2022.starsem.xml
@@ -291,12 +291,12 @@
Speech acts and Communicative Intentions for Urgency Detection
- LaurentiEnzo
- BourgonNils
+ EnzoLaurenti
+ NilsBourgon
FarahBenamara
- MariAlda
+ AldaMari
VéroniqueMoriceau
- CourgeonCamille
+ CamilleCourgeon
289-298
Recognizing speech acts (SA) is crucial for capturing meaning beyond what is said, making communicative intentions particularly relevant to identify urgent messages. This paper attempts to measure for the first time the impact of SA on urgency detection during crises in tweets. We propose a new dataset annotated for both urgency and SA, and develop several deep learning architectures to inject SA into urgency detection while ensuring models generalisability. Our results show that taking speech acts into account in tweet analysis improves information type detection in an out-of-type configuration where models are evaluated in unseen event types during training. These results are encouraging and constitute a first step towards SA-aware disaster management in social media.
2022.starsem-1.25
diff --git a/data/xml/2023.acl.xml b/data/xml/2023.acl.xml
index 5e46db6700..e144df20b8 100644
--- a/data/xml/2023.acl.xml
+++ b/data/xml/2023.acl.xml
@@ -4360,10 +4360,10 @@
- No clues good clues: out of context Lexical Relation Classification
- LuciaPitarchUniversity of Zaragoza
- JordiBernadUniversity of Zaragoza
- LacramioaraDrancaCentro Universitario de la Defensa
+ No clues, good clues: Out of context Lexical Relation Classification
+ LucíaPitarchUniversity of Zaragoza
+ JorgeBernadUniversity of Zaragoza
+ LicriDrancaCentro Universitario de la Defensa
CarlosBobed LisbonaUniversity of Zaragoza, Spain
JorgeGraciaUniversity of Zaragoza
5607-5625
diff --git a/data/xml/2023.clinicalnlp.xml b/data/xml/2023.clinicalnlp.xml
index c27f41e010..ebb1df98e7 100644
--- a/data/xml/2023.clinicalnlp.xml
+++ b/data/xml/2023.clinicalnlp.xml
@@ -521,11 +521,11 @@
RobertTinn
SidKiblawiMicrosoft
YuGuMicrosoft
- AkshayChaudhariStanford University and Subtle Medical
+ Akshay S.ChaudhariStanford University and Subtle Medical
HoifungPoonMicrosoft
ShengZhangMicrosoft
MuWeiMicrosoft
- J.Preston
+ Joseph S.Preston
373-384
Motivated by the scarcity of high-quality labeled biomedical text, as well as the success of data programming, we introduce KRISS-Search. By leveraging the Unified Medical Language Systems (UMLS) ontology, KRISS-Search addresses an interactive few-shot span recommendation task that we propose. We first introduce unsupervised KRISS-Search and show that our method outperforms existing methods in identifying spans that are semantically similar to a given span of interest, with >50% AUPRC improvement relative to PubMedBERT. We then introduce supervised KRISS-Search, which leverages human interaction to improve the notion of similarity used by unsupervised KRISS-Search. Through simulated human feedback, we demonstrate an enhanced F1 score of 0.68 in classifying spans as semantically similar or different in the low-label setting, outperforming PubMedBERT by 2 F1 points. Finally, supervised KRISS-Search demonstrates competitive or superior performance compared to PubMedBERT in few-shot biomedical named entity recognition (NER) across five benchmark datasets, with an average improvement of 5.6 F1 points. We envision KRISS-Search increasing the efficiency of programmatic data labeling and also providing broader utility as an interactive biomedical search engine.
2023.clinicalnlp-1.40
diff --git a/data/xml/2023.jeptalnrecital.xml b/data/xml/2023.jeptalnrecital.xml
index 74ccf5db56..dfb92a98b0 100644
--- a/data/xml/2023.jeptalnrecital.xml
+++ b/data/xml/2023.jeptalnrecital.xml
@@ -160,7 +160,7 @@
Augmentation des modèles de langage français par graphes de connaissances pour la reconnaissance des entités biomédicales
AidanMannion
- SchwabDidier
+ DidierSchwab
LorraineGoeuriot
ThierryChevalier
177–189
@@ -387,7 +387,7 @@ In NLP, the automatic detection of logical contradictions between statements is
Les textes cliniques français générés sont-ils dangereusement similaires à leur source ? Analyse par plongements de phrases
NicolasHiebel
- FerretOlivier
+ OlivierFerret
KarënFort
AurélieNévéol
46–54
@@ -808,7 +808,7 @@ In NLP, the automatic detection of logical contradictions between statements is
Recherche cross-modale pour répondre à des questions visuelles
PaulLerner
- FerretOlivier
+ OlivierFerret
CamilleGuinaudeau
74–92
Répondre à des questions visuelles à propos d’entités nommées (KVQAE) est une tâche difficile qui demande de rechercher des informations dans une base de connaissances multimodale. Nous étudions ici comment traiter cette tâche avec une recherche cross-modale et sa combinaison avec une recherche mono-modale, en se focalisant sur le modèle CLIP, un modèle multimodal entraîné sur des images appareillées à leur légende textuelle. Nos résultats démontrent la supériorité de la recherche cross-modale, mais aussi la complémentarité des deux, qui peuvent être combinées facilement. Nous étudions également différentes manières d’ajuster CLIP et trouvons que l’optimisation cross-modale est la meilleure solution, étant en adéquation avec son pré-entraînement. Notre méthode surpasse les approches précédentes, tout en étant plus simple et moins coûteuse. Ces gains de performance sont étudiés intrinsèquement selon la pertinence des résultats de la recherche et extrinsèquement selon l’exactitude de la réponse extraite par un module externe. Nous discutons des différences entre ces métriques et de ses implications pour l’évaluation de la KVQAE.
diff --git a/data/xml/2023.jlcl.xml b/data/xml/2023.jlcl.xml
index 69c09b0c3e..68c261dd45 100644
--- a/data/xml/2023.jlcl.xml
+++ b/data/xml/2023.jlcl.xml
@@ -4,7 +4,7 @@
Journal for Language Technology and Computational Linguistics, Vol. 36 No. 1
RomanSchneider
- FaaßGertrud
+ GertrudFaaß
German Society for Computational Lingustics and Language Technology
unknown
May
@@ -18,7 +18,7 @@
Computerlinguistische Herausforderungen, empirische Erforschung & multidisziplinäres Potenzial deutschsprachiger Songtexte
RomanSchneider
- FaaßGertrud
+ GertrudFaaß
iii-v
2023.jlcl-1.1
10.21248/jlcl.36.2023.234
diff --git a/data/xml/2023.paclic.xml b/data/xml/2023.paclic.xml
index cd54dc6122..ca7e759175 100644
--- a/data/xml/2023.paclic.xml
+++ b/data/xml/2023.paclic.xml
@@ -419,13 +419,13 @@
An empirical, corpus-based, approach to Cantonese nominal expressions
- Gr ̈¦goireWinterstein
+ GrégoireWinterstein
DavidVergnaud
- Hannah Hoi TungYu
- J ̈¦r ̈¦mieLupien
- LaperleSamuel
- Pei SuiLuk
+ JérémieLupien
+ SamuelLaperle
+ HannahYu
ChristopherDavis
+ Zoe Pei SuiLuk
436–445
2023.paclic-1.43
winterstein-etal-2023-empirical
diff --git a/data/xml/2024.dravidianlangtech.xml b/data/xml/2024.dravidianlangtech.xml
index f3fabcb616..97c5ca33c3 100644
--- a/data/xml/2024.dravidianlangtech.xml
+++ b/data/xml/2024.dravidianlangtech.xml
@@ -60,10 +60,10 @@
Social Media Fake News Classification Using Machine Learning Algorithm
- GirmaBade
+ Girma YohannisBade
OlgaKolesnikovaInstituto Politécnico Nacional
GrigoriSidorovInstituto Politécnico Nacional
- JoséOropeza
+ José LuisOropeza
24-29
The rise of social media has facilitated easier communication, information sharing, and current affairs updates. However, the prevalence of misleading and deceptive content, commonly referred to as fake news, poses a significant challenge. This paper focuses on the classification of fake news in Malayalam, a Dravidian language, utilizing natural language processing (NLP) techniques. To develop a model, we employed a random forest machine learning method on a dataset provided by a shared task(DravidianLangTech@EACL 2024)1. When evaluated by the separate test dataset, our developed model achieved a 0.71 macro F1 measure.
2024.dravidianlangtech-1.4
@@ -350,7 +350,7 @@
Habesha@DravidianLangTech 2024: Detecting Fake News Detection in Dravidian Languages using Deep Learning
- MesayYigezuInstituto Politécnico Nacional
+ Mesay GemedaYigezuInstituto Politécnico Nacional
OlgaKolesnikovaInstituto Politécnico Nacional
GrigoriSidorovInstituto Politécnico Nacional
AlexanderGelbukhInstituto Politécnico Nacional
@@ -543,10 +543,10 @@
Social Media Hate and Offensive Speech Detection Using Machine Learning method
- GirmaBade
+ Girma YohannisBade
OlgaKolesnikovaInstituto Politécnico Nacional
GrigoriSidorovInstituto Politécnico Nacional
- JoséOropeza
+ José LuisOropeza
240-244
Even though the improper use of social media is increasing nowadays, there is also technology that brings solutions. Here, improperness is posting hate and offensive speech that might harm an individual or group. Hate speech refers to an insult toward an individual or group based on their identities. Spreading it on social media platforms is a serious problem for society. The solution, on the other hand, is the availability of natural language processing(NLP) technology that is capable to detect and handle such problems. This paper presents the detection of social media’s hate and offensive speech in the code-mixed Telugu language. For this, the task and golden standard dataset were provided for us by the shared task organizer (DravidianLangTech@ EACL 2024)1. To this end, we have employed the TF-IDF technique for numeric feature extraction and used a random forest algorithm for modeling hate speech detection. Finally, the developed model was evaluated on the test dataset and achieved 0.492 macro-F1.
2024.dravidianlangtech-1.40
diff --git a/data/xml/2024.emnlp.xml b/data/xml/2024.emnlp.xml
index 5ce13a9639..0692dd95fe 100644
--- a/data/xml/2024.emnlp.xml
+++ b/data/xml/2024.emnlp.xml
@@ -3296,7 +3296,7 @@
Tyler A.ChangGoogle and University of California, San Diego
CatherineArnett
ZhuowenTuUniversity of California, San Diego
- BenBergenUniversity of California, San Diego
+ Benjamin K.BergenUniversity of California, San Diego
4074-4096
Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we pre-train over 10,000 monolingual and multilingual language models for over 250 languages, including multiple language families that are under-studied in NLP. We assess how language modeling performance in each language varies as a function of (1) monolingual dataset size, (2) added multilingual dataset size, (3) linguistic similarity of the added languages, and (4) model size (up to 45M parameters). We find that in moderation, adding multilingual data improves low-resource language modeling performance, similar to increasing low-resource dataset sizes by up to 33%. Improvements depend on the syntactic similarity of the added multilingual data, with marginal additional effects of vocabulary overlap. However, high-resource languages consistently perform worse in multilingual pre-training scenarios. As dataset sizes increase, adding multilingual data begins to hurt performance for both low-resource and high-resource languages, likely due to limited model capacity (the “curse of multilinguality”). These results suggest that massively multilingual pre-training may not be optimal for any languages involved, but that more targeted models can significantly improve performance.
2024.emnlp-main.236
@@ -11864,8 +11864,8 @@
ChuangWang
JianYao
LiLiuJiangnan University
- FangWeiJiangnan University
- Eddie Y.k.Eddie
+ WeiFangJiangnan University
+ Eddie-Yin-KweeNg
15257-15269
Knowledge graph completion (KGC) aims to infer missing or incomplete parts in knowledge graph. The existing models are generally divided into structure-based and description-based models, among description-based models often require longer training and inference times as well as increased memory usage. In this paper, we propose Pre-Encoded Masked Language Model (PEMLM) to efficiently solve KGC problem. By encoding textual descriptions into semantic representations before training, the necessary resources are significantly reduced. Furthermore, we introduce a straightforward but effective fusion framework to integrate structural embedding with pre-encoded semantic description, which enhances the model’s prediction performance on 1-N relations. The experimental results demonstrate that our proposed strategy attains state-of-the-art performance on the WN18RR (MRR+5.4% and Hits@1+6.4%) and UMLS datasets. Compared to existing models, we have increased inference speed by 30x and reduced training memory by approximately 60%.
2024.emnlp-main.851
diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml
index 7403e63112..c064827178 100644
--- a/data/xml/2024.findings.xml
+++ b/data/xml/2024.findings.xml
@@ -28660,7 +28660,7 @@ and high variation in performance on the subset, suggesting our plausibility cri
Karen Jia-HuiLiCharles University Prague
RafaelSargsyan
VivekKumarUniversity of the Bundeswehr Munich
- DiegoReforgiato
+ DiegoReforgiato Recupero
DanieleRiboniUniversity of Cagliari
OndrejDusekCharles University, Prague
11519-11545
diff --git a/data/xml/2024.jeptalnrecital.xml b/data/xml/2024.jeptalnrecital.xml
index d7742b51ec..d56b5a5dae 100644
--- a/data/xml/2024.jeptalnrecital.xml
+++ b/data/xml/2024.jeptalnrecital.xml
@@ -1444,8 +1444,8 @@
Jargon : Une suite de modèles de langues et de référentiels d’évaluation pour les domaines spécialisés du français
VincentSegonne
AidanMannion
- LauraAlonzo-Canul
- AudibertAlexandre
+ Laura CristinaAlonzo Canul
+ AlexandreAudibert
XingyuLiu
CécileMacaire
AdrienPupier
@@ -1455,7 +1455,7 @@
MagaliNorré
Massih-RezaAmini
PierretteBouillon
- IrisEshkol Taravella
+ IrisEshkol-Taravella
EmmanuelleEsparança-Rodier
ThomasFrançois
LorraineGoeuriot
diff --git a/data/xml/2024.naacl.xml b/data/xml/2024.naacl.xml
index 10e2b9e6f7..5a49bd2507 100644
--- a/data/xml/2024.naacl.xml
+++ b/data/xml/2024.naacl.xml
@@ -4094,8 +4094,8 @@
Does GPT-4 pass the Turing test?
- CameronJonesUniversity of California, San Diego
- BenBergen
+ Cameron R.JonesUniversity of California, San Diego
+ Benjamin K.Bergen
5183-5210
We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). Participants’ decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient to pass the Turing test. Participant knowledge about LLMs and number of games played positively correlated with accuracy in detecting AI, suggesting learning and practice as possible strategies to mitigate deception. Despite known limitations as a test of intelligence, we argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.
2024.naacl-long.290
diff --git a/data/xml/2025.dravidianlangtech.xml b/data/xml/2025.dravidianlangtech.xml
index f2493406fb..5ebc93ad7c 100644
--- a/data/xml/2025.dravidianlangtech.xml
+++ b/data/xml/2025.dravidianlangtech.xml
@@ -1528,14 +1528,14 @@
Overview of the Shared Task on Sentiment Analysis in Tamil and Tulu
- ThenmozhiDurairaj
+ DurairajThenmozhi
Bharathi RajaChakravarthiUniversity of Galway
AshaHegdeMangalore University
Hosahalli LakshmaiahShashirekhaMangalore University
RajeswariNatarajan
SajeethaThavareesan
RatnasingamSakuntharajEastern University of Sri Lanka
- KrishnakumariK
+ KrishnakumariKalyanasundaram
CharmathiRajkumar
PoorviShetty
Harshitha SKumar
diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml
index 0ec705378b..49ae1799b3 100644
--- a/data/xml/2025.findings.xml
+++ b/data/xml/2025.findings.xml
@@ -4619,7 +4619,7 @@
M-IFEval: Multilingual Instruction-Following Evaluation
AntoineDussolle
- A.Cardeña
+ AndreaCardeña Díaz
ShotaSatoLightblue
PeterDevine
6161-6176
diff --git a/data/xml/2025.naacl.xml b/data/xml/2025.naacl.xml
index f008d30822..c784e7715f 100644
--- a/data/xml/2025.naacl.xml
+++ b/data/xml/2025.naacl.xml
@@ -10781,7 +10781,7 @@
DaniilGrebenkin
OlegSedukhinSiberian Neuronets LLC
MikhailKlementev
- DerunetsRomanNovosibirsk State University
+ RomanDerunetsNovosibirsk State University
LyudmilaBudnevaNovosibirsk State University
988-997
This work presents a speech-to-text system “Pisets” for scientists and journalists which is based on a three-component architecture aimed at improving speech recognition accuracy while minimizing errors and hallucinations associated with the Whisper model. The architecture comprises primary recognition using Wav2Vec2, false positive filtering via the Audio Spectrogram Transformer (AST), and final speech recognition through Whisper. The implementation of curriculum learning methods and the utilization of diverse Russian-language speech corpora significantly enhanced the system’s effectiveness. Additionally, advanced uncertainty modeling techniques were introduced, contributing to further improvements in transcription quality. The proposed approaches ensure robust transcribing of long audio data across various acoustic conditions compared to WhisperX and the usual Whisper model. The source code of “Pisets” system is publicly available at GitHub: https://github.com/bond005/pisets.
@@ -11400,7 +11400,7 @@
Streamlining LLMs: Adaptive Knowledge Distillation for Tailored Language Models
PrajviSaxenaGerman Research Center for AI
SabineJanzen
- WolfgangMaassUniversität des Saarlandes
+ WolfgangMaaßUniversität des Saarlandes
448-455
Large language models (LLMs) like GPT-4 and LLaMA-3 offer transformative potential across industries, e.g., enhancing customer service, revolutionizing medical diagnostics, or identifying crises in news articles. However, deploying LLMs faces challenges such as limited training data, high computational costs, and issues with transparency and explainability. Our research focuses on distilling compact, parameter-efficient tailored language models (TLMs) from LLMs for domain-specific tasks with comparable performance. Current approaches like knowledge distillation, fine-tuning, and model parallelism address computational efficiency but lack hybrid strategies to balance efficiency, adaptability, and accuracy. We present ANON - an adaptive knowledge distillation framework integrating knowledge distillation with adapters to generate computationally efficient TLMs without relying on labeled datasets. ANON uses cross-entropy loss to transfer knowledge from the teacher’s outputs and internal representations while employing adaptive prompt engineering and a progressive distillation strategy for phased knowledge transfer. We evaluated ANON’s performance in the crisis domain, where accuracy is critical and labeled data is scarce. Experiments showed that ANON outperforms recent approaches of knowledge distillation, both in terms of the resulting TLM performance and in reducing the computational costs for training and maintaining accuracy compared to LLMs for domain-specific applications.
2025.naacl-srw.43
diff --git a/data/xml/2025.nllp.xml b/data/xml/2025.nllp.xml
index a6128aa655..db7760f4d4 100644
--- a/data/xml/2025.nllp.xml
+++ b/data/xml/2025.nllp.xml
@@ -399,9 +399,9 @@
Extract-Explain-Abstract: A Rhetorical Role-Driven Domain-Specific Summarisation Framework for Indian Legal Documents
VeerChhedaDwarkadas J. Sanghvi College Of Engineering
- Aaditya UdayGhaisas
+ AadityaGhaisas
AvantikaSankhe
- Dr. NarendraShekokarDwarkadas J. Sanghvi College Of Engineering, Dhirubhai Ambani Institute Of Information and Communication Technology
+ NarendraShekokarDwarkadas J. Sanghvi College Of Engineering, Dhirubhai Ambani Institute Of Information and Communication Technology
439-455
Legal documents are characterized by their length, intricacy, and dense use of jargon, making efficacious summarisation both paramount and challenging. Existing zero-shot methodologies in small language models struggle to simplify this jargon and are prone to punts and hallucinations with longer prompts. This paper introduces the Rhetorical Role-based Extract-Explain-Abstract (EEA) Framework, a novel three-stage methodology for summarisation of Indian legal documents in low-resource settings. The approach begins by segmenting legal texts using rhetorical roles, such as facts, issues and arguments, through a domain-specific phrase corpus and extraction based on TF-IDF. In the explanation stage, the segmented output is enriched with logical connections to ensure coherence and legal fidelity. The final abstraction phase condenses these interlinked segments into cogent, high-level summaries that preserve critical legal reasoning. Experiments on Indian legal datasets show that the EEA framework typically outperforms in ROUGE, BERTScore, Flesch Reading Ease, Age of Acquisition, SummaC and human evaluations. We also employ InLegalBERTScore as a metric to capture domain-specific semantics of Indian legal documents.
2025.nllp-1.32
diff --git a/data/xml/2025.semeval.xml b/data/xml/2025.semeval.xml
index 41a69e5d43..d3f682ab9b 100644
--- a/data/xml/2025.semeval.xml
+++ b/data/xml/2025.semeval.xml
@@ -1977,13 +1977,13 @@
Amado at SemEval-2025 Task 11: Multi-label Emotion Detection in Amharic and English Data
- GirmaBadeCIC,IPN,MX
+ Girma YohannisBadeCIC,IPN,MX
OlgaKolesnikovaCIC,IPN,MX
- JoseOropezaCIC,IPN,MX
+ José LuisOropezaCIC,IPN,MX
GrigoriSidorovCIC,IPN,MX
- MesayYigezuCIC,IPN,Mx
+ Mesay GemedaYigezuCIC,IPN,Mx
1406-1410
- Amado at SemEval-2025 Task 11: Multi-label Emotion Detection inAmharic and English DataGirma Yohannis Bade, Olga Kolesnikova, José Luis OropezaGrigori Sidorov, Mesay Gemeda Yigezua(Centro de Investigaciones en Computación(CIC),Instituto Politécnico Nacional(IPN), Miguel Othon de Mendizabal,Ciudad de México, 07320, México.)
+ Recently, social media has become a platform for expressing a wide range of human emotions. Although most existing works map a user’s opinion to a single emotion, in reality one user can express more than one emotion at a time. Multi-label emotion detection is a more advanced and realistic approach, as it acknowledges the complexity of human emotions and their overlapping nature. This paper presents multi-label emotion detection on Amharic and English data. The work is part of the SemEval-2025 shared task 11, whose tasks and datasets are provided by the task organizers. To accomplish the given task, we fine-tune a transformer-based BERT model, passing through all the different workflow pipelines. On unseen test data, the model achieved average macro F1-scores of 0.6300 and 0.7025 for Amharic and English, respectively.
2025.semeval-1.185
bade-etal-2025-amado
diff --git a/data/xml/L16.xml b/data/xml/L16.xml
index db5973c623..d6495d2adb 100644
--- a/data/xml/L16.xml
+++ b/data/xml/L16.xml
@@ -2745,8 +2745,8 @@
Towards a Corpus of Violence Acts in Arabic Social Media
AymanAlhelbawy
- PoesioMassimo
UdoKruschwitz
+ MassimoPoesio
1627–1631
In this paper we present a new corpus of Arabic tweets that mention some form of violent event, developed to support the automatic identification of Human Rights Abuse. The dataset was manually labelled for seven classes of violence using crowdsourcing.
L16-1257
@@ -4541,7 +4541,7 @@
A Gold Standard for Scalar Adjectives
BryanWilkinson
- OatesTim
+ TimOates
2669–2675
We present a gold standard for evaluating scale membership and the order of scalar adjectives. In addition to evaluating existing methods of ordering adjectives, this knowledge will aid in studying the organization of adjectives in the lexicon. This resource is the result of two elicitation tasks conducted with informants from Amazon Mechanical Turk. The first task is notable for gathering open-ended lexical data from informants. The data is analyzed using Cultural Consensus Theory, a framework from anthropology, to not only determine scale membership but also the level of consensus among the informants (Romney et al., 1986). The second task gathers a culturally salient ordering of the words determined to be members. We use this method to produce 12 scales of adjectives for use in evaluation.
L16-1424
diff --git a/data/xml/P95.xml b/data/xml/P95.xml
index 785d16d4ba..807885a8b1 100644
--- a/data/xml/P95.xml
+++ b/data/xml/P95.xml
@@ -161,7 +161,7 @@
Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies
ChinatsuAone
- ScottWilliam
+ ScottWilliam Bennett
10.3115/981658.981675
122–129
P95-1017
diff --git a/data/xml/W16.xml b/data/xml/W16.xml
index 36ac270b7c..a63833255f 100644
--- a/data/xml/W16.xml
+++ b/data/xml/W16.xml
@@ -8577,7 +8577,7 @@
CoCoGen - Complexity Contour Generator: Automatic Assessment of Linguistic Complexity Using a Sliding-Window Technique
- StröbelMarcus
+ MarcusStröbel
ElmaKerz
DanielWiechmann
StellaNeumann
diff --git a/data/xml/W19.xml b/data/xml/W19.xml
index 20aecc075f..1e0b653276 100644
--- a/data/xml/W19.xml
+++ b/data/xml/W19.xml
@@ -8458,9 +8458,10 @@ One of the references was wrong therefore it is corrected to cite the appropriat
Modeling language learning using specialized Elo rating
JueHou
- KoppatzMaximilian
+ Maximilian W.Koppatz
José MaríaHoya Quecedo
NataliyaStoyanova
+ MikhailKopotev
RomanYangarber
494–506
Automatic assessment of the proficiency levels of the learner is a critical part of Intelligent Tutoring Systems. We present methods for assessment in the context of language learning. We use a specialized Elo formula used in conjunction with educational data mining. We simultaneously obtain ratings for the proficiency of the learners and for the difficulty of the linguistic concepts that the learners are trying to master. From the same data we also learn a graph structure representing a domain model capturing the relations among the concepts. This application of Elo provides ratings for learners and concepts which correlate well with subjective proficiency levels of the learners and difficulty levels of the concepts.
@@ -16513,9 +16514,9 @@ In this tutorial on MT and post-editing we would like to continue sharing the la
Character-level Annotation for Chinese Surface-Syntactic Universal Dependencies
+ ChuanmingDong
YixuanLi
- GerdesKim
- DongChuanming
+ KimGerdes
216–226
W19-7726
10.18653/v1/W19-7726