diff --git a/data/xml/2015.jeptalnrecital.xml b/data/xml/2015.jeptalnrecital.xml index c7b7447739..4d26eb7f0a 100644 --- a/data/xml/2015.jeptalnrecital.xml +++ b/data/xml/2015.jeptalnrecital.xml @@ -711,9 +711,9 @@ Etiquetage morpho-syntaxique de tweets avec des <fixed-case>CRF</fixed-case> TianTian - DinarelliMarco - TellierIsabelle - CardosoPedro + MarcoDinarelli + IsabelleTellier + PedroCardoso 291–297 Nous nous intéressons dans cet article à l’apprentissage automatique d’un étiqueteur mopho-syntaxique pour les tweets en anglais. Nous proposons tout d’abord un jeu d’étiquettes réduit avec 17 étiquettes différentes, qui permet d’obtenir de meilleures performances en exactitude par rapport au jeu d’étiquettes traditionnel qui contient 45 étiquettes. Comme nous disposons de peu de tweets étiquetés, nous essayons ensuite de compenser ce handicap en ajoutant dans l’ensemble d’apprentissage des données issues de textes bien formés. Les modèles mixtes obtenus permettent d’améliorer les résultats par rapport aux modèles appris avec un seul corpus, qu’il soit issu de Twitter ou de textes journalistiques. 2015.jeptalnrecital-court.43 diff --git a/data/xml/2016.jeptalnrecital.xml b/data/xml/2016.jeptalnrecital.xml index cb92b74be8..d96d7c9a99 100644 --- a/data/xml/2016.jeptalnrecital.xml +++ b/data/xml/2016.jeptalnrecital.xml @@ -278,13 +278,13 @@ La distinction entre les paraphasies phonétiques et phonologiques dans l’aphasie : Etude de cas de deux patients aphasiques (The distinction between phonetic and phonological paraphasias in aphasia: A multiple casestudy of aphasic patients) - fra ClémenceVerhaegen VéroniqueDelvaux KathyHuet - FagniartSophie + SophieFagniart MyriamPiccaluga BernardHarmegnies + fra 200–210 La spécificité phonologique ou phonétique des erreurs de production orale observées chez les patients aphasiques reste débattue. 
Cependant, la distinction entre ces deux types d’erreurs est fréquemment basée sur des analyses perceptives qui peuvent être influencées par le système perceptif de l’expérimentateur. Afin de pallier ce biais, nous avons réalisé des analyses acoustiques des productions de deux patients aphasiques, dans une tâche de répétition de non-mots. Nous nous sommes centrés sur l’analyse de consonnes occlusives. Les résultats ont montré la présence de difficultés de gestion du voisement chez les deux patients, indiquant la présence de troubles phonétiques. En outre, les résultats montrent une grande diversité des manifestations des troubles langagiers des patients ainsi que l’intervention potentielle de stratégies de compensation de leurs difficultés. L’intérêt de procéder à des analyses acoustiques précises utilisant des indices multiples est discuté. 2016.jeptalnrecital-jep.23 @@ -722,12 +722,12 @@ Que disents nos silences? Apport des données acoustiques, articulatoires et physiologiques pour l’étude des pauses silencieuses (What do our silences say? Contribution of acoustic, articulatory and physiological data to the study on silent pauses) + MurielLalain + ThierryLegou + CamilleFauth + FabriceHirsch + IvanaDidirkova fra - LalainMuriel - LegouThierry - FauthCamille - HirschFabrice - DidirkovaIvana 563–570 Si la rhétorique s’est intéressée très tôt à la pause, il a fallu attendre le XXème siècle pour que d’autres disciplines – la psycholinguistique, le traitement automatique des langues, la phonétique – accordent à ces moments de silence l’intérêt qu’ils méritent. Il a ainsi été montré que ces ruptures dans le signal acoustique, loin de signer une absence d’activité, constituaient en réalité le lieu d’une activité physiologique (la respiration) et/ou cognitive (planification du discours) qui participent tout autant au message que la parole elle-même. 
Dans cette étude pilote, nous proposons des observations et des pistes de réflexions à partir de l’analyse des pauses silencieuses dans un corpus de parole lue et semi dirigée. Nous mettons notamment en évidence l’apport de l’analyse conjointe de données acoustiques, articulatoires (EMA) et physiologiques (respiratoires) pour l’identification, parmi les pauses silencieuses, des pauses respiratoires, syntaxiques et d’hésitation. 2016.jeptalnrecital-jep.63 @@ -756,15 +756,15 @@ Quels tests d’intelligibilité pour évaluer les troubles de production de la parole ? (What kind of intelligibility test to assess speech production disorders?) - fra AlainGhio LaurenceGiusti EmilieBlanc SergePinto - LalainMuriel + MurielLalain DanièleRobert CorineFredouille VirginieWoisard + fra 589–596 L’intelligibilité de la parole se définit comme le degré de précision avec lequel un message est compris par un auditeur. A ce titre, la perte d’intelligibilité représente souvent une plainte importante pour les patients atteints de troubles de production de la parole, puisqu’elle participe à la diminution de la qualité de vie au niveau communicationnel. Plusieurs outils existent actuellement pour évaluer l’intelligibilité mais aucun ne satisfait pleinement les contraintes cliniques. Dans une première étude, nous avons adapté au français la version 2 du Frenchay Dysarthria Assessment, un test reconnu dans le milieu anglo-saxon pour l’évaluation de locuteurs dysarthriques. Nous avons créé le corpus de mots français en nous appuyant sur les critères définis dans le FDA-2 puis nous avons testé le protocole sur une cinquantaine de locuteurs. Les résultats sont satisfaisants mais divers biais méthodologiques nous ont conduits à poursuivre notre démarche en proposant des listes de pseudo-mots apparentant le test à du décodage acoustico-phonétique. 
2016.jeptalnrecital-jep.66 diff --git a/data/xml/2020.gamnlp.xml b/data/xml/2020.gamnlp.xml index 272055ea90..84ee0871d2 100644 --- a/data/xml/2020.gamnlp.xml +++ b/data/xml/2020.gamnlp.xml @@ -48,7 +48,7 @@ Game Design Evaluation of <fixed-case>GWAP</fixed-case>s for Collecting Word Associations MathieuLafourcade - Le BrunNathalie + NathalieLe Brun 26–33 GWAP design might have a tremendous effect on its popularity of course but also on the quality of the data collected. In this paper, a comparison is undertaken between two GWAPs for building term association lists, namely JeuxDeMots and Quicky Goose. After comparing both game designs, the Cohen kappa of associative lists in various configurations is computed in order to assess likeness and differences of the data they provide. 2020.gamnlp-1.4 diff --git a/data/xml/2020.lrec.xml b/data/xml/2020.lrec.xml index bdc6c53de3..b253968941 100644 --- a/data/xml/2020.lrec.xml +++ b/data/xml/2020.lrec.xml @@ -5399,7 +5399,7 @@ Neural Disambiguation of Lemma and Part of Speech in Morphologically Rich Languages José MaríaHoya Quecedo - KoppatzMaximilian + Maximilian W.Koppatz RomanYangarber 3573–3582 We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages. We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a morphological analyser—with no manual disambiguation or data annotation. We assume that the morphological analyser produces multiple analyses for ambiguous words. The idea is to train recurrent neural networks on the output that the morphological analyser produces for unambiguous words. We present performance on POS and lemma disambiguation that reaches or surpasses the state of the art—including supervised models—using no manually annotated data. We evaluate the method on several morphologically rich languages. 
diff --git a/data/xml/2020.nlpcovid19.xml b/data/xml/2020.nlpcovid19.xml index cc9bf4a325..519bcfdfa6 100644 --- a/data/xml/2020.nlpcovid19.xml +++ b/data/xml/2020.nlpcovid19.xml @@ -635,7 +635,7 @@ <fixed-case>A</fixed-case>sk<fixed-case>M</fixed-case>e: A <fixed-case>LAPPS</fixed-case> <fixed-case>G</fixed-case>rid-based <fixed-case>NLP</fixed-case> Query and Retrieval System for Covid-19 Literature KeithSuderman NancyIde - VerhagenMarc + MarcVerhagen BrentCochran JamesPustejovsky In a recent project, the Language Application Grid was augmented to support the mining of scientific publications. The results of that ef- fort have now been repurposed to focus on Covid-19 literature, including modification of the LAPPS Grid “AskMe” query and retrieval engine. We describe the AskMe system and discuss its functionality as compared to other query engines available to search covid-related publications. diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml index fad41c1c9f..a11edf5681 100644 --- a/data/xml/2021.acl.xml +++ b/data/xml/2021.acl.xml @@ -9803,7 +9803,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO <fixed-case>S</fixed-case>a<fixed-case>R</fixed-case>o<fixed-case>C</fixed-case>o: Detecting Satire in a Novel <fixed-case>R</fixed-case>omanian Corpus of News Articles Ana-CristinaRogoz - GamanMihaela + MihaelaGăman Radu TudorIonescu 1073–1079 In this work, we introduce a corpus for satire detection in Romanian news. We gathered 55,608 public news articles from multiple real and satirical news sources, composing one of the largest corpora for satire detection regardless of language and the only one for the Romanian language. We provide an official split of the text samples, such that training news articles belong to different sources than test news articles, thus ensuring that models do not achieve high performance simply due to overfitting. 
We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus. Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research. diff --git a/data/xml/2021.eacl.xml b/data/xml/2021.eacl.xml index 42e6cb24a8..889938feb9 100644 --- a/data/xml/2021.eacl.xml +++ b/data/xml/2021.eacl.xml @@ -968,8 +968,8 @@ Clustering Word Embeddings with Self-Organizing Maps. Application on <fixed-case>L</fixed-case>a<fixed-case>R</fixed-case>o<fixed-case>S</fixed-case>e<fixed-case>D</fixed-case>a - A Large <fixed-case>R</fixed-case>omanian Sentiment Data Set - AncaTache - GamanMihaela + Anca MariaTache + MihaelaGăman Radu TudorIonescu 949–956 Romanian is one of the understudied languages in computational linguistics, with few resources available for the development of natural language processing tools. In this paper, we introduce LaRoSeDa, a Large Romanian Sentiment Data Set, which is composed of 15,000 positive and negative reviews collected from the largest Romanian e-commerce platform. We employ two sentiment classification methods as baselines for our new data set, one based on low-level features (character n-grams) and one based on high-level features (bag-of-word-embeddings generated by clustering word embeddings with k-means). As an additional contribution, we replace the k-means clustering algorithm with self-organizing maps (SOMs), obtaining better results because the generated clusters of word embeddings are closer to the Zipf’s law distribution, which is known to govern natural language. We also demonstrate the generalization capacity of using SOMs for the clustering of word embeddings on another recently-introduced Romanian data set, for text categorization by topic. 
diff --git a/data/xml/2021.vardial.xml b/data/xml/2021.vardial.xml index 19319a8dc4..cbecce338a 100644 --- a/data/xml/2021.vardial.xml +++ b/data/xml/2021.vardial.xml @@ -22,7 +22,7 @@ Findings of the <fixed-case>V</fixed-case>ar<fixed-case>D</fixed-case>ial Evaluation Campaign 2021 Bharathi RajaChakravarthi - GamanMihaela + MihaelaGăman Radu TudorIonescu HeidiJauhiainen TommiJauhiainen @@ -121,7 +121,7 @@ <fixed-case>U</fixed-case>nibuc<fixed-case>K</fixed-case>ernel: Geolocating <fixed-case>S</fixed-case>wiss <fixed-case>G</fixed-case>erman Jodels Using Ensemble Learning - GamanMihaela + MihaelaGăman SebastianCojocariu Radu TudorIonescu 84–95 diff --git a/data/xml/2022.starsem.xml b/data/xml/2022.starsem.xml index 0d50a9be88..c0a446c8b2 100644 --- a/data/xml/2022.starsem.xml +++ b/data/xml/2022.starsem.xml @@ -291,12 +291,12 @@ Speech acts and Communicative Intentions for Urgency Detection - LaurentiEnzo - BourgonNils + EnzoLaurenti + NilsBourgon FarahBenamara - MariAlda + AldaMari VéroniqueMoriceau - CourgeonCamille + CamilleCourgeon 289-298 Recognizing speech acts (SA) is crucial for capturing meaning beyond what is said, making communicative intentions particularly relevant to identify urgent messages. This paper attempts to measure for the first time the impact of SA on urgency detection during crises,006in tweets. We propose a new dataset annotated for both urgency and SA, and develop several deep learning architectures to inject SA into urgency detection while ensuring models generalisability. Our results show that taking speech acts into account in tweet analysis improves information type detection in an out-of-type configuration where models are evaluated in unseen event types during training. These results are encouraging and constitute a first step towards SA-aware disaster management in social media. 
2022.starsem-1.25 diff --git a/data/xml/2023.acl.xml b/data/xml/2023.acl.xml index 5e46db6700..e144df20b8 100644 --- a/data/xml/2023.acl.xml +++ b/data/xml/2023.acl.xml @@ -4360,10 +4360,10 @@ - No clues good clues: out of context Lexical Relation Classification - LuciaPitarchUniversity of Zaragoza - JordiBernadUniversity of Zaragoza - LacramioaraDrancaCentro Universitario de la Defensa + No clues, good clues: Out of context Lexical Relation Classification + LucíaPitarchUniversity of Zaragoza + JorgeBernadUniversity of Zaragoza + LicriDrancaCentro Universitario de la Defensa CarlosBobed LisbonaUniversity of Zaragoza, Spain JorgeGraciaUniversity of Zaragoza 5607-5625 diff --git a/data/xml/2023.clinicalnlp.xml b/data/xml/2023.clinicalnlp.xml index c27f41e010..ebb1df98e7 100644 --- a/data/xml/2023.clinicalnlp.xml +++ b/data/xml/2023.clinicalnlp.xml @@ -521,11 +521,11 @@ RobertTinn SidKiblawiMicrosoft YuGuMicrosoft - AkshayChaudhariStanford University and Subtle Medical + Akshay S.ChaudhariStanford University and Subtle Medical HoifungPoonMicrosoft ShengZhangMicrosoft MuWeiMicrosoft - J.Preston + Joseph S.Preston 373-384 Motivated by the scarcity of high-quality labeled biomedical text, as well as the success of data programming, we introduce KRISS-Search. By leveraging the Unified Medical Language Systems (UMLS) ontology, KRISS-Search addresses an interactive few-shot span recommendation task that we propose. We first introduce unsupervised KRISS-Search and show that our method outperforms existing methods in identifying spans that are semantically similar to a given span of interest, with >50% AUPRC improvement relative to PubMedBERT. We then introduce supervised KRISS-Search, which leverages human interaction to improve the notion of similarity used by unsupervised KRISS-Search. 
Through simulated human feedback, we demonstrate an enhanced F1 score of 0.68 in classifying spans as semantically similar or different in the low-label setting, outperforming PubMedBERT by 2 F1 points. Finally, supervised KRISS-Search demonstrates competitive or superior performance compared to PubMedBERT in few-shot biomedical named entity recognition (NER) across five benchmark datasets, with an average improvement of 5.6 F1 points. We envision KRISS-Search increasing the efficiency of programmatic data labeling and also providing broader utility as an interactive biomedical search engine. 2023.clinicalnlp-1.40 diff --git a/data/xml/2023.jeptalnrecital.xml b/data/xml/2023.jeptalnrecital.xml index 74ccf5db56..dfb92a98b0 100644 --- a/data/xml/2023.jeptalnrecital.xml +++ b/data/xml/2023.jeptalnrecital.xml @@ -160,7 +160,7 @@ Augmentation des modèles de langage français par graphes de connaissances pour la reconnaissance des entités biomédicales AidanMannion - SchwabDidier + DidierSchwab LorraineGoeuriot ThierryChevalier 177–189 @@ -387,7 +387,7 @@ In NLP, the automatic detection of logical contradictions between statements is Les textes cliniques français générés sont-ils dangereusement similaires à leur source ? Analyse par plongements de phrases NicolasHiebel - FerretOlivier + OlivierFerret KarënFort AurélieNévéol 46–54 @@ -808,7 +808,7 @@ In NLP, the automatic detection of logical contradictions between statements is Recherche cross-modale pour répondre à des questions visuelles PaulLerner - FerretOlivier + OlivierFerret CamilleGuinaudeau 74–92 Répondre à des questions visuelles à propos d’entités nommées (KVQAE) est une tâche difficile qui demande de rechercher des informations dans une base de connaissances multimodale. 
Nous étudions ici comment traiter cette tâche avec une recherche cross-modale et sa combinaison avec une recherche mono-modale, en se focalisant sur le modèle CLIP, un modèle multimodal entraîné sur des images appareillées à leur légende textuelle. Nos résultats démontrent la supériorité de la recherche cross-modale, mais aussi la complémentarité des deux, qui peuvent être combinées facilement. Nous étudions également différentes manières d’ajuster CLIP et trouvons que l’optimisation cross-modale est la meilleure solution, étant en adéquation avec son pré-entraînement. Notre méthode surpasse les approches précédentes, tout en étant plus simple et moins coûteuse. Ces gains de performance sont étudiés intrinsèquement selon la pertinence des résultats de la recherche et extrinsèquement selon l’exactitude de la réponse extraite par un module externe. Nous discutons des différences entre ces métriques et de ses implications pour l’évaluation de la KVQAE. diff --git a/data/xml/2023.jlcl.xml b/data/xml/2023.jlcl.xml index 69c09b0c3e..68c261dd45 100644 --- a/data/xml/2023.jlcl.xml +++ b/data/xml/2023.jlcl.xml @@ -4,7 +4,7 @@ Journal for Language Technology and Computational Linguistics, Vol. 36 No. 1 RomanSchneider - FaaßGertrud + GertrudFaaß German Society for Computational Lingustics and Language Technology
unknown
May @@ -18,7 +18,7 @@ Computerlinguistische Herausforderungen, empirische Erforschung & multidisziplinäres Potenzial deutschsprachiger Songtexte RomanSchneider - FaaßGertrud + GertrudFaaß iii-v 2023.jlcl-1.1 10.21248/jlcl.36.2023.234 diff --git a/data/xml/2023.paclic.xml b/data/xml/2023.paclic.xml index cd54dc6122..ca7e759175 100644 --- a/data/xml/2023.paclic.xml +++ b/data/xml/2023.paclic.xml @@ -419,13 +419,13 @@ An empirical, corpus-based, approach to <fixed-case>C</fixed-case>antonese nominal expressions - Gr ̈¦goireWinterstein + GrégoireWinterstein DavidVergnaud - Hannah Hoi TungYu - J ̈¦r ̈¦mieLupien - LaperleSamuel - Pei SuiLuk + JérémieLupien + SamuelLaperle + HannahYu ChristopherDavis + Zoe Pei SuiLuk 436–445 2023.paclic-1.43 winterstein-etal-2023-empirical diff --git a/data/xml/2024.dravidianlangtech.xml b/data/xml/2024.dravidianlangtech.xml index f3fabcb616..97c5ca33c3 100644 --- a/data/xml/2024.dravidianlangtech.xml +++ b/data/xml/2024.dravidianlangtech.xml @@ -60,10 +60,10 @@ Social Media Fake News Classification Using Machine Learning Algorithm - GirmaBade + Girma YohannisBade OlgaKolesnikovaInstituto Politécnico Nacional GrigoriSidorovInstituto Politécnico Nacional - JoséOropeza + José LuisOropeza 24-29 The rise of social media has facilitated easier communication, information sharing, and current affairs updates. However, the prevalence of misleading and deceptive content, commonly referred to as fake news, poses a significant challenge. This paper focuses on the classification of fake news in Malayalam, a Dravidian language, utilizing natural language processing (NLP) techniques. To develop a model, we employed a random forest machine learning method on a dataset provided by a shared task(DravidianLangTech@EACL 2024)1. When evaluated by the separate test dataset, our developed model achieved a 0.71 macro F1 measure. 
2024.dravidianlangtech-1.4 @@ -350,7 +350,7 @@ Habesha@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024: Detecting Fake News Detection in <fixed-case>D</fixed-case>ravidian Languages using Deep Learning - MesayYigezuInstituto Politécnico Nacional + Mesay GemedaYigezuInstituto Politécnico Nacional OlgaKolesnikovaInstituto Politécnico Nacional GrigoriSidorovInstituto Politécnico Nacional AlexanderGelbukhInstituto Politécnico Nacional @@ -543,10 +543,10 @@ Social Media Hate and Offensive Speech Detection Using Machine Learning method - GirmaBade + Girma YohannisBade OlgaKolesnikovaInstituto Politécnico Nacional GrigoriSidorovInstituto Politécnico Nacional - JoséOropeza + José LuisOropeza 240-244 Even though the improper use of social media is increasing nowadays, there is also technology that brings solutions. Here, improperness is posting hate and offensive speech that might harm an individual or group. Hate speech refers to an insult toward an individual or group based on their identities. Spreading it on social media platforms is a serious problem for society. The solution, on the other hand, is the availability of natural language processing(NLP) technology that is capable to detect and handle such problems. This paper presents the detection of social media’s hate and offensive speech in the code-mixed Telugu language. For this, the task and golden standard dataset were provided for us by the shared task organizer (DravidianLangTech@ EACL 2024)1. To this end, we have employed the TF-IDF technique for numeric feature extraction and used a random forest algorithm for modeling hate speech detection. Finally, the developed model was evaluated on the test dataset and achieved 0.492 macro-F1. 
2024.dravidianlangtech-1.40 diff --git a/data/xml/2024.emnlp.xml b/data/xml/2024.emnlp.xml index 5ce13a9639..0692dd95fe 100644 --- a/data/xml/2024.emnlp.xml +++ b/data/xml/2024.emnlp.xml @@ -3296,7 +3296,7 @@ Tyler A.ChangGoogle and University of California, San Diego CatherineArnett ZhuowenTuUniversity of California, San Diego - BenBergenUniversity of California, San Diego + Benjamin K.BergenUniversity of California, San Diego 4074-4096 Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we pre-train over 10,000 monolingual and multilingual language models for over 250 languages, including multiple language families that are under-studied in NLP. We assess how language modeling performance in each language varies as a function of (1) monolingual dataset size, (2) added multilingual dataset size, (3) linguistic similarity of the added languages, and (4) model size (up to 45M parameters). We find that in moderation, adding multilingual data improves low-resource language modeling performance, similar to increasing low-resource dataset sizes by up to 33%. Improvements depend on the syntactic similarity of the added multilingual data, with marginal additional effects of vocabulary overlap. However, high-resource languages consistently perform worse in multilingual pre-training scenarios. As dataset sizes increase, adding multilingual data begins to hurt performance for both low-resource and high-resource languages, likely due to limited model capacity (the “curse of multilinguality”). These results suggest that massively multilingual pre-training may not be optimal for any languages involved, but that more targeted models can significantly improve performance. 
2024.emnlp-main.236 @@ -11864,8 +11864,8 @@ ChuangWang JianYao LiLiuJiangnan University - FangWeiJiangnan University - Eddie Y.k.Eddie + WeiFangJiangnan University + Eddie-Yin-KweeNg 15257-15269 Knowledge graph completion (KGC) aims to infer missing or incomplete parts in knowledge graph. The existing models are generally divided into structure-based and description-based models, among description-based models often require longer training and inference times as well as increased memory usage. In this paper, we propose Pre-Encoded Masked Language Model (PEMLM) to efficiently solve KGC problem. By encoding textual descriptions into semantic representations before training, the necessary resources are significantly reduced. Furthermore, we introduce a straightforward but effective fusion framework to integrate structural embedding with pre-encoded semantic description, which enhances the model’s prediction performance on 1-N relations. The experimental results demonstrate that our proposed strategy attains state-of-the-art performance on the WN18RR (MRR+5.4% and Hits@1+6.4%) and UMLS datasets. Compared to existing models, we have increased inference speed by 30x and reduced training memory by approximately 60%. 
2024.emnlp-main.851 diff --git a/data/xml/2024.findings.xml b/data/xml/2024.findings.xml index 7403e63112..c064827178 100644 --- a/data/xml/2024.findings.xml +++ b/data/xml/2024.findings.xml @@ -28660,7 +28660,7 @@ and high variation in performance on the subset, suggesting our plausibility cri Karen Jia-HuiLiCharles University Prague RafaelSargsyan VivekKumarUniversity of the Bundeswehr Munich - DiegoReforgiato + DiegoReforgiato Recupero DanieleRiboniUniversity of Cagliari OndrejDusekCharles University, Prague 11519-11545 diff --git a/data/xml/2024.jeptalnrecital.xml b/data/xml/2024.jeptalnrecital.xml index d7742b51ec..d56b5a5dae 100644 --- a/data/xml/2024.jeptalnrecital.xml +++ b/data/xml/2024.jeptalnrecital.xml @@ -1444,8 +1444,8 @@ Jargon : Une suite de modèles de langues et de référentiels d’évaluation pour les domaines spécialisés du français VincentSegonne AidanMannion - LauraAlonzo-Canul - AudibertAlexandre + Laura CristinaAlonzo Canul + AlexandreAudibert XingyuLiu CécileMacaire AdrienPupier @@ -1455,7 +1455,7 @@ MagaliNorré Massih-RezaAmini PierretteBouillon - IrisEshkol Taravella + IrisEshkol-Taravella EmmanuelleEsparança-Rodier ThomasFrançois LorraineGoeuriot diff --git a/data/xml/2024.naacl.xml b/data/xml/2024.naacl.xml index 10e2b9e6f7..5a49bd2507 100644 --- a/data/xml/2024.naacl.xml +++ b/data/xml/2024.naacl.xml @@ -4094,8 +4094,8 @@ Does <fixed-case>GPT</fixed-case>-4 pass the <fixed-case>T</fixed-case>uring test? - CameronJonesUniversity of California, San Diego - BenBergen + Cameron R.JonesUniversity of California, San Diego + Benjamin K.Bergen 5183-5210 We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). 
Participants’ decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient to pass the Turing test. Participant knowledge about LLMs and number of games played positively correlated with accuracy in detecting AI, suggesting learning and practice as possible strategies to mitigate deception. Despite known limitations as a test of intelligence, we argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness. 2024.naacl-long.290 diff --git a/data/xml/2025.dravidianlangtech.xml b/data/xml/2025.dravidianlangtech.xml index f2493406fb..5ebc93ad7c 100644 --- a/data/xml/2025.dravidianlangtech.xml +++ b/data/xml/2025.dravidianlangtech.xml @@ -1528,14 +1528,14 @@ Overview of the Shared Task on Sentiment Analysis in <fixed-case>T</fixed-case>amil and <fixed-case>T</fixed-case>ulu - ThenmozhiDurairaj + DurairajThenmozhi Bharathi RajaChakravarthiUniversity of Galway AshaHegdeMangalore University Hosahalli LakshmaiahShashirekhaMangalore University RajeswariNatarajan SajeethaThavareesan RatnasingamSakuntharajEastern University of Sri Lanka - KrishnakumariK + KrishnakumariKalyanasundaram CharmathiRajkumar PoorviShetty Harshitha SKumar diff --git a/data/xml/2025.findings.xml b/data/xml/2025.findings.xml index 0ec705378b..49ae1799b3 100644 --- a/data/xml/2025.findings.xml +++ b/data/xml/2025.findings.xml @@ -4619,7 +4619,7 @@ <fixed-case>M</fixed-case>-<fixed-case>IFE</fixed-case>val: Multilingual Instruction-Following Evaluation AntoineDussolle - A.Cardeña + AndreaCardeña Díaz ShotaSatoLightblue PeterDevine 6161-6176 diff --git a/data/xml/2025.naacl.xml b/data/xml/2025.naacl.xml index f008d30822..c784e7715f 100644 --- 
a/data/xml/2025.naacl.xml +++ b/data/xml/2025.naacl.xml @@ -10781,7 +10781,7 @@ DaniilGrebenkin OlegSedukhinSiberian Neuronets LLC MikhailKlementev - DerunetsRomanNovosibirsk State University + RomanDerunetsNovosibirsk State University LyudmilaBudnevaNovosibirsk State University 988-997 This work presents a speech-to-text system “Pisets” for scientists and journalists which is based on a three-component architecture aimed at improving speech recognition accuracy while minimizing errors and hallucinations associated with the Whisper model. The architecture comprises primary recognition using Wav2Vec2, false positive filtering via the Audio Spectrogram Transformer (AST), and final speech recognition through Whisper. The implementation of curriculum learning methods and the utilization of diverse Russian-language speech corpora significantly enhanced the system’s effectiveness. Additionally, advanced uncertainty modeling techniques were introduced, contributing to further improvements in transcription quality. The proposed approaches ensure robust transcribing of long audio data across various acoustic conditions compared to WhisperX and the usual Whisper model. The source code of “Pisets” system is publicly available at GitHub: https://github.com/bond005/pisets. @@ -11400,7 +11400,7 @@ Streamlining <fixed-case>LLM</fixed-case>s: Adaptive Knowledge Distillation for Tailored Language Models PrajviSaxenaGerman Research Center for AI SabineJanzen - WolfgangMaassUniversität des Saarlandes + WolfgangMaaßUniversität des Saarlandes 448-455 Large language models (LLMs) like GPT-4 and LLaMA-3 offer transformative potential across industries, e.g., enhancing customer service, revolutionizing medical diagnostics, or identifying crises in news articles. However, deploying LLMs faces challenges such as limited training data, high computational costs, and issues with transparency and explainability. 
Our research focuses on distilling compact, parameter-efficient tailored language models (TLMs) from LLMs for domain-specific tasks with comparable performance. Current approaches like knowledge distillation, fine-tuning, and model parallelism address computational efficiency but lack hybrid strategies to balance efficiency, adaptability, and accuracy. We present ANON - an adaptive knowledge distillation framework integrating knowledge distillation with adapters to generate computationally efficient TLMs without relying on labeled datasets. ANON uses cross-entropy loss to transfer knowledge from the teacher’s outputs and internal representations while employing adaptive prompt engineering and a progressive distillation strategy for phased knowledge transfer. We evaluated ANON’s performance in the crisis domain, where accuracy is critical and labeled data is scarce. Experiments showed that ANON outperforms recent approaches of knowledge distillation, both in terms of the resulting TLM performance and in reducing the computational costs for training and maintaining accuracy compared to LLMs for domain-specific applications. 2025.naacl-srw.43 diff --git a/data/xml/2025.nllp.xml b/data/xml/2025.nllp.xml index a6128aa655..db7760f4d4 100644 --- a/data/xml/2025.nllp.xml +++ b/data/xml/2025.nllp.xml @@ -399,9 +399,9 @@ Extract-Explain-Abstract: A Rhetorical Role-Driven Domain-Specific Summarisation Framework for <fixed-case>I</fixed-case>ndian Legal Documents VeerChhedaDwarkadas J. Sanghvi College Of Engineering - Aaditya UdayGhaisas + AadityaGhaisas AvantikaSankhe - Dr. NarendraShekokarDwarkadas J. Sanghvi College Of Engineering, Dhirubhai Ambani Institute Of Information and Communication Technology + NarendraShekokarDwarkadas J. 
Sanghvi College Of Engineering, Dhirubhai Ambani Institute Of Information and Communication Technology 439-455 Legal documents are characterized by their length, intricacy, and dense use of jargon, making efficacious summarisation both paramount and challenging. Existing zero-shot methodologies in small language models struggle to simplify this jargon and are prone to punts and hallucinations with longer prompts. This paper introduces the Rhetorical Role-based Extract-Explain-Abstract (EEA) Framework, a novel three-stage methodology for summarisation of Indian legal documents in low-resource settings. The approach begins by segmenting legal texts using rhetorical roles, such as facts, issues and arguments, through a domain-specific phrase corpus and extraction based on TF-IDF. In the explanation stage, the segmented output is enriched with logical connections to ensure coherence and legal fidelity. The final abstraction phase condenses these interlinked segments into cogent, high-level summaries that preserve critical legal reasoning. Experiments on Indian legal datasets show that the EEA framework typically outperforms in ROUGE, BERTScore, Flesch Reading Ease, Age of Acquisition, SummaC and human evaluations. We also employ InLegalBERTScore as a metric to capture domain specific semantics of Indian legal documents. 
2025.nllp-1.32 diff --git a/data/xml/2025.semeval.xml b/data/xml/2025.semeval.xml index 41a69e5d43..d3f682ab9b 100644 --- a/data/xml/2025.semeval.xml +++ b/data/xml/2025.semeval.xml @@ -1977,13 +1977,13 @@ Amado at <fixed-case>S</fixed-case>em<fixed-case>E</fixed-case>val-2025 Task 11: Multi-label Emotion Detection in <fixed-case>A</fixed-case>mharic and <fixed-case>E</fixed-case>nglish Data - GirmaBadeCIC,IPN,MX + Girma YohannisBadeCIC,IPN,MX OlgaKolesnikovaCIC,IPN,MX - JoseOropezaCIC,IPN,MX + José LuisOropezaCIC,IPN,MX GrigoriSidorovCIC,IPN,MX - MesayYigezuCIC,IPN,Mx + Mesay GemedaYigezuCIC,IPN,Mx 1406-1410 - Amado at SemEval-2025 Task 11: Multi-label Emotion Detection inAmharic and English DataGirma Yohannis Bade, Olga Kolesnikova, José Luis OropezaGrigori Sidorov, Mesay Gemeda Yigezua(Centro de Investigaciones en Computación(CIC),Instituto Politécnico Nacional(IPN), Miguel Othon de Mendizabal,Ciudad de México, 07320, México.) + Recently, social media has become a platform for different human emotions. Although most existing works treat the user’s opinions into a single emotion, the reality is that one user can have more than one emotion at a time, representing multiple emotions at the same time. Multi-label emotion detection is a more advanced and realistic approach, as it acknowledges the complexity of human emotions and their overlapping nature. This paper presents multi-label emotion detection in Amharic and English data. The work is part of SemEval2025 shared task 11, where tasks and datasets are offered by task organizers. To accomplish the aim of the given task, we fine-tune transformers base BERT model, passing through all different workflow pipelines. On unseen test data, the model evaluation achieved 0.6300 and 0.7025 an average macro F1-score for Amharic and English, respectively. 
2025.semeval-1.185 bade-etal-2025-amado diff --git a/data/xml/L16.xml b/data/xml/L16.xml index db5973c623..d6495d2adb 100644 --- a/data/xml/L16.xml +++ b/data/xml/L16.xml @@ -2745,8 +2745,8 @@ Towards a Corpus of Violence Acts in <fixed-case>A</fixed-case>rabic Social Media AymanAlhelbawy - PoesioMassimo UdoKruschwitz + MassimoPoesio 1627–1631 In this paper we present a new corpus of Arabic tweets that mention some form of violent event, developed to support the automatic identification of Human Rights Abuse. The dataset was manually labelled for seven classes of violence using crowdsourcing. L16-1257 @@ -4541,7 +4541,7 @@ A Gold Standard for Scalar Adjectives BryanWilkinson - OatesTim + TimOates 2669–2675 We present a gold standard for evaluating scale membership and the order of scalar adjectives. In addition to evaluating existing methods of ordering adjectives, this knowledge will aid in studying the organization of adjectives in the lexicon. This resource is the result of two elicitation tasks conducted with informants from Amazon Mechanical Turk. The first task is notable for gathering open-ended lexical data from informants. The data is analyzed using Cultural Consensus Theory, a framework from anthropology, to not only determine scale membership but also the level of consensus among the informants (Romney et al., 1986). The second task gathers a culturally salient ordering of the words determined to be members. We use this method to produce 12 scales of adjectives for use in evaluation. 
L16-1424 diff --git a/data/xml/P95.xml b/data/xml/P95.xml index 785d16d4ba..807885a8b1 100644 --- a/data/xml/P95.xml +++ b/data/xml/P95.xml @@ -161,7 +161,7 @@ Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies ChinatsuAone - ScottWilliam + ScottWilliam Bennett 10.3115/981658.981675 122–129 P95-1017 diff --git a/data/xml/W16.xml b/data/xml/W16.xml index 36ac270b7c..a63833255f 100644 --- a/data/xml/W16.xml +++ b/data/xml/W16.xml @@ -8577,7 +8577,7 @@ <fixed-case>C</fixed-case>o<fixed-case>C</fixed-case>o<fixed-case>G</fixed-case>en - Complexity Contour Generator: Automatic Assessment of Linguistic Complexity Using a Sliding-Window Technique - StröbelMarcus + MarcusStröbel ElmaKerz DanielWiechmann StellaNeumann diff --git a/data/xml/W19.xml b/data/xml/W19.xml index 20aecc075f..1e0b653276 100644 --- a/data/xml/W19.xml +++ b/data/xml/W19.xml @@ -8458,9 +8458,10 @@ One of the references was wrong therefore it is corrected to cite the appropriat Modeling language learning using specialized Elo rating JueHou - KoppatzMaximilian + Maximilian W.Koppatz José MaríaHoya Quecedo NataliyaStoyanova + MikhailKopotev RomanYangarber 494–506 Automatic assessment of the proficiency levels of the learner is a critical part of Intelligent Tutoring Systems. We present methods for assessment in the context of language learning. We use a specialized Elo formula used in conjunction with educational data mining. We simultaneously obtain ratings for the proficiency of the learners and for the difficulty of the linguistic concepts that the learners are trying to master. From the same data we also learn a graph structure representing a domain model capturing the relations among the concepts. This application of Elo provides ratings for learners and concepts which correlate well with subjective proficiency levels of the learners and difficulty levels of the concepts. 
@@ -16513,9 +16514,9 @@ In this tutorial on MT and post-editing we would like to continue sharing the la Character-level Annotation for <fixed-case>C</fixed-case>hinese Surface-Syntactic <fixed-case>U</fixed-case>niversal <fixed-case>D</fixed-case>ependencies + ChuanmingDong YixuanLi - GerdesKim - DongChuanming + KimGerdes 216–226 W19-7726 10.18653/v1/W19-7726