Merged
41 commits
a0d9639
Process metadata corrections for 2025.naacl-srw.43 (closes #7041)
weissenh Dec 30, 2025
f5999d6
Process metadata corrections for 2023.acl-long.308 (closes #7040)
weissenh Dec 30, 2025
df228dc
Process metadata corrections for 2025.nllp-1.32 (closes #7039)
weissenh Dec 30, 2025
9e7ceb8
Process metadata corrections for 2023.clinicalnlp-1.40 (closes #7038)
weissenh Dec 30, 2025
7572706
Process metadata corrections for 2024.emnlp-main.851 (closes #7037)
weissenh Dec 30, 2025
7e5d2d5
Process metadata corrections for 2024.naacl-long.290 (closes #7036)
weissenh Dec 30, 2025
528c9a4
Process metadata corrections for 2024.emnlp-main.236 (closes #7035)
weissenh Dec 30, 2025
79ccb42
Process metadata corrections for 2024.findings-emnlp.674 (closes #7034)
weissenh Dec 30, 2025
3055c9e
Process metadata corrections for 2025.dravidianlangtech-1.124 (closes…
weissenh Dec 30, 2025
86cdd43
Process metadata corrections for 2024.dravidianlangtech-1.26 (closes …
weissenh Dec 30, 2025
e7497e8
Process metadata corrections for 2025.semeval-1.185 (closes #7030)
weissenh Dec 30, 2025
cc1ebef
Process metadata corrections for 2024.dravidianlangtech-1.4 (closes #…
weissenh Dec 30, 2025
f81a101
Process metadata corrections for 2025.findings-naacl.344 (closes #7028)
weissenh Dec 30, 2025
ecb7b3c
Process metadata corrections for W19-4451 (closes #7027)
weissenh Dec 30, 2025
6e879a8
Process metadata corrections for 2020.lrec-1.439 (closes #7026)
weissenh Dec 30, 2025
6534d7b
Process metadata corrections for W19-7726 (closes #7025)
weissenh Dec 30, 2025
f3b5d45
Process metadata corrections for 2021.vardial-1.10 (closes #7024)
weissenh Dec 30, 2025
3bcf761
Process metadata corrections for 2021.vardial-1.1 (closes #7023)
weissenh Dec 30, 2025
85584b9
Process metadata corrections for 2021.eacl-main.81 (closes #7022)
weissenh Dec 30, 2025
398f27b
Process metadata corrections for 2021.acl-short.136 (closes #7021)
weissenh Dec 30, 2025
6422f4f
Process metadata corrections for 2023.jeptalnrecital-short.5 (closes …
weissenh Dec 30, 2025
7a7f63b
Process metadata corrections for 2023.jeptalnrecital-coria.5 (closes …
weissenh Dec 30, 2025
4d5cd50
Process metadata corrections for 2016.jeptalnrecital-jep.23 (closes #…
weissenh Dec 30, 2025
5199389
Process metadata corrections for 2023.jlcl-1.1 (closes #7017)
weissenh Dec 30, 2025
3fd0563
Process metadata corrections for 2023.jlcl-1.0 (closes #7016)
weissenh Dec 30, 2025
e52f702
Process metadata corrections for 2022.starsem-1.25 (closes #7015)
weissenh Dec 30, 2025
041bd22
Process metadata corrections for 2023.jeptalnrecital-long.14 (closes …
weissenh Dec 30, 2025
ed184f9
Process metadata corrections for 2025.naacl-industry.74 (closes #7013)
weissenh Dec 30, 2025
fcaf002
Process metadata corrections for P95-1017 (closes #7012)
weissenh Dec 30, 2025
3f9e2d7
Process metadata corrections for L16-1424 (closes #7011)
weissenh Dec 30, 2025
7cee45c
Process metadata corrections for L16-1257 (closes #7010)
weissenh Dec 30, 2025
31696a5
Process metadata corrections for W16-4103 (closes #7009)
weissenh Dec 30, 2025
90cf2a2
Process metadata corrections for 2020.nlpcovid19-2.28 (closes #7008)
weissenh Dec 30, 2025
4e81a99
Process metadata corrections for 2020.gamnlp-1.4 (closes #7007)
weissenh Dec 30, 2025
c59d966
Process metadata corrections for 2023.paclic-1.43 (closes #7006)
weissenh Dec 30, 2025
d3da045
Process metadata corrections for 2016.jeptalnrecital-jep.66 (closes #…
weissenh Dec 30, 2025
ce9f9a8
Process metadata corrections for 2016.jeptalnrecital-jep.63 (closes #…
weissenh Dec 30, 2025
10d0c10
Process metadata corrections for 2015.jeptalnrecital-court.43 (closes…
weissenh Dec 30, 2025
b489e46
Process metadata corrections for 2024.jeptalnrecital-trad.6 (closes #…
weissenh Dec 30, 2025
9187b0c
delete superfluous id introduced by bulk processing script
weissenh Dec 31, 2025
8776bf6
Process metadata corrections for 2024.dravidianlangtech-1.40 (closes …
weissenh Dec 31, 2025
6 changes: 3 additions & 3 deletions data/xml/2015.jeptalnrecital.xml
@@ -711,9 +711,9 @@
<paper id="43">
<title>Etiquetage morpho-syntaxique de tweets avec des <fixed-case>CRF</fixed-case></title>
<author><first>Tian</first><last>Tian</last></author>
-<author><first>Dinarelli</first><last>Marco</last></author>
-<author><first>Tellier</first><last>Isabelle</last></author>
-<author><first>Cardoso</first><last>Pedro</last></author>
+<author><first>Marco</first><last>Dinarelli</last></author>
+<author><first>Isabelle</first><last>Tellier</last></author>
+<author><first>Pedro</first><last>Cardoso</last></author>
<pages>291–297</pages>
<abstract>Nous nous intéressons dans cet article à l’apprentissage automatique d’un étiqueteur mopho-syntaxique pour les tweets en anglais. Nous proposons tout d’abord un jeu d’étiquettes réduit avec 17 étiquettes différentes, qui permet d’obtenir de meilleures performances en exactitude par rapport au jeu d’étiquettes traditionnel qui contient 45 étiquettes. Comme nous disposons de peu de tweets étiquetés, nous essayons ensuite de compenser ce handicap en ajoutant dans l’ensemble d’apprentissage des données issues de textes bien formés. Les modèles mixtes obtenus permettent d’améliorer les résultats par rapport aux modèles appris avec un seul corpus, qu’il soit issu de Twitter ou de textes journalistiques.</abstract>
<url hash="926a32a5">2015.jeptalnrecital-court.43</url>
18 changes: 9 additions & 9 deletions data/xml/2016.jeptalnrecital.xml
@@ -278,13 +278,13 @@
</paper>
<paper id="23">
<title>La distinction entre les paraphasies phonétiques et phonologiques dans l’aphasie : Etude de cas de deux patients aphasiques (The distinction between phonetic and phonological paraphasias in aphasia: A multiple casestudy of aphasic patients)</title>
-<language>fra</language>
<author><first>Clémence</first><last>Verhaegen</last></author>
<author><first>Véronique</first><last>Delvaux</last></author>
<author><first>Kathy</first><last>Huet</last></author>
-<author><first>Fagniart</first><last>Sophie</last></author>
+<author><first>Sophie</first><last>Fagniart</last></author>
<author><first>Myriam</first><last>Piccaluga</last></author>
<author><first>Bernard</first><last>Harmegnies</last></author>
+<language>fra</language>
<pages>200–210</pages>
<abstract>La spécificité phonologique ou phonétique des erreurs de production orale observées chez les patients aphasiques reste débattue. Cependant, la distinction entre ces deux types d’erreurs est fréquemment basée sur des analyses perceptives qui peuvent être influencées par le système perceptif de l’expérimentateur. Afin de pallier ce biais, nous avons réalisé des analyses acoustiques des productions de deux patients aphasiques, dans une tâche de répétition de non-mots. Nous nous sommes centrés sur l’analyse de consonnes occlusives. Les résultats ont montré la présence de difficultés de gestion du voisement chez les deux patients, indiquant la présence de troubles phonétiques. En outre, les résultats montrent une grande diversité des manifestations des troubles langagiers des patients ainsi que l’intervention potentielle de stratégies de compensation de leurs difficultés. L’intérêt de procéder à des analyses acoustiques précises utilisant des indices multiples est discuté.</abstract>
<url hash="d6d71199">2016.jeptalnrecital-jep.23</url>
@@ -722,12 +722,12 @@
</paper>
<paper id="63">
<title>Que disents nos silences? Apport des données acoustiques, articulatoires et physiologiques pour l’étude des pauses silencieuses (What do our silences say? Contribution of acoustic, articulatory and physiological data to the study on silent pauses)</title>
+<author><first>Muriel</first><last>Lalain</last></author>
+<author><first>Thierry</first><last>Legou</last></author>
+<author><first>Camille</first><last>Fauth</last></author>
+<author><first>Fabrice</first><last>Hirsch</last></author>
+<author><first>Ivana</first><last>Didirkova</last></author>
<language>fra</language>
-<author><first>Lalain</first><last>Muriel</last></author>
-<author><first>Legou</first><last>Thierry</last></author>
-<author><first>Fauth</first><last>Camille</last></author>
-<author><first>Hirsch</first><last>Fabrice</last></author>
-<author><first>Didirkova</first><last>Ivana</last></author>
<pages>563–570</pages>
<abstract>Si la rhétorique s’est intéressée très tôt à la pause, il a fallu attendre le XXème siècle pour que d’autres disciplines – la psycholinguistique, le traitement automatique des langues, la phonétique – accordent à ces moments de silence l’intérêt qu’ils méritent. Il a ainsi été montré que ces ruptures dans le signal acoustique, loin de signer une absence d’activité, constituaient en réalité le lieu d’une activité physiologique (la respiration) et/ou cognitive (planification du discours) qui participent tout autant au message que la parole elle-même. Dans cette étude pilote, nous proposons des observations et des pistes de réflexions à partir de l’analyse des pauses silencieuses dans un corpus de parole lue et semi dirigée. Nous mettons notamment en évidence l’apport de l’analyse conjointe de données acoustiques, articulatoires (EMA) et physiologiques (respiratoires) pour l’identification, parmi les pauses silencieuses, des pauses respiratoires, syntaxiques et d’hésitation.</abstract>
<url hash="8baa6e4f">2016.jeptalnrecital-jep.63</url>
@@ -756,15 +756,15 @@
</paper>
<paper id="66">
<title>Quels tests d’intelligibilité pour évaluer les troubles de production de la parole ? (What kind of intelligibility test to assess speech production disorders?)</title>
-<language>fra</language>
<author><first>Alain</first><last>Ghio</last></author>
<author><first>Laurence</first><last>Giusti</last></author>
<author><first>Emilie</first><last>Blanc</last></author>
<author><first>Serge</first><last>Pinto</last></author>
-<author><first>Lalain</first><last>Muriel</last></author>
+<author><first>Muriel</first><last>Lalain</last></author>
<author><first>Danièle</first><last>Robert</last></author>
<author><first>Corine</first><last>Fredouille</last></author>
<author><first>Virginie</first><last>Woisard</last></author>
+<language>fra</language>
<pages>589–596</pages>
<abstract>L’intelligibilité de la parole se définit comme le degré de précision avec lequel un message est compris par un auditeur. A ce titre, la perte d’intelligibilité représente souvent une plainte importante pour les patients atteints de troubles de production de la parole, puisqu’elle participe à la diminution de la qualité de vie au niveau communicationnel. Plusieurs outils existent actuellement pour évaluer l’intelligibilité mais aucun ne satisfait pleinement les contraintes cliniques. Dans une première étude, nous avons adapté au français la version 2 du Frenchay Dysarthria Assessment, un test reconnu dans le milieu anglo-saxon pour l’évaluation de locuteurs dysarthriques. Nous avons créé le corpus de mots français en nous appuyant sur les critères définis dans le FDA-2 puis nous avons testé le protocole sur une cinquantaine de locuteurs. Les résultats sont satisfaisants mais divers biais méthodologiques nous ont conduits à poursuivre notre démarche en proposant des listes de pseudo-mots apparentant le test à du décodage acoustico-phonétique.</abstract>
<url hash="20146cec">2016.jeptalnrecital-jep.66</url>
2 changes: 1 addition & 1 deletion data/xml/2020.gamnlp.xml
@@ -48,7 +48,7 @@
<paper id="4">
<title>Game Design Evaluation of <fixed-case>GWAP</fixed-case>s for Collecting Word Associations</title>
<author><first>Mathieu</first><last>Lafourcade</last></author>
-<author><first>Le Brun</first><last>Nathalie</last></author>
+<author><first>Nathalie</first><last>Le Brun</last></author>
<pages>26–33</pages>
<abstract>GWAP design might have a tremendous effect on its popularity of course but also on the quality of the data collected. In this paper, a comparison is undertaken between two GWAPs for building term association lists, namely JeuxDeMots and Quicky Goose. After comparing both game designs, the Cohen kappa of associative lists in various configurations is computed in order to assess likeness and differences of the data they provide.</abstract>
<url hash="9e10add5">2020.gamnlp-1.4</url>
2 changes: 1 addition & 1 deletion data/xml/2020.lrec.xml
@@ -5399,7 +5399,7 @@
<paper id="439">
<title>Neural Disambiguation of Lemma and Part of Speech in Morphologically Rich Languages</title>
<author><first>José María</first><last>Hoya Quecedo</last></author>
-<author><first>Koppatz</first><last>Maximilian</last></author>
+<author><first>Maximilian W.</first><last>Koppatz</last></author>
<author><first>Roman</first><last>Yangarber</last></author>
<pages>3573–3582</pages>
<abstract>We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages. We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a morphological analyser—with no manual disambiguation or data annotation. We assume that the morphological analyser produces multiple analyses for ambiguous words. The idea is to train recurrent neural networks on the output that the morphological analyser produces for unambiguous words. We present performance on POS and lemma disambiguation that reaches or surpasses the state of the art—including supervised models—using no manually annotated data. We evaluate the method on several morphologically rich languages.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2020.nlpcovid19.xml
@@ -635,7 +635,7 @@
<title><fixed-case>A</fixed-case>sk<fixed-case>M</fixed-case>e: A <fixed-case>LAPPS</fixed-case> <fixed-case>G</fixed-case>rid-based <fixed-case>NLP</fixed-case> Query and Retrieval System for Covid-19 Literature</title>
<author><first>Keith</first><last>Suderman</last></author>
<author><first>Nancy</first><last>Ide</last></author>
-<author><first>Verhagen</first><last>Marc</last></author>
+<author><first>Marc</first><last>Verhagen</last></author>
<author><first>Brent</first><last>Cochran</last></author>
<author><first>James</first><last>Pustejovsky</last></author>
<abstract>In a recent project, the Language Application Grid was augmented to support the mining of scientific publications. The results of that ef- fort have now been repurposed to focus on Covid-19 literature, including modification of the LAPPS Grid “AskMe” query and retrieval engine. We describe the AskMe system and discuss its functionality as compared to other query engines available to search covid-related publications.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2021.acl.xml
@@ -9803,7 +9803,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
<paper id="136">
<title><fixed-case>S</fixed-case>a<fixed-case>R</fixed-case>o<fixed-case>C</fixed-case>o: Detecting Satire in a Novel <fixed-case>R</fixed-case>omanian Corpus of News Articles</title>
<author><first>Ana-Cristina</first><last>Rogoz</last></author>
-<author><first>Gaman</first><last>Mihaela</last></author>
+<author><first>Mihaela</first><last>Găman</last></author>
<author><first>Radu Tudor</first><last>Ionescu</last></author>
<pages>1073–1079</pages>
<abstract>In this work, we introduce a corpus for satire detection in Romanian news. We gathered 55,608 public news articles from multiple real and satirical news sources, composing one of the largest corpora for satire detection regardless of language and the only one for the Romanian language. We provide an official split of the text samples, such that training news articles belong to different sources than test news articles, thus ensuring that models do not achieve high performance simply due to overfitting. We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus. Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.</abstract>
4 changes: 2 additions & 2 deletions data/xml/2021.eacl.xml
@@ -968,8 +968,8 @@
</paper>
<paper id="81">
<title>Clustering Word Embeddings with Self-Organizing Maps. Application on <fixed-case>L</fixed-case>a<fixed-case>R</fixed-case>o<fixed-case>S</fixed-case>e<fixed-case>D</fixed-case>a - A Large <fixed-case>R</fixed-case>omanian Sentiment Data Set</title>
-<author><first>Anca</first><last>Tache</last></author>
-<author><first>Gaman</first><last>Mihaela</last></author>
+<author><first>Anca Maria</first><last>Tache</last></author>
+<author><first>Mihaela</first><last>Găman</last></author>
<author><first>Radu Tudor</first><last>Ionescu</last></author>
<pages>949–956</pages>
<abstract>Romanian is one of the understudied languages in computational linguistics, with few resources available for the development of natural language processing tools. In this paper, we introduce LaRoSeDa, a Large Romanian Sentiment Data Set, which is composed of 15,000 positive and negative reviews collected from the largest Romanian e-commerce platform. We employ two sentiment classification methods as baselines for our new data set, one based on low-level features (character n-grams) and one based on high-level features (bag-of-word-embeddings generated by clustering word embeddings with k-means). As an additional contribution, we replace the k-means clustering algorithm with self-organizing maps (SOMs), obtaining better results because the generated clusters of word embeddings are closer to the Zipf’s law distribution, which is known to govern natural language. We also demonstrate the generalization capacity of using SOMs for the clustering of word embeddings on another recently-introduced Romanian data set, for text categorization by topic.</abstract>
4 changes: 2 additions & 2 deletions data/xml/2021.vardial.xml
@@ -22,7 +22,7 @@
<paper id="1">
<title>Findings of the <fixed-case>V</fixed-case>ar<fixed-case>D</fixed-case>ial Evaluation Campaign 2021</title>
<author><first>Bharathi Raja</first><last>Chakravarthi</last></author>
-<author><first>Gaman</first><last>Mihaela</last></author>
+<author><first>Mihaela</first><last>Găman</last></author>
<author><first>Radu Tudor</first><last>Ionescu</last></author>
<author><first>Heidi</first><last>Jauhiainen</last></author>
<author><first>Tommi</first><last>Jauhiainen</last></author>
@@ -121,7 +121,7 @@
</paper>
<paper id="10">
<title><fixed-case>U</fixed-case>nibuc<fixed-case>K</fixed-case>ernel: Geolocating <fixed-case>S</fixed-case>wiss <fixed-case>G</fixed-case>erman Jodels Using Ensemble Learning</title>
-<author><first>Gaman</first><last>Mihaela</last></author>
+<author><first>Mihaela</first><last>Găman</last></author>
<author><first>Sebastian</first><last>Cojocariu</last></author>
<author><first>Radu Tudor</first><last>Ionescu</last></author>
<pages>84–95</pages>
8 changes: 4 additions & 4 deletions data/xml/2022.starsem.xml
@@ -291,12 +291,12 @@
</paper>
<paper id="25">
<title>Speech acts and Communicative Intentions for Urgency Detection</title>
-<author><first>Laurenti</first><last>Enzo</last></author>
-<author><first>Bourgon</first><last>Nils</last></author>
+<author><first>Enzo</first><last>Laurenti</last></author>
+<author><first>Nils</first><last>Bourgon</last></author>
<author><first>Farah</first><last>Benamara</last></author>
-<author><first>Mari</first><last>Alda</last></author>
+<author><first>Alda</first><last>Mari</last></author>
<author><first>Véronique</first><last>Moriceau</last></author>
-<author><first>Courgeon</first><last>Camille</last></author>
+<author><first>Camille</first><last>Courgeon</last></author>
<pages>289-298</pages>
<abstract>Recognizing speech acts (SA) is crucial for capturing meaning beyond what is said, making communicative intentions particularly relevant to identify urgent messages. This paper attempts to measure for the first time the impact of SA on urgency detection during crises,006in tweets. We propose a new dataset annotated for both urgency and SA, and develop several deep learning architectures to inject SA into urgency detection while ensuring models generalisability. Our results show that taking speech acts into account in tweet analysis improves information type detection in an out-of-type configuration where models are evaluated in unseen event types during training. These results are encouraging and constitute a first step towards SA-aware disaster management in social media.</abstract>
<url hash="b167524b">2022.starsem-1.25</url>
8 changes: 4 additions & 4 deletions data/xml/2023.acl.xml
@@ -4360,10 +4360,10 @@
<video href="2023.acl-long.307.mp4"/>
</paper>
<paper id="308">
-<title>No clues good clues: out of context Lexical Relation Classification</title>
-<author orcid="0000-0002-6734-8808"><first>Lucia</first><last>Pitarch</last><affiliation>University of Zaragoza</affiliation></author>
-<author orcid="0000-0001-8531-353X"><first>Jordi</first><last>Bernad</last><affiliation>University of Zaragoza</affiliation></author>
-<author orcid="0000-0002-9169-5287"><first>Lacramioara</first><last>Dranca</last><affiliation>Centro Universitario de la Defensa</affiliation></author>
+<title>No clues, good clues: Out of context Lexical Relation Classification</title>
+<author orcid="0000-0002-6734-8808"><first>Lucía</first><last>Pitarch</last><affiliation>University of Zaragoza</affiliation></author>
+<author orcid="0000-0001-8531-353X"><first>Jorge</first><last>Bernad</last><affiliation>University of Zaragoza</affiliation></author>
+<author orcid="0000-0002-9169-5287"><first>Licri</first><last>Dranca</last><affiliation>Centro Universitario de la Defensa</affiliation></author>
<author orcid="0000-0003-4239-8785"><first>Carlos</first><last>Bobed Lisbona</last><affiliation>University of Zaragoza, Spain</affiliation></author>
<author orcid="0000-0001-6452-7627"><first>Jorge</first><last>Gracia</last><affiliation>University of Zaragoza</affiliation></author>
<pages>5607-5625</pages>