This repo holds the dataset and source code described in the paper below:
- Pedro H. Luz de Araujo, Teófilo E. de Campos, Marcelo M. Silva de Sousa.
  Inferring the source of official texts: can SVM beat ULMFiT?
  International Conference on the Computational Processing of Portuguese (PROPOR), Évora, Portugal, March 2-4, 2020.
We kindly request that you cite our paper in any publication resulting from the use of our code or dataset.
A snapshot of the code that was used to generate the results of the paper above is available from the static page of this project at https://cic.unb.br/~teodecampos/KnEDLe/propor2020.
The pre-trained language model used in this work was not originally released with its tokenizer model and vocabulary data, so our fine-tuned model and classifier could not leverage subword embeddings trained on general-domain Portuguese data. This has since been amended, and we re-ran all experiments using the pre-trained vocabulary data. This repo contains the updated ULMFiT training notebook and the updated results.
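For orientation, loading pre-trained weights together with their released vocabulary in fastai v1 looks roughly like the sketch below. This is a minimal illustration under assumed names, not the notebook's exact code: the file names (`pt_wt`, `pt_itos`), data paths, column names, and hyperparameters are placeholders, and the notebook itself works with the released subword vocabulary rather than fastai's default tokenization.

```python
# Illustrative sketch (not the notebook's exact code): fine-tuning a pre-trained
# language model with its released weights and vocabulary using fastai v1.
# File, path, and column names below are placeholders.
import pandas as pd
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

train_df = pd.read_csv('train.csv')  # placeholder path
valid_df = pd.read_csv('valid.csv')  # placeholder path

# Build a language-model data bunch from the text column.
data_lm = TextLMDataBunch.from_df('.', train_df, valid_df, text_cols='text')

# `pretrained_fnames` expects ./models/pt_wt.pth (weights) and
# ./models/pt_itos.pkl (vocabulary); fastai remaps the pre-trained
# embeddings onto the corpus vocabulary when loading.
learn = language_model_learner(
    data_lm, AWD_LSTM,
    pretrained=False,
    pretrained_fnames=['pt_wt', 'pt_itos'],  # placeholder file names
    drop_mult=0.3,
)

# Gradual fine-tuning: train the new head first, then unfreeze everything.
learn.fit_one_cycle(1, 1e-2)
learn.unfreeze()
learn.fit_one_cycle(2, 1e-3)
learn.save_encoder('fine_tuned_enc')  # encoder later reused by the classifier
```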
- Download the pre-trained language model and place it in a model directory at the repository root
- Run `train_ulmfit.ipynb`
- Run `train_baseline.ipynb` (an illustrative sketch of such a baseline follows below)
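As a rough picture of the kind of bag-of-words SVM baseline compared against ULMFiT in the paper, a TF-IDF + linear SVM pipeline can be sketched as follows. This is a minimal illustration, not the notebook's exact pipeline; the CSV paths, column names, and hyperparameters are assumptions.

```python
# Minimal illustration of a TF-IDF + linear SVM text-classification baseline.
# Paths, column names, and hyperparameters are placeholders, not the values
# used in train_baseline.ipynb.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

train_df = pd.read_csv('train.csv')  # placeholder path
test_df = pd.read_csv('test.csv')    # placeholder path

pipeline = Pipeline([
    # Word-level TF-IDF features over the document text.
    ('tfidf', TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    # Linear SVM predicting the source (label) of each text.
    ('svm', LinearSVC(C=1.0)),
])

pipeline.fit(train_df['text'], train_df['label'])
preds = pipeline.predict(test_df['text'])
print(classification_report(test_df['label'], preds))
```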