Code for our paper Table-based Fact Verification with Salience-aware Learning at EMNLP 2021 Findings.
pip install -r requirements.txt
Install pytorch_scatter.
We conduct experiments on the TabFact dataset. The statements in the officially released train/val/test sets are lemmatized; we use the raw (unlemmatized) statements instead. More discussion can be found in this issue.
Download the train/val/test set to ./data.
Download the table set to ./data/tables.
To convert raw data to model inputs:
cd data
python preprocess.py

cd token_salience
- First, run `bash run_origin.sh` to get predictions for original inputs.
- Second, run `bash run_masked.sh` to get predictions for inputs with masked tokens.
- Third, run `python calculate_salience.py` to get salience scores by comparing the outputs of the last two steps.
- Finally, run `python add_salience_to_data.py` to merge the salience scores into the input data.
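
The comparison in the third step can be pictured with a minimal sketch. It assumes that salience is measured as the drop in the model's probability for the gold label when a single token is masked; the actual scoring in `calculate_salience.py` may differ. The function name `token_salience` and all numbers below are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def token_salience(orig_prob: float, masked_probs: Sequence[float]) -> List[float]:
    """Return one score per token: the drop in gold-label probability
    when that token is masked (assumed definition, not taken from the repo)."""
    return [orig_prob - p for p in masked_probs]


# Hypothetical statement and made-up probabilities, for illustration only.
tokens = ["the", "team", "won", "3", "games"]
orig_prob = 0.90                                # prediction on the original input
masked_probs = [0.89, 0.70, 0.60, 0.20, 0.85]   # prediction with token i masked

scores = token_salience(orig_prob, masked_probs)

# The token whose masking hurts the prediction most is the most salient one.
most_salient = tokens[max(range(len(tokens)), key=scores.__getitem__)]
print(most_salient)
```

Under this assumed definition, a token whose removal barely changes the prediction gets a score near zero and is treated as non-salient.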
cd token_replacement
- First, run `bash run_mlm.sh` to get predictions for replacing non-salient tokens.
- Second, run `python add_token_replacement.py` to merge the token replacement candidates into the input data.
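
As a rough illustration of what replacing non-salient tokens could look like, the sketch below swaps in masked-language-model candidates for tokens whose salience falls under a threshold. The helper `replace_non_salient`, the threshold value, and the example data are all assumptions made for this sketch; they are not taken from `run_mlm.sh` or `add_token_replacement.py`.

```python
from typing import Dict, List, Sequence


def replace_non_salient(
    tokens: Sequence[str],
    salience: Sequence[float],
    candidates: Dict[int, str],
    threshold: float = 0.05,  # assumed cut-off, not a value from the repo
) -> List[str]:
    """Replace token i with its candidate when its salience is below the
    threshold and a candidate exists; keep all other tokens unchanged."""
    return [
        candidates[i] if salience[i] < threshold and i in candidates else tok
        for i, tok in enumerate(tokens)
    ]


# Hypothetical inputs, for illustration only.
tokens = ["the", "team", "won", "3", "games"]
salience = [0.01, 0.20, 0.30, 0.70, 0.02]
candidates = {0: "a", 4: "matches"}  # position -> MLM-suggested replacement

augmented = replace_non_salient(tokens, salience, candidates)
print(" ".join(augmented))
```

Salient tokens such as the number are left untouched, so the augmented statement should keep the same label as the original.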
cd joint_model
bash run_joint_model.sh

@inproceedings{wang-etal-2021-table-based,
title = "Table-based Fact Verification With Salience-aware Learning",
author = "Wang, Fei and
Sun, Kexuan and
Pujara, Jay and
Szekely, Pedro and
Chen, Muhao",
booktitle = "EMNLP - findings",
year = "2021",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-emnlp.338",
pages = "4025--4036"
}