This repository contains the source code for a fact verification task based on LLM-generated synthetic evidence.
Run generated_sentence_creation.py to generate the LLM synthetic data, then convert its pickle output to .csv format with generation_pickle_to_csv_convert.py.
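The pickle-to-csv conversion can be sketched as below. This is a minimal illustration, not the actual generation_pickle_to_csv_convert.py: the internal layout of the pickle (a list of dicts with a claim_id and a generated sentence) and the column names are assumptions.

```python
import csv
import pickle


def pickle_to_csv(pickle_path, csv_path):
    """Convert pickled LLM generations to a flat .csv file.

    Assumed layout (hypothetical): the pickle stores a list of dicts,
    each carrying a "claim_id" and a "generated_sentence" key.
    """
    with open(pickle_path, "rb") as f:
        records = pickle.load(f)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["claim_id", "generated_sentence"])
        writer.writeheader()
        writer.writerows(records)
```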
First, create a .csv file containing claim_id, claim, and label from the annotated FEVER .jsonl files.
Then run generated_sentence_creation.py to produce the LLM-generated synthetic data.
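Extracting that .csv from the FEVER .jsonl files can be sketched as follows. This is an illustrative helper, not part of the repository; it assumes each FEVER record carries "id", "claim", and "label" fields.

```python
import csv
import json


def fever_jsonl_to_csv(jsonl_path, csv_path):
    """Extract claim_id, claim, and label from an annotated FEVER .jsonl file."""
    with open(jsonl_path, encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["claim_id", "claim", "label"])
        for line in src:
            record = json.loads(line)
            # Assumed field names: FEVER annotations expose "id", "claim", "label".
            writer.writerow([record["id"], record["claim"], record["label"]])
```

Run it once per split (train, test, validation) to get the three .csv files used later in the pipeline.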
Using these two sets of .csv files (train, test, and validation), run create_filtered_data.py to build the dataset required by the BERT_FSD model.
For the BERT_ER model, likewise create a .csv file containing claim_id, claim, and label from the annotated FEVER .jsonl files.
Then run wiki_chunk_wise_data.py to build the dataset required by the BERT_ER model.
For training, run training_with_LLM_generated_synthetic_data.py.
After training, run prediction.py to evaluate the saved checkpoints and select the best one according to the dev-set results.
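Selecting the best checkpoint from the dev-set evaluations amounts to an argmax over per-checkpoint scores. A minimal sketch, assuming you have collected a mapping from checkpoint path to a dev metric (the metric name and paths are hypothetical, produced by running prediction.py on each checkpoint):

```python
def best_checkpoint(dev_scores):
    """Return the checkpoint path with the highest dev-set score.

    dev_scores: dict mapping checkpoint path -> dev metric
    (e.g. label accuracy); both keys and metric are assumptions.
    """
    return max(dev_scores, key=dev_scores.get)
```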