This repository was archived by the owner on Nov 6, 2025. It is now read-only.

Results with custom dataset #112

@aaronbriel

Description

Hello!

First of all, thank you again for your incredible contributions: not only this dataset but, most importantly, the Haystack toolset!

I was able to closely approximate the results of your paper when running https://github.com/deepset-ai/FARM/blob/master/examples/question_answering_crossvalidation.py, although I had to reduce batch_size to 25 to prevent `RuntimeError: CUDA out of memory. Tried to allocate 540.00 MiB (GPU 0; 15.78 GiB total capacity; 14.29 GiB already allocated; 386.75 MiB free; 14.35 GiB reserved in total by PyTorch)`. This was on an Ubuntu 18.04 VM with a Tesla V100 GPU and 128 GB of disk space. As mentioned, the results obtained were quite close:
```
XVAL EM: 0.26151560178306094
XVAL f1: 0.5858967501101285
```
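In case it helps anyone else hitting the same OOM, what I did by hand amounts to halving the batch size until the run fits in GPU memory. A generic sketch (not FARM-specific; `train_fn` is a hypothetical stand-in for one training run):

```python
def run_with_oom_backoff(train_fn, batch_size=50, min_batch_size=1):
    """Halve the batch size until training fits in GPU memory.

    train_fn is a hypothetical callable that runs one training job with the
    given batch size and raises RuntimeError("CUDA out of memory ...") on OOM.
    """
    while batch_size >= min_batch_size:
        try:
            return train_fn(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # unrelated error, don't swallow it
            batch_size //= 2  # retry with half the batch
    raise RuntimeError("could not fit even min_batch_size in GPU memory")
```

With the 16 GB V100 above, starting at 50 this lands on 25 after one retry.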

I created a custom Covid-19 dataset that combines a preprocessed/cleansed subset of the dataset from the paper "Collecting Verified COVID-19 Question Answer Pairs" (Poliak et al., 2020) with a SQuADified version of your dataset, faq_covidbert.csv. For the latter, I used your annotation tool to map questions to chunks in the answers, treating the full answers as contexts.
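For reference, merging the two SQuAD-format sources just meant concatenating their top-level `data` lists. A minimal sketch (the file names in the usage comment are hypothetical):

```python
import json

def merge_squad_files(paths, out_path):
    """Concatenate the `data` lists of several SQuAD-format JSON files."""
    merged = {"version": "v2.0", "data": []}
    for path in paths:
        with open(path) as f:
            merged["data"].extend(json.load(f)["data"])
    with open(out_path, "w") as f:
        json.dump(merged, f)

# Hypothetical file names for the two sources described above:
# merge_squad_files(["poliak_covid_qa.json", "faq_covidbert_squad.json"],
#                   "covid_combined_train.json")
```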

I trained a model on this dataset using the hyperparameters you specify here: https://huggingface.co/deepset/roberta-base-squad2-covid#hyperparameters. Informal tests with various Covid-19-related questions indicate that my model generates better responses than roberta-base-squad2-covid, which isn't surprising, as inspection of both datasets reveals that mine contains far more Covid-19-specific questions and answers.

However, when running question_answering_crossvalidation.py with my dataset, the metrics are not as good as those observed with your dataset, or even with the baseline referenced in the paper. Here are the EM and F1 scores I obtained with my dataset:
```
XVAL EM: 0.21554054054054053
XVAL f1: 0.4432141443807887
```
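For clarity on what these numbers mean: EM and F1 here are the standard SQuAD-style metrics, computed per prediction after text normalization. A simplified sketch assuming a single gold answer per question (FARM's actual implementation may differ in details):

```python
import re
import string
from collections import Counter

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

So F1 rewards partial token overlap with the gold span, which is why it sits well above EM in both runs.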

Can you provide any insight as to why this would be the case? Thank you so much!
