fix: successor_liability task gets evaluated on balanced_accuracy instead of F1 #44

Open
KensingtonOscupant wants to merge 1 commit into HazyResearch:main from KensingtonOscupant:fix/successor_liability_eval

@KensingtonOscupant

Hi there!

I was working on something and noticed that when using evaluate() from evaluation.py, the successor_liability task gets evaluated on balanced accuracy. I believe it is meant to be evaluated on F1, cf. section 5.1.3 of the paper.

The mismatch happens because successor_liability is an item in the EXACT_MATCH_BALANCED_ACC_TASKS list in evaluation.py. Since evaluate() checks membership in that list first, the task triggers an unintended early return at the top of the function:

def evaluate(task: str, generations: List[str], answers: List[str]):

    if task in EXACT_MATCH_BALANCED_ACC_TASKS:
        return evaluate_exact_match_balanced_accuracy(generations, answers) # successor_liability already returns here
    elif task == "sara_numeric":
        return evaluate_sara_numeric_acc(generations, answers)
    elif task == "successor_liability":
        return evaluate_successor_liability(generations, answers) # supposed to return here
    elif task == "citation_prediction_open":
        return evaluate_citation_open(generations, answers)
    elif task == "definition_extraction":
        return evaluate_definition_extraction(generations, answers)
    elif task.startswith("ssla"):
        return evaluate_ssla(generations, answers)
    elif task in MANUAL_EVAL_TASKS:
        raise Exception("This task needs to be manually evaluated:", task)
    else:
        raise Exception(f"Unknown task: {task}")

So for successor_liability, the branch that calls evaluate_successor_liability(generations, answers) is never reached.
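The shadowing can be reproduced in isolation with a minimal sketch. The two metric functions below are hypothetical stubs that only return the name of the metric they stand for, the list contents are illustrative, and the extra `balanced_acc_tasks` parameter exists only so both cases can be run side by side; none of this is the actual code in evaluation.py. Dropping the task from the list is shown as one possible way to let the dedicated branch run.

```python
from typing import List


# Hypothetical stubs: each returns only the name of the metric it represents.
def evaluate_exact_match_balanced_accuracy(generations: List[str], answers: List[str]) -> str:
    return "balanced_accuracy"


def evaluate_successor_liability(generations: List[str], answers: List[str]) -> str:
    return "f1"


def evaluate(task: str, generations: List[str], answers: List[str],
             balanced_acc_tasks: List[str]) -> str:
    # Same branch order as the excerpt above, reduced to the two relevant branches.
    # The task list is passed in (an assumption made for this sketch) so that
    # both list variants can be compared.
    if task in balanced_acc_tasks:
        return evaluate_exact_match_balanced_accuracy(generations, answers)
    elif task == "successor_liability":
        return evaluate_successor_liability(generations, answers)
    else:
        raise Exception(f"Unknown task: {task}")


# Illustrative list contents; "some_other_task" is a placeholder name.
list_with_task = ["some_other_task", "successor_liability"]
list_without_task = ["some_other_task"]

# With the task in the list, the first branch wins and the dedicated one is skipped.
shadowed = evaluate("successor_liability", ["x"], ["x"], list_with_task)

# With the task removed from the list, the dedicated branch is reached.
reached = evaluate("successor_liability", ["x"], ["x"], list_without_task)

print(shadowed)
print(reached)
```

Reordering the branches so that the task-specific check comes before the list membership check would have the same effect in this sketch.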

I hope this helps!
