fix: successor_liability task gets evaluated on balanced_accuracy instead of F1 by KensingtonOscupant · Pull Request #44 · HazyResearch/legalbench

KensingtonOscupant · 2025-06-24T12:05:24Z

Hi there!

I was working on something and noticed that when using evaluate() from evaluation.py, the successor_liability task gets evaluated on balanced accuracy. I believe it is meant to be evaluated on F1, cf. section 5.1.3 of the paper.
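To illustrate why the choice of metric matters, here is a small sketch that computes both scores by hand on a toy binary example. It uses the textbook binary definitions of balanced accuracy and F1, not the code in evaluation.py, and the labels are made up for illustration.

```python
# Toy binary labels (invented for illustration; 1 = positive class).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

# Balanced accuracy: mean of recall on the positive and negative class.
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
balanced_accuracy = (recall + specificity) / 2

# F1: harmonic mean of precision and recall on the positive class.
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"balanced accuracy: {balanced_accuracy:.3f}")
print(f"F1: {f1:.3f}")
```

The same predictions get noticeably different scores under the two metrics, so numbers reported for a task are only comparable when the same metric is used.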

The mismatch happens because successor_liability is listed in EXACT_MATCH_BALANCED_ACC_TASKS in evaluation.py. Since that list is checked first, evaluate() returns early at the top of the function:

def evaluate(task: str, generations: List[str], answers: List[str]):

    if task in EXACT_MATCH_BALANCED_ACC_TASKS:
        return evaluate_exact_match_balanced_accuracy(generations, answers) # successor_liability already returns here
    elif task == "sara_numeric":
        return evaluate_sara_numeric_acc(generations, answers)
    elif task == "successor_liability":
        return evaluate_successor_liability(generations, answers) # supposed to return here
    elif task == "citation_prediction_open":
        return evaluate_citation_open(generations, answers)
    elif task == "definition_extraction":
        return evaluate_definition_extraction(generations, answers)
    elif task.startswith("ssla"):
        return evaluate_ssla(generations, answers)
    elif task in MANUAL_EVAL_TASKS:
        raise Exception("This task needs to be manually evaluated:", task)
    else:
        raise Exception(f"Unknown task: {task}")

So for successor_liability, the first branch always matches and return evaluate_successor_liability(generations, answers) is never reached.
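The shadowing can be reproduced in isolation with a minimal sketch. The two evaluator functions below are hypothetical stubs that only report which branch ran, the task list is passed in as a parameter so both states can be compared, and the list contents other than successor_liability are placeholders.

```python
from typing import List


# Hypothetical stand-ins for the real evaluators in evaluation.py.
# They only return a label so we can see which branch was taken.
def balanced_acc_stub(generations: List[str], answers: List[str]) -> str:
    return "balanced_accuracy"


def successor_liability_stub(generations: List[str], answers: List[str]) -> str:
    return "f1"


def evaluate(task: str, generations: List[str], answers: List[str],
             balanced_acc_tasks: List[str]) -> str:
    # Same branch order as the snippet above: the list check comes first.
    if task in balanced_acc_tasks:
        return balanced_acc_stub(generations, answers)
    elif task == "successor_liability":
        return successor_liability_stub(generations, answers)
    else:
        raise Exception(f"Unknown task: {task}")


# Placeholder task lists: before and after removing successor_liability.
before_fix = ["some_other_task", "successor_liability"]
after_fix = [t for t in before_fix if t != "successor_liability"]

metric_before = evaluate("successor_liability", [], [], before_fix)
metric_after = evaluate("successor_liability", [], [], after_fix)

print(metric_before)
print(metric_after)
```

Removing the task from the list is enough to let the dedicated branch run; the branch order itself does not need to change.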

I hope this helps!

removed successor_liability from EXACT_MATCH_BALANCED_ACC_TASKS list

09ef191

KensingtonOscupant wants to merge 1 commit into HazyResearch:main from KensingtonOscupant:fix/successor_liability_eval
