Bug Report: `total_score` calculation ignores partial metrics in `get_final_score` #77

## Summary
In `evaluation/utils.py`, the `get_final_score` function calculates four distinct metrics (`skill_match_score`, `entity_match_score`, `skill_with_entity_match_score`, `exact_match_score`). However, when computing the weighted `total_score`, the code iterates only over the keys present in the `skill_entity_scores` dictionary. This causes `skill_with_entity_match_score` (10% weight) and `exact_match_score` (10% weight) to be **completely ignored**.

As a result, the maximum possible score for any task is capped at **80.0** instead of 100.0, and models are not rewarded for correct structural dependencies or joint skill-entity matching.

## Code Analysis
The issue is located in `evaluation/utils.py`:

```python
# ... lines 319-323
skill_entity_scores = calculate_skill_and_entity_scores(standard_skill_sequence, model_skill_sequence)
# This dict ONLY contains: ['skill_match_score', 'entity_match_score']

skill_with_entity_scores = calculate_skill_with_entity_scores(standard_skill_sequence, model_skill_sequence)
# This is a separate variable

exact_match_score = get_exact_match(standard_skill_sequence, model_skill_sequence, dependency)
# This is a separate variable

score_weight = {
    "skill_match_score": 0.4,
    "entity_match_score": 0.4,
    "skill_with_entity_match_score": 0.1,
    "exact_match_score": 0.1
}

# BUG HERE: This loop only iterates over keys in `skill_entity_scores`,
# effectively ignoring the other two metrics computed above.
total_score = sum(score_weight[key] * value for key, value in skill_entity_scores.items())
```

## Reproduction Steps
1.  Run evaluation on any task where `skill_with_entity_match_score > 0`.
2.  Observe the generated `output.json` or `final_score.json`.
3.  Manually calculate the weighted sum: `0.4*skill + 0.4*entity + 0.1*joint + 0.1*exact`.
4.  Compare it with the logged `total_score`.

**Example:**
If a model gets:
- Skill Match: 100 (weighted 40)
- Entity Match: 0 (weighted 0)
- Skill+Entity Match: 50 (weighted 5)
- Exact Match: 0 (weighted 0)

**Expected Score:** 45.0 (`0.4*100 + 0.4*0 + 0.1*50 + 0.1*0`)

**Actual Score:** 40.0 (only the skill and entity terms are summed)
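
The example can be checked with a short, self-contained script. It does not import the project code: the weights are copied from `get_final_score`, the metric values are the ones from the example, and the shape of `skill_with_entity_scores` (a dict keyed by `skill_with_entity_match_score`) is an assumption carried over from the suggested fix rather than something verified against the helper.

```python
# Weights copied from get_final_score.
score_weight = {
    "skill_match_score": 0.4,
    "entity_match_score": 0.4,
    "skill_with_entity_match_score": 0.1,
    "exact_match_score": 0.1,
}

# Stand-ins for the three helper results, using the example values.
skill_entity_scores = {"skill_match_score": 100, "entity_match_score": 0}
skill_with_entity_scores = {"skill_with_entity_match_score": 50}  # assumed shape
exact_match_score = 0

# Current behaviour: only the keys of skill_entity_scores are summed.
actual = sum(score_weight[k] * v for k, v in skill_entity_scores.items())

# Intended behaviour: all four weighted metrics contribute.
expected = actual + (
    score_weight["skill_with_entity_match_score"]
    * skill_with_entity_scores["skill_with_entity_match_score"]
    + score_weight["exact_match_score"] * exact_match_score
)

print(f"actual={actual:.1f} expected={expected:.1f}")
```

Run as-is, this prints `actual=40.0 expected=45.0`, matching the discrepancy described here.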

## Suggested Fix
Explicitly sum all weighted components instead of iterating through a partial dictionary.

```python
    total_score = (
        skill_entity_scores["skill_match_score"] * score_weight["skill_match_score"] +
        skill_entity_scores["entity_match_score"] * score_weight["entity_match_score"] +
        skill_with_entity_scores["skill_with_entity_match_score"] * score_weight["skill_with_entity_match_score"] +
        exact_match_score * score_weight["exact_match_score"]
    )
```
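
A possible alternative, sketched here in case an explicit four-term sum feels brittle: merge the three results into one dict and check it against `score_weight` before summing, so that a metric missing from the total raises an error instead of silently dropping out. The helper name `weighted_total` is hypothetical, and the shapes of `skill_with_entity_scores` (a dict) and `exact_match_score` (a plain number) are assumed from the snippet above, not verified against the repository.

```python
SCORE_WEIGHT = {
    "skill_match_score": 0.4,
    "entity_match_score": 0.4,
    "skill_with_entity_match_score": 0.1,
    "exact_match_score": 0.1,
}


def weighted_total(skill_entity_scores, skill_with_entity_scores,
                   exact_match_score, score_weight=SCORE_WEIGHT):
    """Hypothetical helper: weighted sum over every metric that has a weight."""
    # Collect all metrics in one place (dict shapes are assumptions).
    all_scores = {
        **skill_entity_scores,
        **skill_with_entity_scores,
        "exact_match_score": exact_match_score,
    }
    # Fail loudly if any weighted metric was not supplied.
    missing = score_weight.keys() - all_scores.keys()
    if missing:
        raise KeyError(f"metrics missing from total_score: {sorted(missing)}")
    # Iterate over the weights, not the scores, so no weighted metric is skipped.
    return sum(weight * all_scores[name] for name, weight in score_weight.items())


# A perfect result on every metric.
perfect = weighted_total(
    {"skill_match_score": 100, "entity_match_score": 100},
    {"skill_with_entity_match_score": 100},
    100,
)
print(round(perfect, 6))
```

With every metric at 100 the total comes out at 100 rather than the current cap of 80. Iterating over `score_weight` instead of a score dict is the design point: the weights define which metrics must be present.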

