Conversation

@juanmichelini
Problem

GAIA evaluations were failing to create PRs because cost report generation was crashing with:

'NoneType' object has no attribute 'get'

This prevented cost_report.jsonl from being created, which caused push_to_index.py to fail and prevented PR creation.

Root Cause

The extract_accumulated_cost() and calculate_line_duration() functions in benchmarks/utils/report_costs.py were not handling null entries in JSONL files. GAIA evaluations can have null entries in their output files, which caused the functions to crash when trying to call .get() on None objects.

Solution

  1. Added null checks in both functions to skip null entries
  2. Updated type hints to reflect that JSONL entries can be None
  3. Added graceful handling of mixed valid/null entries in JSONL files

Code Changes

def extract_accumulated_cost(entry: dict[str, Any] | None) -> float:
    if entry is None:  # Skip null entries
        return 0.0
    # ... rest of function

def calculate_line_duration(entry: dict[str, Any] | None) -> float:
    if entry is None:  # Skip null entries  
        return 0.0
    # ... rest of function
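
For illustration, here is a self-contained sketch of the guarded extraction end to end. The "metrics"/"accumulated_cost" key names are assumptions based on the error path described in this PR, not a verbatim copy of report_costs.py:

import json
from typing import Any

def extract_accumulated_cost(entry: dict[str, Any] | None) -> float:
    """Return the accumulated cost for one JSONL entry, or 0.0 for null entries."""
    if entry is None:  # a literal JSON `null` line parses to None
        return 0.0
    # Assumed field layout: {"metrics": {"accumulated_cost": <float>}}
    return float(entry.get("metrics", {}).get("accumulated_cost", 0.0))

print(extract_accumulated_cost(json.loads("null")))                       # 0.0
print(extract_accumulated_cost({"metrics": {"accumulated_cost": 0.42}}))  # 0.42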

Impact

  • ✅ Fixes GAIA evaluation PR creation failures
  • ✅ Handles mixed valid/null entries in any benchmark JSONL files
  • ✅ Maintains backward compatibility with existing benchmarks
  • ✅ Restores cost report generation functionality

Testing

Tested with GAIA evaluation data containing null entries - cost reports now generate successfully.
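
A minimal regression-test sketch of that behavior; the test name is hypothetical and the import path follows benchmarks/utils/report_costs.py mentioned above:

from benchmarks.utils.report_costs import (
    calculate_line_duration,
    extract_accumulated_cost,
)

def test_null_jsonl_entries_are_skipped():
    # json.loads("null") yields None; both helpers should treat it as zero.
    assert extract_accumulated_cost(None) == 0.0
    assert calculate_line_duration(None) == 0.0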

Fixes the issue where GAIA evaluations sent Slack notifications but failed to create PRs due to cost report generation errors.

- Add null checks in extract_accumulated_cost() and calculate_line_duration()
- Update type hints to reflect that JSONL entries can be None
- Fixes 'NoneType' object has no attribute 'get' error that was preventing
  cost_report.jsonl generation for GAIA evaluations
- This was causing push_to_index.py to fail and preventing PR creation

Co-authored-by: openhands <[email protected]>
@simonrosenberg

GAIA evaluations can have null entries in their output files

Perhaps a fix is also removing null entries from the output files?
@OpenHands why do GAIA output files have null entries? Don't make any changes, but explain what and why, and suggest the best fix possible here.

@openhands-ai

openhands-ai bot commented Jan 9, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Jan 9, 2026

Why GAIA Output Files Have Null Entries

Based on my analysis of the codebase, here's what's happening:

The Technical Issue

When report_costs.py reads JSONL files, it uses json.loads(line.strip()) for each line. If a line contains the literal JSON value null, Python's json.loads("null") returns None. Then when extract_accumulated_cost() tries to call entry.get("metrics", {}) on a None object, it crashes with 'NoneType' object has no attribute 'get'.
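
A two-line reproduction of that failure mode:

import json

entry = json.loads("null")  # a literal `null` line parses to None
entry.get("metrics", {})    # AttributeError: 'NoneType' object has no attribute 'get'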

Why Null Entries Exist

After examining the codebase thoroughly, I cannot definitively identify the source of null entries in the code. All write operations use EvalOutput.model_dump_json() which should always produce valid JSON objects, not null.

Possible sources include:

  1. Race conditions in parallel execution - Though the code uses file locking, edge cases in multi-process writes could cause issues
  2. Interrupted writes - If a process is killed mid-write, partial data could be left
  3. External modification - Manual editing or other tools modifying the files
  4. Edge cases in error handling - Some unidentified code path that writes unexpected data

Best Fix Approach

There are two complementary strategies:

1. Defensive Reading (Current PR)
The current PR fix is good - it handles null entries gracefully when reading files. This is a necessary safety net.

2. Remove Null Entries from Output Files ✅ (Your suggestion)
This is also a valid approach and would be a good addition. You could:

  • Filter out null entries during the aggregate_results phase in iterative.py
  • Or add a cleanup step in read_jsonl_file that filters nulls before returning

Recommended Combined Fix:

# In read_jsonl_file - filter nulls at read time
import json
from pathlib import Path
from typing import Dict, List


def read_jsonl_file(file_path: Path) -> List[Dict]:
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            # Parse non-empty lines, then filter out null entries at read time
            entries = (json.loads(line) for line in f if line.strip())
            return [entry for entry in entries if entry is not None]
    except Exception as e:
        print(f"Error reading {file_path}: {e}")
        return []

This approach:

  • Removes null entries at the source (when reading)
  • Keeps the downstream code clean (no null checks needed everywhere)
  • Is more robust than checking for nulls in every function

The current PR's approach of checking for None in each function is also valid but more verbose. Either approach works - the key is ensuring null entries don't cause crashes.

No code changes were made as you requested - this is just an explanation of the issue and suggested fixes.

@simonrosenberg left a comment

Too bad we don't know why there are null entries, but I guess it's not P0.

@juanmichelini merged commit 881f91f into main on Jan 9, 2026
3 checks passed