Automated EIA baseline comparison #554

jmythms · 2025-11-17T23:30:27Z

This PR addresses #517 by:

- Adding an EIA update helper script that compares mseg_res_com_cz.json entries against the EIA API for each combination of:
  - building class (residential, commercial)
  - fuel type (electricity, natural gas, distillate, other fuel)
  - end use (heating, cooling, lighting, etc.)
- Adding a GitHub Actions workflow that runs this script whenever new changes land on the master branch.
- Updating the documentation to reflect these changes.
- Adding a .env variable file, updating .gitignore, and adding a GitHub secret so CI can call the EIA API.

- Introduced a GitHub Actions workflow for EIA update checks.
- Added a new script for comparing Scout microsegments with EIA AEO data.
- Updated documentation to include instructions for running the EIA update check.
- Enhanced .gitignore to exclude local environment variables.
- Included python-dotenv as a dependency for managing environment variables.
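The .env file, the .gitignore entry, and the CI secret usually fit together roughly as follows. This is a minimal sketch, assuming the key is stored under a variable named EIA_API_KEY (hypothetical; the PR may use a different name) and that .env sits in the current working directory; get_api_key is likewise an illustrative name. python-dotenv's load_dotenv is used only if the package is installed, so in CI the key can come straight from the environment variable populated by the GitHub secret.

```python
import os


def get_api_key(var_name: str = "EIA_API_KEY") -> str:
    """Return the EIA API key (EIA_API_KEY is a hypothetical variable name)."""
    env_path = os.path.join(os.getcwd(), ".env")
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
    except ImportError:
        load_dotenv = None

    # Locally: copy KEY=value pairs from .env into os.environ. By default
    # load_dotenv does not override variables that are already set, so a
    # value injected by CI takes precedence.
    if load_dotenv is not None and os.path.isfile(env_path):
        load_dotenv(env_path)

    api_key = os.environ.get(var_name)
    if not api_key:
        raise RuntimeError(
            f"{var_name} is not set: add it to .env locally "
            "or expose it as a secret in CI."
        )
    return api_key
```

Keeping .env in .gitignore means the real key never lands in the repository, while the same lookup works unchanged in CI.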

jmythms · 2025-11-18T13:58:37Z

scout/AEO_update_helpers/baseline_comparison.py

+    for bldg in uv.bldg_class_translator.keys():
+        for fuel in uv.fuel_type:
+            for end_use in uv.end_use_translator.keys():
+                filters = FilterStrings(bldg_class=bldg, fuel=fuel, end_use=end_use)
+                compare_one_combination(mseg, filters, year, verbose, uv, api_key)
+
+    # After all combinations, print summary information
+    print_rollups()
+    report_large_errors()
+    report_zero_division_cases()


Crux of this script: loop over each building class, then each fuel type, then each end use, and compare the mseg data against the EIA API for that combination. Then print the results and any errors.
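The same nested loop can be written with itertools.product. The sketch below also shows one plausible way to compute a per-year percent error while setting aside years where EIA reports zero (the kind of case report_zero_division_cases presumably covers). It is a minimal illustration, not the script's implementation: the category lists are abbreviated from the PR description (its end-use list ends in "etc."), and percent_errors, compare_all, and the error definition are assumptions.

```python
from itertools import product

# Hypothetical category lists mirroring the PR description; the real script
# reads these from its own translator/config objects.
BLDG_CLASSES = ["residential", "commercial"]
FUELS = ["electricity", "natural gas", "distillate", "other fuel"]
END_USES = ["heating", "cooling", "lighting"]


def percent_errors(json_by_year: dict, eia_by_year: dict):
    """Return ({year: % error of JSON vs. EIA}, [years where EIA is zero])."""
    errors, zero_years = {}, []
    for year, eia_val in eia_by_year.items():
        json_val = json_by_year.get(year, 0.0)
        if eia_val == 0:
            # Relative error is undefined; record the year instead of dividing.
            zero_years.append(year)
            continue
        errors[year] = 100.0 * (json_val - eia_val) / eia_val
    return errors, zero_years


def compare_all(get_json, get_eia):
    """Compare every (building class, fuel, end use) combination.

    get_json and get_eia are hypothetical callables that take
    (bldg, fuel, end_use) and return {year: value}.
    """
    results, zero_cases = {}, {}
    for combo in product(BLDG_CLASSES, FUELS, END_USES):
        errors, zero_years = percent_errors(get_json(*combo), get_eia(*combo))
        results[combo] = errors
        if zero_years:
            zero_cases[combo] = zero_years
    return results, zero_cases
```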

jmythms · 2025-11-18T14:01:10Z

scout/AEO_update_helpers/baseline_comparison.py

+    # 1. Aggregate JSON
+    json_dict = recursive_aggregate(mseg, filters, uv)


This script also relies on recursion, in the same way as the EIA update scripts. A future PR could unravel this recursion to make the code easier to understand.
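As a possible starting point for such a PR, here is a minimal sketch of a non-recursive traversal. An explicit stack walks the nested mseg dictionary and yields each "energy" leaf together with its key path, so filtering and summing can then happen in separate, flat functions. The function name iter_energy_leaves and the toy data are hypothetical. The key layout (climate zone / building type / fuel / end use / ... / energy) comes from the comment in recursive_aggregate.

```python
from typing import Any, Iterator


def iter_energy_leaves(data: dict[str, Any]) -> Iterator[tuple[list[str], Any]]:
    """Yield (path, leaf) for every "energy" entry in a nested dict.

    Uses an explicit stack instead of recursion. Each path lists the keys
    from the top level down to and including "energy"; the leaf itself
    (a {year: value} dict) is not descended into.
    """
    stack: list[tuple[list[str], dict]] = [([], data)]
    while stack:
        position, node = stack.pop()
        for key, value in node.items():
            path = position + [key]
            if key == "energy":
                yield path, value
            elif isinstance(value, dict):
                stack.append((path, value))


toy = {"AIA_CZ1": {"single family home": {"electricity": {
    "lighting": {"energy": {"2025": 1.0}},
    "heating": {"supply": {"energy": {"2025": 2.0}}},
}}}}
for path, leaf in iter_energy_leaves(toy):
    print(" / ".join(path), leaf)
```

Because the stack visits branches in a different order than the recursion, per-year sums built from these leaves can differ in the last floating-point digits. That is harmless for a percent comparison but worth knowing if results are ever checked for exact equality.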

jmythms · 2025-11-18T14:02:55Z

scout/AEO_update_helpers/baseline_comparison.py

+    # If EIA has data but the JSON aggregate is empty, treat this as a hard error.
+    if json_is_empty and eia_dict:
+        raise RuntimeError(
+            "EIA data exists but JSON aggregate is empty for "
+            f"{bldg} | {fuel} | {end_use} (series {series_id})."
+        )


Currently this case is never hit, but I left it as a hard error in case a JSON field is missing after a future EIA update.

jmythms · 2025-11-18T14:04:50Z

scout/AEO_update_helpers/baseline_comparison.py

+    bldg = filters.bldg_class
+    fuel = filters.fuel
+    end_use = filters.end_use
+
+    # Conditions used inside the ID string
+    if bldg == "residential":
+        condition_1 = uv.end_use_translator[end_use]
+        condition_2 = "NA"
+        if end_use == "heating" and fuel == "electricity":
+            condition_3 = "hhd"  # special case for electric heating
+        else:
+            condition_3 = "NA"
+    else:  # commercial
+        # Commercial electricity is labeled as "Purchased Electricity" in API
+        fuel_for_api = "Purchased Electricity" if fuel == "electricity" else fuel
+        condition_1 = "NA"
+        condition_2 = uv.end_use_translator[end_use]
+        condition_3 = "NA"
+        fuel = fuel_for_api
+
+    eia_series_id = (
+        f"cnsm_{condition_3}_"
+        f"{uv.bldg_class_translator[bldg]}_"
+        f"{condition_1}_"
+        f"{uv.fuel_type_translator[fuel]}_"
+        f"{condition_2}_usa_qbtu"
+    )
+
+    return eia_series_id


Secret sauce for building EIA API series IDs for Scout.

Can you add a comment specifying what conditions 1, 2, and 3 mean? Mostly so we don't need to go back to the EIA docs later on.

Added here 👍🏾
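For a worked example of the resulting string, here is the same construction pulled into a standalone function. The condition slots are annotated only as far as the diff shows: condition_1 carries the residential end use, condition_2 carries the commercial end use, and condition_3 is "hhd" only for residential electric heating. The translator tables are placeholders (<RES>, <ELEC>, and so on), not real EIA codes; the actual mappings live in the script's uv object.

```python
# Placeholder translator tables: the real codes come from the script's
# configuration and are NOT reproduced here.
BLDG_CODES = {"residential": "<RES>", "commercial": "<COM>"}
FUEL_CODES = {"electricity": "<ELEC>", "Purchased Electricity": "<PELEC>"}
END_USE_CODES = {"heating": "<HEAT>", "lighting": "<LIGHT>"}


def eia_series_id(bldg: str, fuel: str, end_use: str) -> str:
    """Build an EIA consumption series ID (format taken from the PR diff)."""
    if bldg == "residential":
        cond_1 = END_USE_CODES[end_use]  # residential: end use goes in slot 1
        cond_2 = "NA"
        # Slot 3 is only used to flag residential electric heating.
        is_elec_heat = end_use == "heating" and fuel == "electricity"
        cond_3 = "hhd" if is_elec_heat else "NA"
    else:
        # Commercial electricity is labeled "Purchased Electricity" in the API.
        if fuel == "electricity":
            fuel = "Purchased Electricity"
        cond_1 = "NA"
        cond_2 = END_USE_CODES[end_use]  # commercial: end use goes in slot 2
        cond_3 = "NA"
    return (
        f"cnsm_{cond_3}_{BLDG_CODES[bldg]}_{cond_1}_"
        f"{FUEL_CODES[fuel]}_{cond_2}_usa_qbtu"
    )


print(eia_series_id("residential", "electricity", "heating"))
print(eia_series_id("commercial", "electricity", "lighting"))
```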

jmythms · 2025-11-18T14:07:11Z

scout/AEO_update_helpers/baseline_comparison.py

+    if position is None:
+        position = []
+
+    energy_by_year: dict[str, float] = {}
+
+    for key, value in data.items():
+        if isinstance(value, dict) and key != "energy":
+            # Keep going down one level
+            sub_result = recursive_aggregate(value, filters, uv, position + [key])
+            for yr, val in sub_result.items():
+                energy_by_year[yr] = energy_by_year.get(yr, 0.0) + val
+            continue
+
+        if key != "energy":
+            # Only interested in energy leaves
+            continue
+
+        path = position + [key]
+        if len(path) < 5:
+            # We expect: climate_zone / bldg_type / fuel / end_use / ... / energy
+            continue
+
+        _cz, bldg_type, fuel_name, eu_name, *rest = path
+        subkey = rest[0] if rest else ""
+
+        # Only consider the requested building class and fuel
+        if bldg_type not in uv.all_bldg_types[filters.bldg_class]:
+            continue
+        if fuel_name != filters.fuel:
+            continue
+
+        # Decide whether to include this particular leaf based on end‑use rules
+        end_use = filters.end_use
+        accept = False
+
+        if eu_name == "other" and subkey in uv.other_end_uses and end_use == "other":
+            accept = True
+        elif eu_name == "ceiling fan" and end_use == "other":
+            accept = True
+        elif (
+            eu_name == "other"
+            and subkey in uv.separate_other_end_uses
+            and end_use == subkey
+        ):
+            accept = True
+        elif (
+            eu_name in uv.heating_end_uses
+            and end_use == "heating"
+            and subkey == "supply"
+        ):
+            accept = True
+        elif eu_name == "cooling" and end_use == "cooling" and subkey == "supply":
+            accept = True
+        elif eu_name in uv.remaining_end_uses and end_use == eu_name:
+            accept = True
+        elif (
+            eu_name in ("other", "unspecified")
+            and fuel_name != "electricity"
+            and end_use == "other"
+        ):
+            accept = True
+        elif (
+            eu_name in ("MELs", "unspecified")
+            and fuel_name == "electricity"
+            and end_use == "other"
+        ):
+            accept = True
+        elif eu_name == end_use and subkey == "energy":
+            accept = True
+
+        if not accept:
+            continue
+
+        # At this point we decided that this leaf contributes to our total
+        for yr, val in value.items():
+            energy_by_year[yr] = energy_by_year.get(yr, 0.0) + val


This function is how we unravel the mseg file. Again, it could be rewritten as a set of non-recursive functions that are much easier to follow.
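One way to begin that split is to lift the accept/elif chain out of recursive_aggregate into a pure predicate that can be unit-tested on its own. This is a minimal sketch, assuming the four uv sets are passed in through a small hypothetical EndUseRules container; the set contents used in any example are placeholders. Every branch of the original chain only sets accept = True, so the chain is a plain OR of its conditions, and regrouping the branches by end use, as done here, does not change the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndUseRules:
    """Hypothetical stand-in for the uv attributes used in the diff."""
    other_end_uses: frozenset = frozenset()
    separate_other_end_uses: frozenset = frozenset()
    heating_end_uses: frozenset = frozenset()
    remaining_end_uses: frozenset = frozenset()


def accepts(rules: EndUseRules, eu_name: str, subkey: str,
            fuel_name: str, end_use: str) -> bool:
    """Same conditions as the accept/elif chain, as a side-effect-free test."""
    if end_use == "other":
        if eu_name == "other" and subkey in rules.other_end_uses:
            return True
        if eu_name == "ceiling fan":
            return True
        if fuel_name != "electricity" and eu_name in ("other", "unspecified"):
            return True
        if fuel_name == "electricity" and eu_name in ("MELs", "unspecified"):
            return True
    # "other" sub-keys that EIA reports as their own end use.
    if eu_name == "other" and subkey in rules.separate_other_end_uses:
        if end_use == subkey:
            return True
    # Heating and cooling only count the "supply" side.
    if subkey == "supply":
        if end_use == "heating" and eu_name in rules.heating_end_uses:
            return True
        if end_use == "cooling" and eu_name == "cooling":
            return True
    if eu_name in rules.remaining_end_uses and end_use == eu_name:
        return True
    return eu_name == end_use and subkey == "energy"
```

recursive_aggregate (or an iterative replacement) would then just call accepts(...) for each leaf and sum the ones that pass.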

jmythms · 2025-11-18T14:10:28Z

scout/AEO_update_helpers/baseline_comparison.py

+MSEG_PATH = "scout/supporting_data/stock_energy_tech_data/mseg_res_com_cz.json"
+
+
+@dataclass(frozen=True)


TIL you can make a dataclass immutable with this decorator (frozen=True). I'll keep it in mind and check whether it could be used more widely in the codebase, e.g. in ecm_prep.py or run.py.
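A quick illustration of what frozen=True buys: attribute assignment raises dataclasses.FrozenInstanceError, and because eq=True is the default, the generated __hash__ lets instances serve as dict keys. The field names follow the FilterStrings(bldg_class=..., fuel=..., end_use=...) call in the diff; the class body itself is an assumption.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FilterStrings:
    # Field names follow the FilterStrings(...) call in the PR diff.
    bldg_class: str
    fuel: str
    end_use: str


filters = FilterStrings(bldg_class="residential", fuel="electricity",
                        end_use="heating")

try:
    filters.fuel = "natural gas"  # any attribute assignment is rejected
except FrozenInstanceError as err:
    print(f"Cannot mutate: {err}")

# Frozen + eq (the default) makes instances hashable, e.g. usable as dict keys.
results = {filters: 0.0}
```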

jmythms · 2025-11-18T14:14:07Z

scout/AEO_update_helpers/baseline_comparison.py

+    return api_key
+
+
+@on_exception(expo, Exception, max_tries=5)


Another cool decorator: if an exception is raised (like a timeout), the call is retried with exponentially increasing delays, up to 5 attempts in total (max_tries counts the first call too).
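To make the decorator's behavior concrete, here is a stdlib-only sketch of the same idea. It is not the backoff library's implementation: backoff.on_exception and backoff.expo have their own defaults, including jitter on the waits. The helper name retry_expo is hypothetical, and the sleep function is injectable so the waits can be checked without actually sleeping.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_expo(func: Callable[[], T], max_tries: int = 5,
               base: float = 2.0, factor: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on any exception with exponentially growing waits.

    max_tries counts total attempts including the first call, so the
    default allows at most 4 retries. Waits between attempts are
    factor * base**n seconds (1, 2, 4, 8 with the defaults), with no jitter.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(max_tries):
        try:
            return func()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of attempts: propagate the last exception
            sleep(factor * base ** attempt)
    raise AssertionError("unreachable")
```

Catching the bare Exception class mirrors the decorator in the diff. A narrower exception type (for example, only network errors) would stop programming mistakes from being retried four times before surfacing.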


jmythms · 2025-11-18T14:21:07Z

TODO: ~~set GitHub secret for EIA API key if we want to add .github/workflows/eia-update-check.yml as an additional test. Then test it before merging to master.~~

Done

rHorsey · 2025-12-08T18:56:23Z

scout/AEO_update_helpers/baseline_comparison.py

+            "security system",
+            "portable electric spas",
+            "smart speakers",
+            "tablets",


Please tell me this is GPT hallucinating and this isn't actually a thing. Also smart speakers?!?!?

Actually a thing 😅

rHorsey · 2025-12-08T19:02:18Z

pyproject.toml

 dev = [
    "openpyxl",
    "flake8 >= 7.0",
+    "tabulate",


I think this is still failing in the actions? Can you confirm that it is actually loading this dependency?

rHorsey

Thanks so much Jeremy! Sadly for you I have one more request, which is that you add a way for this to actually fail fail. Basically, if things exceed / get worse than current levels I'd like this to throw a red x, not a green checkmark. Otherwise we're never going to notice if / when the comparisons get worse...


jmythms · 2025-12-10T19:52:44Z

Added comparisons that make the test fail if the average error across all years for any building_type, fuel, and end use combination exceeds the current values (allowing a small tolerance for floating-point error). I made the test fail on purpose here with a wrong test value, then corrected it to get a green checkmark here.
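A minimal sketch of how such a check can turn a regression into a red X. It assumes the metric is the mean absolute percent error per (building class, fuel, end use) series. It also loosely follows the commit messages ("per-series error tolerance", "ensure minimum threshold is applied"). BASELINE_ERRORS, the tolerance values, and both function names are hypothetical, and the numbers are placeholders rather than the PR's recorded values. A non-zero exit code is what makes the GitHub Actions step fail.

```python
import sys
from statistics import mean

# Hypothetical recorded baselines: mean absolute % error per
# (building class, fuel, end use) series when the check was introduced.
BASELINE_ERRORS = {
    ("residential", "electricity", "heating"): 2.5,
    ("commercial", "natural gas", "heating"): 4.0,
}
MIN_TOLERANCE = 1e-6  # hypothetical floor so float noise alone never fails


def find_regressions(current, baseline=BASELINE_ERRORS,
                     rel_tol=1e-3, min_tol=MIN_TOLERANCE):
    """Return series whose mean |% error| over all years exceeds the baseline.

    current maps (bldg, fuel, end_use) -> {year: % error}. Series missing
    from current are skipped here; a stricter check could flag them too.
    """
    regressions = []
    for series, allowed in baseline.items():
        errors = current.get(series)
        if not errors:
            continue
        avg = mean(abs(e) for e in errors.values())
        # Per-series tolerance, never smaller than the minimum threshold.
        tol = max(allowed * rel_tol, min_tol)
        if avg > allowed + tol:
            regressions.append((series, avg, allowed))
    return regressions


def enforce(current) -> None:
    """Exit non-zero (a failed CI step) if any series got worse."""
    regressions = find_regressions(current)
    for (bldg, fuel, end_use), avg, allowed in regressions:
        print(f"{bldg} | {fuel} | {end_use}: mean error {avg:.3f}% "
              f"> allowed {allowed:.3f}%", file=sys.stderr)
    if regressions:
        sys.exit(1)
```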

rHorsey · 2025-12-11T06:05:13Z

Like it love it! Thanks so much @jmythms !!! Many thanks also for commenting the EIA conditions - I wouldn't have guessed those in a hurry.

rHorsey

🫡

jmythms added 3 commits November 13, 2025 10:23

add baseline comparison script

5d4aa51

Remove old baseline comparison code file

c85130f

jmythms added this to the v1.2.1 milestone Nov 17, 2025

jmythms self-assigned this Nov 17, 2025

jmythms marked this pull request as draft November 17, 2025 23:35

jmythms added 3 commits November 17, 2025 18:54

flake8 fixes + script name fix

bc727c1

Add EIA update documentation and reference in overview

f717f61

Fix doc issues

0d6682e

jmythms marked this pull request as ready for review November 18, 2025 01:58

jmythms linked an issue Nov 18, 2025 that may be closed by this pull request

Automated baseline comparison w/ EIA API on GH #517

Open

jmythms commented Nov 18, 2025


Fix formatting in overview documentation and adjust dependency list in pyproject.toml

678599b

jmythms requested review from jtlangevin and rHorsey November 18, 2025 14:21

jmythms added 5 commits November 24, 2025 09:57

test workflow

217d879

update dependencies

a135480

Remove test branch from EIA update check workflow

643f9b3

Refactor code structure for improved readability and maintainability

074db5f

Fix branch name formatting in EIA update check workflow

d614307

rHorsey reviewed Dec 8, 2025


rHorsey requested changes Dec 8, 2025


jmythms added 8 commits December 10, 2025 10:37

Enforce failure if above allowed threshold

7496757

Show stderr with more descriptive error message.

d99af6a

fix grammar

27ee968

Implement per-series error tolerance in baseline comparison

1714232

Fix tolerance calculation in error enforcement to ensure minimum threshold is applied

11a3bf4

Make test pass

06b7c33

Add comments to clarify conditions in EIA series ID construction

4a53f78

better comments

7eb4c9a

jmythms requested a review from rHorsey December 10, 2025 19:59

rHorsey approved these changes Dec 11, 2025



3 participants
