jmythms (Collaborator) commented Nov 17, 2025

This PR addresses #517 by:

  1. Adding an EIA update helper script that compares mseg_res_com_cz.json entries against the EIA API for each combination of
    • building class (residential, commercial)
    • fuel type (electricity, natural gas, distillate, other fuel)
    • end use (heating, cooling, lighting, etc.)
  2. Adding a GitHub Actions workflow that runs this new script whenever new changes land on the master branch.
  3. Updating documentation to reflect new changes.
  4. Adding a .env variable file, updating .gitignore, and adding a GitHub secret so CI can use the EIA API.

- Introduced a GitHub Actions workflow for EIA update checks.
- Added a new script for comparing Scout microsegments with EIA AEO data.
- Updated documentation to include instructions for running the EIA update check.
- Enhanced .gitignore to exclude local environment variables.
- Included python-dotenv as a dependency for managing environment variables.
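Locally the API key can come from the git-ignored .env file, while in CI it comes from the secret. A minimal sketch of that lookup, assuming the variable is named `EIA_API_KEY` (hypothetical; the PR's actual variable name is not shown here) and treating python-dotenv as optional:

```python
import os
from typing import Optional

try:
    # python-dotenv is the new dependency listed above; this sketch also works without it.
    from dotenv import find_dotenv, load_dotenv
except ImportError:
    find_dotenv = load_dotenv = None


def get_api_key(var_name: str = "EIA_API_KEY") -> str:
    """Return the API key from a local .env file or from the process environment.

    var_name is a hypothetical default; use whatever name the workflow exports the secret as.
    """
    if load_dotenv is not None:
        # Search for .env starting at the current working directory. By default,
        # load_dotenv does not override variables that are already set (e.g. by CI).
        load_dotenv(find_dotenv(usecwd=True))
    api_key: Optional[str] = os.environ.get(var_name)
    if not api_key:
        raise RuntimeError(
            f"{var_name} is not set: add it to a local .env file or expose the "
            "GitHub Actions secret as an environment variable."
        )
    return api_key
```

In the workflow, the secret would typically be mapped to that variable through the step's `env:` block.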
jmythms added this to the v1.2.1 milestone Nov 17, 2025
jmythms self-assigned this Nov 17, 2025
jmythms marked this pull request as draft November 17, 2025 23:35
jmythms marked this pull request as ready for review November 18, 2025 01:58
jmythms linked an issue Nov 18, 2025 that may be closed by this pull request
Comment on lines +745 to +754
for bldg in uv.bldg_class_translator.keys():
    for fuel in uv.fuel_type:
        for end_use in uv.end_use_translator.keys():
            filters = FilterStrings(bldg_class=bldg, fuel=fuel, end_use=end_use)
            compare_one_combination(mseg, filters, year, verbose, uv, api_key)

# After all combinations, print summary information
print_rollups()
report_large_errors()
report_zero_division_cases()
Collaborator Author

Crux of this script:
Go through each building class -> each fuel type -> each end use, and compare mseg against the EIA API.

Then print results and errors.
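The triple loop enumerates the Cartesian product of the three dimensions. A minimal sketch using `itertools.product`, which yields tuples in the same order as the nested loops (rightmost dimension varies fastest). The lists are illustrative subsets taken from the PR description, not the script's `uv` translators, and `all_combinations` is a hypothetical helper:

```python
from itertools import product
from typing import Iterable, Iterator, Tuple

# Illustrative subsets from the PR description; the script itself iterates over
# uv.bldg_class_translator, uv.fuel_type and uv.end_use_translator.
BLDG_CLASSES = ["residential", "commercial"]
FUELS = ["electricity", "natural gas", "distillate", "other fuel"]
END_USES = ["heating", "cooling", "lighting"]


def all_combinations(
    bldg_classes: Iterable[str], fuels: Iterable[str], end_uses: Iterable[str]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (bldg_class, fuel, end_use) in the same order as the three nested loops."""
    return product(bldg_classes, fuels, end_uses)


combos = list(all_combinations(BLDG_CLASSES, FUELS, END_USES))
print(len(combos))  # 2 building classes x 4 fuels x 3 end uses = 24
```

Each tuple would then be unpacked into `FilterStrings(...)` and passed to `compare_one_combination`, as in the loop body.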

Comment on lines +466 to +467
# 1. Aggregate JSON
json_dict = recursive_aggregate(mseg, filters, uv)
Collaborator Author

This script also relies on recursion, in the same way as the EIA update scripts. A future PR could focus on unravelling this recursion to make it easier to understand.

Comment on lines +477 to +482
# If EIA has data but the JSON aggregate is empty, treat this as a hard error.
if json_is_empty and eia_dict:
    raise RuntimeError(
        "EIA data exists but JSON aggregate is empty for "
        f"{bldg} | {fuel} | {end_use} (series {series_id})."
    )
Collaborator Author

Currently, this case is never hit, but I left it as a hard error in case a JSON field is missing after a future EIA update.

Comment on lines 316 to 344
bldg = filters.bldg_class
fuel = filters.fuel
end_use = filters.end_use

# Conditions used inside the ID string
if bldg == "residential":
    condition_1 = uv.end_use_translator[end_use]
    condition_2 = "NA"
    if end_use == "heating" and fuel == "electricity":
        condition_3 = "hhd"  # special case for electric heating
    else:
        condition_3 = "NA"
else:  # commercial
    # Commercial electricity is labeled as "Purchased Electricity" in API
    fuel_for_api = "Purchased Electricity" if fuel == "electricity" else fuel
    condition_1 = "NA"
    condition_2 = uv.end_use_translator[end_use]
    condition_3 = "NA"
    fuel = fuel_for_api

eia_series_id = (
    f"cnsm_{condition_3}_"
    f"{uv.bldg_class_translator[bldg]}_"
    f"{condition_1}_"
    f"{uv.fuel_type_translator[fuel]}_"
    f"{condition_2}_usa_qbtu"
)

return eia_series_id
Collaborator Author

Secret sauce for using the EIA API in Scout: this builds the EIA series ID for a given building class, fuel, and end use.

Collaborator

Can you add a comment specifying what condition 1, 2, and 3 mean? Mostly so later on we don't need to go back to the EIA docs.

Collaborator Author

Added here 👍🏾
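For reference, a self-contained sketch of the series-ID logic above that spells out what each condition slot does, based only on the code shown (not on the EIA documentation). The translator codes below are hypothetical placeholders; the real values live in `uv.bldg_class_translator`, `uv.end_use_translator`, and `uv.fuel_type_translator` and are not reproduced here:

```python
# Hypothetical placeholder codes, NOT the real EIA identifiers.
BLDG_CODES = {"residential": "RES_BLDG", "commercial": "COM_BLDG"}
END_USE_CODES = {"heating": "HEAT_EU", "cooling": "COOL_EU"}
FUEL_CODES = {
    "electricity": "ELEC_FUEL",
    "Purchased Electricity": "PURCH_ELEC_FUEL",
    "natural gas": "NG_FUEL",
}


def eia_series_id(bldg: str, fuel: str, end_use: str) -> str:
    """Build cnsm_{c3}_{bldg}_{c1}_{fuel}_{c2}_usa_qbtu following the slot logic in this PR."""
    c1 = c2 = c3 = "NA"
    if bldg == "residential":
        c1 = END_USE_CODES[end_use]  # residential: end use goes in slot 1
        if end_use == "heating" and fuel == "electricity":
            c3 = "hhd"  # residential electric heating uses the "hhd" prefix slot
    else:
        c2 = END_USE_CODES[end_use]  # commercial: end use goes in slot 2
        if fuel == "electricity":
            fuel = "Purchased Electricity"  # commercial electricity label in the API
    return f"cnsm_{c3}_{BLDG_CODES[bldg]}_{c1}_{FUEL_CODES[fuel]}_{c2}_usa_qbtu"
```

Only the string layout and the branching mirror the PR; swapping in the real translator dictionaries would be needed to get actual series IDs.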

Comment on lines +365 to +440
if position is None:
    position = []

energy_by_year: dict[str, float] = {}

for key, value in data.items():
    if isinstance(value, dict) and key != "energy":
        # Keep going down one level
        sub_result = recursive_aggregate(value, filters, uv, position + [key])
        for yr, val in sub_result.items():
            energy_by_year[yr] = energy_by_year.get(yr, 0.0) + val
        continue

    if key != "energy":
        # Only interested in energy leaves
        continue

    path = position + [key]
    if len(path) < 5:
        # We expect: climate_zone / bldg_type / fuel / end_use / ... / energy
        continue

    _cz, bldg_type, fuel_name, eu_name, *rest = path
    subkey = rest[0] if rest else ""

    # Only consider the requested building class and fuel
    if bldg_type not in uv.all_bldg_types[filters.bldg_class]:
        continue
    if fuel_name != filters.fuel:
        continue

    # Decide whether to include this particular leaf based on end-use rules
    end_use = filters.end_use
    accept = False

    if eu_name == "other" and subkey in uv.other_end_uses and end_use == "other":
        accept = True
    elif eu_name == "ceiling fan" and end_use == "other":
        accept = True
    elif (
        eu_name == "other"
        and subkey in uv.separate_other_end_uses
        and end_use == subkey
    ):
        accept = True
    elif (
        eu_name in uv.heating_end_uses
        and end_use == "heating"
        and subkey == "supply"
    ):
        accept = True
    elif eu_name == "cooling" and end_use == "cooling" and subkey == "supply":
        accept = True
    elif eu_name in uv.remaining_end_uses and end_use == eu_name:
        accept = True
    elif (
        eu_name in ("other", "unspecified")
        and fuel_name != "electricity"
        and end_use == "other"
    ):
        accept = True
    elif (
        eu_name in ("MELs", "unspecified")
        and fuel_name == "electricity"
        and end_use == "other"
    ):
        accept = True
    elif eu_name == end_use and subkey == "energy":
        accept = True

    if not accept:
        continue

    # At this point we decided that this leaf contributes to our total
    for yr, val in value.items():
        energy_by_year[yr] = energy_by_year.get(yr, 0.0) + val
Collaborator Author

This function is how we unravel the mseg file. Again, this could become a non-recursive set of functions for much easier comprehension.
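A minimal sketch of what the non-recursive version could look like: an explicit stack walks the nested dictionary and yields each `energy` leaf with its key path, and a separate predicate decides whether that leaf counts. `iter_energy_leaves`, `aggregate`, and `accept` are hypothetical names, and the end-use acceptance rules of `recursive_aggregate` are deliberately not reproduced:

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# A leaf is the key path down to an "energy" dict plus that dict ({year: value}).
Leaf = Tuple[List[str], Dict[str, float]]


def iter_energy_leaves(data: Dict[str, Any]) -> Iterator[Leaf]:
    """Yield every (path, energy_by_year) pair, using an explicit stack instead of recursion."""
    stack: List[Tuple[List[str], Dict[str, Any]]] = [([], data)]
    while stack:
        position, node = stack.pop()
        for key, value in node.items():
            path = position + [key]
            if key == "energy":
                yield path, value  # leaf: maps year -> energy
            elif isinstance(value, dict):
                stack.append((path, value))  # defer this subtree instead of recursing
            # Any other non-dict value is ignored, as in recursive_aggregate.


def aggregate(data: Dict[str, Any], accept: Callable[[List[str]], bool]) -> Dict[str, float]:
    """Sum energy by year over the leaves whose key path passes accept()."""
    energy_by_year: Dict[str, float] = {}
    for path, values in iter_energy_leaves(data):
        if accept(path):
            for yr, val in values.items():
                energy_by_year[yr] = energy_by_year.get(yr, 0.0) + val
    return energy_by_year
```

The building-type, fuel, and end-use rules then become a plain predicate on the path that can be unit-tested on its own. Leaves are visited in a different order than in the recursive version, so floating-point totals may differ in the last digits.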

MSEG_PATH = "scout/supporting_data/stock_energy_tech_data/mseg_res_com_cz.json"


@dataclass(frozen=True)
Collaborator Author

TIL you can make a dataclass immutable by passing frozen=True to this decorator. I'll keep that in mind and check whether it could be used more in the codebase, e.g., in ecm_prep.py or run.py.
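A quick illustration of what `frozen=True` buys, using a stand-in class with the same three fields that `FilterStrings` is constructed with in the comparison loop (whether the real `FilterStrings` is the class decorated here is an assumption):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FilterStrings:
    """Stand-in with the three fields used when building filters in the comparison loop."""

    bldg_class: str
    fuel: str
    end_use: str


filters = FilterStrings(bldg_class="residential", fuel="electricity", end_use="heating")

try:
    filters.fuel = "natural gas"  # assignment to any field is rejected
except FrozenInstanceError as err:
    print(f"blocked: {err}")

# With the default eq=True, frozen=True also generates __hash__,
# so equal filters can be used as dict keys or set members.
cache = {filters: "cached result"}
print(cache[FilterStrings("residential", "electricity", "heating")])  # prints: cached result
```

Besides preventing accidental mutation while the filters are passed through the recursion, hashability makes frozen instances convenient keys for caching per-combination results.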

return api_key


@on_exception(expo, Exception, max_tries=5)
Collaborator Author

Another cool decorator: if an exception is raised (like a timeout), the call is retried with exponentially increasing delays, for up to 5 attempts in total.
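`on_exception` and `expo` match the names in the `backoff` package. To show the idea without that dependency, here is a stdlib-only sketch of exponential-backoff retry; it is a swapped-in illustration, not backoff's implementation, and `retry_expo`, `base`, `factor`, and the injectable `sleep` are hypothetical names:

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_expo(
    max_tries: int = 5,
    base: float = 1.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry on any Exception, sleeping base * factor**(n - 1) seconds after the n-th failure."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            failures = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    failures += 1
                    if failures >= max_tries:
                        raise  # attempts exhausted: let the last exception escape
                    sleep(base * factor ** (failures - 1))

        return wrapper

    return decorator
```

With `max_tries=5` the wrapped call runs at most five times (four retries). Unlike this deterministic version, backoff also randomizes the waits (jitter) by default, which helps avoid many clients retrying in lockstep.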

Collaborator Author

jmythms commented Nov 18, 2025

TODO: set GitHub secret for EIA API key if we want to add .github/workflows/eia-update-check.yml as an additional test. Then test it before merging to master.

Done

"security system",
"portable electric spas",
"smart speakers",
"tablets",
Collaborator

Please tell me this is GPT hallucinating and this isn't actually a thing. Also smart speakers?!?!?

Collaborator Author

Actually a thing 😅

Collaborator

🎉

dev = [
"openpyxl",
"flake8 >= 7.0",
"tabulate",
Collaborator

I think this is still failing in the actions? Can you confirm that it is actually loading this dependency?

Collaborator Author

Confirmed
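One way to confirm from the Actions log that a dependency was actually installed is to print installed versions from inside the job. A minimal stdlib sketch; `installed_versions` is a hypothetical helper, and only the package names come from the diff above:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# e.g., the dev extras shown in the diff above
for name, ver in installed_versions(["openpyxl", "flake8", "tabulate"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running this in the same environment the workflow uses shows whether the dev extras were picked up by the install step.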

Collaborator

rHorsey left a comment

Thanks so much, Jeremy! Sadly for you, I have one more request: please add a way for this to actually fail. Basically, if things exceed or get worse than current levels, I'd like this to throw a red X, not a green checkmark. Otherwise we're never going to notice if/when the comparisons get worse...

Collaborator Author

jmythms commented Dec 10, 2025

Added comparisons that make the test fail if the average error across all years for any building type, fuel, and end use combination exceeds its current value (within a small tolerance for floating-point error). I made the test fail on purpose here with a wrong test value, then corrected it to get a green checkmark here.
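A minimal sketch of such a regression gate, assuming per-year errors are collected for each (building class, fuel, end use) combination and baselines are stored in a dictionary keyed the same way. `find_regressions`, the tuple keys, the use of absolute errors, and the default tolerance are all hypothetical choices, not the PR's code:

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical key: (building class, fuel, end use)
Key = Tuple[str, str, str]


def find_regressions(
    errors_by_year: Dict[Key, List[float]],
    baseline: Dict[Key, float],
    tol: float = 1e-6,
) -> List[str]:
    """Return one message per combination whose mean absolute error exceeds baseline + tol."""
    failures: List[str] = []
    for key, errors in errors_by_year.items():
        # Assumption: errors may be signed, so magnitudes are compared.
        avg = mean(abs(e) for e in errors)
        allowed = baseline.get(key)
        if allowed is None:
            failures.append(f"{key}: no baseline recorded")
        elif avg > allowed + tol:
            failures.append(f"{key}: mean error {avg:.4f} > baseline {allowed:.4f}")
    return failures
```

A CI step can then print the messages and call `sys.exit(1)` when the list is non-empty; GitHub Actions marks a step as failed when its command exits with a non-zero status, which produces the red X. Treating a missing baseline as a failure is a design choice so that new combinations cannot slip through unchecked.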

jmythms requested a review from rHorsey December 10, 2025 19:59
Collaborator

rHorsey commented Dec 11, 2025

Like it love it! Thanks so much @jmythms !!! Many thanks also for commenting the EIA conditions - I wouldn't have guessed those in a hurry.

Collaborator

rHorsey left a comment

🫡

Development

Successfully merging this pull request may close these issues.

Automated baseline comparison w/ EIA API on GH

3 participants