Add config object to keep config in sync at all times #704

Draft
wants to merge 3 commits into
base: main
from

Conversation

brynpickering
Member

@brynpickering brynpickering commented Nov 6, 2024

Fixes #626
Partially fixes #619

Thought I'd already upload this so you can contribute to the attempt @irm-codebase. Tests haven't been cleaned up so I expect many will fail.

I'm quite liking pydantic @sjpfenninger. I know we questioned using it some time ago, but now that I've spent more time with it I do wonder whether it might make other parts of the code and input validation cleaner...

Summary of changes in this pull request

  • Config is a pydantic model, replacing the config schema (we can dump a yaml schema at any time, though!)
  • Config repr hides operate/spores options data if those modes aren't activated
  • For debugging, pydantic methods can be used to hide defaults (model.config.model_dump(exclude_defaults=True))
  • build and solve steps have isolated configs that account for ad-hoc kwargs, which are stored in the config class as applied_keyword_overrides (might want something snappier)
  • operate_[...] and spores_[...] config options have returned to being sub-dicts (build.operate.[...] and build.spores.[...]) so the options can be easily isolated and passed around as necessary
  • config options are "frozen" unless using the update method, which returns an updated config object, but keeps the base config object unchanged except for content in applied_keyword_overrides. So you can't change config options accidentally (e.g. model.config.init.name = "new_name" won't work).
  • Intellisense picks up the config option docstrings which is useful when doing development and probably also for users writing scripts!
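As a rough illustration of the "frozen unless updated" behaviour described in the list above, here is a minimal sketch assuming pydantic v2. The `Init`, `Build` and `Config` classes and their fields are simplified, hypothetical stand-ins rather than the classes in this PR, and this `update` does not track `applied_keyword_overrides`.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Init(BaseModel):
    """Hypothetical stand-in for the `init` config section."""

    model_config = ConfigDict(frozen=True, extra="forbid")
    name: str = "unnamed"


class Build(BaseModel):
    """Hypothetical stand-in for the `build` config section."""

    model_config = ConfigDict(frozen=True, extra="forbid")
    mode: str = "plan"
    ensure_feasibility: bool = False


class Config(BaseModel):
    """Top-level config; frozen, so attributes cannot be reassigned."""

    model_config = ConfigDict(frozen=True, extra="forbid")
    init: Init = Field(default_factory=Init)
    build: Build = Field(default_factory=Build)

    def update(self, overrides: dict[str, dict[str, Any]]) -> "Config":
        """Return a new, re-validated config and leave `self` untouched."""
        merged = self.model_dump()
        for section, options in overrides.items():
            merged[section] = {**merged.get(section, {}), **options}
        return Config.model_validate(merged)


config = Config(init={"name": "demo"})

# Direct assignment on a frozen model is rejected by pydantic.
try:
    config.init.name = "new_name"
    assignment_blocked = False
except ValidationError:
    assignment_blocked = True

# Ad-hoc overrides produce a new object instead of mutating the base config.
new_config = config.update({"build": {"mode": "operate"}})
non_default_options = new_config.model_dump(exclude_defaults=True)
```

Here `config.build.mode` stays `"plan"` while `new_config.build.mode` is `"operate"`, and `non_default_options` only contains the options that differ from their defaults.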

Reviewer checklist

  • Test(s) added to cover contribution
  • Documentation updated
  • Changelog updated
  • Coverage maintained or improved

@irm-codebase
Contributor

@brynpickering this sounds really nice. I'll give a thorough look at it.

One thing, based on your description of the additions (and before I check the code): perhaps we should alter things so that the mode extends the configuration, rather than hiding it? This would avoid extra work on our end down the line, and move the code of these modes towards a 'plug-in' approach.
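One way to picture a mode that extends the configuration rather than hiding parts of it is plain model inheritance. This is only a sketch assuming pydantic v2; `Build`, `OperateBuild` and their fields are hypothetical and not taken from the PR.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Build(BaseModel):
    """Hypothetical base build config: only options every mode needs."""

    model_config = ConfigDict(frozen=True, extra="forbid")
    backend: str = "pyomo"
    ensure_feasibility: bool = False


class OperateBuild(Build):
    """Hypothetical operate-mode config: adds options on top of the base."""

    window: str = "24h"
    horizon: str = "48h"


plan_build = Build()
operate_build = OperateBuild(horizon="72h")

# The base config knows nothing about operate options, so it rejects them
# instead of carrying (and hiding) their defaults.
try:
    Build(window="24h")
    plan_accepts_operate_options = True
except ValidationError:
    plan_accepts_operate_options = False
```

With this layout, a plan-mode config never contains operate defaults at all, so there is nothing to hide in its repr.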

I'll share more thoughts or suggestions in a while.

Contributor

@irm-codebase irm-codebase left a comment

This is a comment, for now, since the code is not 100% ready.

Generally, I like this proposal.

Some of the positives I see.

  • The ability to inherit validation models could do wonders to streamline our code. In particular, it could enable us to make non-standard modes 'plug-ins' that people can choose to use.
  • With a bit of context, pydantic makes the configuration quite easy to follow. The intellisense suggestions are very nice too.

Some concerns, though:

  • I see this approach as a duplicate of yaml schemas. We should only use one. Keeping both only makes the code harder to understand, imo.
  • I do not think this really solves #626 (Invalid configuration showing for un-activated modes due to schema defaults). The configuration is still 'tangled'. But it does provide a way to solve it.
  • More of an open question than a concern: would pydantic help in our efforts to make parameters and dimensions part of the input yaml files?

Comment on lines +38 to +44
     match build_config.backend:
         case "pyomo":
-            return PyomoBackendModel(data, math, **kwargs)
+            return PyomoBackendModel(data, math, build_config)
         case "gurobi":
-            return GurobiBackendModel(data, math, **kwargs)
+            return GurobiBackendModel(data, math, build_config)
         case _:
-            raise BackendError(f"Incorrect backend '{name}' requested.")
+            raise BackendError(f"Incorrect backend '{build_config.backend}' requested.")
Contributor

This is an area where the old approach and the new pydantic may be at odds.
Case _ is spurious, since pydantic should catch wrong settings beforehand, no?

Comment on lines +29 to +35
 from calliope import config, exceptions
 from calliope.attrdict import AttrDict
 from calliope.backend import helper_functions, parsing
 from calliope.exceptions import warn as model_warn
 from calliope.io import load_config
 from calliope.preprocess.model_math import ORDERED_COMPONENTS_T, CalliopeMath
-from calliope.util.schema import (
-    MODEL_SCHEMA,
-    extract_from_schema,
-    update_then_validate_config,
-)
+from calliope.util.schema import MODEL_SCHEMA, extract_from_schema
Contributor

An issue here is that config.py and config/**.yaml files are at odds, since both provide similar functionality. Do we expect pydantic to replace our approach completely?

Comment on lines 19 to 21
if TYPE_CHECKING:
    from calliope import config
    from calliope.backend.backend_model import BackendModel
Contributor

The need for TYPE_CHECKING may indicate that config.py is not placed sensibly (a possible cyclic import?). Would it make sense to move it into src/calliope/config/? The dependencies of config.py do not seem to conflict with anything else, and this would make things easier to maintain.
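For context, the `TYPE_CHECKING` guard is the usual way to reference a type in annotations without importing its module at runtime, which is why it often shows up around import cycles. The sketch below uses the standard library's `decimal` module as a stand-in for the real modules; `describe` and `demo` are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers and IDEs only; skipped at runtime,
    # so this import cannot take part in an import cycle.
    from decimal import Decimal


def describe(value: "Decimal") -> str:
    """Format a value; the annotation is a string, so no runtime import is needed."""
    return f"{value:.2f}"


def demo() -> str:
    """A caller that needs the class at runtime imports it itself."""
    from decimal import Decimal

    return describe(Decimal("1.5"))
```

If the guarded module has no dependency on the importing module, a normal top-level import would work just as well and the guard is only hiding a layout problem.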

Comment on lines +346 to +351
@model_validator(mode="before")
@classmethod
def update_solve_mode(cls, data):
    """Solve mode should match build mode."""
    data["solve"]["mode"] = data["build"]["mode"]
    return data
Contributor

This seems to do nothing?

Comment on lines +85 to +105
@overload
def model_yaml_schema(self, filepath: str | Path) -> None: ...

@overload
def model_yaml_schema(self, filepath: None = None) -> str: ...

def model_yaml_schema(self, filepath: str | Path | None = None) -> None | str:
    """Generate a YAML schema for the class.

    Args:
        filepath (str | Path | None, optional): If given, save schema to given path. Defaults to None.

    Returns:
        None | str: If `filepath` is given, returns None. Otherwise, returns the YAML string.
    """
    # By default, the schema uses $ref/$def cross-referencing for each pydantic model class,
    # but this isn't very readable when rendered in our documentation.
    # So, we resolve references and then delete all the `$defs`
    schema_dict = AttrDict(jsonref.replace_refs(self.model_json_schema()))
    schema_dict.del_key("$defs")
    return schema_dict.to_yaml(filepath)
Contributor

'Nice to have' comment: the need for these overloads stems from to_yaml wanting to be two functions:

  • One saves a YAML schema file
  • The other returns a YAML string

An easy fix to make our code leaner would be to split this into two simpler functions: first one generates the string (to_yaml), the second uses the first to save it (save_yaml).

Contributor

The code in config.py seems to only want the first functionality, generally.

if "applied_math" in model_data.attrs:
    self.applied_math = preprocess.CalliopeMath.from_dict(
        model_data.attrs.pop("applied_math")
    )
if "config" in model_data.attrs:
    self.config = config.CalliopeConfig(**model_data.attrs.pop("config"))
    self.config.update(model_data.attrs.pop("config_kwarg_overrides"))
Contributor

Can you detail why updating with the overrides is necessary?
Wouldn't this cause ambiguity in what the pre-existing results relate to?

Comment on lines +270 to +271
this_build_config = self.config.update({"build": kwargs}).build
mode = this_build_config.mode
Contributor

Kind of an odd name...
this_build_config -> new_build_config, or just build_config???

Comment on lines +347 to +348

this_solve_config = self.config.update({"solve": kwargs}).solve
Contributor

Same comment as above: this naming feels a bit... off?
solve_config is perfectly fine, IMO

Comment on lines 466 to 469
 def _prepare_operate_mode_inputs(
-    self, start_window_idx: int = 0, **config_kwargs
+    self, operate_config: config.BuildOperate
 ) -> xr.Dataset:
     """Slice the input data to just the length of operate mode time horizon.
Contributor

Following up on my comments about moving towards configuration setups that do not 'tangle': this is the kind of function I'd hope moves into a different file or plug-in, which is only possible if the most basic configuration is not 'tainted' by a plethora of modes.

Ditto for all the other if mode == 'operate' cases we have lying around.

     table_name: str,
     data_table: DataTableDict,
     data_table_dfs: dict[str, pd.DataFrame] | None = None,
-    model_definition_path: Path | None = None,
+    model_definition_path: Path = Path("."),
Contributor

Given that we always pass model_definition_path, I'd remove the default entirely. It does not seem to be 'optional'.

Less complexity / bugs down the line.
