Build year aggregation #1056
Conversation
I could also briefly explain the other changes here apart from the aggregation and disaggregation functions:
(The CI is btw failing because pypsa==0.28.0 isn't quite available yet; use the latest commit instead or wait a little before testing.)
Saves a large amount of memory during problem preparation and solving. The aggregation and disaggregation steps add some time overhead, however.
It'd be cool to see this in v0.11 :) I've taken the liberty to clean up the history of this branch a little bit, and added what I think is sufficient documentation. After also handling changing marginal cost over time, I can now barely see any difference between aggregated and non-aggregated results. From my side this is good to go! A couple of quick things:
Actually, sorry, I take it back about this being good to go; please give me time for one more commit with a couple more fixes. I had two branches lying around and mixed them up a little.
Okay, now it's finally all good :)
Resolves occasional warnings issued by pandas when build year aggregation is enabled.
I'm actually still ironing out some small issues here and there; maybe it's best to wait a little longer before merging.
Also specify nominal attributes to store in `vars_to_store`
Just a few answers so far:
I think this is a known issue: PyPSA/PyPSA#722
The environment is cached, maybe that was the cause. Anyway, the issue seems to be resolved.
Yes, fully agree.
This was addressed in #1065 and should be fine now.
Yes, could start with a function in
What about next week I write a small function for that? By next week I'll have tested the build year aggregation even more myself too. It's currently working fine for me, and by then I should be even more confident that it really works. There are no remaining outstanding issues from my side that would have to be fixed before merging.
I hope you don't mind @koen-vg, but I want to move this feature to the v0.12.0 release given that the pending v0.11.0 release is already very big. We can do this in relatively quick succession.
Absolutely no problem! I agree v0.11 is rather momentous already. I'll be using this feature anyway and from my side it's not much of a burden to delay the merge a little. Just get the release out the door first :)
The "reversed" column can lead to problems upon exporting a network; somehow these problems only become apparent upon build year aggregation and disaggregation. Notably, netCDF doesn't support boolean values, which is a likely explanation. However, it's unclear why the problem arises specifically with the "reversed" column and only when aggregating/disaggregating.
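As an illustration (not the PR's actual code), a common workaround for the netCDF limitation is to cast boolean component columns to a supported dtype before export; the DataFrame here is a toy stand-in for a component table:

```python
import pandas as pd

# Toy stand-in for a pypsa component table with a boolean "reversed" column.
links = pd.DataFrame({"reversed": [True, False, True]}, index=["a", "b", "c"])

# netCDF has no native boolean type, so cast to int before export.
links["reversed"] = links["reversed"].astype(int)

print(links["reversed"].tolist())  # → [1, 0, 1]
```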
The solar potential constraint as originally formulated was not compatible with build year aggregation. Here, the constraint has been reformulated in such a way that it works both for "regular" networks and for ones which have been aggregated by build year.

The previous formulation effectively calculated the remaining area available for solar (by subtracting installed solar-hsat from the remaining area as given by p_nom_max, which itself has already been reduced by existing solar), and used this to limit the sum of newly installed solar and solar-hsat. The new formulation instead calculates _total_ available solar capacity (by summing installed capacity of the "solar" carrier and p_nom_max of extendable "solar" generators), and uses this to limit the total (both extendable and non-extendable) capacity of all solar carriers (including "solar-hsat" and potentially others).

The new formulation is also a little shorter overall, and another bonus is that it should work if ever more types of solar are added (whereas the old formulation wasn't as easily adaptable to a situation with more than just "solar" and "solar-hsat").
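A toy pandas sketch of the new formulation's bookkeeping (column names mirror `n.generators`; this is illustrative, not the PR's actual linopy code):

```python
import pandas as pd

# Toy generators table; columns mirror pypsa's n.generators attributes.
gens = pd.DataFrame(
    {
        "carrier": ["solar", "solar", "solar-hsat"],
        "bus": ["NO0 0", "NO0 0", "NO0 0"],
        "p_nom": [50.0, 0.0, 20.0],        # installed capacity
        "p_nom_max": [0.0, 200.0, 150.0],  # area-based potential
        "p_nom_extendable": [False, True, True],
    }
)

solar = gens[gens.carrier == "solar"]
# Total available solar potential per bus: installed "solar" capacity plus
# p_nom_max of extendable "solar" generators.
potential = (
    solar[~solar.p_nom_extendable].groupby("bus").p_nom.sum()
    .add(solar[solar.p_nom_extendable].groupby("bus").p_nom_max.sum(), fill_value=0.0)
)
# The constraint then bounds the total capacity of *all* solar carriers
# (extendable variables plus fixed p_nom) at each bus by `potential`.
print(potential["NO0 0"])  # → 250.0
```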
Note: this feature also depends on the fix in #1262! That PR should be merged first, then this one.
@fneum what do you think about merging this before the next release? Or is there anything that's still missing from your point of view (documentation, examples, tests, whatever)? It'll make my life a little easier if I don't have to keep this branch up-to-date :) It's an opt-in feature so shouldn't make a difference to users. Happy to write in more detail about the implementation, pros and cons, etc. if that would be helpful.
@koen-vg, I'll try to find time this week to review! Sorry that this PR hasn't been prioritized!
No worries, I understand that it's hard to find time for these things!
Hmm, I take it that with PyPSA/PyPSA#1038, this implementation could be simplified a little bit, at least in that components that are to be clustered can just be marked as inactive instead of having to store information about them in ad-hoc extra columns as is currently done. I can have a look at implementing that if we want to use that new functionality, or it can also be done in a separate PR after merging this one.
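If the new functionality amounts to an `active` flag per component (my reading of PyPSA/PyPSA#1038; treat the attribute name as an assumption), the bookkeeping could reduce to something like this toy pandas sketch:

```python
import pandas as pd

# Toy stand-in for n.generators; the "active" column is assumed to exist
# per PyPSA/PyPSA#1038 (the attribute name here is my assumption).
gens = pd.DataFrame(
    {"build_year": [2020, 2025], "active": [True, True]},
    index=["NO0 0 onwind-2020", "NO0 0 onwind-2025"],
)

# Deactivate the per-build-year originals while an aggregate stands in,
# instead of tracking them in ad-hoc extra columns.
gens["active"] = False
gens.loc["NO0 0 onwind"] = [2020, True]  # hypothetical aggregate component

print(gens["active"].tolist())  # → [False, False, True]
```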
Okay, I've refactored the implementation using the new functionality. For anyone interested in testing it, I recommend the following minimal configuration:

```yaml
run:
  name: "buildyearagg"

foresight: myopic

scenario:
  ll:
  - vopt
  clusters:
  - 45
  opts:
  - ""
  sector_opts:
  - ""
  - "aggBuildYear"
  planning_horizons:
  - 2020
  - 2030
  - 2040
  - 2050

sector:
  central_heat_everywhere: true

clustering:
  temporal:
    resolution_sector: "168H"

solving:
  solver:
    name: gurobi
    options: gurobi-default
```

Then run the workflow up to the solving rule.
Now, the problem is that, while correct, the refactored implementation loses the memory benefits: the inactive components still end up in the model. Anyone who still wants the memory benefits could for now use f4b208c.
I did manage to fully exclude inactive components from model building in PyPSA/PyPSA@094af75; that's a pretty quick-and-dirty implementation but shows that it works in principle. The results are good: memory is back down to where it's supposed to be. |
As mentioned in PyPSA/linopy#248, myopic foresight optimisation comes with a significant progressive memory overhead. The problem seems to be two-fold:
See PyPSA/linopy#248 for a graph illustrating both effects.
This PR is an attempt at dealing with the first of these two issues; in the process it also solves the second issue in my tests (but I cannot guarantee that that issue is resolved for every configuration).
Changes proposed in this Pull Request
The idea is that myopic foresight optimisation involves adding many near-copies of components, one for each planning horizon, with only one (the one corresponding to the "current" planning horizon) being extendable for each optimisation. For example, you have `NO0 0 onwind-2020`, `NO0 0 onwind-2025`, etc. Now, these components are essentially the same apart from nominal capacity: they have the same bus connections, efficiencies, capacity factors, etc. This means that they could be treated as a single component. That is what this PR does.
The PR introduces two new functions in `solve_network.py`: an aggregation function and a disaggregation function. The aggregation function would aggregate e.g. the two components `NO0 0 onwind-2020` and `NO0 0 onwind-2025` to a single component `NO0 0 onwind`. The network is then solved. After solving, the disaggregation function reconstructs the two original components `NO0 0 onwind-2020` and `NO0 0 onwind-2025`. Ideally, these two functions should be inverses in that first aggregating and then disaggregating results in the original network.

Of course, some information might be lost in aggregation. Importantly, one needs to keep track of the nominal capacity (and lifetime) in each build year so that these capacities can be phased out at the end of their lifetimes correctly.
To deal with this, the aggregation function adds additional columns `p_nom-2020`, `p_nom-2025`, etc. to `n.df(c)` (for each class of components `c`); the contents of these columns can then be used to reconstruct the original components in the disaggregation function. Exactly which attributes are stored between aggregation and disaggregation is flexible; currently this stands at `p_nom`, `p_nom_max`, `lifetime` and `capital_cost`.
Results
I have tested the implementation with the following configuration:
Note that I added an option and a `sector_opts` wildcard flag to enable or disable the build year aggregation.
The networks with and without build year aggregation have nearly identical characteristics, which I take as very strong evidence that the aggregation and disaggregation implementations are correct; see e.g. the energy balances:
As for what the whole point of this was, the build year aggregation leads to a drastic decrease in memory footprint. Here is the memory usage over time for the first and last planning horizons with and without build year aggregation:
That's an improvement of more than a factor of 5 (from more than 20 GB to about 4 GB)! (The graph includes the aggregation and disaggregation steps.)
Interestingly, the solving time doesn't change much. Evidently Gurobi can solve the non-aggregated problem just as quickly; it simply needs a lot more memory.
The aggregation and disaggregation do take some time, leading to a slight overhead. At the resolution of the above test (50 nodes, 24H), this just about breaks even. At lower resolutions, the overhead of the (dis)aggregation means that the total time taken when aggregating build years can be higher than without aggregation; the memory footprint is, however, consistently much lower.
Notes
I wrestled with this aggregation and disaggregation for a little while, and I have to say it's rather gnarly stuff.
As it stands, I think this branch could be ready for general use "at your own risk". It could be interesting to merge it but leave the aggregation disabled by default, as an option for people (like me, currently) who need to cut their memory footprint, with the understanding that things could possibly go wrong and one has to double-check that the results are correct.
(Obviously a little bit of polishing is still needed, i.e. documentation, release notes, etc. etc.)
I'd also be interested in hearing about alternative approaches. The radical approach would be for pypsa to provide built-in support for build-years somehow; this would circumvent the aggregation and disaggregation implemented here but would presumably be a pretty substantial change to pypsa. Maybe it's the best solution in the long term, however.
It's also possible that there are neater ways of implementing what I did, and suggestions for improvement are welcome. In particular, it might be possible to optimise some of the operations, because they do take some time (leading to the above overhead) and it feels like it could be faster. I did a little bit of profiling and couldn't find any immediate improvements, however.
Caveats
- One needs to keep track of `p_nom_min` and `p_nom_max` (and `e_nom_min`, `e_nom_max`) for non-extendable components in order to set the corresponding attributes correctly for the aggregated component.
- Components from different build years can only be aggregated if they share the same bus connections, e.g. `bus2`. To counter this, I added an option to add central heat buses everywhere, regardless of the fraction of population served by urban central heating.
Checklist
- Newly introduced dependencies are added to `envs/environment.yaml`.
- Changes in configuration options are added in `config.default.yaml`.
- Changes in configuration options are documented in `doc/configtables/*.csv`.
- A release note `doc/release_notes.rst` is added.