-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Analysis Class API #252
New Analysis Class API #252
Conversation
…ared to develop_v3
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## develop_v3 #252 +/- ##
==============================================
- Coverage 65.50% 65.28% -0.22%
==============================================
Files 29 29
Lines 4061 4197 +136
==============================================
+ Hits 2660 2740 +80
- Misses 1401 1457 +56
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Rob, I like the consistent format for initializing the analysis classes and specifying/changing parameters in the run
methods! I have some minor questions and comments, but I think this is mostly ready to go. I will also update the doc strings for the wake losses class and/or change the default values for some of the arguments since the defaults without UQ weren't originally always the average of the default range with UQ.
openoa/analysis/aep.py
Outdated
@@ -1003,7 +1098,10 @@ def run_AEP_monte_carlo(self): | |||
iav[n] = gross_lt_annual.std() / gross_lt_annual.mean() | |||
avail_pct[n] = avail_lt_losses | |||
curt_pct[n] = curt_lt_losses | |||
lt_por_ratio[n] = (gross_lt.sum() / self._run.num_years_windiness) / gross_por.sum() | |||
gross_por_sum = gross_por.sum() | |||
if isinstance(gross_por_sum, pd.Series): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was causing gross_por_sum
to be a pd.Series sometimes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future pandas will just break for the original usage when gross_por_sum
is a single element Series.
self, | ||
num_sim: int, | ||
reg_model: str = None, | ||
reanalysis_products: list[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like changing reanalysis_products
in run
will work fine, but doesn't including this as an argument break the rule you mention that "Only parameters that do not change the nature of what data is validated/checked for existence can be passed to run()
"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point! I think I originally envisioned this being as a subset of any that were originally provided, so if you add more, the method wouldn't work, but this was never actually implemented either.
I'm curious as to your thoughts here on whether it's an appropriate breaking of the paradigm or not. My own thoughts are that reanalysis data is already validated, so it's all there to be used, but at the same time it's still changing the nature of the analysis.
@@ -159,19 +170,78 @@ def __attrs_post_init__(self): | |||
logger.info("Processing SCADA data into dictionaries by turbine (this can take a while)") | |||
self.sort_scada_by_turbine() | |||
|
|||
def finalize_reanalysis_products(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a common function for some of the analysis classes. Not necessary, but could it be moved to the "_analysis_validators" module or a similar higher-level module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely could, and I think my main hangup is that it's not universally shared across the analysis classes, nor is it doing much to begin with, so I felt it was easier to just keep in the methods where it was needed. That said, I could be convinced if you think we should make another mixin class like with the FromDictMixin
that manages the dictionary initialization processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ejsimley I was thinking on this a bit more, and I created and another tool in _analysis_validators
to do this. Let me know what you think.
Two thoughts: Thought 1: How will this new paradigm extend to the workflow where analysis classes can "pull" data from an underlying datastore? For example, you might want to use MonteCarloAEP schema to pull all data necessary for a MonteCarloAEP from an ENTR warehouse. But then after pulling all the data, you might want some way to "exclude" one of the columns, say, to compare its impact on the AEP. One possibility would be to extract the underlying PlantData object and then create two new MonteCarloAEP analyses using the same PlantData. Would that work? I'm trying to think through if there might be a benefit to preserving the ability to include/exclude data in run(). Thought 2: Are PlantData and Analysis classes immutable? It didn't used to be. Please verify that no option in the run() function will mutate the PlantData and Analysis class. Or if they do, it should be noted somehow. The user will expect any combination of PlantData and Analysis class to be "run" multiple times with different options. |
…gle attrs validator
@jordanperr thanks for the comments! I updated the description of the PR, but just to put it all in one place:
Let me know if I've missed anything here, or if you still have any reservations. |
This PR addresses a longstanding issue with a lack of consistency between the analysis classes, and the differences between what is passed through
__init__()
vsrun()
. The updated paradigm is as follows:run()
. For example, the keyword argumentwind_direction_col
can be passed during initialization ofWakeLosses
, but not during therun
method because we are changing the underlying data.run()
because this is about the sensitivity to long term weather data, and not pertaining to the underlying wind power plant data.PlantData
object, but use the underlying data to create the permutations necessary for the focal analysis.What this means is that run() can be used to iterate over permutations of an analysis on the same data, but if we need to swap out any data, e.g. different wind direction columns, additional SCADA columns, & etc., that should be considered to be an entirely new analysis, and therefore a new analysis class object is required.
Updated features:
openoa/schema/metadata.py:ANALYSIS_REQUIREMENTS
contains all the available modifiers for running analyses. For example, there is now "WakeLosses-scada" and "WakeLosses-tower" to indicate which data will be used, depending on if your wind direction is coming from thescada
ortower
.analysis_class.plant
is now a deep copy of the passed data object to ensure that no underlying data are updated. The only exception toanalysis.plant
remaining unchanged is in the case ofPlantData.analysis_type
gets updated to include the analysis that will be performed.Additionally, this PR expands the analysis requirements settings per #241.