Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Analysis Class API #252

Merged
merged 50 commits into from
Sep 27, 2023
Merged

Conversation

RHammond2
Copy link
Collaborator

@RHammond2 RHammond2 commented Aug 8, 2023

This PR addresses a longstanding issue with a lack of consistency between the analysis classes, and the differences between what is passed through __init__() vs run(). The updated paradigm is as follows:

  • All run and init arguments are now able to be passed to the init method
  • Only parameters that do not change the nature of what data is validated/checked for existence can be passed to run(). For example, the keyword argument wind_direction_col can be passed during initialization of WakeLosses, but not during the run method because we are changing the underlying data.
  • Changing reanalysis products is still allowed in run() because this is about the sensitivity to long term weather data, and not pertaining to the underlying wind power plant data.
  • Just as in v2, analysis classes do not operate on the PlantData object, but use the underlying data to create the permutations necessary for the focal analysis.

What this means is that run() can be used to iterate over permutations of an analysis on the same data, but if we need to swap out any data, e.g. different wind direction columns, additional SCADA columns, & etc., that should be considered to be an entirely new analysis, and therefore a new analysis class object is required.

Updated features:

  • openoa/schema/metadata.py:ANALYSIS_REQUIREMENTS contains all the available modifiers for running analyses. For example, there is now "WakeLosses-scada" and "WakeLosses-tower" to indicate which data will be used, depending on if your wind direction is coming from the scada or tower.
  • analysis_class.plant is now a deep copy of the passed data object to ensure that no underlying data are updated. The only exception to analysis.plant remaining unchanged is in the case of PlantData.analysis_type gets updated to include the analysis that will be performed.

Additionally, this PR expands the analysis requirements settings per #241.

@RHammond2 RHammond2 added the v3 All updates and changes for OpenOA v3 label Aug 8, 2023
@RHammond2 RHammond2 added this to the V3.0 milestone Aug 8, 2023
@codecov-commenter
Copy link

codecov-commenter commented Aug 8, 2023

Codecov Report

Attention: 83 lines in your changes are missing coverage. Please review.

Comparison is base (2807556) 65.50% compared to head (7de60a8) 65.28%.

Additional details and impacted files
@@              Coverage Diff               @@
##           develop_v3     #252      +/-   ##
==============================================
- Coverage       65.50%   65.28%   -0.22%     
==============================================
  Files              29       29              
  Lines            4061     4197     +136     
==============================================
+ Hits             2660     2740      +80     
- Misses           1401     1457      +56     
Files Coverage Δ
openoa/analysis/eya_gap_analysis.py 95.00% <100.00%> (+1.45%) ⬆️
openoa/schema/metadata.py 88.38% <100.00%> (+0.40%) ⬆️
openoa/analysis/_analysis_validators.py 71.05% <76.19%> (+2.63%) ⬆️
openoa/analysis/electrical_losses.py 77.08% <61.11%> (-2.77%) ⬇️
openoa/analysis/turbine_long_term_gross_energy.py 75.63% <72.41%> (-1.61%) ⬇️
openoa/analysis/yaw_misalignment.py 84.50% <85.71%> (-0.27%) ⬇️
openoa/analysis/wake_losses.py 86.97% <88.65%> (+0.11%) ⬆️
openoa/analysis/aep.py 84.78% <68.18%> (-1.20%) ⬇️
openoa/schema/schema.py 49.50% <0.00%> (-16.29%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@ejsimley ejsimley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Rob, I like the consistent format for initializing the analysis classes and specifying/changing parameters in the run methods! I have some minor questions and comments, but I think this is mostly ready to go. I will also update the doc strings for the wake losses class and/or change the default values for some of the arguments since the defaults without UQ weren't originally always the average of the default range with UQ.

openoa/analysis/_analysis_validators.py Outdated Show resolved Hide resolved
openoa/analysis/_analysis_validators.py Outdated Show resolved Hide resolved
openoa/analysis/aep.py Outdated Show resolved Hide resolved
@@ -1003,7 +1098,10 @@ def run_AEP_monte_carlo(self):
iav[n] = gross_lt_annual.std() / gross_lt_annual.mean()
avail_pct[n] = avail_lt_losses
curt_pct[n] = curt_lt_losses
lt_por_ratio[n] = (gross_lt.sum() / self._run.num_years_windiness) / gross_por.sum()
gross_por_sum = gross_por.sum()
if isinstance(gross_por_sum, pd.Series):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was causing gross_por_sum to be a pd.Series sometimes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future pandas will just break for the original usage when gross_por_sum is a single element Series.

self,
num_sim: int,
reg_model: str = None,
reanalysis_products: list[str] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like changing reanalysis_products in run will work fine, but doesn't including this as an argument break the rule you mention that "Only parameters that do not change the nature of what data is validated/checked for existence can be passed to run()"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point! I think I originally envisioned this being as a subset of any that were originally provided, so if you add more, the method wouldn't work, but this was never actually implemented either.

I'm curious as to your thoughts here on whether it's an appropriate breaking of the paradigm or not. My own thoughts are that reanalysis data is already validated, so it's all there to be used, but at the same time it's still changing the nature of the analysis.

openoa/analysis/electrical_losses.py Show resolved Hide resolved
openoa/analysis/turbine_long_term_gross_energy.py Outdated Show resolved Hide resolved
openoa/analysis/turbine_long_term_gross_energy.py Outdated Show resolved Hide resolved
@@ -159,19 +170,78 @@ def __attrs_post_init__(self):
logger.info("Processing SCADA data into dictionaries by turbine (this can take a while)")
self.sort_scada_by_turbine()

def finalize_reanalysis_products(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a common function for some of the analysis classes. Not necessary, but could it be moved to the "_analysis_validators" module or a similar higher-level module?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely could, and I think my main hangup is that it's not universally shared across the analysis classes, nor is it doing much to begin with, so I felt it was easier to just keep in the methods where it was needed. That said, I could be convinced if you think we should make another mixin class like with the FromDictMixin that manages the dictionary initialization processes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ejsimley I was thinking on this a bit more, and I created and another tool in _analysis_validators to do this. Let me know what you think.

openoa/analysis/turbine_long_term_gross_energy.py Outdated Show resolved Hide resolved
@jordanperr
Copy link
Collaborator

jordanperr commented Sep 12, 2023

Two thoughts:

Thought 1: How will this new paradigm extend to the workflow where analysis classes can "pull" data from an underlying datastore? For example, you might want to use MonteCarloAEP schema to pull all data necessary for a MonteCarloAEP from an ENTR warehouse. But then after pulling all the data, you might want some way to "exclude" one of the columns, say, to compare its impact on the AEP.

One possibility would be to extract the underlying PlantData object and then create two new MonteCarloAEP analyses using the same PlantData. Would that work?

I'm trying to think through if there might be a benefit to preserving the ability to include/exclude data in run().

Thought 2: Are PlantData and Analysis classes immutable? It didn't used to be. Please verify that no option in the run() function will mutate the PlantData and Analysis class. Or if they do, it should be noted somehow. The user will expect any combination of PlantData and Analysis class to be "run" multiple times with different options.

@RHammond2
Copy link
Collaborator Author

Two thoughts:

Thought 1: How will this new paradigm extend to the workflow where analysis classes can "pull" data from an underlying datastore? For example, you might want to use MonteCarloAEP schema to pull all data necessary for a MonteCarloAEP from an ENTR warehouse. But then after pulling all the data, you might want some way to "exclude" one of the columns, say, to compare its impact on the AEP.

One possibility would be to extract the underlying PlantData object and then create two new MonteCarloAEP analyses using the same PlantData. Would that work?

I'm trying to think through if there might be a benefit to preserving the ability to include/exclude data in run().

Thought 2: Are PlantData and Analysis classes immutable? It didn't used to be. Please verify that no option in the run() function will mutate the PlantData and Analysis class. Or if they do, it should be noted somehow. The user will expect any combination of PlantData and Analysis class to be "run" multiple times with different options.

@jordanperr thanks for the comments! I updated the description of the PR, but just to put it all in one place:

  • Only analysis_type gets updated in PlantData from an analysis class, but the analysis.plant object is also created as a deep copy of the original, so
  • you can create many new analysis classes to compare results of various modifications with ease
  • Naturally, if you modify an analysis parameter, the analysis class itself will be updated, otherwise you'd not be able to rerun an analysis with a new variation of a parameter, but what isn't modified is the underlying data.
  • The only thing excluded from the run() parameters are those that modify which data are validated, and therefore changing the fundamental basis of the analysis, with the exception of reanalysis_products.
  • For pulling analysis requirements from a data store, we still don't explicitly support this outside of grabbing requirements schemas for the analysis types in openoa/schema/ or from ANALYSIS_REQUIREMENTS

Let me know if I've missed anything here, or if you still have any reservations.

@RHammond2 RHammond2 merged commit 83ef14f into NREL:develop_v3 Sep 27, 2023
2 checks passed
@RHammond2 RHammond2 deleted the enhancement/analysis_api branch September 27, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v3 All updates and changes for OpenOA v3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants