Merge pull request #24 from sintel-dev/revised-api
New Zephyr API
- One Zephyr class
- encapsulates entire predictive engineering workflow.
- stores user state
- wrapped by GuideHandler.guide_step
- GuideHandler.guide_step helps manage the user's flow of steps and helps ensure that the actual Zephyr state remains consistent
- Provides helpful logging so that users can understand what steps they should perform to make progress
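The step-guarding idea described above can be illustrated with a small sketch. This is not the actual `GuideHandler` implementation: the decorator signature, the `Workflow` class, and the `_next_step` attribute are all hypothetical names, chosen only to show how wrapping each method can keep a stateful workflow object consistent and log what to do next.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("workflow_sketch")


def guide_step(step_index):
    """Hypothetical decorator: run a method only when its prerequisites have run."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if step_index > self._next_step:
                # Skipping ahead would leave required state unset, so refuse.
                raise RuntimeError(
                    f"{method.__name__} is step {step_index}; "
                    f"run step {self._next_step} first"
                )
            if step_index < self._next_step:
                # Going back is allowed, but later results must be recomputed.
                logger.warning("Re-running step %d; later steps are now stale", step_index)
            result = method(self, *args, **kwargs)
            self._next_step = step_index + 1
            logger.info("Finished step %d; next is step %d", step_index, self._next_step)
            return result
        return wrapper
    return decorator


class Workflow:
    """Toy stand-in for a single class that stores the user's state."""

    def __init__(self):
        self._next_step = 0
        self.data = None
        self.labels = None

    @guide_step(0)
    def load_data(self, rows):
        self.data = list(rows)
        return self.data

    @guide_step(1)
    def make_labels(self, threshold):
        self.labels = [value > threshold for value in self.data]
        return self.labels
```

Calling `make_labels` on a fresh `Workflow` raises a `RuntimeError` that names the step to run first, while calling `load_data` again after labeling resets the expected next step so stale labels are not silently reused.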
README.md
Lines changed: 88 additions & 94 deletions
Original file line number
Diff line number
Diff line change
@@ -13,26 +13,26 @@
13
13
14
14
A machine learning library for assisting in the generation of machine learning problems for wind farm operations data by analyzing past occurrences of events.
* **EntitySet creation**: tools designed to represent wind farm data and the relationship
46
-
between different tables. We have functions to create EntitySets for datasets with PI data
47
-
and datasets using SCADA data.
48
-
* **Labeling Functions**: a collection of functions, as well as tools to create custom versions
49
-
of them, ready to be used to analyze past operations data in the search for occurrences of
50
-
specific types of events in the past.
51
-
* **Prediction Engineering**: a flexible framework designed to apply labeling functions on
52
-
wind turbine operations data in a number of different ways to create labels for custom
53
-
Machine Learning problems.
54
-
* **Feature Engineering**: a guide to using Featuretools to apply automated feature engineering
55
-
to wind farm data.
45
+
- **EntitySet creation**: tools designed to represent wind farm data and the relationship
46
+
between different tables. We have functions to create EntitySets for datasets with PI data
47
+
and datasets using SCADA data.
48
+
- **Labeling Functions**: a collection of functions, as well as tools to create custom versions
49
+
of them, ready to be used to analyze past operations data in the search for occurrences of
50
+
specific types of events in the past.
51
+
- **Prediction Engineering**: a flexible framework designed to apply labeling functions on
52
+
wind turbine operations data in a number of different ways to create labels for custom
53
+
Machine Learning problems.
54
+
- **Feature Engineering**: a guide to using Featuretools to apply automated feature engineering
55
+
to wind farm data.
56
56
57
57
# Install
58
58
59
59
## Requirements
60
60
61
61
**Zephyr** has been developed and runs on Python 3.8, 3.9, 3.10, 3.11 and 3.12.
62
62
63
-
Also, although it is not strictly required, the usage of a [virtualenv](
64
-
https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid interfering
63
+
Also, although it is not strictly required, the usage of a [virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid interfering
65
64
with other software installed in the system where you are trying to run **Zephyr**.
66
65
67
66
## Download and Install
@@ -79,35 +78,38 @@ If you want to install from source or contribute to the project please read the
79
78
# Quickstart
80
79
81
80
In this short tutorial we will guide you through a series of steps that will help you
82
-
get started with **Zephyr**.
81
+
get started with **Zephyr**. For more detailed examples, please refer to the tutorial notebooks in the `notebooks` directory:
82
+
83
+
- `feature_engineering.ipynb`: Learn how to create EntitySets and perform feature engineering
84
+
- `modeling.ipynb`: Learn how to train and evaluate models
85
+
- `visualization.ipynb`: Learn how to visualize your data and results
83
86
84
87
## 1. Loading the data
85
88
86
-
The first step we will be to use preprocessed data to create an EntitySet. Depending on the
87
-
type of data, we will either the `zephyr_ml.create_pidata_entityset` or `zephyr_ml.create_scada_entityset`
88
-
functions.
89
+
The first step will be to use preprocessed data to create an EntitySet. Depending on the
90
+
type of data, we will use the `generate_entityset` function with `es_type="pidata"`, `es_type="scada"`, or `es_type="vibrations"`.
89
91
90
92
**NOTE**: if you cloned the **Zephyr** repository, you will find some demo data inside the
91
-
`notebooks/data` folder which has been preprocessed to fit the `create_entityset` data
92
-
requirements.
93
+
`notebooks/data` folder which has been preprocessed to fit the data requirements.
This will load the turbine, alarms, stoppages, work order, notifications, and SCADA data, and return it
@@ -132,15 +134,10 @@ Entityset: SCADA data
132
134
133
135
## 2. Selecting a Labeling Function
134
136
135
-
The second step will be to choose an adequate **Labeling Function**.
136
-
137
-
We can see the list of available labeling functions using the `zephyr_ml.labeling.get_labeling_functions`
138
-
function.
139
-
140
-
```python3
141
-
from zephyr_ml import labeling
137
+
The second step will be to choose an adequate **Labeling Function**. We can see the list of available labeling functions using the `GET_LABELING_FUNCTIONS` method.
0 2022-01-01 TA00 A0 LOC000 T00 LOCATION LOC ... Alarm1 Alarm1 Description of alarm 1 1 1 45801.0
200
196
201
197
[1 rows x 21 columns]
202
198
```
203
199
204
-
205
200
## 5. Modeling
206
201
207
-
Once we have the feature matrix, we can train a model using the Zephyr interface where you can train, infer, and evaluate a pipeline.
208
-
First, we need to prepare our dataset for training by creating ``X`` and ``y`` variables and one-hot encoding features.
202
+
Once we have the feature matrix, we can train a model using the Zephyr interface. First, we need to prepare our dataset for training by creating a train-test split.
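As a rough illustration of what a train-test split does, here is a minimal standard-library sketch. It is not the Zephyr method for this step; the function name `train_test_split_rows`, its parameters, and the toy data are assumptions made up for the example.

```python
import random


def train_test_split_rows(features, targets, test_size=0.2, seed=0):
    """Shuffle row indices once, then split features and targets the same way."""
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    indices = list(range(len(features)))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy feature rows and targets, invented purely for illustration.
features = [[float(i), float(i) * 2.0] for i in range(10)]
targets = [float(i) * 10.0 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split_rows(
    features, targets, test_size=0.2
)
```

Splitting by shuffled index, rather than shuffling features and targets separately, is what keeps each feature row paired with its own target.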
In this example, we will use an 'xgb' regression pipeline to predict total power loss.
216
-
217
-
```python3
218
-
from zephyr_ml import Zephyr
211
+
In this example, we will use an 'xgb' regression pipeline to predict total power loss. To train the pipeline, we simply call the `fit_pipeline` method.
219
212
220
-
pipeline_name ='xgb_regressor'
213
+
```python
214
+
zephyr.fit_pipeline(
215
+
    pipeline="xgb_regressor",
216
+
    pipeline_hyperparameters=None,
221
217
222
-
zephyr = Zephyr(pipeline_name)
218
+
)
223
219
```
224
220
225
-
To train the pipeline, we simply use the `fit` function.
226
-
```python3
227
-
zephyr.fit(X, y)
221
+
After training finishes, we can make predictions using `predict`:
222
+
223
+
```python
224
+
y_pred = zephyr.predict(X_test)
228
225
```
229
226
230
-
After it finished training, we can make prediciton using `predict`
227
+
We can also use `evaluate` to obtain the performance of the pipeline.
231
228
232
-
```python3
233
-
y_pred=zephyr.predict(X)
229
+
```python
230
+
results = zephyr.evaluate()
234
231
```
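The README does not show what `evaluate` returns. For a regression pipeline such as the one above, performance is commonly summarized with metrics like the ones in this hand-written sketch. The `regression_metrics` helper and its output keys are hypothetical and are not taken from Zephyr.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((true - mean_true) ** 2 for true in y_true)
    # R^2 is undefined when the targets have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "rmse": math.sqrt(ss_res / len(errors)), "r2": r2}


# Toy targets and predictions, invented for illustration.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

MAE and RMSE are in the units of the target (here, power loss), while R^2 is unitless and can be negative when the predictions fit worse than predicting the mean.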
235
232
236
-
We can also use ``zephyr.evaluate`` to obtain the performance of the pipeline.
237
-
238
233
# What's Next?
239
234
240
235
If you want to continue learning about **Zephyr** and all its
241
-
features please have a look at the tutorials found inside the [notebooks folder](