-
Notifications
You must be signed in to change notification settings - Fork 17
[ML-52574] Add covariate support in predict_timeseries for prediction table usage #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comment before merging.
""" | ||
Predict using the API from prophet model. | ||
:param horizon: Int number of periods to forecast forward. | ||
:param include_history: Boolean to include the historical dates in the data | ||
frame for predictions. | ||
:param future_df: Optional future input dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be more explicit that the future_df is for covariates and contains the covariate columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually future_df can be without covariates. The implementation here enables the function to take a df as input, it does not matter if it contains covariates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ic. please add more comments to the doc string: future_df is the dataframe that contains the time series column and covaraite columns if available. The returned result will contain the copy of future_df and the predicted target column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
if self._preprocess_func and self._split_col: | ||
future_df = apply_preprocess_func(future_df, self._preprocess_func, self._split_col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I read the function, this is a side effect that takes in the class field.
I suggest passing preprocess_fun
and split_col
as arguments to the _predict_impl and moving this logic to predict_timeseries
so they are more consistent between prophet and arima.
Also, please document in the predict_timeseries
that the preprocess will be applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preprocess_func and split_col are already class variables, why do we want to make them as input argument to another class function?
Just to clarify, this PR did not break consistency more between prophet and arima. Previously, arima has its make_futuer_dataframe inside _predict_impl but not predict_timeseries, which is not consistent with prophet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
@@ -279,3 +279,21 @@ def calculate_period_differences( | |||
freq_alias = PERIOD_ALIAS_MAP[OFFSET_ALIAS_MAP[frequency_unit]] | |||
# It is intended to get the floor value. And in the later check we will use this floor value to find out if it is not consistent. | |||
return (end_time.to_period(freq_alias) - start_time.to_period(freq_alias)).n // frequency_quantity | |||
|
|||
def apply_preprocess_func(df: pd.DataFrame, preprocess_func: callable, split_col: str) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking in this PR. it's always strange to read that split_col is required for such function. Let's refactor it in a follow-up
future_df.rename(columns={self._time_col: "ds"}, inplace=True) | ||
return self.model().predict(future_df) | ||
|
||
def predict_timeseries(self, horizon: int = None, include_history: bool = True, future_df: pd.DataFrame = None) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add unit test to test the future_df is used or created if None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
return future_df.groupby(self._id_cols).apply(lambda df: self._predict_impl(df, horizon, include_history)).reset_index() | ||
if self._preprocess_func and self._split_col: | ||
future_df = apply_preprocess_func(future_df, self._preprocess_func, self._split_col) | ||
future_df.rename(columns={self._time_col: "ds"}, inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit test the future_df has columns renanmed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refactor the predict_impl and add unit test.
@@ -451,3 +488,68 @@ def preprocess_func(df): | |||
) | |||
yhat = prophet_model.predict(None, test_df) | |||
self.assertEqual(2, len(yhat)) | |||
|
|||
@patch("databricks.automl_runtime.forecast.prophet.model.MultiSeriesProphetModel._predict_impl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation of @patch line should be aligned with def test... below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is aligned on my side. Or python will complain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes.
This PR adds covariate support in predict_timeseries function for prediction table usage.
In details, it