Skip to content

[ML-52574] Add covariate support in predict_timeseries for prediction table usage #169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 29, 2025

Conversation

Lanz-db
Copy link

@Lanz-db Lanz-db commented Apr 25, 2025

This PR adds covariate support in predict_timeseries function for prediction table usage.
In details, it

  1. adds preprocess_func before doing prediction for covariates
  2. fix the reset_index bug that id column can not be inserted back due to duplicate

Copy link

@apeforest apeforest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment before merging.

@Lanz-db Lanz-db changed the title [ML-52574] Add a new predict function for generating prediction table with covariates [ML-52574] Add covariate support in predict_timeseries for prediction table usage Apr 28, 2025
@Lanz-db Lanz-db requested a review from apeforest April 28, 2025 23:51
@Lanz-db Lanz-db requested a review from apeforest April 29, 2025 17:21
"""
Predict using the API from prophet model.
:param horizon: Int number of periods to forecast forward.
:param include_history: Boolean to include the historical dates in the data
frame for predictions.
:param future_df: Optional future input dataframe

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be more explicit that the future_df is for covariates and contains the covariate columns.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually future_df can be without covariates. The implementation here enables the function to take a df as input, it does not matter if it contains covariates

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ic. please add more comments to the doc string: future_df is the dataframe that contains the time series column and covaraite columns if available. The returned result will contain the copy of future_df and the predicted target column.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@Lanz-db Lanz-db requested a review from apeforest April 29, 2025 17:29
Comment on lines +125 to +126
if self._preprocess_func and self._split_col:
future_df = apply_preprocess_func(future_df, self._preprocess_func, self._split_col)
Copy link

@apeforest apeforest Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I read the function, this is a side effect that takes in the class field.

I suggest passing preprocess_fun and split_col as arguments to the _predict_impl and moving this logic to predict_timeseries so they are more consistent between prophet and arima.

Also, please document in the predict_timeseries that the preprocess will be applied.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preprocess_func and split_col are already class variables, why do we want to make them as input argument to another class function?

Just to clarify, this PR did not break consistency more between prophet and arima. Previously, arima has its make_futuer_dataframe inside _predict_impl but not predict_timeseries, which is not consistent with prophet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

@@ -279,3 +279,21 @@ def calculate_period_differences(
freq_alias = PERIOD_ALIAS_MAP[OFFSET_ALIAS_MAP[frequency_unit]]
# It is intended to get the floor value. And in the later check we will use this floor value to find out if it is not consistent.
return (end_time.to_period(freq_alias) - start_time.to_period(freq_alias)).n // frequency_quantity

def apply_preprocess_func(df: pd.DataFrame, preprocess_func: callable, split_col: str) -> pd.DataFrame:
Copy link

@apeforest apeforest Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking in this PR. it's always strange to read that split_col is required for such function. Let's refactor it in a follow-up

future_df.rename(columns={self._time_col: "ds"}, inplace=True)
return self.model().predict(future_df)

def predict_timeseries(self, horizon: int = None, include_history: bool = True, future_df: pd.DataFrame = None) -> pd.DataFrame:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add unit test to test the future_df is used or created if None

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

return future_df.groupby(self._id_cols).apply(lambda df: self._predict_impl(df, horizon, include_history)).reset_index()
if self._preprocess_func and self._split_col:
future_df = apply_preprocess_func(future_df, self._preprocess_func, self._split_col)
future_df.rename(columns={self._time_col: "ds"}, inplace=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit test the future_df has columns renanmed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link

@apeforest apeforest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please refactor the predict_impl and add unit test.

@Lanz-db Lanz-db requested a review from apeforest April 29, 2025 19:16
@@ -451,3 +488,68 @@ def preprocess_func(df):
)
yhat = prophet_model.predict(None, test_df)
self.assertEqual(2, len(yhat))

@patch("databricks.automl_runtime.forecast.prophet.model.MultiSeriesProphetModel._predict_impl")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation of @patch line should be aligned with def test... below?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is aligned on my side. Or python will complain?

Copy link

@apeforest apeforest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes.

@Lanz-db Lanz-db merged commit f45437d into branch-0.2.20.8 Apr 29, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants