[ML-52574] Add covariate support in predict_timeseries for prediction table usage #169

Lanz-db · 2025-04-25T06:11:45Z

This PR adds covariate support to the predict_timeseries function for prediction table usage.
In detail, it:

adds a preprocess_func step that is applied before prediction so that covariates are handled
fixes the reset_index bug where the id column cannot be inserted back because a column with the same name already exists
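
The reset_index failure described above can be reproduced in isolation. This is a minimal sketch, assuming the grouped prediction result carries the id both as the index name and as a regular column; the column names `id` and `yhat` and the helper `reset_keeping_column` are illustrative and not taken from the PR.

```python
import pandas as pd

# Assumed shape of the problem: "id" is both the index name and a column.
result = pd.DataFrame(
    {"id": ["a", "b"], "yhat": [1.0, 2.0]},
    index=pd.Index(["a", "b"], name="id"),
)


def reset_keeping_column(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical fix: discard the index instead of inserting it as a column."""
    return df.reset_index(drop=True)


try:
    result.reset_index()
    raised = False
except ValueError:
    # pandas refuses to insert the index as column "id" because one already exists.
    raised = True

fixed = reset_keeping_column(result)
```

Dropping the index is only one way to avoid the clash; whether it matches the fix in this PR is not shown in the thread.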

runtime/databricks/automl_runtime/forecast/prophet/model.py

apeforest

Please address the comment before merging.

apeforest · 2025-04-29T17:24:10Z

runtime/databricks/automl_runtime/forecast/prophet/model.py

        """
        Predict using the API from prophet model.
        :param horizon: Int number of periods to forecast forward.
        :param include_history: Boolean to include the historical dates in the data
            frame for predictions.
+        :param future_df: Optional future input dataframe


Be more explicit that future_df is for covariates and contains the covariate columns.

Actually, future_df can come without covariates. The implementation here lets the function take a dataframe as input; it does not matter whether it contains covariates.

I see. Please add more detail to the docstring: future_df is the dataframe that contains the time series column and, if available, the covariate columns. The returned result will contain a copy of future_df plus the predicted target column.

apeforest · 2025-04-29T17:52:24Z

runtime/databricks/automl_runtime/forecast/prophet/model.py

+        if self._preprocess_func and self._split_col:
+            future_df = apply_preprocess_func(future_df, self._preprocess_func, self._split_col)


Reading the function, this looks like a side effect that depends on class fields.

I suggest passing preprocess_func and split_col as arguments to _predict_impl and moving this logic to predict_timeseries, so that prophet and arima are more consistent.

Also, please document in predict_timeseries that the preprocessing will be applied.

preprocess_func and split_col are already class variables, so why make them input arguments to another method of the same class?

Just to clarify, this PR does not make prophet and arima any less consistent. Previously, arima already had its make_future_dataframe inside _predict_impl rather than predict_timeseries, which is not consistent with prophet.
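
The two positions in this exchange can be contrasted in a small sketch. It uses plain lists of dicts instead of dataframes, omits split_col, and every class and function name in it is hypothetical; it only illustrates where the preprocessing step lives, not the actual model code.

```python
from typing import Callable, List, Optional

Row = dict
Preprocess = Callable[[List[Row]], List[Row]]


class FieldStyleModel:
    """Option A: the private helper reads the preprocessing state from the instance."""

    def __init__(self, preprocess_func: Optional[Preprocess] = None):
        self._preprocess_func = preprocess_func

    def _predict_impl(self, rows: List[Row]) -> List[Row]:
        if self._preprocess_func:
            rows = self._preprocess_func(rows)
        return [{**row, "yhat": 0.0} for row in rows]  # placeholder prediction

    def predict_timeseries(self, rows: List[Row]) -> List[Row]:
        return self._predict_impl(rows)


class ArgumentStyleModel:
    """Option B: the public method preprocesses; the helper only predicts."""

    def __init__(self, preprocess_func: Optional[Preprocess] = None):
        self._preprocess_func = preprocess_func

    @staticmethod
    def _predict_impl(rows: List[Row]) -> List[Row]:
        return [{**row, "yhat": 0.0} for row in rows]  # placeholder prediction

    def predict_timeseries(self, rows: List[Row]) -> List[Row]:
        if self._preprocess_func:
            rows = self._preprocess_func(rows)
        return self._predict_impl(rows)


def add_flag(rows: List[Row]) -> List[Row]:
    """Illustrative preprocessing step that adds one derived field."""
    return [{**row, "flag": 1} for row in rows]
```

Both variants return the same result for the same input; the difference is only whether _predict_impl can be tested without setting instance fields first.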

apeforest · 2025-04-29T17:56:02Z

runtime/databricks/automl_runtime/forecast/utils.py

@@ -279,3 +279,21 @@ def calculate_period_differences(
    freq_alias = PERIOD_ALIAS_MAP[OFFSET_ALIAS_MAP[frequency_unit]]
    # It is intended to get the floor value. And in the later check we will use this floor value to find out if it is not consistent.
    return  (end_time.to_period(freq_alias) - start_time.to_period(freq_alias)).n // frequency_quantity
+
+def apply_preprocess_func(df: pd.DataFrame, preprocess_func: callable, split_col: str) -> pd.DataFrame:


Non-blocking for this PR. It always reads strangely that split_col is required for a function like this. Let's refactor it in a follow-up.

apeforest · 2025-04-29T18:00:53Z

runtime/databricks/automl_runtime/forecast/prophet/model.py

+        future_df.rename(columns={self._time_col: "ds"}, inplace=True)
+        return self.model().predict(future_df)
+
+    def predict_timeseries(self, horizon: int = None, include_history: bool = True, future_df: pd.DataFrame = None) -> pd.DataFrame:


Please add a unit test that checks future_df is used when provided and created when it is None.
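
A test of this kind could look roughly like the sketch below. TinyForecaster and its _make_future_dataframe method are hypothetical stand-ins (the real class and its helpers are not shown in this thread), and lists of dicts replace dataframes to keep the example self-contained.

```python
from unittest.mock import patch


class TinyForecaster:
    """Hypothetical stand-in for the model class, not the real implementation."""

    def _make_future_dataframe(self, horizon):
        return [{"ds": step} for step in range(horizon)]

    def _predict_impl(self, future_df):
        return [{**row, "yhat": 0.0} for row in future_df]  # placeholder prediction

    def predict_timeseries(self, horizon=None, future_df=None):
        # Build the future frame only when the caller did not supply one.
        if future_df is None:
            future_df = self._make_future_dataframe(horizon)
        return self._predict_impl(future_df)


def check_future_df_is_used():
    model = TinyForecaster()
    supplied = [{"ds": 10, "covariate": 1.5}]
    with patch.object(TinyForecaster, "_make_future_dataframe") as make_future:
        result = model.predict_timeseries(future_df=supplied)
    make_future.assert_not_called()  # the supplied frame must be used as is
    return result


def check_future_df_is_created():
    model = TinyForecaster()
    with patch.object(
        TinyForecaster, "_make_future_dataframe", return_value=[{"ds": 0}]
    ) as make_future:
        result = model.predict_timeseries(horizon=1)
    make_future.assert_called_once_with(1)  # created from the horizon
    return result
```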

apeforest · 2025-04-29T18:01:16Z

runtime/databricks/automl_runtime/forecast/prophet/model.py

-        return future_df.groupby(self._id_cols).apply(lambda df: self._predict_impl(df, horizon, include_history)).reset_index()
+        if self._preprocess_func and self._split_col:
+            future_df = apply_preprocess_func(future_df, self._preprocess_func, self._split_col)
+        future_df.rename(columns={self._time_col: "ds"}, inplace=True)


Please unit test that future_df has its columns renamed.
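
One possible shape for such a test, as a sketch only: rename_time_col is a hypothetical helper, and the column names `date` and `price` are made up for illustration.

```python
import pandas as pd


def rename_time_col(future_df: pd.DataFrame, time_col: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy with the time column renamed to "ds"."""
    return future_df.rename(columns={time_col: "ds"})


def check_time_col_renamed() -> pd.DataFrame:
    future_df = pd.DataFrame(
        {
            "date": pd.to_datetime(["2025-01-01", "2025-01-02"]),
            "price": [1.0, 2.0],
        }
    )
    renamed = rename_time_col(future_df, "date")
    assert list(renamed.columns) == ["ds", "price"]
    # The caller's frame keeps its original column names in this sketch.
    assert list(future_df.columns) == ["date", "price"]
    return renamed
```

The diff renames with inplace=True, which in pandas modifies the dataframe object it is called on; the sketch returns a copy instead so that both the renamed and the original columns can be asserted.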

apeforest

Please refactor _predict_impl and add unit tests.

apeforest · 2025-04-29T20:19:01Z

runtime/tests/automl_runtime/forecast/prophet/model_test.py

@@ -451,3 +488,68 @@ def preprocess_func(df):
        )
        yhat = prophet_model.predict(None, test_df)
        self.assertEqual(2, len(yhat))
+
+    @patch("databricks.automl_runtime.forecast.prophet.model.MultiSeriesProphetModel._predict_impl")


nit: shouldn't the @patch line be indented to align with the def test... below?

It is aligned on my side; otherwise Python would complain.
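
The point about Python complaining can be checked directly: a decorator that is not at the same indentation as its def does not compile. A minimal sketch, with a made-up class and method name and @staticmethod standing in for @patch:

```python
ALIGNED = (
    "class T:\n"
    "    @staticmethod\n"
    "    def test_something():\n"
    "        return 1\n"
)

MISALIGNED = (
    "class T:\n"
    "        @staticmethod\n"
    "    def test_something():\n"
    "        return 1\n"
)


def compiles(source: str) -> bool:
    """Return True if the source compiles, False on a syntax or indentation error."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
```

In the diff, the context lines above @patch belong to the body of the previous test method, which is why the decorator appears less indented than the lines before it.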

apeforest

Thanks for the changes.

Lanz-db added 2 commits April 24, 2025 23:05

init

611667d

change name

5131667

apeforest reviewed Apr 26, 2025

runtime/databricks/automl_runtime/forecast/prophet/model.py Outdated

apeforest requested changes Apr 26, 2025

Lanz-db added 3 commits April 28, 2025 16:07

fix

1a7189a

fix

16e0f75

fix

78608ac

Lanz-db changed the title ~~[ML-52574] Add a new predict function for generating prediction table with covariates~~ [ML-52574] Add covariate support in predict_timeseries for prediction table usage Apr 28, 2025

Lanz-db requested a review from apeforest April 28, 2025 23:51