[ENH] add unequal-length time series support for tsfresh-based methods #3187
Hey @jsquaredosquared, I pulled this PR into a local branch to try to identify its current issues. The work seems well implemented.
Thanks for the offer :D P.S. I forgot to add you as a co-author to the original commit 🥲. Is there another way I can credit you?
There's no need for credit. This entire PR has been your code.
All right, but thanks again for your help 😄 👍
Nice work 👍🏼. Could you update the tests for the different tsfresh estimators to also include variable-length inputs? Please also run the tests locally with tsfresh installed.
Looks good, thanks. I agree it would be good to add an unequal-length test, at least for the transformer. Could you check which other estimators use tsfresh, e.g. FreshPRINCE in classification? It may just be that one. It would also be good to check that this has not altered the output: could you compare the new version against the old one on equal-length data and make sure the features are the same for a couple of datasets? Post the code and output here. Alternatively, if there is already a test that compares against expected output, you can link it here.
Branch updated from ecdb1c9 to 4401568.
I think I have updated all classifiers, clusterers, and regressors that use tsfresh. Please let me know if I have missed one.
Below is the code I used to compare the old and new conversion functions on equal-length data.
```python
import numpy as np
import pandas as pd

from aeon.testing.data_generation import make_example_3d_numpy


# Old function: equal-length 3D numpy arrays only
def _from_3d_numpy_to_long(arr):
    # Converting the numpy array to a long format DataFrame
    n_cases, n_channels, n_timepoints = arr.shape
    # One row per (case, channel) pair, one column per time point
    df = pd.DataFrame(arr.reshape(n_cases * n_channels, n_timepoints))
    df["case_index"] = np.repeat(np.arange(n_cases), n_channels)
    df["dimension"] = np.tile(np.arange(n_channels), n_cases)
    df = df.melt(
        id_vars=["case_index", "dimension"], var_name="time_index", value_name="value"
    )
    # Adjusting the column order and renaming columns
    df = df[["case_index", "time_index", "dimension", "value"]]
    df = df.rename(columns={"case_index": "index", "dimension": "column"})
    df["column"] = "dim_" + df["column"].astype(str)
    return df


# New function: 3D numpy arrays or lists of 2D arrays (n_channels, n_timepoints_i)
def _from_collection_to_long(collection):
    n_cases = len(collection)
    n_channels = collection[0].shape[0]
    n_timepoints = np.array([arr.shape[1] for arr in collection])
    # Case i contributes n_channels * n_timepoints[i] rows
    index = np.repeat(np.arange(n_cases), n_channels * n_timepoints)
    # Within each case: time points 0..n_timepoints[i]-1, once per channel
    timepoints = [np.arange(timepoints_i) for timepoints_i in n_timepoints]
    time_index = np.concatenate([np.tile(arr, n_channels) for arr in timepoints])
    # Within each case: each channel label repeated n_timepoints[i] times
    column = np.concatenate(
        [
            np.repeat(np.arange(n_channels), timepoints_i)
            for timepoints_i in n_timepoints
        ]
    )
    # Row-major flatten gives values in (channel, time) order, matching the indices
    value = np.concatenate([arr.flatten() for arr in collection])
    df = pd.DataFrame(
        {"index": index, "time_index": time_index, "column": column, "value": value}
    )
    df["column"] = "dim_" + df["column"].astype(str)
    return df


X, y = make_example_3d_numpy()  # or replace with any dataset.

# The two functions return rows in a different order, so sort both by the same
# unique keys before comparing (this also works for multichannel data).
# This does not change the actual values.
sort_keys = ["index", "column", "time_index"]
Xt_old = _from_3d_numpy_to_long(X).sort_values(by=sort_keys).reset_index(drop=True)
Xt_new = _from_collection_to_long(X).sort_values(by=sort_keys).reset_index(drop=True)
print((Xt_old == Xt_new).all(axis=None))
```
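The comparison above covers equal-length input only. As a minimal sketch of an unequal-length check along the lines the reviewer asked for, the snippet below converts a list of 2D arrays with different lengths to the long format and then rebuilds each case from it. It copies the logic of `_from_collection_to_long` into a hypothetical standalone helper `collection_to_long` so it runs without aeon; the inverse `long_to_collection` is likewise hypothetical and not part of aeon.

```python
import numpy as np
import pandas as pd


def collection_to_long(collection):
    # Hypothetical standalone copy of the logic in _from_collection_to_long
    n_cases = len(collection)
    n_channels = collection[0].shape[0]
    n_timepoints = np.array([arr.shape[1] for arr in collection])
    # np.repeat with an array of counts lets each case have its own row count
    index = np.repeat(np.arange(n_cases), n_channels * n_timepoints)
    time_index = np.concatenate(
        [np.tile(np.arange(t), n_channels) for t in n_timepoints]
    )
    column = np.concatenate(
        [np.repeat(np.arange(n_channels), t) for t in n_timepoints]
    )
    value = np.concatenate([arr.flatten() for arr in collection])
    df = pd.DataFrame(
        {"index": index, "time_index": time_index, "column": column, "value": value}
    )
    df["column"] = "dim_" + df["column"].astype(str)
    return df


def long_to_collection(df):
    # Hypothetical inverse: rebuild one (n_channels, n_timepoints_i) array per case
    cases = []
    for _, case_df in df.groupby("index", sort=True):
        wide = case_df.pivot(index="column", columns="time_index", values="value")
        # Reorder rows numerically ("dim_10" would otherwise sort before "dim_2")
        channel_order = [f"dim_{c}" for c in range(len(wide))]
        cases.append(wide.loc[channel_order].to_numpy())
    return cases


rng = np.random.default_rng(0)
X = [rng.normal(size=(2, n)) for n in (5, 8, 3)]  # 2 channels, unequal lengths
Xt = collection_to_long(X)
print(Xt.groupby("index").size().tolist())  # [10, 16, 6]
X_back = long_to_collection(Xt)
print(all(np.array_equal(a, b) for a, b in zip(X, X_back)))  # True
```

The row counts per case (2 channels times 5, 8 and 3 time points) and the exact round trip together check that no values are dropped or misaligned when lengths differ.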
Reference Issues/PRs
Closes #3179
What does this implement/fix? Explain your changes.
The TSFresh transformer and the time series methods that use it do not support unequal-length time series, although, according to the tsfresh FAQ, tsfresh itself should be capable of handling them.
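For context, tsfresh's `extract_features` works on a long-format DataFrame in which each row is a single observation, and the caller names the id, sort, kind and value columns via `column_id`, `column_sort`, `column_kind` and `column_value`. Because series are told apart by their id rather than by position, different ids can have different numbers of rows. A minimal pandas-only sketch of that layout (the column names `id`, `time`, `kind` and `value` are arbitrary choices, and tsfresh itself is not called):

```python
import pandas as pd

# One row per observation: two univariate series of length 3 and 5
rows = [
    {"id": series_id, "time": t, "kind": "dim_0", "value": float(t)}
    for series_id, length in [(0, 3), (1, 5)]
    for t in range(length)
]
df_long = pd.DataFrame(rows)
print(df_long.groupby("id").size().tolist())  # [3, 5]

# With tsfresh installed, this frame could be passed as, e.g.:
# from tsfresh import extract_features
# extract_features(df_long, column_id="id", column_sort="time",
#                  column_kind="kind", column_value="value")
```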
This pull request replaces the `_from_3d_numpy_to_long` function with a `_from_collection_to_long` function capable of handling both collection types (3D numpy arrays and lists of 2D numpy arrays).
Does your contribution introduce a new dependency? If yes, which one?
No new dependencies.
Any other comments?
PR checklist
For all contributions