Multi-Window Finder #1062

Open
seanlaw opened this issue Jan 17, 2025 · 2 comments

@seanlaw
Contributor

seanlaw commented Jan 17, 2025

It was brought to my attention that Keogh's lab wrote a nice short paper about identifying the "right" window sizes (beyond Pan Matrix Profiles) and I think it should be pretty straightforward to implement:

Paper
Code (Jupyter Notebook) and Data Sets

It would be great to see a notebook reproducer of this

seanlaw added the enhancement and notebook reproducer labels on Jan 17, 2025
@NimaSarajpoor
Collaborator

A few things caught my attention after taking a look at the paper/code. I'm sharing them here to highlight them for future readers:

(1) The paper proposes an algorithm which, at its core, uses a function that takes a time series T and a window size m as inputs and returns a real value as output. Although the authors use the term dist (distance) for the returned value, a different term (e.g. score) would be more accurate, since the returned value can be negative and a distance cannot.

(2) The paper does not seem to mention z-normalization. So, one might be curious to explore whether the proposed algorithm still works when the subsequences differ substantially in their averages but are similar after z-normalization (e.g. subsequences {0.1, 0.2, 0.3} and {100, 200, 300}).
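To make the example in (2) concrete, here is a minimal sketch using only the Python standard library. The helper `z_normalize` is a hypothetical name introduced for illustration (it is not taken from the paper or its notebook), and it assumes the population standard deviation is the one intended.

```python
import math
import statistics


def z_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


a = [0.1, 0.2, 0.3]
b = [100.0, 200.0, 300.0]

za = z_normalize(a)
zb = z_normalize(b)

# The raw means differ by three orders of magnitude ...
print(statistics.fmean(a), statistics.fmean(b))

# ... but after z-normalization the two shapes coincide up to rounding error.
same_shape = all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(za, zb))
print(same_shape)
```

Because the paper's core function appears to work on a moving average of the raw values, two subsequences like these would contribute very differently to it even though they have the same shape.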

(3) Algorithm 1-lines 3-4 shows the following pseudo-code:

MA = moving-avg(T, w) // Algorithm 2
moving-dist ← Sum(Log(abs(MA − mean(MA))))

However, the code shows the following line:

np.log(abs(moving_avg - (moving_avg).mean()).sum())

Note that sum and log are swapped. Maybe that's just a typo in the placement of the parentheses, or there might be a specific reason behind the change. IMO, the paper's version makes sense: it applies the log to each element, which presumably changes how extremely small or extremely large values of abs(MA − mean(MA)) contribute to the sum. The code's version, however, just takes the log of a single positive value. Since log is monotonically increasing, AFAIU this only rescales the result and does not change which window sizes come out as minima.
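The difference between the two orderings in (3) can be checked on a toy series. This is only a sketch: `moving_average` is a plain sliding-window mean standing in for the paper's Algorithm 2 (an assumption, since that algorithm is not reproduced in this thread), and the names `score_paper` and `score_code` are hypothetical.

```python
import math


def moving_average(series, window):
    """Sliding-window mean; an assumed stand-in for the paper's Algorithm 2."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def abs_deviations(series, window):
    """abs(MA - mean(MA)) for every moving-average value."""
    ma = moving_average(series, window)
    mu = sum(ma) / len(ma)
    return [abs(v - mu) for v in ma]


def score_paper(series, window):
    """Sum(Log(abs(MA - mean(MA)))), as written in Algorithm 1."""
    # Element-wise log: undefined if any moving-average value equals the mean.
    return sum(math.log(d) for d in abs_deviations(series, window))


def score_code(series, window):
    """log(sum(abs(MA - mean(MA)))), as written in the notebook."""
    return math.log(sum(abs_deviations(series, window)))


series = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
small_series = [v / 100 for v in series]  # same shape, smaller amplitude

for name, s in (("series", series), ("small_series", small_series)):
    print(name, score_paper(s, 2), score_code(s, 2))
```

The two scores differ on the same input, and both go negative for the small-amplitude series, which also illustrates point (1).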

(4) Algorithm 1-lines 8-11 shows:

for i in local-min do
    res ← ws[i] / (i + 1)
end for
w = mean(res)

The code shows:

for i in range(3):
    reswin.append(window_sizes[b[i]]/ (i+1))
reswin = np.array(reswin)
winTime = 0.8 * reswin[0] + 0.15 * reswin[1] + 0.05 * reswin[2]

Why 3, and why the weights 0.8, 0.15, and 0.05?
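To see how much the two aggregation rules in (4) can disagree, here is a small sketch. It rests on an interpretation that is not confirmed by the paper: the input is the list of window sizes at the local minima, ordered so that the i-th entry lies near the (i + 1)-th multiple of the underlying period, and the pseudo-code's `res ← ...` is read as appending to a list. The function names and the example window sizes are hypothetical.

```python
import math


def estimate_paper(minima_windows):
    """Unweighted mean of ws[i] / (i + 1) over all local minima (Algorithm 1)."""
    res = [w / (i + 1) for i, w in enumerate(minima_windows)]
    return sum(res) / len(res)


def estimate_code(minima_windows, weights=(0.8, 0.15, 0.05)):
    """Weighted sum of the first three ws[i] / (i + 1) terms (notebook)."""
    res = [minima_windows[i] / (i + 1) for i in range(3)]
    return sum(wt * r for wt, r in zip(weights, res))


exact_multiples = [50, 100, 150]  # minima exactly at 1x, 2x, 3x a period of 50
noisy_multiples = [50, 104, 144]  # minima slightly off the exact multiples

for ws in (exact_multiples, noisy_multiples):
    print(ws, estimate_paper(ws), estimate_code(ws))
```

The weights 0.8, 0.15, and 0.05 sum to 1, so the notebook's rule is a weighted mean that leans heavily on the first minimum. Under the interpretation above, both rules agree when the minima fall on exact multiples and drift apart otherwise.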

@seanlaw
Contributor Author

seanlaw commented Jan 21, 2025

> Note that sum and log are swapped.

I noticed this too and this inconsistency scares me. I think we really need to take special care when trying this out and really understand/test everything before adding it
