• bra-ket 6 days ago

    related: Matrix Profiles for time series https://www.cs.ucr.edu/~eamonn/MatrixProfile.html

  • uoaei 6 days ago

    See Stumpy for a handy library to get this working quickly (written in Python): https://github.com/TDAmeritrade/stumpy

  • seanlaw 5 days ago

    Hi all, I am the creator of STUMPY and wanted to thank you for your interest. Please feel free to post questions on our GitHub issues and we'll try to assist where we can.

  • ashsriv 5 days ago

    I am still a little confused about the real-world application of Matrix Profiles. It looks really good, but once an MP is made, then what?

    Can this be automated to say, for example: "Based on your window, here are all the anomalies"?
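
    A minimal sketch of what that automation could look like, assuming the usual reading of a matrix profile: each entry is the z-normalized distance from a window to its nearest non-overlapping neighbor, so the largest entries ("discords") are the windows least like anything else in the series. This is a brute-force, standard-library illustration and not STUMPY's implementation; the names `naive_matrix_profile` and `top_discords` and the m/2 exclusion zone are assumptions made for the example.

```python
import math

def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]

def naive_matrix_profile(series, m):
    """Brute-force matrix profile, O(n^2 * m): for every length-m window,
    the distance to its nearest neighbor outside an exclusion zone."""
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    excl = max(1, m // 2)  # assumed exclusion zone, skips trivial self-matches
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            best = min(best, math.dist(windows[i], windows[j]))
        profile.append(best)
    return profile

def top_discords(profile, k, m):
    """Up to k window start indices with the largest profile values,
    kept at least m apart so one event is not reported twice."""
    order = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    picked = []
    for i in order:
        if len(picked) >= k:
            break
        if all(abs(i - p) >= m for p in picked):
            picked.append(i)
    return picked

# Toy data: a sine wave with period 20 whose samples 100..109 are flattened.
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
series[100:110] = [0.0] * 10

profile = naive_matrix_profile(series, m=20)
print(top_discords(profile, k=1, m=20))  # start index of the most unusual window
```

    On real data you would swap the brute-force loop for a library call and still have to choose the window length and how many discords to report; the profile ranks candidates, it does not decide what counts as an anomaly.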

  • amai 5 days ago

    Don’t forget: "Clustering of Time Series Subsequences is Meaningless": https://www.cs.ucr.edu/~eamonn/meaningless.pdf

  • Topolomancer 5 days ago

    But this is not about clustering. It's about figuring out to what extent a certain subclass of features, namely the 'shapelets', are statistically significantly associated with a pre-defined binary outcome.

    The paper you mentioned is interesting, though, because it shows an issue that many algorithms are prone to: if the number of samples/features gets too large, at some point, you are only comparing _means_.

    (We are working on a paper to show the issues of this when it comes to time series classification.)

  • graycat 6 days ago

    The math in their description of the data is in error: they need to state that the T_i (T with a subscript i), for i = 0, 1, 2, ..., n, are distinct.

    More standard would be a function d: {0, 1, ..., n} --> R^{1 x m} x {0, 1}.

  • Topolomancer 6 days ago

    Seems to be standard terminology for time series classification to me, to be honest. I think the approach would also work if there are duplicates in the data. Although the estimate would be overly optimistic, right?

  • graycat 6 days ago

    With their notation, they have not specified that the T's are unique. So a first fix would be just to state that the T's are distinct. And it would help to be explicit that i = 0, 1, 2, ... corresponds to increasing time. Moreover, is the data equally spaced in time? Likely yes, and in that case, clearly say so.

  • jmmcd 5 days ago

    No, i indexes the patient, not time. (T_0, y_0) is one patient's entire time series.

  • module0000 5 days ago

    This sure reads and looks like technical analysis indicators for time series data.

    It's useful though - example: 5-day MA of disk errors rises above the 15-day MA == likely failure
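
    A rough sketch of that kind of check, assuming daily error counts in a plain list and simple trailing moving averages. The function names and the "short MA crosses above long MA" rule are illustrative assumptions, not something taken from the paper or from any monitoring tool.

```python
def moving_average(values, window):
    """Trailing simple moving average; None until `window` points exist."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out

def crossover_days(values, short=5, long=15):
    """Indices where the short MA goes from at-or-below to above the long MA."""
    s = moving_average(values, short)
    l = moving_average(values, long)
    days = []
    for i in range(1, len(values)):
        if None in (s[i - 1], l[i - 1], s[i], l[i]):
            continue  # not enough history yet for both averages
        if s[i - 1] <= l[i - 1] and s[i] > l[i]:
            days.append(i)
    return days

# Hypothetical daily disk error counts: flat for 20 days, then climbing.
errors = [1] * 20 + [5, 9, 14, 20, 30]
print(crossover_days(errors))  # days on which the 5-day MA crosses above the 15-day MA
```

    Whether a crossover really means "likely failure" depends on the hardware and the error type; the sketch only flags the days on which the short-term average overtakes the long-term one.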