Robust Rolling K-Means (R2K-Means): an Updateable Nonlinear K-Means Clustering Methodology for Financial Time Series

K-Means is a popular clustering algorithm designed to group data points into k clusters. In the financial industry, grouping funds or assets can isolate behaviors and define investment universes using any number of performance measures, holdings, or alternative features. Running standard K-Means independently at each time increment produces extremely unstable results due to the effects of random initialization and cluster mislabeling. Robust Rolling K-Means (R2K-Means) extends K-Means to time series, allowing investors to dynamically track and group funds in a stable and updateable framework.

Since a learning-based model is only as powerful as the data it trains on, the more stable results of R2K-Means (versus standard K-Means) make it a better candidate for use across AI-based applications.

Refer to our technical paper [1] (Hirsa, Klinkert, Malhotra, Holmes, 2024) for the financial engineering and implementation details.

In the following animations, we demonstrate the methodology on mutual fund datasets and illustrate its advantages in terms of applicability and explainability.

Section: Nonlinear Classification

R2K-Means uses a rolling window, which allows the clusterings to form nonlinear decision boundaries that better follow the natural shapes of the data. The animation below depicts the nonlinear decision boundaries of R2K-Means on 180 small-cap mutual funds over time.

Section: Centroid Stability

The two animations below illustrate the difference in stability between naive K-Means and R2K-Means. Notice how the centroids in naive K-Means frequently change location. This is primarily attributed to random initialization and cluster mislabeling, two effects that have no relationship with the data itself. The results of R2K-Means are much more stable and purely model the changes in the dataset. This is one of the benefits of using R2K-Means: if the data itself does not change, then the clustering results will not change either.
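The mislabeling effect described above can be reproduced on toy data. The sketch below is not the R2K-Means algorithm from the paper; it is a minimal, generic illustration under stated assumptions. It uses a hand-written Lloyd-style K-Means, a hypothetical two-feature dataset of 40 "funds", and warm-starting from the previous centroids as one common way to keep labels consistent between fits. All function and variable names are illustrative.

```python
import random

def assign(points, centroids):
    """Return the index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
        for pt in points
    ]

def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)

# Hypothetical toy data: two well-separated groups in a 2-D feature space.
rng = random.Random(0)
points = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]
          + [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(20)])

# Naive refit: identical data, but the initial centroids arrive in a
# different order, as can happen with random initialization.
_, labels_a = kmeans(points, [points[0], points[-1]])
_, labels_b = kmeans(points, [points[-1], points[0]])
print("same partition, same labels?", labels_a == labels_b)

# Warm start: reuse the previous centroids as the initial guess, so an
# unchanged dataset yields unchanged centroids and labels.
centroids_prev, labels_prev = kmeans(points, [points[0], points[-1]])
centroids_next, labels_next = kmeans(points, centroids_prev)
print("warm start keeps labels?", labels_next == labels_prev)
```

In the naive refit the two runs find the same two groups but number them in opposite order, which is the kind of change that has nothing to do with the data. The warm-started refit returns the same centroids and labels because the data did not change.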

  1. Hirsa, Ali, Holmes, Ryan, Klinkert, Federico, and Malhotra, Satyan. “Robust Rolling K-Means (R2K-Means): an Updateable Nonlinear K-Means Clustering Methodology for Financial Time Series”. Working paper, 2024. Shortly to be available at SSRN.