Clustering Underlying Stock Trends via NMF

CLUSTERING UNDERLYING STOCK TRENDS VIA NON-NEGATIVE MATRIX FACTORIZATION

MIDAS WORKSHOP @ ECML-PKDD 2016

Andrea Pazienza, Sabrina Francesca Pellegrino,

Stefano Ferilli, Floriana Esposito

19/09/2016 - Riva del Garda, Italy

Overview

1. Introduction

2. Clustering with NMFs

3. Experiments on Financial Data

4. Conclusions

INTRODUCTION

Introduction

In market trading, the trader needs to predict future stock prices to determine a self-financing trading strategy that maximizes the portfolio return.

Problem: creating and managing successful portfolios of financial assets is a difficult practice.

Solution: portfolio diversification, which attempts to minimize the risk for a given amount of return.

This problem can be seen as a clustering process:

# group data (e.g., stocks) into subgroups of similar behavior (e.g., the same market trend).

Motivation

With K-Means it is not possible to establish the effectiveness and coherence of the clusters when dealing with stock data [1]:

# it tends to find spherical clusters, and centroid-based clustering does not handle noise;

# a weighted Euclidean distance must be introduced in place of the standard Euclidean distance to re-evaluate centroid-based clusters.
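The weighted Euclidean distance mentioned in the last bullet scales each squared coordinate difference by its own weight, d_w(x, y) = sqrt(Σ_t w_t (x_t − y_t)²). The slide does not say how the weights are chosen, so in this minimal NumPy sketch they are simply an input; the function name is illustrative.

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance d_w(x, y) = sqrt(sum_t w_t * (x_t - y_t)**2)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    # Each squared coordinate difference is scaled by its own weight before summing
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))


# Two toy "price series" of 2 days each; w = [1, 1] gives the standard Euclidean distance
print(weighted_euclidean([0, 0], [3, 4], [1, 1]))  # 5.0
print(weighted_euclidean([0, 0], [3, 4], [1, 0]))  # 3.0
```

With all weights equal to 1 this reduces to the standard Euclidean distance; a hypothetical choice such as larger weights on recent trading days would make recent price moves count more when comparing stocks.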

Proposal: use Non-negative Matrix Factorization (NMF) to cluster underlying stock trends.

[1] F. Cai, N. Le-Khac, and M. Kechadi. Clustering Approaches for Financial Data Analysis: A Survey. Proceedings of DMIN 2012, pp. 1-7, 2012.

CLUSTERING WITH NMFS

Problem formulation

The market is made up of m stocks $S_1, S_2, \dots, S_m$, each stored as a row vector whose entries are n daily closing prices.

Suppose there are k latent bases $W_1, W_2, \dots, W_k$; each $W_j$ is an n-dimensional row vector, thought of as a Brownian motion.

Each stock is expressed as a linear combination of these bases, with a non-negative real number $H_{ij}$ indicating the degree of association of the i-th stock with the basis $W_j$.

In matrix notation,

$$S_{+} \approx H_{+} W_{\pm},$$

where $S \in \mathbb{R}_{+}^{m \times n}$, $H \in \mathbb{R}_{+}^{m \times k}$ and $W \in \mathbb{R}_{\pm}^{k \times n}$.
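To make the formulation concrete, the sketch below builds a toy market that follows it exactly: k latent bases simulated as discretized Brownian motions (cumulative sums of independent Gaussian increments) and m stocks formed as non-negative combinations of them, so that row i of S equals Σ_j H_ij W_j. The starting level of 100, the increment scale of 0.5, the dominant-basis weights and the name `make_toy_market` are illustrative assumptions, not the authors' data.

```python
import numpy as np

def make_toy_market(m=28, n=2518, k=4, seed=0):
    """Build S (m x n) = H (m x k, H >= 0) @ W (k x n) with Brownian-motion bases."""
    rng = np.random.default_rng(seed)
    # Each basis W_j is a random walk starting at 100 with N(0, 0.5^2) daily increments;
    # W itself may take either sign, as in the W_± of the formulation.
    W = 100.0 + np.cumsum(rng.normal(0.0, 0.5, size=(k, n)), axis=1)
    # Each stock is dominated by one basis plus small non-negative loadings on the others
    labels = rng.integers(0, k, size=m)
    H = 0.05 * rng.random((m, k))
    H[np.arange(m), labels] += 1.0
    S = H @ W  # row i of S is sum_j H[i, j] * W[j]
    return S, H, W, labels


S, H, W, labels = make_toy_market()
print(S.shape, H.shape, W.shape)  # (28, 2518) (28, 4) (4, 2518)
```

The sizes mirror the NASDAQ data used later (28 stocks, 2518 trading days), which makes such a toy matrix convenient for checking a clustering pipeline before running it on real prices.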

NMFs

Standard definition:

$$S \approx H W,$$

where $S \in \mathbb{R}_{+}^{m \times n}$, $H \in \mathbb{R}_{+}^{m \times k}$ and $W \in \mathbb{R}_{+}^{k \times n}$, and $k \le m$.

# Role of k: a small k forces the representation to capture the underlying regularities in the data.

# Matrices H and W are found by solving the optimization problem

$$\min_{H \ge 0,\; W \ge 0} \|S - H W\|_F^2,$$

where $\|\cdot\|_F$ is the Frobenius norm.
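One standard way to attack this problem is Lee and Seung's multiplicative update rules, which keep H and W non-negative and do not increase the squared Frobenius loss. The deck does not say which solver was used, so this NumPy sketch is an illustration under that assumption; `max_iter`, `tol` and the random initialization are hypothetical choices. It returns the iteration count and the error history, the two quantities reported in the result tables.

```python
import numpy as np

def nmf_multiplicative(S, k, max_iter=5000, tol=1e-6, seed=0):
    """Approximate S (m x n, S >= 0) as H @ W, H (m x k) >= 0, W (k x n) >= 0."""
    S = np.asarray(S, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = S.shape
    H = rng.random((m, k))
    W = rng.random((k, n))
    eps = 1e-12  # guards against division by zero
    prev = np.linalg.norm(S - H @ W, "fro")
    errors = [prev]
    for it in range(1, max_iter + 1):
        # Multiplicative updates for min ||S - HW||_F^2 (element-wise products/ratios)
        W *= (H.T @ S) / (H.T @ H @ W + eps)
        H *= (S @ W.T) / (H @ W @ W.T + eps)
        err = np.linalg.norm(S - H @ W, "fro")
        errors.append(err)
        # Stop when the relative decrease of the error becomes negligible
        if prev - err < tol * max(prev, eps):
            break
        prev = err
    return H, W, it, errors


rng = np.random.default_rng(1)
S = rng.random((28, 50))
H, W, n_iter, errors = nmf_multiplicative(S, k=4, max_iter=200)
print(n_iter, errors[0], errors[-1])
```

The updates are element-wise, so they never introduce negative entries; the price of this simplicity is that convergence can be slow, which is why an iteration cap is needed.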

Convex NMF

Convex NMF (C-NMF) allows the data matrix S to have mixed signs. It minimizes

$$\min_{\|H_i\|_1 = 1,\; H \ge 0} \|S - S H W\|_F^2,$$

Advantage of the convex constraint imposed on H:

# the rows of H are interpreted as weights of sums of certain data points, so that the rows can be interpreted as centroids.
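The constraint $\|H_i\|_1 = 1,\ H \ge 0$ says that every row of H lies on the probability simplex. One generic way to enforce it, assumed here for illustration (C-NMF as usually described relies on multiplicative updates instead), is to take gradient steps and then project each row back onto the simplex with the standard sort-based Euclidean projection. The helper name is hypothetical:

```python
import numpy as np

def project_rows_to_simplex(X):
    """Euclidean projection of each row of X onto {h : h >= 0, sum(h) = 1}."""
    X = np.asarray(X, dtype=float)
    m, k = X.shape
    U = -np.sort(-X, axis=1)                 # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0         # cumulative sums minus the target sum 1
    idx = np.arange(1, k + 1)
    cond = U - css / idx > 0                 # True for positions j <= rho
    rho = k - 1 - np.argmax(cond[:, ::-1], axis=1)  # last True position (0-based)
    theta = css[np.arange(m), rho] / (rho + 1)      # per-row shift
    return np.maximum(X - theta[:, None], 0.0)


P = project_rows_to_simplex(np.array([[2.0, 0.0], [1.0, 1.0], [0.5, 0.5]]))
print(P)  # rows [1, 0], [0.5, 0.5], [0.5, 0.5]
```

Rows that already satisfy the constraint are returned unchanged, while any other row is moved to the nearest point with non-negative entries summing to 1.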

Convex-Hull NMF

Convex-Hull NMF (CH-NMF) is a fast technique and scales extremely well.

The task now is to solve the following optimization problem

$$\min \|S - S H W\|_F^2,$$

subject to the convexity constraints

$$\|H_i\|_1 = 1,\ H \ge 0, \qquad \|W_j\|_1 = 1,\ W \ge 0.$$

This optimization problem is equivalent to projecting the solution onto the convex hull of S.

# Advantage: new opportunities for data interpretation.
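Working with the convex hull makes the search cheap because only its vertices matter as candidate bases, and hull vertices can be found with random projections: the maximum and minimum of a linear function over a finite point set are attained at hull vertices. The sketch below treats each stock (row of S) as a data point and uses 1-D random projections; this is a simplification assumed for illustration, not the exact CH-NMF procedure, and `hull_candidates` and `n_directions` are hypothetical names.

```python
import numpy as np

def hull_candidates(S, n_directions=200, seed=0):
    """Indices of rows of S that are extreme along random projection directions.

    The max and min of a linear function over finitely many points are attained at
    vertices of their convex hull; with random continuous directions, ties occur
    with probability zero, so the returned rows are hull vertices in practice.
    """
    S = np.asarray(S, dtype=float)
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_directions, S.shape[1]))
    proj = S @ directions.T  # (m, n_directions): projection of every row on every direction
    extreme = set(proj.argmax(axis=0).tolist()) | set(proj.argmin(axis=0).tolist())
    return sorted(extreme)


# Corners of the unit square (rows 0-3) plus two interior points (rows 4-5)
S = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [0.2, 0.3]])
print(hull_candidates(S))  # only corner indices can appear
```

Interior points never win a projection, so the candidate set shrinks from all m stocks to the few extreme ones, from which convex combinations can then be built.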

EXPERIMENTS ON FINANCIAL DATA

Experiments

Data gathered:

# NASDAQ Stock Market

# 28 stocks belonging to 8 different sectors

# 10 years of closing prices (2518 working days)

Clustering methods applied:

# NMF

# C-NMF

# CH-NMF

# K-Means
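K-Means serves as the baseline among these methods. As a point of reference, here is a minimal sketch of Lloyd's algorithm applied to the rows of S (one row per stock). The deck does not report the K-Means settings, so the farthest-first initialization and the iteration cap are illustrative assumptions.

```python
import numpy as np

def kmeans(S, k, max_iter=100):
    """Lloyd's K-Means on the rows of S; returns (labels, centers)."""
    S = np.asarray(S, dtype=float)
    # Farthest-first initialization (deterministic): start from row 0, then repeatedly
    # add the row farthest from all centers chosen so far
    centers = [S[0]]
    for _ in range(1, k):
        d = np.min([np.sum((S - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(S[np.argmax(d)])
    C = np.array(centers)
    labels = np.full(len(S), -1)
    for _ in range(max_iter):
        # Assignment step: nearest center in squared Euclidean distance
        dist = ((S[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members (kept if empty)
        for j in range(k):
            members = S[labels == j]
            if len(members) > 0:
                C[j] = members.mean(axis=0)
    return labels, C


S = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, C = kmeans(S, 2)
print(labels)  # the first three rows share one label, the last three the other
```

Because the distance is plain Euclidean, clusters are roughly spherical around their centroids, which is exactly the limitation raised in the Motivation slide.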

Experiments

Different numbers of clusters were tried:

# all methods were run for each k ∈ {3, 4, . . . , 8}.

For each k, the clustering was evaluated in terms of:

1. plots of the reconstruction of matrix S by the matrix product H W

2. plots of the trend matrix W

3. analysis of colormaps for matrix H

4. analysis of convergence iterations, Frobenius error and number of attracted clusters for each method

5. qualitative grouping of recurrent subgroups of stocks for each method
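Items 4 and 5 require turning H into hard clusters. The deck does not spell out the rule; a common convention in NMF-based clustering, assumed here, assigns stock i to the basis j with the largest $H_{ij}$. A basis that wins no stock then attracts no cluster, so the number of attracted clusters can be smaller than k. The helper names are hypothetical, and whether the tables report the squared or unsquared Frobenius error is not stated (the sketch uses the unsquared norm).

```python
import numpy as np

def clusters_from_H(H):
    """Group stocks by the basis with the largest coefficient in their row of H."""
    H = np.asarray(H, dtype=float)
    groups = {}
    for stock, basis in enumerate(H.argmax(axis=1), start=1):  # 1-based stock ids
        groups.setdefault(int(basis), []).append(stock)
    # "Attracted" clusters = bases that received at least one stock
    return groups, len(groups)

def frobenius_error(S, H, W):
    """Reconstruction error ||S - H W||_F (unsquared)."""
    return float(np.linalg.norm(np.asarray(S) - np.asarray(H) @ np.asarray(W), "fro"))


H = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.8, 0.1, 0.1],
     [0.6, 0.3, 0.1]]
groups, n_clusters = clusters_from_H(H)
print(groups, n_clusters)  # {0: [1, 3, 4], 1: [2]} 2
```

In this toy H the third basis never has the largest coefficient, so only 2 of the k = 3 bases attract stocks.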

Stock price data trends for k = 4

Figure: Trends for NMF

Figure: Trends for C-NMF

Figure: Trends for CH-NMF

Table: Numerical results for NMF

k   Iterations   Frobenius error   Clusters
3   1528         33.7703           3
4   2355         27.1966           4
5   3358         21.0838           5
6   2523         16.9987           4
7   5000         14.6706           6
8   5000         13.5482           7

Table: Numerical results for C-NMF

k   Iterations   Frobenius error   Clusters
3   500          45.7185           2
4   500          42.4148           2
5   500          40.2502           2
6   500          33.5761           2
7   500          38.6675           2
8   500          32.1786           2

Table: Numerical results for CH-NMF

k   Iterations   Frobenius error   Clusters
3   1            47.5844           2
4   1            43.9824           4
5   1            38.6585           4
6   1            56.4050           4
7   1            32.5755           5
8   1            46.2535           5

Figure: Colormap for NMF

Figure: Colormap for C-NMF

Figure: Colormap for CH-NMF

Table: NMF Clusters

k = 3   k = 4   k = 5   k = 6   k = 7   k = 8
3 4 5 6 3 5 8 10 12 14 15 3 4 5 6 3 4 5 6 11 15 17 187 8 9 10 11 15 16 16 17 18 7 8 10 11 7 8 14 16 20 21 22 23

11 14 15 16 19 20 21 19 20 23 13 14 15 16 19 20 24 24 25 26 2817 19 20 21 22 23 24 24 25 26 17 18 19 2022 23 24 25 25 26 28 21 22 24 25

26 28 2812 13 18 6 7 12 13 4 5 8 10 23 26 12 15 23 2 4 5 8

14 17 18 21 22 28 26 9 101 2 27 27 7 11 1 2 9 27 21 28 27

1 2 4 9 3 6 13 12 10 13 17 1618 22 25

1 2 9 27 27 3 13 141 2 9 11 1 6 7 12

19

Table: C-NMF Clusters

k = 3   k = 4   k = 5   k = 6   k = 7   k = 8
1 2 12 1 2 9 12 1 2 9 12 1 2 27 1 2 6 9 1 2 9 12

27 27 27 12 23 27 273 4 5 6 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7 3 4 5 7 8 3 4 5 6 77 8 9 10 8 10 11 8 10 11 8 9 10 11 10 11 13 8 10 1111 13 14 13 14 15 13 14 15 12 13 14 15 14 15 16 13 14 1515 16 17 16 17 18 16 17 18 16 17 18 19 17 18 19 16 17 1818 19 20 19 20 21 19 20 21 20 21 22 23 20 21 22 19 20 2121 22 23 22 23 24 22 23 24 24 25 26 28 24 25 26 22 23 2424 25 26 25 26 28 25 26 28 28 25 26 28

28

Table: CH-NMF Clusters

k = 3   k = 4   k = 5   k = 6   k = 7   k = 8
1 2 9 12 27 1 2 9 27 1 2 9 27 1 2 9 12 27 1 2 9 2 273 4 5 6 7 3 4 5 7 8 3 5 7 8 3 4 5 7 8 3 4 6 7 4 8 108 9 10 11 10 11 14 10 11 18 11 14 15 16 13 14 17 11 13 1513 14 15 15 16 17 21 22 24 17 20 22 23 20 23 24 16 17 1816 17 18 18 19 20 25 24 25 26 28 25 26 20 21 2219 20 21 21 22 23 23 24 2522 23 24 24 25 26 26 2825 26 28 28

6 13 4 6 12 14 6 13 5 8 10 11 1 5 9 1215 16 17 15 16 18 14 1919 20 23 19 21 22

26 28 2812 13 10 18 19 21 12 6 7

27 3

Table: K-Means Clusters

k = 3   k = 4   k = 5   k = 6   k = 7   k = 8
3 4 5 6 7 8 3 4 5 6 7 3 4 7 8 3 4 7 8 3 4 7 8 5 6 15 1810 11 13 15 8 10 11 13 10 16 17 10 11 10 11 20 23 2416 18 20 23 15 16 18 20 19 21 25 16 25 16 25 26 2824 25 26 28 23 24 26 28 26

1 9 12 14 1 9 12 14 5 6 11 13 5 6 13 5 6 15 9 14 1717 19 21 22 17 19 21 22 15 18 20 23 15 18 20 18 20 23 19 21 22

25 24 26 28 23 24 28 24 26 282 27 2 1 9 12 9 14 17 9 14 17 4 7 8 10

14 22 19 21 22 19 21 22 11 16 2527 27 27 1 12 27

2 2 27 21 12 2 1 12

13 313

CONCLUSIONS

Conclusions

Portfolio diversification is the financial process of allocating capital in a way that reduces the exposure to risk by investing in a variety of assets (i.e., stocks).

In this setting, it amounts to clustering stocks that have similar trends.

K-Means is not effective on this task. Hence, we applied NMF.

Adding convexity constraints to the factorization improves the exploitation of similar stock trends.

In particular:

# CH-NMF is a very fast and scalable convex NMF technique that compares favorably on large data sets, both in terms of speed and reconstruction quality.

Conclusions

Extensive experimental evaluation on real-world NASDAQ stock data shows that, compared to K-Means, NMF techniques:

# better point out the clustering properties,

# yield very low error in Frobenius norm,

# achieve high efficiency in terms of convergence time.

Future work:

# use more datasets from different markets

# investigate further decomposition techniques to improve the effectiveness of clustering stock data

# impose other penalty constraints in order to achieve a better portfolio diversification strategy