CMPE 544

Finding Underlying Trends and Clustering of Time Series with Non-negative Matrix Factorization and K-Means

Gencer Sümbül

28/04/2016

Abstract

In this project, one of the widely used matrix factorization methods, Non-negative Matrix Factorization (NMF), is applied to mixture data, the daily closing prices of the Dow Jones Industrial Average components, in order to reveal the underlying trends of the time series, and the resulting encodings are clustered with the k-Means algorithm.

1 Introduction

In order to reveal similarities and differences in time series, finding the underlying hidden trends, which can be thought of as the set of trends that explains a whole time-series collection, is very important. In this project, the extraction of underlying trends is attempted by decomposing mixture data into its constituent parts via Non-negative Matrix Factorization (NMF). After NMF reveals the underlying trends (bases), the resulting encodings are ready for clustering with the k-Means algorithm. Finally, the time series are clustered together into performance-based groupings.

For the time series, the daily closing prices of the Dow Jones Industrial Average stock components are used as the data set (Figure 1). It contains 501 daily closing prices over two years (2006, 2007) for 30 stocks, accessible with the QuantSoftware ToolKit package in Python.


Figure 1: Daily closing prices in 2006 and 2007 of the 30 Dow Jones Industrial Average component stocks
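The exact QuantSoftware ToolKit loading calls are omitted here; the following is a minimal sketch of assembling the data matrix, assuming the closing prices have been exported to a hypothetical CSV file with one row per trading day and one column per ticker:

import numpy as np
import pandas as pd

# Hypothetical export of the data set: 501 trading days x 30 DJIA tickers,
# values are daily closing prices.
prices = pd.read_csv("djia_close_2006_2007.csv", index_col=0)
V = prices.to_numpy()   # shape (501, 30): one time series per column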

2 Non-negative Matrix Factorization (NMF)

When NMF was introduced by Lee and Seung [lee1999learning] as a parts-based representation algorithm, it was defined to form approximate decompositions of the form

$$V_{i,u} \approx (WH)_{i,u} = \sum_{a=1}^{r} W_{i,a} H_{a,u}$$

given a nonnegative matrix $V = \{v_{i,u}\}$ and positive matrices $W = \{w_{i,a}\}$, $H = \{h_{a,u}\}$.

Considering the positivity constraints on $W$ and $H$, the algorithm tries to find a solution of the minimisation problem

$$(W, H)^{*} = \operatorname*{arg\,min}_{W, H > 0} D_F(V \,\|\, WH)$$

where $D_F$ is the Frobenius norm error function:

$$D_F(V \,\|\, WH) = \frac{1}{2} \sum_{i,u} \left| v_{i,u} - [WH]_{i,u} \right|^{2}$$

Each column of $V$ represents one time series, each column of $W$ represents one basis (underlying trend), and each column of $H$ represents one encoding, which will later be clustered.
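For reference, a minimal sketch of the classical Lee-Seung multiplicative updates for the Frobenius norm objective (numpy only; the function name, iteration count, and the small constant eps are illustrative choices):

import numpy as np

def nmf_frobenius(V, r, n_iter=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing (1/2)||V - WH||_F^2.

    V: (n, m) nonnegative matrix, one time series per column.
    r: number of bases (underlying trends).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))                    # random init in [0.0, 1.0)
    H = rng.random((r, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update encodings
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update bases
    return W, H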

2.1 Factors Initialization

For the implementation of NMF, the matrices $W$ and $H$ must be initialized first. Three ways of initialization are used:

• Each element is randomly selected from the range [0.0, 1.0)

• The Random Acol method, which initializes each column of the matrix $W$ by averaging $p$ random columns of $V$ [LangvilleMACD14] (see the sketch after this list)

• Selecting the minimum-error result from multiple randomly initialized runs
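A sketch of the Random Acol initialization; the helper name and the default value of p are illustrative:

import numpy as np

def random_acol_init(V, r, p=5, seed=0):
    """Initialize each column of W as the average of p random columns of V."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = np.empty((n, r))
    for a in range(r):
        cols = rng.choice(m, size=p, replace=False)
        W[:, a] = V[:, cols].mean(axis=1)
    return W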

After running NMF with the different initializations, the Frobenius norm values show that the multiple-run approach gives the best result, as can be seen in Figure 2.

Figure 2: Frobenius norm error results of different initializations when the number of bases equals 6

2.2 Number of Bases

In the NMF algorithm, the number of bases must be assigned before the factorization is run. After trying different values, 8 seems better than lower values, and values higher than 8 do not change the result much (Figure 3).

Figure 3: Error results for different selections of the number of bases


2.3 Normalization

Because matrix $V$ consists of stock prices, and two stocks have very large closing prices with respect to the others (Figure 1), when NMF is applied to the raw data the resulting bases depend heavily on these two stocks, and the other stocks barely affect the results, as can be seen from the heat maps of the matrices $V$, $W$, $H$ in Figure 4.

However, when the daily closing prices are normalized with the formula $S^{*}_{t} = \log(10 \cdot S_t / S_1)$, where $S_t$ is the closing price at time $t$ and $S^{*}_{t}$ is the normalized value at time $t$, the closing-price dependency is removed, as can be seen in Figure 5, and the new data are illustrated in Figure 6.
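This normalization is a one-liner; a sketch, assuming one stock per column of the price matrix:

import numpy as np

def normalize_prices(V):
    """Apply S*_t = log(10 * S_t / S_1) column-wise; V[0] holds each stock's day-1 price."""
    return np.log(10.0 * V / V[0])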

Figure 4: Resulting matrices of NMF before the normalization process, when the number of bases equals 2

Figure 5: Resulting data after normalization


2.4 Error Metrics

In addition to the Frobenius norm, the Kullback-Leibler divergence and the Itakura-Saito divergence [tjoa2010icassp] can also be used as error functions. The Frobenius norm was specified above; the Kullback-Leibler and Itakura-Saito divergence formulas are:

$$D_{KL}(V \,\|\, WH) = \sum_{i,u} \left( v_{i,u} \log \frac{v_{i,u}}{[WH]_{i,u}} + [WH]_{i,u} - v_{i,u} \right)$$

$$D_{IS}(V \,\|\, WH) = \sum_{i,u} \left( \frac{v_{i,u}}{[WH]_{i,u}} - \log \frac{v_{i,u}}{[WH]_{i,u}} - 1 \right)$$

After implementations with each of the error metrics, the choice among them does not change the result much. However, NMF with the Itakura-Saito divergence gives smoother bases than the others (Table 1 and Table 2), so it is used as the cost function.
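The three error functions translate directly into numpy; a sketch, assuming all entries of $V$ and $WH$ are strictly positive so the logarithms and ratios are defined:

import numpy as np

def frobenius(V, WH):
    return 0.5 * np.sum((V - WH) ** 2)

def kullback_leibler(V, WH):
    return np.sum(V * np.log(V / WH) + WH - V)

def itakura_saito(V, WH):
    R = V / WH
    return np.sum(R - np.log(R) - 1.0)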


Table 1: Matrix results of NMF with different error functions (top to bottom: Euclidean, Itakura-Saito, Kullback-Leibler) when the number of bases equals 2


Table 2: Basis results of the different error functions (top to bottom: Euclidean, Itakura-Saito, Kullback-Leibler) when the number of bases equals 2


3 K-Means Clustering

After applying the NMF algorithm to the data, k-Means clustering is used to cluster the columns of matrix $H$ into performance-based groups. This type of clustering can be preferred over the traditional sector-based grouping of stocks.

3.1 Principal Component Analysis (PCA) for Choosing the Number of Clusters

For the implementation of k-Means clustering, the most crucial point is determining the number of clusters. Instead of selecting the number of clusters intuitively, plotting the data in two dimensions with PCA [alpaydin2014introduction] can lead to better results. After plotting the data in two dimensions, selecting the number of clusters as 5 seems reasonable (Figure 6).

Figure 6: PCA results of matrix H when the number of bases equals 28, and one of the possible groupings
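A minimal PCA projection of the encodings onto two dimensions (plain numpy SVD; the helper name is illustrative):

import numpy as np

def pca_2d(H):
    """Project the columns of H (one encoding per stock) onto the first two principal axes."""
    X = H.T - H.T.mean(axis=0)                  # center: one encoding per row
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                         # (n_stocks, 2) coordinates for plotting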

3.2 Cluster Initialization

After the number of clusters is identified, how the cluster centroids are initialized before the algorithm runs is another decisive point. The following centroid initializations can be tried (a sketch follows the list):

• Generating k centroids from a Gaussian distribution with mean and variance estimated from the data

• Choosing k observations from the data at random
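A sketch of Lloyd's k-Means supporting both initializations; the function signature and iteration count are illustrative:

import numpy as np

def kmeans(X, k, n_iter=100, init="gauss", seed=0):
    """Lloyd's k-Means on the rows of X (here, the encodings H.T).

    init="gauss": centroids drawn from a Gaussian with per-feature mean and
    standard deviation estimated from the data; init="sample": k random rows.
    """
    rng = np.random.default_rng(seed)
    if init == "gauss":
        C = rng.normal(X.mean(axis=0), X.std(axis=0), size=(k, X.shape[1]))
    else:
        C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - C[None], axis=2)  # (n, k) distances
        labels = dists.argmin(axis=1)                         # assignment step
        for j in range(k):                                    # update step
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels, C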

As can be seen from the cluster results of the different centroid initializations, generating the centroids from a Gaussian gives the better solution (Table 3 and Table 4).


Table 3: Cluster assignments and, for the elements of each cluster, the corresponding unnormalized stocks, when the number of bases equals 8 and the centroids are initially generated from a Gaussian


Table 4: Cluster assignments and, for the elements of each cluster, the corresponding unnormalized stocks, when the number of bases equals 8 and the centroids are initially chosen from the data at random

4 Results

• There are 30 stocks, and for each stock the closing prices of 501 days are considered. Each value $S_t$ is normalized as $S^{*}_{t} = \log(10 \cdot S_t / S_1)$

• The Itakura-Saito divergence is used as the error function

• The number of bases is selected as 8

• The matrix factors are initialized by selecting the minimum-error result from 10 randomly initialized runs

• The number of clusters is selected as 5


• The cluster centroids are initially generated from a Gaussian distribution whose mean and variance are estimated from the data

• The k-Means algorithm is run 10 times
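Tying the sketches above together, a hypothetical end-to-end run; for brevity it reuses the Frobenius-norm NMF sketch, whereas the report's final runs use the Itakura-Saito divergence as the cost function:

Vn = normalize_prices(V)                       # S*_t = log(10 * S_t / S_1)
# Keep the minimum-error factorization out of 10 randomly initialized runs.
runs = [nmf_frobenius(Vn, r=8, seed=s) for s in range(10)]
W, H = min(runs, key=lambda wh: frobenius(Vn, wh[0] @ wh[1]))
labels, _ = kmeans(H.T, k=5, init="gauss")     # one encoding per stock
print(labels)                                  # cluster index for each of the 30 stocks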

Run 1:  {14,15,17,22,23,24} | {0,3,11,13,19,21,26,27,29} | {1,2,4,5,7,8,10,16,28} | {6,9,12,25} | {18,20}
Run 2:  {1,4,10} | {5,8,15,17,23,24,28} | {2,3,6,7,9,13,16,25,27} | {12,14,22} | {0,11,18,19,20,21,26,29}
Run 3:  {14,15,17,22,24} | {11,18,20,21,27,29} | {1,2,4,6,7,10,12,13,16,25} | {5,8,23,28} | {0,3,9,19,26}
Run 4:  {1,8,12,14,15,17,22,23,24,28} | {0,2,3,4,7,9,16,19} | {5,10} | {6,13,25} | {11,18,20,21,26,27,29}
Run 5:  {11,13,18,20,21,27,29} | {14,22} | {1,2,4,6,10,25} | {8,12,15,17,23,24,28} | {0,3,5,7,9,16,19,26}
Run 6:  {11,18,20,21,27,29} | {1,4,10} | {6,8,12,13,14,25} | {0,2,3,5,7,9,16,19,26} | {15,17,22,23,24,28}
Run 7:  {1,5,8,10,15,23,28} | {0,11,18,19,20,21,26,27,29} | {14,17,22,24} | {2,3,4,7,9,16} | {6,12,13,25}
Run 8:  {9} | {1,2,4,6,7,10,16} | {5,8,15,17,22,23,24,28} | {0,3,11,13,18,19,20,21,25,26,27,29} | {12,14}
Run 9:  {0,3,7,9,16,19,20,26} | {1,2,4,5,6,10} | {12,14} | {8,15,17,22,23,24,28} | {11,13,18,21,25,27,29}
Run 10: {14,22,24} | {9} | {5,8,12,15,17,23,28} | {0,11,18,19,20,21,26,27,29} | {1,2,3,4,6,7,10,13,16,25}

Table 5: Cluster results of 10 different k-Means runs; each row lists the stock indices assigned to clusters 1-5
