
Research Article

Stock Price Pattern Prediction Based on Complex Network and Machine Learning

Hongduo Cao, Tiantian Lin, Ying Li, and Hanyu Zhang

Business School, Sun Yat-sen University, Guangzhou 510275, China

Correspondence should be addressed to Hongduo Cao (caohd@mail.sysu.edu.cn) and Ying Li (mnsliy@mail.sysu.edu.cn)

Received 7 March 2019; Accepted 14 May 2019; Published 28 May 2019

Guest Editor: Benjamin M. Tabak

Copyright © 2019 Hongduo Cao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Complex networks in the stock market and stock price volatility pattern prediction are important issues in stock price research. Previous studies have used historical information regarding a single stock to predict the future trend of the stock's price, seldom considering comovement among stocks in the same market. In this study, in order to extract information about related stocks for prediction, we combine the complex network method with machine learning to predict stock price patterns. Firstly, we propose a new pattern network construction method for multivariate stock time series. The price volatility combination patterns of the Standard & Poor's 500 Index (S&P 500), the NASDAQ Composite Index (NASDAQ), and the Dow Jones Industrial Average (DJIA) are transformed into directed weighted networks. It is found that network topology characteristics, such as average degree centrality, average strength, average shortest path length, and closeness centrality, can identify periods of sharp fluctuations in the stock market. Next, the topology characteristic variables for each combination symbolic pattern are used as the input variables for K-nearest neighbors (KNN) and support vector machine (SVM) algorithms to predict the next-day volatility patterns of a single stock. The results show that the optimal models corresponding to the two algorithms can be found through cross-validation and search methods, respectively. The prediction accuracy rates for the three indexes in relation to the testing data set are greater than 70%. In general, the prediction ability of SVM algorithms is better than that of KNN algorithms.

1. Introduction

Stock price volatility pattern classification and prediction is a very important problem in stock market research. The prediction of stock price trends is actually a classified prediction of stock price fluctuation patterns [1]. The literature has shown that forecasting stock price patterns is sufficient to generate profitable trades and enable the execution of profitable trading strategies [2]. Therefore, many studies have focused on predicting stock price patterns rather than predicting the absolute prices of stocks [2–4].

To date, most studies have focused on the volatility patterns of a single stock based on its own historical attributes [5, 6] and have paid less attention to the comovement of related stocks and information pertaining to the overall market. A few studies have used historical information regarding related stocks as the input variables for prediction and shown that the price fluctuations in a single stock are not isolated and are often influenced by the trends of multiple related stocks [7, 8]. Thus, how to extract the comovement of multiple stocks and apply this information to the prediction of the fluctuation patterns of a single stock is a problem worth studying.

Complex network analysis provides a new explanation for stock market behavior from a systematic perspective. Using complex network theory to study stock prices not only allows us to analyze the relationship between different stocks but also allows us to explore the macroaspects of the comovement characteristics of the market in different periods [9–11]. Previous studies have proposed a variety of methods to build complex networks using the time series of stock prices, including visibility graphs [12–14], recurrence networks [15–17], correlation networks [11, 18, 19], pattern networks [10, 20], and K-neighbors networks [21, 22]. Of all the network construction methods, the symbolic pattern network is favored by many scholars because it can more accurately reflect the degree of correlation and direction of the primitive elements in a complex system [10, 20, 23, 24]. In a stock price volatility pattern network, each volatility pattern is regarded as a network node, and the relationship between patterns is regarded as a connection between nodes [10]. By analyzing the topological properties of the network, the characteristics of stock price fluctuations can be better understood. Huang et al. used coarse-grained symbolization methods to construct a network of market prices and transaction volume data in different periods based on the Shanghai Stock Exchange (SSE) composite index, and the results showed that the out-degree distribution of network nodes obeyed the power law and the basic fluctuations exhibited different patterns during different periods [24]. Wang et al. converted the yields of gasoline and crude oil stocks into five patterns, studied the characteristics of crude oil and gasoline node networks in different periods using sliding windows, and then accurately predicted the crude oil and gasoline stock price pattern based on the conversion characteristics of the price network [10, 20].

Hindawi, Complexity, Volume 2019, Article ID 4132485, 12 pages, https://doi.org/10.1155/2019/4132485

However, most of the existing studies on stock price volatility pattern networks have focused on univariate time series. On this basis, we propose a new network construction method to build the volatility pattern networks of the three most important indexes in the US stock market, namely, the Standard & Poor's 500 Index (S&P 500), the NASDAQ Composite Index (NASDAQ), and the Dow Jones Industrial Average (DJIA). Firstly, the combination symbolic patterns for the three stock indexes are derived using a coarse-grained method. Then, the combination symbolic patterns are used as the nodes of the network, and the frequencies and directions of the conversion of the patterns are used as the weights and directions of the network connections. Finally, we construct directed and weighted networks for the US stock market. By analyzing the network topology properties, we can identify periods of sharp fluctuations in the market.

Meanwhile, many machine learning algorithms have been applied to stock price volatility classification and prediction, such as neural networks [25], random forests [26], decision trees [27], support vector machines (SVM) [3, 7], and K-nearest neighbors (KNN) [1, 28]. Among them, KNN and SVM algorithms have been widely used in pattern recognition and forecasting, machine learning, information retrieval, and data mining. KNN is a simple and effective classification method that is easy to calculate, and its performance is comparable to the most advanced classification methods [29, 30]. SVM, which can map nonlinearly separable data into a high-dimensional space and use hyperplanes for classification, is highly suitable for small sample classification because of its excellent classification ability [26]. Both KNN and SVM algorithms have a mature theoretical basis in relation to classification prediction. Ballings et al. compared the accuracy of SVM, KNN, and other algorithms in predicting stock price movements one year ahead for 5767 publicly listed European companies, and the results showed that SVM has better prediction ability than KNN [2]. Teixeira proposed an automatic stock trading method that combined technical analysis with KNN classification. Using 15 stocks from the Sao Paulo Stock Exchange (Bovespa), that study found that the proposed method generated considerably higher profits than the buy-and-hold method for most of the companies, with few buy actions generated [1]. Huang et al. used SVM algorithms to predict the weekly fluctuations in the Nikkei 225 index and found that SVM outperformed other classification methods, such as quadratic discriminant analysis and Elman backpropagation neural networks [3].

The literature has demonstrated the ability of SVM and KNN to predict stock patterns. However, these studies predicted the stock price based on the information of the single stock itself, without considering the information of the network system composed of the relevant stocks. Therefore, another aim of this study is to predict the next-day pattern of a single stock for each combination mode of stocks, using the network topology properties as input variables for SVM and KNN algorithms. To the best of our knowledge, this is the first such attempt in the existing research. We then compare the prediction accuracy using the testing data set after identifying the best models using the training set. The stock price volatility pattern network includes price information for single stocks and related stocks and portrays the macronature of the market, so it contains more information than is available using only historical information relating to single stocks. The results show that the pattern network can provide some information to enable us to forecast the price volatility patterns of single stocks. For both prediction methods, the optimal parameter search strategy, which combines cross-validation and search methods, enables us to find models that perform well on the testing data set. Overall, the performance of SVM algorithms is better than that of KNN algorithms. Combining complex networks with machine learning can provide investors with information for profitable strategies.

The remainder of this paper is organized as follows. In the next section, we introduce the theoretical background for KNN and SVM algorithms. In Section 3, the methodology of constructing the network and of predicting the next-day patterns for each stock index is presented. In Section 4, we show the empirical results and compare the prediction accuracy for KNN and SVM. The last section is devoted to a summary.

2. Theoretical Background for KNN and SVM

2.1. KNN. The K-nearest neighbor (KNN) algorithm is a nonparametric classification algorithm that assigns query data to the category to which most of its $K$ neighbors belong [31]. We use the Euclidean distance metric to find the $K$ nearest neighbors from a sample set of known classifications. Suppose that the known data set has four feature variables $f_1, f_2, f_3, f_4$ and four categories $y_1, y_2, y_3, y_4$. The steps to determine the category of the new data $i$ through the KNN algorithm are as follows.

Firstly, the Euclidean distance between the feature variables of the data $i$ and the other data $j$ ($j = 1, 2, \ldots, n$) in the training data set is calculated:

$$D_j = \sqrt{\sum_{g=1}^{4} \left(f_g(i) - f_g(j)\right)^2}, \quad j = 1, 2, \ldots, n \tag{1}$$

Secondly, all the data in the training set are sorted in ascending order according to the distance from data $i$.

Thirdly, the $K$ data points with the smallest distance from data $i$ are selected.

Finally, the category with the largest proportion of these $K$ data points is taken as the category of data $i$.

An important parameter to be determined in the KNN algorithm is $K$, which represents the number of nearest neighbors to be considered when classifying unknown samples [1, 2].
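The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function name `knn_predict` and the toy data are hypothetical, and ties between categories are broken by whichever category is encountered first, which the text does not specify.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from the query to every training point, as in (1).
    distances = [math.dist(query, x) for x in train_X]
    # Step 2: sort the training indices by ascending distance.
    order = sorted(range(len(train_X)), key=lambda j: distances[j])
    # Step 3: keep the k closest points.
    nearest = order[:k]
    # Step 4: the most frequent category among them is the prediction.
    votes = Counter(train_y[j] for j in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: four feature variables per sample.
train_X = [
    (0.10, 0.20, 0.10, 0.30),
    (0.20, 0.10, 0.20, 0.20),
    (0.90, 0.80, 0.90, 0.70),
    (0.80, 0.90, 0.70, 0.90),
    (0.15, 0.15, 0.20, 0.25),
]
train_y = ["S1", "S1", "S4", "S4", "S1"]
prediction = knn_predict(train_X, train_y, (0.2, 0.2, 0.2, 0.2), k=3)
```

With $K = 3$, the three closest training points in this toy set all carry the same label, so the vote is unanimous.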

2.2. SVM. SVM was introduced by Vapnik [32] and has been widely used in pattern prediction in recent years. The basic idea of SVM is to nonlinearly transform the input vector into a high-dimensional feature space and then search for the optimal linear classification surface in this feature space, so as to maximize the distance between the classification plane and the nearest point. The training samples closest to the classification plane are called support vectors. The SVM algorithm can be briefly described as follows.

Consider the binary linear classification problem of the training data set $(x_i, y_i)$ ($i = 1, 2, \ldots, n$), $x_i \in R^n$, $y_i \in \{\pm 1\}$, where $x_i$ is a feature vector and $y_i$ is a class label. Suppose these two classes can be separated by a linear hyperplane $(w \cdot x) + b = 0$. In order to classify correctly and obtain the largest classification interval, the optimization problem of constructing the optimal plane is described as

$$\min \varphi(w) = \frac{1}{2}\|w\|^2 = \frac{1}{2}(w' \cdot w) \quad \text{s.t.} \quad y_i\left[(w \cdot x_i) + b\right] \ge 1 \tag{2}$$

The optimal solution for $w$ and $b$ can be obtained by introducing Lagrange multipliers. We then obtain the optimal classification function (3):

$$f(x) = \operatorname{sgn}\left\{w^* \cdot x + b^*\right\} \tag{3}$$

For a nonlinear classification problem, the feature vector is first transformed into a high-dimensional space vector. Then the optimal classification hyperplane is constructed. Suppose the transformation function is $\Phi$; then the optimization problem can be described as

$$\min \varphi(w) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i = \frac{1}{2}(w' \cdot w) + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t.} \quad y_i\left[(w^T \cdot \Phi(x_i)) + b\right] + \xi_i \ge 1, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, n \tag{4}$$

where $C$ is the penalty parameter, which specifies the trade-off between classification distance and misclassification [2]. Finally, the optimal classification hyperplane can be described as in (5):

$$f(x) = \operatorname{sgn}\left\{w^* \cdot \Phi(x) + b^*\right\} \tag{5}$$

$$k(x_i, x_j) = \Phi^T(x_i) \cdot \Phi(x_j) \tag{6}$$

The function (6) is called a kernel function. Because the performance of the Gaussian radial basis function (RBF) is excellent when the additional information of the data is limited, it is widely used in financial time series analysis [3]. The Gaussian RBF is used as the kernel function to implement the SVM algorithm in this study. The RBF kernel function can be expressed as

$$k(x_i, x_j) = \exp\left(-\sigma\left\|x_i - x_j\right\|^2\right), \quad \sigma > 0 \tag{7}$$

where $\sigma$ is the constant of the radial basis function. Before implementing the SVM algorithm, the parameter $\sigma$ and the parameter $C$ need to be determined.

A multiclass classification problem can be converted into multiple binary classification problems [33]. In this study, a four-class problem is converted into six binary classification problems by the "one-versus-one" approach of SVM.
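As a small illustration of the two ingredients just described, the sketch below evaluates the RBF kernel of (7) for a pair of points and enumerates the binary problems that a one-versus-one scheme creates for four classes. The function name `rbf_kernel` and the class labels `S1` to `S4` are chosen here for illustration only; this is not the SVM solver itself. In scikit-learn's `SVC`, the `gamma` parameter plays the role of $\sigma$ in this form of the kernel.

```python
import math
from itertools import combinations

def rbf_kernel(x_i, x_j, sigma):
    """Gaussian RBF kernel of (7): exp(-sigma * ||x_i - x_j||^2), with sigma > 0."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sigma * squared_distance)

# One-versus-one: one binary classifier per unordered pair of classes.
classes = ["S1", "S2", "S3", "S4"]  # illustrative labels for the four classes
binary_problems = list(combinations(classes, 2))

same_point = rbf_kernel((1.0, 2.0), (1.0, 2.0), sigma=0.1)  # zero distance
far_points = rbf_kernel((0.0, 0.0), (3.0, 4.0), sigma=0.1)  # squared distance 25
```

Four classes give $4 \cdot 3 / 2 = 6$ pairs, which matches the six binary problems mentioned above. The kernel equals 1 for identical points and decays toward 0 as the points move apart.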

3. Methodology

In this section, we introduce the methodology for predicting stock price patterns using network topology characteristic variables. Figure 1 shows a general framework of the proposed pattern prediction system. It consists of two parts: complex network analysis and pattern prediction using machine learning. We present a more detailed procedure in the subsections.

3.1. Constructing a Pattern Network for the Stock Market. Using the daily closing price of each stock index, a sliding window is used to calculate the one-day return $r$, five-day return $R$, and five-day volatility $V$ corresponding to day $t$:

$$r = \ln\frac{Close(t)}{Close(t-1)} \tag{8}$$

$$R = \ln\frac{Close(t)}{Close(t-5)} \tag{9}$$

$$V = std(r_1, \ldots, r_5) \cdot \sqrt{5} \tag{10}$$

where $Close(t)$ is the closing price on day $t$, $Close(t-1)$ is the previous day's closing price, and $std(r_1, \ldots, r_5)$ is the standard deviation of the one-day returns from the first to the fifth day.

Then we can calculate the average five-day volatility $\bar{V}$ for each stock index:

$$\bar{V} = \frac{1}{N}\sum V \tag{11}$$

where $N$ is the number of trading days in the time series. Suppose we study $M$ stocks in the stock market. Then we can obtain the average volatility for the overall market as follows:

$$V' = \frac{1}{M}\sum_{i=1}^{M}\bar{V}_i \tag{12}$$

Based on the sign of the five-day return $R$ and the magnitude of the five-day volatility $V$, each stock on each day can be classified into one of four patterns:

$$f = \begin{cases} S_1, & \text{if } R \ge 0 \text{ and } V \ge V' \quad (\text{sharp rise}) \\ S_2, & \text{if } R \ge 0 \text{ and } V < V' \quad (\text{stable rise}) \\ S_3, & \text{if } R < 0 \text{ and } V < V' \quad (\text{stable decline}) \\ S_4, & \text{if } R < 0 \text{ and } V \ge V' \quad (\text{sharp decline}) \end{cases} \tag{13}$$

Figure 1: General framework of the stock index pattern prediction. The flowchart has two parts. Complex network analysis: determine stock patterns; construct pattern networks; extract network topology characteristic variables. Pattern prediction: feature selection based on the Kruskal–Wallis test; determine the optimal parameter values for the KNN and SVM models; next-day stock pattern prediction based on KNN and SVM; accuracy comparison based on KNN and SVM.

Figure 2: Sample directed weighted network. Nodes: S1S1S1, S2S2S2, S1S4S4, and S3S1S4; the edge weights shown are 1, 1, 2, 1, and 1.
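The definitions in (8)–(13) can be sketched for a single index as follows. This is an illustrative sketch under stated assumptions: the function name `classify_day` and the toy price series are hypothetical, $r_1, \ldots, r_5$ are taken to be the five one-day returns ending on day $t$, and the sample standard deviation is used because the text does not say which estimator was applied. The market-wide threshold $V'$ is passed in as a given number.

```python
import math
import statistics

def classify_day(closes, t, market_avg_volatility):
    """Return the pattern (S1..S4) of one index on day t, following (8)-(13)."""
    # One-day log returns for the five days ending on day t, as in (8).
    daily = [math.log(closes[d] / closes[d - 1]) for d in range(t - 4, t + 1)]
    # Five-day log return, as in (9).
    five_day_return = math.log(closes[t] / closes[t - 5])
    # Five-day volatility, as in (10). Assumption: sample standard deviation.
    volatility = statistics.stdev(daily) * math.sqrt(5)
    sharp = volatility >= market_avg_volatility
    if five_day_return >= 0:
        return "S1" if sharp else "S2"
    return "S4" if sharp else "S3"

# Hypothetical closing prices; index 5 is the day being classified.
calm_rise = [100, 101, 102, 103, 104, 105]
rough_fall = [100, 110, 95, 105, 90, 85]
pattern_calm = classify_day(calm_rise, t=5, market_avg_volatility=0.02)
pattern_rough = classify_day(rough_fall, t=5, market_avg_volatility=0.02)
```

The first series rises steadily with almost constant daily returns, so its volatility stays below the threshold; the second ends lower after large swings.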

By combining the patterns of each stock index, we can obtain the corresponding combination symbolic pattern for each day. Assuming that we study three stock indexes, we can obtain a maximum of $4^3 = 64$ combination modes. For example, if the current patterns of the S&P 500, NASDAQ, and DJIA are $S_1$, $S_3$, and $S_4$, respectively, the current price combination pattern is $S_1S_3S_4$. Taking the daily combination patterns as the nodes of the network, the edges and weights of the network can be determined in time order. If the pattern on day $t$ is S1S4S4 and that on day $t+1$ is S2S2S2, there is a directed edge from S1S4S4 to S2S2S2 with a weight of 1. If the conversion frequency from S1S4S4 to S2S2S2 is $w$, the weight of the directed edge from S1S4S4 to S2S2S2 is $w$. Suppose that the pattern transformation over a certain period of time is S1S4S4, S2S2S2, S1S1S1, S1S4S4, S3S1S4, S2S2S2, S1S1S1. Then we can obtain the directed weighted network shown in Figure 2.
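The construction rule can be checked on the seven-day example sequence with a few lines of Python. This is a minimal sketch using only the standard library; the function name `build_pattern_network` is hypothetical, and the network is stored simply as a mapping from directed edges to weights.

```python
from collections import Counter

def build_pattern_network(patterns):
    """Map each directed edge (source, target) to its conversion frequency."""
    # Each pair of consecutive days gives one directed edge;
    # repeated conversions accumulate as the edge weight.
    return Counter(zip(patterns, patterns[1:]))

sequence = ["S1S4S4", "S2S2S2", "S1S1S1", "S1S4S4", "S3S1S4", "S2S2S2", "S1S1S1"]
edges = build_pattern_network(sequence)
nodes = set(sequence)
```

For this sequence the result has four nodes and five directed edges. The conversion from S2S2S2 to S1S1S1 occurs twice and therefore has weight 2, while the other four edges have weight 1.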

The key to the sliding window selection problem is how to effectively preserve the quality and quantity of the original time series information while reducing the computational complexity as much as possible [34]. In this study, we apply a sliding window with a length of 30 days (about one month in daily life and half a quarter in the stock market) and a step of one day to the stock index time series, so we obtain one pattern network for each 30-day window. Table 1 shows the process of using the sliding window.

3.2. Computing Network Topology Characteristic Variables. Next, we calculate the network topology characteristic variables for every 30-day network.

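Table 1: The process of using the sliding window.

| Date | Stock 1 | Stock 2 | Stock 3 | Combination Pattern |
| 1 | S1 | S2 | S3 | S1S2S3 |
| 2 | S2 | S2 | S2 | S2S2S2 |
| 3 | S3 | S3 | S3 | S3S3S3 |
| 4 | S1 | S1 | S1 | S1S1S1 |
| … | … | … | … | … |
| 30 | S1 | S1 | S2 | S1S1S2 |
| 31 | S2 | S2 | S2 | S2S2S2 |
| 32 | S3 | S4 | S4 | S3S4S4 |
| … | … | … | … | … |

Sliding window 1 covers dates 1–30, sliding window 2 covers dates 2–31, and sliding window 3 covers dates 3–32.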
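The windowing scheme of Table 1 can be sketched as a simple generator. The function name `sliding_windows` is hypothetical, and the input is a placeholder series of dummy patterns whose length equals the 3769 daily records reported for the training period in Section 4.1; only the window count matters here.

```python
def sliding_windows(patterns, length=30, step=1):
    """Yield consecutive windows of `length` daily patterns, moved by `step` days."""
    for start in range(0, len(patterns) - length + 1, step):
        yield patterns[start:start + length]

# Placeholder series: 3769 dummy daily combination patterns.
daily_patterns = ["S1S1S1"] * 3769
windows = list(sliding_windows(daily_patterns))
```

A series of $n$ days yields $n - 30 + 1$ windows with a step of one day, so 3769 records give 3740 windows and hence 3740 networks.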

3.2.1. Network Average Degree Centrality. In undirected networks, the average degree centrality of the network reflects the level of connection between one node and the other nodes in the network, that is, whether one node is connected with the other nodes or not [17]. The formula is as follows:

$$\rho = \frac{1}{N(N-1)}\sum_{i,j=1}^{N} a_{ij} \tag{14}$$

where $N$ is the number of nodes in the network and $a_{ij}$ is the value of the adjacency matrix of an undirected network: $a_{ij} = 1$ if node $i$ and node $j$ are connected; otherwise $a_{ij} = 0$. The adjacency matrix of an undirected network is a symmetric matrix. However, $a_{ij} = 1$ does not mean that $a_{ji} = 1$ in a directed network. In the directed network, we must consider the out-degree and in-degree. We connect the nodes in time order, so that the in-degree and the out-degree are the same except for the first node and the last node. Therefore, we only select the in-degree for analysis and calculate the average in-degree centrality as follows:

$$\rho = \frac{1}{N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} a_{ij} \tag{15}$$

For ease of exposition, in the following sections we refer to the average in-degree centrality simply as the average degree centrality. Average degree centrality measures the ratio of the actual number of connections to the maximum number of connections, that is, the edge density of the network. The greater the average degree centrality, the more connections there are between nodes in the price pattern network, the higher the accessibility between nodes, and the greater the density of the overall network [35].

3.2.2. Average Network Strength. In a network, the strength of the connection from node $i$ to node $j$ is the weight $w_{ij}$ of the directed edge from node $i$ to node $j$. Similar to the in-degree and out-degree of the directed network, the strength of the directed weighted network can also be divided into in-strength and out-strength [10, 35]. In this study, we describe average out-strength as average network strength:

$$S = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij} \tag{16}$$

The greater the average strength of the network, the fewer the network nodes, the simpler the composition of the price volatility patterns, the lower the complexity of the network, and the higher the frequency of the same node. Simpler price patterns reflect the fact that the consistency of price changes across different stocks is stronger and lasts longer.
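Formulas (15) and (16) can be evaluated directly on the sample network of Figure 2. The sketch below is illustrative only: it stores the network as a plain dictionary from directed edges to weights, with the edge list taken from the example transformation sequence in Section 3.1.

```python
# Sample network of Figure 2: directed edge (source, target) -> weight.
edges = {
    ("S1S4S4", "S2S2S2"): 1,
    ("S2S2S2", "S1S1S1"): 2,
    ("S1S1S1", "S1S4S4"): 1,
    ("S1S4S4", "S3S1S4"): 1,
    ("S3S1S4", "S2S2S2"): 1,
}
nodes = {node for edge in edges for node in edge}
n_nodes = len(nodes)

# (15): every directed edge contributes a_ij = 1; the sum is divided by N^2.
average_degree_centrality = len(edges) / n_nodes ** 2
# (16): the sum of all edge weights divided by N.
average_strength = sum(edges.values()) / n_nodes
```

With four nodes, five edges, and a total weight of 6, this gives $\rho = 5/16 = 0.3125$ and $S = 6/4 = 1.5$.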

3.2.3. Network Average Shortest Path Length. The average shortest path length of the network describes the degree of separation between nodes in the network, that is, the size of the network. The average shortest path length can be used to characterize a "small-world" network in a complex network [17]. The distance from node $i$ to node $j$ is defined as the minimum number of edges needed to pass from node $i$ to node $j$. The average shortest path length of the network is the average length of all the shortest paths in the network:

$$L = \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{N} d_{ij} \quad (i \ne j) \tag{17}$$

The shorter the average shortest path, the fewer intermediate patterns are required for conversion between stock price modes. Modes can be connected by fewer edges, and price modes can interact with each other through fewer other modes. As a result, the conversion efficiency and speed of the overall network are both greater.
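Formula (17) can be sketched with a breadth-first search over the sample network of Figure 2. The function names are hypothetical, and the sketch assumes that every node can reach every other node; how unreachable pairs were handled in the study is not stated in the text.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: minimum number of edges from `source` to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def average_shortest_path_length(adjacency):
    """Formula (17); assumes every node can reach every other node."""
    nodes = set(adjacency) | {v for targets in adjacency.values() for v in targets}
    total = sum(sum(hop_distances(adjacency, node).values()) for node in nodes)
    return total / (len(nodes) * (len(nodes) - 1))

# Sample network of Figure 2 as an adjacency list (edge weights are not needed here).
adjacency = {
    "S1S4S4": ["S2S2S2", "S3S1S4"],
    "S2S2S2": ["S1S1S1"],
    "S1S1S1": ["S1S4S4"],
    "S3S1S4": ["S2S2S2"],
}
L = average_shortest_path_length(adjacency)
```

The twelve ordered pairs of distinct nodes have shortest path lengths summing to 21, so $L = 21/12 = 1.75$ for this network.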

3.2.4. Network Closeness Centrality. The closeness centrality of node $i$ is the reciprocal of the average shortest path length from other nodes to node $i$ [15, 36]:

$$C_i = \frac{N-1}{\sum_{j=1}^{N} d_{ji}} \tag{18}$$

The closer a point is to other points, the easier it is to transmit information. Now we consider the weighted shortest path $l_{ij}$, which is defined as the shortest weighted distance from node $i$ to node $j$. Then we can obtain the closeness centrality of the network [37]:

$$C = \frac{1}{N}\sum_{i=1}^{N}\frac{n-1}{N-1}\cdot\frac{n-1}{\sum_{j=1}^{n-1} l_{ij}} \quad (i \ne j) \tag{19}$$

where $n$ is the number of nodes reachable from node $i$. The greater the closeness centrality of the network, the smaller the shortest weighted distance between the network nodes, the fewer the conversion times between different price modes, and the smaller the conversion cycle. Pattern nodes tend to transform on their own, and thus the transformation area of the overall network is more concentrated and centrality is more prominent [35].

Figure 3: The road map of the pattern prediction experiment. I. Model training (training data set): feature selection by the Kruskal–Wallis test; variables standardization; divide the training set into subsets; train the models. II. Pattern prediction (testing data set): variables standardization; use the optimal models; predict next-day patterns; results comparison, KNN vs SVM.

3.3. Next-Day Stock Pattern Prediction Based on KNN and SVM. We train the optimal prediction models based on KNN and SVM algorithms using the obtained network topology characteristic variables and then predict the next-day patterns of the three single stock indexes using the testing data set.

3.3.1. Detailed Prediction Procedure. Figure 3 shows the detailed process of the pattern prediction experiment for each single stock index. It includes two major steps: the first step is model training, from which we obtain the best models, and the second step is pattern prediction.

First, the correlation between the next-day stock patterns and the network characteristic variables of a single stock index in the training set is tested by the Kruskal–Wallis test. The network topological characteristic variables that are significantly correlated with the price pattern of each stock index are the input variables for the prediction of that stock index.

The values of the topological characteristic variables are normalized so that an indicator with smaller values is not ignored because of an indicator with larger values [26]. Formula (20) is used to standardize the variables [38]:

$$y_i = \frac{y_i - y_{min}}{y_{max} - y_{min}} \tag{20}$$
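A minimal sketch of this min-max scaling is given below. The function names and the numbers are hypothetical. The minimum and maximum are estimated on the training values and then reused for a test value, in line with the statement in this section that the standardization parameters for the testing set are obtained from the training set.

```python
def fit_min_max(train_values):
    """Return (y_min, y_max) estimated on the training set only."""
    return min(train_values), max(train_values)

def scale(value, y_min, y_max):
    """Apply (20); test values outside the training range fall outside [0, 1]."""
    return (value - y_min) / (y_max - y_min)

train_strength = [1.0, 2.0, 4.0, 5.0]  # hypothetical average-strength values
y_min, y_max = fit_min_max(train_strength)
scaled_train = [scale(v, y_min, y_max) for v in train_strength]
scaled_test = scale(6.0, y_min, y_max)  # a test value above the training maximum
```

Scaling with training-set parameters keeps information from the testing period out of the model inputs, at the cost that test values may fall slightly outside the unit interval.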

Next, in order to obtain a subset for each combination mode, we divide the training set into several training subsets according to the number of types of combination patterns (or combination pattern nodes) in the training set. In each subset, the current day's combination pattern is the same, but the next-day patterns of each stock index can be different. For each training subset, the next-day patterns of a single stock index are the classification variables, and the network topological characteristic variables are the input feature variables for the KNN and SVM algorithms. To prevent overfitting, the cross-validation and search method is used to determine the optimal parameters in this study.

Finally, on the testing data set, the next-day patterns of each stock index are predicted by the obtained models.

The optimal trained model is selected according to the recognized combination pattern, and the next-day stock patterns are predicted on the basis of the topological characteristic variables of the current corresponding 30-day network. The average market volatility and the standardization parameters used for the testing set were obtained from the training set.
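The division of the training set into one subset per combination pattern can be sketched as a simple grouping step. The function name, the tuple layout of a sample, and the toy values are assumptions made for illustration; each sample is taken to hold the current day's combination pattern, the network feature values, and the next-day pattern of one index.

```python
from collections import defaultdict

def split_by_combination_pattern(samples):
    """Group (pattern, features, next_day_label) samples by the current day's pattern."""
    subsets = defaultdict(list)
    for pattern, features, next_day_label in samples:
        subsets[pattern].append((features, next_day_label))
    return dict(subsets)

# Hypothetical training samples for one stock index.
samples = [
    ("S1S1S1", (0.30, 1.5, 1.8, 0.6), "S1"),
    ("S2S2S2", (0.25, 1.2, 2.1, 0.5), "S2"),
    ("S1S1S1", (0.35, 1.7, 1.6, 0.7), "S4"),
]
subsets = split_by_combination_pattern(samples)
```

One model per algorithm would then be trained on each subset, and at prediction time the subset key is the combination pattern observed on the current day.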

3.3.2. Model Selection Criteria. Cross-validation is widely used for model selection because of its simplicity and universality, so we use the cross-validation method and search method to determine the optimal parameters in this study [39]. We cross-validated the $K$ parameter for KNN by trying all values of $K = 1, 2, 3, \ldots, 30$. To determine the optimal parameter values for SVM, we perform a grid search on $C = 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}$ and $\sigma = 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}$ to identify the best combination.

Figure 4: The evolution of the three stock indexes and average strength. Horizontal axis: time (2000 to 2014); left axis: closing price; right axis: average strength. Series: NASDAQ, DJIA, S&P 500, and average strength. Two sharp fluctuation periods are marked.

Using the k-fold cross-validation method, for each combination of parameters, the training data set is divided into k equal parts; k−1 parts of the data are used as the training data, while the remaining part is used as the verification data. In this way, the accuracy rates of k verification sets can be obtained after k iterations. Taking the average accuracy rate of the k verification sets, that is, the verification score, as the criterion for parameter selection, the optimal combination of parameters can be found. In addition, leave-one-out (LOO) is another simple, efficient, and common cross-validation method. When using LOO, one sample is taken from the data set each time as a validation set, and the other samples are used as the training set. Thus, for a data set with n samples, we obtain a total of n different validation sets and their corresponding training sets. LOO is very suitable for model selection with small samples because only one sample is extracted from the training set at a time as a verification set, so fewer samples are wasted [40].

In this study, we use 3-fold cross-validation if the training subset contains more than 100 samples and use LOO cross-validation otherwise.
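The selection rule and the two parameter grids can be sketched as follows. The function name `validation_splits` is hypothetical, and the folds are taken as contiguous blocks without shuffling, which is an assumption because the text does not say how samples were assigned to folds.

```python
from itertools import product

def validation_splits(n_samples, threshold=100, n_folds=3):
    """Return (train, validation) index lists: 3-fold if n > threshold, else LOO."""
    k = n_folds if n_samples > threshold else n_samples
    indices = list(range(n_samples))
    # Fold sizes differ by at most one sample.
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    splits, start = [], 0
    for size in sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        splits.append((training, validation))
        start += size
    return splits

knn_grid = list(range(1, 31))               # K = 1, 2, ..., 30
powers = [10.0 ** e for e in range(-2, 3)]  # 10^-2, ..., 10^2
svm_grid = list(product(powers, powers))    # all (C, sigma) pairs
```

Each candidate in `knn_grid` or `svm_grid` would be scored by its average validation accuracy over the splits, and the best-scoring candidate kept. The SVM grid contains $5 \times 5 = 25$ combinations.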

4. Empirical Results and Analysis

4.1. Data Processing. We used the closing prices of the S&P 500, NASDAQ, and DJIA from 1 January 2000 to 31 December 2014 as the sample data set. This resulted in 3769 daily records. The data were obtained from the Wind database, one of the most authoritative financial databases in China (the Wind database is available at https://www.wind.com.cn). First, the five-day return rate and five-day volatility of each stock index are calculated. The method outlined in the previous section is used to symbolize the stock indexes, and then we obtain the combination patterns for the three stock indexes on each day.

A sliding window with a length of 30 days and a step of one day is used to divide the stock pattern time series into 3740 time periods. A directed weighted network is constructed for each period of stock price patterns, resulting in 3740 networks. In total, 47 distinct pattern nodes appear across all of the networks.

Because our method of constructing the stock networks is original, we implemented it ourselves in Python. We used Python 3.7 together with several packages, including networkx, sklearn, pandas, and matplotlib, for our analysis.

4.2. Analysis of Network Topological Characteristics. The average degree centrality, average strength, average shortest path length, and closeness centrality of each network are calculated using formulas (15), (16), (17), and (19). The evolution of these four network topological characteristics is shown in Figures 4–7.

As can be seen from the figures, the points where average degree centrality, average strength, and closeness centrality reach their peak values and the average shortest path length reaches its minimum value all correspond to periods when the overall market is fluctuating wildly. When the three stock indexes fell to their lowest levels in October 2002 and March 2009, the average degree centrality, average strength, and closeness centrality of the network reached their highest points, while the shortest path length reached its lowest point. These two periods correspond to the last phase of the dot-com bubble crisis and the subprime mortgage crisis. In addition, the closeness centrality and the average degree centrality reached their maximum points again in March 2012, which corresponded with another long period of sharp fluctuations in the US stock market. The results show that these four network topological characteristics have a remarkable relationship to the anomalies of the indexes in the US stock market. During the sharp fluctuation periods, the comovement of the three stock indexes is stronger and the 30-day networks are simpler.

Figure 5: The evolution of the three stock indexes and average shortest path length. Horizontal axis: time (2000 to 2014); left axis: closing price; right axis: average shortest path length. Series: NASDAQ, DJIA, S&P 500, and average shortest path length.

Figure 6: The evolution of the three stock indexes and closeness centrality. Horizontal axis: time (2000 to 2014); left axis: closing price; right axis: closeness centrality. Series: NASDAQ, DJIA, S&P 500, and closeness centrality.

Figure 7: The evolution of the three stock indexes and average degree centrality. Horizontal axis: time (2000 to 2014); left axis: closing price; right axis: average degree centrality. Series: NASDAQ, DJIA, S&P 500, and average degree centrality. Three sharp fluctuation periods are marked.

The maximum value of the average strength of the network reflects the fact that there are relatively fewer nodes in the stock pattern network and the fluctuation modes of stocks are monotonous. It shows that in the month before the extreme values, the price fluctuations of the three stock indexes were synchronous, resulting in the relative simplicity of the volatility combination pattern. Before the three indexes reached their lowest levels in 2002 and 2009, they were basically in a state of substantial decline. The correlation and consistency of the indexes reached their maximum during this period, and so the average strength of the network reached its maximum, which is consistent with the findings of previous studies on complex networks using price time series [9, 41].

In addition, when the stock market is in a period of dramatic fluctuations, the node types in the network are monotonous, the stock network is constantly switching between several price models, and the edge density is larger, so the average degree centrality reaches its maximum. During this period, the nodes are more compact, the transformation between nodes is faster, and fewer edges need to be passed, so the average shortest path of the network reaches its minimum. Although other modes may emerge during this period, the large fluctuation mode occupies the most important position, and the price patterns tend to shift between the main patterns, so that closeness centrality reaches its maximum. This conclusion is the same as that of Wang et al. [10].
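The link between few nodes and a high average strength can be illustrated with a small sketch. It assumes that nodes are combination patterns, that the weight of a directed edge counts how often one pattern is followed by another within a window, and that average strength is the total edge weight divided by the number of nodes. The function names and the toy pattern sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_pattern_network(patterns):
    """Return {(source, target): weight}, counting consecutive-day transitions."""
    return Counter(zip(patterns, patterns[1:]))


def average_strength(edges):
    """Total edge weight divided by the number of distinct nodes."""
    nodes = {node for edge in edges for node in edge}
    return sum(edges.values()) / len(nodes)


# Toy 7-day windows (hypothetical data).
calm_window = ["S2S2S2", "S3S3S2", "S2S3S3", "S3S2S2", "S2S2S3", "S3S3S3", "S2S2S2"]
crisis_window = ["S4S4S4", "S1S1S1", "S4S4S4", "S4S4S4", "S1S1S1", "S4S4S4", "S4S4S4"]

calm = build_pattern_network(calm_window)
crisis = build_pattern_network(crisis_window)

print(average_strength(calm))    # 6 transitions spread over 6 nodes
print(average_strength(crisis))  # 6 transitions concentrated on 2 nodes
```

Under these assumptions a window of T days always contains T - 1 transitions, so the average strength equals (T - 1)/N for N distinct patterns and is largest when the window visits only a few patterns.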

4.3. Next-Day Stock Pattern Prediction Using KNN and SVM Algorithms. By analyzing the 30-day network topological characteristics corresponding to each trading day, we find that the extreme values of the network topological characteristics can reflect the periods of dramatic fluctuations in the system composed of the three stock indexes. KNN and SVM algorithms are used to predict the next-day patterns of each stock index when the combination patterns of the three stock indexes and the corresponding 30-day network topological characteristics for the current day are known.

Based on the theory of cross-validation [42], and in order to keep the year intact and ensure the continuity of the years, we used the closing prices of the S&P 500, NASDAQ, and DJIA from 1 January 2000 to 31 December 2014 as the training sample data set. The testing sample data set used the closing prices of the three indices from 1 January 2015 to 31 December 2017. The training set and the testing set contained 3769 and 755 records, respectively. Since the training sample set has 47 pattern nodes, we divided the training data into 47 training subsets.
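The partition described above can be sketched as follows. The record layout (a dict with `date` and `pattern` keys) and the function names are hypothetical, and the sketch assumes that each training subset collects the days sharing the same current-day combination pattern.

```python
from collections import defaultdict
from datetime import date

TRAIN_END = date(2014, 12, 31)  # last training day stated in the text


def split_by_date(records):
    """Whole calendar years up to TRAIN_END go to training, later days to testing."""
    train = [r for r in records if r["date"] <= TRAIN_END]
    test = [r for r in records if r["date"] > TRAIN_END]
    return train, test


def group_by_pattern(records):
    """One subset per combination pattern (assumed grouping key)."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record["pattern"]].append(record)
    return dict(subsets)


# Hypothetical records for illustration only.
records = [
    {"date": date(2000, 1, 3), "pattern": "S1S1S1"},
    {"date": date(2014, 12, 31), "pattern": "S1S1S1"},
    {"date": date(2015, 1, 2), "pattern": "S2S2S2"},
]
train, test = split_by_date(records)
subsets = group_by_pattern(train)
print(len(train), len(test), sorted(subsets))
```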

4.3.1. Kruskal–Wallis Tests to Filter Variables. Following the methods used to select variables in existing studies, we used the Kruskal–Wallis test to filter the four network topological characteristic variables and next-day patterns of each stock index using the training samples [43]. The results are shown in Table 2.

It can be seen from Table 2 that the p-values of the variables and the next-day patterns of the various stock indexes are all less than or equal to 0.1, except for the next-day patterns of the DJIA, where closeness centrality is not significant. Therefore, when predicting the next-day patterns of the DJIA, the closeness centrality is removed, leaving the three other variables as input variables for the KNN and SVM algorithms. When predicting the next-day patterns of the S&P 500 and the NASDAQ, all four network topological characteristic variables are retained as input variables.

Table 2: Kruskal–Wallis tests of the four network topological characteristic variables and next-day patterns of each stock index.

Variable                       S&P 500 stock index    NASDAQ stock index     DJIA stock index
Average strength               43.0578*** (0.001)     340.2039*** (0.001)    29.8488*** (0.001)
Average shortest path length   9.3241** (0.0253)      28.1597*** (0.001)     7.6535* (0.0537)
Average degree centrality      25.4071*** (0.001)     250.9374*** (0.001)    13.0833*** (0.0045)
Closeness centrality           8.4532** (0.0375)      190.0352*** (0.001)    0.6204 (0.8917)

Note: figures in parentheses are p-values.

Table 3: Prediction accuracy (%) of the optimal KNN and SVM models in relation to next-day patterns for the three stock indexes using the testing set.

Closeness   Algorithm   DJIA stock index   S&P 500 stock index   NASDAQ stock index
Yes         KNN         74.83              72.58                 72.45
Yes         SVM         74.97              73.11                 74.57
No          KNN         72.98              70.59                 70.86
No          SVM         76.03              72.45                 73.64
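The p-values in Table 2 can be cross-checked with a short sketch. It assumes that the Kruskal–Wallis statistic is referred to a chi-square distribution with 3 degrees of freedom (four next-day pattern groups minus one), which the source does not state explicitly, and it uses the closed-form survival function for that case. The statistics and p-values are read from Table 2 with decimal points restored.

```python
import math


def chi2_sf_df3(h):
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(h / 2)) + math.sqrt(2 * h / math.pi) * math.exp(-h / 2)


# (H statistic, reported p-value) pairs read from Table 2.
reported = [
    (9.3241, 0.0253),
    (7.6535, 0.0537),
    (13.0833, 0.0045),
    (8.4532, 0.0375),
    (0.6204, 0.8917),
]

for h, p in reported:
    print(f"H = {h:8.4f}  reported p = {p:.4f}  chi-square(3) p = {chi2_sf_df3(h):.4f}")
```

The agreement between the recomputed and reported values supports reading the statistics to four decimal places.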

4.3.2. Predicting Stock Patterns Using KNN and SVM Algorithms. The accuracy of prediction is defined as

$$\text{Accuracy rate} = \frac{\text{number of correct predictions}}{\text{total number of samples}} \times 100\%. \tag{21}$$

We compare the predicted next-day patterns with the actual next-day patterns of the stock index. If they are the same on a given day, the prediction for that day is counted as correct. The accuracy rate is the proportion of correctly predicted samples in the total number of samples, expressed as a percentage. An accuracy rate close to 100 means that the model yields accurate predictions, whereas an accuracy rate close to 0 means that the model is inaccurate.
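As a minimal sketch of this definition, the accuracy rate can be computed by counting the days on which the predicted and actual patterns agree. The function name and the sample labels are hypothetical.

```python
def accuracy_rate(predicted, actual):
    """Percentage of samples whose predicted pattern equals the actual pattern."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical next-day patterns for four trading days.
predicted = ["S1", "S2", "S1", "S4"]
actual = ["S1", "S2", "S3", "S4"]
print(accuracy_rate(predicted, actual))  # three of four predictions match
```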

After obtaining the optimal models using KNN and SVM algorithms in relation to the training set using the cross-validation and search methods, the models are used to predict patterns using the testing set, and their performance is evaluated based on their prediction accuracy rates. Table 3 shows the prediction accuracy of the optimal models obtained using KNN and SVM algorithms for the three stock indexes.

From Table 3, it can be seen that KNN and SVM algorithms can identify appropriate models based on the training set using the cross-validation and search methods, with prediction accuracies in relation to the testing set of greater than 70%. In general, however, the prediction accuracy of SVM algorithms is higher than that of KNN algorithms. This is similar to the findings of previous studies on the two algorithms, that is, the generalization ability of the SVM classification model is greater than that of the KNN model [1, 2]. To further illustrate the predictive effect of closeness on the three stock indexes, we compare the prediction accuracy rate in the cases of closeness and no closeness. We find an interesting result wherein the prediction model without closeness using SVM has the highest prediction accuracy rate (76.03%) when predicting the next-day patterns of the DJIA stock index. This result indicates that SVM is more accurate and sensitive than KNN. Closeness does not affect predictions regarding the DJIA, as the results of the Kruskal–Wallis test show.

Diether et al. examined short-selling in US stocks using SEC-mandated data for 2005 and found that short-selling activity was strongly positively correlated with previous five-day returns and volatility [44]. The five-day movement of stocks is also very important for short-term investment in stocks or funds in the real world. Thus, if the investor can forecast future five-day volatility patterns, more information can be obtained to support short-selling strategies. For instance, if the next five-day pattern is predicted to be S1 (sharp rise), the investor can execute a short-selling strategy the next day.

5. Conclusion

Based on the complex network method, this study analyzes the stock price fluctuation patterns of the three most important stock indexes for the US stock market. Unlike previous studies, this study uses the three stock indexes to build pattern networks for the system rather than using a single stock index. From the analyses of the average strength, average shortest path length, average degree centrality, and closeness centrality of the price pattern network every 30 days, it is found that when the overall stock market is in a period of dramatic fluctuations, the average strength, average degree centrality, and closeness centrality reach their maximum values, while the average shortest path length reaches its minimum value. This shows that price volatility pattern networks can reflect special periods on the stock market. In periods of dramatic fluctuations on the stock market, the comovement of various indexes is stronger, the edge density of the corresponding pattern network is greater, the conversion between price modes is faster, and the conversion area of nodes is more concentrated. This shows the validity of using price pattern network characteristics to identify special periods on the stock market. To a certain extent, they can reflect abnormal periods on the stock market from a macro point of view. When the four indicators approach extreme values, investors should exercise caution.

The stock price network characteristic variables not only contain price change information for individual stocks but also reflect the overall change characteristics of the market at the macro level. Therefore, another focus of this study is the use of the network characteristic variables as input variables for KNN and SVM algorithms to predict the next-day fluctuation patterns of individual stocks. Firstly, the Kruskal–Wallis test is used to test the next-day patterns of the three stock indexes against the four network characteristic variables, and we find that closeness does not affect the prediction of the next-day stock patterns of the DJIA index. Given the combination price patterns for the current day, the network characteristic variables are used as the input variables for KNN and SVM algorithms to predict the next-day stock price patterns, and the accuracy of the two algorithms in relation to the testing set is compared. The results show that both the KNN and SVM algorithms display a high level of accuracy in predicting the next-day stock price patterns, with prediction accuracy of greater than 70% for all three stock indexes. However, the generalization ability of the SVM algorithm is greater than that of the KNN algorithm. Thus, it is possible to predict stock trends by using a proper classification algorithm and combining the structural characteristics of the multistock price network. This approach can generate more information for financial trading strategies in the real world and provides a new focus for future research into stock price prediction.

The application of the complex network method to the stock market is still in the developmental stage. Revealing the characteristics of stock price fluctuations by using complex networks is helpful in understanding the essence of stock price fluctuations and providing profit-making strategies. A more detailed examination of the correlation between financial crises and network topological properties is a worthy topic for further research. In addition, the combined use of machine learning and complex network methods to study the stock market deserves more in-depth discussion and diversified development.

Data Availability

The data used to support the findings of this study have been deposited in https://www.kesci.com/home/dataset/5c82706ed635ff002ca24a19.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grants nos. 71371200, 71071167, and 71071168).

References

[1] L. A. Teixeira and A. L. I. De Oliveira, “A method for automatic stock trading combining technical analysis and nearest neighbor classification,” Expert Systems with Applications, vol. 37, no. 10, pp. 6885–6890, 2010.

[2] M. Ballings, D. Van Den Poel, N. Hespeels, and R. Gryp, “Evaluating multiple classifiers for stock price direction prediction,” Expert Systems with Applications, vol. 42, no. 20, pp. 7046–7056, 2015.

[3] W. Huang, Y. Nakamori, and S.-Y. Wang, “Forecasting stock market movement direction with support vector machine,” Computers & Operations Research, vol. 32, no. 10, pp. 2513–2522, 2005.

[4] Y.-W. Cheung, M. D. Chinn, and A. G. Pascual, “Empirical exchange rate models of the nineties: Are any fit to survive?” Journal of International Money and Finance, vol. 24, no. 7, pp. 1150–1175, 2005.

[5] G. Armano, A. Murru, and F. Roli, “Stock market prediction by a mixture of genetic-neural experts,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 16, no. 5, pp. 501–526, 2002.

[6] A. H. Moghaddam, M. H. Moghaddam, and M. Esfandyari, “Stock market index prediction using artificial neural network,” Journal of Economics, Finance and Administrative Science, vol. 21, no. 41, pp. 89–93, 2016.

[7] R. Choudhry and K. Garg, A Hybrid Machine Learning System for Stock Market Forecasting, vol. 39, 2008.

[8] Y. K. Kwon, S. S. Choi, and B. R. Moon, “Stock prediction based on financial correlation,” in Proceedings of the Conference on Genetic Evolutionary Computation, 2005.

[9] L. Lacasa, V. Nicosia, and V. Latora, “Network structure of multivariate time series,” Scientific Reports, vol. 5, 2015.

[10] M. Wang, Y. Chen, L. Tian, S. Jiang, Z. Tian, and R. Du, “Fluctuation behavior analysis of international crude oil and gasoline price based on complex network perspective,” Applied Energy, vol. 175, pp. 109–127, 2016.

[11] B. M. Tabak, T. R. Serra, and D. O. Cajueiro, “Topological properties of stock market networks: the case of Brazil,” Physica A: Statistical Mechanics and Its Applications, vol. 389, no. 16, pp. 3240–3249, 2010.

[12] L. Lacasa, B. Luque, F. Ballesteros, J. Luque, and J. C. Nuno, “From time series to complex networks: the visibility graph,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 13, pp. 4972–4975, 2008.

[13] Z. Gao, Q. Cai, Y. Yang, W. Dang, and S. Zhang, “Multiscale limited penetrable horizontal visibility graph for analyzing nonlinear time series,” Scientific Reports, vol. 6, no. 1, Article ID 35622, 2016.

[14] E. Zhuang, M. Small, and G. Feng, “Time series analysis of the developed financial markets’ integration using visibility graphs,” Physica A: Statistical Mechanics and Its Applications, vol. 410, pp. 483–495, 2014.

[15] R. V. Donner, Y. Zou, J. F. Donges, N. Marwan, and J. Kurths, “Recurrence networks-a novel paradigm for nonlinear time series analysis,” New Journal of Physics, vol. 12, no. 3, Article ID 033025, 2010.

[16] Y. Li, H. Cao, and Y. Tan, “Novel method of identifying time series based on network graphs,” Complexity, vol. 17, no. 1, pp. 13–34, 2011.

[17] N. Marwan, J. F. Donges, Y. Zou, R. V. Donner, and J. Kurths, “Complex network approach for recurrence analysis of time series,” Physics Letters A, vol. 373, no. 46, pp. 4246–4254, 2009.

[18] Y. Li, H. Cao, and Y. Tan, “A comparison of two methods for modeling large-scale data from time series as complex networks,” AIP Advances, vol. 1, no. 1, Article ID 012103, p. 509, 2011.

[19] Y. Yang and H. Yang, “Complex network-based time series analysis,” Physica A: Statistical Mechanics and Its Applications, vol. 387, no. 5-6, pp. 1381–1386, 2008.

[20] M. Wang, A. L. M. Vilela, L. Tian, H. Xu, and R. Du, “A new time series prediction method based on complex network theory,” in Proceedings of the 5th IEEE International Conference on Big Data (Big Data) 2017, pp. 4170–4175, December 2017.

[21] Y. Shimada, T. Kimura, and T. Ikeguchi, Analysis of Chaotic Dynamics Using Measures of the Complex Network Theory, Springer, Berlin, Germany, 2008.

[22] X. Xu, J. Zhang, and M. Small, “Superfamily phenomena and motifs of networks induced from time series,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 50, pp. 19601–19605, 2008.

[23] Z. Ming, W. Er-Hong, Z. Ming-Yuan, and M. Qing-Hao, “Directed weighted complex networks based on time series symbolic pattern representation,” Acta Physica Sinica, vol. 66, no. 21, Article ID 210502, 2017.

[24] W.-Q. Huang, S. Yao, and X.-T. Zhuang, “A network dynamic model based on SSE composite index and trading volume fluctuation,” Journal of Northeastern University, vol. 31, no. 10, pp. 1516–1520, 2010.

[25] S. H. Kim and S. H. Chun, “Graded forecasting using an array of bipolar predictions: Application of probabilistic neural networks to a stock market index,” International Journal of Forecasting, vol. 14, no. 3, pp. 323–337, 1998.

[26] J. Patel, S. Shah, P. Thakkar, and K. Kotecha, “Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques,” Expert Systems with Applications, vol. 42, no. 1, pp. 259–268, 2015.

[27] M.-C. Wu, S.-Y. Lin, and C.-H. Lin, “An effective application of decision tree to stock trading,” Expert Systems with Applications, vol. 31, no. 2, pp. 270–274, 2006.

[28] M. V. Subha and S. T. Nambi, “Classification of stock index movement using k-nearest neighbours (k-NN) algorithm,” WSEAS Transactions on Information Science and Applications, vol. 9, no. 9, pp. 261–270, 2012.

[29] C. G. Atkeson, A. W. Moore, and S. Schaal, “Locally weighted learning,” Artificial Intelligence Review, vol. 11, no. 1–5, pp. 11–73, 1997.

[30] C.-J. Huang, D.-X. Yang, and Y.-T. Chuang, “Application of wrapper approach and composite classifier to the stock trend prediction,” Expert Systems with Applications, vol. 34, no. 4, pp. 2870–2878, 2008.

[31] S. Mitra and T. Acharya, Data Mining: Multimedia, Soft Computing, and Bioinformatics, John Wiley & Sons, 2005.

[32] V. Cherkassky, “The nature of statistical learning theory,” Technometrics, vol. 38, no. 4, pp. 409-409, 1996.

[33] W.-M. Lin, C.-H. Wu, C.-H. Lin, and F.-S. Cheng, “Classification of multiple power quality disturbances using support vector machine and one-versus-one approach,” in Proceedings of the 2006 International Conference on Power System Technology, POWERCON2006, China, October 2006.

[34] F. Li and J. Xiao, “How to get effective slide-window size in time series similarity search,” Journal of Frontiers of Computer Science & Technology, vol. 3, no. 1, pp. 105–112, 2009.

[35] X. Sun, M. Small, Y. Zhao, and X. Xue, “Characterizing system dynamics with a weighted and directed network constructed from time series data,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 24, no. 2, Article ID 024402, 9 pages, 2014.

[36] L. C. Freeman, “Centrality in social networks conceptual clarification,” Social Networks, vol. 1, no. 3, pp. 215–239, 1978-1979.

[37] A. W. Wolfe, “Social network analysis: methods and applications,” American Ethnologist, vol. 24, no. 4, pp. 136-137, 1995.

[38] K. Shin, T. S. Lee, and H. Kim, “An application of support vector machines in bankruptcy prediction model,” Expert Systems with Applications, vol. 28, no. 1, pp. 127–135, 2005.

[39] S. Arlot and A. Celisse, “A survey of cross-validation procedures for model selection,” Statistics Surveys, vol. 4, pp. 40–79, 2010.

[40] G. C. Cawley and N. L. C. Talbot, “Efficient leave-one-out cross-validation of kernel fisher discriminant classifiers,” Pattern Recognition, vol. 36, no. 11, pp. 2585–2592, 2003.

[41] L. Xia, D. You, X. Jiang, and Q. Guobc, “Comparison between global financial crisis and local stock disaster on top of Chinese stock network,” Physica A: Statistical Mechanics and Its Applications, vol. 490, Article ID S0378437117307227, pp. 222–230, 2017.

[42] Z. Zhou and Y. Yu, Machine Learning and Its Application 2011, Tsinghua University Press, Beijing, China, 2009.

[43] X. Wang, H. Xue, and W. Jia, Prediction Model of Stock’s Rosing and Felling Based on BP Neural Network, Value Engineering, 2010.

[44] K. B. Diether, K.-H. Lee, and I. M. Werner, “Short-sale strategies and return predictability,” Review of Financial Studies, vol. 22, no. 2, pp. 575–607, 2009.



as a network node, and the relationship between patterns is regarded as a connection between nodes [10]. By analyzing the topological properties of the network, the characteristics of stock price fluctuations can be better understood. Huang et al. used coarse-grained symbolization methods to construct a network of market prices and transaction volume data in different periods based on the Shanghai Stock Exchange (SSE) composite index, and the results showed that the out-degree distribution of network nodes obeyed the power law and the basic fluctuations exhibited different patterns during different periods [24]. Wang et al. converted the yields of gasoline and crude oil stocks into five patterns and studied the characteristics of crude oil and gasoline node networks in different periods using sliding windows, and then accurately predicted the crude oil and gasoline stock price pattern based on the conversion characteristics of the price network [10, 20].

However, most of the existing studies on stock price volatility pattern networks have focused on univariate time series. On this basis, we propose a new network construction method to build the volatility pattern networks of the three most important indexes in the US stock market, namely, the Standard & Poor’s 500 Index (S&P 500), the NASDAQ Composite Index (NASDAQ), and the Dow Jones Industrial Average (DJIA). Firstly, the combination symbolic patterns for the three stock indexes are derived using a coarse-grained method. Then the combination symbolic patterns are used as the nodes of the network, and the frequencies and directions of the conversion of the patterns are used as the weights and directions of the network connections. Finally, we construct directed and weighted networks for the US stock market. By analyzing the network topology properties, we can identify periods of sharp fluctuations in the market.

Meanwhile, many machine learning algorithms have been applied to stock price volatility classification and prediction, such as neural networks [25], random forests [26], decision trees [27], support vector machines (SVM) [3, 7], and K-nearest neighbors (KNN) [1, 28]. Among them, KNN and SVM algorithms have been widely used in pattern recognition and forecasting, machine learning, information retrieval, and data mining. KNN is a simple and effective classification method that is easy to calculate, and its performance is comparable to the most advanced classification methods [29, 30]. SVM, which can map nonlinearly separable data into a high-dimensional space and use hyperplanes for classification, is highly suitable for small sample classification because of its excellent classification ability [26]. Both KNN and SVM algorithms have a mature theoretical basis in relation to classification prediction. Ballings et al. also compared the accuracy of SVM, KNN, and other algorithms in predicting stock price movements one year ahead for 5767 publicly listed European companies, and the results showed that SVM has better prediction ability than KNN [2]. Teixeira and De Oliveira proposed an automatic stock trading method that combined technical analysis with KNN classification. Using 15 stocks from the Sao Paulo Stock Exchange (Bovespa), they found that the proposed method generated considerably higher profits than the buy-and-hold method for most of the companies, with few buy actions generated [1]. Huang et al. used SVM algorithms to predict the weekly fluctuations in the Nikkei 225 index and found that SVM outperformed the other classification methods, such as quadratic discriminant analysis and Elman backpropagation neural networks [3].

The literature has demonstrated the ability of SVM and KNN to predict stock patterns. However, these studies predicted the stock price based on the information of the single stock itself, without considering the information of the network system composed of the relevant stocks. Therefore, another aim of this study is to predict the next-day pattern of a single stock for each combination mode of stocks using the network topology properties as input variables for SVM and KNN algorithms. To the best of our knowledge, this is the first such attempt in existing research. We then compare the prediction accuracy using the testing data set after identifying the best models using the training set. The stock price volatility pattern network includes price information for single stocks and related stocks and portrays the macro nature of the market, which contains more information than is available using only historical information relating to single stocks. The results show that the pattern network can provide some information to enable us to forecast the price volatility patterns of single stocks. For the two prediction methods, the optimal parameter search strategy combined with cross-validation and search methods enables us to find the models that perform well on the testing data set. Overall, the performance of SVM algorithms is better than that of KNN algorithms. Combining complex network methods with machine learning can provide investors with information on profitability strategies.

The remainder of this paper is organized as follows. In the next section, we introduce the theoretical background for KNN and SVM algorithms. In Section 3, the methodology of constructing the network and of predicting the next-day patterns for each stock index is presented. In Section 4, we show the empirical results and compare the prediction accuracy for KNN and SVM. The last section is devoted to a summary.

2. Theoretical Background for KNN and SVM

2.1. KNN. The K-nearest neighbor (KNN) algorithm is a nonparametric classification algorithm that assigns the query data to be classified to the category to which most of its $K$ neighbors belong [31]. We use the Euclidean distance metric to find the K nearest neighbors from a sample set of known classifications. Suppose that the known data set has four feature variables $f_1, f_2, f_3, f_4$ and four categories $y_1, y_2, y_3, y_4$. The steps to search the category of the new data $i$ through the KNN algorithm are as follows.

Firstly, the Euclidean distance of the feature variables of the data $i$ and the other data $j$ ($j = 1, 2, \ldots, n$) in the training data set is calculated:

$$D_j = \sqrt{\sum_{g=1}^{4} \left(f_g(i) - f_g(j)\right)^2}, \quad j = 1, 2, \ldots, n. \tag{1}$$

Secondly, all the data in the training set are sorted in ascending order according to the distance from data $i$.
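The distance, sorting, and majority-vote steps can be sketched in a few lines. This is a minimal illustration under the definitions above rather than the authors' implementation; the function name, the feature values, and the tie-breaking behavior (the label met first among the nearest neighbors wins a tied vote) are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k):
    """Classify query by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from the query to every training point.
    distances = [math.dist(query, features) for features in train_features]
    # Step 2: training indices sorted by ascending distance.
    order = sorted(range(len(distances)), key=distances.__getitem__)
    # Step 3: majority category among the k closest points.
    votes = Counter(train_labels[j] for j in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data with four features and category labels.
train_features = [
    (0.10, 0.20, 0.10, 0.30),
    (0.20, 0.10, 0.20, 0.20),
    (0.90, 0.80, 0.90, 0.70),
    (0.80, 0.90, 0.80, 0.90),
    (0.15, 0.15, 0.10, 0.25),
]
train_labels = ["y1", "y1", "y3", "y3", "y1"]

print(knn_predict(train_features, train_labels, (0.12, 0.18, 0.12, 0.28), k=3))
print(knn_predict(train_features, train_labels, (0.85, 0.85, 0.85, 0.80), k=3))
```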

$$\min\ \varphi(w) = \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\xi_i = \frac{1}{2}(w' \cdot w) + C\sum_{i=1}^{N}\xi_i,$$
$$\text{s.t.}\quad y_i\left[(w^{T} \cdot \Phi(x_i)) + b\right] + \xi_i \ge 1,\quad \xi_i \ge 0,\quad i = 1, 2, \ldots, n. \tag{4}$$

$$k(x_i, x_j) = \Phi^{T}(x_i) \cdot \Phi(x_j). \tag{6}$$

3. Methodology

$$r = \ln\frac{\mathit{Close}(t)}{\mathit{Close}(t-1)}, \tag{8}$$

$$R = \ln\frac{\mathit{Close}(t)}{\mathit{Close}(t-5)}, \tag{9}$$

$$V = \mathrm{std}(r_1, \ldots, r_5) \cdot \sqrt{5}, \tag{10}$$

$$V = \frac{1}{N}\sum V, \tag{11}$$

$$V' = \frac{1}{M}\sum_{i=1}^{M} V_i. \tag{12}$$
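Equations (8) to (10) can be illustrated with a short sketch that computes the daily log return, the five-day log return, and the five-day volatility from closing prices. The use of the sample standard deviation is an assumption, since the surviving text does not say which estimator is meant, and the function names and prices are hypothetical.

```python
import math
import statistics


def log_return(close_now, close_before):
    """Logarithmic return between two closing prices."""
    return math.log(close_now / close_before)


def five_day_volatility(closes):
    """closes: six consecutive closing prices, giving five daily log returns."""
    daily = [log_return(later, earlier) for earlier, later in zip(closes, closes[1:])]
    # Sample standard deviation (assumed), scaled by sqrt(5) as in Equation (10).
    return statistics.stdev(daily) * math.sqrt(5)


# Hypothetical closing prices for six consecutive trading days.
closes = [100.0, 101.0, 99.0, 102.0, 103.0, 101.0]
daily_returns = [log_return(b, a) for a, b in zip(closes, closes[1:])]
five_day_return = log_return(closes[-1], closes[0])

print(five_day_return, five_day_volatility(closes))
```

The five daily log returns add up to the five-day log return, which is why the two measures in Equations (8) and (9) are directly comparable.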


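[Figure not reproduced: a panel labeled "Pattern Prediction" and example combination pattern nodes S1S1S1, S2S2S2, S1S4S4, and S3S1S4 with edge weights 1 and 2; the captions are not recoverable.]

$f =$ [right-hand side not recoverable] (13)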





[Figure not reproduced: trading days 1, 2, 3, 4, …, 30, 31, 32, … covered by Sliding window 1, Sliding window 2, and Sliding window 3; the caption is not recoverable.]

$$\rho = \frac{1}{N(N-1)}\sum_{i,j=1}^{N} a_{ij}, \tag{14}$$

$$\rho = \frac{1}{N^{2}}\sum_{j=1}^{N}\sum_{i=1}^{N} a_{ij}, \tag{15}$$

$$S = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}, \tag{16}$$

$$L = \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{N} d_{ij} \quad (i \neq j), \tag{17}$$

$$C_i = \frac{N-1}{\sum_{j=1}^{N} d_{ji}}, \tag{18}$$

$$C = \frac{1}{N}\sum_{i=1}^{N} C_i. \tag{19}$$
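The average shortest path length of Equation (17) and the closeness centrality of Equation (18) can be sketched with a breadth-first search. The sketch assumes that distances are counted in hops along directed edges, that every node can reach every other node, and that the network-level closeness is the mean of the node values; the function names and the three-node example are hypothetical.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: directed hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_length_and_closeness(adjacency):
    """Return (average shortest path length, average closeness centrality)."""
    nodes = list(adjacency)
    n = len(nodes)
    if n < 2:
        raise ValueError("at least two nodes are required")
    d = {i: hop_distances(adjacency, i) for i in nodes}
    if any(len(d[i]) < n for i in nodes):
        raise ValueError("sketch assumes every node can reach every other node")
    avg_path = sum(d[i][j] for i in nodes for j in nodes if i != j) / (n * (n - 1))
    # Closeness of node i uses distances d[j][i] from every other node j to i.
    closeness = {i: (n - 1) / sum(d[j][i] for j in nodes if j != i) for i in nodes}
    return avg_path, sum(closeness.values()) / n


# Hypothetical directed cycle over three pattern nodes.
cycle = {"S1S1S1": ["S2S2S2"], "S2S2S2": ["S4S4S4"], "S4S4S4": ["S1S1S1"]}
avg_path, avg_closeness = path_length_and_closeness(cycle)
print(avg_path, avg_closeness)
```

Adding edges shortens paths and raises closeness, which matches the qualitative behavior the paper reports for periods of sharp fluctuation.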


[Figure not reproduced: flowchart that includes the step "Train the models"; the caption is not recoverable.]

$$y_i = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}}. \tag{20}$$
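Equation (20) has the form of min-max scaling, which maps each value into the interval from 0 to 1. A minimal sketch follows; the function name and the sample values are hypothetical, and the surviving text does not say which variables the paper scales.

```python
def min_max_scale(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("all values are equal, so the scaled value is undefined")
    return [(value - lowest) / (highest - lowest) for value in values]


# Hypothetical raw values of one input variable.
scaled = min_max_scale([2.0, 4.0, 6.0])
print(scaled)
```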





[Figure not reproduced: closing prices of the NASDAQ, DJIA, and S&P 500 and the average strength plotted against time, 2000 to 2014.]

[Figure not reproduced: closing prices of the NASDAQ, DJIA, and S&P 500 and the average shortest path length plotted against time, 2000 to 2014.]

[Figure not reproduced: closing prices of the NASDAQ, DJIA, and S&P 500 and the closeness centrality plotted against time, 2000 to 2014.]


Complexity 3










min 120593 (w) = 12 1199082 + 119862 119873sum119894=1

120585119894 = 12 (1199081015840 sdot 119908) + 119862 119873sum119894=1

120585119894st 119910119894 [(w119879 sdotΦ (119909119894)) + b] + 120585119894 ge 1

120585119894 ge 0 i = 1 2 n(4)



119896 (119909119894 119909119895) = Φ119879 (119909119894) sdotΦ (119909119895) (6)






3 Methodology



119903 = ln 119862119897119900119904119890 (119905)119862119897119900119904119890 (119905 minus 1) (8)

119877 = ln 119862119897119900119904119890 (119905)119862119897119900119904119890 (119905 minus 5) (9)

V = 119904119905119889 (1199031 1199035) lowast radic5 (10)



119881 = 1119873 sum119881 (11)


V1015840 = 1119872119872sum119894=1

119881119894 (12)


4 Complexity







Pattern Prediction




S1S1S1

S2S2S2S1S4S4

S3S1S4

1

1 2

11


119891 =


(13)





Complexity 5



1234

hellip303132hellip






hellip

Sliding window 1

Sliding window 2

Sliding window 3


120588 = 1119873 (119873 minus 1)119873sum119894119895=1

119886119894119895 (14)


120588 = 11198732119873sum119895=1

119873sum119894=1

119886119894119895 (15)



S = 1119873119873sum119894=1

119873sum119895=1

119908119894119895 (16)



L = 1119873 (119873 minus 1)119873sum119894=1

119873sum119895=1

119889119894119895 (119894 = 119895) (17)



119862119894 = 119873 minus 1sum119873119895=1 119889119895119894 (18)


119862 = 1119873119873sum119894=1


6 Complexity




Train the models














119910119894 = 119910119894 minus 119910119898119894119899119910119898119886119909 minus 119910119898119894119899 (20)





Complexity 7

2008Time

2000 2002 2004 2006 2010 2012 2014

17500

15000

12500

10000

7500Clos

ing

Pric

e

Aver

age S

treng

th

Average Strength


NASDAQDJIASampP 500

Average Strength

2

4

6

8

10

12

14

5000

2500

0












8 Complexity

2008Time

2000 2002 2004 2006 2010 2012 2014

17500

15000

12500

10000

7500Clos

ing

Pric

e

Aver

age S

hort

est P

ath

Leng

th



NASDAQDJIASampP 500


1

2

3

4

5

5000

2500

0


2008Time

2000 2002 2004 2006 2010 2012 2014

17500

15000

12500

10000

7500Clos

ing

Pric

e

Clos

enes

s Cen

tral

ity



NASDAQDJIASampP 500


02

04

06

08

10

5000

2500

0




Complexity 9

2008Time

2000 2002 2004 2006 2010 2012 2014

17500

15000

12500

10000

7500Clos

ing

Pric

e

Aver

age D

egre

e Cen

tral

ity




NASDAQDJIASampP 500


025

050

075

100

125

150

200

175

5000

2500

0









10 Complexity










Yes KNN 7483 7258 7245SVM 7497 7311 7457

No KNN 7298 7059 7086SVM 7603 7245 7364




times 100 (21)






5 Conclusion


Complexity 11




Data Availability




Acknowledgments


References















12 Complexity







































Journal of












Journal of







Volume 2018






Volume 2018







