Hindawi
Complexity
Volume 2017, Article ID 5024867, 10 pages
https://doi.org/10.1155/2017/5024867

Research Article
FAACOSE: A Fast Adaptive Ant Colony Optimization Algorithm for Detecting SNP Epistasis

Lin Yuan,1 Chang-An Yuan,2 and De-Shuang Huang1

1 Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Caoan Road 4800, Shanghai 201804, China
2 Science Computing and Intelligent Information Processing of Guang Xi Higher Education Key Laboratory, Guangxi Teachers Education University, Nanning, Guangxi 530001, China

Correspondence should be addressed to De-Shuang Huang; dshuang@tongji.edu.cn

Received 31 March 2017; Accepted 24 July 2017; Published 7 September 2017

Academic Editor: Jianxin Wang

Copyright © 2017 Lin Yuan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Epistasis is prevalent in SNP interactions. Some existing methods focus on constructing models for two SNPs; others search for SNPs using only a single objective function. In this paper, we present a unified fast framework that integrates an adaptive ant colony optimization algorithm with multiobjective functions for detecting SNP epistasis in GWAS datasets. We compared our method with other existing methods on synthetic datasets and applied the proposed method to a Late-Onset Alzheimer's Disease dataset. Our experimental results show that the proposed method outperforms other methods in epistasis detection, and the results on the real dataset contribute to research on the mechanism underlying the disease.

1. Introduction

With the rapid development of genomics and gene chip technology, Genome-Wide Association Studies (GWAS) have identified a large number of genetic variations related to complex traits [1, 2]. Although this approach has achieved great success, it can explain only a small part of the mechanism underlying complex diseases, a problem known as "missing heritability" [3]. That is to say, the marginal genetic effects of GWAS-identified single nucleotide polymorphisms (SNPs) account for only a small part of pathogenic causes. For single-locus SNP-related diseases [4], GWAS can identify the SNPs that are responsible for the disease trait. However, complex diseases, such as type 2 diabetes [5], prostate cancer, and rheumatoid arthritis (RA) [6], are often due to the small and complex effects of large numbers of SNPs. More and more studies have shown that epistasis exists in SNP interactions. Many SNPs interact with each other in the process of affecting disease traits [7]. Some SNPs affect the disease and dominate the effect of others. The relationship in which one SNP represses the effect of another SNP is known as epistasis. In many complex human diseases, the effect of epistasis is unclear. Existing methods for SNP-related disease may perform poorly because they fail to identify epistasis.

During the past decade, many approaches have been proposed to detect epistasis. Some methods focus on the interaction between two given SNPs. Zhang et al. [8] proposed a Bayesian partition method for epistatic eQTL modules. Kang et al. [9] proposed four different models to measure the epistasis effect between two loci and suggested a statistical strategy to infer the hierarchical relationships. Recently, Lin et al. [10] reported forty-five SNP-SNP interaction models by considering the inheritance modes and model structures. These methods have been successful in studying epistasis between two SNPs. However, GWAS data is high-dimensional, containing hundreds of thousands or even millions of SNPs, while it contains only dozens or hundreds of individual samples; with so few samples and so many features, identifying the interaction between each pair of SNPs takes a vast amount of time [11–13]. The computational burden is prohibitive.

More and more machine learning methods are being applied to epistasis research.


Many methods have been proposed to model epistasis effects from the perspective of the overall data. Moore et al. [14] applied a regression method to identify the relationship between gene expression and epistasis effects. Michael et al. [15] applied Bayesian networks to identify the epistasis effect network from the original SNP data. Although these methods solved some problems, they still did not achieve significant results on large-scale Genome-Wide Association Study datasets, owing to the same "high-dimensional, small sample size" problem. With the rapid development of multiobjective optimization methods and machine learning, the ant colony optimization (ACO) algorithm was applied to epistasis research. Wang et al. [16] proposed AntEpiSeeker, which combines heuristic search with ant colony optimization to identify SNPs that dominate other SNPs. Experimental results on a real rheumatoid arthritis dataset show that AntEpiSeeker is better than other methods. The drawback of this method is that its performance differs across disease models. Zhang and Liu [17] developed BEAM, a Bayesian inference method that identifies epistatic interactions in case-control studies. However, the BEAM method needs a lot of time on GWAS datasets. In this paper, we extend SNP epistasis research with a fast adaptive ant colony optimization algorithm for detecting SNP epistasis. We search for SNP epistasis with two objective functions and fast adaptive ant colony optimization.

Experiments on several simulated datasets show the good performance of our method. We also compare our method with benchmark methods, including BEAM, generic ACO, and AntEpiSeeker. Experimental results show that our method performs better on GWAS datasets containing epistasis effects among SNPs.

2. Methods

2.1. Ant Colony Optimization. In artificial intelligence research and large-scale problem solving, the ant colony optimization (ACO) algorithm is inspired by the food search behaviour of ants in nature. Assuming that the food search paths constitute a graph, the ant colony optimization algorithm reduces the time needed to search for paths through the graph [18]. Like other ant colony algorithms, it is a swarm intelligence method and a member of the metaheuristic optimization family. Marco Dorigo proposed the ant colony optimization algorithm in 1992 in his PhD thesis. GWAS datasets often contain from thousands to millions of SNPs, so it is not feasible to identify the relationship of every pair of SNPs within an acceptable time. The ACO algorithm is used here to reduce the complexity of exhaustive search.

In the insect kingdom, ants searching for food appear to walk randomly, and on the way back and forth they leave pheromones on the path. If the path is found by other ants, they tend to follow it rather than walk randomly; furthermore, if they find food along this path, they also leave pheromones, so the pheromone value on this path is enhanced. Subject to other factors in nature, the pheromone value starts to evaporate and the path's attractive strength starts to decrease. The longer the path, the more time the ants spend looking for food. In comparison, the time the ants take to walk a short path is much shorter, so pheromone values will be larger on shorter paths than on longer paths. Pheromone evaporation results in dynamic changes in the paths, and these changes help to avoid convergence to a locally optimal solution. Without pheromone evaporation, the food search path selected by the first ants would tend to be the only path or the most attractive path, which would limit the solution space. The mechanism of pheromone evaporation in ant colonies is unclear, but pheromone evaporation plays a very important role in artificial intelligence systems. The ant colony optimization algorithm has achieved great success in applications [19–21].

The travelling salesperson problem (TSP) is defined on a set of cities and the physical distances between each pair of cities. The question is: what is the shortest possible path on which the travelling salesperson visits each city once and finally returns to the origin city? Suppose there are $n$ cities; then there are $(n-1)!/2$ solutions to the problem. The number of feasible solutions grows factorially as the number of cities increases, making exhaustive computation impractical. It is an NP-hard combinatorial optimization problem.
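To make the growth of the search space concrete, the following minimal sketch counts the distinct tours of a symmetric TSP. The function name `count_tours` is ours and is only meant as an illustration of the $(n-1)!/2$ count.

```python
import math

def count_tours(n: int) -> int:
    """Number of distinct round trips through n cities, (n - 1)! / 2.

    Fixing the start city leaves (n - 1)! orderings of the other cities;
    each tour is counted twice because it can be walked in either direction.
    """
    if n < 3:
        raise ValueError("need at least 3 cities for a nontrivial tour")
    return math.factorial(n - 1) // 2

# Print the count for a few problem sizes to show how quickly it grows.
for n in (5, 10, 20):
    print(n, count_tours(n))
```

Even at 20 cities the count is far beyond what exhaustive enumeration can handle, which is the motivation for heuristic search such as ACO.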

Suppose that $m$ ants are randomly placed in $n$ cities. For the $k$th ant located in the $i$th city, the probability of choosing city $j$ as the next city is

$$
p^{k}_{ij} =
\begin{cases}
\dfrac{\tau^{\alpha}_{ij}(t)\,\eta^{\beta}_{ij}(t)}{\sum_{o \in \mathrm{candidate}_k} \tau^{\alpha}_{io}(t)\,\eta^{\beta}_{io}(t)}, & j \in \mathrm{candidate}_k, \\
0, & \text{otherwise},
\end{cases}
\qquad
\eta_{ij}(t) = \frac{1}{d_{ij}},
\tag{1}
$$

where $\tau_{ij}(t)$ indicates the residual (pheromone) information on path $ij$ at moment $t$, $\eta_{ij}(t)$ indicates the heuristic function, $d_{ij}$ indicates the physical distance between city $i$ and city $j$, $\mathrm{tabu}_k$ indicates the set of cities that ant $k$ has visited, and $\mathrm{candidate}_k$ indicates the set of cities that ant $k$ can visit next.
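The transition rule in equation (1) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `transition_probabilities`, the nested-list storage of $\tau$ and $\eta$, and the default values of `alpha` and `beta` are all assumptions made for the example.

```python
def transition_probabilities(i, candidates, tau, eta, alpha=1.0, beta=2.0):
    """Return {j: p_ij} for an ant in city i, following equation (1).

    tau and eta are indexed as tau[i][j]; cities outside `candidates`
    are simply absent from the result, i.e. they have probability 0.
    The default alpha and beta are placeholders, not values from the paper.
    """
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in candidates}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all candidate weights are zero")
    return {j: w / total for j, w in weights.items()}

# Toy example: 3 cities, uniform pheromone, eta = 1 / distance.
dist = [[0, 1, 2], [1, 0, 4], [2, 4, 0]]
tau = [[1.0] * 3 for _ in range(3)]
eta = [[0 if d == 0 else 1.0 / d for d in row] for row in dist]
probs = transition_probabilities(0, [1, 2], tau, eta, alpha=1.0, beta=1.0)
print(probs)
```

With uniform pheromone the choice is driven by distance alone, so the nearer city 1 is twice as likely to be chosen as city 2 in this toy example.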

After $n$ moments, the ants complete a cycle, and the information on each path is adjusted according to

$$
\tau_{ij}(t+n) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij},
\qquad
\Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau^{k}_{ij},
\tag{2}
$$

where $\Delta\tau_{ij}$ indicates the information increment of path $ij$ after this cycle and $\Delta\tau^{k}_{ij}$ is the contribution of ant $k$:

$$
\Delta\tau^{k}_{ij} =
\begin{cases}
\dfrac{Q}{l_k}, & ij \in L_k, \\
0, & \text{otherwise},
\end{cases}
\tag{3}
$$

where $L_k$ indicates the paths of ant $k$ in this cycle and $l_k$ indicates the path length of ant $k$ in this cycle. The parameters to be determined are $\alpha$, $\beta$, $\rho$, $m$, and $Q$; the number of ants is less than or equal to the number of cities, and $Q$ is a suitably large number. ACO is often used in large-scale data problems. However, slowness is still a bottleneck in applying the ant colony algorithm to large-scale search optimization problems. The pheromone update strategy is one of the keys that determine the convergence rate.
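The pheromone update of equations (2) and (3) can be sketched as below. The function name `update_pheromone`, the list-based tour representation, and the symmetric deposit on both directions of an edge are assumptions for illustration; the default `rho` and `Q` are placeholders rather than values from the paper.

```python
def update_pheromone(tau, tours, lengths, rho=0.1, Q=100.0):
    """Apply equations (2) and (3) in place and return tau.

    tours   -- list of tours, each a list of city indices (closed implicitly)
    lengths -- total length l_k of each tour
    """
    n = len(tau)
    delta = [[0.0] * n for _ in range(n)]
    for tour, length in zip(tours, lengths):
        deposit = Q / length                       # equation (3)
        for a, b in zip(tour, tour[1:] + tour[:1]):
            delta[a][b] += deposit
            delta[b][a] += deposit                 # symmetric graph (assumption)
    for i in range(n):
        for j in range(n):
            # Evaporate the old value, then add this cycle's deposits: equation (2).
            tau[i][j] = (1.0 - rho) * tau[i][j] + delta[i][j]
    return tau

tau = [[1.0] * 3 for _ in range(3)]
update_pheromone(tau, tours=[[0, 1, 2]], lengths=[7.0], rho=0.5, Q=7.0)
print(tau)
```

In the toy call, every edge of the single tour receives a deposit of $Q/l_k = 1$, while unused entries only evaporate.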

When applying ant colony optimization to specific problems, the search space should be as large as possible. At the same time, ACO should consider time efficiency: it should balance solution quality against solving speed. On the basis of previous studies [22–24], we consider only the pheromone evaporation factor $\rho$ and the pheromone importance factor $\alpha$. In (2), $\rho$ is used to balance the effects of the old pheromone value and the current pheromone value. When $\rho$ is too small, too much residual pheromone remains, which leads to a locally optimal solution. We adapt $\rho$ when the algorithm does not improve the current optimal solution within $n$ iterations:

$$
\rho(t+n) =
\begin{cases}
g\,\rho(t), & \rho(t) \le \rho_{\max}, \\
\rho_{\max}, & \text{otherwise},
\end{cases}
\tag{4}
$$

where $\rho_{\max}$ equals 0.85 in practice and the tuning parameter $g$ equals 1.02. When the pheromone value reaches the critical value, the pheromone importance factor begins to play a role. As the pheromone importance factor $\alpha$ increases, the algorithm is able to jump out of the locally optimal solution and search for the globally optimal solution:

$$
\alpha(t+n) =
\begin{cases}
g_1\,\alpha(t), & \alpha(t) \le \alpha_{\max}, \\
\alpha_{\max}, & \text{otherwise},
\end{cases}
\tag{5}
$$

where $g_1$ is a constant larger than one and $\alpha_{\max}$ is less than or equal to five. In the calculation, we first follow the standard ant colony optimization algorithm for $N$ iterations, where $N$ is a predefined number. If the current optimal solution is not improved after $N$ iterations, we update the parameters according to (4) and (5) and then update all pheromone values according to (2).
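The adaptive rule in equations (4) and (5) can be sketched as follows. The values of `RHO_MAX` and `G` come from the text; `G1 = 1.05` is a hypothetical choice (the text only requires a constant larger than one), `ALPHA_MAX = 5.0` is the upper bound allowed by the text, and the names `adapt`, `maybe_adapt`, and `patience` (playing the role of $N$) are ours.

```python
RHO_MAX = 0.85    # stated in the text
G = 1.02          # stated in the text
ALPHA_MAX = 5.0   # the text only says alpha_max <= 5
G1 = 1.05         # hypothetical; the text only requires g1 > 1

def adapt(value, growth, maximum):
    """Equations (4)/(5): scale by `growth` while value <= maximum, else cap."""
    return growth * value if value <= maximum else maximum

def maybe_adapt(rho, alpha, stalled_iterations, patience):
    """Update (rho, alpha) only if the best solution has not improved
    for at least `patience` iterations; otherwise leave them unchanged."""
    if stalled_iterations < patience:
        return rho, alpha
    return adapt(rho, G, RHO_MAX), adapt(alpha, G1, ALPHA_MAX)

rho, alpha = maybe_adapt(0.5, 1.0, stalled_iterations=10, patience=10)
print(rho, alpha)
```

Note that the rule as written can overshoot the maximum by one growth step before the cap applies; an implementation that prefers a strict bound could use `min(growth * value, maximum)` instead.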

Given pheromone values and transfer rules, we can use the ant colony optimization algorithm to find a group of SNPs that affect the disease. Assume there are $P$ SNPs in the Genome-Wide Association Study dataset; we construct a $P \times P$ symmetric matrix $M$ to store the pheromone values. The element $m_{ij}$ of $M$ denotes the disease-related interaction between the $i$th SNP and the $j$th SNP. At the beginning of our method, every element of $M$ is set to a constant value $m_0$; the equal values indicate that every pair of SNPs is initially considered equally likely to be epistatic and to be related to the disease.

At the final pheromone iteration, the ACO algorithm obtains the optimal solutions through a forward selection strategy. The advantage of the ACO algorithm in this paper is that the result contains the nondominated solutions, which have potentially equal likelihood and potentially the highest strength of association with the disease, and omits the dominated solutions.

The disadvantages of the traditional ant colony optimization algorithm are its long search time and its tendency to fall into a locally optimal solution. The drawback of this working mode is that the pheromone evaporation factor and the pheromone importance factor are predefined. As an improved strategy, we extended ant colony optimization with a "dynamic adaptive strategy." The advantages of this strategy are a fast convergence rate and the ability to search for the globally optimal solution. Compared with traditional ACO, the new strategy can provide more accurate results.

2.2. Two-Objective Function Optimization. The results of ant colony optimization need to be evaluated. We combine two objective functions to assess the final epistasis results. The first objective function combines the Akaike Information Criterion (AIC) score with a logistic regression function to measure the relationship between the phenotypic trait and the genotype data. The Akaike Information Criterion reflects the effectiveness and complexity of the model [25, 26]. In our method, on the basis of standard logistic regression and following the strategy of North et al. [27], we use the ADDINT logistic regression model to search for the relationship between the disease and SNP nodes. The second objective function uses a frequency measurement based on mutual information theory to model the relationship between the genotype data and the phenotypic trait from the perspective of information theory; it represents how much information about the disease trait the selected SNP subsets can explain. Our proposed method obtains information from the data rather than relying on a lot of prior information. The two objective functions are designed to measure the quality of the search results from different perspectives, and the simulation experiments show that our two-objective approach performs better than other methods on simulated and real biological datasets.

To avoid the adverse impact of the high-dimension, small-sample-size problem, the identification of disease-associated SNPs is treated as a heuristic optimization problem. Our proposed method yields optimal solutions that are nondominated solutions; the two-objective approach is in fact a kind of multiobjective optimization, and the proposed method uses ant colony optimization to search for the optimal solutions [28].

Our proposed fast adaptive ACO framework contains two stages. In the first stage, we use the modified ACO algorithm with two objective functions to search for the nondominated SNP subset. In the second stage, we apply the Fisher exact test [29, 30] to the dataset containing the nondominated SNPs generated in the first stage. The Fisher exact test is used to identify the relationship between disease and SNPs.

2.2.1. AIC Score. The Akaike Information Criterion (AIC) is used to measure the quality of statistical models of a dataset. AIC comes from information theory, and it estimates the loss of information when a statistical model is used to express the data generation process. AIC deals with the trade-off between the goodness of fit of the model and the complexity of the model. Based on the nature of the AIC, we construct an AIC model from the perspective of the GWAS dataset. The goal of our method is to measure the relationship between the genotype data of the genome and the phenotypic disease trait. Logistic regression is widely used to quantitatively analyze the correlation between a dependent variable and independent variables. Based on the above methods, we construct an AIC score model containing logistic regression and a gradient penalty function. Logistic regression provides the maximized log-likelihood of the model. Following the strategy of Jing and Shen [28], the AIC score is

$$
\mathrm{AIC}_{\mathrm{score}} = 2k - 2\log \mathrm{lik},
\tag{6}
$$

where $k$ denotes the number of free parameters and $\log \mathrm{lik}$ denotes the maximized log-likelihood of the model.
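As a small illustration of equation (6), the sketch below computes the AIC score from binary labels and fitted probabilities. It assumes the probabilities come from an already fitted logistic regression; the fitting step, the ADDINT model, and the gradient penalty mentioned in the text are not reproduced here, and the name `aic_score` is ours.

```python
import math

def aic_score(y, p, k):
    """AIC = 2k - 2 log lik for binary labels y and fitted probabilities p.

    y -- list of 0/1 labels (control/case)
    p -- fitted probabilities of label 1, e.g. from a logistic regression
    k -- number of free parameters of the model
    """
    eps = 1e-12  # guard against log(0)
    log_lik = sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1.0 - pi, eps))
        for yi, pi in zip(y, p)
    )
    return 2 * k - 2 * log_lik

y = [1, 0, 1, 0]
good = aic_score(y, [0.9, 0.1, 0.8, 0.2], k=3)   # informative model
poor = aic_score(y, [0.5, 0.5, 0.5, 0.5], k=3)   # uninformative model
print(good, poor)
```

With the same number of parameters, the better fitting model gets the lower score, which matches the convention that a lower first objective value indicates a stronger association.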

2.2.2. Explanation Score. In GWAS research on the relationship between two loci and disease, each SNP locus takes one of three values: 0, 1, and 2, where 0 means major allele homozygous, 1 means heterozygous, and 2 means minor allele homozygous [31]. For two loci, there are nine possible combinations; the disease-related SNP locus often changes when the disease occurs. In the case of a two-locus combination, $x_i$ means the number of occurrences of the $i$th combination of the two SNP loci, and $Y$ means the case or control state: $y_1$ means the case state and $y_2$ means the control state. The potential interrelationship of the two discrete random variables $X$ and $Y$ is defined as $H(X, Y)$; the relationship between locus combination and disease is measured on the basis of locus frequency information. $H(X, Y)$ is given by

$$
H(X, Y) = \sum_{i=1}^{I} \left| x_i y_1 - x_i y_2 \right|,
\tag{7}
$$

where $I$ means the total number of locus combinations. To prevent an unbalanced sample from affecting the score, we balance the classes: for example, if there are more case samples than control samples, we randomly extract from the case samples a subset of the same size as the control samples. To reduce the impact of randomness, we repeat the extraction several times and average the results. A large value of $H$ means that the potential association between the disease and the SNPs is strong. Equation (7) can also be applied to combinations of more than two loci. We call this score the explain score.
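The explain score can be sketched as below, under our reading that $x_i y_1$ and $x_i y_2$ are the counts of the $i$th genotype combination among cases and controls, respectively. The function name `explain_score` and the toy data are hypothetical, and the example assumes the classes are already balanced, so the repeated subsampling described above is not shown.

```python
from collections import Counter

def explain_score(genotypes, labels):
    """Sum over genotype combinations of |count in cases - count in controls|.

    genotypes -- list of tuples, one per sample, e.g. (0, 2) for two loci
    labels    -- list of 1 (case) / 0 (control), same length as genotypes
    """
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    combos = set(cases) | set(controls)
    # Counter returns 0 for combinations that are missing from one group.
    return sum(abs(cases[c] - controls[c]) for c in combos)

genotypes = [(0, 0), (0, 0), (1, 2), (1, 2), (0, 0), (2, 1)]
labels = [1, 1, 1, 0, 0, 0]
score = explain_score(genotypes, labels)
print(score)
```

Because the genotypes are stored as tuples of any length, the same function also covers combinations of more than two loci.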

2.3. Pareto Optimality for SNP Epistasis Detection. Pareto optimality addresses situations in which it is impossible to make all objective functions of a multiobjective optimization problem optimal at the same time [32, 33]. Pareto optimality was first applied to income distribution and economics and has since been extended to engineering and multiobjective optimization research. Building on the methods proposed above, the modified ant colony optimization algorithm uses two objective functions: the first is the AIC score with logistic regression and related parameters, and the second is the explain score. For the first objective function, a lower score indicates a stronger potential relationship between the disease phenotype trait and the SNPs [34]. For the second objective function, a higher score indicates a stronger potential relationship between the disease phenotype trait and the SNPs. The goal of the fast two-stage ant colony optimization algorithm is to find the epistasis effects among SNPs and to extract the real SNP subset with respect to the above methods.

In real GWAS datasets, an identified SNP subset may perform best in terms of one objective function but poorly in terms of the other. Thus, the goal is to select SNP subsets that are good with respect to both objective functions. In practice, a subset rarely performs better than all other solutions on both objectives. Thus, for a framework with two objective functions, it is hard and even impossible to calculate a single globally optimal solution. On the basis of previous studies [28, 34, 35], we adopt Pareto optimality to find a practical optimal solution. Consider two solutions, S1 and S2, each a GWAS SNP subset. Comparing S1 and S2 has only two possible outcomes: either one solution dominates the other, or neither S1 dominates S2 nor S2 dominates S1. Following the idea of Pareto optimality, we consider that S1 dominates S2 if the following two conditions hold. First, $f_e(\mathrm{S1})$ is not higher than $f_e(\mathrm{S2})$ for both objective functions. Second, $f_e(\mathrm{S1})$ is lower than $f_e(\mathrm{S2})$ for at least one objective function. Here $f_e$ denotes an objective function: $e = 1$ denotes the first objective function (the modified AIC score), and $e = 2$ denotes the second objective function (the explain score). If S1 and S2 satisfy these two conditions, we say that S1 is a nondominated solution and S2 is a dominated solution. Based on this Pareto optimality approach and the two objective functions, all solutions can be divided into two kinds: the nondominated set and the dominated set. The nondominated set may still contain many solutions of our proposed method with respect to the two objective functions; our goal now is to find the nondominated set that is best under certain conditions.

Next, we use the judgment rule described above to sort the solutions of the nondominated sets and find the optimal nondominated set. Specifically, in the first case, $f_1(\mathrm{S2})$ is larger than $f_1(\mathrm{S1})$ and, at the same time, $f_2(\mathrm{S2})$ is larger than $f_2(\mathrm{S1})$. In the second case, $f_1(\mathrm{S2})$ equals $f_1(\mathrm{S1})$ and $f_2(\mathrm{S2})$ is larger than $f_2(\mathrm{S1})$. In the third case, $f_1(\mathrm{S2})$ is larger than $f_1(\mathrm{S1})$ and $f_2(\mathrm{S2})$ equals $f_2(\mathrm{S1})$.
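The dominance conditions can be sketched as a small filter over candidate solutions. This is an illustration under one explicit assumption: both objectives are expressed so that lower is better, which we achieve here by negating the explain score. The names `dominates` and `nondominated` and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b:
    a is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(solutions):
    """Return the solutions that are not dominated by any other solution."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions if other is not s)
    ]

# Each tuple is (AIC score, negated explain score); lower is better for both.
candidates = [(10.0, -5.0), (12.0, -7.0), (11.0, -4.0), (10.0, -7.0)]
front = nondominated(candidates)
print(front)
```

In this toy set one candidate is at least as good as every other on both objectives, so the nondominated set contains a single solution; in general it contains several mutually incomparable solutions.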

2.4. Fisher Exact Test for Experimental Results. The Fisher exact test is used on contingency tables to assess statistical significance [36–38]. Although in practice it is mostly used for small samples, it can be used for all sample sizes. Ronald Fisher first proposed this method, and it belongs to the class of exact tests.

In our GWAS research, on the basis of the unified framework containing the fast adaptive ant colony optimization (ACO) algorithm, the Akaike Information Criterion (AIC) score, the explain score, and Pareto optimality, we obtain the final result, which is a nondominated SNP set. In this section, we use the Fisher exact test to exhaustively search for the epistasis effect. The Fisher exact test is based on the hypergeometric distribution, and its $P$ value is exact for all sample sizes. The test is applied to a contingency table. The null hypothesis is that the identified SNP subset and the disease are not associated. The alternative hypothesis is that the SNP subset affects the expression of the disease. The result is considered significant when the $P$ value is less than a predetermined value, such as 0.05 or smaller. In this way, our proposed method identifies significant SNP subsets.
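For a 2 × 2 contingency table, the two-sided Fisher exact test can be sketched with the hypergeometric distribution directly. This is a generic illustration: the text does not specify how the contingency table is built from a SNP subset, so the table layout and the function name `fisher_exact_2x2` are assumptions.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Rows could be, for example, carriers / noncarriers of a genotype
    combination and columns cases / controls (a hypothetical layout).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance absorbs floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

p_value = fisher_exact_2x2(3, 1, 1, 3)
print(p_value)
```

A subset would be reported as significant when `p_value` falls below the chosen threshold, such as 0.05.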

2.5. Power Test. In the previous sections, we introduced each part of our proposed fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Our proposed unified framework contains the fast adaptive ant colony optimization algorithm, the Akaike Information Criterion (AIC) score, the explain score, Pareto optimality, and the modified Fisher exact test. In this section, we introduce how to verify the significance of the results. We construct 100 datasets with the same parameters. Then we use the traditional power test to measure the performance of the methods. The power is defined as follows:

$$
\mathrm{Power} = \frac{|S_D|}{100},
\tag{8}
$$

where $|S_D|$ denotes the number of datasets, among the 100, from which the disease-related SNPs were correctly selected. A single test criterion may not clearly show the quality of the results, so we also use the precision-recall standard to account for both true positives and false positives. Precision-recall criteria have been widely used in evaluating classification models [39, 40]. In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved [26]. Both precision and recall are therefore based on an understanding and measure of relevance. We use the precision-recall criteria to determine whether the classification results are good or bad. These criteria help to avoid the imbalance problem; in our research, precision and recall often differ greatly. In terms of SNP epistasis research, precision is the fraction of identified SNP subsets that are truly disease related, and recall is the fraction of truly disease-related SNP subsets that are identified. A single criterion, such as the false positive rate alone, cannot make the real result clear, which is why we use both precision and recall. We also use the $F_1$ score (also called $F$ score or $F$ measure) to measure the accuracy of the precision-recall test. Precision and recall are defined next with the confusion matrix (Figure 1).

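$$
\mathrm{recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad
\mathrm{precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.
\tag{9}
$$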

| True class \ Predicted class | Associated | Nonassociated |
| --- | --- | --- |
| Associated | True positive (TP) | False negative (FN) |
| Nonassociated | False positive (FP) | True negative (TN) |

Figure 1: Precision-recall explanation matrix.

Precision, also known as positive predictive value, is the ratio of true positives in the result: the number of true positives divided by the sum of the numbers of true positives and false positives. Precision is often used to reflect how many false positives an algorithm reports. Recall, also known as sensitivity, is the ratio of true positives to the sum of true positives and false negatives. In terms of the SNP selection problem, the larger the recall, the more of the truly disease-related SNP combinations are found; the larger the precision, the higher the proportion of truly disease-related SNP combinations among the identified SNP combinations. The $F$ measure is the harmonic mean of precision and recall, a synthesized measure combining both [41].
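The evaluation measures can be computed from confusion-matrix counts as in the sketch below. The function names and example counts are ours; $F_1$ is computed as the harmonic mean of precision and recall, as described in the text, and the power follows equation (8).

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def power(num_detected, num_datasets=100):
    """Equation (8): fraction of datasets in which the true SNP set was found."""
    return num_detected / num_datasets

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, f1, power(87))
```

The example shows why a single number can mislead: precision is high because few reported combinations are false, while recall is only moderate because half of the true combinations are missed.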

3. Simulation Experiments

3.1. Comparison with One-Objective Function. In this section, we use simulation data to compare our proposed method with other existing methods. To avoid data bias caused by the model, we adopt the BEAM package to generate the simulation datasets [17]. Data was simulated following three genetic models: (1) an additive model, (2) epistatic interactions with multiplicative effects, and (3) epistatic interactions with threshold effects. For brevity, the additive model is referred to as ADDME, the model of epistatic interactions with multiplicative effects as EIME, and the model of epistatic interactions with threshold effects as EITEME. In the following, we use these short names to indicate the corresponding data models.

Because our method is a two-objective SNP epistasis search method, we first compared it with existing single-objective exhaustive SNP epistasis search methods to demonstrate the effectiveness of the two-objective SNP epistasis subset search. Second, we compared our proposed method with the recently proposed BEAM [17], the generic ACO algorithm, and AntEpiSeeker [16]. In a one-objective SNP epistasis search method, the objective function is used to score every SNP combination; in general, the scores of different SNP combinations are not the same. A low score indicates that the association between the SNP combination and the disease is relatively weak, and a high score indicates that it is relatively strong. The one-objective method then ranks all SNP combinations by score. In contrast, the two-objective SNP epistasis search method finds a set of nondominated results, and every nondominated SNP epistasis result is ranked equally. To ensure fairness, we collect from the top of the one-objective ranking the same number of SNP combinations as the two-objective method returns. The comparison results show that the two-objective SNP epistasis search method is better than the one-objective methods on all three simulation data models. The two one-objective methods give results similar to each other. The simulation results show the effectiveness of the two-objective SNP epistasis search method, and the poor results of the one-objective functions show their insufficiency. The experiment results are shown in Figure 2. The abscissa of Figure 2 is the minor allele frequency (MAF), which is assigned 0.1, 0.2, and 0.5. We generate the simulated datasets and choose the parameter settings following many previous studies [17, 42–44]. For each parameter combination, we generated 100 datasets, each containing 2000 samples (1000 case samples and 1000 control samples) and 1000 SNPs. We evaluate the algorithm performance by calculating the ratio of datasets in which the true result is identified at the significance level 0.01 after Bonferroni correction. The parameter $\lambda$ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. The linkage disequilibrium between SNPs, measured by $r^2$, ranges from 0.7 to 1.

Figure 2: Power test comparisons between one-objective and two-objective methods on three different models with MAF values 0.1, 0.2, and 0.5. [Panels: ADDME, EIME, and EITEME; x-axis: MAF (0.1, 0.2, 0.5) for $r^2 = 0.7$ and $r^2 = 1.0$; y-axis: power (0 to 1); compared methods: K2 score, En score, and FAACOSE.]
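The difference between ranking by one score and keeping a nondominated set can be sketched in a few lines of Python. This is only an illustration: the function name, the toy scores, and the convention that lower values are better for both objectives are assumptions, not details taken from the method description.

```python
from typing import Dict, List, Tuple


def nondominated(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the keys whose (objective 1, objective 2) pair is not dominated.

    Assumption: lower is better for both objectives.
    """
    def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
        # a dominates b if it is no worse in every objective and better in one.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [
        key for key, value in scores.items()
        if not any(dominates(other, value)
                   for other_key, other in scores.items() if other_key != key)
    ]


# Hypothetical scores for four SNP combinations.
toy_scores = {"A": (1.0, 5.0), "B": (2.0, 3.0), "C": (3.0, 4.0), "D": (4.0, 1.0)}
print(nondominated(toy_scores))
```

In this toy example, combination C is worse than B in both objectives and is dropped, while A, B, and D are kept and treated as equally good.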

3.2. Compared with Benchmark Methods. After comparing with single-objective functions, we compare our proposed method with existing methods. The performance of our proposed method was evaluated by comparison with benchmark methods [45]. Many previous studies have already discussed the parameter setting problem; in this section, we set the parameters according to the existing strategy. We evaluated the performance of FAACOSE by comparing it with three methods: BEAM, the generic ACO algorithm, and AntEpiSeeker. We use the BEAM package and the previous parameter strategy to generate the simulated datasets. Note that the generic ACO algorithm could not select larger SNP sets. We use the simulated datasets introduced in Section 3.1. We evaluate the algorithm performance by calculating the ratio of datasets in which the true result is identified at the significance level 0.01 after Bonferroni correction. We generate simulated datasets following three genetic models: ADDME, EIME, and EITEME. Other parameters for data simulation were the effect size $\lambda$, a measure of marginal effects as defined by Marchini et al. [42], the linkage disequilibrium between SNPs measured by $r^2$, and the minor allele frequencies (MAFs). $\lambda$ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. For $r^2$, two values (0.7 and 1.0) were used for each model. For MAFs, three values (0.1, 0.2, and 0.5) were considered. The parameters for BEAM were set as default. The parameter settings for AntEpiSeeker were large dataset size = 6, small dataset size = 3, count large = 150, count small = 300, epistasis model = 2, ant count = 1000, $\alpha = 1$, $\rho = 0.05$, and $\tau_0 = 100$ (also available in the software package documentation of AntEpiSeeker). The parameters of the generic ACO algorithm were set as ant count = 1000, $\alpha = 1$, $\rho = 0.05$, $\tau_0 = 100$, count (number of iterations) = 900, and epistasis model = 2. The comparison of detection power for BEAM, the generic ACO algorithm, AntEpiSeeker, and FAACOSE is presented in Figure 3. The results show that FAACOSE outperforms BEAM and the generic ACO in all parameter settings and is superior to AntEpiSeeker in most parameter settings.

Figure 3: Power comparisons between existing methods and FAACOSE on three models. [Panels: ADDME, EIME, and EITEME; x-axis: MAF (0.1, 0.2, 0.5) for $r^2 = 0.7$ and $r^2 = 1.0$; y-axis: power (0 to 1); compared methods: BEAM, gACO, AntEpiSeeker, and FAACOSE.]
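The power criterion used in both comparisons can be sketched as follows. This is a minimal illustration under stated assumptions: the number of tests is taken to be all SNP pairs among 1000 SNPs, the helper names are hypothetical, and the toy $p$-values are invented; the text does not specify how many tests enter the Bonferroni correction.

```python
from math import comb
from typing import List, Optional


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Significance threshold after Bonferroni correction."""
    return alpha / n_tests


def detection_power(true_set_p_values: List[Optional[float]], threshold: float) -> float:
    """Fraction of datasets in which the true SNP set was reported as significant.

    Each entry is the p-value of the true SNP set in one dataset,
    or None if the method did not report that set at all.
    """
    hits = sum(1 for p in true_set_p_values if p is not None and p <= threshold)
    return hits / len(true_set_p_values)


# Assumption: all pairwise tests among 1000 SNPs are counted.
n_tests = comb(1000, 2)
threshold = bonferroni_threshold(0.01, n_tests)

# Hypothetical outcomes for four datasets (the experiments use 100 per setting).
toy_p_values = [1e-12, None, 3e-4, 5e-10]
print(threshold, detection_power(toy_p_values, threshold))
```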

In this section, we compare our proposed method with benchmark methods. First, we use the power test to measure how many real SNP subsets can be found with our proposed method. Second, we use precision, recall, and $F_1$ score to evaluate the results. Precision denotes the proportion of correct SNP subsets among all finally identified SNP subsets. Recall denotes the proportion of correct SNP subsets that are identified. The $F_1$ score is an indicator used in statistics to measure the accuracy of binary classification models. It takes into account the precision and recall of the classification model simultaneously. The $F_1$ score can be seen as a weighted average of precision and recall; its maximum is 1 and its minimum is 0. We show the results of FAACOSE and the other methods for $r^2 = 0.7$ and MAF = 0.2 in Table 1.

The $F_1$ score of FAACOSE is better than that of the other methods. We ran the same experiment on datasets with different parameter combinations. Across all eighteen datasets, FAACOSE has the highest $F_1$ score in fifteen. In real GWAS experiments, the sample size is huge, so the efficiency of the method must also be considered. The experimental results indicate that our proposed method is a more effective method for real GWAS datasets. AntEpiSeeker is the most efficient algorithm among the three benchmark methods. We compared the run time of AntEpiSeeker and FAACOSE on different data samples; on average, FAACOSE is 30% faster than AntEpiSeeker.

Table 1: $F_1$ score comparison between FAACOSE and other methods.

Model    Method         Recall   Precision   F1 score
ADDME    BEAM           0.29     0.15        0.20
         gACO           0.45     0.36        0.40
         AntEpiSeeker   0.6      0.55        0.57
         FAACOSE        0.82     0.74        0.78
EIME     BEAM           0.3      0.45        0.36
         gACO           0.35     0.32        0.33
         AntEpiSeeker   0.34     0.56        0.42
         FAACOSE        0.9      0.82        0.86
EITEME   BEAM           0.1      0.14        0.12
         gACO           0.15     0.20        0.17
         AntEpiSeeker   0.54     0.46        0.50
         FAACOSE        0.65     0.62        0.63

4. Application to Real SNP Dataset

Late-Onset Alzheimer's Disease (LOAD) is the most frequent form of Alzheimer's disease (AD) and is typically identified in people older than 65 years. AD is a chronic neurodegenerative disease whose onset is frequently not obvious and which slowly progresses to dementia over time. It is the cause of 60% to 70% of cases of dementia. The most common early symptom is difficulty in remembering recent events (short-term memory loss). As the disease advances, symptoms can include problems with language, disorientation (including easily getting lost), mood swings, loss of motivation, not managing self-care, and behavioural issues. LOAD is a multifactor genetic disease; its etiology and pathogenesis have not yet been fully understood. The apolipoprotein E (APOE) gene is a definite risk factor for LOAD. The APOE gene has three forms: $\varepsilon 2$, $\varepsilon 3$, and $\varepsilon 4$. The effect of $\varepsilon 2$ is protective against the occurrence of the disease. It has been reported that the genetic variant $\varepsilon 4$ promotes the disease. Between 40% and 80% of people with AD possess at least one APOE $\varepsilon 4$ allele [46]. Previous studies have reported some significant SNPs in the field of Genome-Wide Association Studies [47]. Reference [47] reported that 10 SNPs in the region of the GAB2 gene have an epistasis effect with APOE $\varepsilon 4$ in relation to Late-Onset Alzheimer's Disease. We applied our proposed method to the LOAD GWAS dataset from the website https://www.tgen.org [47]. After data preprocessing, the real biological dataset contains 1368 samples [48, 49]. Of these, 836 samples were cases and the remaining 532 samples were normal controls [50, 51]. Each sample of the real biological dataset contains 309316 SNPs with genotype information, APOE status, and LOAD status [52]. For the subsequent calculation, we code the APOE gene state with a binary variable: the value 1 represents the $\varepsilon 4$ variant, and the value 0 represents the other variants [53]. An SNP locus was coded as a quaternary variable to account for the missing state. The SNPs with high potential relation to LOAD are shown in Table 2.

Table 2: The SNPs selected by FAACOSE in the LOAD dataset.

SNP rs
rs7756992    rs611154     rs191840     rs7294919
rs1887922    rs304900     rs1999764    rs1385600
rs2373115    rs7101429    rs609812     rs613375
rs1007837    rs2510038    rs4945261    rs10793294
rs520227     rs191740     rs7924284    rs829465
rs602106     rs7174511    rs606889     rs602192
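The coding step can be sketched as below. The concrete integer codes are assumptions made for illustration: the text only states that APOE status is binary and that each SNP locus is a four-valued variable including a missing state. Here 0, 1, and 2 count the copies of the minor allele, 3 marks a missing genotype, and a sample is coded 1 for APOE if it carries at least one $\varepsilon 4$ allele.

```python
from typing import Optional, Sequence

MISSING = 3  # assumed code for a missing genotype


def encode_apoe(alleles: Sequence[str]) -> int:
    """Return 1 if the sample carries at least one e4 allele, else 0 (assumed reading)."""
    return 1 if "e4" in alleles else 0


def encode_snp(genotype: Optional[str], minor_allele: str) -> int:
    """Map a two-letter genotype to 0/1/2 minor-allele copies, or MISSING."""
    if genotype is None or len(genotype) != 2:
        return MISSING
    return genotype.count(minor_allele)


# Hypothetical sample: APOE e3/e4 carrier with one missing SNP call.
print(encode_apoe(("e3", "e4")), [encode_snp(g, "G") for g in ("AA", "AG", "GG", None)])
```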

5. Discussions

In this paper, we proposed a novel ant colony optimization-based fast search method for the discovery of epistatic interactions in large-scale real GWAS datasets. FAACOSE was evaluated through comparison with three existing approaches on both simulated and real datasets. FAACOSE, which adopts a fast adaptive optimization procedure and two-objective functions, is a modified algorithm derived from the generic ACO. To demonstrate the advantages of the fast adaptive ant colony optimization algorithm, we also compared the performance of FAACOSE with that of the generic ACO.

In future studies, we intend to find more powerful modeling approaches: ant colony optimization algorithms with faster convergence, objective functions that better measure the data structure of GWAS datasets, and more efficient optimal SNP subset search and identification strategies, which can be combined and flexibly embedded into our SNP epistasis search framework to find more accurate SNP subsets. With the rapid development of bioinformatics, more and more biological information related to disease is being identified, and more and more studies will consider prior knowledge. An important future research direction is to apply expert prior knowledge to GWAS datasets with our proposed method, that is, the fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Expert prior knowledge can improve the power and efficiency of epistasis detection.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (Grant nos. 61520106006, 31571364, 61732012, 61532008, U1611265, 61672382, 61402334, 61472280, 61472173, 61572447, 61672203, 61472282, and 61373098), the China Postdoctoral Science Foundation (Grant nos. 2014M561513, 2015M580352, 2017M611619, and 2016M601646), and the Guangxi Bagui Scholars Program Special Fund.

References

[1] J. N. Hirschhorn and M. J. Daly, “Genome-wide association studies for common diseases and complex traits,” Nature Reviews Genetics, vol. 6, no. 2, pp. 95–108, 2005.

[2] B. N. Howie, P. Donnelly, and J. Marchini, “A flexible and accurate genotype imputation method for the next generation of genome-wide association studies,” PLoS Genetics, vol. 5, no. 6, Article ID e1000529, 2009.

[3] T. A. Manolio, F. S. Collins, N. J. Cox et al., “Finding the missing heritability of complex diseases,” Nature, vol. 461, no. 7265, pp. 747–753, 2009.

[4] B. S. Shastry, “SNP alleles in human disease and evolution,” Journal of Human Genetics, vol. 47, no. 11, pp. 561–566, 2002.

[5] B. Stubbs, D. Vancampfort, M. De Hert, and A. J. Mitchell, “The prevalence and predictors of type two diabetes mellitus in people with schizophrenia: a systematic review and comparative meta-analysis,” Acta Psychiatrica Scandinavica, vol. 132, no. 2, pp. 144–157, 2015.

[6] K. P. Liao, “Cardiovascular disease in patients with rheumatoid arthritis,” Trends in Cardiovascular Medicine, vol. 27, no. 2, pp. 136–140, 2017.

[7] Y. Mao, N. R. London, L. Ma, D. Dvorkin, and Y. Da, “Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model,” Physiological Genomics, vol. 28, no. 1, pp. 46–52, 2006.

[8] W. Zhang, J. Zhu, E. E. Schadt, and J. S. Liu, “A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules,” PLoS Computational Biology, vol. 6, no. 1, Article ID e1000642, 2010.

[9] M. Kang, C. Zhang, H.-W. Chun, C. Ding, C. Liu, and J. Gao, “EQTL epistasis: Detecting epistatic effects and inferring hierarchical relationships of genes in biological pathways,” Bioinformatics, vol. 31, no. 5, pp. 656–664, 2015.

[10] H. Lin, D. Chen, P. Huang et al., “SNP interaction pattern identifier (SIPI): an intensive search for SNP–SNP interaction patterns,” Bioinformatics, 2016.

[11] R. L. Prentice and L. Qi, “Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation,” Biostatistics, vol. 7, no. 3, pp. 339–354, 2006.

[12] S.-P. Deng, L. Zhu, and D.-S. Huang, “Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks,” BMC Genomics, vol. 16, no. 3, article no. S4, 2015.

[13] S.-P. Deng and D.-S. Huang, “SFAPS: An R package for structure/function analysis of protein sequences based on informational spectrum method,” Methods, vol. 69, no. 3, pp. 207–212, 2014.

[14] J. H. Moore, J. M. Lamb, N. J. Brown, and D. E. Vaughan, “A comparison of combinatorial partitioning and linear regression for the detection of epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on plasma PAI-1 levels,” Clinical Genetics, vol. 62, no. 1, pp. 74–79, 2002.

[15] B. M. Michael, R. E. Neapolitan, X. Jiang, and V. Shyam, “Learning genetic epistasis using Bayesian network scoring criteria,” BMC Bioinformatics, vol. 12, no. 1, article 89, 2011.

[16] Y. Wang, X. Liu, K. Robbins, and R. Rekaya, “AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm,” BMC Research Notes, vol. 3, article 117, 2010.

[17] Y. Zhang and J. S. Liu, “Bayesian inference of epistatic interactions in case-control studies,” Nature Genetics, vol. 39, no. 9, pp. 1167–1173, 2007.

[18] M. Dorigo, M. Birattari, and C. Blum, “Ant colony optimization and swarm intelligence,” Springer Verlag, vol. 5217, no. 8, pp. 767–771, 2004.

[19] T. Stutzle, M. Lopez-Ibanez, P. Pellegrini et al., “Parameter adaptation in ant colony optimization,” Autonomous Search, vol. 9783642214349, pp. 191–215, 2012.

[20] C. Blum and M. Sampels, “An ant colony optimization algorithm for shop scheduling problems,” Journal of Mathematical Modelling and Algorithms, vol. 3, no. 3, pp. 285–308, 2004.

[21] R. Musa, J.-P. Arnaout, and H. Jung, “Ant colony optimization algorithm to solve for the transportation problem of cross-docking network,” Computers and Industrial Engineering, vol. 59, no. 1, pp. 85–92, 2010.

[22] G. N. Varela and M. C. Sinclair, “Ant colony optimisation for virtual-wavelength-path routing and wavelength allocation,” in Proceedings of the 1999 Congress on Evolutionary Computation (CEC ’99), pp. 1809–1816, Washington, DC, USA, July 1999.

[23] K. M. Sim and W. H. Sun, “Ant colony optimization for routing and load-balancing: survey and new directions,” IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 33, no. 5, pp. 560–572, 2003.

[24] S.-H. Ngo, X. Jiang, and S. Horiguchi, “Adaptive routing and wavelength assignment using ant-based algorithm,” in Proceedings of the 2004 12th IEEE International Conference on Networks, ICON 2004 - Unity in Diversity, pp. 482–486, November 2004.

[25] S. I. Vrieze, “Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC),” Psychological Methods, vol. 17, no. 2, pp. 228–243, 2012.

[26] D.-S. Huang and J.-X. Du, “A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks,” IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2099–2115, 2008.

[27] B. V. North, D. Curtis, and P. C. Sham, “Application of logistic regression to case-control association studies involving two causative loci,” Human Heredity, vol. 59, no. 2, pp. 79–87, 2005.

[28] P.-J. Jing and H.-B. Shen, “MACOED: A multi-objective ant colony optimization algorithm for SNP epistasis detection in genome-wide association studies,” Bioinformatics, vol. 31, no. 5, pp. 634–641, 2015.

[29] N. Ryman, “CHIFISH: A computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher’s exact test,” Molecular Ecology Notes, vol. 6, no. 1, pp. 285–287, 2006.

[30] C. R. Mehta and N. R. Patel, “A network algorithm for performing Fisher’s exact test in r × c contingency tables,” Journal of the American Statistical Association, vol. 78, no. 382, pp. 427–434, 1983.

[31] B. Sobrino, M. Brion, and A. Carracedo, “SNPs in forensic genetics: A review on SNP typing methodologies,” Forensic Science International, vol. 154, no. 2-3, pp. 181–194, 2005.

[32] O. Shoval, H. Sheftel, G. Shinar et al., “Evolutionary trade-offs, pareto optimality, and the geometry of phenotype space,” Science, vol. 336, no. 6085, pp. 1157–1160, 2012.

[33] D.-S. Huang and W. Jiang, “A general CPL-AdS methodology for fixing dynamic parameters in dual environments,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1489–1500, 2012.

[34] L. Zhu, W.-L. Guo, S.-P. Deng, and D.-S. Huang, “ChIP-PIT: enhancing the analysis of ChIP-seq data using convex-relaxed pair-wise interaction tensor decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 55–63, 2016.

[35] C. Angione, G. Carapezza, J. Costanza, P. Lio, and G. Nicosia, “Pareto optimality in organelle energy metabolism analysis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1032–1044, 2013.

[36] R. A. Fisher, “On the Interpretation of χ² from Contingency Tables, and the Calculation of P,” Journal of the Royal Statistical Society, vol. 85, no. 1, p. 87, 1922.

[37] A. Agresti, “A survey of exact inference for contingency tables,” Statistical Science, vol. 7, no. 1, pp. 131–153, 1992.

[38] B. Wenzheng, C. Yuehui, and W. Dong, “Prediction of protein structure classes with flexible neural tree,” Bio-Medical Materials and Engineering, vol. 24, no. 6, pp. 3797–3806, 2014.

[39] L. Zhu, Z.-H. You, D.-S. Huang, and B. Wang, “t-LSE: a novel robust geometric approach for modeling protein-protein interaction networks,” PLoS ONE, vol. 8, no. 4, Article ID e58368, 2013.

[40] C.-H. Zheng, L. Zhang, V. T.-Y. Ng, C. K. Shiu, and D.-S. Huang, “Molecular pattern discovery based on penalized matrix decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1592–1603, 2011.

[41] D.-S. Huang and H.-J. Yu, “Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 457–467, 2013.

[42] J. Marchini, P. Donnelly, and L. R. Cardon, “Genome-wide strategies for detecting multiple loci that influence complex diseases,” Nature Genetics, vol. 37, no. 4, pp. 413–417, 2005.

[43] R. Jiang, W. Tang, X. Wu, and W. Fu, “A random forest approach to the detection of epistatic interactions in case-control studies,” BMC Bioinformatics, vol. 10, no. 1, article S65, 2009.

[44] J. Kruppa, A. Ziegler, and I. R. Konig, “Risk estimation and risk prediction using machine-learning methods,” Human Genetics, vol. 131, no. 10, pp. 1639–1654, 2012.

[45] D.-S. Huang and C.-H. Zheng, “Independent component analysis-based penalized discriminant method for tumor classification using gene expression data,” Bioinformatics, vol. 22, no. 15, pp. 1855–1862, 2006.

[46] R. W. Mahley, K. H. Weisgraber, and Y. Huang, “Apolipoprotein E4: a causative factor and therapeutic target in neuropathology, including Alzheimer’s disease,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 15, pp. 5644–5651, 2006.

[47] E. M. Reiman, J. A. Webster, A. J. Myers et al., “GAB2 alleles modify Alzheimer’s risk in APOE ε4 carriers,” Neuron, vol. 54, no. 5, pp. 713–720, 2007.

[48] C.-H. Zheng, D.-S. Huang, L. Zhang, and X.-Z. Kong, “Tumor clustering using nonnegative matrix factorization with gene selection,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 599–607, 2009.

[49] S.-P. Deng, L. Zhu, and D.-S. Huang, “Predicting hub genes associated with cervical cancer through gene co-expression networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 27–35, 2016.

[50] L. Zhu, S.-P. Deng, and D.-S. Huang, “A two-stage geometric method for pruning unreliable links in protein-protein networks,” IEEE Transactions on Nanobioscience, vol. 14, no. 5, pp. 528–534, 2015.

[51] D.-S. Huang, L. Zhang, K. Han, S. Deng, K. Yang, and H. Zhang, “Prediction of protein-protein interactions based on protein-protein correlation using least squares regression,” Current Protein and Peptide Science, vol. 15, no. 6, pp. 553–560, 2014.

[52] D.-S. Huang, Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, May 1996.

[53] D.-S. Huang, “Radial basis probabilistic neural networks: model and application,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, pp. 1083–1101, 1999.

epistasis effect from the perspective of the overall data. Moore et al. [14] applied a regression method to identify the relationship between gene expression and epistasis effect. Michael et al. [15] applied Bayesian networks to identify the epistasis effect network from the original SNP data. Although these methods solved some problems, they still did not perform well on large-scale Genome-Wide Association Study datasets, owing to the same “high-dimensional, small sample size” problem. With the rapid development of multiobjective optimization methods and machine learning, the ant colony optimization (ACO) algorithm was applied to epistasis research. Wang et al. [16] proposed AntEpiSeeker, which combines heuristic search with ant colony optimization to identify SNPs that dominate other SNPs. Experimental results on a real rheumatoid arthritis dataset show that AntEpiSeeker is better than other methods. The drawback of this method is that its performance differs across disease models. Zhang and Liu [17] developed BEAM, a Bayesian inference method that identifies epistatic interactions in case-control studies. However, the BEAM method needs a lot of time on GWAS datasets. In this paper, we extend the SNP epistasis study with a fast adaptive ant colony optimization algorithm for detecting SNP epistasis. We search for SNP epistasis with two-objective functions and fast adaptive ant colony optimization.

The experiments on several simulated datasets show the good performance of our method. We also compare our method with benchmark methods including BEAM, generic ACO, and AntEpiSeeker. Experimental results show that our method has better performance on GWAS datasets containing epistasis effects among SNPs.

2. Methods

2.1. Ant Colony Optimization. In the research of artificial intelligence and large-scale problem solving, the ant colony optimization (ACO) algorithm is inspired by the food search behaviour of ants in nature. If the food search paths are assumed to constitute a graph, the ant colony optimization algorithm can reduce the time needed to search for good paths through the graph [18]. This algorithm, like other ant colony algorithms, is a swarm intelligence method and a member of the metaheuristic optimizations. Marco Dorigo proposed the ant colony optimization algorithm in 1992 in his PhD thesis. GWAS datasets often contain thousands to millions of SNPs, so it is not feasible to examine the relationship of every pair of SNPs within an acceptable time. The ACO algorithm is used here to reduce the complexity of exhaustive search. In nature, ants searching for food appear to walk randomly, and on the way back and forth they leave pheromones on the path. If the path is found by other ants, they tend to follow it rather than walk randomly; if they find food through this path, they also leave pheromones, and the pheromone value on this path is enhanced. Subject to other factors in nature, the pheromone starts to evaporate and the path's attractive strength starts to decrease. The longer the path is, the more time the ants take to travel it. By comparison, a short path is traversed much more quickly, so pheromone values will be larger on shorter paths than on longer paths. Pheromone evaporation results in dynamic changes in the paths, which help avoid convergence to a locally optimal solution. Without pheromone evaporation, the food search path selected by the first ants would tend to be the only path or the most attractive path, which would limit the explored solution space. The mechanism of pheromone evaporation in real ant colonies is unclear, but pheromone evaporation is very important in artificial intelligence systems. The ant colony optimization algorithm has achieved great success in applications [19–21].

The travelling salesperson problem (TSP) is defined on a set of cities and the physical distances between each pair of cities. The question is: what is the shortest possible path on which the travelling salesperson visits each city once and finally returns to the origin city? Suppose there are $n$ cities; then there are $(n - 1)!/2$ solutions to the problem. The number of feasible solutions grows factorially as the number of cities increases, making exhaustive computation impractical. It is an NP-hard combinatorial optimization problem.
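To give a feeling for this growth, the following short Python sketch counts the distinct tours of a symmetric TSP, $(n - 1)!/2$, for a few city counts. The function name is hypothetical and the snippet is only an illustration of the counting argument.

```python
from math import factorial


def tour_count(n_cities: int) -> int:
    """Number of distinct tours of a symmetric TSP: (n - 1)! / 2."""
    if n_cities < 3:
        raise ValueError("need at least 3 cities")
    return factorial(n_cities - 1) // 2


for n in (5, 10, 20):
    print(n, tour_count(n))
```

Five cities give 12 tours, ten cities give 181440, and twenty cities already give about $6 \times 10^{16}$, which is why exhaustive enumeration is not practical.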

Suppose that $m$ ants are randomly placed in $n$ cities. For the $k$th ant located in the $i$th city, the probability of choosing city $j$ as the next city is

\begin{equation}
\begin{aligned}
p_{ij}^{k} &=
\begin{cases}
\dfrac{\tau_{ij}^{\alpha}(t)\,\eta_{ij}^{\beta}(t)}{\sum_{o \in \text{candidate}_k} \tau_{io}^{\alpha}(t)\,\eta_{io}^{\beta}(t)}, & j \in \text{candidate}_k, \\
0, & \text{otherwise},
\end{cases} \\
\eta_{ij}(t) &= \frac{1}{d_{ij}},
\end{aligned}
\tag{1}
\end{equation}

where $\tau_{ij}(t)$ indicates the residual pheromone information on path $ij$ at moment $t$; $\eta_{ij}(t)$ indicates the heuristic function; $d_{ij}$ indicates the physical distance between city $i$ and city $j$; $\text{tabu}_k$ indicates the set of cities that ant $k$ has visited; and $\text{candidate}_k$ indicates the set of cities that ant $k$ can visit next.
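The transition rule in (1) can be sketched in Python as follows. The function name, the nested-list representation of the pheromone and distance matrices, and the toy values are assumptions made for this illustration only.

```python
from typing import Dict, Iterable, List


def transition_probabilities(
    i: int,
    candidates: Iterable[int],
    tau: List[List[float]],
    dist: List[List[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Dict[int, float]:
    """Probability of moving from city i to each candidate city j, following (1)."""
    # Weight of each candidate: tau_ij^alpha * eta_ij^beta with eta_ij = 1 / d_ij.
    weights = {
        j: (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta)
        for j in candidates
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Toy instance: uniform pheromone, so closer cities are preferred.
tau = [[1.0] * 4 for _ in range(4)]
dist = [
    [0.0, 1.0, 2.0, 4.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [4.0, 2.0, 1.0, 0.0],
]
print(transition_probabilities(0, [1, 2, 3], tau, dist))
```

With equal pheromone on all paths, the probabilities from city 0 are proportional to $1/d_{0j}$, that is, 4/7, 2/7, and 1/7 for cities 1, 2, and 3.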

After $n$ moments, the ants complete a cycle, and the pheromone information of each path is adjusted according to

\begin{equation}
\begin{aligned}
\tau_{ij}(t + n) &= (1 - \rho)\,\tau_{ij}(t) + \Delta\tau_{ij}, \\
\Delta\tau_{ij} &= \sum_{k=1}^{m} \Delta\tau_{ij}^{k},
\end{aligned}
\tag{2}
\end{equation}

where $\Delta\tau_{ij}$ indicates the pheromone increment of path $ij$ after this cycle and $\Delta\tau_{ij}^{k}$ is the contribution of ant $k$:

\begin{equation}
\Delta\tau_{ij}^{k} =
\begin{cases}
\dfrac{Q}{l_k}, & ij \in L_k, \\
0, & \text{otherwise},
\end{cases}
\tag{3}
\end{equation}

where $L_k$ indicates the path of ant $k$ in this cycle and $l_k$ indicates the path length of ant $k$ in this cycle. The parameters to be determined are $\alpha$, $\beta$, $\rho$, $m$, and $Q$. The number of ants is less than or equal to the number of cities, and $Q$ is a suitably large number. ACO is often used in large-scale data problems. However, slowness is still a bottleneck in the application of the ant colony algorithm to large-scale search optimization problems. The pheromone update strategy is one of the keys that determine the convergence rate.
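A minimal Python sketch of the pheromone update in (2) and (3) is given below. The function name and the toy values are hypothetical, and depositing pheromone in both directions of an edge is an assumption that corresponds to a symmetric problem.

```python
from typing import List, Sequence


def update_pheromone(
    tau: List[List[float]],
    tours: Sequence[Sequence[int]],
    lengths: Sequence[float],
    rho: float,
    Q: float,
) -> List[List[float]]:
    """Evaporate old pheromone and add the deposits of all ants, following (2) and (3)."""
    n = len(tau)
    delta = [[0.0] * n for _ in range(n)]
    for tour, length in zip(tours, lengths):
        closed = list(tour) + [tour[0]]  # the ant returns to its start city
        for a, b in zip(closed, closed[1:]):
            delta[a][b] += Q / length
            delta[b][a] += Q / length  # assumption: symmetric deposit
    return [
        [(1.0 - rho) * tau[i][j] + delta[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy instance: one ant walks 0 -> 1 -> 2 -> 3 -> 0 with total length 4.
tau = [[1.0] * 4 for _ in range(4)]
new_tau = update_pheromone(tau, tours=[[0, 1, 2, 3]], lengths=[4.0], rho=0.5, Q=4.0)
print(new_tau[0][1], new_tau[0][2])
```

In the toy instance, the visited edge (0, 1) ends at 1.5 while the unvisited edge (0, 2) decays to 0.5, so the path used by the ant becomes more attractive.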

In the process of applying ant colony optimization to specific problems, the search space should be as large as possible. At the same time, ACO should consider time efficiency and balance the quality of the solutions against the solving speed. On the basis of previous studies [22–24], we only consider the pheromone evaporation factor $\rho$ and the pheromone importance factor $\alpha$. In (2), $\rho$ is used to balance the effects of the old pheromone value and the current pheromone value. When $\rho$ is too small, too much residual pheromone remains, which leads to a locally optimal solution. We adopt an adaptive $\rho$ when the algorithm does not improve the current optimal solution within $n$ iterations:

\begin{equation}
\rho(t + n) =
\begin{cases}
g\,\rho(t), & \rho(t) \le \rho_{\max}, \\
\rho_{\max}, & \text{otherwise},
\end{cases}
\tag{4}
\end{equation}

where $\rho_{\max}$ equals 0.85 in practice and $g$ equals 1.02 as a tuning parameter. When the pheromone value reaches the critical value, the pheromone importance factor begins to play a role. With the increase of the pheromone importance factor $\alpha$, the algorithm can jump out of the local optimal solution and search for the global optimal solution:

\begin{equation}
\alpha(t + n) =
\begin{cases}
g_1\,\alpha(t), & \alpha(t) \le \alpha_{\max}, \\
\alpha_{\max}, & \text{otherwise},
\end{cases}
\tag{5}
\end{equation}

where $g_1$ is a constant larger than one and $\alpha_{\max}$ is less than or equal to five. In the calculation, we first follow the standard ant colony optimization algorithm for $N$ iterations, where $N$ is a predefined number. If the current optimal solution is not improved after $N$ iterations, we update the parameters according to (4) and (5). Then we update all pheromone values according to (2).
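The adaptive rule can be sketched in Python as follows. The values $g = 1.02$ and $\rho_{\max} = 0.85$ come from the text; $g_1 = 1.05$ and $\alpha_{\max} = 5$ are assumed values that merely satisfy the stated constraints, and the function names are hypothetical. The sketch follows (4) and (5) literally, so a value just below the maximum can exceed it for one step before being reset; capping with `min(growth * value, maximum)` would be a possible variant.

```python
from typing import Tuple


def adapt(value: float, growth: float, maximum: float) -> float:
    """One adaptive step: grow the value while it does not exceed the maximum."""
    return growth * value if value <= maximum else maximum


def adapt_parameters(
    rho: float,
    alpha: float,
    stagnant_iterations: int,
    N: int,
    g: float = 1.02,
    g1: float = 1.05,       # assumption: any constant larger than one
    rho_max: float = 0.85,
    alpha_max: float = 5.0,  # assumption: the largest value allowed by the text
) -> Tuple[float, float]:
    """Update rho and alpha only when the best solution has not improved for N iterations."""
    if stagnant_iterations < N:
        return rho, alpha
    return adapt(rho, g, rho_max), adapt(alpha, g1, alpha_max)


print(adapt_parameters(0.05, 1.0, stagnant_iterations=3, N=10))
print(adapt_parameters(0.05, 1.0, stagnant_iterations=10, N=10))
```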

Given the pheromone values and transfer rules, we can use the ant colony optimization algorithm to find a group of SNPs that affect the disease. Assume there are $P$ SNPs in the Genome-Wide Association Studies dataset; we construct a $P$-dimensional symmetric matrix $M$ to store the pheromone values deposited by the ants. The element $m_{ij}$ of $M$ denotes the disease-related interaction between the $i$th SNP and the $j$th SNP. At the beginning of our method, every element of $M$ is assigned a constant value $m_0$; the equal values indicate that every pair of SNPs is initially considered equally likely to be epistatic and equally likely to be related to the disease.
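As a rough sketch of this data structure, the pheromone matrix can be held as a nested list that is kept symmetric on every update. The helper names and the value `m0 = 1.0` are hypothetical; the text only states that every entry starts at the same constant $m_0$.

```python
def init_pheromone(num_snps, m0=1.0):
    """Return a P x P pheromone matrix with every entry set to m0.

    m0=1.0 is an illustrative choice; the text only requires a constant.
    """
    return [[m0] * num_snps for _ in range(num_snps)]


def deposit(matrix, i, j, amount):
    """Add pheromone for the SNP pair (i, j), keeping the matrix symmetric."""
    matrix[i][j] += amount
    if i != j:
        matrix[j][i] += amount
    return matrix


if __name__ == "__main__":
    M = init_pheromone(4)
    deposit(M, 0, 2, 0.5)
    for row in M:
        print(row)
```

For a genome-wide dataset a dense $P \times P$ matrix becomes large, so a practical implementation would probably store only the upper triangle; that choice is outside what the text specifies.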

At the final pheromone iteration, the ACO algorithm obtains the optimal solutions through a forward selection strategy. The advantage of the ACO algorithm in this paper is that the result retains the nondominated solutions, which are potentially equally likely and potentially most strongly related to the disease, and omits the dominated solutions.

The disadvantages of the traditional ant colony optimization algorithm are its long search time and its tendency to fall into a local optimal solution. The drawback of this working mode is that the pheromone evaporation factor and the pheromone importance factor are predefined. As an improved strategy, we extended ant colony optimization with the “dynamic adaptive strategy.” The advantages of this strategy are a fast convergence rate and the ability to search for the global optimal solution. Compared with traditional ACO, the new strategy can provide more accurate results.

2.2. Two-Objective Function Optimization. The results of ant colony optimization need to be evaluated. We combine two objective functions to assess the final epistasis results. The first objective function combines the Akaike Information Criterion (AIC) score with a logistic regression function to measure the relationship between the phenotypic trait and the genotype data. The Akaike Information Criterion reflects both the effectiveness and the complexity of the model [25, 26]. In our method, on the basis of standard logistic regression and following the strategy of North et al. [27], we use the ADDINT logistic regression model to search for the relationship between disease and SNP nodes. The second objective function uses a frequency measurement based on mutual information theory to model the relationship between genotype data and phenotypic trait from the perspective of information theory; it represents how much information about the disease trait the selected SNP subset can explain. Our proposed method obtains information from the data rather than from extensive prior information. The two objective functions are designed from different perspectives to measure the quality of the search results, and the experimental results show that they perform better than other methods on simulated and real biological datasets.

To limit the adverse impact of the high-dimension, small-sample-size problem, the identification of disease-associated SNPs is treated as a heuristic optimization problem. The proposed method yields optimal solutions in the form of nondominated solutions; the two-objective approach is in effect a kind of multiobjective optimization, and the proposed method uses ant colony optimization to search for the optimal solutions [28].

Our proposed fast adaptive ACO framework contains two stages. In the first stage, we use the modified ACO algorithm with two objective functions to search for a nondominated SNP subset. In the second stage, we apply the Fisher exact test [29, 30] to the dataset containing the nondominated SNPs generated in the first stage. The Fisher exact test is used to identify the relationship between disease and SNPs.

2.2.1. AIC Score. The Akaike Information Criterion (AIC) is used to measure the quality of statistical models for a dataset. AIC comes from information theory and estimates the information lost when a statistical model is used to represent the data-generating process. It deals with the trade-off between the goodness of fit of the model and the complexity of the model. Based on the nature of AIC, we construct an AIC model from the perspective of the GWAS dataset. The goal of our method is to measure the relationship between the genotype data of the genome and the phenotypic disease trait. Logistic regression is widely used to quantitatively analyze the correlation between a dependent variable and independent variables. Based on the above methods, we construct an AIC score model containing logistic regression and a gradient penalty function. Logistic regression provides the maximized log-likelihood of the model, and $k$ expresses the number of free parameters. Following the strategy of Jing and Shen [28], the AIC score is

$$\mathrm{AIC}_{\mathrm{score}} = 2k - 2\log(\mathrm{lik}), \tag{6}$$

where $k$ denotes the number of free parameters and $\log(\mathrm{lik})$ is the maximized log-likelihood of the model.
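To make (6) concrete, the following minimal sketch computes the score from case/control labels and fitted probabilities. It assumes the probabilities come from a logistic regression that has already been fitted elsewhere; the fitting step, the ADDINT model, and the penalty term mentioned above are not reproduced, and the function names and example numbers are hypothetical.

```python
import math


def log_likelihood(labels, probabilities):
    """Bernoulli log-likelihood of 0/1 labels under predicted case probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probabilities)
    )


def aic_score(num_params, log_lik):
    """Eq. (6): AIC = 2k - 2 log(lik); lower values indicate a better trade-off."""
    return 2 * num_params - 2 * log_lik


if __name__ == "__main__":
    # Hypothetical labels (1 = case, 0 = control) and fitted probabilities.
    y = [1, 0, 1, 1]
    p = [0.8, 0.3, 0.6, 0.9]
    # k = 3 is an arbitrary illustrative parameter count.
    print(aic_score(3, log_likelihood(y, p)))
```

Because the score grows by two for every extra free parameter, a model with more parameters is preferred only if it raises the log-likelihood by more than one unit per parameter.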

2.2.2. Explanation Score. In GWAS research, each SNP locus takes one of three values, 0, 1, and 2: 0 denotes the major-allele homozygote, 1 the heterozygote, and 2 the minor-allele homozygote [31]. For two loci there are nine possible combinations, and disease-related SNP loci often change when the disease occurs. For a two-locus combination, $x_i$ denotes the number of occurrences of the $i$th combination of the two SNP loci, and $Y$ denotes the case or control state, with $y_1$ the case state and $y_2$ the control state. The potential interrelationship of the two discrete random variables $X$ and $Y$ is defined as $H(X, Y)$; it measures the relationship between the locus combination and the disease based on locus frequency information and is given by

$$H(X, Y) = \sum_{i=1}^{I} \left| x_i y_1 - x_i y_2 \right|, \tag{7}$$

where $I$ is the total number of locus combinations. To prevent unbalanced sample sizes from affecting the score, we subsample: for example, if there are more case samples than control samples, we randomly draw from the case samples a subset of the same size as the control set. To limit the impact of randomness, we repeat the sampling several times and average the results. A large value of $H$ means that the potential association between disease and SNPs is strong. Equation (7) can also be applied to combinations of more than two loci. We name this score the explain score.
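A small sketch of the explain score follows. It rests on one interpretive assumption: $x_i y_1$ and $x_i y_2$ are read as the numbers of case and control samples that carry the $i$th genotype combination. The function names, the number of repeats, and the toy data are hypothetical.

```python
import random
from collections import Counter


def explain_score(genotypes, labels):
    """Eq. (7): sum over combinations of |count in cases - count in controls|.

    genotypes: one tuple per sample, e.g. (0, 2) for a two-locus combination.
    labels: 1 for case, 0 for control.
    """
    case_counts = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    ctrl_counts = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    # Combinations seen in neither group contribute zero, so they can be skipped.
    combos = set(case_counts) | set(ctrl_counts)
    return sum(abs(case_counts[c] - ctrl_counts[c]) for c in combos)


def balanced_explain_score(genotypes, labels, repeats=10, seed=0):
    """Average the score over random subsamples with equal case/control sizes."""
    rng = random.Random(seed)
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    ctrls = [g for g, y in zip(genotypes, labels) if y == 0]
    n = min(len(cases), len(ctrls))
    total = 0.0
    for _ in range(repeats):
        sub_cases = rng.sample(cases, n)
        sub_ctrls = rng.sample(ctrls, n)
        total += explain_score(sub_cases + sub_ctrls, [1] * n + [0] * n)
    return total / repeats


if __name__ == "__main__":
    genotypes = [(0, 0), (0, 0), (1, 2), (1, 2), (2, 2), (0, 0)]
    labels = [1, 1, 1, 0, 0, 0]
    print(explain_score(genotypes, labels))  # |2-1| + |1-1| + |0-1| = 2
```

Because tuples of any length can be used as keys, the same code covers combinations of more than two loci.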

2.3. Pareto Optimality for SNP Epistasis Detection. Pareto optimality addresses the situation in which it is impossible to make all objective function values of a multiobjective optimization problem optimal at once [32, 33]. Pareto optimality was first applied to income distribution and economics and has since been extended to engineering and multiobjective optimization research. Building on the methods proposed above, the modified ant colony optimization algorithm uses two objective functions: the first is the AIC score with logistic regression and related parameters, and the second is the explain score. For the first objective function, a lower score indicates a stronger potential relationship between the disease phenotype trait and the SNPs [34]. For the second objective function, a higher score indicates a stronger potential relationship between the disease phenotype trait and the SNPs. The target of the fast two-stage ant colony optimization algorithm is to find the epistasis effect among SNPs and to extract the real SNP subset with respect to the methods proposed above.

In real GWAS datasets, an identified SNP subset may perform best among all candidate solutions in terms of one objective function yet perform poorly in terms of the other. The target is therefore to select SNP subsets that are good with respect to both objective functions. In practice, a subset rarely outperforms all other solutions on both objectives at once, so for a framework with two objective functions it is hard, and may even be impossible, to calculate a single global optimal solution. On the basis of previous studies [28, 34, 35], we adopt Pareto optimality to find the practical optimal solutions. Consider two solutions, S1 and S2, each a GWAS SNP subset. Comparing S1 and S2 has only two possible outcomes: either one solution dominates the other, or neither solution dominates the other. Following the idea of Pareto optimality, we consider that S1 dominates S2 if two conditions are satisfied. The first condition is that $f_e(\text{S1})$ is not higher than $f_e(\text{S2})$ for both objective functions. The second condition is that $f_e(\text{S1})$ is lower than $f_e(\text{S2})$ for at least one objective function. Here $f_e$ denotes an objective function: $e = 1$ denotes the first objective function (the modified AIC score) and $e = 2$ denotes the second objective function (the explain score). If solutions S1 and S2 satisfy these two conditions, we say that S1 is a nondominated solution and S2 is a dominated solution. Based on this Pareto optimality approach and the two objective functions, all solutions can be divided into two kinds, the nondominated set and the dominated set. The nondominated set may contain many solutions with respect to the two objective functions; our goal is now to find the nondominated set that is best under certain conditions.

Next, we use the judgment rule described above to sort the solutions of the nondominated sets and find the optimal nondominated set. Specifically, there are three cases. In the first case, $f_1(\text{S2})$ is larger than $f_1(\text{S1})$ and, at the same time, $f_2(\text{S2})$ is larger than $f_2(\text{S1})$. In the second case, $f_1(\text{S2})$ equals $f_1(\text{S1})$ and, at the same time, $f_2(\text{S2})$ is larger than $f_2(\text{S1})$. In the third case, $f_1(\text{S2})$ is larger than $f_1(\text{S1})$ and, at the same time, $f_2(\text{S2})$ equals $f_2(\text{S1})$.
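The dominance rule can be sketched as below. The sketch assumes that both objectives are expressed so that lower is better, which means that the explain score is negated before comparison; this convention, the function names, and the example values are assumptions made for illustration and are not stated in the text.

```python
def dominates(s1, s2):
    """Return True if s1 dominates s2; each solution is a tuple of objective
    values in which lower is better (explain score negated, an assumption)."""
    no_worse = all(a <= b for a, b in zip(s1, s2))
    strictly_better = any(a < b for a, b in zip(s1, s2))
    return no_worse and strictly_better


def nondominated(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]


if __name__ == "__main__":
    # Hypothetical (AIC score, -explain score) pairs for four SNP subsets.
    candidates = [(100.0, -40.0), (95.0, -25.0), (110.0, -20.0), (95.0, -30.0)]
    print(nondominated(candidates))
```

In this toy example the second candidate is dominated by the fourth (same AIC score, smaller explain score) and the third is dominated by the first, so two candidates remain in the nondominated set.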

2.4. Fisher Exact Test for Experimental Results. The Fisher exact test is applied to contingency tables to assess statistical significance [36–38]. Although in practice it is mostly used for small samples, it can be used for all sample sizes. Ronald Fisher first proposed this method, and the Fisher exact test is one of a class of exact tests.

In this research article, on the basis of the unified framework comprising the fast adaptive ant colony optimization (ACO) algorithm, the Akaike Information Criterion (AIC) score, the explain score, and Pareto optimality, we obtain a final result that is a nondominated SNP set. In this section we use the Fisher exact test to exhaustively search this set for epistasis effects. The Fisher exact test is based on the hypergeometric distribution, and its $P$ value is exact for all sample sizes. The test operates on a contingency table. The null hypothesis is that the identified SNP subset and the disease are not associated. The alternative hypothesis is that the SNP subset affects the expression of the disease; it is accepted when the $P$ value of the Fisher exact test is significant, that is, less than a predetermined value such as 0.05 or smaller. In this way our proposed method identifies significant SNP subsets.
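For intuition, the sketch below computes a two-sided Fisher exact $P$ value for the simplest case, a 2 × 2 table, directly from the hypergeometric distribution. It is an illustrative simplification: genotype-combination tables in this setting generally have more rows, the paper's modified test is not reproduced, and the table layout in the comments is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided P value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = carriers / non-carriers of a genotype
    combination, columns = cases / controls.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    p_value = fisher_exact_two_sided(3, 1, 1, 3)
    print(p_value, p_value < 0.05)
```

For the table in the usage example the margins are all four, so the five possible tables have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70; the observed table has probability 16/70, and the two-sided $P$ value is 34/70, which is not significant at 0.05.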

2.5. Power Test. In the previous sections we introduced each part of our proposed fast adaptive ant colony optimization algorithm for detecting SNP epistasis. The unified framework contains the fast adaptive ant colony optimization algorithm, the Akaike Information Criterion (AIC) score, the explain score, Pareto optimality, and the modified Fisher exact test. In this section we describe how to verify the significance of the results. We construct 100 datasets with the same parameters and then use the traditional power test to measure the performance of the methods. The power is defined as follows:

$$\text{Power} = \frac{|S_D|}{100}, \tag{8}$$

where $|S_D|$ denotes the number of datasets, out of the 100, in which the disease-related SNPs were correctly selected. A single test criterion may not clearly show the quality of the results, so we also use the precision-recall standard to account for true positives and false positives. Precision-recall criteria are widely used in the evaluation of classification models [39, 40]. In pattern recognition and information retrieval with binary classification, precision, also called positive predictive value, is the fraction of retrieved instances that are relevant, while recall, also known as sensitivity, is the fraction of relevant instances that are retrieved [26]. Both precision and recall are therefore based on an understanding and measure of relevance. We use the precision-recall criteria to determine whether the classification results are good or bad. These criteria remain informative when the classes are imbalanced, which matters here because the numbers of positive and negative instances in our research differ greatly. In SNP epistasis research, precision (positive predictive value) is the proportion of identified SNP subsets that are truly disease related, and recall (sensitivity) is the proportion of truly disease-related SNP subsets that are identified. A single indicator, such as the false positive rate alone, cannot make the real result clear; we therefore consider both false positives and true positives, which is why we use precision and recall. We also use the $F_1$ score (also called $F$ score or $F$ measure) to measure the accuracy of the precision-recall test. Precision and recall are introduced next with the confusion matrix (Figure 1).

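$$\begin{aligned} \text{recall} &= \frac{TP}{TP + FN}, \\ \text{precision} &= \frac{TP}{TP + FP}, \\ F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}. \end{aligned} \tag{9}$$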

| True class \ Predicted class | Associated | Nonassociated |
|---|---|---|
| Associated | True positive (TP) | False negative (FN) |
| Nonassociated | False positive (FP) | True negative (TN) |

Figure 1: Precision-recall explanation matrix.

The precision, also known as positive predictive value, is the proportion of true positives in the result: the number of true positives divided by the sum of the numbers of true positives and false positives. Precision is often used to indicate how many false positives an algorithm reports. The recall, also known as sensitivity, is the ratio of true positives to the sum of true positives and false negatives. In terms of the SNP selection problem, the larger the recall, the more of the truly disease-related SNP combinations are found; the larger the precision, the higher the proportion of truly disease-related SNP combinations among the identified SNP combinations. The $F$ measure is the harmonic mean of precision and recall, a synthesized measure combining both [41].
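The evaluation measures can be written down directly. The sketch below uses the harmonic-mean form of the $F_1$ score described in the text; the function names and the counts in the usage example are hypothetical, and the last line checks the FAACOSE row for ADDME in Table 1.

```python
def power(num_detected, num_datasets=100):
    """Eq. (8): fraction of simulated datasets in which the true SNPs were found."""
    return num_detected / num_datasets


def recall(tp, fn):
    """True positives over all truly associated SNP combinations."""
    return tp / (tp + fn)


def precision(tp, fp):
    """True positives over all SNP combinations reported as associated."""
    return tp / (tp + fp)


def f1_score(prec, rec):
    """Harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec)


if __name__ == "__main__":
    print(power(87))            # hypothetical: 87 of 100 datasets detected
    print(precision(8, 2))      # hypothetical counts
    print(recall(8, 8))         # hypothetical counts
    print(round(f1_score(0.74, 0.82), 2))  # precision and recall of FAACOSE on ADDME
```

With precision 0.74 and recall 0.82 the harmonic mean is about 0.778, which rounds to the 0.78 reported for FAACOSE on the ADDME model.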

3. Simulation Experiments

3.1. Compared with One-Objective Function. In this section we use simulated data to compare our proposed method with other existing methods. To avoid data bias caused by the model, we adopt the BEAM package to generate the simulation datasets [17]. Data were simulated following three genetic models: (1) an additive model, (2) epistatic interactions with multiplicative effects, and (3) epistatic interactions with threshold effects. For brevity, the additive model is referred to as ADDME, the model of epistatic interactions with multiplicative effects as EIME, and the model of epistatic interactions with threshold effects as EITEME. In the following we use these short names to indicate the corresponding data models.

Because our method is a two-objective-based SNP epistasis search method, we first compared it with existing single-objective-based exhaustive SNP epistasis search methods to demonstrate the effectiveness of the two-objective search. Second, we compared our proposed method with the recently proposed BEAM [17], the generic ACO algorithm, and AntEpiSeeker [16]. In a one-objective SNP epistasis search method, the objective function scores every SNP combination, and in general the scores differ between combinations. By the nature of the method, a low score indicates a relatively weak association between the SNP combination and the disease, and a high score indicates a relatively strong association. The one-objective method then ranks all SNP combinations by score. In contrast, the two-objective-based search method finds a set of nondominated results, all of which rank equally. To ensure fairness, we take from the top of the one-objective ranking the same number of SNP combinations as the two-objective method returns. The comparison shows that the two-objective-based method is better than the one-objective-based methods on all three simulation data models. The two single-objective-based methods give results similar to each other. The simulation experiments show the effectiveness of the two-objective-based SNP epistasis search method, and the poor results of the single-objective methods show the insufficiency of one-objective functions. The experimental results are shown in Figure 2.

Figure 2: Power test comparisons between one-objective and two-objective methods on three different models with MAF values 0.1, 0.2, and 0.5. [Panels: ADDME, EIME, EITEME; vertical axis: power, from 0 to 1; horizontal axis: MAF (0.1, 0.2, 0.5) for $r^2 = 0.7$ and $r^2 = 1.0$; compared methods: K2 score, En score, FAACOSE.]

The abscissa of Figure 2 is the minor allele frequency (MAF), which is assigned the values 0.1, 0.2, and 0.5. We generated the simulated datasets and chose the parameter settings following previous studies [17, 42–44]. For each parameter combination we generated 100 datasets, each containing 2000 samples (1000 case samples and 1000 control samples) and 1000 simulated SNPs. We evaluate algorithm performance by calculating the proportion of true interactions identified at the significance level 0.01 after Bonferroni correction. The parameter $\lambda$ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. The linkage disequilibrium between SNPs, measured by $r^2$, ranges from 0.7 to 1.
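As a worked illustration of the Bonferroni-adjusted threshold, the sketch below divides the significance level by the number of tests. The text does not state how many tests enter the correction; taking every pair among the 1000 simulated SNPs is an assumption made here purely for illustration, and the function name is hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / num_tests


if __name__ == "__main__":
    num_snps = 1000
    # Assumption: one test per unordered SNP pair, i.e. C(1000, 2) = 499500 tests.
    num_pairs = comb(num_snps, 2)
    print(num_pairs, bonferroni_threshold(0.01, num_pairs))
```

Under this assumption a pair would need a $P$ value below roughly $2 \times 10^{-8}$ to be declared significant; a different count of tests would change the threshold proportionally.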

3.2. Compared with Benchmark Methods. After the comparison with single-objective functions, we compare our proposed method with existing methods. The performance of our proposed method was evaluated by comparison with benchmark methods [45]. Many previous studies have already discussed the parameter-setting problem; in this section we set the parameters according to the existing strategy. We evaluated the performance of FAACOSE by comparing it with three methods: BEAM, the generic ACO algorithm, and AntEpiSeeker. We used the BEAM package and the previous parameter strategy to generate the simulated datasets. Note that the generic ACO algorithm cannot select larger SNP sets. We use the simulated datasets introduced in Section 3.1 and evaluate algorithm performance by calculating the proportion of true interactions identified at the significance level 0.01 after Bonferroni correction. We generated simulated datasets following the three genetic models ADDME, EIME, and EITEME. Other parameters for data simulation were the effect size $\lambda$, a measure of marginal effects as defined by Marchini et al. [42]; the linkage disequilibrium between SNPs, measured by $r^2$; and the minor allele frequencies (MAFs). $\lambda$ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. For $r^2$, two values (0.7 and 1.0) were used for each model. For MAFs, three values (0.1, 0.2, and 0.5) were considered. The parameters for BEAM were set to their defaults. The parameter settings for AntEpiSeeker were large dataset size = 6, small dataset size = 3, count large = 150, count small = 300, epistasis model = 2, ant count = 1000, $\alpha = 1$, $\rho = 0.05$, and $\tau_0 = 100$ (also available in the software package documentation of AntEpiSeeker). The parameters of the generic ACO algorithm were set as ant count = 1000, $\alpha = 1$, $\rho = 0.05$, $\tau_0 = 100$, count (number of iterations) = 900, and epistasis model = 2. The comparison of detection power for BEAM, the generic ACO algorithm, AntEpiSeeker, and FAACOSE is presented in Figure 3. The results show that FAACOSE outperforms BEAM and the generic ACO in all parameter settings and is superior to AntEpiSeeker in most parameter settings.

Figure 3: Power comparisons between existing methods and FAACOSE on three models. [Panels: ADDME, EIME, EITEME; vertical axis: power, from 0 to 1; horizontal axis: MAF (0.1, 0.2, 0.5) for $r^2 = 0.7$ and $r^2 = 1.0$; compared methods: BEAM, gACO, AntEpiSeeker, FAACOSE.]

In this section we compare our proposed method with benchmark methods. First, we use the power test to determine how many real SNP subsets can be found with our proposed method. Second, we use precision, recall, and the $F_1$ score to evaluate the results. Precision denotes how many of the finally identified SNP subsets are correct. Recall denotes how many of the correct SNP subsets are identified. The $F_1$ score is an indicator used in statistics to measure the accuracy of binary classification models. It takes into account the precision and the recall of the classification model simultaneously. The $F_1$ score can be seen as a weighted average of precision and recall; its maximum is 1 and its minimum is 0. We show the results of FAACOSE and the other methods for $r^2 = 0.7$ and MAF = 0.2 in Table 1.

The $F_1$ score of FAACOSE is better than that of the other methods. We ran the same experiment on datasets with different parameter combinations; across all eighteen datasets, FAACOSE has the highest $F_1$ score in fifteen. In experiments on real GWAS datasets the sample size is huge, so the efficiency of the method must also be considered. The experimental results indicate that our proposed method is more effective on real GWAS datasets. AntEpiSeeker is the most efficient of the three benchmark methods. We compared the run times of AntEpiSeeker and FAACOSE on different data samples and averaged the results: FAACOSE is 30% faster than AntEpiSeeker.


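Table 1: $F_1$ score comparison between FAACOSE and other methods.

| Model | Method | Recall | Precision | $F_1$ score |
|---|---|---|---|---|
| ADDME | BEAM | 0.29 | 0.15 | 0.20 |
| ADDME | gACO | 0.45 | 0.36 | 0.40 |
| ADDME | AntEpiSeeker | 0.6 | 0.55 | 0.57 |
| ADDME | FAACOSE | 0.82 | 0.74 | 0.78 |
| EIME | BEAM | 0.3 | 0.45 | 0.36 |
| EIME | gACO | 0.35 | 0.32 | 0.33 |
| EIME | AntEpiSeeker | 0.34 | 0.56 | 0.42 |
| EIME | FAACOSE | 0.9 | 0.82 | 0.86 |
| EITEME | BEAM | 0.1 | 0.14 | 0.12 |
| EITEME | gACO | 0.15 | 0.20 | 0.17 |
| EITEME | AntEpiSeeker | 0.54 | 0.46 | 0.50 |
| EITEME | FAACOSE | 0.65 | 0.62 | 0.63 |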

4. Application to Real SNP Dataset

Late-Onset Alzheimer’s Disease (LOAD) is the most frequent form of Alzheimer’s disease (AD) and is usually identified in people older than 65 years. LOAD is a chronic neurodegenerative disease whose onset is frequently not obvious and which slowly progresses to dementia over time. It is the cause of 60% to 70% of cases of dementia. The most common early symptom is difficulty in remembering recent events (short-term memory loss). As the disease advances, symptoms can include problems with language, disorientation (including easily getting lost), mood swings, loss of motivation, not managing self-care, and behavioural issues. LOAD is a multifactor genetic disease; its etiology and pathogenesis are not yet fully understood. The apolipoprotein E (APOE) gene is a definite risk factor for LOAD. The APOE gene has three forms: $\varepsilon 2$, $\varepsilon 3$, and $\varepsilon 4$. The effect of $\varepsilon 2$ is positive: $\varepsilon 2$ can effectively prevent the occurrence of the disease. Research has reported that the genetic variant $\varepsilon 4$ has an inducing effect on the disease. Between 40% and 80% of people with AD possess at least one APOE $\varepsilon 4$ allele [46]. Previous studies have reported some significant SNPs in the field of Genome-Wide Association Studies [47]. Reference [47] reported that 10 SNPs in the region of the GAB2 gene have an epistasis effect with APOE $\varepsilon 4$ in relation to Late-Onset Alzheimer’s Disease. We applied our proposed method to the LOAD GWAS dataset from the website https://www.tgen.org [47]. After data preprocessing, the real biological dataset contains 1368 samples [48, 49]. Of these, 836 samples are cases and the remaining 532 are normal samples [50, 51]. Each sample of the real biological dataset contains 309316 SNPs with genotype information, APOE status, and LOAD status [52]. For the subsequent calculation, we code the APOE gene state as a binary variable: the value 1 represents the $\varepsilon 4$ variant and the value 0 represents the other variants [53]. Each SNP locus was coded as a quaternary variable to account for the missing state. The SNPs with high potential association with LOAD are shown in Table 2.

Table 2: SNPs selected by FAACOSE in the LOAD dataset.

| SNP rs | | | |
|---|---|---|---|
| rs7756992 | rs611154 | rs191840 | rs7294919 |
| rs1887922 | rs304900 | rs1999764 | rs1385600 |
| rs2373115 | rs7101429 | rs609812 | rs613375 |
| rs1007837 | rs2510038 | rs4945261 | rs10793294 |
| rs520227 | rs191740 | rs7924284 | rs829465 |
| rs602106 | rs7174511 | rs606889 | rs602192 |

5. Discussions

In this paper, we proposed a novel ant colony optimization based fast search method for the discovery of epistasis interactions in large-scale real GWAS datasets. FAACOSE was evaluated through comparison with three existing approaches on both simulated and real datasets. FAACOSE, which adopts a fast adaptive optimization procedure, is a modified algorithm derived from the generic ACO and uses two objective functions. To demonstrate the advantages of the fast adaptive ant colony optimization algorithm, we also compared the performance of FAACOSE with that of the generic ACO.

In future studies, we intend to find more powerful modeling approaches: ant colony optimization algorithms with faster convergence, objective functions that better measure the data structure of GWAS datasets, and more efficient strategies for searching and identifying optimal SNP subsets, all of which can be combined and flexibly embedded into our SNP epistasis search framework to find more accurate SNP subsets. With the rapid development of bioinformatics, more and more biological information related to disease is being identified, and more and more studies will consider prior knowledge. An important future research direction is to apply expert prior knowledge to GWAS datasets with our proposed method, the fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Expert prior knowledge can improve the power and efficiency of epistasis detection.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (Grant nos. 61520106006, 31571364, 61732012, 61532008, U1611265, 61672382, 61402334, 61472280, 61472173, 61572447, 61672203, 61472282, and 61373098), the China Postdoctoral Science Foundation (Grant nos. 2014M561513, 2015M580352, 2017M611619, and 2016M601646), and the Guangxi Bagui Scholars Program Special Fund.


References

[1] J. N. Hirschhorn and M. J. Daly, “Genome-wide association studies for common diseases and complex traits,” Nature Reviews Genetics, vol. 6, no. 2, pp. 95–108, 2005.

[2] B. N. Howie, P. Donnelly, and J. Marchini, “A flexible and accurate genotype imputation method for the next generation of genome-wide association studies,” PLoS Genetics, vol. 5, no. 6, Article ID e1000529, 2009.

[3] T. A. Manolio, F. S. Collins, N. J. Cox et al., “Finding the missing heritability of complex diseases,” Nature, vol. 461, no. 7265, pp. 747–753, 2009.

[4] B. S. Shastry, “SNP alleles in human disease and evolution,” Journal of Human Genetics, vol. 47, no. 11, pp. 561–566, 2002.

[5] B. Stubbs, D. Vancampfort, M. De Hert, and A. J. Mitchell, “The prevalence and predictors of type two diabetes mellitus in people with schizophrenia: a systematic review and comparative meta-analysis,” Acta Psychiatrica Scandinavica, vol. 132, no. 2, pp. 144–157, 2015.

[6] K. P. Liao, “Cardiovascular disease in patients with rheumatoid arthritis,” Trends in Cardiovascular Medicine, vol. 27, no. 2, pp. 136–140, 2017.

[7] Y. Mao, N. R. London, L. Ma, D. Dvorkin, and Y. Da, “Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model,” Physiological Genomics, vol. 28, no. 1, pp. 46–52, 2006.

[8] W. Zhang, J. Zhu, E. E. Schadt, and J. S. Liu, “A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules,” PLoS Computational Biology, vol. 6, no. 1, Article ID e1000642, 2010.

[9] M. Kang, C. Zhang, H.-W. Chun, C. Ding, C. Liu, and J. Gao, “EQTL epistasis: Detecting epistatic effects and inferring hierarchical relationships of genes in biological pathways,” Bioinformatics, vol. 31, no. 5, pp. 656–664, 2015.

[10] H. Lin, D. Chen, P. Huang et al., “SNP interaction pattern identifier (SIPI): an intensive search for SNP–SNP interaction patterns,” Bioinformatics, 2016.

[11] R. L. Prentice and L. Qi, “Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation,” Biostatistics, vol. 7, no. 3, pp. 339–354, 2006.

[12] S.-P. Deng, L. Zhu, and D.-S. Huang, “Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks,” BMC Genomics, vol. 16, no. 3, article no. S4, 2015.

[13] S.-P. Deng and D.-S. Huang, “SFAPS: An R package for structure/function analysis of protein sequences based on informational spectrum method,” Methods, vol. 69, no. 3, pp. 207–212, 2014.

[14] J. H. Moore, J. M. Lamb, N. J. Brown, and D. E. Vaughan, “A comparison of combinatorial partitioning and linear regression for the detection of epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on plasma PAI-1 levels,” Clinical Genetics, vol. 62, no. 1, pp. 74–79, 2002.

[15] B. M. Michael, R. E. Neapolitan, X. Jiang, and V. Shyam, “Learning genetic epistasis using Bayesian network scoring criteria,” BMC Bioinformatics, vol. 12, no. 1, article 89, 2011.

[16] Y. Wang, X. Liu, K. Robbins, and R. Rekaya, “AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm,” BMC Research Notes, vol. 3, article 117, 2010.

[17] Y. Zhang and J. S. Liu, “Bayesian inference of epistatic interactions in case-control studies,” Nature Genetics, vol. 39, no. 9, pp. 1167–1173, 2007.

[18] M. Dorigo, M. Birattari, and C. Blum, “Ant colony optimization and swarm intelligence,” Springer Verlag, vol. 5217, no. 8, pp. 767–771, 2004.

[19] T. Stützle, M. López-Ibáñez, P. Pellegrini et al., “Parameter adaptation in ant colony optimization,” Autonomous Search, vol. 9783642214349, pp. 191–215, 2012.

[20] C. Blum and M. Sampels, “An ant colony optimization algorithm for shop scheduling problems,” Journal of Mathematical Modelling and Algorithms, vol. 3, no. 3, pp. 285–308, 2004.

[21] R. Musa, J.-P. Arnaout, and H. Jung, “Ant colony optimization algorithm to solve for the transportation problem of cross-docking network,” Computers and Industrial Engineering, vol. 59, no. 1, pp. 85–92, 2010.

[22] G. N. Varela and M. C. Sinclair, “Ant colony optimisation for virtual-wavelength-path routing and wavelength allocation,” in Proceedings of the 1999 Congress on Evolutionary Computation (CEC ’99), pp. 1809–1816, Washington, DC, USA, July 1999.

[23] K. M. Sim and W. H. Sun, “Ant colony optimization for routing and load-balancing: survey and new directions,” Systems, Man & Cybernetics, Part A: Systems and Humans, IEEE Transactions on, vol. 33, no. 5, pp. 560–572, 2003.

[24] S.-H. Ngo, X. Jiang, and S. Horiguchi, “Adaptive routing and wavelength assignment using ant-based algorithm,” in Proceedings of the 2004 12th IEEE International Conference on Networks, ICON 2004 - Unity in Diversity, pp. 482–486, November 2004.

[25] S. I. Vrieze, “Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC),” Psychological Methods, vol. 17, no. 2, pp. 228–243, 2012.

[26] D.-S. Huang and J.-X. Du, “A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks,” IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2099–2115, 2008.

[27] B. V. North, D. Curtis, and P. C. Sham, “Application of logistic regression to case-control association studies involving two causative loci,” Human Heredity, vol. 59, no. 2, pp. 79–87, 2005.

[28] P.-J. Jing and H.-B. Shen, “MACOED: A multi-objective ant colony optimization algorithm for SNP epistasis detection in genome-wide association studies,” Bioinformatics, vol. 31, no. 5, pp. 634–641, 2015.

[29] N. Ryman, “CHIFISH: A computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher’s exact test,” Molecular Ecology Notes, vol. 6, no. 1, pp. 285–287, 2006.

[30] C. R. Mehta and N. R. Patel, “A network algorithm for performing Fisher’s exact test in r × c contingency tables,” Journal of the American Statistical Association, vol. 78, no. 382, pp. 427–434, 1983.

[31] B. Sobrino, M. Brion, and A. Carracedo, “SNPs in forensic genetics: A review on SNP typing methodologies,” Forensic Science International, vol. 154, no. 2-3, pp. 181–194, 2005.

[32] O. Shoval, H. Sheftel, G. Shinar et al., “Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space,” Science, vol. 336, no. 6085, pp. 1157–1160, 2012.

[33] D.-S. Huang and W. Jiang, “A general CPL-AdS methodology for fixing dynamic parameters in dual environments,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1489–1500, 2012.


[34] L. Zhu, W.-L. Guo, S.-P. Deng, and D.-S. Huang, "ChIP-PIT: enhancing the analysis of ChIP-Seq data using convex-relaxed pair-wise interaction tensor decomposition," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 55–63, 2016.

[35] C. Angione, G. Carapezza, J. Costanza, P. Lio, and G. Nicosia, "Pareto optimality in organelle energy metabolism analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1032–1044, 2013.

[36] R. A. Fisher, "On the interpretation of χ2 from contingency tables, and the calculation of P," Journal of the Royal Statistical Society, vol. 85, no. 1, p. 87, 1922.

[37] A. Agresti, "A survey of exact inference for contingency tables," Statistical Science, vol. 7, no. 1, pp. 131–153, 1992.

[38] B. Wenzheng, C. Yuehui, and W. Dong, "Prediction of protein structure classes with flexible neural tree," Bio-Medical Materials and Engineering, vol. 24, no. 6, pp. 3797–3806, 2014.

[39] L. Zhu, Z.-H. You, D.-S. Huang, and B. Wang, "t-LSE: a novel robust geometric approach for modeling protein-protein interaction networks," PLoS ONE, vol. 8, no. 4, Article ID e58368, 2013.

[40] C.-H. Zheng, L. Zhang, V. T.-Y. Ng, C. K. Shiu, and D.-S. Huang, "Molecular pattern discovery based on penalized matrix decomposition," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1592–1603, 2011.

[41] D.-S. Huang and H.-J. Yu, "Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 457–467, 2013.

[42] J. Marchini, P. Donnelly, and L. R. Cardon, "Genome-wide strategies for detecting multiple loci that influence complex diseases," Nature Genetics, vol. 37, no. 4, pp. 413–417, 2005.

[43] R. Jiang, W. Tang, X. Wu, and W. Fu, "A random forest approach to the detection of epistatic interactions in case-control studies," BMC Bioinformatics, vol. 10, no. 1, article S65, 2009.

[44] J. Kruppa, A. Ziegler, and I. R. Konig, "Risk estimation and risk prediction using machine-learning methods," Human Genetics, vol. 131, no. 10, pp. 1639–1654, 2012.

[45] D.-S. Huang and C.-H. Zheng, "Independent component analysis-based penalized discriminant method for tumor classification using gene expression data," Bioinformatics, vol. 22, no. 15, pp. 1855–1862, 2006.

[46] R. W. Mahley, K. H. Weisgraber, and Y. Huang, "Apolipoprotein E4: a causative factor and therapeutic target in neuropathology, including Alzheimer's disease," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 15, pp. 5644–5651, 2006.

[47] E. M. Reiman, J. A. Webster, A. J. Myers et al., "GAB2 alleles modify Alzheimer's risk in APOE ε4 carriers," Neuron, vol. 54, no. 5, pp. 713–720, 2007.

[48] C.-H. Zheng, D.-S. Huang, L. Zhang, and X.-Z. Kong, "Tumor clustering using nonnegative matrix factorization with gene selection," IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 599–607, 2009.

[49] S.-P. Deng, L. Zhu, and D.-S. Huang, "Predicting hub genes associated with cervical cancer through gene co-expression networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 27–35, 2016.

[50] L. Zhu, S.-P. Deng, and D.-S. Huang, "A two-stage geometric method for pruning unreliable links in protein-protein networks," IEEE Transactions on Nanobioscience, vol. 14, no. 5, pp. 528–534, 2015.

[51] D.-S. Huang, L. Zhang, K. Han, S. Deng, K. Yang, and H. Zhang, "Prediction of protein-protein interactions based on protein-protein correlation using least squares regression," Current Protein and Peptide Science, vol. 15, no. 6, pp. 553–560, 2014.

[52] D.-S. Huang, Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, May 1996.

[53] D.-S. Huang, "Radial basis probabilistic neural networks: model and application," International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, pp. 1083–1101, 1999.


number. ACO is widely used for large-scale data problems. However, slow convergence is still a bottleneck when the ant colony algorithm is applied to large-scale search and optimization problems. The pheromone update strategy is one of the key factors that determine the convergence rate.

In applying ant colony optimization to a specific problem, the search space should be as large as possible. At the same time, ACO must remain time-efficient: it should balance solution quality against solving speed. On the basis of previous studies [22–24], we consider only the pheromone evaporation factor $\rho$ and the pheromone importance factor $\alpha$. In (2), $\rho$ balances the effects of the old pheromone value and the current pheromone value. When $\rho$ is too small, too much residual pheromone remains, which leads to a locally optimal solution. We therefore adapt $\rho$ when the algorithm does not improve the current optimal solution within $n$ iterations:

\[
\rho(t+n) =
\begin{cases}
g\,\rho(t), & \rho(t) \le \rho_{\max}, \\
\rho_{\max}, & \text{otherwise},
\end{cases}
\tag{4}
\]

where $\rho_{\max}$ equals 0.85 in practice and the tuning parameter $g$ equals 1.02. When the pheromone value reaches the critical value, the pheromone importance factor begins to play a role. As the pheromone importance factor $\alpha$ increases, the algorithm can escape from a locally optimal solution and search for the globally optimal solution:

\[
\alpha(t+n) =
\begin{cases}
g_1\,\alpha(t), & \alpha(t) \le \alpha_{\max}, \\
\alpha_{\max}, & \text{otherwise},
\end{cases}
\tag{5}
\]

where $g_1$ is a constant larger than one and $\alpha_{\max}$ is less than or equal to five. In the calculation, we first run the standard ant colony optimization algorithm for $N$ iterations, where $N$ is a predefined number. If the current optimal solution has not improved after $N$ iterations, we update the parameters according to (4) and (5) and then update all pheromone values according to (2).
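The stagnation-triggered update in (4) and (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the defaults for $g_1$ and $\alpha_{\max}$ are assumptions chosen only to satisfy the stated constraints ($g_1 > 1$, $\alpha_{\max} \le 5$), and the stagnation check assumes a lower-is-better score.

```python
def adapt_parameters(rho, alpha, rho_max=0.85, g=1.02, alpha_max=5.0, g1=1.05):
    """Apply rules (4) and (5) once, after a window without improvement.

    The g1 and alpha_max defaults are illustrative assumptions,
    not values taken from the paper.
    """
    # Rule (4): scale rho by g while it is at most rho_max, otherwise clamp it.
    new_rho = g * rho if rho <= rho_max else rho_max
    # Rule (5): scale alpha by g1 while it is at most alpha_max, otherwise clamp it.
    new_alpha = g1 * alpha if alpha <= alpha_max else alpha_max
    return new_rho, new_alpha


def stagnated(best_history, n):
    """True if the best (lower-is-better) score did not improve in the last n iterations."""
    if len(best_history) <= n:
        return False
    return min(best_history[-n:]) >= min(best_history[:-n])
```

Because the rule is applied literally, $\rho$ can exceed $\rho_{\max}$ by at most a factor of $g$ for one update before it is clamped; the same holds for $\alpha$.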

Given the pheromone values and transfer rules, we can use the ant colony optimization algorithm to find a group of SNPs that affect the disease. Assume there are $P$ SNPs in the Genome-Wide Association Studies dataset. We construct a $P$-dimensional symmetric matrix $M$ to store the pheromone values deposited by the ants. The element $m_{ij}$ of $M$ denotes the disease-related interaction between the $i$th SNP and the $j$th SNP. At the beginning of our method, every element of $M$ is set to a constant value $m_0$; equal values mean that every pair of SNPs is initially considered equally likely to be epistatic and equally likely to be related to the disease.
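As a small sketch of this data structure, the pheromone matrix can be held as a nested list and kept symmetric on every update. The helper names and the default value of $m_0$ are hypothetical; the actual deposit amount is governed by (2), which is not reproduced here.

```python
def init_pheromone(p, m0=1.0):
    """Return a p x p matrix (list of lists) with every entry set to m0."""
    return [[m0] * p for _ in range(p)]


def deposit(matrix, i, j, amount):
    """Add pheromone to the SNP pair (i, j) and mirror it to keep the matrix symmetric."""
    matrix[i][j] += amount
    matrix[j][i] = matrix[i][j]


# Example: 4 SNPs, uniform start, one deposit on the pair (1, 3).
M = init_pheromone(4, m0=2.0)
deposit(M, 1, 3, 0.5)
```

For genome-scale $P$ a dense matrix of this kind grows quadratically, so a practical implementation would likely need a more compact representation.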

At the final pheromone iteration, the ACO algorithm obtains the optimal solutions through a forward selection strategy. The advantage of the ACO algorithm in this paper is that the result retains the nondominated solutions, which are potentially equally likely and potentially most strongly related to the disease, and omits the dominated solutions.

The disadvantages of the traditional ant colony optimization algorithm are its long search time and its tendency to fall into a locally optimal solution. The drawback of this working mode is that the pheromone evaporation factor and the pheromone importance factor are predefined. As an improvement, we extended the "dynamic adaptive strategy" to ant colony optimization. The advantages of this strategy are a fast convergence rate and the ability to search for the globally optimal solution. Compared with traditional ACO, the new strategy can provide more accurate results.

2.2. Two-Objective Function Optimization. The results of ant colony optimization need to be evaluated. We combine two objective functions to assess the final epistasis results. The first objective function combines the Akaike Information Criterion (AIC) score with a logistic regression function to measure the relationship between the phenotypic trait and the genotype data. The AIC reflects both the effectiveness and the complexity of the model [25, 26]. In our method, on the basis of standard logistic regression and following the strategy of North et al. [27], we use the ADDINT logistic regression model to search for the relationship between the disease and the SNP nodes. The second objective function uses a frequency measurement based on mutual information theory to model the relationship between genotype data and phenotypic trait from the perspective of information theory; it expresses how much information the selected SNP subset carries about the disease trait. Our proposed method obtains information from the data rather than from extensive prior information. The two objective functions measure the quality of the search results from different perspectives, and the experimental results show that our two-objective approach performs better than other methods on simulated and real biological datasets.

To mitigate the high-dimension, small-sample-size problem, the identification of disease-associated SNPs is treated as a heuristic optimization problem. The proposed method yields optimal solutions in the form of nondominated solutions; the two-objective approach is in fact a kind of multiobjective optimization, and the proposed method uses ant colony optimization to search for the optimal solutions [28].

Our proposed fast adaptive ACO framework contains two stages. In the first stage, we use the modified ACO algorithm with the two objective functions to search for a nondominated SNP subset. In the second stage, we apply the Fisher exact test [29, 30] to the dataset containing the nondominated SNPs generated in the first stage. The Fisher exact test is used to identify the relationship between the disease and the SNPs.

2.2.1. AIC Score. The Akaike Information Criterion (AIC) is used to measure the quality of statistical models for a dataset. The AIC comes from information theory and estimates the information lost when a statistical model is used to represent the data-generating process. It deals with the trade-off between the goodness of fit of the model and the complexity of the model. Based on the nature of the AIC, we construct an AIC model from the perspective of the GWAS dataset. The goal of our method is to measure the relationship between the genotype data of the genome and the phenotypic disease trait. Logistic regression is widely used to quantitatively analyze the correlation between a dependent variable and independent variables. Based on the above methods, we construct an AIC score model containing logistic regression and a gradient penalty function. Logistic regression yields the maximized log-likelihood of the model, and $k$ expresses the number of free parameters. Following the strategy of Jing and Shen [28], we define

\[
\mathrm{AICscore} = 2k - 2\log \mathrm{lik}, \tag{6}
\]

where $k$ denotes the number of free parameters and $\log \mathrm{lik}$ is the maximized log-likelihood of the logistic regression model.
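Equation (6) is straightforward to compute once a logistic regression model has been fitted. The sketch below is illustrative only: the function names are hypothetical, the fitting step is omitted, and the fitted case probabilities are assumed to be supplied by whichever logistic regression routine is used.

```python
import math


def log_likelihood(labels, probs):
    """Bernoulli log-likelihood of 0/1 labels under fitted case probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probs)
    )


def aic_score(k, loglik):
    """AIC score as in (6): 2k - 2 log lik; lower values indicate a better trade-off."""
    return 2 * k - 2 * loglik


# Example: a model with k = 3 free parameters and a log-likelihood of -10.
example_score = aic_score(3, -10.0)
```

A lower score means that the model explains the phenotype well without using many parameters, which is why the first objective is minimized.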

2.2.2. Explanation Score. In GWAS research on the relationship between two loci and a disease, each SNP locus takes one of three values, 0, 1, and 2: 0 means major-allele homozygous, 1 means heterozygous, and 2 means minor-allele homozygous [31]. For two loci, there are nine possible genotype combinations, and the frequency of a disease-related combination often changes when the disease occurs. In the two-locus case, $x_i$ denotes the $i$th combination of the two SNP loci and $Y$ denotes the case or control state, where $y_1$ means case and $y_2$ means control, so that $x_i y_1$ and $x_i y_2$ are the numbers of case and control samples carrying the $i$th combination. The potential interrelationship of the two discrete random variables $X$ and $Y$ is defined as $H(X, Y)$; the relationship between locus combination and disease is measured from the locus frequency information. $H(X, Y)$ is defined as

\[
H(X, Y) = \sum_{i=1}^{I} \left| x_i y_1 - x_i y_2 \right|, \tag{7}
\]

where $I$ is the total number of locus combinations. An unbalanced sample size would affect the score. For example, if there are more case samples than control samples, we randomly extract from the case samples a subset of the same size as the control data. To reduce the impact of randomness, we repeat the extraction several times and average the results. A large value of $H$ means that the potential association between the disease and the SNPs is strong. Equation (7) can also be applied to combinations of more than two loci. We call this score the explain score.
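A minimal sketch of (7) is given below, under the assumption that $x_i y_1$ and $x_i y_2$ are the counts of the $i$th genotype combination among cases and controls. The function name and the label encoding (1 for case, 0 for control) are hypothetical, and the repeated subsampling used to balance cases and controls is left out.

```python
from collections import Counter


def explain_score(genotypes, labels):
    """Sum over genotype combinations of |count in cases - count in controls|.

    genotypes: one tuple per sample, e.g. (0, 2) for a two-locus combination.
    labels: 1 for case, 0 for control (an assumed encoding).
    """
    case_counts = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    control_counts = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    combos = set(case_counts) | set(control_counts)
    return sum(abs(case_counts[c] - control_counts[c]) for c in combos)


# Two cases share one combination and two controls share another: score 4.
separated = explain_score([(0, 0), (0, 0), (1, 2), (1, 2)], [1, 1, 0, 0])
# The same combination appears once in each group: score 0.
mixed = explain_score([(0, 0), (0, 0)], [1, 0])
```

Because tuples of any length can serve as keys, the same function covers combinations of more than two loci.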

2.3. Pareto Optimality for SNP Epistasis Detection. Pareto optimality addresses the situation in which it is impossible to make all objective function values of a multiobjective optimization problem optimal at the same time [32, 33]. Pareto optimality was first applied to income distribution and economics and has since been extended to engineering and multiobjective optimization research. Building on the methods proposed above, the modified ant colony optimization algorithm uses two objective functions: the first is the AIC score with logistic regression and related parameters, and the second is the explain score. For the first objective function, a lower score indicates a stronger potential relationship between the disease phenotype and the SNPs [34]. For the second objective function, a higher score indicates a stronger potential relationship between the disease phenotype and the SNPs. The goal of the fast two-stage ant colony optimization algorithm is to find the epistasis effect among SNPs and to extract the truly associated SNP subset with respect to the methods proposed above.

In real GWAS datasets, an identified SNP subset may perform best among all candidate solutions in terms of one objective function but poorly in terms of the other. Thus, the goal is to select SNP subsets that are good with respect to both objective functions. In practice, a subset rarely outperforms all other solutions on both objectives at once. For a framework with two objective functions, it is therefore hard and even impossible to calculate a single globally optimal solution. On the basis of previous studies [28, 34, 35], we adopt Pareto optimality to find practical optimal solutions. Consider two solutions, S1 and S2, each a GWAS SNP subset. Comparing S1 and S2 has only two possible outcomes: either one solution dominates the other, or neither solution dominates the other. Following Pareto optimality, we say that S1 dominates S2 if the following two conditions hold. First, $f_e(\mathrm{S1})$ is not higher than $f_e(\mathrm{S2})$ for both objective functions. Second, $f_e(\mathrm{S1})$ is lower than $f_e(\mathrm{S2})$ for at least one objective function. Here $f_e$ denotes an objective function: $e = 1$ denotes the first objective function (the modified AIC score) and $e = 2$ denotes the second objective function (the explain score). If S1 and S2 satisfy these two conditions, we say that S1 is nondominated with respect to S2 and that S2 is a dominated solution. With this Pareto optimality approach and the two objective functions, all solutions can be divided into two kinds: the nondominated set and the dominated set. The nondominated set generally contains many solutions, all obtained by our proposed method with respect to the two objective functions; our goal is now to find the nondominated set that is best under certain conditions.

Next, we use the judgment rule described above to sort the solutions of the nondominated sets and find the optimal nondominated set. Specifically, in the first case, $f_1(\mathrm{S2})$ is larger than $f_1(\mathrm{S1})$ and, at the same time, $f_2(\mathrm{S2})$ is larger than $f_2(\mathrm{S1})$. In the second case, $f_1(\mathrm{S2})$ equals $f_1(\mathrm{S1})$ and, at the same time, $f_2(\mathrm{S2})$ is larger than $f_2(\mathrm{S1})$. In the third case, $f_1(\mathrm{S2})$ is larger than $f_1(\mathrm{S1})$ and, at the same time, $f_2(\mathrm{S2})$ equals $f_2(\mathrm{S1})$.
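The dominance rule and the three cases above can be sketched in a few lines. The sketch assumes that both objectives have been expressed so that lower is better (for example, by negating the explain score, which is a choice made here for illustration); the function names are hypothetical.

```python
def dominates(s1, s2):
    """True if s1 dominates s2: no objective worse, at least one strictly better.

    s1 and s2 are tuples of objective values, all to be minimized.
    """
    not_worse = all(a <= b for a, b in zip(s1, s2))
    strictly_better = any(a < b for a, b in zip(s1, s2))
    return not_worse and strictly_better


def nondominated(solutions):
    """Return the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(t, s) for t in solutions if t is not s)
    ]


# Objective pairs (f1, f2); (3, 3) is dominated by (2, 2), the rest trade off.
front = nondominated([(1, 5), (2, 2), (3, 3), (5, 1)])
```

This pairwise filter is quadratic in the number of candidate solutions, which is usually acceptable for the small candidate sets produced per iteration.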

2.4. Fisher Exact Test for Experimental Results. The Fisher exact test is used on contingency tables to assess statistical significance [36–38]. Although in practice it is mostly used for small samples, it is valid for all sample sizes. The method was first proposed by Ronald Fisher and is one of a class of exact tests.

For our GWAS datasets, the unified framework, which comprises the fast adaptive ant colony optimization (ACO) algorithm, the Akaike Information Criterion (AIC) score, the explain score, and Pareto optimality, yields a final result that is a nondominated SNP set. In this section, we use the Fisher exact test to exhaustively search this set for epistasis effects. The Fisher exact test is based on the hypergeometric distribution, and its $P$ value is exact for all sample sizes. The test is applied to a contingency table. The null hypothesis is that the identified SNP subset and the disease are not associated. The alternative hypothesis is that the SNP subset affects the expression of the disease; it is accepted when the $P$ value of the Fisher exact test is significant, that is, less than a predetermined value such as 0.05 or smaller. In this way, our proposed method identifies significant SNP subsets.
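To make the hypergeometric basis of the test concrete, the sketch below computes a two-sided $P$ value for a 2 × 2 table using only the standard library. It is a simplification: genotype combinations generally give larger r × c tables, for which a network algorithm such as that of [30] is used, and the function name and the tie tolerance are assumptions made for this illustration.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties in floating point are included.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: case / control; columns: carries the combination / does not.
p_value = fisher_exact_2x2(3, 1, 1, 3)
```

For the table above the margins are all 4, and the tables at least as extreme as the observed one have total probability 34/70, so the association would not be called significant at 0.05.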

2.5. Power Test. In the previous sections, we introduced each part of our proposed fast adaptive ant colony optimization algorithm for detecting SNP epistasis. The unified framework contains the fast adaptive ant colony optimization algorithm, the Akaike Information Criterion (AIC) score, the explain score, Pareto optimality, and the modified Fisher exact test. In this section, we describe how to verify the significance of the results. We construct 100 datasets with the same parameters and then use the traditional power test to measure the performance of the methods. The power is defined as follows:

\[
\mathrm{Power} = \frac{|S_D|}{100}, \tag{8}
\]

where $|S_D|$ denotes the number of datasets, out of the 100, in which the disease-related SNPs were correctly selected. A single test criterion may not clearly show the quality of the results, so we also use precision and recall. Precision-recall criteria are widely used in the evaluation of classification models [39, 40]. In pattern recognition and information retrieval with binary classification, precision, also called positive predictive value, is the fraction of retrieved instances that are relevant, while recall, also known as sensitivity, is the fraction of relevant instances that are retrieved [26]. Both precision and recall are therefore based on an understanding and measure of relevance. We use the precision-recall criteria to judge whether the classification results are good or bad. These criteria remain informative when the classes are imbalanced, as in our research, where the numbers of associated and nonassociated SNP subsets differ greatly. In terms of SNP epistasis research, precision is the fraction of identified SNP subsets that are truly disease related, and recall is the fraction of truly disease-related SNP subsets that are identified. A single indicator, such as the false positive rate alone, cannot give a clear picture of the real result, which is why we report both precision and recall. We also use the $F_1$ score (also called $F$ score or $F$ measure) to summarize the accuracy of the test. Precision and recall are defined next with the confusion matrix (Figure 1).

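\[
\begin{aligned}
\mathrm{recall} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \\
\mathrm{precision} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \\
F_1 &= \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.
\end{aligned}
\tag{9}
\]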

| True class | Predicted class: Associated | Predicted class: Nonassociated |
|---|---|---|
| Associated | True positive (TP) | False negative (FN) |
| Nonassociated | False positive (FP) | True negative (TN) |

Figure 1: Precision recall explanation matrix.

Precision, also known as positive predictive value, is the proportion of true positives among the reported results, that is, the number of true positives divided by the sum of true positives and false positives; it reflects how many false positives an algorithm reports. Recall, also known as sensitivity, is the ratio of true positives to the sum of true positives and false negatives. For the SNP selection problem, the larger the recall, the more of the truly disease-related SNP combinations are found. Likewise, the larger the precision, the higher the proportion of truly disease-related SNP combinations among the identified SNP combinations. The $F$ measure is the harmonic mean of precision and recall, a synthesized measure that combines both [41].
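The evaluation measures of this section can be computed directly from the confusion-matrix counts. The sketch below uses hypothetical function names and computes $F_1$ as the harmonic mean of precision and recall, as described above; returning 0 when a denominator is zero is a convention chosen here, not one stated in the text.

```python
def power(n_detected, n_datasets=100):
    """Fraction of simulated datasets in which the true interaction was found, as in (8)."""
    return n_detected / n_datasets


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Example: 8 true positives, 2 false positives, 2 false negatives.
example_precision, example_recall, example_f1 = precision_recall_f1(8, 2, 2)
```

In this example precision and recall are both 0.8, so their harmonic mean is 0.8 as well.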

3. Simulation Experiments

3.1. Compared with One-Objective Function. In this section, we use simulated data to compare our proposed method with other existing methods. To avoid bias in the data caused by the model, we adopt the BEAM package to generate the simulated datasets [17]. Data were simulated under three genetic models: (1) additive model, (2) epistatic interactions with multiplicative effects, and (3) epistatic interactions with threshold effects. For brevity, the additive model is referred to as ADDME, the model of epistatic interactions with multiplicative effects as EIME, and the model of epistatic interactions with threshold effects as EITEME. In the following, we use these short names to indicate the corresponding data models.

Because our method is a two-objective SNP epistasis search method, we first compared it with existing single-objective exhaustive SNP epistasis search methods to demonstrate the effectiveness of the two-objective search. Second, we compared it with BEAM [17], the generic ACO algorithm, and AntEpiSeeker [16]. In a one-objective SNP epistasis search method, the objective function is used to score every SNP combination; in general, different SNP combinations receive different scores. A low score indicates that the association between the SNP combination and the disease is relatively weak, and a high score indicates that it is relatively strong. The one-objective function then ranks all SNP combinations by score. The two-objective SNP epistasis search method, however, finds a set of nondominated results, and every nondominated SNP epistasis result is regarded as equally good. To ensure fairness, for each one-objective function we take from the top of its SNP ranking the same number of results as the two-objective method returns. The comparison shows that the two-objective SNP epistasis search method is better than the one-objective methods in all three simulated data models, while the two one-objective methods perform similarly to each other. The simulation results demonstrate the effectiveness of the two-objective search and the insufficiency of a single objective function. The results are shown in Figure 2, whose abscissa is the minor allele frequency (MAF), set to 0.1, 0.2, and 0.5.

Figure 2: Power test comparisons between one-objective and two-objective methods on three different models with MAF values 0.1, 0.2, and 0.5. (Panels: ADDME, EIME, EITEME; horizontal axis: MAF for $r^2$ = 0.7 and $r^2$ = 1.0; vertical axis: power, 0 to 1; methods: K2 score, En score, FAACOSE.)

We generated the simulated datasets and chose the parameter settings following previous studies [17, 42–44]. For each parameter combination, we generated 100 datasets, each containing 2000 samples (1000 case samples and 1000 control samples) and 1000 SNPs. We evaluate algorithm performance by the ratio of datasets in which the true interaction is identified at a significance level of 0.01 after Bonferroni correction. The parameter $\lambda$ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. The linkage disequilibrium between SNPs, measured by $r^2$, ranges from 0.7 to 1.
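As a rough illustration of the Bonferroni-corrected threshold, the sketch below divides the nominal significance level by the number of tests. The text does not state how many tests enter the correction, so counting every pairwise SNP combination is an assumption made here, and the function name is hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha, n_snps, order=2):
    """Per-test significance level when all SNP combinations of a given order are tested."""
    n_tests = comb(n_snps, order)
    return alpha / n_tests


# 1000 SNPs give 499500 pairwise combinations under this assumption.
n_pairs = comb(1000, 2)
threshold = bonferroni_threshold(0.01, 1000)
```

Under this assumption a pairwise interaction would need a $P$ value of roughly 2e-8 to be declared significant.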

3.2. Compared with Benchmark Methods. After the comparison with single-objective functions, we compare our proposed method with existing methods. The performance of the proposed method was evaluated by comparison with benchmark methods [45]. Many previous studies have already discussed the parameter settings, and in this section we set the parameters according to the existing strategy. We evaluated the performance of FAACOSE by comparing it with three existing methods: BEAM, the generic ACO algorithm, and AntEpiSeeker. We use the BEAM package and the previous parameter strategy to generate the simulated datasets. Note that the generic ACO algorithm cannot select larger SNP sets. We use the simulated datasets introduced in Section 3.1 and evaluate algorithm performance by the ratio of datasets in which the true interaction is identified at a significance level of 0.01 after Bonferroni correction. We generate simulated datasets under three genetic models: ADDME, EIME, and EITEME.

Figure 3: Power comparisons between existing methods and FAACOSE on three models. (Panels: ADDME, EIME, EITEME; horizontal axis: MAF = 0.1, 0.2, 0.5 for $r^2$ = 0.7 and $r^2$ = 1.0; vertical axis: power, 0 to 1; methods: BEAM, gACO, AntEpiSeeker, FAACOSE.)

Other parameters for data simulation were the effect size $\lambda$, a measure of marginal effects as defined by Marchini et al. [42]; the linkage disequilibrium between SNPs, measured by $r^2$; and the minor allele frequencies (MAFs). $\lambda$ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. For $r^2$, two values (0.7 and 1.0) were used for each model. For MAFs, three values (0.1, 0.2, and 0.5) were considered. The parameters for BEAM were set to their defaults. The parameter settings for AntEpiSeeker were large dataset size = 6, small dataset size = 3, count large = 150, count small = 300, epistasis model = 2, ant count = 1000, $\alpha$ = 1, $\rho$ = 0.05, and $\tau_0$ = 100 (also available in the software package documentation of AntEpiSeeker). The parameters of the generic ACO algorithm were set as ant count = 1000, $\alpha$ = 1, $\rho$ = 0.05, $\tau_0$ = 100, count (number of iterations) = 900, and epistasis model = 2. The comparison of detection power for BEAM, the generic ACO algorithm, AntEpiSeeker, and FAACOSE is presented in Figure 3. The results show that FAACOSE outperforms BEAM and the generic ACO in all parameter settings and is superior to AntEpiSeeker in most parameter settings.

In this section, we compare our proposed method with benchmark methods. First, we use the power test to determine how many true SNP subsets can be found by our proposed method. Second, we use precision, recall, and $F_1$ score to evaluate the results. Precision denotes the proportion of correct SNP subsets among all identified SNP subsets. Recall denotes the proportion of true SNP subsets that are identified. The $F_1$ score is an indicator used in statistics to measure the accuracy of binary classification models. It takes into account the precision and the recall of the classification model simultaneously: it can be seen as a weighted average of precision and recall, with a maximum of 1 and a minimum of 0. Table 1 shows the results of FAACOSE and the other methods for $r^2$ = 0.7 and MAF = 0.2.

The $F_1$ score of FAACOSE is better than that of the other methods. We ran the same experiment on datasets with different parameter combinations. Of the eighteen datasets, FAACOSE has the highest $F_1$ score in fifteen. In real GWAS experiments, the sample size of the dataset is huge, so the efficiency of the method must also be considered. The experimental results indicate that our proposed method is the more effective method on real GWAS datasets. AntEpiSeeker is the most efficient of the three existing methods. On different data samples, we compared the run times of AntEpiSeeker and FAACOSE and averaged the results: FAACOSE is 30% faster than AntEpiSeeker.

8 Complexity

Table 1: $F_1$ score comparison between FAACOSE and other methods.

Model     Method          Recall    Precision    F1 score
ADDME     BEAM            0.29      0.15         0.20
ADDME     gACO            0.45      0.36         0.40
ADDME     AntEpiSeeker    0.6       0.55         0.57
ADDME     FAACOSE         0.82      0.74         0.78
EIME      BEAM            0.3       0.45         0.36
EIME      gACO            0.35      0.32         0.33
EIME      AntEpiSeeker    0.34      0.56         0.42
EIME      FAACOSE         0.9       0.82         0.86
EITEME    BEAM            0.1       0.14         0.12
EITEME    gACO            0.15      0.20         0.17
EITEME    AntEpiSeeker    0.54      0.46         0.50
EITEME    FAACOSE         0.65      0.62         0.63

4. Application to Real SNP Dataset

Late-Onset Alzheimer's Disease (LOAD) is the most frequent form of Alzheimer's disease (AD) and is usually identified in people older than 65 years. LOAD is a chronic neurodegenerative disease whose onset is often not obvious and which slowly progresses to dementia over time. It is the cause of 60% to 70% of cases of dementia. The most common early symptom is difficulty in remembering recent events (short-term memory loss). As the disease advances, symptoms can include problems with language, disorientation (including easily getting lost), mood swings, loss of motivation, not managing self-care, and behavioural issues. LOAD is a multifactor genetic disease; its etiology and pathogenesis have not yet been fully understood. The apolipoprotein E (APOE) gene is a definite risk factor for LOAD. The APOE gene has three forms: ε2, ε3, and ε4. The effect of ε2 is positive; ε2 can effectively prevent the occurrence of the disease. Research has reported that the genetic variant ε4 has an inducing effect on the disease. Between 40% and 80% of people with AD possess at least one APOE ε4 allele [46]. Previous studies have reported some significant SNPs in the field of Genome-Wide Association Studies [47]. Reference [47] reported that 10 SNPs in the GAB2 gene region have an epistasis effect with APOE ε4 in relation to Late-Onset Alzheimer's Disease. We applied our proposed method to the LOAD GWAS dataset from the website https://www.tgen.org [47]. After data preprocessing, the real biological dataset contains 1368 samples [48, 49]. Of these, 836 samples were identified as cases and the remaining 532 samples were normal controls [50, 51]. Each sample of the real biological dataset contains 309,316 SNPs with genotype information, APOE status, and LOAD status [52]. For the subsequent calculation, we code the APOE gene state with a binary variable: the value 1 represents the ε4 variant and the value 0 represents the other variants [53]. An SNP locus was coded as a quaternary variable to account for the missing state. The SNPs with high potential association with LOAD are shown in Table 2.

Table 2: The selected SNPs of FAACOSE in the LOAD dataset.

SNP rs
rs7756992    rs611154     rs191840     rs7294919
rs1887922    rs304900     rs1999764    rs1385600
rs2373115    rs7101429    rs609812     rs613375
rs1007837    rs2510038    rs4945261    rs10793294
rs520227     rs191740     rs7924284    rs829465
rs602106     rs7174511    rs606889     rs602192

5. Discussions

In this paper, we proposed a novel ant colony optimization based fast search method for the discovery of epistatic interactions in large-scale real GWAS datasets. FAACOSE was evaluated through comparison with three existing approaches on both simulated and real datasets. FAACOSE, which adopts a fast adaptive optimization procedure with a two-objective function, is a modified algorithm derived from the generic ACO. To demonstrate the advantages of the fast adaptive ant colony optimization algorithm, we also compared the performance of FAACOSE with that of the generic ACO.

In future studies, we intend to find more powerful modeling approaches: ant colony optimization algorithms with faster convergence, objective functions that better measure the data structure of GWAS datasets, and more efficient optimal SNP subset search and identification strategies that can be combined and flexibly embedded into our SNP epistasis search framework to find more accurate SNP subsets. With the rapid development of bioinformatics, more and more biological information related to disease is being identified, and more and more studies will consider prior knowledge. An important future research direction is to apply expert prior knowledge to GWAS datasets with our proposed method, the fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Expert prior knowledge can improve the power and efficiency of epistasis detection.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (Grant nos. 61520106006, 31571364, 61732012, 61532008, U1611265, 61672382, 61402334, 61472280, 61472173, 61572447, 61672203, 61472282, and 61373098), the China Postdoctoral Science Foundation (Grant nos. 2014M561513, 2015M580352, 2017M611619, and 2016M601646), and the Guangxi Bagui Scholars Program Special Fund.


References

[1] J. N. Hirschhorn and M. J. Daly, "Genome-wide association studies for common diseases and complex traits," Nature Reviews Genetics, vol. 6, no. 2, pp. 95–108, 2005.

[2] B. N. Howie, P. Donnelly, and J. Marchini, "A flexible and accurate genotype imputation method for the next generation of genome-wide association studies," PLoS Genetics, vol. 5, no. 6, Article ID e1000529, 2009.

[3] T. A. Manolio, F. S. Collins, N. J. Cox et al., "Finding the missing heritability of complex diseases," Nature, vol. 461, no. 7265, pp. 747–753, 2009.

[4] B. S. Shastry, "SNP alleles in human disease and evolution," Journal of Human Genetics, vol. 47, no. 11, pp. 561–566, 2002.

[5] B. Stubbs, D. Vancampfort, M. De Hert, and A. J. Mitchell, "The prevalence and predictors of type two diabetes mellitus in people with schizophrenia: a systematic review and comparative meta-analysis," Acta Psychiatrica Scandinavica, vol. 132, no. 2, pp. 144–157, 2015.

[6] K. P. Liao, "Cardiovascular disease in patients with rheumatoid arthritis," Trends in Cardiovascular Medicine, vol. 27, no. 2, pp. 136–140, 2017.

[7] Y. Mao, N. R. London, L. Ma, D. Dvorkin, and Y. Da, "Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model," Physiological Genomics, vol. 28, no. 1, pp. 46–52, 2006.

[8] W. Zhang, J. Zhu, E. E. Schadt, and J. S. Liu, "A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules," PLoS Computational Biology, vol. 6, no. 1, Article ID e1000642, 2010.

[9] M. Kang, C. Zhang, H.-W. Chun, C. Ding, C. Liu, and J. Gao, "EQTL epistasis: Detecting epistatic effects and inferring hierarchical relationships of genes in biological pathways," Bioinformatics, vol. 31, no. 5, pp. 656–664, 2015.

[10] H. Lin, D. Chen, P. Huang et al., "SNP interaction pattern identifier (SIPI): an intensive search for SNP–SNP interaction patterns," Bioinformatics, 2016.

[11] R. L. Prentice and L. Qi, "Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation," Biostatistics, vol. 7, no. 3, pp. 339–354, 2006.

[12] S.-P. Deng, L. Zhu, and D.-S. Huang, "Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks," BMC Genomics, vol. 16, no. 3, article no. S4, 2015.

[13] S.-P. Deng and D.-S. Huang, "SFAPS: An R package for structure/function analysis of protein sequences based on informational spectrum method," Methods, vol. 69, no. 3, pp. 207–212, 2014.

[14] J. H. Moore, J. M. Lamb, N. J. Brown, and D. E. Vaughan, "A comparison of combinatorial partitioning and linear regression for the detection of epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on plasma PAI-1 Levels," Clinical Genetics, vol. 62, no. 1, pp. 74–79, 2002.

[15] B. M. Michael, R. E. Neapolitan, X. Jiang, and V. Shyam, "Learning genetic epistasis using Bayesian network scoring criteria," BMC Bioinformatics, vol. 12, no. 1, 89 pages, 2011.

[16] Y. Wang, X. Liu, K. Robbins, and R. Rekaya, "AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm," BMC Research Notes, vol. 3, article 117, 2010.

[17] Y. Zhang and J. S. Liu, "Bayesian inference of epistatic interactions in case-control studies," Nature Genetics, vol. 39, no. 9, pp. 1167–1173, 2007.

[18] M. Dorigo, M. Birattari, and C. Blum, "Ant colony optimization and swarm intelligence," Springer Verlag, vol. 5217, no. 8, pp. 767–771, 2004.

[19] T. Stützle, M. López-Ibáñez, P. Pellegrini et al., "Parameter adaptation in ant colony optimization," Autonomous Search, vol. 9783642214349, pp. 191–215, 2012.

[20] C. Blum and M. Sampels, "An ant colony optimization algorithm for shop scheduling problems," Journal of Mathematical Modelling and Algorithms, vol. 3, no. 3, pp. 285–308, 2004.

[21] R. Musa, J.-P. Arnaout, and H. Jung, "Ant colony optimization algorithm to solve for the transportation problem of cross-docking network," Computers and Industrial Engineering, vol. 59, no. 1, pp. 85–92, 2010.

[22] G. N. Varela and M. C. Sinclair, "Ant colony optimisation for virtual-wavelength-path routing and wavelength allocation," in Proceedings of the 1999 Congress on Evolutionary Computation (CEC '99), pp. 1809–1816, Washington, DC, USA, July 1999.

[23] K. M. Sim and W. H. Sun, "Ant colony optimization for routing and load-balancing: survey and new directions," Systems, Man & Cybernetics, Part A: Systems & Humans, IEEE Transactions on, vol. 33, no. 5, pp. 560–572, 2003.

[24] S.-H. Ngo, X. Jiang, and S. Horiguchi, "Adaptive routing and wavelength assignment using ant-based algorithm," in Proceedings of the 2004 12th IEEE International Conference on Networks, ICON 2004 - Unity in Diversity, pp. 482–486, November 2004.

[25] S. I. Vrieze, "Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC)," Psychological Methods, vol. 17, no. 2, pp. 228–243, 2012.

[26] D.-S. Huang and J.-X. Du, "A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks," IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2099–2115, 2008.

[27] B. V. North, D. Curtis, and P. C. Sham, "Application of logistic regression to case-control association studies involving two causative loci," Human Heredity, vol. 59, no. 2, pp. 79–87, 2005.

[28] P.-J. Jing and H.-B. Shen, "MACOED: A multi-objective ant colony optimization algorithm for SNP epistasis detection in genome-wide association studies," Bioinformatics, vol. 31, no. 5, pp. 634–641, 2015.

[29] N. Ryman, "CHIFISH: A computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher's exact test," Molecular Ecology Notes, vol. 6, no. 1, pp. 285–287, 2006.

[30] C. R. Mehta and N. R. Patel, "A network algorithm for performing Fisher's exact test in r × c contingency tables," Journal of the American Statistical Association, vol. 78, no. 382, pp. 427–434, 1983.

[31] B. Sobrino, M. Brión, and A. Carracedo, "SNPs in forensic genetics: A review on SNP typing methodologies," Forensic Science International, vol. 154, no. 2-3, pp. 181–194, 2005.

[32] O. Shoval, H. Sheftel, G. Shinar et al., "Evolutionary trade-offs, pareto optimality, and the geometry of phenotype space," Science, vol. 336, no. 6085, pp. 1157–1160, 2012.

[33] D.-S. Huang and W. Jiang, "A general CPL-AdS methodology for fixing dynamic parameters in dual environments," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1489–1500, 2012.


[34] L. Zhu, W.-L. Guo, S.-P. Deng, and D.-S. Huang, "ChIP-PIT: enhancing the analysis of chip-seq data using convex-relaxed pair-wise interaction tensor decomposition," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 55–63, 2016.

[35] C. Angione, G. Carapezza, J. Costanza, P. Lio, and G. Nicosia, "Pareto optimality in organelle energy metabolism analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1032–1044, 2013.

[36] R. A. Fisher, "On the Interpretation of χ2 from Contingency Tables, and the Calculation of P," Journal of the Royal Statistical Society, vol. 85, no. 1, p. 87, 1922.

[37] A. Agresti, "A survey of exact inference for contingency tables," Statistical Science, vol. 7, no. 1, pp. 131–153, 1992.

[38] B. Wenzheng, C. Yuehui, and W. Dong, "Prediction of protein structure classes with flexible neural tree," Bio-Medical Materials and Engineering, vol. 24, no. 6, pp. 3797–3806, 2014.

[39] L. Zhu, Z.-H. You, D.-S. Huang, and B. Wang, "t-LSE: a novel robust geometric approach for modeling protein-protein interaction networks," PLoS ONE, vol. 8, no. 4, Article ID e58368, 2013.

[40] C.-H. Zheng, L. Zhang, V. T.-Y. Ng, C. K. Shiu, and D.-S. Huang, "Molecular pattern discovery based on penalized matrix decomposition," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1592–1603, 2011.

[41] D.-S. Huang and H.-J. Yu, "Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 457–467, 2013.

[42] J. Marchini, P. Donnelly, and L. R. Cardon, "Genome-wide strategies for detecting multiple loci that influence complex diseases," Nature Genetics, vol. 37, no. 4, pp. 413–417, 2005.

[43] R. Jiang, W. Tang, X. Wu, and W. Fu, "A random forest approach to the detection of epistatic interactions in case-control studies," BMC Bioinformatics, vol. 10, no. 1, article S65, 2009.

[44] J. Kruppa, A. Ziegler, and I. R. König, "Risk estimation and risk prediction using machine-learning methods," Human Genetics, vol. 131, no. 10, pp. 1639–1654, 2012.

[45] D.-S. Huang and C.-H. Zheng, "Independent component analysis-based penalized discriminant method for tumor classification using gene expression data," Bioinformatics, vol. 22, no. 15, pp. 1855–1862, 2006.

[46] R. W. Mahley, K. H. Weisgraber, and Y. Huang, "Apolipoprotein E4: a causative factor and therapeutic target in neuropathology, including Alzheimer's disease," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 15, pp. 5644–5651, 2006.

[47] E. M. Reiman, J. A. Webster, A. J. Myers et al., "GAB2 alleles modify Alzheimer's Risk in APOE ε4 carriers," Neuron, vol. 54, no. 5, pp. 713–720, 2007.

[48] C.-H. Zheng, D.-S. Huang, L. Zhang, and X.-Z. Kong, "Tumor clustering using nonnegative matrix factorization with gene selection," IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 599–607, 2009.

[49] S.-P. Deng, L. Zhu, and D.-S. Huang, "Predicting hub genes associated with cervical cancer through gene co-expression networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 27–35, 2016.

[50] L. Zhu, S.-P. Deng, and D.-S. Huang, "A two-stage geometric method for pruning unreliable links in protein-protein networks," IEEE Transactions on Nanobioscience, vol. 14, no. 5, pp. 528–534, 2015.

[51] D.-S. Huang, L. Zhang, K. Han, S. Deng, K. Yang, and H. Zhang, "Prediction of protein-protein interactions based on protein-protein correlation using least squares regression," Current Protein and Peptide Science, vol. 15, no. 6, pp. 553–560, 2014.

[52] D.-S. Huang, Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, May 1996.

[53] D.-S. Huang, "Radial basis probabilistic neural networks: model and application," International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, pp. 1083–1101, 1999.



genome and phenotype disease trait. Logistic regression is widely used to quantitatively analyze the correlation between a dependent variable and independent variables. Based on the above methods, we construct an AIC score model containing logistic regression and a gradient penalty function. Logistic regression is used to compute the maximized log-likelihood of the model, and $k$ expresses the number of free parameters. The AIC score deals with the trade-off between the goodness of fit of the model and the complexity of the model. We follow the strategy of Jing and Shen [28]:

$$\mathrm{AIC}_{\mathrm{score}} = 2k - 2\log(\mathrm{lik}), \tag{6}$$

where $k$ denotes the number of free parameters and lik denotes the maximized likelihood of the model.
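As a minimal sketch of equation (6), the following Python snippet computes the AIC score from a log-likelihood. The helper names, the toy labels, and the predicted probabilities are hypothetical; in practice the probabilities would come from a fitted logistic regression, and the gradient penalty term mentioned above is not modeled here.

```python
import math

def bernoulli_log_likelihood(labels, probs):
    """Log-likelihood of 0/1 labels under predicted case probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probs)
    )

def aic_score(log_lik, k):
    """Equation (6): 2k - 2 * log-likelihood; lower values are better."""
    return 2 * k - 2 * log_lik

# Hypothetical fitted probabilities from a logistic model with k = 3
# free parameters (an intercept plus two SNP coefficients).
labels = [1, 1, 0, 0]
probs = [0.8, 0.6, 0.3, 0.1]
log_lik = bernoulli_log_likelihood(labels, probs)
score = aic_score(log_lik, k=3)
print(round(score, 3))
```

A better-fitting model raises the log-likelihood and lowers the score, while each extra free parameter adds 2 to it.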

2.2.2. Explanation Score. GWAS research studies the relationship between loci and disease. In SNP research, each locus takes one of three values, 0, 1, and 2: 0 means major allele homozygous, 1 means heterozygous, and 2 means minor allele homozygous [31]. For two loci, there are nine possible genotype combinations. The genotype at a disease-related SNP locus often changes when the disease occurs. In the case of a two-locus combination, $x_i$ denotes the number of samples carrying the $i$th combination of the two SNP loci. $Y$ denotes the case or control state: $y_1$ denotes the case state and $y_2$ the control state. The potential interrelationship of the two discrete random variables $X$ and $Y$ is defined as $H(X, Y)$; the relationship between a locus combination and the disease is measured based on the frequency of the locus combination. $H(X, Y)$ is defined as follows:

$$H(X, Y) = \sum_{i=1}^{I} \left| x_i y_1 - x_i y_2 \right|, \tag{7}$$

where $I$ denotes the total number of locus combinations. An unbalanced sample size would affect the score. For example, if the number of cases is larger than the number of controls, we randomly extract from the case samples a subset of the same size as the control data. To reduce the impact of randomness, we repeat the extraction several times and average the results. A large value of $H$ means that the potential association between the disease and the SNPs is strong. Equation (7) can also be applied to combinations of more than two loci. We name this score the explain score.
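The explain score can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the terms $x_i y_1$ and $x_i y_2$ are read as the numbers of case and control samples carrying the $i$th genotype combination, the sample is assumed to be already balanced, and the function name and toy data are hypothetical.

```python
from collections import Counter

def explain_score(genotypes, status):
    """Equation (7): sum over genotype combinations of |cases - controls|.

    genotypes: list of tuples, one genotype combination per sample,
               with each locus coded 0, 1, or 2.
    status:    list of 1 (case) or 0 (control), same length as genotypes.
    """
    case_counts = Counter(g for g, s in zip(genotypes, status) if s == 1)
    control_counts = Counter(g for g, s in zip(genotypes, status) if s == 0)
    combos = set(case_counts) | set(control_counts)
    # Counter returns 0 for combinations missing from one of the groups.
    return sum(abs(case_counts[c] - control_counts[c]) for c in combos)

# Toy balanced sample: 4 cases followed by 4 controls at two loci.
genotypes = [(0, 1), (0, 1), (2, 2), (1, 0), (0, 0), (0, 0), (1, 0), (2, 2)]
status = [1, 1, 1, 1, 0, 0, 0, 0]
score = explain_score(genotypes, status)
print(score)
```

Because the genotype tuples can have any length, the same function covers combinations of more than two loci.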

2.3. Pareto Optimality for SNP Epistasis Detection. Pareto optimality was proposed for situations in which it is impossible to make all objective function values of a multiobjective optimization problem optimal simultaneously [32, 33]. Pareto optimality was first applied to income distribution and economics. It has since been extended to engineering and multiobjective optimization research. Building on previously proposed methods, our modified ant colony optimization algorithm uses two objective functions: the first is the AIC score with logistic regression and related parameters; the second is the explain score. For the first objective function, a lower score indicates a stronger potential relationship between the disease phenotype trait and the SNPs [34]. For the second objective function, a higher score indicates a stronger potential relationship between the disease phenotype trait and the SNPs. The target of the fast two-stage ant colony optimization algorithm is to find the epistasis effect among SNPs and to extract the real SNP subset with respect to the two objective functions above.

In real GWAS datasets, an identified SNP subset may perform best among all candidate solutions in terms of one objective function but poorly in terms of the other. The goal is therefore to select SNP subsets that are good with respect to both objective functions. In practice, a subset rarely performs better than all other solutions on both objectives. Thus, for a framework with two objective functions, it is hard and sometimes impossible to find a single globally optimal solution. On the basis of previous studies [28, 34, 35], we adopt Pareto optimality to find the practical optimal solution. Consider two solutions, that is, two GWAS SNP subsets, named $S_1$ and $S_2$. Comparing $S_1$ and $S_2$ has only two possible outcomes: either one solution dominates the other, or neither dominates the other. Following the idea of Pareto optimality, we consider that $S_1$ dominates $S_2$ if the following two conditions are satisfied. First, $f_e(S_1)$ is not higher than $f_e(S_2)$ for both objective functions. Second, $f_e(S_1)$ is lower than $f_e(S_2)$ for at least one objective function. Here $f_e$ denotes an objective function: $e = 1$ denotes the first objective function (the modified AIC score) and $e = 2$ denotes the second (the explain score). If $S_1$ and $S_2$ satisfy these two conditions, we say that $S_1$ is a nondominated solution and $S_2$ is a dominated solution. Based on this Pareto optimality approach and the two objective functions, all solutions can be divided into two kinds: the nondominated set and the dominated set. The nondominated set may contain many solutions, so our goal is now to find the nondominated solutions that are best under certain conditions.

Next, we use the judgment rule mentioned earlier to sort the solutions of the nondominated sets and find the optimal nondominated set. Specifically, there are three cases. In the first case, $f_1(S_2)$ is larger than $f_1(S_1)$ and, at the same time, $f_2(S_2)$ is larger than $f_2(S_1)$. In the second case, $f_1(S_2)$ equals $f_1(S_1)$ and $f_2(S_2)$ is larger than $f_2(S_1)$. In the third case, $f_1(S_2)$ is larger than $f_1(S_1)$ and $f_2(S_2)$ equals $f_2(S_1)$. In each of these cases, $S_1$ dominates $S_2$.
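The dominance rule can be illustrated with a short Python sketch. It assumes that both objectives are expressed so that lower is better; because a higher explain score indicates a stronger association, the toy data below use the negated explain score as the second objective. That negation, the function names, and the candidate values are assumptions made for illustration only.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def nondominated(solutions):
    """Return the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(t, s) for t in solutions if t is not s)
    ]

# Hypothetical (AIC score, negated explain score) pairs for four SNP subsets.
candidates = [(100.0, -40.0), (95.0, -35.0), (110.0, -30.0), (95.0, -40.0)]
front = nondominated(candidates)
print(front)
```

The pairwise scan is quadratic in the number of candidates, which is acceptable for the small candidate sets an ant colony iteration typically returns.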

2.4. Fisher Exact Test for Experimental Results. The Fisher exact test is used on contingency tables to assess statistical significance [36–38]. Although in practice it is mostly used for small samples, it is valid for all sample sizes. Ronald Fisher first proposed this method, and it is one of a class of exact tests.

In our GWAS research, the unified framework, which contains the fast adaptive ant colony optimization (ACO) algorithm, the Akaike Information Criterion (AIC) score, the explain score, and Pareto optimality, yields a final result that is a nondominated SNP set. In this section, we use the Fisher exact test to exhaustively search this set for the epistasis effect. The Fisher exact test is based on the hypergeometric distribution, and its $P$ value is exact for all sample sizes. The test is applied to a contingency table. The null hypothesis is that the identified SNP subset and the disease are not associated. The alternative hypothesis is that the SNP subset affects the expression of the disease. The null hypothesis is rejected when the $P$ value of the Fisher exact test is significant, that is, less than a predetermined value such as 0.05 or smaller. In this way, our proposed method identifies significant SNP subsets.
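To make the hypergeometric basis of the test concrete, here is a minimal Python sketch of a two-sided Fisher exact test on a 2 × 2 table. The 2 × 2 layout (for example, carriers versus noncarriers of a genotype combination against case versus control) is a simplifying assumption; genotype tables for SNP combinations are generally larger and need the r × c generalization [30]. The function name is hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= observed * (1 + 1e-9)
    )

p_value = fisher_exact_2x2(3, 1, 1, 3)
print(p_value)
```

The small relative tolerance guards against floating-point ties when comparing table probabilities; `math.comb` requires Python 3.8 or later.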

2.5. Power Test. In the previous sections, we introduced each part of our proposed fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Our proposed unified framework contains the fast adaptive ant colony optimization algorithm, the Akaike Information Criterion (AIC) score, the explain score, Pareto optimality, and the modified Fisher exact test. In this section, we introduce how to verify the significance of the results. We construct 100 datasets according to the same parameters. Then we use the traditional power test to measure the effect of the methods. The power is defined as follows:

$$\mathrm{Power} = \frac{|S_D|}{100}, \tag{8}$$

where $|S_D|$ denotes the number of datasets, out of the 100, in which the disease-related SNP subset was correctly selected. A single test criterion may not clearly show the quality of the results, so we also use the precision and recall criteria, which reflect both true positives and false positives. Precision and recall have been widely used in the evaluation of classification models [39, 40]. In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved [26]. Both precision and recall are therefore based on an understanding and measure of relevance. We use the precision and recall criteria to determine whether the classification results are good or bad. These criteria are less affected by class imbalance; in our research, the numbers of positive and negative instances always differ greatly. In terms of SNP epistasis research, precision is the proportion of identified SNP subsets that are truly disease related, and recall is the proportion of truly disease-related SNP subsets that are identified. A single indicator, such as the false positive rate alone, cannot make the real result clear, which is why we use both precision and recall. We also use the $F_1$ score (also called $F$ score or $F$ measure) to measure the accuracy of the test. Precision and recall are introduced next with the confusion matrix (Figure 1).

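$$
\begin{aligned}
\mathrm{recall} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \\
\mathrm{precision} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \\
F_1 &= \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.
\end{aligned} \tag{9}
$$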

                              Predicted class
                              Associated             Nonassociated
True class    Associated      True positive (TP)     False negative (FN)
              Nonassociated   False positive (FP)    True negative (TN)

Figure 1: Precision recall explanation matrix.

Precision, also known as positive predictive value, is the number of true positives divided by the sum of the numbers of true positives and false positives; it is often used to report how strongly an algorithm's results are affected by false positives. Recall, also known as sensitivity, is the number of true positives divided by the sum of the numbers of true positives and false negatives. In terms of the SNP selection problem, the larger the recall, the more of the truly disease-related SNP combinations are found. Likewise, the larger the precision, the higher the proportion of truly disease-related SNP combinations among the identified SNP combinations. The $F$ measure is the harmonic mean of precision and recall and thus a synthesized measure combining both [41].
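The three measures can be computed directly from the confusion-matrix counts, as in the following Python sketch. The function name and the example counts are hypothetical. The $F_1$ score is computed as the harmonic mean, $2PR/(P + R)$, as the text describes.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 true subsets found, 2 false alarms, 2 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(precision, recall, f1)
```

True negatives do not enter any of the three measures, which is why they remain informative when nonassociated SNP subsets vastly outnumber associated ones.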

3. Simulation Experiments

3.1. Comparison with One-Objective Functions. In this section, we use simulation data to compare our proposed method with other existing methods. To avoid data bias caused by the model, we adopt the BEAM package to generate simulation datasets [17]. Data were simulated following three genetic models: (1) additive model, (2) epistatic interactions with multiplicative effects, and (3) epistatic interactions with threshold effects. For brevity, the additive model is referred to as ADDME, the model of epistatic interactions with multiplicative effects as EIME, and the model of epistatic interactions with threshold effects as EITEME. In the following, we use these short names to indicate the corresponding data models.

Because our method is a two-objective SNP epistasis search method, we first compared it with existing single-objective exhaustive SNP epistasis search methods to demonstrate the effectiveness of the two-objective SNP epistasis subset search. Second, we compared our proposed method with the previously proposed BEAM [17], the generic ACO algorithm, and AntEpiSeeker [16]. In a one-objective SNP epistasis search method, the objective function is used to score every SNP combination; in general, the scores of different SNP combinations differ. By the nature of the method, a low score indicates a relatively weak association between the SNP combination and the disease, and a high score indicates a relatively strong association. The one-objective method then ranks all SNP combinations by score. In contrast, the two-objective SNP epistasis search method finds a set of nondominated results, and all nondominated SNP epistasis results rank equally. To ensure fairness, for each one-objective function we take from the top of its SNP ranking the same number of results as the two-objective method returns. The comparison shows that the two-objective method is better than the one-objective methods in all three simulation data models. The two single-objective methods perform similarly to each other. The simulation results show the effectiveness of the two-objective method, and the poor results of the one-objective functions show their insufficiency. The experimental results are shown in Figure 2. The abscissa of Figure 2 is the minor allele frequency (MAF), which is set to 0.1, 0.2, and 0.5. We generated the simulated datasets and chose the parameter settings following many previous studies [17, 42–44]. For each parameter combination, we generated 100 datasets, each containing 2000 samples (1000 cases and 1000 controls) and 1000 simulated SNPs. We evaluate algorithm performance by calculating the proportion of datasets in which the real SNP subset is identified at the significance level 0.01 after Bonferroni correction. The parameter $\lambda$ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. Linkage disequilibrium between SNPs, measured by $r^2$, ranged from 0.7 to 1.

Figure 2: Power test comparisons between one-objective and two-objective methods (K2 score, En score, and FAACOSE) on three different models (ADDME, EIME, and EITEME) with MAF values 0.1, 0.2, and 0.5, at $r^2$ = 0.7 and $r^2$ = 1.0.
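The evaluation procedure can be sketched as follows. The sketch assumes that one test is performed per SNP pair, so that the Bonferroni correction divides the significance level by the number of pairs among 1000 SNPs; the source does not state the exact number of tests, and the detection outcomes below are hypothetical.

```python
from math import comb

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def power(detected_flags):
    """Fraction of simulated datasets with a correct detection (equation (8))."""
    return sum(detected_flags) / len(detected_flags)

n_snps = 1000
n_pair_tests = comb(n_snps, 2)  # assumption: one test per SNP pair
threshold = bonferroni_threshold(0.01, n_pair_tests)

# Hypothetical outcome: the true pair was detected in 87 of 100 datasets.
detected = [True] * 87 + [False] * 13
estimated_power = power(detected)
print(n_pair_tests, threshold, estimated_power)
```

With 100 datasets per parameter combination, `power` reduces to $|S_D|/100$.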

3.2. Compared with Benchmark Methods. After the comparison with one-objective functions, we compare our proposed method with existing methods. The performance of our proposed method was evaluated by comparison with benchmark methods [45]. Many previous studies have already discussed the parameter settings problem, and in this section we set the parameters according to those existing strategies. We evaluated the performance of FAACOSE by comparing it with three methods: BEAM, the generic ACO algorithm, and AntEpiSeeker. We use the BEAM package and the previous parameter strategy to generate the simulated datasets. Note that the generic ACO algorithm cannot select larger SNP sets. We use the simulated datasets introduced in Section 3.1. We evaluate algorithm performance by the proportion of datasets in which the true SNP subset is identified at the significance level 0.01 after Bonferroni correction. We generate simulated datasets following three genetic models: ADDME, EIME, and EITEME. Other parameters for data simulation were the effect size λ, a measure of marginal effects as defined by Marchini et al. [42]; linkage disequilibrium between SNPs, measured by r²; and minor allele frequencies (MAFs). λ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. For r², two values (0.7 and 1.0) were used for each model. For MAFs, three values (0.1, 0.2, and 0.5) were considered. The parameters for BEAM were set to their defaults. The parameter settings for AntEpiSeeker were large dataset size = 6, small dataset size = 3, count large = 150, count small = 300, epistasis model = 2, ant count = 1000, α = 1, ρ = 0.05, and τ0 = 100 (also available in the software package documentation of AntEpiSeeker). The parameters of the generic ACO algorithm were set as ant count = 1000, α = 1, ρ = 0.05, τ0 = 100, count (number of iterations) = 900, and epistasis model = 2. The comparison of detection power for BEAM, the generic ACO algorithm, AntEpiSeeker, and FAACOSE is presented in Figure 3. The results show that FAACOSE outperforms BEAM and the generic ACO in all parameter settings and is superior to AntEpiSeeker in most parameter settings.

Figure 3: Power comparisons between existing methods (BEAM, gACO, AntEpiSeeker) and FAACOSE on three models (ADDME, EIME, EITEME). Vertical axis: power (0 to 1); horizontal axis: MAF (0.1, 0.2, 0.5) for r² = 0.7 and r² = 1.0.
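The simulation design described above (three models, two r² values, three MAF values) can be enumerated as a small parameter grid. The following is a minimal sketch; the dictionary keys and variable names are illustrative assumptions and are not taken from the BEAM or AntEpiSeeker packages.

```python
from itertools import product

# Effect size (lambda) per genetic model, as stated in the text.
MODELS = {"ADDME": 0.3, "EIME": 0.2, "EITEME": 0.2}
R2_VALUES = (0.7, 1.0)        # linkage disequilibrium settings
MAF_VALUES = (0.1, 0.2, 0.5)  # minor allele frequencies

# One entry per parameter combination; each combination is simulated
# as 100 datasets of 1000 cases, 1000 controls, and 1000 SNPs.
settings = [
    {
        "model": model,
        "lambda": lam,
        "r2": r2,
        "maf": maf,
        "n_datasets": 100,
        "n_cases": 1000,
        "n_controls": 1000,
        "n_snps": 1000,
    }
    for (model, lam), r2, maf in product(MODELS.items(), R2_VALUES, MAF_VALUES)
]

print(len(settings))  # 18
```

The grid has 3 × 2 × 3 = 18 entries, which matches the eighteen parameter combinations referred to in the F1 score comparison.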

In this section, we compare our proposed method with benchmark methods. First, we use the power test to determine how many true SNP subsets can be found with our proposed method. Second, we use precision, recall, and F1 score to evaluate the results. Precision is the fraction of the finally identified SNP subsets that are correct. Recall is the fraction of the true SNP subsets that are identified. The F1 score is a statistic that measures the accuracy of a binary classification model by taking into account its precision and recall simultaneously. It is the harmonic mean of precision and recall; its maximum is 1 and its minimum is 0. Table 1 shows the results of FAACOSE and the other methods for r² = 0.7 and MAF = 0.2.

The F1 score of FAACOSE is better than that of the other methods. We ran the same experiment on datasets with different parameter combinations. Of the eighteen parameter combinations, FAACOSE has the highest F1 score in fifteen. In real GWAS experiments the dataset is huge, so the efficiency of the method must also be considered. AntEpiSeeker is the most efficient of the three benchmark methods, so we compared the run time of AntEpiSeeker and FAACOSE on different data samples. Averaged over these runs, FAACOSE is 30% faster than AntEpiSeeker, which indicates that our proposed method is more efficient on real GWAS datasets.


Table 1: F1 score comparison between FAACOSE and other methods.

Model     Method         Recall   Precision   F1 score
ADDME     BEAM           0.29     0.15        0.20
ADDME     gACO           0.45     0.36        0.40
ADDME     AntEpiSeeker   0.60     0.55        0.57
ADDME     FAACOSE        0.82     0.74        0.78
EIME      BEAM           0.30     0.45        0.36
EIME      gACO           0.35     0.32        0.33
EIME      AntEpiSeeker   0.34     0.56        0.42
EIME      FAACOSE        0.90     0.82        0.86
EITEME    BEAM           0.10     0.14        0.12
EITEME    gACO           0.15     0.20        0.17
EITEME    AntEpiSeeker   0.54     0.46        0.50
EITEME    FAACOSE        0.65     0.62        0.63
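As a consistency check on Table 1, the F1 column can be recomputed from the recall and precision columns using the harmonic mean. The sketch below takes the values as listed in Table 1 (with decimal points restored) and is not part of the original experiments; `f1_score` is an illustrative helper name.

```python
# (model, method, recall, precision, reported F1), as listed in Table 1.
TABLE_1 = [
    ("ADDME", "BEAM", 0.29, 0.15, 0.20),
    ("ADDME", "gACO", 0.45, 0.36, 0.40),
    ("ADDME", "AntEpiSeeker", 0.60, 0.55, 0.57),
    ("ADDME", "FAACOSE", 0.82, 0.74, 0.78),
    ("EIME", "BEAM", 0.30, 0.45, 0.36),
    ("EIME", "gACO", 0.35, 0.32, 0.33),
    ("EIME", "AntEpiSeeker", 0.34, 0.56, 0.42),
    ("EIME", "FAACOSE", 0.90, 0.82, 0.86),
    ("EITEME", "BEAM", 0.10, 0.14, 0.12),
    ("EITEME", "gACO", 0.15, 0.20, 0.17),
    ("EITEME", "AntEpiSeeker", 0.54, 0.46, 0.50),
    ("EITEME", "FAACOSE", 0.65, 0.62, 0.63),
]


def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Largest gap between the recomputed and the reported F1 score.
max_gap = max(abs(f1_score(r, p) - f1) for _, _, r, p, f1 in TABLE_1)
print(f"largest deviation from the reported F1: {max_gap:.4f}")

# Best method per model according to the reported F1 score.
best = {}
for model, method, _, _, f1 in TABLE_1:
    if model not in best or f1 > best[model][1]:
        best[model] = (method, f1)
print(best)
```

Every recomputed value agrees with the reported one to within rounding of two decimal places, and FAACOSE has the highest reported F1 score for each of the three models in this setting.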

4. Application to Real SNP Dataset

Late-Onset Alzheimer’s Disease (LOAD) is the most frequent form of Alzheimer’s disease (AD) and is usually identified in people older than 65 years. AD is a chronic neurodegenerative disease whose onset is often not obvious and which slowly progresses to dementia over time. It is the cause of 60% to 70% of cases of dementia. The most common early symptom is difficulty in remembering recent events (short-term memory loss). As the disease advances, symptoms can include problems with language, disorientation (including easily getting lost), mood swings, loss of motivation, not managing self-care, and behavioural issues. LOAD is a multifactorial genetic disease whose etiology and pathogenesis are not yet fully understood. The apolipoprotein E (APOE) gene is a definite risk factor for LOAD. The APOE gene has three forms: ε2, ε3, and ε4. The effect of ε2 is protective: it helps prevent the occurrence of the disease. The genetic variant ε4 has been reported to promote the disease. Between 40% and 80% of people with AD possess at least one APOE ε4 allele [46]. Previous Genome-Wide Association Studies have reported some significant SNPs [47]. Reference [47] reported that 10 SNPs in the region of the GAB2 gene have an epistatic effect with APOE ε4 in relation to Late-Onset Alzheimer’s Disease. We applied our proposed method to the LOAD GWAS dataset from the website https://www.tgen.org [47]. After data preprocessing, the real biological dataset contains 1368 samples [48, 49]. Of these, 836 samples were cases and the remaining 532 samples were normal controls [50, 51]. Each sample of the real biological dataset contains 309316 SNPs with genotype information, APOE status, and LOAD status [52]. For the subsequent calculation, we code the APOE gene state as a binary variable: the value 1 represents the ε4 variant, and the value 0 represents the other variants [53]. An SNP locus was coded as a quaternary variable to account for the missing state. The SNPs with high potential association with LOAD are shown in Table 2.

Table 2: The SNPs selected by FAACOSE in the LOAD dataset.

SNP (rs ID)
rs7756992   rs611154    rs191840    rs7294919
rs1887922   rs304900    rs1999764   rs1385600
rs2373115   rs7101429   rs609812    rs613375
rs1007837   rs2510038   rs4945261   rs10793294
rs520227    rs191740    rs7924284   rs829465
rs602106    rs7174511   rs606889    rs602192
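The coding of the real dataset described above can be illustrated with a short sketch. The paper does not give the exact coding scheme, so everything here is an assumption made for illustration: genotypes are taken to be two-letter strings, the three observed states are coded as the number of minor alleles (0, 1, 2), the missing state is coded as 3, and a sample is treated as carrying the ε4 variant if at least one of its APOE alleles is ε4. The function names are hypothetical.

```python
MISSING = 3  # assumed code for a missing genotype call


def encode_snp(genotype, minor_allele: str) -> int:
    """Code an SNP locus as a four-valued variable.

    Returns the number of minor alleles (0, 1, or 2), or MISSING when the
    genotype is absent or is not a two-letter string (assumed convention).
    """
    if genotype is None or len(genotype) != 2 or not genotype.isalpha():
        return MISSING
    return genotype.count(minor_allele)


def encode_apoe(alleles) -> int:
    """Code APOE status as binary: 1 if an e4 allele is present, else 0."""
    return 1 if "e4" in alleles else 0


genotypes = ["AA", "AG", "GG", "--", None]
print([encode_snp(g, "G") for g in genotypes])
print(encode_apoe(("e3", "e4")), encode_apoe(("e2", "e3")))
```

Keeping the missing state as its own category, rather than dropping samples with missing calls, lets every sample contribute to the contingency tables used by the search.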

5. Discussions

In this paper, we proposed a novel ant colony optimization based fast search method for the discovery of epistatic interactions in large-scale real GWAS datasets. FAACOSE was evaluated through comparison with three existing approaches on both simulated and real datasets. FAACOSE, which adopts a fast adaptive optimization procedure, is a modified algorithm derived from the generic ACO and uses a two-objective function. To demonstrate the advantages of the fast adaptive ant colony optimization algorithm, we also compared the performance of FAACOSE with that of the generic ACO.

In future studies, we intend to find more powerful modeling approaches: ant colony optimization algorithms with faster convergence, objective functions that better measure the data structure of GWAS datasets, and more efficient strategies for searching and identifying optimal SNP subsets, all of which can be combined and flexibly embedded into our SNP epistasis search framework to find more accurate SNP subsets. With the rapid development of bioinformatics, more and more biological information related to disease is being identified, and more and more studies will consider prior knowledge. An important future research direction is to apply expert prior knowledge to GWAS datasets with our proposed method, the fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Expert prior knowledge can improve the power and efficiency of epistasis detection.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (Grant nos. 61520106006, 31571364, 61732012, 61532008, U1611265, 61672382, 61402334, 61472280, 61472173, 61572447, 61672203, 61472282, and 61373098), the China Postdoctoral Science Foundation (Grant nos. 2014M561513, 2015M580352, 2017M611619, and 2016M601646), and the Guangxi Bagui Scholars Program Special Fund.


References

[1] J. N. Hirschhorn and M. J. Daly, “Genome-wide association studies for common diseases and complex traits,” Nature Reviews Genetics, vol. 6, no. 2, pp. 95–108, 2005.

[2] B. N. Howie, P. Donnelly, and J. Marchini, “A flexible and accurate genotype imputation method for the next generation of genome-wide association studies,” PLoS Genetics, vol. 5, no. 6, Article ID e1000529, 2009.

[3] T. A. Manolio, F. S. Collins, N. J. Cox et al., “Finding the missing heritability of complex diseases,” Nature, vol. 461, no. 7265, pp. 747–753, 2009.

[4] B. S. Shastry, “SNP alleles in human disease and evolution,” Journal of Human Genetics, vol. 47, no. 11, pp. 561–566, 2002.

[5] B. Stubbs, D. Vancampfort, M. De Hert, and A. J. Mitchell, “The prevalence and predictors of type two diabetes mellitus in people with schizophrenia: a systematic review and comparative meta-analysis,” Acta Psychiatrica Scandinavica, vol. 132, no. 2, pp. 144–157, 2015.

[6] K. P. Liao, “Cardiovascular disease in patients with rheumatoid arthritis,” Trends in Cardiovascular Medicine, vol. 27, no. 2, pp. 136–140, 2017.

[7] Y. Mao, N. R. London, L. Ma, D. Dvorkin, and Y. Da, “Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model,” Physiological Genomics, vol. 28, no. 1, pp. 46–52, 2006.

[8] W. Zhang, J. Zhu, E. E. Schadt, and J. S. Liu, “A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules,” PLoS Computational Biology, vol. 6, no. 1, Article ID e1000642, 2010.

[9] M. Kang, C. Zhang, H.-W. Chun, C. Ding, C. Liu, and J. Gao, “EQTL epistasis: Detecting epistatic effects and inferring hierarchical relationships of genes in biological pathways,” Bioinformatics, vol. 31, no. 5, pp. 656–664, 2015.

[10] H. Lin, D. Chen, P. Huang et al., “SNP interaction pattern identifier (SIPI): an intensive search for SNP–SNP interaction patterns,” Bioinformatics, 2016.

[11] R. L. Prentice and L. Qi, “Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation,” Biostatistics, vol. 7, no. 3, pp. 339–354, 2006.

[12] S.-P. Deng, L. Zhu, and D.-S. Huang, “Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks,” BMC Genomics, vol. 16, no. 3, article no. S4, 2015.

[13] S.-P. Deng and D.-S. Huang, “SFAPS: An R package for structure/function analysis of protein sequences based on informational spectrum method,” Methods, vol. 69, no. 3, pp. 207–212, 2014.

[14] J. H. Moore, J. M. Lamb, N. J. Brown, and D. E. Vaughan, “A comparison of combinatorial partitioning and linear regression for the detection of epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on plasma PAI-1 Levels,” Clinical Genetics, vol. 62, no. 1, pp. 74–79, 2002.

[15] B. M. Michael, R. E. Neapolitan, X. Jiang, and V. Shyam, “Learning genetic epistasis using Bayesian network scoring criteria,” BMC Bioinformatics, vol. 12, no. 1, 89 pages, 2011.

[16] Y. Wang, X. Liu, K. Robbins, and R. Rekaya, “AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm,” BMC Research Notes, vol. 3, article 117, 2010.

[17] Y. Zhang and J. S. Liu, “Bayesian inference of epistatic interactions in case-control studies,” Nature Genetics, vol. 39, no. 9, pp. 1167–1173, 2007.

[18] M. Dorigo, M. Birattari, and C. Blum, “Ant colony optimization and swarm intelligence,” Springer Verlag, vol. 5217, no. 8, pp. 767–771, 2004.

[19] T. Stutzle, M. Lopez-Ibanez, P. Pellegrini et al., “Parameter adaptation in ant colony optimization,” Autonomous Search, vol. 9783642214349, pp. 191–215, 2012.

[20] C. Blum and M. Sampels, “An ant colony optimization algorithm for shop scheduling problems,” Journal of Mathematical Modelling and Algorithms, vol. 3, no. 3, pp. 285–308, 2004.

[21] R. Musa, J.-P. Arnaout, and H. Jung, “Ant colony optimization algorithm to solve for the transportation problem of cross-docking network,” Computers and Industrial Engineering, vol. 59, no. 1, pp. 85–92, 2010.

[22] G. N. Varela and M. C. Sinclair, “Ant colony optimisation for virtual-wavelength-path routing and wavelength allocation,” in Proceedings of the 1999 Congress on Evolutionary Computation (CEC ’99), pp. 1809–1816, Washington, DC, USA, July 1999.

[23] K. M. Sim and W. H. Sun, “Ant colony optimization for routing and load-balancing: survey and new directions,” Systems, Man & Cybernetics, Part A: Systems and Humans, IEEE Transactions on, vol. 33, no. 5, pp. 560–572, 2003.

[24] S.-H. Ngo, X. Jiang, and S. Horiguchi, “Adaptive routing and wavelength assignment using ant-based algorithm,” in Proceedings of the 2004 12th IEEE International Conference on Networks, ICON 2004 - Unity in Diversity, pp. 482–486, November 2004.

[25] S. I. Vrieze, “Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC),” Psychological Methods, vol. 17, no. 2, pp. 228–243, 2012.

[26] D.-S. Huang and J.-X. Du, “A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks,” IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2099–2115, 2008.

[27] B. V. North, D. Curtis, and P. C. Sham, “Application of logistic regression to case-control association studies involving two causative loci,” Human Heredity, vol. 59, no. 2, pp. 79–87, 2005.

[28] P.-J. Jing and H.-B. Shen, “MACOED: A multi-objective ant colony optimization algorithm for SNP epistasis detection in genome-wide association studies,” Bioinformatics, vol. 31, no. 5, pp. 634–641, 2015.

[29] N. Ryman, “CHIFISH: A computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher’s exact test,” Molecular Ecology Notes, vol. 6, no. 1, pp. 285–287, 2006.

[30] C. R. Mehta and N. R. Patel, “A network algorithm for performing Fisher’s exact test in r × c contingency tables,” Journal of the American Statistical Association, vol. 78, no. 382, pp. 427–434, 1983.

[31] B. Sobrino, M. Brion, and A. Carracedo, “SNPs in forensic genetics: A review on SNP typing methodologies,” Forensic Science International, vol. 154, no. 2-3, pp. 181–194, 2005.

[32] O. Shoval, H. Sheftel, G. Shinar et al., “Evolutionary trade-offs, pareto optimality, and the geometry of phenotype space,” Science, vol. 336, no. 6085, pp. 1157–1160, 2012.

[33] D.-S. Huang and W. Jiang, “A general CPL-AdS methodology for fixing dynamic parameters in dual environments,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1489–1500, 2012.


[34] L. Zhu, W.-L. Guo, S.-P. Deng, and D.-S. Huang, “ChIP-PIT: enhancing the analysis of chip-seq data using convex-relaxed pair-wise interaction tensor decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 55–63, 2016.

[35] C. Angione, G. Carapezza, J. Costanza, P. Lio, and G. Nicosia, “Pareto optimality in organelle energy metabolism analysis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1032–1044, 2013.

[36] R. A. Fisher, “On the Interpretation of χ² from Contingency Tables, and the Calculation of P,” Journal of the Royal Statistical Society, vol. 85, no. 1, p. 87, 1922.

[37] A. Agresti, “A survey of exact inference for contingency tables,” Statistical Science, vol. 7, no. 1, pp. 131–153, 1992.

[38] B. Wenzheng, C. Yuehui, and W. Dong, “Prediction of protein structure classes with flexible neural tree,” Bio-Medical Materials and Engineering, vol. 24, no. 6, pp. 3797–3806, 2014.

[39] L. Zhu, Z.-H. You, D.-S. Huang, and B. Wang, “t-LSE: a novel robust geometric approach for modeling protein-protein interaction networks,” PLoS ONE, vol. 8, no. 4, Article ID e58368, 2013.

[40] C.-H. Zheng, L. Zhang, V. T.-Y. Ng, C. K. Shiu, and D.-S. Huang, “Molecular pattern discovery based on penalized matrix decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1592–1603, 2011.

[41] D.-S. Huang and H.-J. Yu, “Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 457–467, 2013.

[42] J. Marchini, P. Donnelly, and L. R. Cardon, “Genome-wide strategies for detecting multiple loci that influence complex diseases,” Nature Genetics, vol. 37, no. 4, pp. 413–417, 2005.

[43] R. Jiang, W. Tang, X. Wu, and W. Fu, “A random forest approach to the detection of epistatic interactions in case-control studies,” BMC Bioinformatics, vol. 10, no. 1, article S65, 2009.

[44] J. Kruppa, A. Ziegler, and I. R. Konig, “Risk estimation and risk prediction using machine-learning methods,” Human Genetics, vol. 131, no. 10, pp. 1639–1654, 2012.

[45] D.-S. Huang and C.-H. Zheng, “Independent component analysis-based penalized discriminant method for tumor classification using gene expression data,” Bioinformatics, vol. 22, no. 15, pp. 1855–1862, 2006.

[46] R. W. Mahley, K. H. Weisgraber, and Y. Huang, “Apolipoprotein E4: a causative factor and therapeutic target in neuropathology, including Alzheimer’s disease,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 15, pp. 5644–5651, 2006.

[47] E. M. Reiman, J. A. Webster, A. J. Myers et al., “GAB2 alleles modify Alzheimer’s Risk in APOE ε4 carriers,” Neuron, vol. 54, no. 5, pp. 713–720, 2007.

[48] C.-H. Zheng, D.-S. Huang, L. Zhang, and X.-Z. Kong, “Tumor clustering using nonnegative matrix factorization with gene selection,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 599–607, 2009.

[49] S.-P. Deng, L. Zhu, and D.-S. Huang, “Predicting hub genes associated with cervical cancer through gene co-expression networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 27–35, 2016.

[50] L. Zhu, S.-P. Deng, and D.-S. Huang, “A two-stage geometric method for pruning unreliable links in protein-protein networks,” IEEE Transactions on Nanobioscience, vol. 14, no. 5, pp. 528–534, 2015.

[51] D.-S. Huang, L. Zhang, K. Han, S. Deng, K. Yang, and H. Zhang, “Prediction of protein-protein interactions based on protein-protein correlation using least squares regression,” Current Protein and Peptide Science, vol. 15, no. 6, pp. 553–560, 2014.

[52] D.-S. Huang, Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, May 1996.

[53] D.-S. Huang, “Radial basis probabilistic neural networks: model and application,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, pp. 1083–1101, 1999.


search for the epistasis effect. The Fisher exact test is based on the hypergeometric distribution, so its P value is exact for all sample sizes. The test is applied to a contingency table. The null hypothesis is that the identified SNP subset and the disease are not associated. The alternative hypothesis is that the SNP subset affects the expression of the disease; it is accepted when the P value of the Fisher exact test is significant, that is, when the P value is less than a predetermined value such as 0.05 or smaller. In this way our proposed method identifies significant SNP subsets.
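To make the hypergeometric basis of the test concrete, the following is a minimal pure-Python sketch of a two-sided Fisher exact test on a 2×2 contingency table. It is not the modified Fisher exact test used in this paper, whose contingency tables may be larger than 2×2; the function name and the example counts are illustrative assumptions.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Rows could be, e.g., samples with and without a given genotype
    combination, and columns cases and controls (hypothetical layout).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small factor guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# A weakly associated table and a perfectly separated one.
print(fisher_exact_2x2(3, 1, 1, 3))
print(fisher_exact_2x2(10, 0, 0, 10))
```

Because the P value is obtained by enumerating tables rather than from a large-sample approximation, it remains valid when some cells of the contingency table are small, which is common for rare genotype combinations.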

2.5. Power Test. In the previous sections, we introduced each part of our proposed fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Our proposed unified framework contains the fast adaptive ant colony optimization algorithm, the Akaike Information Criterion (AIC) score, the explain score, Pareto optimality, and the modified Fisher exact test. In this section, we introduce how to verify the significance of the results. We construct 100 datasets with the same parameters and then use the traditional power test to measure the effectiveness of the methods. The power is defined as follows:

$$\mathrm{Power} = \frac{|SD|}{100}, \tag{8}$$

where |SD| denotes the number of datasets, out of the 100, in which the disease-related SNP subset was correctly identified. A single test criterion may not clearly show the quality of the results, so we also use precision and recall, which reflect the false positives and the false negatives. Precision and recall criteria have been widely used in the evaluation of classification models [39, 40]. In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved [26]. Both precision and recall are therefore based on an understanding and measure of relevance. We use them to judge whether the classification results are good or bad. They are also robust to class imbalance, which matters here because the numbers of disease-related and disease-unrelated SNP subsets differ greatly. In terms of SNP epistasis research, precision is the fraction of identified SNP subsets that are truly disease related, and recall is the fraction of truly disease-related SNP subsets that are identified. A single indicator, such as the false positive rate alone, cannot make the real result clear; this is why we use both precision and recall. We also use the F1 score (also called F score or F measure) to summarize precision and recall in one number. Precision and recall are defined next in terms of the confusion matrix (Figure 1).

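$$\mathrm{recall} = \frac{TP}{TP + FN}, \qquad \mathrm{precision} = \frac{TP}{TP + FP}, \qquad F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{9}$$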

                              Predicted class
                              Associated             Nonassociated
True class   Associated       True positive (TP)     False negative (FN)
             Nonassociated    False positive (FP)    True negative (TN)

Figure 1: Precision recall explanation matrix.

Precision, also known as positive predictive value, is the number of true positives divided by the sum of true positives and false positives; it reflects how many false positives an algorithm reports. Recall, also known as sensitivity, is the number of true positives divided by the sum of true positives and false negatives. In terms of the SNP selection problem, the larger the recall, the more of the truly disease-related SNP combinations are found. Likewise, the larger the precision, the higher the proportion of truly disease-related SNP combinations among the identified SNP combinations. The F measure is the harmonic mean of precision and recall, a synthesized measure combining both [41].
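The definitions above can be sketched directly on sets of SNP combinations. In the following minimal example, the SNP names and the two sets are invented for illustration, and `precision_recall_f1` is a hypothetical helper; each SNP combination is represented as a `frozenset` so that the order of SNPs within a combination does not matter.

```python
def precision_recall_f1(identified, truth):
    """Precision, recall, and F measure of identified SNP combinations.

    TP: identified and truly disease related.
    FP: identified but not truly disease related.
    FN: truly disease related but not identified.
    """
    identified, truth = set(identified), set(truth)
    tp = len(identified & truth)
    fp = len(identified - truth)
    fn = len(truth - identified)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: two true combinations, four identified ones.
truth = {frozenset({"snp1", "snp2"}), frozenset({"snp3", "snp4"})}
identified = {
    frozenset({"snp2", "snp1"}),
    frozenset({"snp3", "snp4"}),
    frozenset({"snp5", "snp6"}),
    frozenset({"snp7", "snp8"}),
}
print(precision_recall_f1(identified, truth))
```

In this example every true combination is found (recall 1.0), but only half of the identified combinations are correct (precision 0.5), which is exactly the situation that a single criterion would hide.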

3. Simulation Experiments

3.1. Compared with One-Objective Function. In this section, we use simulation data to compare our proposed method with other existing methods. To avoid any bias in the data caused by the model, we adopt the BEAM package to generate the simulation datasets [17]. Data were simulated following three genetic models: (1) additive model, (2) epistatic interactions with multiplicative effects, and (3) epistatic interactions with threshold effects. For brevity, the additive model is referred to as ADDME, the model of epistatic interactions with multiplicative effects as EIME, and the model of epistatic interactions with threshold effects as EITEME. In the following sections, we use these short names to indicate the corresponding data models.

Because our method is a two-objective-based SNP epistasis search method, we first compared it with existing single-objective-based exhaustive SNP epistasis search methods to demonstrate the effectiveness of the two-objective function. Second, we compared our proposed method with the recently proposed methods BEAM [17], the generic ACO algorithm, and AntEpiSeeker [16]. In a one-objective SNP epistasis search method, the objective function is used to score every SNP combination, and in general the scores of different SNP combinations differ. By the nature of the method, a low score indicates that the association between the SNP combination and the disease is relatively weak, and a high score indicates that it is relatively strong. The one-objective function then ranks all SNP combinations by score. The two-objective-based SNP epistasis search method, however, finds a set of nondominated results, and every nondominated SNP epistasis result's score is the same.
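The difference between ranking by one objective and selecting a nondominated set can be illustrated with a small sketch. The scores below are invented, and the sketch assumes that lower values are better for both objectives, which may not match the orientation of the scores used in FAACOSE; `dominates` and `nondominated` are hypothetical helper names.

```python
def dominates(a, b) -> bool:
    """True if score vector a is no worse than b in every objective
    and strictly better in at least one (lower is better, by assumption)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated(scores: dict) -> set:
    """SNP combinations whose score vector no other combination dominates."""
    return {
        combo
        for combo, vec in scores.items()
        if not any(dominates(other, vec)
                   for key, other in scores.items() if key != combo)
    }


# Invented (objective 1, objective 2) scores for four SNP pairs.
scores = {
    ("snp1", "snp2"): (10.0, 0.20),
    ("snp3", "snp4"): (12.0, 0.10),
    ("snp5", "snp6"): (13.0, 0.30),
    ("snp7", "snp8"): (11.0, 0.25),
}

front = nondominated(scores)

# One-objective baseline: rank by objective 1 only and keep as many
# combinations as the nondominated set contains.
top_k = sorted(scores, key=lambda combo: scores[combo][0])[: len(front)]

print(sorted(front))
print(top_k)
```

Here the pair ("snp3", "snp4") is kept by the two-objective selection because it is best in the second objective, whereas the one-objective ranking of the same size replaces it with ("snp7", "snp8"), which is dominated by ("snp1", "snp2").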

Figure 2: Power test comparisons between one-objective (K2 score, En score) and two-objective (FAACOSE) methods on three different models (ADDME, EIME, EITEME) with MAF values 0.1, 0.2, and 0.5. Vertical axis: power (0 to 1); horizontal axis: MAF for r² = 0.7 and r² = 1.0.

epistasis resultsrsquo score is the same To ensure fairness for theone-objective function we collect the same number as two-objective-based SNP epistasis search method from the top ofone-objective-based SNP rank The comparing results showthat the two-objective-based SNP epistasis search method isbetter than one-objective-based SNP epistasis search methodin three simulation data models In terms of two singleobjective-based SNP epistasis search methods the resultsof one-objective-based SNP epistasis search methods aresimilar with the other one-objective-based SNP epistasissearch methods The simulation data experiment resultsshow the effectiveness of two-objective-based SNP epistasissearch method and the poor experimental results showthe insufficiency of one-objective functions The experimentresults are shown in Figure 2 The abscissa of Figure 2 isminor allele frequency (MAF) which is assigned 01 02 and05We generate the simulate dataset and study the parametersetting following many previous studies [17 42ndash44] For eachsimulate dataset of parameter combination we generated 100datasets which contain 2000 experimental samples (1000case samples and 1000 control samples) and 1000 SNPs weresimulated We evaluate the algorithm performance throughcalculating the ratio of real number identified following the

significance level 001 which is adjusted after Bonferronicorrection The parameter 120582 was set to 03 for ADDME and02 for EIME and EITEME The parameter range of linkagedisequilibrium between SNPs is 1199032 from 07 to 1

32 Compared with Benchmark Methods After comparingwith single objective function We compare our proposedmethod with existing method The performance of our pro-posedmethod was evaluated by comparison with benchmarkmethods [45] In many previous studies the authors havealready discussed the parameter settings problem In thissection we set the parameters according to the existing strat-egy We evaluated performance of FAACOSE by comparingwith two recent methods BEAM generic ACO algorithmand the AntEpiSeeker we use BEAM package and previousparameter strategy to generate simulate dataset Be awareof the fact that the generic ACO algorithm could not selectlarger size SNP set We use simulated dataset introduced inSection 31 We evaluate the algorithm performance throughcalculating the ratio of real number identified following thesignificance level 001 which is adjusted after Bonferroni cor-rectionWe generate simulate datasets following three geneticmodels ADDME EIME and EITEME Other parameters

Complexity 7

0010203040506070809

1

01 02 05 01 02 05

Pow

erADDME

BEAMgACO

AntEpiSeekerFAACOSE

BEAMgACO

AntEpiSeekerFAACOSE

BEAMgACO

AntEpiSeekerFAACOSE

0010203040506070809

1

01 02 05 01 02 05

Pow

er

EIME

0010203040506070809

1

01 02 05 01 02 05

Pow

er

EITEME

r2= 07 r

2= 10

r2= 07 r

2= 10

r2= 07 r

2= 10

Figure 3 Power comparisons between existing methods and FAACOSE on three models

for data simulation were the effective size 120582 a measure ofmarginal effects as defined by Marchini et al [42] linkagedisequilibrium between SNPs measured by 1199032 and minorallele frequencies (MAFs) 120582 was set to 03 for ADDME and02 for EIME and EITEME For 1199032 two values (07 and 10)were used for each model For MAFs three values (01 02and 05) were considered The parameters for BEAM wereset as default The parameter settings for AntEpiSeeker werelarge dataset size = 6 small dataset size = 3 count large =150 count small = 300 epistasis model = 2 ant count = 1000120572 = 1 120588 = 005 and 1205910 = 100 (also available in the softwarepackage documentation of AntEpiSeeker)The parameters ofthe generic ACO algorithm were set as ant count = 1000120572 = 1 120588 = 005 1205910 = 100 count (number of iterations) = 900and epistasis model = 2 The comparison of detection powerfor BEAM genetic ACO algorithm and the AntEpiSeekeris presented in Figure 3 The results show that FAACOSEoutperforms BEAM and the generic ACO in all parametersettings and is superior to AntEpiSeeker in most parametersettings

In this section we compare our proposed method withbenchmark methods First we use power test to detect howmany real SNP subsets can be found with our proposed

method Second we use precision recall and 1198651 score toevaluate the results Precision denotes how many right SNPsubsets in the total final identified SNP subsets Recalldenotes the number of right SNP subsets that are identified1198651 score is an indicator used in statistics to measure theaccuracy of two classification models It takes into accountthe precision and recall of the classificationmodel simultane-ously 1198651 score can be seen as a weighted average of precisionand recall its maximum is 1 and minimum is 0l We showthe results of FAACOSE with other methods on 1199032 = 07 andMAF = 02 in Table 1

The 1198651 score of FAACOSE is better than other methodsWe run the same experiment on datasets with differentparameter combination In all eighteen datasets FAACOSEhas the highest 1198651 score in fifteen of them In real GWASdataset experiment the sample size of real dataset is hugeThe efficiency of the method is also to be considered Theexperimental results indicate that our proposed method ismore effective method in real GWAS dataset AntEpiSeekeris the most efficient algorithm among three methods Indifferent data samples we compare run time of AntEpiSeekerand FAACOSE And averaging the results FAACOSE is faster30 than AntEpiSeeker

8 Complexity

Table 1: F1 score comparison between FAACOSE and other methods.

Model    Method        Recall  Precision  F1 score
ADDME    BEAM          0.29    0.15       0.20
ADDME    gACO          0.45    0.36       0.40
ADDME    AntEpiSeeker  0.6     0.55       0.57
ADDME    FAACOSE       0.82    0.74       0.78
EIME     BEAM          0.3     0.45       0.36
EIME     gACO          0.35    0.32       0.33
EIME     AntEpiSeeker  0.34    0.56       0.42
EIME     FAACOSE       0.9     0.82       0.86
EITEME   BEAM          0.1     0.14       0.12
EITEME   gACO          0.15    0.20       0.17
EITEME   AntEpiSeeker  0.54    0.46       0.50
EITEME   FAACOSE       0.65    0.62       0.63

4. Application to Real SNP Dataset

Late-Onset Alzheimer’s Disease (LOAD) is the most frequent form of Alzheimer’s disease and is usually identified in people older than 65 years. LOAD, or AD, is a chronic neurodegenerative disease whose onset is frequently not obvious and which slowly progresses to dementia over time. It is the cause of 60% to 70% of cases of dementia. The most common early symptom is difficulty in remembering recent events (short-term memory loss). As the disease advances, symptoms can include problems with language, disorientation (including easily getting lost), mood swings, loss of motivation, not managing self-care, and behavioural issues. LOAD is a multifactor genetic disease; its etiology and pathogenesis have not yet been fully understood. The apolipoprotein E (APOE) gene is a definite risk factor for LOAD. The APOE gene has three forms: ε2, ε3, and ε4. The effect of ε2 is positive: ε2 can help prevent the occurrence of the disease. Research has reported that the genetic variant ε4 has an inducing effect on the disease. Between 40% and 80% of people with AD possess at least one APOE ε4 allele [46]. Previous studies have reported some significant SNPs in the field of Genome-Wide Association Studies [47]. Reference [47] reported that 10 SNPs in the area of the GAB2 gene have an epistasis effect with APOE ε4 in relation to Late-Onset Alzheimer’s Disease. We applied our proposed method to the LOAD GWAS dataset from the website https://www.tgen.org [47]. After data preprocessing, the real biological dataset contains 1368 samples [48, 49]. Of these, 836 samples were identified as cases; the remaining 532 samples were normal samples [50, 51]. Each sample of the real biological dataset contains 309316 SNPs with genotype information, APOE status, and LOAD status [52]. For the next calculation, we code the APOE gene state with a binary variable: the value 1 represents the ε4 variant, and the value 0 represents the other variants [53]. An SNP locus was coded as a quaternary variable considering the missing state. The high potential LOAD disease related SNPs are shown in Table 2.

Table 2: The SNPs selected by FAACOSE in the LOAD dataset.

SNP rs
rs7756992   rs611154    rs191840    rs7294919
rs1887922   rs304900    rs1999764   rs1385600
rs2373115   rs7101429   rs609812    rs613375
rs1007837   rs2510038   rs4945261   rs10793294
rs520227    rs191740    rs7924284   rs829465
rs602106    rs7174511   rs606889    rs602192
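The coding scheme described in this section (a binary APOE variable and a quaternary SNP variable that reserves one state for missing genotypes) can be sketched in Python as follows. The paper does not give the concrete code values, so the choice of 0, 1, 2 for minor-allele counts and 3 for a missing genotype, as well as the names `encode_apoe`, `encode_snp`, and `MISSING_CODE`, are assumptions made only for illustration.

```python
from typing import Optional

MISSING_CODE = 3  # hypothetical label for a missing genotype


def encode_apoe(carries_e4: bool) -> int:
    """Binary APOE status: 1 if the sample carries the e4 variant, else 0."""
    return 1 if carries_e4 else 0


def encode_snp(minor_allele_count: Optional[int]) -> int:
    """Quaternary SNP code: 0, 1, or 2 minor-allele copies, or 3 when missing."""
    if minor_allele_count is None:
        return MISSING_CODE
    if minor_allele_count not in (0, 1, 2):
        raise ValueError(f"unexpected genotype: {minor_allele_count!r}")
    return minor_allele_count


# Toy record, not taken from the LOAD dataset.
sample = {"apoe_e4": True, "genotypes": [0, 2, None, 1]}
encoded = [encode_apoe(sample["apoe_e4"])] + [encode_snp(g) for g in sample["genotypes"]]
print(encoded)  # [1, 0, 2, 3, 1]
```

Keeping the missing state as its own category, rather than imputing it, mirrors the "quaternary variable" wording of the text.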

5. Discussions

In this paper, we proposed a novel ant colony optimization based fast search method for the discovery of epistasis interactions in large scale real GWAS datasets. FAACOSE was evaluated through comparison with three existing approaches on both simulated and real datasets. FAACOSE, which adopts a fast adaptive optimization procedure with a two-objective function, is a modified algorithm derived from the generic ACO. To demonstrate the advantages of the fast adaptive ant colony optimization algorithm, we also compared the performance of FAACOSE with that of the generic ACO.

In future studies, we intend to find more powerful modeling approaches: ant colony optimization algorithms with faster convergence, objective functions that can better measure the data structure of GWAS datasets, and more efficient optimal SNP subset search and identification strategies that can be combined and flexibly embedded into our SNP epistasis search framework to find more accurate SNP subsets. With the rapid development of bioinformatics, more and more biological information related to disease is being identified, and more and more studies will consider prior knowledge. An important future research direction is to apply expert prior knowledge to GWAS datasets with our proposed method, that is, the fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Expert prior knowledge can improve the power and efficiency of epistasis detection.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (Grant nos. 61520106006, 31571364, 61732012, 61532008, U1611265, 61672382, 61402334, 61472280, 61472173, 61572447, 61672203, 61472282, and 61373098), the China Postdoctoral Science Foundation (Grant nos. 2014M561513, 2015M580352, 2017M611619, and 2016M601646), and the Guangxi Bagui Scholars Program Special Fund.


References

[1] J. N. Hirschhorn and M. J. Daly, “Genome-wide association studies for common diseases and complex traits,” Nature Reviews Genetics, vol. 6, no. 2, pp. 95–108, 2005.

[2] B. N. Howie, P. Donnelly, and J. Marchini, “A flexible and accurate genotype imputation method for the next generation of genome-wide association studies,” PLoS Genetics, vol. 5, no. 6, Article ID e1000529, 2009.

[3] T. A. Manolio, F. S. Collins, N. J. Cox et al., “Finding the missing heritability of complex diseases,” Nature, vol. 461, no. 7265, pp. 747–753, 2009.

[4] B. S. Shastry, “SNP alleles in human disease and evolution,” Journal of Human Genetics, vol. 47, no. 11, pp. 561–566, 2002.

[5] B. Stubbs, D. Vancampfort, M. De Hert, and A. J. Mitchell, “The prevalence and predictors of type two diabetes mellitus in people with schizophrenia: a systematic review and comparative meta-analysis,” Acta Psychiatrica Scandinavica, vol. 132, no. 2, pp. 144–157, 2015.

[6] K. P. Liao, “Cardiovascular disease in patients with rheumatoid arthritis,” Trends in Cardiovascular Medicine, vol. 27, no. 2, pp. 136–140, 2017.

[7] Y. Mao, N. R. London, L. Ma, D. Dvorkin, and Y. Da, “Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model,” Physiological Genomics, vol. 28, no. 1, pp. 46–52, 2006.

[8] W. Zhang, J. Zhu, E. E. Schadt, and J. S. Liu, “A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules,” PLoS Computational Biology, vol. 6, no. 1, Article ID e1000642, 2010.

[9] M. Kang, C. Zhang, H.-W. Chun, C. Ding, C. Liu, and J. Gao, “EQTL epistasis: Detecting epistatic effects and inferring hierarchical relationships of genes in biological pathways,” Bioinformatics, vol. 31, no. 5, pp. 656–664, 2015.

[10] H. Lin, D. Chen, P. Huang et al., “SNP interaction pattern identifier (SIPI): an intensive search for SNP–SNP interaction patterns,” Bioinformatics, 2016.

[11] R. L. Prentice and L. Qi, “Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation,” Biostatistics, vol. 7, no. 3, pp. 339–354, 2006.

[12] S.-P. Deng, L. Zhu, and D.-S. Huang, “Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks,” BMC Genomics, vol. 16, no. 3, article no. S4, 2015.

[13] S.-P. Deng and D.-S. Huang, “SFAPS: An R package for structure/function analysis of protein sequences based on informational spectrum method,” Methods, vol. 69, no. 3, pp. 207–212, 2014.

[14] J. H. Moore, J. M. Lamb, N. J. Brown, and D. E. Vaughan, “A comparison of combinatorial partitioning and linear regression for the detection of epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on plasma PAI-1 levels,” Clinical Genetics, vol. 62, no. 1, pp. 74–79, 2002.

[15] B. M. Michael, R. E. Neapolitan, X. Jiang, and V. Shyam, “Learning genetic epistasis using Bayesian network scoring criteria,” BMC Bioinformatics, vol. 12, no. 1, 89 pages, 2011.

[16] Y. Wang, X. Liu, K. Robbins, and R. Rekaya, “AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm,” BMC Research Notes, vol. 3, article 117, 2010.

[17] Y. Zhang and J. S. Liu, “Bayesian inference of epistatic interactions in case-control studies,” Nature Genetics, vol. 39, no. 9, pp. 1167–1173, 2007.

[18] M. Dorigo, M. Birattari, and C. Blum, “Ant colony optimization and swarm intelligence,” Springer Verlag, vol. 5217, no. 8, pp. 767–771, 2004.

[19] T. Stutzle, M. Lopez-Ibanez, P. Pellegrini et al., “Parameter adaptation in ant colony optimization,” Autonomous Search, vol. 9783642214349, pp. 191–215, 2012.

[20] C. Blum and M. Sampels, “An ant colony optimization algorithm for shop scheduling problems,” Journal of Mathematical Modelling and Algorithms, vol. 3, no. 3, pp. 285–308, 2004.

[21] R. Musa, J.-P. Arnaout, and H. Jung, “Ant colony optimization algorithm to solve for the transportation problem of cross-docking network,” Computers and Industrial Engineering, vol. 59, no. 1, pp. 85–92, 2010.

[22] G. N. Varela and M. C. Sinclair, “Ant colony optimisation for virtual-wavelength-path routing and wavelength allocation,” in Proceedings of the 1999 Congress on Evolutionary Computation (CEC ’99), pp. 1809–1816, Washington, DC, USA, July 1999.

[23] K. M. Sim and W. H. Sun, “Ant colony optimization for routing and load-balancing: survey and new directions,” Systems, Man & Cybernetics, Part A: Systems and Humans, IEEE Transactions on, vol. 33, no. 5, pp. 560–572, 2003.

[24] S.-H. Ngo, X. Jiang, and S. Horiguchi, “Adaptive routing and wavelength assignment using ant-based algorithm,” in Proceedings of the 2004 12th IEEE International Conference on Networks, ICON 2004 - Unity in Diversity, pp. 482–486, November 2004.

[25] S. I. Vrieze, “Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC),” Psychological Methods, vol. 17, no. 2, pp. 228–243, 2012.

[26] D.-S. Huang and J.-X. Du, “A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks,” IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2099–2115, 2008.

[27] B. V. North, D. Curtis, and P. C. Sham, “Application of logistic regression to case-control association studies involving two causative loci,” Human Heredity, vol. 59, no. 2, pp. 79–87, 2005.

[28] P.-J. Jing and H.-B. Shen, “MACOED: A multi-objective ant colony optimization algorithm for SNP epistasis detection in genome-wide association studies,” Bioinformatics, vol. 31, no. 5, pp. 634–641, 2015.

[29] N. Ryman, “CHIFISH: A computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher’s exact test,” Molecular Ecology Notes, vol. 6, no. 1, pp. 285–287, 2006.

[30] C. R. Mehta and N. R. Patel, “A network algorithm for performing Fisher’s exact test in r × c contingency tables,” Journal of the American Statistical Association, vol. 78, no. 382, pp. 427–434, 1983.

[31] B. Sobrino, M. Brion, and A. Carracedo, “SNPs in forensic genetics: A review on SNP typing methodologies,” Forensic Science International, vol. 154, no. 2-3, pp. 181–194, 2005.

[32] O. Shoval, H. Sheftel, G. Shinar et al., “Evolutionary trade-offs, pareto optimality, and the geometry of phenotype space,” Science, vol. 336, no. 6085, pp. 1157–1160, 2012.

[33] D.-S. Huang and W. Jiang, “A general CPL-AdS methodology for fixing dynamic parameters in dual environments,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1489–1500, 2012.


[34] L. Zhu, W.-L. Guo, S.-P. Deng, and D.-S. Huang, “ChIP-PIT: enhancing the analysis of chip-seq data using convex-relaxed pair-wise interaction tensor decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 55–63, 2016.

[35] C. Angione, G. Carapezza, J. Costanza, P. Lio, and G. Nicosia, “Pareto optimality in organelle energy metabolism analysis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1032–1044, 2013.

[36] R. A. Fisher, “On the Interpretation of χ² from Contingency Tables, and the Calculation of P,” Journal of the Royal Statistical Society, vol. 85, no. 1, p. 87, 1922.

[37] A. Agresti, “A survey of exact inference for contingency tables,” Statistical Science, vol. 7, no. 1, pp. 131–153, 1992.

[38] B. Wenzheng, C. Yuehui, and W. Dong, “Prediction of protein structure classes with flexible neural tree,” Bio-Medical Materials and Engineering, vol. 24, no. 6, pp. 3797–3806, 2014.

[39] L. Zhu, Z.-H. You, D.-S. Huang, and B. Wang, “t-LSE: a novel robust geometric approach for modeling protein-protein interaction networks,” PLoS ONE, vol. 8, no. 4, Article ID e58368, 2013.

[40] C.-H. Zheng, L. Zhang, V. T.-Y. Ng, C. K. Shiu, and D.-S. Huang, “Molecular pattern discovery based on penalized matrix decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1592–1603, 2011.

[41] D.-S. Huang and H.-J. Yu, “Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 457–467, 2013.

[42] J. Marchini, P. Donnelly, and L. R. Cardon, “Genome-wide strategies for detecting multiple loci that influence complex diseases,” Nature Genetics, vol. 37, no. 4, pp. 413–417, 2005.

[43] R. Jiang, W. Tang, X. Wu, and W. Fu, “A random forest approach to the detection of epistatic interactions in case-control studies,” BMC Bioinformatics, vol. 10, no. 1, article S65, 2009.

[44] J. Kruppa, A. Ziegler, and I. R. Konig, “Risk estimation and risk prediction using machine-learning methods,” Human Genetics, vol. 131, no. 10, pp. 1639–1654, 2012.

[45] D.-S. Huang and C.-H. Zheng, “Independent component analysis-based penalized discriminant method for tumor classification using gene expression data,” Bioinformatics, vol. 22, no. 15, pp. 1855–1862, 2006.

[46] R. W. Mahley, K. H. Weisgraber, and Y. Huang, “Apolipoprotein E4: a causative factor and therapeutic target in neuropathology, including Alzheimer’s disease,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 15, pp. 5644–5651, 2006.

[47] E. M. Reiman, J. A. Webster, A. J. Myers et al., “GAB2 alleles modify Alzheimer’s risk in APOE ε4 carriers,” Neuron, vol. 54, no. 5, pp. 713–720, 2007.

[48] C.-H. Zheng, D.-S. Huang, L. Zhang, and X.-Z. Kong, “Tumor clustering using nonnegative matrix factorization with gene selection,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 599–607, 2009.

[49] S.-P. Deng, L. Zhu, and D.-S. Huang, “Predicting hub genes associated with cervical cancer through gene co-expression networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 27–35, 2016.

[50] L. Zhu, S.-P. Deng, and D.-S. Huang, “A two-stage geometric method for pruning unreliable links in protein-protein networks,” IEEE Transactions on Nanobioscience, vol. 14, no. 5, pp. 528–534, 2015.

[51] D.-S. Huang, L. Zhang, K. Han, S. Deng, K. Yang, and H. Zhang, “Prediction of protein-protein interactions based on protein-protein correlation using least squares regression,” Current Protein and Peptide Science, vol. 15, no. 6, pp. 553–560, 2014.

[52] D.-S. Huang, Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, May 1996.

[53] D.-S. Huang, “Radial basis probabilistic neural networks: model and application,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, pp. 1083–1101, 1999.



Figure 2: Power test comparisons between one-objective and two-objective methods on three different models with MAF values 0.1, 0.2, and 0.5. [Three panels (ADDME, EIME, and EITEME) plot power (0 to 1) against MAF for r² = 0.7 and r² = 1.0; series: K2 score, En score, and FAACOSE.]

epistasis results’ score is the same. To ensure fairness for the one-objective function, we collect the same number of SNPs as the two-objective-based SNP epistasis search method from the top of the one-objective-based SNP rank. The comparison results show that the two-objective-based SNP epistasis search method is better than the one-objective-based SNP epistasis search methods in all three simulation data models. The two one-objective-based SNP epistasis search methods give results similar to each other. The simulation results show the effectiveness of the two-objective-based SNP epistasis search method, and the poor results of the one-objective-based methods show the insufficiency of one-objective functions. The experimental results are shown in Figure 2. The abscissa of Figure 2 is the minor allele frequency (MAF), which is assigned 0.1, 0.2, and 0.5. We generate the simulated datasets and choose the parameter settings following many previous studies [17, 42–44]. For each parameter combination, we generated 100 datasets, each containing 2000 samples (1000 case samples and 1000 control samples) and 1000 SNPs. We evaluate the algorithm performance by calculating power, that is, the ratio of datasets in which the true epistatic SNPs are identified at the significance level 0.01 after Bonferroni correction. The parameter λ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. The linkage disequilibrium between SNPs, measured by r², ranges from 0.7 to 1.
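The power measure used here can be illustrated with a short Python sketch. It assumes that one p-value for the true epistatic SNP set is available per simulated dataset and that the Bonferroni correction divides the 0.01 level by the number of candidate SNP pairs among 1000 SNPs. Both the number of tests and the names `bonferroni_threshold` and `detection_power` are assumptions for illustration; the paper does not state the exact correction factor.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def detection_power(p_values, threshold: float) -> float:
    """Fraction of datasets whose true SNP set has a p-value below threshold."""
    if not p_values:
        raise ValueError("at least one dataset is required")
    hits = sum(1 for p in p_values if p < threshold)
    return hits / len(p_values)


n_pairs = comb(1000, 2)  # number of SNP pairs among 1000 SNPs (assumed test count)
threshold = bonferroni_threshold(0.01, n_pairs)

# Toy p-values for four simulated datasets, not results from the paper.
toy_p_values = [1e-9, 0.5, 1e-10, 1e-3]
print(n_pairs)                                   # 499500
print(detection_power(toy_p_values, threshold))  # 0.5
```

In the experiments described above, the list would hold 100 p-values per parameter combination, one for each generated dataset.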

3.2. Compared with Benchmark Methods. After comparing with the single-objective functions, we compare our proposed method with existing methods. The performance of our proposed method was evaluated by comparison with benchmark methods [45]. Many previous studies have already discussed the problem of parameter settings; in this section, we set the parameters according to the existing strategy. We evaluated the performance of FAACOSE by comparing it with three existing methods: BEAM, the generic ACO algorithm, and AntEpiSeeker. We use the BEAM package and the previous parameter strategy to generate the simulated datasets. Note that the generic ACO algorithm could not select larger SNP sets. We use the simulated datasets introduced in Section 3.1. We evaluate the algorithm performance by calculating power, that is, the ratio of datasets in which the true epistatic SNPs are identified at the significance level 0.01 after Bonferroni correction. We generate simulated datasets following three genetic models: ADDME, EIME, and EITEME. Other parameters

Figure 3: Power comparisons between existing methods and FAACOSE on three models. [Three panels (ADDME, EIME, and EITEME) plot power (0 to 1) against MAF (0.1, 0.2, and 0.5) for r² = 0.7 and r² = 1.0; series: BEAM, gACO, AntEpiSeeker, and FAACOSE.]

for data simulation were the effect size λ, a measure of marginal effects as defined by Marchini et al. [42]; linkage disequilibrium between SNPs, measured by r²; and minor allele frequencies (MAFs). λ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. For r², two values (0.7 and 1.0) were used for each model. For MAFs, three values (0.1, 0.2, and 0.5) were considered. The parameters for BEAM were set as default. The parameter settings for AntEpiSeeker were large dataset size = 6, small dataset size = 3, count large = 150, count small = 300, epistasis model = 2, ant count = 1000, α = 1, ρ = 0.05, and τ0 = 100 (also available in the software package documentation of AntEpiSeeker). The parameters of the generic ACO algorithm were set as ant count = 1000, α = 1, ρ = 0.05, τ0 = 100, count (number of iterations) = 900, and epistasis model = 2. The comparison of detection power for BEAM, the generic ACO algorithm, AntEpiSeeker, and FAACOSE is presented in Figure 3. The results show that FAACOSE outperforms BEAM and the generic ACO in all parameter settings and is superior to AntEpiSeeker in most parameter settings.
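To make the roles of α, ρ, and τ0 concrete, the Python sketch below shows one step of a textbook ant colony optimization loop on a toy set of 10 SNPs. It is not the FAACOSE update rule or the AntEpiSeeker implementation: the selection rule (probability proportional to pheromone raised to α), the fixed deposit of 1.0, and the names `select_snps` and `update_pheromone` are assumptions chosen for illustration. Reading "epistasis model = 2" as the number of SNPs picked per ant is likewise an assumption.

```python
import random


def select_snps(pheromone, k, alpha, rng):
    """Sample k distinct SNP indices with probability proportional to pheromone**alpha."""
    candidates = list(range(len(pheromone)))
    chosen = []
    for _ in range(k):
        weights = [pheromone[i] ** alpha for i in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return chosen


def update_pheromone(pheromone, rho, deposits):
    """Evaporate every trail by rho, then add the deposit earned by each SNP."""
    return [(1 - rho) * tau + deposits.get(i, 0.0) for i, tau in enumerate(pheromone)]


rng = random.Random(0)
pheromone = [100.0] * 10  # tau0 = 100 on a toy set of 10 SNPs
pair = select_snps(pheromone, k=2, alpha=1, rng=rng)
# Hypothetical fixed deposit; a real method would score the SNP set instead.
pheromone = update_pheromone(pheromone, rho=0.05, deposits={i: 1.0 for i in pair})
print(sorted(pair), [round(tau, 2) for tau in pheromone])
```

After one step, every trail has evaporated to about 95, and the two selected SNPs sit at about 96, so they are slightly more likely to be picked by the next ant.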


10 Complexity

[34] L Zhu W-L Guo S-P Deng and D-S Huang ldquoChIP-PITenhancing the analysis of chip-seq data using convex-relaxedpair-wise interaction tensor decompositionrdquo IEEEACM Trans-actions onComputational Biology and Bioinformatics vol 13 no1 pp 55ndash63 2016

[35] C Angione G Carapezza J Costanza P Lio and G NicosialdquoPareto optimality in organelle energy metabolism analysisrdquoIEEEACM Transactions on Computational Biology and Bioin-formatics vol 10 no 4 pp 1032ndash1044 2013

[36] R A Fisher ldquoOn the Interpretation of 1205942 from ContingencyTables and the Calculation of Prdquo Journal of the Royal StatisticalSociety vol 85 no 1 p 87 1922

[37] A Agresti ldquoA survey of exact inference for contingency tablesrdquoStatistical Science vol 7 no 1 pp 131ndash153 1992

[38] B Wenzheng C Yuehui and W Dong ldquoPrediction of proteinstructure classes with flexible neural treerdquo Bio-Medical Materi-als and Engineering vol 24 no 6 pp 3797ndash3806 2014

[39] L Zhu Z-H You D-S Huang and B Wang ldquot-LSE anovel robust geometric approach for modeling protein-proteininteraction networksrdquo PLoS ONE vol 8 no 4 Article IDe58368 2013

[40] C-H Zheng L Zhang V T-Y Ng C K Shiu and D-SHuang ldquoMolecular pattern discovery based on penalized ma-trix decompositionrdquo IEEEACMTransactions onComputationalBiology and Bioinformatics vol 8 no 6 pp 1592ndash1603 2011

[41] D-S Huang and H-J Yu ldquoNormalized feature vectors a novelalignment-free sequence comparison method based on thenumbers of adjacent amino acidsrdquo IEEEACM Transactions onComputational Biology and Bioinformatics vol 10 no 2 pp457ndash467 2013

[42] J Marchini P Donnelly and L R Cardon ldquoGenome-widestrategies for detecting multiple loci that influence complexdiseasesrdquo Nature Genetics vol 37 no 4 pp 413ndash417 2005

[43] R JiangW Tang XWu andW Fu ldquoA random forest approachto the detection of epistatic interactions in case-control studiesrdquoBMC Bioinformatics vol 10 no 1 article S65 2009

[44] J Kruppa A Ziegler and I R Konig ldquoRisk estimation and riskprediction using machine-learning methodsrdquoHuman Geneticsvol 131 no 10 pp 1639ndash1654 2012

[45] D-S Huang and C-H Zheng ldquoIndependent componentanalysis-based penalized discriminantmethod for tumor classi-fication using gene expression datardquo Bioinformatics vol 22 no15 pp 1855ndash1862 2006

[46] RWMahley K HWeisgraber and Y Huang ldquoApolipoproteinE4 a causative factor and therapeutic target in neuropathologyincluding Alzheimerrsquos diseaserdquo Proceedings of the National Aca-demy of Sciences of the United States of America vol 103 no 15pp 5644ndash5651 2006

[47] E M Reiman J A Webster A J Myers et al ldquoGAB2 allelesmodify Alzheimerrsquos Risk in APOE 1205764 carriersrdquo Neuron vol 54no 5 pp 713ndash720 2007

[48] C-H Zheng D-S Huang L Zhang and X-Z Kong ldquoTumorclustering using nonnegative matrix factorization with gene se-lectionrdquo IEEE Transactions on Information Technology in Bio-medicine vol 13 no 4 pp 599ndash607 2009

[49] S-P Deng L Zhu and D-S Huang ldquoPredicting hub genesassociated with cervical cancer through gene co-expressionnetworksrdquo IEEEACM Transactions on Computational Biologyand Bioinformatics vol 13 no 1 pp 27ndash35 2016

[50] L Zhu S-P Deng and D-S Huang ldquoA two-stage geometricmethod for pruning unreliable links in protein-protein net-worksrdquo IEEE Transactions on Nanobioscience vol 14 no 5 pp528ndash534 2015

[51] D-S Huang L Zhang KHan S Deng K Yang andH ZhangldquoPrediction of protein-protein interactions based on protein-protein correlation using least squares regressionrdquo CurrentProtein and Peptide Science vol 15 no 6 pp 553ndash560 2014

[52] D-S Huang Systematic Theory of Neural Networks for Pat-ternRecognition Publishing House of Electronic Industry of ChinaMay 1996

[53] D-S Huang ldquoRadial basis probabilistic neural networksmodeland applicationrdquo International Journal of Pattern Recognitionand Artificial Intelligence vol 13 no 7 pp 1083ndash1101 1999


Figure 3: Power comparisons between existing methods and FAACOSE on three models (ADDME, EIME, and EITEME). Each panel plots power (0 to 1) against MAF (0.1, 0.2, 0.5) for r² = 0.7 and r² = 1.0; the methods compared are BEAM, gACO, AntEpiSeeker, and FAACOSE.

for data simulation were the effective size λ, a measure of marginal effects as defined by Marchini et al. [42]; linkage disequilibrium between SNPs, measured by r²; and minor allele frequencies (MAFs). λ was set to 0.3 for ADDME and 0.2 for EIME and EITEME. For r², two values (0.7 and 1.0) were used for each model. For MAFs, three values (0.1, 0.2, and 0.5) were considered. The parameters for BEAM were set as default. The parameter settings for AntEpiSeeker were large dataset size = 6, small dataset size = 3, count large = 150, count small = 300, epistasis model = 2, ant count = 1000, α = 1, ρ = 0.05, and τ0 = 100 (also available in the software package documentation of AntEpiSeeker). The parameters of the generic ACO algorithm were set as ant count = 1000, α = 1, ρ = 0.05, τ0 = 100, count (number of iterations) = 900, and epistasis model = 2. The comparison of detection power for BEAM, the generic ACO algorithm, AntEpiSeeker, and FAACOSE is presented in Figure 3. The results show that FAACOSE outperforms BEAM and the generic ACO in all parameter settings and is superior to AntEpiSeeker in most parameter settings.
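To make the roles of α, ρ, and τ0 concrete, the following is a minimal textbook-style sketch of one generic ACO step: pheromone-weighted sampling of an SNP subset followed by evaporation and deposit. It is not the FAACOSE procedure or the AntEpiSeeker implementation; the function names, the five-SNP toy problem, and the constant fitness value are assumptions made purely for illustration.

```python
import random

def select_snps(pheromone, k, alpha=1.0, rng=random):
    """Sample k distinct SNP indices with probability proportional to tau**alpha."""
    candidates = list(range(len(pheromone)))
    chosen = []
    for _ in range(k):
        weights = [pheromone[i] ** alpha for i in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return chosen

def update_pheromone(pheromone, subset, fitness, rho=0.05):
    """Evaporate every trail by rho, then deposit `fitness` on the chosen SNPs."""
    updated = [(1.0 - rho) * tau for tau in pheromone]
    for i in subset:
        updated[i] += fitness
    return updated

tau0 = 100.0                 # initial pheromone, as in the settings above
pheromone = [tau0] * 5       # hypothetical toy problem with five SNPs
subset = select_snps(pheromone, k=2, rng=random.Random(0))
pheromone = update_pheromone(pheromone, subset, fitness=10.0)  # placeholder fitness
```

With ρ = 0.05, trails that are not reinforced decay by 5% per iteration, so a large τ0 such as 100 keeps early sampling close to uniform until deposits accumulate.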

In this section, we compare our proposed method with benchmark methods. First, we use a power test to determine how many real SNP subsets can be found with our proposed method. Second, we use precision, recall, and F1 score to evaluate the results. Precision denotes the fraction of correct SNP subsets among all finally identified SNP subsets. Recall denotes the fraction of correct SNP subsets that are identified. The F1 score is an indicator used in statistics to measure the accuracy of a binary classification model. It takes into account the precision and recall of the classification model simultaneously. The F1 score is the harmonic mean of precision and recall; its maximum is 1 and its minimum is 0. We show the results of FAACOSE and the other methods for r² = 0.7 and MAF = 0.2 in Table 1.
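The F1 score is conventionally computed as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below applies that formula to the recall and precision values reported for the ADDME model in Table 1; the function and variable names are illustrative assumptions, not part of any published code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (recall, precision) pairs as reported for the ADDME model in Table 1.
addme = {
    "BEAM": (0.29, 0.15),
    "gACO": (0.45, 0.36),
    "AntEpiSeeker": (0.60, 0.55),
    "FAACOSE": (0.82, 0.74),
}

# Round to two decimals, the precision used in the table.
addme_f1 = {method: round(f1_score(p, r), 2) for method, (r, p) in addme.items()}
print(addme_f1)
```

Rounded to two decimals, the computed values agree with the F1 column of Table 1 for ADDME (0.20, 0.40, 0.57, and 0.78).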

The F1 score of FAACOSE is better than that of the other methods. We ran the same experiment on datasets with different parameter combinations. Across all eighteen datasets, FAACOSE has the highest F1 score in fifteen of them. In the real GWAS dataset experiment, the sample size of the real dataset is huge, so the efficiency of the method must also be considered. The experimental results indicate that our proposed method is more effective on the real GWAS dataset. AntEpiSeeker is the most efficient algorithm among the three existing methods. We compared the run times of AntEpiSeeker and FAACOSE on different data samples; averaged over the results, FAACOSE is 30% faster than AntEpiSeeker.
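The count of eighteen datasets is consistent with the full grid of simulation settings described for Figure 3: three models, two r² values, and three MAFs. A minimal sketch of that grid is given below; the dictionary layout and the function name are assumptions, and only the numeric settings come from the text.

```python
from itertools import product

# Values taken from the simulation settings; the data layout is an assumption.
EFFECT_SIZE = {"ADDME": 0.3, "EIME": 0.2, "EITEME": 0.2}  # lambda per model
R2_VALUES = (0.7, 1.0)          # linkage disequilibrium settings
MAF_VALUES = (0.1, 0.2, 0.5)    # minor allele frequencies

def simulation_grid():
    """Return one settings dict per simulated dataset configuration."""
    return [
        {"model": model, "lambda": lam, "r2": r2, "maf": maf}
        for (model, lam), r2, maf in product(EFFECT_SIZE.items(), R2_VALUES, MAF_VALUES)
    ]

grid = simulation_grid()
print(len(grid))  # 3 models x 2 r2 values x 3 MAFs = 18
```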

Table 1: F1 score comparison between FAACOSE and other methods.

Model     Method         Recall   Precision   F1 score
ADDME     BEAM           0.29     0.15        0.20
ADDME     gACO           0.45     0.36        0.40
ADDME     AntEpiSeeker   0.6      0.55        0.57
ADDME     FAACOSE        0.82     0.74        0.78
EIME      BEAM           0.3      0.45        0.36
EIME      gACO           0.35     0.32        0.33
EIME      AntEpiSeeker   0.34     0.56        0.42
EIME      FAACOSE        0.9      0.82        0.86
EITEME    BEAM           0.1      0.14        0.12
EITEME    gACO           0.15     0.20        0.17
EITEME    AntEpiSeeker   0.54     0.46        0.50
EITEME    FAACOSE        0.65     0.62        0.63

4. Application to Real SNP Dataset

Late-Onset Alzheimer’s Disease (LOAD) is the most frequent form of Alzheimer’s disease and is usually identified in people older than 65 years. LOAD, or AD, is a chronic neurodegenerative disease whose onset is frequently not obvious and which slowly progresses to dementia over time. It is the cause of 60% to 70% of cases of dementia. The most common early symptom is difficulty in remembering recent events (short-term memory loss). As the disease advances, symptoms can include problems with language, disorientation (including easily getting lost), mood swings, loss of motivation, not managing self-care, and behavioural issues. LOAD is a multifactor genetic disease; its etiology and pathogenesis have not yet been fully understood. The apolipoprotein E (APOE) gene is a definite risk factor for LOAD. The APOE gene has three forms: ε2, ε3, and ε4. The effect of ε2 is positive: ε2 can effectively prevent the occurrence of the disease. Research has reported that the genetic variant ε4 has an inducing effect on the disease. Between 40% and 80% of people with AD possess at least one APOE ε4 allele [46]. Previous studies have reported some significant SNPs in the field of Genome-Wide Association Studies [47]. Reference [47] reported that 10 SNPs in the area of the GAB2 gene have an epistasis effect with APOE ε4 in relation to Late-Onset Alzheimer’s Disease. We applied our proposed method to the LOAD GWAS dataset from the website https://www.tgen.org [47]. After data preprocessing, the real biological dataset contains 1368 samples [48, 49]. Of these, 836 samples were identified as cases; the remaining 532 samples were normal samples [50, 51]. Each sample of the real biological dataset contains 309,316 SNPs with genotype information, APOE status, and LOAD status [52]. For the subsequent calculation, we code the APOE gene state with a binary variable: the value 1 represents the ε4 variant, and the value 0 represents the other variants [53]. An SNP locus was coded as a quaternary variable, considering the missing state. The high-potential LOAD-related SNPs are shown in Table 2.

Table 2: SNPs selected by FAACOSE in the LOAD dataset.

SNP (rs ID)
rs7756992    rs611154     rs191840     rs7294919
rs1887922    rs304900     rs1999764    rs1385600
rs2373115    rs7101429    rs609812     rs613375
rs1007837    rs2510038    rs4945261    rs10793294
rs520227     rs191740     rs7924284    rs829465
rs602106     rs7174511    rs606889     rs602192
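The coding scheme described in this section (a binary APOE status and four-state SNP loci that include a missing state) can be sketched as follows. The text does not give the actual integer codes or genotype labels, so the mapping below (0, 1, 2 for the three genotypes and 3 for missing) and all names are assumptions chosen for illustration.

```python
# Assumed genotype labels and integer codes; the paper states only that APOE is
# binary and that each SNP takes four states once "missing" is included.
GENOTYPE_CODE = {"AA": 0, "Aa": 1, "aa": 2}
MISSING = 3

def encode_apoe(carries_e4: bool) -> int:
    """1 for the e4 variant, 0 otherwise."""
    return 1 if carries_e4 else 0

def encode_snp(genotype) -> int:
    """Map a genotype label to its code; anything unrecognised counts as missing."""
    return GENOTYPE_CODE.get(genotype, MISSING)

# Hypothetical sample: APOE e4 carrier with four SNP calls, one of them missing.
sample = {"apoe_e4": True, "snps": ["AA", "Aa", None, "aa"]}
encoded = [encode_apoe(sample["apoe_e4"])] + [encode_snp(g) for g in sample["snps"]]
print(encoded)  # [1, 0, 1, 3, 2]
```

Keeping the missing state as its own category avoids discarding samples with incomplete genotype calls, at the cost of one extra level per locus.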

5. Discussions

In this paper, we proposed a novel ant colony optimization based fast search method for the discovery of epistasis interactions in large-scale real GWAS datasets. FAACOSE was evaluated through comparison with three existing approaches on both simulated and real datasets. FAACOSE, which adopts a fast adaptive optimization procedure and a two-objective function, is a modified algorithm derived from the generic ACO. To demonstrate the advantages of the fast adaptive ant colony optimization algorithm, we also compared the performance of FAACOSE with that of the generic ACO.

In future studies, we intend to find more powerful modeling approaches: ant colony optimization algorithms with faster convergence; objective functions that can better measure the data structure of GWAS datasets; and more efficient optimal SNP subset search and identification strategies that can be combined and flexibly embedded into our SNP epistasis search framework to find more accurate SNP subsets. With the rapid development of bioinformatics, more and more biological information related to disease is being identified, and more and more studies will consider prior knowledge. An important future research direction is to apply expert prior knowledge to GWAS datasets with our proposed method, that is, the fast adaptive ant colony optimization algorithm for detecting SNP epistasis. Expert prior knowledge can improve the power and efficiency of epistasis detection.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is partly supported by National Natural Science Foundation of China (Grant nos. 61520106006, 31571364, 61732012, 61532008, U1611265, 61672382, 61402334, 61472280, 61472173, 61572447, 61672203, 61472282, and 61373098), China Postdoctoral Science Foundation (Grant nos. 2014M561513, 2015M580352, 2017M611619, and 2016M601646), and Guangxi Bagui Scholars Program Special Fund.

References

[1] J. N. Hirschhorn and M. J. Daly, “Genome-wide association studies for common diseases and complex traits,” Nature Reviews Genetics, vol. 6, no. 2, pp. 95–108, 2005.

[2] B. N. Howie, P. Donnelly, and J. Marchini, “A flexible and accurate genotype imputation method for the next generation of genome-wide association studies,” PLoS Genetics, vol. 5, no. 6, Article ID e1000529, 2009.

[3] T. A. Manolio, F. S. Collins, N. J. Cox et al., “Finding the missing heritability of complex diseases,” Nature, vol. 461, no. 7265, pp. 747–753, 2009.

[4] B. S. Shastry, “SNP alleles in human disease and evolution,” Journal of Human Genetics, vol. 47, no. 11, pp. 561–566, 2002.

[5] B. Stubbs, D. Vancampfort, M. De Hert, and A. J. Mitchell, “The prevalence and predictors of type two diabetes mellitus in people with schizophrenia: a systematic review and comparative meta-analysis,” Acta Psychiatrica Scandinavica, vol. 132, no. 2, pp. 144–157, 2015.

[6] K. P. Liao, “Cardiovascular disease in patients with rheumatoid arthritis,” Trends in Cardiovascular Medicine, vol. 27, no. 2, pp. 136–140, 2017.

[7] Y. Mao, N. R. London, L. Ma, D. Dvorkin, and Y. Da, “Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model,” Physiological Genomics, vol. 28, no. 1, pp. 46–52, 2006.

[8] W. Zhang, J. Zhu, E. E. Schadt, and J. S. Liu, “A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules,” PLoS Computational Biology, vol. 6, no. 1, Article ID e1000642, 2010.

[9] M. Kang, C. Zhang, H.-W. Chun, C. Ding, C. Liu, and J. Gao, “EQTL epistasis: Detecting epistatic effects and inferring hierarchical relationships of genes in biological pathways,” Bioinformatics, vol. 31, no. 5, pp. 656–664, 2015.

[10] H. Lin, D. Chen, P. Huang et al., “SNP interaction pattern identifier (SIPI): an intensive search for SNP–SNP interaction patterns,” Bioinformatics, 2016.

[11] R. L. Prentice and L. Qi, “Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation,” Biostatistics, vol. 7, no. 3, pp. 339–354, 2006.

[12] S.-P. Deng, L. Zhu, and D.-S. Huang, “Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks,” BMC Genomics, vol. 16, no. 3, article no. S4, 2015.

[13] S.-P. Deng and D.-S. Huang, “SFAPS: An R package for structure/function analysis of protein sequences based on informational spectrum method,” Methods, vol. 69, no. 3, pp. 207–212, 2014.

[14] J. H. Moore, J. M. Lamb, N. J. Brown, and D. E. Vaughan, “A comparison of combinatorial partitioning and linear regression for the detection of epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on plasma PAI-1 Levels,” Clinical Genetics, vol. 62, no. 1, pp. 74–79, 2002.

[15] B. M. Michael, R. E. Neapolitan, X. Jiang, and V. Shyam, “Learning genetic epistasis using Bayesian network scoring criteria,” BMC Bioinformatics, vol. 12, no. 1, 89 pages, 2011.

[16] Y. Wang, X. Liu, K. Robbins, and R. Rekaya, “AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm,” BMC Research Notes, vol. 3, article 117, 2010.

[17] Y. Zhang and J. S. Liu, “Bayesian inference of epistatic interactions in case-control studies,” Nature Genetics, vol. 39, no. 9, pp. 1167–1173, 2007.

[18] M. Dorigo, M. Birattari, and C. Blum, “Ant colony optimization and swarm intelligence,” Springer Verlag, vol. 5217, no. 8, pp. 767–771, 2004.

[19] T. Stutzle, M. Lopez-Ibanez, P. Pellegrini et al., “Parameter adaptation in ant colony optimization,” Autonomous Search, vol. 9783642214349, pp. 191–215, 2012.

[20] C. Blum and M. Sampels, “An ant colony optimization algorithm for shop scheduling problems,” Journal of Mathematical Modelling and Algorithms, vol. 3, no. 3, pp. 285–308, 2004.

[21] R. Musa, J.-P. Arnaout, and H. Jung, “Ant colony optimization algorithm to solve for the transportation problem of cross-docking network,” Computers and Industrial Engineering, vol. 59, no. 1, pp. 85–92, 2010.

[22] G. N. Varela and M. C. Sinclair, “Ant colony optimisation for virtual-wavelength-path routing and wavelength allocation,” in Proceedings of the 1999 Congress on Evolutionary Computation (CEC ’99), pp. 1809–1816, Washington, DC, USA, July 1999.

[23] K. M. Sim and W. H. Sun, “Ant colony optimization for routing and load-balancing: survey and new directions,” Systems Man & Cybernetics Part A Systems Humans IEEE Transactions on, vol. 33, no. 5, pp. 560–572, 2003.

[24] S.-H. Ngo, X. Jiang, and S. Horiguchi, “Adaptive routing and wavelength assignment using ant-based algorithm,” in Proceedings of the 2004 12th IEEE International Conference on Networks, ICON 2004 - Unity in Diversity, pp. 482–486, November 2004.

[25] S. I. Vrieze, “Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC),” Psychological Methods, vol. 17, no. 2, pp. 228–243, 2012.

[26] D.-S. Huang and J.-X. Du, “A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks,” IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2099–2115, 2008.

[27] B. V. North, D. Curtis, and P. C. Sham, “Application of logistic regression to case-control association studies involving two causative loci,” Human Heredity, vol. 59, no. 2, pp. 79–87, 2005.

[28] P.-J. Jing and H.-B. Shen, “MACOED: A multi-objective ant colony optimization algorithm for SNP epistasis detection in genome-wide association studies,” Bioinformatics, vol. 31, no. 5, pp. 634–641, 2015.

[29] N. Ryman, “CHIFISH: A computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher’s exact test,” Molecular Ecology Notes, vol. 6, no. 1, pp. 285–287, 2006.

[30] C. R. Mehta and N. R. Patel, “A network algorithm for performing Fisher’s exact test in r × c contingency tables,” Journal of the American Statistical Association, vol. 78, no. 382, pp. 427–434, 1983.

[31] B. Sobrino, M. Brion, and A. Carracedo, “SNPs in forensic genetics: A review on SNP typing methodologies,” Forensic Science International, vol. 154, no. 2-3, pp. 181–194, 2005.

[32] O. Shoval, H. Sheftel, G. Shinar et al., “Evolutionary trade-offs, pareto optimality, and the geometry of phenotype space,” Science, vol. 336, no. 6085, pp. 1157–1160, 2012.

[33] D.-S. Huang and W. Jiang, “A general CPL-AdS methodology for fixing dynamic parameters in dual environments,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1489–1500, 2012.

[34] L. Zhu, W.-L. Guo, S.-P. Deng, and D.-S. Huang, “ChIP-PIT: enhancing the analysis of chip-seq data using convex-relaxed pair-wise interaction tensor decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 55–63, 2016.

[35] C. Angione, G. Carapezza, J. Costanza, P. Lio, and G. Nicosia, “Pareto optimality in organelle energy metabolism analysis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1032–1044, 2013.

[36] R. A. Fisher, “On the Interpretation of χ² from Contingency Tables, and the Calculation of P,” Journal of the Royal Statistical Society, vol. 85, no. 1, p. 87, 1922.

[37] A. Agresti, “A survey of exact inference for contingency tables,” Statistical Science, vol. 7, no. 1, pp. 131–153, 1992.

[38] B. Wenzheng, C. Yuehui, and W. Dong, “Prediction of protein structure classes with flexible neural tree,” Bio-Medical Materials and Engineering, vol. 24, no. 6, pp. 3797–3806, 2014.

[39] L. Zhu, Z.-H. You, D.-S. Huang, and B. Wang, “t-LSE: a novel robust geometric approach for modeling protein-protein interaction networks,” PLoS ONE, vol. 8, no. 4, Article ID e58368, 2013.

[40] C.-H. Zheng, L. Zhang, V. T.-Y. Ng, C. K. Shiu, and D.-S. Huang, “Molecular pattern discovery based on penalized matrix decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1592–1603, 2011.

[41] D.-S. Huang and H.-J. Yu, “Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 457–467, 2013.

[42] J. Marchini, P. Donnelly, and L. R. Cardon, “Genome-wide strategies for detecting multiple loci that influence complex diseases,” Nature Genetics, vol. 37, no. 4, pp. 413–417, 2005.

[43] R. Jiang, W. Tang, X. Wu, and W. Fu, “A random forest approach to the detection of epistatic interactions in case-control studies,” BMC Bioinformatics, vol. 10, no. 1, article S65, 2009.

[44] J. Kruppa, A. Ziegler, and I. R. Konig, “Risk estimation and risk prediction using machine-learning methods,” Human Genetics, vol. 131, no. 10, pp. 1639–1654, 2012.

[45] D.-S. Huang and C.-H. Zheng, “Independent component analysis-based penalized discriminant method for tumor classification using gene expression data,” Bioinformatics, vol. 22, no. 15, pp. 1855–1862, 2006.

[46] R. W. Mahley, K. H. Weisgraber, and Y. Huang, “Apolipoprotein E4: a causative factor and therapeutic target in neuropathology, including Alzheimer’s disease,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 15, pp. 5644–5651, 2006.

[47] E. M. Reiman, J. A. Webster, A. J. Myers et al., “GAB2 alleles modify Alzheimer’s Risk in APOE ε4 carriers,” Neuron, vol. 54, no. 5, pp. 713–720, 2007.

[48] C.-H. Zheng, D.-S. Huang, L. Zhang, and X.-Z. Kong, “Tumor clustering using nonnegative matrix factorization with gene selection,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 599–607, 2009.

[49] S.-P. Deng, L. Zhu, and D.-S. Huang, “Predicting hub genes associated with cervical cancer through gene co-expression networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 27–35, 2016.

[50] L. Zhu, S.-P. Deng, and D.-S. Huang, “A two-stage geometric method for pruning unreliable links in protein-protein networks,” IEEE Transactions on Nanobioscience, vol. 14, no. 5, pp. 528–534, 2015.

[51] D.-S. Huang, L. Zhang, K. Han, S. Deng, K. Yang, and H. Zhang, “Prediction of protein-protein interactions based on protein-protein correlation using least squares regression,” Current Protein and Peptide Science, vol. 15, no. 6, pp. 553–560, 2014.

[52] D.-S. Huang, Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, May 1996.

[53] D.-S. Huang, “Radial basis probabilistic neural networks: model and application,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, pp. 1083–1101, 1999.


Page 8: FAACOSE: A Fast Adaptive Ant Colony Optimization Algorithm

8 Complexity

Table 1 1198651 score comparison between FAACOSE and other meth-ods

Model Method Recall Precision 1198651 score

ADDME

BEAM 029 015 020gACO 045 036 040

AntEpiSeeker 06 055 057FAACOSE 082 074 078

EIME

BEAM 03 045 036gACO 035 032 033

AntEpiSeeker 034 056 042FAACOSE 09 082 086

EITEME

BEAM 01 014 012gACO 015 020 017

AntEpiSeeker 054 046 050FAACOSE 065 062 063

4 Application to Real SNP Dataset

Late-Onset Alzheimerrsquos Disease (LOAD) is themost frequentform of Alzheimerrsquos disease which is frequently identifiedin people older than 65 years the LOAD or AD is a kindof chronic neurodegenerative diseases which is frequentlynot obvious in the onset of the disease and slowly changesdementia over time It is the cause of 60 to 70 of casesof dementia The most common early symptom is difficultyin remembering recent events (short-term memory loss)As the disease advances symptoms can include problemswith language disorientation (including easily getting lost)mood swings loss of motivation not managing self-care andbehavioural issues LOAD is a multifactor genetic disease itsetiology and pathogenesis have not yet been fully understoodThe apolipoprotein (APOE) gene is a definite risk factor forLOAD The APOE gene has three forms The 1205762 1205763 and1205764 the effect of 1205762 is positive 1205762 can effectively preventthe occurrence of the disease There has been researchreport that genetic variant 1205764 has induced effect on diseaseBetween 40 and 80 of people with AD possess at least oneAPOE 1205764 allele [46] Previous studies have reported somesignificant SNPs in the field of Genome-Wide AssociationStudies [47] Reference [47] reported that 10 SNPs in thearea of GAB2 gene have an epistasis effect with APOE e4in relation to Late-Onset Alzheimerrsquos Disease We appliedour proposed method to the LOAD GWAS dataset fromwebsite httpswwwtgenorg [47] After data preprocessingthe real biological dataset contains 1368 samples [48 49] Ofthese 836 samples were identified case studies the remaining532 samples were normal sample [50 51] Each sample ofreal biological dataset contains 309316 SNPs with genotypeinformation APOE status and LOAD status [52] For thenext calculation we code the APOE gene state with a binaryvariable the value 1 represents the 1205764 variant and in turn thevalue 0 represents the other three variants [53] An SNP locuswas coded as a quaternary 
variable considering the missing

Table 2 The number of selected SNPs of FAACOSE in LOADdataset

SNP rsrs7756992 rs611154 rs191840 rs7294919rs1887922 rs304900 rs1999764 rs1385600rs2373115 rs7101429 rs609812 rs613375rs1007837 rs2510038 rs4945261 rs10793294rs520227 rs191740 rs7924284 rs829465rs602106 rs7174511 rs606889 rs602192

state The high potential LOAD disease related SNP is shownin Table 2

5 Discussions

In this paper we proposed a novel ant colony optimizationbased fast search method for the discovery of epistasis inter-actions in large scale real GWAS dataset FAACOSEwas eval-uated through comparison with existing three approaches onboth simulated and real datasets FAACOSE which adopts afast adaptive optimization procedure is amodified algorithmderived from the generic ACO And with two-objectivefunction to demonstrate the advantages of fast adaptiveant colony optimization algorithm we also compared theperformance of the FAACOSE with that of the generic ACO

In future studies we intend to findmore powerful model-ing approaches ant colony optimization algorithmwith fasterconvergence objective functions which can better measuredata structure of GWAS dataset more efficient optimal SNPsubset search and identification strategies that can be com-bined and flexibly embedded into our SNP epistasis searchframework to find more accurate SNP subset With the rapiddevelopment of bioinformatics more and more biologicalinformation related to disease is identified More and morestudies will consider prior knowledge An important futureresearch direction is that we will try to apply expert priorknowledge to GWAS dataset with our proposed method thatis the fast adaptive ant colony optimization algorithm fordetecting SNP epistasis Expert prior knowledge can improvethe power and efficiency of epistasis detection

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (Grant nos. 61520106006, 31571364, 61732012, 61532008, U1611265, 61672382, 61402334, 61472280, 61472173, 61572447, 61672203, 61472282, and 61373098), the China Postdoctoral Science Foundation (Grant nos. 2014M561513, 2015M580352, 2017M611619, and 2016M601646), and the Guangxi Bagui Scholars Program Special Fund.


References

[1] J. N. Hirschhorn and M. J. Daly, "Genome-wide association studies for common diseases and complex traits," Nature Reviews Genetics, vol. 6, no. 2, pp. 95–108, 2005.

[2] B. N. Howie, P. Donnelly, and J. Marchini, "A flexible and accurate genotype imputation method for the next generation of genome-wide association studies," PLoS Genetics, vol. 5, no. 6, Article ID e1000529, 2009.

[3] T. A. Manolio, F. S. Collins, N. J. Cox et al., "Finding the missing heritability of complex diseases," Nature, vol. 461, no. 7265, pp. 747–753, 2009.

[4] B. S. Shastry, "SNP alleles in human disease and evolution," Journal of Human Genetics, vol. 47, no. 11, pp. 561–566, 2002.

[5] B. Stubbs, D. Vancampfort, M. De Hert, and A. J. Mitchell, "The prevalence and predictors of type two diabetes mellitus in people with schizophrenia: a systematic review and comparative meta-analysis," Acta Psychiatrica Scandinavica, vol. 132, no. 2, pp. 144–157, 2015.

[6] K. P. Liao, "Cardiovascular disease in patients with rheumatoid arthritis," Trends in Cardiovascular Medicine, vol. 27, no. 2, pp. 136–140, 2017.

[7] Y. Mao, N. R. London, L. Ma, D. Dvorkin, and Y. Da, "Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model," Physiological Genomics, vol. 28, no. 1, pp. 46–52, 2006.

[8] W. Zhang, J. Zhu, E. E. Schadt, and J. S. Liu, "A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules," PLoS Computational Biology, vol. 6, no. 1, Article ID e1000642, 2010.

[9] M. Kang, C. Zhang, H.-W. Chun, C. Ding, C. Liu, and J. Gao, "EQTL epistasis: detecting epistatic effects and inferring hierarchical relationships of genes in biological pathways," Bioinformatics, vol. 31, no. 5, pp. 656–664, 2015.

[10] H. Lin, D. Chen, P. Huang et al., "SNP interaction pattern identifier (SIPI): an intensive search for SNP–SNP interaction patterns," Bioinformatics, 2016.

[11] R. L. Prentice and L. Qi, "Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation," Biostatistics, vol. 7, no. 3, pp. 339–354, 2006.

[12] S.-P. Deng, L. Zhu, and D.-S. Huang, "Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks," BMC Genomics, vol. 16, no. 3, article no. S4, 2015.

[13] S.-P. Deng and D.-S. Huang, "SFAPS: an R package for structure/function analysis of protein sequences based on informational spectrum method," Methods, vol. 69, no. 3, pp. 207–212, 2014.

[14] J. H. Moore, J. M. Lamb, N. J. Brown, and D. E. Vaughan, "A comparison of combinatorial partitioning and linear regression for the detection of epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on plasma PAI-1 levels," Clinical Genetics, vol. 62, no. 1, pp. 74–79, 2002.

[15] B. M. Michael, R. E. Neapolitan, X. Jiang, and V. Shyam, "Learning genetic epistasis using Bayesian network scoring criteria," BMC Bioinformatics, vol. 12, no. 1, article 89, 2011.

[16] Y. Wang, X. Liu, K. Robbins, and R. Rekaya, "AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm," BMC Research Notes, vol. 3, article 117, 2010.

[17] Y. Zhang and J. S. Liu, "Bayesian inference of epistatic interactions in case-control studies," Nature Genetics, vol. 39, no. 9, pp. 1167–1173, 2007.

[18] M. Dorigo, M. Birattari, and C. Blum, "Ant colony optimization and swarm intelligence," Springer Verlag, vol. 5217, no. 8, pp. 767–771, 2004.

[19] T. Stützle, M. López-Ibáñez, P. Pellegrini et al., "Parameter adaptation in ant colony optimization," Autonomous Search, vol. 9783642214349, pp. 191–215, 2012.

[20] C. Blum and M. Sampels, "An ant colony optimization algorithm for shop scheduling problems," Journal of Mathematical Modelling and Algorithms, vol. 3, no. 3, pp. 285–308, 2004.

[21] R. Musa, J.-P. Arnaout, and H. Jung, "Ant colony optimization algorithm to solve for the transportation problem of cross-docking network," Computers and Industrial Engineering, vol. 59, no. 1, pp. 85–92, 2010.

[22] G. N. Varela and M. C. Sinclair, "Ant colony optimisation for virtual-wavelength-path routing and wavelength allocation," in Proceedings of the 1999 Congress on Evolutionary Computation (CEC '99), pp. 1809–1816, Washington, DC, USA, July 1999.

[23] K. M. Sim and W. H. Sun, "Ant colony optimization for routing and load-balancing: survey and new directions," Systems, Man & Cybernetics, Part A: Systems and Humans, IEEE Transactions on, vol. 33, no. 5, pp. 560–572, 2003.

[24] S.-H. Ngo, X. Jiang, and S. Horiguchi, "Adaptive routing and wavelength assignment using ant-based algorithm," in Proceedings of the 2004 12th IEEE International Conference on Networks, ICON 2004 - Unity in Diversity, pp. 482–486, November 2004.

[25] S. I. Vrieze, "Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC)," Psychological Methods, vol. 17, no. 2, pp. 228–243, 2012.

[26] D.-S. Huang and J.-X. Du, "A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks," IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2099–2115, 2008.

[27] B. V. North, D. Curtis, and P. C. Sham, "Application of logistic regression to case-control association studies involving two causative loci," Human Heredity, vol. 59, no. 2, pp. 79–87, 2005.

[28] P.-J. Jing and H.-B. Shen, "MACOED: a multi-objective ant colony optimization algorithm for SNP epistasis detection in genome-wide association studies," Bioinformatics, vol. 31, no. 5, pp. 634–641, 2015.

[29] N. Ryman, "CHIFISH: a computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher's exact test," Molecular Ecology Notes, vol. 6, no. 1, pp. 285–287, 2006.

[30] C. R. Mehta and N. R. Patel, "A network algorithm for performing Fisher's exact test in r × c contingency tables," Journal of the American Statistical Association, vol. 78, no. 382, pp. 427–434, 1983.

[31] B. Sobrino, M. Brion, and A. Carracedo, "SNPs in forensic genetics: a review on SNP typing methodologies," Forensic Science International, vol. 154, no. 2-3, pp. 181–194, 2005.

[32] O. Shoval, H. Sheftel, G. Shinar et al., "Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space," Science, vol. 336, no. 6085, pp. 1157–1160, 2012.

[33] D.-S. Huang and W. Jiang, "A general CPL-AdS methodology for fixing dynamic parameters in dual environments," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1489–1500, 2012.


[34] L. Zhu, W.-L. Guo, S.-P. Deng, and D.-S. Huang, "ChIP-PIT: enhancing the analysis of ChIP-Seq data using convex-relaxed pair-wise interaction tensor decomposition," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 55–63, 2016.

[35] C. Angione, G. Carapezza, J. Costanza, P. Liò, and G. Nicosia, "Pareto optimality in organelle energy metabolism analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1032–1044, 2013.

[36] R. A. Fisher, "On the Interpretation of χ² from Contingency Tables, and the Calculation of P," Journal of the Royal Statistical Society, vol. 85, no. 1, p. 87, 1922.

[37] A. Agresti, "A survey of exact inference for contingency tables," Statistical Science, vol. 7, no. 1, pp. 131–153, 1992.

[38] B. Wenzheng, C. Yuehui, and W. Dong, "Prediction of protein structure classes with flexible neural tree," Bio-Medical Materials and Engineering, vol. 24, no. 6, pp. 3797–3806, 2014.

[39] L. Zhu, Z.-H. You, D.-S. Huang, and B. Wang, "t-LSE: a novel robust geometric approach for modeling protein-protein interaction networks," PLoS ONE, vol. 8, no. 4, Article ID e58368, 2013.

[40] C.-H. Zheng, L. Zhang, V. T.-Y. Ng, C. K. Shiu, and D.-S. Huang, "Molecular pattern discovery based on penalized matrix decomposition," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1592–1603, 2011.

[41] D.-S. Huang and H.-J. Yu, "Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 457–467, 2013.

[42] J. Marchini, P. Donnelly, and L. R. Cardon, "Genome-wide strategies for detecting multiple loci that influence complex diseases," Nature Genetics, vol. 37, no. 4, pp. 413–417, 2005.

[43] R. Jiang, W. Tang, X. Wu, and W. Fu, "A random forest approach to the detection of epistatic interactions in case-control studies," BMC Bioinformatics, vol. 10, no. 1, article S65, 2009.

[44] J. Kruppa, A. Ziegler, and I. R. König, "Risk estimation and risk prediction using machine-learning methods," Human Genetics, vol. 131, no. 10, pp. 1639–1654, 2012.

[45] D.-S. Huang and C.-H. Zheng, "Independent component analysis-based penalized discriminant method for tumor classification using gene expression data," Bioinformatics, vol. 22, no. 15, pp. 1855–1862, 2006.

[46] R. W. Mahley, K. H. Weisgraber, and Y. Huang, "Apolipoprotein E4: a causative factor and therapeutic target in neuropathology, including Alzheimer's disease," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 15, pp. 5644–5651, 2006.

[47] E. M. Reiman, J. A. Webster, A. J. Myers et al., "GAB2 alleles modify Alzheimer's risk in APOE ε4 carriers," Neuron, vol. 54, no. 5, pp. 713–720, 2007.

[48] C.-H. Zheng, D.-S. Huang, L. Zhang, and X.-Z. Kong, "Tumor clustering using nonnegative matrix factorization with gene selection," IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 599–607, 2009.

[49] S.-P. Deng, L. Zhu, and D.-S. Huang, "Predicting hub genes associated with cervical cancer through gene co-expression networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 27–35, 2016.

[50] L. Zhu, S.-P. Deng, and D.-S. Huang, "A two-stage geometric method for pruning unreliable links in protein-protein networks," IEEE Transactions on Nanobioscience, vol. 14, no. 5, pp. 528–534, 2015.

[51] D.-S. Huang, L. Zhang, K. Han, S. Deng, K. Yang, and H. Zhang, "Prediction of protein-protein interactions based on protein-protein correlation using least squares regression," Current Protein and Peptide Science, vol. 15, no. 6, pp. 553–560, 2014.

[52] D.-S. Huang, Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, May 1996.

[53] D.-S. Huang, "Radial basis probabilistic neural networks: model and application," International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, pp. 1083–1101, 1999.
