14
Research Article A Genetic Algorithm Based Support Vector Machine Model for Blood-Brain Barrier Penetration Prediction Daqing Zhang, 1 Jianfeng Xiao, 2 Nannan Zhou, 3 Mingyue Zheng, 2 Xiaomin Luo, 2 Hualiang Jiang, 2,3 and Kaixian Chen 2 1 Center for Systems Biology, Soochow University, Suzhou 215006, China 2 Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China 3 School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China Correspondence should be addressed to Mingyue Zheng; [email protected], Xiaomin Luo; [email protected], and Hualiang Jiang; [email protected] Received 24 February 2015; Revised 7 May 2015; Accepted 19 May 2015 Academic Editor: S´ ılvia A. Sousa Copyright © 2015 Daqing Zhang et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR study. For a successful SVM model, the kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been proven that they could affect each other. We designed and implemented genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to the BBB penetration prediction. e results show that our GA/SVM model is more accurate than other currently available log BB models. erefore, to optimize both SVM parameters and feature subset simultaneously with genetic algorithm is a better approach than other methods that treat the two problems separately. Analysis of our log BB model suggests that carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important role in BBB penetration. Among those properties relevant to BBB penetration, lipophilicity could enhance the BBB penetration while all the others are negatively correlated with BBB penetration. 1. Introduction e blood-brain barrier (BBB) plays important roles in separating the central nervous system (CNS) from circulating blood and maintaining brain homeostasis. BBB penetration, which may be desired or not depending on the therapeutic target, is a critical character in chemical toxicological studies and in drug design. Compounds can cross the BBB by passive diffusion or by means of a variety of catalyzed transport systems that can carry compounds into the brain (carrier- mediated transport, receptor-mediated transcytosis) or out of the brain (active efflux). Various parameters are used for predicting BBB penetration such as CNS+/, log , and log . CNS+/is a qualitative property denoting the compound’s activity (CNS+) or inactivity (CNS) against a CNS target with its BBB penetration [1]. e problem with CNS+/datasets is that CNS activity implies BBB permeation, while CNS inactivity might be due to factors other than nonpermeation, such as the fact that compounds might be rapidly metabolized or effluxed from the brain. Log , which is defined as logarithm of Brain/Blood partitioning ratio at steady state [2], is by far the most widely used parameter for BBB penetration. However, this parameter may also result in misleading conclusions because it ignores the main parts of process of permeability [3]. Log PS, which is defined as the logarithm of permeability-surface area product reflecting the rate of brain permeation, is superior to but more difficult to measure compared to log [4]. In vivo brain uptake methods may be the most reliable evaluation of BBB penetration. However, the low-throughput, expensive, and labor-intensive characteristics make these methods inappli- cable in early drug discovery stages. For these reasons, in vitro Hindawi Publishing Corporation BioMed Research International Volume 2015, Article ID 292683, 13 pages http://dx.doi.org/10.1155/2015/292683

Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

Research ArticleA Genetic Algorithm Based Support Vector MachineModel for Blood-Brain Barrier Penetration Prediction

Daqing Zhang1 Jianfeng Xiao2 Nannan Zhou3 Mingyue Zheng2

Xiaomin Luo2 Hualiang Jiang23 and Kaixian Chen2

1Center for Systems Biology Soochow University Suzhou 215006 China2Drug Discovery and Design Center State Key Laboratory of Drug Research Shanghai Institute of Materia MedicaChinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China3School of Pharmacy East China University of Science and Technology Shanghai 200237 China

Correspondence should be addressed to Mingyue Zheng myzhengsimmaccn Xiaomin Luo xmluosimmaccnand Hualiang Jiang hljiangsimmaccn

Received 24 February 2015 Revised 7 May 2015 Accepted 19 May 2015

Academic Editor Sılvia A Sousa

Copyright copy 2015 Daqing Zhang et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain Supportvector machine (SVM) is a kernel-basedmachine learningmethod that is widely used in QSAR study For a successful SVMmodelthe kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy In moststudies they are treated as two independent problems but it has been proven that they could affect each other We designed andimplemented genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied itto the BBB penetration predictionThe results show that our GASVMmodel is more accurate than other currently available log BBmodelsTherefore to optimize both SVMparameters and feature subset simultaneously with genetic algorithm is a better approachthan other methods that treat the two problems separately Analysis of our log BBmodel suggests that carboxylic acid group polarsurface area (PSA)hydrogen-bonding ability lipophilicity and molecular charge play important role in BBB penetration Amongthose properties relevant to BBB penetration lipophilicity could enhance the BBB penetration while all the others are negativelycorrelated with BBB penetration

1 Introduction

The blood-brain barrier (BBB) plays important roles inseparating the central nervous system (CNS) from circulatingblood and maintaining brain homeostasis BBB penetrationwhich may be desired or not depending on the therapeutictarget is a critical character in chemical toxicological studiesand in drug design Compounds can cross the BBB by passivediffusion or by means of a variety of catalyzed transportsystems that can carry compounds into the brain (carrier-mediated transport receptor-mediated transcytosis) or outof the brain (active efflux) Various parameters are usedfor predicting BBB penetration such as CNS+minus log119861119861and log119875119878 CNS+minus is a qualitative property denoting thecompoundrsquos activity (CNS+) or inactivity (CNSminus) againsta CNS target with its BBB penetration [1] The problem

with CNS+minus datasets is that CNS activity implies BBBpermeation while CNS inactivity might be due to factorsother than nonpermeation such as the fact that compoundsmight be rapidly metabolized or effluxed from the brain Log119861119861 which is defined as logarithmof BrainBlood partitioningratio at steady state [2] is by far the most widely usedparameter for BBB penetrationHowever this parametermayalso result in misleading conclusions because it ignores themain parts of process of permeability [3] Log PS which isdefined as the logarithmof permeability-surface area productreflecting the rate of brain permeation is superior to butmoredifficult to measure compared to log119861119861 [4] In vivo brainuptake methods may be the most reliable evaluation of BBBpenetration However the low-throughput expensive andlabor-intensive characteristics make these methods inappli-cable in early drug discovery stages For these reasons in vitro

Hindawi Publishing CorporationBioMed Research InternationalVolume 2015 Article ID 292683 13 pageshttpdxdoiorg1011552015292683

2 BioMed Research International

and in silicomethods have been introduced As there is no onein vitro model which can mimic all properties of the in vivoBBB developing more reliable models remains challenging[4] So far a great number of in silico BBB models have beendeveloped and thoroughly reviewed [2 3 5ndash9] Because of thehigh complex nature of the BBB most computational modelsonly account for passive diffusion

Initial studies were focusing on making correlationbetween BBB permeability of small set of compounds andsimple descriptors and then revealed ldquorules of thumbrdquoTheserough models reflect some important relationships betweenBBB penetration and properties of compounds but havea problem of oversimplification [10 11] As the accumula-tion of new data various more sophisticated models werereported to predict BBB permeability Classification models[12] which were used widely explored for distinguishingbetween the molecules capable of being across the BBB andthose restricted to periphery These models often developedby using the same dataset of about 1500 drugs compiled byAdenot and Lahana [13] which is the largest single homoge-neous up-to-date source of qualitative data published Someothers [12 14] distinguishmolecules based on a certain log119861119861threshold However themain problem is the threshold whichis subjectively determined and not unified Most quantitativemodels were developed by buildingQSARmodels [10 15ndash18]Since different datasets and validation methods were usedit is difficult to compare the performance of these models[19] Recently Carpenter et al [20] developed a new modelpredicting the BBB penetration using molecular dynamicsimulations and received good results providing new threadof BBB permeability prediction Here we focused on log119861119861models of BBB penetration by passive diffusion

Various data mining methods have been employed inBBB penetration models such as multiple linear regression[21 22] partial least squares (PLS) regression [13 23]recursive partitioning [23 24] neural network [25ndash27] andsupport vector machine (SVM) [28ndash30] SVM which wasoriginally developed by Vapnik and coworkers [31] has beenextensively used and consistently achieves similar or superiorperformance compared to other machine learning methods[32] Its main idea is to map data points to a high dimensionspace with a kernel function and then these data points canbe separated by a hyper plane

For a successful SVM model kernel parameters of SVMand feature subset selection are the two most importantfactors affecting the prediction accuracy Various strate-gies have been adopted for the two problems Grid-basedalgorithm is one of the most straightforward strategies forparameter optimization which discretizes the parametersand then systematically searches every grid point to finda best combination of the parameters [33] However itsuse is limited due to the computational complexity andtime-consumption Gradient-basedmethods [34 35] are alsowidely used which require the kernel function and thescoring function differentiable to assess the performance ofthe parameters Evolutionarymethod [36] has also been usedand achieved promising results As for the feature selectiongenetic algorithms- (GA-) based [37ndash41] 119865-score basedfeature recursive elimination [42] and many other methods

[43ndash47] have been employedMost of thesemethods focus onfeature selection or parameters optimization separately [45]However the choice of feature subset influences the appropri-ate kernel parameters and vice versa [48] Hence the properway seems to address the two problems simultaneously GA[41] immune clonal algorithm (ICA) [49] and Bayesianapproach [50] have been recently used for simultaneouslyfeature selection and parameters optimization for SVM ongeneral classification problems In our study GA was usedto do parameter optimization and feature subset selectionsimultaneously and an SVM regressionmodel was developedfor the blood-brain barrier penetration prediction

2 Methods

The workflow used in this study for BBB penetration predic-tion is illustrated in Figure 1

21 Dataset and Molecular Descriptors The log119861119861 datasetused in this study was compiled by Abraham et al [51]which was a combination of both in vivo and in vitro dataincluding 302 substances (328 data points) Abraham et alapplied linear free energy relationship (LFER) to the datasetand obtained good correlation between log119861119861 values andLFERdescriptors plus two indicator variables [51] CODESSA[52] could not calculate descriptors for the first 5 gases ([Ar][Kr] [Ne] [Rn] and [Xe]) of the original dataset and theywere excluded from the dataset The final dataset contained297 compounds (323 data points) The indicator variablesof 119868V and AbsCarboxy used in Abrahamrsquos study [51] wereretained in this study 119868V was defined as 119868V = 1 for the invitro data and 119868V = 0 for the in vivo data AbsCarboxy was anindicator for carboxylic acid (AbsCarboxy = 1 for carboxylicacid otherwise AbsCarboxy = 0)

The initial structures in SMILES format were importedto Marvin [53] and exported in MDL MOL format AM1method in AMPAC [54] was used for optimization plusfrequencies and thermodynamic properties calculation Thegenerated output files were used by CODESSA to calculate alarge number of constitutional topological geometrical elec-trostatic quantum-chemical and thermodynamic descrip-torsMarvin was also used to calculate some physicochemicalproperties of the compound including log119875 log119863 polarsurface area (PSA) polarizability and refractivity All thesedescriptors and properties were used as candidate features inlater modeling

Features with missing values or having no change acrossthe data set were removed If the correlation coefficient oftwo features is higher than a specified cutoff value (0999999used here) then one of them is randomly chosen andremovedThe cutoff value used here is very high because veryhigh variable correlation does not mean absence of variablecomplementarity [55] A total number of 326 descriptorswereleft for further analysis However many highly correlatedfeatures have very similar physicochemical meanings In ourfinal analysis similar features were put together by theirphysicochemical meaning which we hope could unveil someunderlying molecular properties that determine the BBBpenetration

BioMed Research International 3

328 data points

Calculate descriptors

Remove bad values(323 data points and 326 features)

Data scale

Training settest set

GASVM

Abraham et al (2006)

Remove features with missing values

Scaled to [01]

Kennard-Stone method

Feature selection and parameter optimization simultaneously

AbsCarboxy log P log D PSA H-Bond number of

Remove features having no change over the dataset (SD = 0)Remove features having very high correlation ((1 minus r) le 1e minus 6)

atomsbondsrings charge and HOMOLUMO

Figure 1 Workflow of GASVMmodel for BBB penetration prediction

The dataset was then split into training set and testset using the Kennard-Stone method [56] which selects asubset of representative data points uniformly distributed inthe sample space [57] At start the Kennard-Stone methodchooses the data point that is the closest to the center of thedataset measured by Euclidean distance After that from allremaining data points the data point that is the furthest fromthose already selected is added to the training setThis processcontinues until the size of the training set reaches specifiedsize 260 data pointswere selected as training set and the other63 were used as test set

22 SVM Regression Details about SVM regression can befound in literatures [58ndash60] As in other multivariate statis-tical models the performance of SVM regression dependson the combination of several parameters In general 119862 is aregularization parameter that controls the tradeoff betweentraining error and model complexity If 119862 is too large themodel will have a high penalty for nonseparable points andmay store too many support vectors and get overfitting If itis too small the model may have underfitting Parameter 120576controls the width of the 120576-insensitive zone used to fit thetraining data The value of 120576 can affect the number of thesupport vectors used to construct the regression functionThe bigger 120576 is the fewer support vectors are selected Onthe other hand bigger 120576-values result in more flat estimatesHence both 119862 and 120576-values affect model complexity (butin a different way) The kernel type is another importantparameter In SVM regression radial basis function (RBF)(1) was the most commonly used kernel function for itsbetter generalization ability less number of parameters andless numerical difficulties [33] and was used in this studyParameter 120574 in RBF controls the amplitude of the RBF kerneland therefore controls the generalization ability of SVMregressionThe LIBSVM package (version 281) [61] was usedin this study for SVM regression calculation taking the form

119870(119909

119894 119909

119895) = exp (minus120574 100381710038171003817

1003817

1003817

119909

119894minus119909

119895

1003817

1003817

1003817

1003817

1003817

2) 120574 gt 0 (1)

where 119909119894and 119909

119895are training vectors (119894 = 119895 119909

119894= 119909

119895) and 120574 is

kernel parameter

23 Genetic Algorithms Genetic algorithms (GA) [41] arestochastic optimization and search method that mimicsbiological evolution as a problem-solving strategy They arevery flexible and attractive for optimization problems

Given a specific problem to solve the input to the GA isa set of potential solutions to that problem encoded in somefashion and a fitness function that allows each candidate tobe quantitatively evaluated (Figure 2) Selection mating andmutation justmimic the natural process For each generationindividuals are selected for reproduction according to theirfitness values Favorable individuals have a better chance tobe selected for reproduction and the offspring have chanceto mutate to keep diversity while the unfavorable individualsare less likely to survive After each generation whether theevolution is converged or the termination criteria are met ischecked if yes job is done if not the evolution goes intonext generation After many generations good individualswill dominate the population and we will get solutions thatare good enough for our problem

First in order to solve a problemwithGA each individualin the population should be represented by a chromosomeIn our study since the parameter optimization and featuresubset selection should be addressed simultaneously thechromosome is a combination of parameter genes and featuregene (Figure 3) where 119891

119899is an integer in the range of [1119873]

and 119873 is the number of candidate features for model con-struction A chromosome represents an individual in geneticalgorithms and parameters contained in chromosome couldbe used for SVM modeling Left part of the chromosome isthe parameter genes of which 119862 120574 and 120576 all are float genes

4 BioMed Research International

Define encoding strategy fitness function and GA parameters

Generate initial population (SVM parameters and selected features)

Fitness evaluation (MSE from 10-fold CV SVM)

Convergence

feature subset

Select mating and mutation

Yes

No

Optimized (C 120574 120576) and

Figure 2 Workflow of genetic algorithms

SVM parameters Selected features

Chromosome

C (float) 120574 (float) 120576 (float)

C (float) 120574 (float) 120576 (float) middot middot middot

middot middot middotf1(int) f2(int) f3(int) f4(int) fn(int)

f1(int) f2(int) f3(int) f4(int) fn(int)

Figure 3 Encoding of the chromosome

The feature gene is an array of integers and each integerrepresents a feature

Fitness function can be seen as a ruler which was usedto quantitatively evaluate and compare each candidate Inour study the mean squared error (MSE) of 10-fold crossvalidation (CV) for SVM was used as fitness function andsmaller fitness value indicated better individual Given atraining set containing 119899 compounds (1199091 1199101) (119909119899 119910119899) 119909119894is descriptor vector of compound 119894 and 119909

119894isin 119877

119899 119910119894is the

log119861119861 value and 119910119894isin minus1 +1 The objective function can

be calculated by

119891 (119909) =

119899

sum

119894=1(120572

119894minus120572

lowast

119894)119870 (119909

119894 119909) + 119887 (2)

120572

119894and 120572lowast

119894are Lagronia factors 119870(119909

119894 119909) is kernel of

radial basis function The MSE of 10-fold CV for SVM wascalculated by

MSE = 1119899

119899

sum

119894=1(

119883

119894minus119883

119894)

2 (3)

where 119899 is the number of all data points 119883119894is the predicted

value and119883119894is the experiment value

Tournament selection was used as the selection strategyin GA which selected the best 1 from 3 randomly chosencandidates The advantage of tournament selection overroulette wheel selection is that tournament selection does notneed to sort the whole population by fitness value

Since there are different types of genes in a chromosomedifferent mating strategies were used for different types ofgenes (Figure 4)

119881new = 1205731199011 + (1minus120573) 1199012 (4)

where 120573 uniformly distributed random number on theinterval [minus025 125] 119901

119899is the value of parent gene and119881new

is the value of child geneFor float genes the new value is a linear combination of

the parents (4) For feature gene uniform crossover is usedeach element of the child gene is selected randomly from thecorresponding items of parents

BioMed Research International 5

CParent A

CParent B

C

C

Child A

Child B

Uniform crossover

++ + +

++ + +

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

Vnew = 120573p1 + (1 minus 120573)p2

Figure 4 Mating strategy of GA

Table 1 Performance comparison of models with different number of features

Number of features Training (CV = 10) Prediction1199032 Parameters of SVMMSE 119903

2 Test set Training set 119862 120574 120576

4 01197 0674 0722 0740 388833 06081 014915 01042 0715 0770 0805 163419 07973 027436 00945 0744 0840 0829 133573 07158 015137 00959 074 0821 0843 343067 05218 015958 00883 0761 0834 0883 609596 05871 023579 00815 0777 0847 0864 37770 08764 0166310 00823 0776 0858 0903 152236 06247 0143411 00714 0804 0861 0891 56937 06531 0157312 00780 0787 0864 0905 72787 07428 0151513 00817 0778 0862 0922 41957 07791 0157414 00812 0778 0882 0917 148391 05002 0205415 00734 0799 0870 0919 49915 05231 01077

Again different mutation strategies were used for differ-ent types of genes For float genes the values were randomlymutated upward or downward The new value was given by

119881new =

119881 minus 120573 (119881 minus 119881min) if random () lt 05

119881 + 120573 (119881max minus 119881) if random () ge 05(5)

where 120573 was a random number distributed in [0 1] 119881 and119881new are values before and after mutation and 119881min and 119881maxare the minimum and maximum values allowed for a gene

For feature gene several points were first randomlychosen for mutation and then a random number in [1119873](119873 is the total number of features) was chosen as new featurewhile avoiding duplicate features The GA was terminatedwhen the evolution reached 1000 generations In our pilotstudy (data not shown) 1000 generations were enough for theGA to convergeThe other parameters for GAwere as followspopulation size 100 cross rate 08 mutation rate 01 elite size2 and number of new individuals in each generation 8

3 Results and Discussion

31 GASVM Performance GA was run with different num-ber of features from 4 to 15 For each number of featuresGA was run 50 times and the best model was chosen forfurther analysis From Table 1 and Figure 5 the overall trendof the GA showed the following (1) the accuracy of themodelincreased with the number of the features (2) the accuracyof the model on training set was better than the accuracy onthe test set which was then better than the accuracy of crossvalidation

As the feature number increases the complexity alsoincreases which will often increase the probability of overfit-ting A complex model is also difficult to interpret and applyin practical use so generally speaking we need to find abalance between the accuracy and complexity of themodel Itis observed that (Table 1 Figure 5(a)) the prediction accuracy(1199032 = 0744) of cross validation (119899 = 10) of the 6-featuremodel was similar to that of the Abrahamrsquos model (1199032 =075) which used all 328 data points (Table 2) As the numberof features increased from 6 to 15 the prediction accuracy

6 BioMed Research International

4 6 8 10 12 14 1604

05

06

07

08

09

10

Number of features

CVr2

CV r2

Test r2

Train r2

(a)

0 200 400 600 800 1000

01

02

03

04

05

06

07

08

MSE

(10-

fold

CV

)

Generation

MSE (10-fold CV)R2 (test set)

(b)

Figure 5 (a) Performance comparison of models with different number of features (b) Evolution of the best 6-feature model

Table 2 Comparison of most relevant QSAR studies on BBB permeability

Descriptors 119873train 119873test Methods 119903train2 Predictive accuracy

on test set Reference

Δlop119875 log119875and log119875cyc 20 mdash LinearRegression 069 mdash Young et al [77]

Excess molar refractiondipolaritypolarisability H-bond acidityand basicitySolute McGowan volume

148 30 LFER 075 119903test2= 073 Platts et al [66]

Δ119866

11988255 mdash Linear

Regression 082 mdash Lombardo et al [78]

PSA the octanolwater partitioncoefficient and the conformationalflexibility

56 7 MLR 085 119903test2= 080 Iyer et al [79]

CODESSADRAGON (482) 200 110 PLSSVM

083097

119903test2= 081

119903test2= 096

Golmohammadi et al [62]

Molecular (CODESSA-PRO) descriptors(5) 113 19 MLR 078 119903test

2= 077 Katritzky et al [15]

Molecular fragment (ISIDA) descriptors 112 19 MLR 090 119903test2= 083 Katritzky et al [15]

PSA log119875 the number of H-bondacceptors E-state and VSA 144 10

CombinatorialQSAR (KNN

SVM)091 119903test

2= 08 Zhang et al [17]

Abraham solute descriptors andindicators 328 mdash LFER 075 mdash Abraham et al [51]

Abraham solute descriptors andindicators 164 164 LFER 071 119904 = 025 MAE = 020 Abraham et al [51]

CODESSAMarvinindicator (6) 260 63 GA based SVM 083 119903test2 = 084 RMSE =

023

This research GASVMfinal model

119862 = 133573 120574 = 0715761 120576= 0151289

CODESSAMarvinindicator (236) 260 63 GA based SVM 097 119903test2 = 055 RMSE =

031

This research GridSVM119862 = 80 120574 = 0015625 120576 =

00625

CODESSAMarvinindicator (6) 260 63 GA based SVM 086 119903test2 = 058 RMSE =

029This research GridSVM119862 = 80 120574 = 10 120576 = 0125

BioMed Research International 7

minus2 minus1 0 1 2minus2

minus1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(a)

minus2 minus1minus2

minus1

0 1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(b)

Figure 6 Prediction accuracy of the final model on training set (a) and test set (b)

on training set increased from 0829 to 0919 while theprediction accuracy on test set only slightly increased from0840 to 0870 accordingly Take all these into considerationthe 6-feature model (Table 1 Figure 5(a)) was chosen as ourfinal model (Figure 6) of which the prediction accuracy onboth test set and training set was similar (119903train

2minus119903test2= 011)

and high enough (gt082)Figure 5(b) showed the evolution of the prediction per-

formance of the model (MSE and 119903test2) In the first 100

generations the MSE decreased very fast followed with aplatform stage from about 100 to 650 generations Anotherdecrease occurred at about 650 generations and then theevolution became stable at about 850 generationsThemodelsdid not improve much in the last 150 generations which mayimply a convergence

A tabular presentation of relevant studies regarding theprediction of the blood-brain distribution is shown inTable 2These models were constructed by using different statisticallearning methods yielding different prediction capabilitywith 119877test

2 ranging from 05 to over 09 Generally regressionby SVM appears to be more robust than traditional linearapproaches such as PLS andMLR with respect to the nonlin-ear effects induced bymultiple potentially cooperative factorsgoverning the BBB permeability For example the SVMmodel by Golmohammadi et al [62] yielded the highest119877test

2

on a test set containing 110 molecules However it shouldbe noted that direct comparison with results from previousstudies is usually inappropriate because of differences in theirdatasets In this study a combination of both in vivo andin vitro data compiled by Abraham et al [51] was used fordeveloping BBB prediction model which is of high dataquality and covers large chemical diversity space In additionto the data source kernel parameter optimization and featureselection are two crucial factors influencing the predictionaccuracy of SVM models To reduce the computational costmost of the existing models addressed the feature selectionand parameter optimization procedures separately In thisstudy we used a GA scheme to perform the kernel parameteroptimization and feature selection simultaneously which ismore efficient at searching the optimal feature subset space

Abrahamrsquos model [51] is the best model that is currentlypublicly available A comparison of our models with Abra-hamrsquos models was shown in Table 2 In Table 2 the last 3rows are models in our study The same dataset was used inAbrahamrsquos research and this study but the data set was splitinto different training set and test set (our model traintest =26063 Abrahamrsquosmodel trainingtest = 164164) 7 variableswere used in Abrahamrsquos model compared to 6 in our finalmodel The 1199032 values for training set in Abrahamrsquos 164164model and 3280 model were 071 and 075 respectivelycompared with 083 for our model It has to be noted that thesize of our training set (260) is bigger than Abrahamrsquos (164)

Our model was also compared with grid method imple-mented with Python toolkit (gridpy) shipped with libsvm[61] for parameter optimization First since grid methodcannot be used to select feature subset all 326 features wereused to construct a BBB prediction model The predictionaccuracy of the training set was very high (119903train

2= 097)

but that of the test set was disappointing (119903test2= 055) Then

we used the same feature set as our final model (6 features)The result was slightly better but still too bad for test setprediction (119903train

2= 086 119903test

2= 058)

So compared with the grid-based method our GA-basedmethod could get better accuracy with fewer features whichsuggested that GA could get much better combination ofparameters and feature subset This was also observed inotherrsquos study [48]

32 Feature Analysis An examination of the descriptors usedin the model could provide an insight into the molecularproperties that are most relevant to BBB penetration Table 3showed the features used in the final 6-feature model andtheir meanings In order to explore the relative importanceand the underlying molecular properties of the descriptorsthe most frequently used features in all 50 6-feature modelswere analyzed Table 4 showed the top 10 most frequentlyused features Interestingly AbsCarboxy an indicator of theexistence of carboxylic acidwas themost significant propertySome features with similar meaning were also found to occurin the models such as PSA related features (M PSA 74

8 BioMed Research International

Table 3 Features used in the final model

Name MeaningM log119875 log119875 (Marvin)HA dependent HDSA-2 [Zefirovrsquos PC] H-bond donor surface area related (CODESSA)M PSA 74 PSA at pH 74 (Marvin)AbsCarboxy Carboxylic acid indicator (Abraham)HA dependent HDCA-2SQRT(TMSA) [Zefirovrsquos PC] H-bond donor charged area related (CODESSA)Average Complementary Information content (order 0) Topology descriptor (CODESSA)

Table 4 The most frequently used features for all 6-feature modelsa

Number Feature name Occurrence(50 models) Meaning

11 AbsCarboxy 36 Indicator for carboxylic aciddagger

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

14 H-acceptor surface areatotal molecularsurface area

101 Topographic electronic index (all bonds) Zefirovrsquos PC 12 Topological electronic index for allbonded pairs of atomsbDagger

8 M PSA 74 11 PSA at pH 74csect

267 ESP-HASA H-acceptors surface area Quantum-Chemical PC 10 H-acceptor surface area

5 delta log119863 9 log119863 (pH 65) minus log119863 (pH 74)deand

7 M PSA 70 9 PSA at pH 70sect

138 HA dependent HDCA-2 [Zefirovrsquos PC] 9 H-donors charged surface area

6 M PSA 65 8 PSA at pH 65sect

1 M log119875 7 log119875andaRows with the same symbol could be categorized into the same groupbTopological electronic index is a feature to characterize the distribution of molecular charge 119879 = sum119873119861

(119894lt119895)(|119902119894 minus 119902119895|119903

2119894119895) where 119902119894 is net charge on 119894th atom and

119903119894119895 is the distance between two bonded atomsc74 is the pH in bloodd65 is the pH in intestinee119863 is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound inwater For neutral compounds log119863 is equal to log119875

M PSA 70 andM PSA 65) andH-bond related descriptors(number 267 268 and 138) So we decided to put similardescriptors into the same group (Figure 7) to find the under-lying properties affecting BBB penetration

According to the associated molecular properties ofthe features those frequently used features are catego-rized into 5 groups AbsCarboxy (indicator of carboxylicacid) H-bonding (H-bonding ability including H-bonddonoracceptor related features) PSA (molecular polar sur-face area related features) lipophilicity (including M log119875and delta log119863) and molecular charge (including chargeand topological electronic index related features)

The following is observed

(1) Interestingly AbsCarboxy was also the most signif-icant feature which occurred 36 times in total 50models This may indicate that the carboxylic acidgroup plays an important role in the BBB penetrationwhich is consistent with the study of Abraham et al[51]

3633

28

1612

0

5

10

15

20

25

30

35

40

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 7 Top features for all 6-feature models (50 in all)

(2) H-bonding (H-bond donoracceptor) related surfacearea polar surface area log119875 related features and

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

2 BioMed Research International

and in silicomethods have been introduced As there is no onein vitro model which can mimic all properties of the in vivoBBB developing more reliable models remains challenging[4] So far a great number of in silico BBB models have beendeveloped and thoroughly reviewed [2 3 5ndash9] Because of thehigh complex nature of the BBB most computational modelsonly account for passive diffusion

Initial studies were focusing on making correlationbetween BBB permeability of small set of compounds andsimple descriptors and then revealed ldquorules of thumbrdquoTheserough models reflect some important relationships betweenBBB penetration and properties of compounds but havea problem of oversimplification [10 11] As the accumula-tion of new data various more sophisticated models werereported to predict BBB permeability Classification models[12] which were used widely explored for distinguishingbetween the molecules capable of being across the BBB andthose restricted to periphery These models often developedby using the same dataset of about 1500 drugs compiled byAdenot and Lahana [13] which is the largest single homoge-neous up-to-date source of qualitative data published Someothers [12 14] distinguishmolecules based on a certain log119861119861threshold However themain problem is the threshold whichis subjectively determined and not unified Most quantitativemodels were developed by buildingQSARmodels [10 15ndash18]Since different datasets and validation methods were usedit is difficult to compare the performance of these models[19] Recently Carpenter et al [20] developed a new modelpredicting the BBB penetration using molecular dynamicsimulations and received good results providing new threadof BBB permeability prediction Here we focused on log119861119861models of BBB penetration by passive diffusion

Various data mining methods have been employed inBBB penetration models such as multiple linear regression[21 22] partial least squares (PLS) regression [13 23]recursive partitioning [23 24] neural network [25ndash27] andsupport vector machine (SVM) [28ndash30] SVM which wasoriginally developed by Vapnik and coworkers [31] has beenextensively used and consistently achieves similar or superiorperformance compared to other machine learning methods[32] Its main idea is to map data points to a high dimensionspace with a kernel function and then these data points canbe separated by a hyper plane

For a successful SVM model kernel parameters of SVMand feature subset selection are the two most importantfactors affecting the prediction accuracy Various strate-gies have been adopted for the two problems Grid-basedalgorithm is one of the most straightforward strategies forparameter optimization which discretizes the parametersand then systematically searches every grid point to finda best combination of the parameters [33] However itsuse is limited due to the computational complexity andtime-consumption Gradient-basedmethods [34 35] are alsowidely used which require the kernel function and thescoring function differentiable to assess the performance ofthe parameters Evolutionarymethod [36] has also been usedand achieved promising results As for the feature selectiongenetic algorithms- (GA-) based [37ndash41] 119865-score basedfeature recursive elimination [42] and many other methods

[43ndash47] have been employedMost of thesemethods focus onfeature selection or parameters optimization separately [45]However the choice of feature subset influences the appropri-ate kernel parameters and vice versa [48] Hence the properway seems to address the two problems simultaneously GA[41] immune clonal algorithm (ICA) [49] and Bayesianapproach [50] have been recently used for simultaneouslyfeature selection and parameters optimization for SVM ongeneral classification problems In our study GA was usedto do parameter optimization and feature subset selectionsimultaneously and an SVM regressionmodel was developedfor the blood-brain barrier penetration prediction

2 Methods

The workflow used in this study for BBB penetration predic-tion is illustrated in Figure 1

21 Dataset and Molecular Descriptors The log119861119861 datasetused in this study was compiled by Abraham et al [51]which was a combination of both in vivo and in vitro dataincluding 302 substances (328 data points) Abraham et alapplied linear free energy relationship (LFER) to the datasetand obtained good correlation between log119861119861 values andLFERdescriptors plus two indicator variables [51] CODESSA[52] could not calculate descriptors for the first 5 gases ([Ar][Kr] [Ne] [Rn] and [Xe]) of the original dataset and theywere excluded from the dataset The final dataset contained297 compounds (323 data points) The indicator variablesof 119868V and AbsCarboxy used in Abrahamrsquos study [51] wereretained in this study 119868V was defined as 119868V = 1 for the invitro data and 119868V = 0 for the in vivo data AbsCarboxy was anindicator for carboxylic acid (AbsCarboxy = 1 for carboxylicacid otherwise AbsCarboxy = 0)

The initial structures in SMILES format were importedto Marvin [53] and exported in MDL MOL format AM1method in AMPAC [54] was used for optimization plusfrequencies and thermodynamic properties calculation Thegenerated output files were used by CODESSA to calculate alarge number of constitutional topological geometrical elec-trostatic quantum-chemical and thermodynamic descrip-torsMarvin was also used to calculate some physicochemicalproperties of the compound including log119875 log119863 polarsurface area (PSA) polarizability and refractivity All thesedescriptors and properties were used as candidate features inlater modeling

Features with missing values or having no change acrossthe data set were removed If the correlation coefficient oftwo features is higher than a specified cutoff value (0999999used here) then one of them is randomly chosen andremovedThe cutoff value used here is very high because veryhigh variable correlation does not mean absence of variablecomplementarity [55] A total number of 326 descriptorswereleft for further analysis However many highly correlatedfeatures have very similar physicochemical meanings In ourfinal analysis similar features were put together by theirphysicochemical meaning which we hope could unveil someunderlying molecular properties that determine the BBBpenetration

BioMed Research International 3

328 data points

Calculate descriptors

Remove bad values(323 data points and 326 features)

Data scale

Training settest set

GASVM

Abraham et al (2006)

Remove features with missing values

Scaled to [01]

Kennard-Stone method

Feature selection and parameter optimization simultaneously

AbsCarboxy log P log D PSA H-Bond number of

Remove features having no change over the dataset (SD = 0)Remove features having very high correlation ((1 minus r) le 1e minus 6)

atomsbondsrings charge and HOMOLUMO

Figure 1 Workflow of GASVMmodel for BBB penetration prediction

The dataset was then split into training set and testset using the Kennard-Stone method [56] which selects asubset of representative data points uniformly distributed inthe sample space [57] At start the Kennard-Stone methodchooses the data point that is the closest to the center of thedataset measured by Euclidean distance After that from allremaining data points the data point that is the furthest fromthose already selected is added to the training setThis processcontinues until the size of the training set reaches specifiedsize 260 data pointswere selected as training set and the other63 were used as test set

22 SVM Regression Details about SVM regression can befound in literatures [58ndash60] As in other multivariate statis-tical models the performance of SVM regression dependson the combination of several parameters In general 119862 is aregularization parameter that controls the tradeoff betweentraining error and model complexity If 119862 is too large themodel will have a high penalty for nonseparable points andmay store too many support vectors and get overfitting If itis too small the model may have underfitting Parameter 120576controls the width of the 120576-insensitive zone used to fit thetraining data The value of 120576 can affect the number of thesupport vectors used to construct the regression functionThe bigger 120576 is the fewer support vectors are selected Onthe other hand bigger 120576-values result in more flat estimatesHence both 119862 and 120576-values affect model complexity (butin a different way) The kernel type is another importantparameter In SVM regression radial basis function (RBF)(1) was the most commonly used kernel function for itsbetter generalization ability less number of parameters andless numerical difficulties [33] and was used in this studyParameter 120574 in RBF controls the amplitude of the RBF kerneland therefore controls the generalization ability of SVMregressionThe LIBSVM package (version 281) [61] was usedin this study for SVM regression calculation taking the form

119870(119909

119894 119909

119895) = exp (minus120574 100381710038171003817

1003817

1003817

119909

119894minus119909

119895

1003817

1003817

1003817

1003817

1003817

2) 120574 gt 0 (1)

where 119909119894and 119909

119895are training vectors (119894 = 119895 119909

119894= 119909

119895) and 120574 is

kernel parameter

23 Genetic Algorithms Genetic algorithms (GA) [41] arestochastic optimization and search method that mimicsbiological evolution as a problem-solving strategy They arevery flexible and attractive for optimization problems

Given a specific problem to solve the input to the GA isa set of potential solutions to that problem encoded in somefashion and a fitness function that allows each candidate tobe quantitatively evaluated (Figure 2) Selection mating andmutation justmimic the natural process For each generationindividuals are selected for reproduction according to theirfitness values Favorable individuals have a better chance tobe selected for reproduction and the offspring have chanceto mutate to keep diversity while the unfavorable individualsare less likely to survive After each generation whether theevolution is converged or the termination criteria are met ischecked if yes job is done if not the evolution goes intonext generation After many generations good individualswill dominate the population and we will get solutions thatare good enough for our problem

First in order to solve a problemwithGA each individualin the population should be represented by a chromosomeIn our study since the parameter optimization and featuresubset selection should be addressed simultaneously thechromosome is a combination of parameter genes and featuregene (Figure 3) where 119891

119899is an integer in the range of [1119873]

and 119873 is the number of candidate features for model con-struction A chromosome represents an individual in geneticalgorithms and parameters contained in chromosome couldbe used for SVM modeling Left part of the chromosome isthe parameter genes of which 119862 120574 and 120576 all are float genes

4 BioMed Research International

Define encoding strategy fitness function and GA parameters

Generate initial population (SVM parameters and selected features)

Fitness evaluation (MSE from 10-fold CV SVM)

Convergence

feature subset

Select mating and mutation

Yes

No

Optimized (C 120574 120576) and

Figure 2 Workflow of genetic algorithms

SVM parameters Selected features

Chromosome

C (float) 120574 (float) 120576 (float)

C (float) 120574 (float) 120576 (float) middot middot middot

middot middot middotf1(int) f2(int) f3(int) f4(int) fn(int)

f1(int) f2(int) f3(int) f4(int) fn(int)

Figure 3 Encoding of the chromosome

The feature gene is an array of integers and each integerrepresents a feature

Fitness function can be seen as a ruler which was usedto quantitatively evaluate and compare each candidate Inour study the mean squared error (MSE) of 10-fold crossvalidation (CV) for SVM was used as fitness function andsmaller fitness value indicated better individual Given atraining set containing 119899 compounds (1199091 1199101) (119909119899 119910119899) 119909119894is descriptor vector of compound 119894 and 119909

119894isin 119877

119899 119910119894is the

log119861119861 value and 119910119894isin minus1 +1 The objective function can

be calculated by

119891 (119909) =

119899

sum

119894=1(120572

119894minus120572

lowast

119894)119870 (119909

119894 119909) + 119887 (2)

120572

119894and 120572lowast

119894are Lagronia factors 119870(119909

119894 119909) is kernel of

radial basis function The MSE of 10-fold CV for SVM wascalculated by

MSE = 1119899

119899

sum

119894=1(

119883

119894minus119883

119894)

2 (3)

where 119899 is the number of all data points 119883119894is the predicted

value and119883119894is the experiment value

Tournament selection was used as the selection strategyin GA which selected the best 1 from 3 randomly chosencandidates The advantage of tournament selection overroulette wheel selection is that tournament selection does notneed to sort the whole population by fitness value

Since there are different types of genes in a chromosomedifferent mating strategies were used for different types ofgenes (Figure 4)

119881new = 1205731199011 + (1minus120573) 1199012 (4)

where 120573 uniformly distributed random number on theinterval [minus025 125] 119901

119899is the value of parent gene and119881new

is the value of child geneFor float genes the new value is a linear combination of

the parents (4) For feature gene uniform crossover is usedeach element of the child gene is selected randomly from thecorresponding items of parents

BioMed Research International 5

CParent A

CParent B

C

C

Child A

Child B

Uniform crossover

++ + +

++ + +

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

Vnew = 120573p1 + (1 minus 120573)p2

Figure 4 Mating strategy of GA

Table 1 Performance comparison of models with different number of features

Number of features Training (CV = 10) Prediction1199032 Parameters of SVMMSE 119903

2 Test set Training set 119862 120574 120576

4 01197 0674 0722 0740 388833 06081 014915 01042 0715 0770 0805 163419 07973 027436 00945 0744 0840 0829 133573 07158 015137 00959 074 0821 0843 343067 05218 015958 00883 0761 0834 0883 609596 05871 023579 00815 0777 0847 0864 37770 08764 0166310 00823 0776 0858 0903 152236 06247 0143411 00714 0804 0861 0891 56937 06531 0157312 00780 0787 0864 0905 72787 07428 0151513 00817 0778 0862 0922 41957 07791 0157414 00812 0778 0882 0917 148391 05002 0205415 00734 0799 0870 0919 49915 05231 01077

Again different mutation strategies were used for differ-ent types of genes For float genes the values were randomlymutated upward or downward The new value was given by

119881new =

119881 minus 120573 (119881 minus 119881min) if random () lt 05

119881 + 120573 (119881max minus 119881) if random () ge 05(5)

where 120573 was a random number distributed in [0 1] 119881 and119881new are values before and after mutation and 119881min and 119881maxare the minimum and maximum values allowed for a gene

For feature gene several points were first randomlychosen for mutation and then a random number in [1119873](119873 is the total number of features) was chosen as new featurewhile avoiding duplicate features The GA was terminatedwhen the evolution reached 1000 generations In our pilotstudy (data not shown) 1000 generations were enough for theGA to convergeThe other parameters for GAwere as followspopulation size 100 cross rate 08 mutation rate 01 elite size2 and number of new individuals in each generation 8

3 Results and Discussion

31 GASVM Performance GA was run with different num-ber of features from 4 to 15 For each number of featuresGA was run 50 times and the best model was chosen forfurther analysis From Table 1 and Figure 5 the overall trendof the GA showed the following (1) the accuracy of themodelincreased with the number of the features (2) the accuracyof the model on training set was better than the accuracy onthe test set which was then better than the accuracy of crossvalidation

As the feature number increases the complexity alsoincreases which will often increase the probability of overfit-ting A complex model is also difficult to interpret and applyin practical use so generally speaking we need to find abalance between the accuracy and complexity of themodel Itis observed that (Table 1 Figure 5(a)) the prediction accuracy(1199032 = 0744) of cross validation (119899 = 10) of the 6-featuremodel was similar to that of the Abrahamrsquos model (1199032 =075) which used all 328 data points (Table 2) As the numberof features increased from 6 to 15 the prediction accuracy

6 BioMed Research International

4 6 8 10 12 14 1604

05

06

07

08

09

10

Number of features

CVr2

CV r2

Test r2

Train r2

(a)

0 200 400 600 800 1000

01

02

03

04

05

06

07

08

MSE

(10-

fold

CV

)

Generation

MSE (10-fold CV)R2 (test set)

(b)

Figure 5 (a) Performance comparison of models with different number of features (b) Evolution of the best 6-feature model

Table 2 Comparison of most relevant QSAR studies on BBB permeability

Descriptors 119873train 119873test Methods 119903train2 Predictive accuracy

on test set Reference

Δlop119875 log119875and log119875cyc 20 mdash LinearRegression 069 mdash Young et al [77]

Excess molar refractiondipolaritypolarisability H-bond acidityand basicitySolute McGowan volume

148 30 LFER 075 119903test2= 073 Platts et al [66]

Δ119866

11988255 mdash Linear

Regression 082 mdash Lombardo et al [78]

PSA the octanolwater partitioncoefficient and the conformationalflexibility

56 7 MLR 085 119903test2= 080 Iyer et al [79]

CODESSADRAGON (482) 200 110 PLSSVM

083097

119903test2= 081

119903test2= 096

Golmohammadi et al [62]

Molecular (CODESSA-PRO) descriptors(5) 113 19 MLR 078 119903test

2= 077 Katritzky et al [15]

Molecular fragment (ISIDA) descriptors 112 19 MLR 090 119903test2= 083 Katritzky et al [15]

PSA log119875 the number of H-bondacceptors E-state and VSA 144 10

CombinatorialQSAR (KNN

SVM)091 119903test

2= 08 Zhang et al [17]

Abraham solute descriptors andindicators 328 mdash LFER 075 mdash Abraham et al [51]

Abraham solute descriptors andindicators 164 164 LFER 071 119904 = 025 MAE = 020 Abraham et al [51]

CODESSAMarvinindicator (6) 260 63 GA based SVM 083 119903test2 = 084 RMSE =

023

This research GASVMfinal model

119862 = 133573 120574 = 0715761 120576= 0151289

CODESSAMarvinindicator (236) 260 63 GA based SVM 097 119903test2 = 055 RMSE =

031

This research GridSVM119862 = 80 120574 = 0015625 120576 =

00625

CODESSAMarvinindicator (6) 260 63 GA based SVM 086 119903test2 = 058 RMSE =

029This research GridSVM119862 = 80 120574 = 10 120576 = 0125

BioMed Research International 7

minus2 minus1 0 1 2minus2

minus1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(a)

minus2 minus1minus2

minus1

0 1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(b)

Figure 6 Prediction accuracy of the final model on training set (a) and test set (b)

on training set increased from 0829 to 0919 while theprediction accuracy on test set only slightly increased from0840 to 0870 accordingly Take all these into considerationthe 6-feature model (Table 1 Figure 5(a)) was chosen as ourfinal model (Figure 6) of which the prediction accuracy onboth test set and training set was similar (119903train

2minus119903test2= 011)

and high enough (gt082)Figure 5(b) showed the evolution of the prediction per-

formance of the model (MSE and 119903test2) In the first 100

generations the MSE decreased very fast followed with aplatform stage from about 100 to 650 generations Anotherdecrease occurred at about 650 generations and then theevolution became stable at about 850 generationsThemodelsdid not improve much in the last 150 generations which mayimply a convergence

A tabular presentation of relevant studies regarding theprediction of the blood-brain distribution is shown inTable 2These models were constructed by using different statisticallearning methods yielding different prediction capabilitywith 119877test

2 ranging from 05 to over 09 Generally regressionby SVM appears to be more robust than traditional linearapproaches such as PLS andMLR with respect to the nonlin-ear effects induced bymultiple potentially cooperative factorsgoverning the BBB permeability For example the SVMmodel by Golmohammadi et al [62] yielded the highest119877test

2

on a test set containing 110 molecules However it shouldbe noted that direct comparison with results from previousstudies is usually inappropriate because of differences in theirdatasets In this study a combination of both in vivo andin vitro data compiled by Abraham et al [51] was used fordeveloping BBB prediction model which is of high dataquality and covers large chemical diversity space In additionto the data source kernel parameter optimization and featureselection are two crucial factors influencing the predictionaccuracy of SVM models To reduce the computational costmost of the existing models addressed the feature selectionand parameter optimization procedures separately In thisstudy we used a GA scheme to perform the kernel parameteroptimization and feature selection simultaneously which ismore efficient at searching the optimal feature subset space

Abrahamrsquos model [51] is the best model that is currentlypublicly available A comparison of our models with Abra-hamrsquos models was shown in Table 2 In Table 2 the last 3rows are models in our study The same dataset was used inAbrahamrsquos research and this study but the data set was splitinto different training set and test set (our model traintest =26063 Abrahamrsquosmodel trainingtest = 164164) 7 variableswere used in Abrahamrsquos model compared to 6 in our finalmodel The 1199032 values for training set in Abrahamrsquos 164164model and 3280 model were 071 and 075 respectivelycompared with 083 for our model It has to be noted that thesize of our training set (260) is bigger than Abrahamrsquos (164)

Our model was also compared with grid method imple-mented with Python toolkit (gridpy) shipped with libsvm[61] for parameter optimization First since grid methodcannot be used to select feature subset all 326 features wereused to construct a BBB prediction model The predictionaccuracy of the training set was very high (119903train

2= 097)

but that of the test set was disappointing (119903test2= 055) Then

we used the same feature set as our final model (6 features)The result was slightly better but still too bad for test setprediction (119903train

2= 086 119903test

2= 058)

So compared with the grid-based method our GA-basedmethod could get better accuracy with fewer features whichsuggested that GA could get much better combination ofparameters and feature subset This was also observed inotherrsquos study [48]

32 Feature Analysis An examination of the descriptors usedin the model could provide an insight into the molecularproperties that are most relevant to BBB penetration Table 3showed the features used in the final 6-feature model andtheir meanings In order to explore the relative importanceand the underlying molecular properties of the descriptorsthe most frequently used features in all 50 6-feature modelswere analyzed Table 4 showed the top 10 most frequentlyused features Interestingly AbsCarboxy an indicator of theexistence of carboxylic acidwas themost significant propertySome features with similar meaning were also found to occurin the models such as PSA related features (M PSA 74

8 BioMed Research International

Table 3 Features used in the final model

Name MeaningM log119875 log119875 (Marvin)HA dependent HDSA-2 [Zefirovrsquos PC] H-bond donor surface area related (CODESSA)M PSA 74 PSA at pH 74 (Marvin)AbsCarboxy Carboxylic acid indicator (Abraham)HA dependent HDCA-2SQRT(TMSA) [Zefirovrsquos PC] H-bond donor charged area related (CODESSA)Average Complementary Information content (order 0) Topology descriptor (CODESSA)

Table 4 The most frequently used features for all 6-feature modelsa

Number Feature name Occurrence(50 models) Meaning

11 AbsCarboxy 36 Indicator for carboxylic aciddagger

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

14 H-acceptor surface areatotal molecularsurface area

101 Topographic electronic index (all bonds) Zefirovrsquos PC 12 Topological electronic index for allbonded pairs of atomsbDagger

8 M PSA 74 11 PSA at pH 74csect

267 ESP-HASA H-acceptors surface area Quantum-Chemical PC 10 H-acceptor surface area

5 delta log119863 9 log119863 (pH 65) minus log119863 (pH 74)deand

7 M PSA 70 9 PSA at pH 70sect

138 HA dependent HDCA-2 [Zefirovrsquos PC] 9 H-donors charged surface area

6 M PSA 65 8 PSA at pH 65sect

1 M log119875 7 log119875andaRows with the same symbol could be categorized into the same groupbTopological electronic index is a feature to characterize the distribution of molecular charge 119879 = sum119873119861

(119894lt119895)(|119902119894 minus 119902119895|119903

2119894119895) where 119902119894 is net charge on 119894th atom and

119903119894119895 is the distance between two bonded atomsc74 is the pH in bloodd65 is the pH in intestinee119863 is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound inwater For neutral compounds log119863 is equal to log119875

M PSA 70 andM PSA 65) andH-bond related descriptors(number 267 268 and 138) So we decided to put similardescriptors into the same group (Figure 7) to find the under-lying properties affecting BBB penetration

According to the associated molecular properties ofthe features those frequently used features are catego-rized into 5 groups AbsCarboxy (indicator of carboxylicacid) H-bonding (H-bonding ability including H-bonddonoracceptor related features) PSA (molecular polar sur-face area related features) lipophilicity (including M log119875and delta log119863) and molecular charge (including chargeand topological electronic index related features)

The following is observed

(1) Interestingly AbsCarboxy was also the most signif-icant feature which occurred 36 times in total 50models This may indicate that the carboxylic acidgroup plays an important role in the BBB penetrationwhich is consistent with the study of Abraham et al[51]

3633

28

1612

0

5

10

15

20

25

30

35

40

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 7 Top features for all 6-feature models (50 in all)

(2) H-bonding (H-bond donoracceptor) related surfacearea polar surface area log119875 related features and

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

BioMed Research International 3

328 data points

Calculate descriptors

Remove bad values(323 data points and 326 features)

Data scale

Training settest set

GASVM

Abraham et al (2006)

Remove features with missing values

Scaled to [01]

Kennard-Stone method

Feature selection and parameter optimization simultaneously

AbsCarboxy log P log D PSA H-Bond number of

Remove features having no change over the dataset (SD = 0)Remove features having very high correlation ((1 minus r) le 1e minus 6)

atomsbondsrings charge and HOMOLUMO

Figure 1 Workflow of GASVMmodel for BBB penetration prediction

The dataset was then split into training set and testset using the Kennard-Stone method [56] which selects asubset of representative data points uniformly distributed inthe sample space [57] At start the Kennard-Stone methodchooses the data point that is the closest to the center of thedataset measured by Euclidean distance After that from allremaining data points the data point that is the furthest fromthose already selected is added to the training setThis processcontinues until the size of the training set reaches specifiedsize 260 data pointswere selected as training set and the other63 were used as test set

22 SVM Regression Details about SVM regression can befound in literatures [58ndash60] As in other multivariate statis-tical models the performance of SVM regression dependson the combination of several parameters In general 119862 is aregularization parameter that controls the tradeoff betweentraining error and model complexity If 119862 is too large themodel will have a high penalty for nonseparable points andmay store too many support vectors and get overfitting If itis too small the model may have underfitting Parameter 120576controls the width of the 120576-insensitive zone used to fit thetraining data The value of 120576 can affect the number of thesupport vectors used to construct the regression functionThe bigger 120576 is the fewer support vectors are selected Onthe other hand bigger 120576-values result in more flat estimatesHence both 119862 and 120576-values affect model complexity (butin a different way) The kernel type is another importantparameter In SVM regression radial basis function (RBF)(1) was the most commonly used kernel function for itsbetter generalization ability less number of parameters andless numerical difficulties [33] and was used in this studyParameter 120574 in RBF controls the amplitude of the RBF kerneland therefore controls the generalization ability of SVMregressionThe LIBSVM package (version 281) [61] was usedin this study for SVM regression calculation taking the form

119870(119909

119894 119909

119895) = exp (minus120574 100381710038171003817

1003817

1003817

119909

119894minus119909

119895

1003817

1003817

1003817

1003817

1003817

2) 120574 gt 0 (1)

where 119909119894and 119909

119895are training vectors (119894 = 119895 119909

119894= 119909

119895) and 120574 is

kernel parameter

23 Genetic Algorithms Genetic algorithms (GA) [41] arestochastic optimization and search method that mimicsbiological evolution as a problem-solving strategy They arevery flexible and attractive for optimization problems

Given a specific problem to solve the input to the GA isa set of potential solutions to that problem encoded in somefashion and a fitness function that allows each candidate tobe quantitatively evaluated (Figure 2) Selection mating andmutation justmimic the natural process For each generationindividuals are selected for reproduction according to theirfitness values Favorable individuals have a better chance tobe selected for reproduction and the offspring have chanceto mutate to keep diversity while the unfavorable individualsare less likely to survive After each generation whether theevolution is converged or the termination criteria are met ischecked if yes job is done if not the evolution goes intonext generation After many generations good individualswill dominate the population and we will get solutions thatare good enough for our problem

First in order to solve a problemwithGA each individualin the population should be represented by a chromosomeIn our study since the parameter optimization and featuresubset selection should be addressed simultaneously thechromosome is a combination of parameter genes and featuregene (Figure 3) where 119891

119899is an integer in the range of [1119873]

and 119873 is the number of candidate features for model con-struction A chromosome represents an individual in geneticalgorithms and parameters contained in chromosome couldbe used for SVM modeling Left part of the chromosome isthe parameter genes of which 119862 120574 and 120576 all are float genes

4 BioMed Research International

Define encoding strategy fitness function and GA parameters

Generate initial population (SVM parameters and selected features)

Fitness evaluation (MSE from 10-fold CV SVM)

Convergence

feature subset

Select mating and mutation

Yes

No

Optimized (C 120574 120576) and

Figure 2 Workflow of genetic algorithms

SVM parameters Selected features

Chromosome

C (float) 120574 (float) 120576 (float)

C (float) 120574 (float) 120576 (float) middot middot middot

middot middot middotf1(int) f2(int) f3(int) f4(int) fn(int)

f1(int) f2(int) f3(int) f4(int) fn(int)

Figure 3 Encoding of the chromosome

The feature gene is an array of integers and each integerrepresents a feature

Fitness function can be seen as a ruler which was usedto quantitatively evaluate and compare each candidate Inour study the mean squared error (MSE) of 10-fold crossvalidation (CV) for SVM was used as fitness function andsmaller fitness value indicated better individual Given atraining set containing 119899 compounds (1199091 1199101) (119909119899 119910119899) 119909119894is descriptor vector of compound 119894 and 119909

119894isin 119877

119899 119910119894is the

log119861119861 value and 119910119894isin minus1 +1 The objective function can

be calculated by

119891 (119909) =

119899

sum

119894=1(120572

119894minus120572

lowast

119894)119870 (119909

119894 119909) + 119887 (2)

120572

119894and 120572lowast

119894are Lagronia factors 119870(119909

119894 119909) is kernel of

radial basis function The MSE of 10-fold CV for SVM wascalculated by

MSE = 1119899

119899

sum

119894=1(

119883

119894minus119883

119894)

2 (3)

where 119899 is the number of all data points 119883119894is the predicted

value and119883119894is the experiment value

Tournament selection was used as the selection strategyin GA which selected the best 1 from 3 randomly chosencandidates The advantage of tournament selection overroulette wheel selection is that tournament selection does notneed to sort the whole population by fitness value

Since there are different types of genes in a chromosomedifferent mating strategies were used for different types ofgenes (Figure 4)

119881new = 1205731199011 + (1minus120573) 1199012 (4)

where 120573 uniformly distributed random number on theinterval [minus025 125] 119901

119899is the value of parent gene and119881new

is the value of child geneFor float genes the new value is a linear combination of

the parents (4) For feature gene uniform crossover is usedeach element of the child gene is selected randomly from thecorresponding items of parents

BioMed Research International 5

CParent A

CParent B

C

C

Child A

Child B

Uniform crossover

++ + +

++ + +

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

Vnew = 120573p1 + (1 minus 120573)p2

Figure 4 Mating strategy of GA

Table 1 Performance comparison of models with different number of features

Number of features Training (CV = 10) Prediction1199032 Parameters of SVMMSE 119903

2 Test set Training set 119862 120574 120576

4 01197 0674 0722 0740 388833 06081 014915 01042 0715 0770 0805 163419 07973 027436 00945 0744 0840 0829 133573 07158 015137 00959 074 0821 0843 343067 05218 015958 00883 0761 0834 0883 609596 05871 023579 00815 0777 0847 0864 37770 08764 0166310 00823 0776 0858 0903 152236 06247 0143411 00714 0804 0861 0891 56937 06531 0157312 00780 0787 0864 0905 72787 07428 0151513 00817 0778 0862 0922 41957 07791 0157414 00812 0778 0882 0917 148391 05002 0205415 00734 0799 0870 0919 49915 05231 01077

Again different mutation strategies were used for differ-ent types of genes For float genes the values were randomlymutated upward or downward The new value was given by

119881new =

119881 minus 120573 (119881 minus 119881min) if random () lt 05

119881 + 120573 (119881max minus 119881) if random () ge 05(5)

where 120573 was a random number distributed in [0 1] 119881 and119881new are values before and after mutation and 119881min and 119881maxare the minimum and maximum values allowed for a gene

For feature gene several points were first randomlychosen for mutation and then a random number in [1119873](119873 is the total number of features) was chosen as new featurewhile avoiding duplicate features The GA was terminatedwhen the evolution reached 1000 generations In our pilotstudy (data not shown) 1000 generations were enough for theGA to convergeThe other parameters for GAwere as followspopulation size 100 cross rate 08 mutation rate 01 elite size2 and number of new individuals in each generation 8

3 Results and Discussion

31 GASVM Performance GA was run with different num-ber of features from 4 to 15 For each number of featuresGA was run 50 times and the best model was chosen forfurther analysis From Table 1 and Figure 5 the overall trendof the GA showed the following (1) the accuracy of themodelincreased with the number of the features (2) the accuracyof the model on training set was better than the accuracy onthe test set which was then better than the accuracy of crossvalidation

As the feature number increases the complexity alsoincreases which will often increase the probability of overfit-ting A complex model is also difficult to interpret and applyin practical use so generally speaking we need to find abalance between the accuracy and complexity of themodel Itis observed that (Table 1 Figure 5(a)) the prediction accuracy(1199032 = 0744) of cross validation (119899 = 10) of the 6-featuremodel was similar to that of the Abrahamrsquos model (1199032 =075) which used all 328 data points (Table 2) As the numberof features increased from 6 to 15 the prediction accuracy

6 BioMed Research International

4 6 8 10 12 14 1604

05

06

07

08

09

10

Number of features

CVr2

CV r2

Test r2

Train r2

(a)

0 200 400 600 800 1000

01

02

03

04

05

06

07

08

MSE

(10-

fold

CV

)

Generation

MSE (10-fold CV)R2 (test set)

(b)

Figure 5 (a) Performance comparison of models with different number of features (b) Evolution of the best 6-feature model

Table 2 Comparison of most relevant QSAR studies on BBB permeability

Descriptors 119873train 119873test Methods 119903train2 Predictive accuracy

on test set Reference

Δlop119875 log119875and log119875cyc 20 mdash LinearRegression 069 mdash Young et al [77]

Excess molar refractiondipolaritypolarisability H-bond acidityand basicitySolute McGowan volume

148 30 LFER 075 119903test2= 073 Platts et al [66]

Δ119866

11988255 mdash Linear

Regression 082 mdash Lombardo et al [78]

PSA the octanolwater partitioncoefficient and the conformationalflexibility

56 7 MLR 085 119903test2= 080 Iyer et al [79]

CODESSADRAGON (482) 200 110 PLSSVM

083097

119903test2= 081

119903test2= 096

Golmohammadi et al [62]

Molecular (CODESSA-PRO) descriptors(5) 113 19 MLR 078 119903test

2= 077 Katritzky et al [15]

Molecular fragment (ISIDA) descriptors 112 19 MLR 090 119903test2= 083 Katritzky et al [15]

PSA log119875 the number of H-bondacceptors E-state and VSA 144 10

CombinatorialQSAR (KNN

SVM)091 119903test

2= 08 Zhang et al [17]

Abraham solute descriptors andindicators 328 mdash LFER 075 mdash Abraham et al [51]

Abraham solute descriptors andindicators 164 164 LFER 071 119904 = 025 MAE = 020 Abraham et al [51]

CODESSAMarvinindicator (6) 260 63 GA based SVM 083 119903test2 = 084 RMSE =

023

This research GASVMfinal model

119862 = 133573 120574 = 0715761 120576= 0151289

CODESSAMarvinindicator (236) 260 63 GA based SVM 097 119903test2 = 055 RMSE =

031

This research GridSVM119862 = 80 120574 = 0015625 120576 =

00625

CODESSAMarvinindicator (6) 260 63 GA based SVM 086 119903test2 = 058 RMSE =

029This research GridSVM119862 = 80 120574 = 10 120576 = 0125

BioMed Research International 7

minus2 minus1 0 1 2minus2

minus1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(a)

minus2 minus1minus2

minus1

0 1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(b)

Figure 6 Prediction accuracy of the final model on training set (a) and test set (b)

on training set increased from 0829 to 0919 while theprediction accuracy on test set only slightly increased from0840 to 0870 accordingly Take all these into considerationthe 6-feature model (Table 1 Figure 5(a)) was chosen as ourfinal model (Figure 6) of which the prediction accuracy onboth test set and training set was similar (119903train

2minus119903test2= 011)

and high enough (gt082)Figure 5(b) showed the evolution of the prediction per-

formance of the model (MSE and 119903test2) In the first 100

generations the MSE decreased very fast followed with aplatform stage from about 100 to 650 generations Anotherdecrease occurred at about 650 generations and then theevolution became stable at about 850 generationsThemodelsdid not improve much in the last 150 generations which mayimply a convergence

A tabular presentation of relevant studies regarding theprediction of the blood-brain distribution is shown inTable 2These models were constructed by using different statisticallearning methods yielding different prediction capabilitywith 119877test

2 ranging from 05 to over 09 Generally regressionby SVM appears to be more robust than traditional linearapproaches such as PLS andMLR with respect to the nonlin-ear effects induced bymultiple potentially cooperative factorsgoverning the BBB permeability For example the SVMmodel by Golmohammadi et al [62] yielded the highest119877test

2

on a test set containing 110 molecules However it shouldbe noted that direct comparison with results from previousstudies is usually inappropriate because of differences in theirdatasets In this study a combination of both in vivo andin vitro data compiled by Abraham et al [51] was used fordeveloping BBB prediction model which is of high dataquality and covers large chemical diversity space In additionto the data source kernel parameter optimization and featureselection are two crucial factors influencing the predictionaccuracy of SVM models To reduce the computational costmost of the existing models addressed the feature selectionand parameter optimization procedures separately In thisstudy we used a GA scheme to perform the kernel parameteroptimization and feature selection simultaneously which ismore efficient at searching the optimal feature subset space

Abrahamrsquos model [51] is the best model that is currentlypublicly available A comparison of our models with Abra-hamrsquos models was shown in Table 2 In Table 2 the last 3rows are models in our study The same dataset was used inAbrahamrsquos research and this study but the data set was splitinto different training set and test set (our model traintest =26063 Abrahamrsquosmodel trainingtest = 164164) 7 variableswere used in Abrahamrsquos model compared to 6 in our finalmodel The 1199032 values for training set in Abrahamrsquos 164164model and 3280 model were 071 and 075 respectivelycompared with 083 for our model It has to be noted that thesize of our training set (260) is bigger than Abrahamrsquos (164)

Our model was also compared with grid method imple-mented with Python toolkit (gridpy) shipped with libsvm[61] for parameter optimization First since grid methodcannot be used to select feature subset all 326 features wereused to construct a BBB prediction model The predictionaccuracy of the training set was very high (119903train

2= 097)

but that of the test set was disappointing (119903test2= 055) Then

we used the same feature set as our final model (6 features)The result was slightly better but still too bad for test setprediction (119903train

2= 086 119903test

2= 058)

So compared with the grid-based method our GA-basedmethod could get better accuracy with fewer features whichsuggested that GA could get much better combination ofparameters and feature subset This was also observed inotherrsquos study [48]

32 Feature Analysis An examination of the descriptors usedin the model could provide an insight into the molecularproperties that are most relevant to BBB penetration Table 3showed the features used in the final 6-feature model andtheir meanings In order to explore the relative importanceand the underlying molecular properties of the descriptorsthe most frequently used features in all 50 6-feature modelswere analyzed Table 4 showed the top 10 most frequentlyused features Interestingly AbsCarboxy an indicator of theexistence of carboxylic acidwas themost significant propertySome features with similar meaning were also found to occurin the models such as PSA related features (M PSA 74

8 BioMed Research International

Table 3 Features used in the final model

Name MeaningM log119875 log119875 (Marvin)HA dependent HDSA-2 [Zefirovrsquos PC] H-bond donor surface area related (CODESSA)M PSA 74 PSA at pH 74 (Marvin)AbsCarboxy Carboxylic acid indicator (Abraham)HA dependent HDCA-2SQRT(TMSA) [Zefirovrsquos PC] H-bond donor charged area related (CODESSA)Average Complementary Information content (order 0) Topology descriptor (CODESSA)

Table 4 The most frequently used features for all 6-feature modelsa

Number Feature name Occurrence(50 models) Meaning

11 AbsCarboxy 36 Indicator for carboxylic aciddagger

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

14 H-acceptor surface areatotal molecularsurface area

101 Topographic electronic index (all bonds) Zefirovrsquos PC 12 Topological electronic index for allbonded pairs of atomsbDagger

8 M PSA 74 11 PSA at pH 74csect

267 ESP-HASA H-acceptors surface area Quantum-Chemical PC 10 H-acceptor surface area

5 delta log119863 9 log119863 (pH 65) minus log119863 (pH 74)deand

7 M PSA 70 9 PSA at pH 70sect

138 HA dependent HDCA-2 [Zefirovrsquos PC] 9 H-donors charged surface area

6 M PSA 65 8 PSA at pH 65sect

1 M log119875 7 log119875andaRows with the same symbol could be categorized into the same groupbTopological electronic index is a feature to characterize the distribution of molecular charge 119879 = sum119873119861

(119894lt119895)(|119902119894 minus 119902119895|119903

2119894119895) where 119902119894 is net charge on 119894th atom and

119903119894119895 is the distance between two bonded atomsc74 is the pH in bloodd65 is the pH in intestinee119863 is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound inwater For neutral compounds log119863 is equal to log119875

M PSA 70 andM PSA 65) andH-bond related descriptors(number 267 268 and 138) So we decided to put similardescriptors into the same group (Figure 7) to find the under-lying properties affecting BBB penetration

According to the associated molecular properties ofthe features those frequently used features are catego-rized into 5 groups AbsCarboxy (indicator of carboxylicacid) H-bonding (H-bonding ability including H-bonddonoracceptor related features) PSA (molecular polar sur-face area related features) lipophilicity (including M log119875and delta log119863) and molecular charge (including chargeand topological electronic index related features)

The following is observed

(1) Interestingly AbsCarboxy was also the most signif-icant feature which occurred 36 times in total 50models This may indicate that the carboxylic acidgroup plays an important role in the BBB penetrationwhich is consistent with the study of Abraham et al[51]

3633

28

1612

0

5

10

15

20

25

30

35

40

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 7 Top features for all 6-feature models (50 in all)

(2) H-bonding (H-bond donoracceptor) related surfacearea polar surface area log119875 related features and

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

4 BioMed Research International

Define encoding strategy fitness function and GA parameters

Generate initial population (SVM parameters and selected features)

Fitness evaluation (MSE from 10-fold CV SVM)

Convergence

feature subset

Select mating and mutation

Yes

No

Optimized (C 120574 120576) and

Figure 2 Workflow of genetic algorithms

SVM parameters Selected features

Chromosome

C (float) 120574 (float) 120576 (float)

C (float) 120574 (float) 120576 (float) middot middot middot

middot middot middotf1(int) f2(int) f3(int) f4(int) fn(int)

f1(int) f2(int) f3(int) f4(int) fn(int)

Figure 3 Encoding of the chromosome

The feature gene is an array of integers and each integerrepresents a feature

Fitness function can be seen as a ruler which was usedto quantitatively evaluate and compare each candidate Inour study the mean squared error (MSE) of 10-fold crossvalidation (CV) for SVM was used as fitness function andsmaller fitness value indicated better individual Given atraining set containing 119899 compounds (1199091 1199101) (119909119899 119910119899) 119909119894is descriptor vector of compound 119894 and 119909

119894isin 119877

119899 119910119894is the

log119861119861 value and 119910119894isin minus1 +1 The objective function can

be calculated by

119891 (119909) =

119899

sum

119894=1(120572

119894minus120572

lowast

119894)119870 (119909

119894 119909) + 119887 (2)

120572

119894and 120572lowast

119894are Lagronia factors 119870(119909

119894 119909) is kernel of

radial basis function The MSE of 10-fold CV for SVM wascalculated by

MSE = 1119899

119899

sum

119894=1(

119883

119894minus119883

119894)

2 (3)

where 119899 is the number of all data points 119883119894is the predicted

value and119883119894is the experiment value

Tournament selection was used as the selection strategyin GA which selected the best 1 from 3 randomly chosencandidates The advantage of tournament selection overroulette wheel selection is that tournament selection does notneed to sort the whole population by fitness value

Since there are different types of genes in a chromosomedifferent mating strategies were used for different types ofgenes (Figure 4)

119881new = 1205731199011 + (1minus120573) 1199012 (4)

where 120573 uniformly distributed random number on theinterval [minus025 125] 119901

119899is the value of parent gene and119881new

is the value of child geneFor float genes the new value is a linear combination of

the parents (4) For feature gene uniform crossover is usedeach element of the child gene is selected randomly from thecorresponding items of parents

BioMed Research International 5

CParent A

CParent B

C

C

Child A

Child B

Uniform crossover

++ + +

++ + +

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

Vnew = 120573p1 + (1 minus 120573)p2

Figure 4 Mating strategy of GA

Table 1 Performance comparison of models with different number of features

Number of features Training (CV = 10) Prediction1199032 Parameters of SVMMSE 119903

2 Test set Training set 119862 120574 120576

4 01197 0674 0722 0740 388833 06081 014915 01042 0715 0770 0805 163419 07973 027436 00945 0744 0840 0829 133573 07158 015137 00959 074 0821 0843 343067 05218 015958 00883 0761 0834 0883 609596 05871 023579 00815 0777 0847 0864 37770 08764 0166310 00823 0776 0858 0903 152236 06247 0143411 00714 0804 0861 0891 56937 06531 0157312 00780 0787 0864 0905 72787 07428 0151513 00817 0778 0862 0922 41957 07791 0157414 00812 0778 0882 0917 148391 05002 0205415 00734 0799 0870 0919 49915 05231 01077

Again different mutation strategies were used for differ-ent types of genes For float genes the values were randomlymutated upward or downward The new value was given by

119881new =

119881 minus 120573 (119881 minus 119881min) if random () lt 05

119881 + 120573 (119881max minus 119881) if random () ge 05(5)

where 120573 was a random number distributed in [0 1] 119881 and119881new are values before and after mutation and 119881min and 119881maxare the minimum and maximum values allowed for a gene

For feature gene several points were first randomlychosen for mutation and then a random number in [1119873](119873 is the total number of features) was chosen as new featurewhile avoiding duplicate features The GA was terminatedwhen the evolution reached 1000 generations In our pilotstudy (data not shown) 1000 generations were enough for theGA to convergeThe other parameters for GAwere as followspopulation size 100 cross rate 08 mutation rate 01 elite size2 and number of new individuals in each generation 8

3 Results and Discussion

31 GASVM Performance GA was run with different num-ber of features from 4 to 15 For each number of featuresGA was run 50 times and the best model was chosen forfurther analysis From Table 1 and Figure 5 the overall trendof the GA showed the following (1) the accuracy of themodelincreased with the number of the features (2) the accuracyof the model on training set was better than the accuracy onthe test set which was then better than the accuracy of crossvalidation

As the feature number increases the complexity alsoincreases which will often increase the probability of overfit-ting A complex model is also difficult to interpret and applyin practical use so generally speaking we need to find abalance between the accuracy and complexity of themodel Itis observed that (Table 1 Figure 5(a)) the prediction accuracy(1199032 = 0744) of cross validation (119899 = 10) of the 6-featuremodel was similar to that of the Abrahamrsquos model (1199032 =075) which used all 328 data points (Table 2) As the numberof features increased from 6 to 15 the prediction accuracy

6 BioMed Research International

4 6 8 10 12 14 1604

05

06

07

08

09

10

Number of features

CVr2

CV r2

Test r2

Train r2

(a)

0 200 400 600 800 1000

01

02

03

04

05

06

07

08

MSE

(10-

fold

CV

)

Generation

MSE (10-fold CV)R2 (test set)

(b)

Figure 5 (a) Performance comparison of models with different number of features (b) Evolution of the best 6-feature model

Table 2 Comparison of most relevant QSAR studies on BBB permeability

Descriptors 119873train 119873test Methods 119903train2 Predictive accuracy

on test set Reference

Δlop119875 log119875and log119875cyc 20 mdash LinearRegression 069 mdash Young et al [77]

Excess molar refractiondipolaritypolarisability H-bond acidityand basicitySolute McGowan volume

148 30 LFER 075 119903test2= 073 Platts et al [66]

Δ119866

11988255 mdash Linear

Regression 082 mdash Lombardo et al [78]

PSA the octanolwater partitioncoefficient and the conformationalflexibility

56 7 MLR 085 119903test2= 080 Iyer et al [79]

CODESSADRAGON (482) 200 110 PLSSVM

083097

119903test2= 081

119903test2= 096

Golmohammadi et al [62]

Molecular (CODESSA-PRO) descriptors(5) 113 19 MLR 078 119903test

2= 077 Katritzky et al [15]

Molecular fragment (ISIDA) descriptors 112 19 MLR 090 119903test2= 083 Katritzky et al [15]

PSA log119875 the number of H-bondacceptors E-state and VSA 144 10

CombinatorialQSAR (KNN

SVM)091 119903test

2= 08 Zhang et al [17]

Abraham solute descriptors andindicators 328 mdash LFER 075 mdash Abraham et al [51]

Abraham solute descriptors andindicators 164 164 LFER 071 119904 = 025 MAE = 020 Abraham et al [51]

CODESSAMarvinindicator (6) 260 63 GA based SVM 083 119903test2 = 084 RMSE =

023

This research GASVMfinal model

119862 = 133573 120574 = 0715761 120576= 0151289

CODESSAMarvinindicator (236) 260 63 GA based SVM 097 119903test2 = 055 RMSE =

031

This research GridSVM119862 = 80 120574 = 0015625 120576 =

00625

CODESSAMarvinindicator (6) 260 63 GA based SVM 086 119903test2 = 058 RMSE =

029This research GridSVM119862 = 80 120574 = 10 120576 = 0125

BioMed Research International 7

minus2 minus1 0 1 2minus2

minus1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(a)

minus2 minus1minus2

minus1

0 1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(b)

Figure 6 Prediction accuracy of the final model on training set (a) and test set (b)

on training set increased from 0829 to 0919 while theprediction accuracy on test set only slightly increased from0840 to 0870 accordingly Take all these into considerationthe 6-feature model (Table 1 Figure 5(a)) was chosen as ourfinal model (Figure 6) of which the prediction accuracy onboth test set and training set was similar (119903train

2minus119903test2= 011)

and high enough (gt082)Figure 5(b) showed the evolution of the prediction per-

formance of the model (MSE and 119903test2) In the first 100

generations the MSE decreased very fast followed with aplatform stage from about 100 to 650 generations Anotherdecrease occurred at about 650 generations and then theevolution became stable at about 850 generationsThemodelsdid not improve much in the last 150 generations which mayimply a convergence

A tabular presentation of relevant studies regarding theprediction of the blood-brain distribution is shown inTable 2These models were constructed by using different statisticallearning methods yielding different prediction capabilitywith 119877test

2 ranging from 05 to over 09 Generally regressionby SVM appears to be more robust than traditional linearapproaches such as PLS andMLR with respect to the nonlin-ear effects induced bymultiple potentially cooperative factorsgoverning the BBB permeability For example the SVMmodel by Golmohammadi et al [62] yielded the highest119877test

2

on a test set containing 110 molecules However it shouldbe noted that direct comparison with results from previousstudies is usually inappropriate because of differences in theirdatasets In this study a combination of both in vivo andin vitro data compiled by Abraham et al [51] was used fordeveloping BBB prediction model which is of high dataquality and covers large chemical diversity space In additionto the data source kernel parameter optimization and featureselection are two crucial factors influencing the predictionaccuracy of SVM models To reduce the computational costmost of the existing models addressed the feature selectionand parameter optimization procedures separately In thisstudy we used a GA scheme to perform the kernel parameteroptimization and feature selection simultaneously which ismore efficient at searching the optimal feature subset space

Abrahamrsquos model [51] is the best model that is currentlypublicly available A comparison of our models with Abra-hamrsquos models was shown in Table 2 In Table 2 the last 3rows are models in our study The same dataset was used inAbrahamrsquos research and this study but the data set was splitinto different training set and test set (our model traintest =26063 Abrahamrsquosmodel trainingtest = 164164) 7 variableswere used in Abrahamrsquos model compared to 6 in our finalmodel The 1199032 values for training set in Abrahamrsquos 164164model and 3280 model were 071 and 075 respectivelycompared with 083 for our model It has to be noted that thesize of our training set (260) is bigger than Abrahamrsquos (164)

Our model was also compared with grid method imple-mented with Python toolkit (gridpy) shipped with libsvm[61] for parameter optimization First since grid methodcannot be used to select feature subset all 326 features wereused to construct a BBB prediction model The predictionaccuracy of the training set was very high (119903train

2= 097)

but that of the test set was disappointing (119903test2= 055) Then

we used the same feature set as our final model (6 features)The result was slightly better but still too bad for test setprediction (119903train

2= 086 119903test

2= 058)

So compared with the grid-based method our GA-basedmethod could get better accuracy with fewer features whichsuggested that GA could get much better combination ofparameters and feature subset This was also observed inotherrsquos study [48]

32 Feature Analysis An examination of the descriptors usedin the model could provide an insight into the molecularproperties that are most relevant to BBB penetration Table 3showed the features used in the final 6-feature model andtheir meanings In order to explore the relative importanceand the underlying molecular properties of the descriptorsthe most frequently used features in all 50 6-feature modelswere analyzed Table 4 showed the top 10 most frequentlyused features Interestingly AbsCarboxy an indicator of theexistence of carboxylic acidwas themost significant propertySome features with similar meaning were also found to occurin the models such as PSA related features (M PSA 74

8 BioMed Research International

Table 3 Features used in the final model

Name MeaningM log119875 log119875 (Marvin)HA dependent HDSA-2 [Zefirovrsquos PC] H-bond donor surface area related (CODESSA)M PSA 74 PSA at pH 74 (Marvin)AbsCarboxy Carboxylic acid indicator (Abraham)HA dependent HDCA-2SQRT(TMSA) [Zefirovrsquos PC] H-bond donor charged area related (CODESSA)Average Complementary Information content (order 0) Topology descriptor (CODESSA)

Table 4 The most frequently used features for all 6-feature modelsa

Number Feature name Occurrence(50 models) Meaning

11 AbsCarboxy 36 Indicator for carboxylic aciddagger

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

14 H-acceptor surface areatotal molecularsurface area

101 Topographic electronic index (all bonds) Zefirovrsquos PC 12 Topological electronic index for allbonded pairs of atomsbDagger

8 M PSA 74 11 PSA at pH 74csect

267 ESP-HASA H-acceptors surface area Quantum-Chemical PC 10 H-acceptor surface area

5 delta log119863 9 log119863 (pH 65) minus log119863 (pH 74)deand

7 M PSA 70 9 PSA at pH 70sect

138 HA dependent HDCA-2 [Zefirovrsquos PC] 9 H-donors charged surface area

6 M PSA 65 8 PSA at pH 65sect

1 M log119875 7 log119875andaRows with the same symbol could be categorized into the same groupbTopological electronic index is a feature to characterize the distribution of molecular charge 119879 = sum119873119861

(119894lt119895)(|119902119894 minus 119902119895|119903

2119894119895) where 119902119894 is net charge on 119894th atom and

119903119894119895 is the distance between two bonded atomsc74 is the pH in bloodd65 is the pH in intestinee119863 is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound inwater For neutral compounds log119863 is equal to log119875

M PSA 70 andM PSA 65) andH-bond related descriptors(number 267 268 and 138) So we decided to put similardescriptors into the same group (Figure 7) to find the under-lying properties affecting BBB penetration

According to the associated molecular properties ofthe features those frequently used features are catego-rized into 5 groups AbsCarboxy (indicator of carboxylicacid) H-bonding (H-bonding ability including H-bonddonoracceptor related features) PSA (molecular polar sur-face area related features) lipophilicity (including M log119875and delta log119863) and molecular charge (including chargeand topological electronic index related features)

The following is observed

(1) Interestingly AbsCarboxy was also the most signif-icant feature which occurred 36 times in total 50models This may indicate that the carboxylic acidgroup plays an important role in the BBB penetrationwhich is consistent with the study of Abraham et al[51]

3633

28

1612

0

5

10

15

20

25

30

35

40

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 7 Top features for all 6-feature models (50 in all)

(2) H-bonding (H-bond donoracceptor) related surfacearea polar surface area log119875 related features and

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

BioMed Research International 5

CParent A

CParent B

C

C

Child A

Child B

Uniform crossover

++ + +

++ + +

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

f1 f2 f3 f4 fn120574 120576 middot middot middot

Vnew = 120573p1 + (1 minus 120573)p2

Figure 4 Mating strategy of GA

Table 1 Performance comparison of models with different number of features

Number of features Training (CV = 10) Prediction1199032 Parameters of SVMMSE 119903

2 Test set Training set 119862 120574 120576

4 01197 0674 0722 0740 388833 06081 014915 01042 0715 0770 0805 163419 07973 027436 00945 0744 0840 0829 133573 07158 015137 00959 074 0821 0843 343067 05218 015958 00883 0761 0834 0883 609596 05871 023579 00815 0777 0847 0864 37770 08764 0166310 00823 0776 0858 0903 152236 06247 0143411 00714 0804 0861 0891 56937 06531 0157312 00780 0787 0864 0905 72787 07428 0151513 00817 0778 0862 0922 41957 07791 0157414 00812 0778 0882 0917 148391 05002 0205415 00734 0799 0870 0919 49915 05231 01077

Again different mutation strategies were used for differ-ent types of genes For float genes the values were randomlymutated upward or downward The new value was given by

119881new =

119881 minus 120573 (119881 minus 119881min) if random () lt 05

119881 + 120573 (119881max minus 119881) if random () ge 05(5)

where 120573 was a random number distributed in [0 1] 119881 and119881new are values before and after mutation and 119881min and 119881maxare the minimum and maximum values allowed for a gene

For feature gene several points were first randomlychosen for mutation and then a random number in [1119873](119873 is the total number of features) was chosen as new featurewhile avoiding duplicate features The GA was terminatedwhen the evolution reached 1000 generations In our pilotstudy (data not shown) 1000 generations were enough for theGA to convergeThe other parameters for GAwere as followspopulation size 100 cross rate 08 mutation rate 01 elite size2 and number of new individuals in each generation 8

3 Results and Discussion

31 GASVM Performance GA was run with different num-ber of features from 4 to 15 For each number of featuresGA was run 50 times and the best model was chosen forfurther analysis From Table 1 and Figure 5 the overall trendof the GA showed the following (1) the accuracy of themodelincreased with the number of the features (2) the accuracyof the model on training set was better than the accuracy onthe test set which was then better than the accuracy of crossvalidation

As the feature number increases the complexity alsoincreases which will often increase the probability of overfit-ting A complex model is also difficult to interpret and applyin practical use so generally speaking we need to find abalance between the accuracy and complexity of themodel Itis observed that (Table 1 Figure 5(a)) the prediction accuracy(1199032 = 0744) of cross validation (119899 = 10) of the 6-featuremodel was similar to that of the Abrahamrsquos model (1199032 =075) which used all 328 data points (Table 2) As the numberof features increased from 6 to 15 the prediction accuracy

6 BioMed Research International

4 6 8 10 12 14 1604

05

06

07

08

09

10

Number of features

CVr2

CV r2

Test r2

Train r2

(a)

0 200 400 600 800 1000

01

02

03

04

05

06

07

08

MSE

(10-

fold

CV

)

Generation

MSE (10-fold CV)R2 (test set)

(b)

Figure 5 (a) Performance comparison of models with different number of features (b) Evolution of the best 6-feature model

Table 2 Comparison of most relevant QSAR studies on BBB permeability

Descriptors 119873train 119873test Methods 119903train2 Predictive accuracy

on test set Reference

Δlop119875 log119875and log119875cyc 20 mdash LinearRegression 069 mdash Young et al [77]

Excess molar refractiondipolaritypolarisability H-bond acidityand basicitySolute McGowan volume

148 30 LFER 075 119903test2= 073 Platts et al [66]

Δ119866

11988255 mdash Linear

Regression 082 mdash Lombardo et al [78]

PSA the octanolwater partitioncoefficient and the conformationalflexibility

56 7 MLR 085 119903test2= 080 Iyer et al [79]

CODESSADRAGON (482) 200 110 PLSSVM

083097

119903test2= 081

119903test2= 096

Golmohammadi et al [62]

Molecular (CODESSA-PRO) descriptors(5) 113 19 MLR 078 119903test

2= 077 Katritzky et al [15]

Molecular fragment (ISIDA) descriptors 112 19 MLR 090 119903test2= 083 Katritzky et al [15]

PSA log119875 the number of H-bondacceptors E-state and VSA 144 10

CombinatorialQSAR (KNN

SVM)091 119903test

2= 08 Zhang et al [17]

Abraham solute descriptors andindicators 328 mdash LFER 075 mdash Abraham et al [51]

Abraham solute descriptors andindicators 164 164 LFER 071 119904 = 025 MAE = 020 Abraham et al [51]

CODESSAMarvinindicator (6) 260 63 GA based SVM 083 119903test2 = 084 RMSE =

023

This research GASVMfinal model

119862 = 133573 120574 = 0715761 120576= 0151289

CODESSAMarvinindicator (236) 260 63 GA based SVM 097 119903test2 = 055 RMSE =

031

This research GridSVM119862 = 80 120574 = 0015625 120576 =

00625

CODESSAMarvinindicator (6) 260 63 GA based SVM 086 119903test2 = 058 RMSE =

029This research GridSVM119862 = 80 120574 = 10 120576 = 0125

BioMed Research International 7

minus2 minus1 0 1 2minus2

minus1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(a)

minus2 minus1minus2

minus1

0 1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(b)

Figure 6 Prediction accuracy of the final model on training set (a) and test set (b)

on training set increased from 0829 to 0919 while theprediction accuracy on test set only slightly increased from0840 to 0870 accordingly Take all these into considerationthe 6-feature model (Table 1 Figure 5(a)) was chosen as ourfinal model (Figure 6) of which the prediction accuracy onboth test set and training set was similar (119903train

2minus119903test2= 011)

and high enough (gt082)Figure 5(b) showed the evolution of the prediction per-

formance of the model (MSE and 119903test2) In the first 100

generations the MSE decreased very fast followed with aplatform stage from about 100 to 650 generations Anotherdecrease occurred at about 650 generations and then theevolution became stable at about 850 generationsThemodelsdid not improve much in the last 150 generations which mayimply a convergence

A tabular presentation of relevant studies regarding theprediction of the blood-brain distribution is shown inTable 2These models were constructed by using different statisticallearning methods yielding different prediction capabilitywith 119877test

2 ranging from 05 to over 09 Generally regressionby SVM appears to be more robust than traditional linearapproaches such as PLS andMLR with respect to the nonlin-ear effects induced bymultiple potentially cooperative factorsgoverning the BBB permeability For example the SVMmodel by Golmohammadi et al [62] yielded the highest119877test

2

on a test set containing 110 molecules However it shouldbe noted that direct comparison with results from previousstudies is usually inappropriate because of differences in theirdatasets In this study a combination of both in vivo andin vitro data compiled by Abraham et al [51] was used fordeveloping BBB prediction model which is of high dataquality and covers large chemical diversity space In additionto the data source kernel parameter optimization and featureselection are two crucial factors influencing the predictionaccuracy of SVM models To reduce the computational costmost of the existing models addressed the feature selectionand parameter optimization procedures separately In thisstudy we used a GA scheme to perform the kernel parameteroptimization and feature selection simultaneously which ismore efficient at searching the optimal feature subset space

Abrahamrsquos model [51] is the best model that is currentlypublicly available A comparison of our models with Abra-hamrsquos models was shown in Table 2 In Table 2 the last 3rows are models in our study The same dataset was used inAbrahamrsquos research and this study but the data set was splitinto different training set and test set (our model traintest =26063 Abrahamrsquosmodel trainingtest = 164164) 7 variableswere used in Abrahamrsquos model compared to 6 in our finalmodel The 1199032 values for training set in Abrahamrsquos 164164model and 3280 model were 071 and 075 respectivelycompared with 083 for our model It has to be noted that thesize of our training set (260) is bigger than Abrahamrsquos (164)

Our model was also compared with grid method imple-mented with Python toolkit (gridpy) shipped with libsvm[61] for parameter optimization First since grid methodcannot be used to select feature subset all 326 features wereused to construct a BBB prediction model The predictionaccuracy of the training set was very high (119903train

2= 097)

but that of the test set was disappointing (119903test2= 055) Then

we used the same feature set as our final model (6 features)The result was slightly better but still too bad for test setprediction (119903train

2= 086 119903test

2= 058)

So compared with the grid-based method our GA-basedmethod could get better accuracy with fewer features whichsuggested that GA could get much better combination ofparameters and feature subset This was also observed inotherrsquos study [48]

32 Feature Analysis An examination of the descriptors usedin the model could provide an insight into the molecularproperties that are most relevant to BBB penetration Table 3showed the features used in the final 6-feature model andtheir meanings In order to explore the relative importanceand the underlying molecular properties of the descriptorsthe most frequently used features in all 50 6-feature modelswere analyzed Table 4 showed the top 10 most frequentlyused features Interestingly AbsCarboxy an indicator of theexistence of carboxylic acidwas themost significant propertySome features with similar meaning were also found to occurin the models such as PSA related features (M PSA 74

8 BioMed Research International

Table 3 Features used in the final model

Name MeaningM log119875 log119875 (Marvin)HA dependent HDSA-2 [Zefirovrsquos PC] H-bond donor surface area related (CODESSA)M PSA 74 PSA at pH 74 (Marvin)AbsCarboxy Carboxylic acid indicator (Abraham)HA dependent HDCA-2SQRT(TMSA) [Zefirovrsquos PC] H-bond donor charged area related (CODESSA)Average Complementary Information content (order 0) Topology descriptor (CODESSA)

Table 4 The most frequently used features for all 6-feature modelsa

Number Feature name Occurrence(50 models) Meaning

11 AbsCarboxy 36 Indicator for carboxylic aciddagger

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

14 H-acceptor surface areatotal molecularsurface area

101 Topographic electronic index (all bonds) Zefirovrsquos PC 12 Topological electronic index for allbonded pairs of atomsbDagger

8 M PSA 74 11 PSA at pH 74csect

267 ESP-HASA H-acceptors surface area Quantum-Chemical PC 10 H-acceptor surface area

5 delta log119863 9 log119863 (pH 65) minus log119863 (pH 74)deand

7 M PSA 70 9 PSA at pH 70sect

138 HA dependent HDCA-2 [Zefirovrsquos PC] 9 H-donors charged surface area

6 M PSA 65 8 PSA at pH 65sect

1 M log119875 7 log119875andaRows with the same symbol could be categorized into the same groupbTopological electronic index is a feature to characterize the distribution of molecular charge 119879 = sum119873119861

(119894lt119895)(|119902119894 minus 119902119895|119903

2119894119895) where 119902119894 is net charge on 119894th atom and

119903119894119895 is the distance between two bonded atomsc74 is the pH in bloodd65 is the pH in intestinee119863 is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound inwater For neutral compounds log119863 is equal to log119875

M PSA 70 andM PSA 65) andH-bond related descriptors(number 267 268 and 138) So we decided to put similardescriptors into the same group (Figure 7) to find the under-lying properties affecting BBB penetration

According to the associated molecular properties ofthe features those frequently used features are catego-rized into 5 groups AbsCarboxy (indicator of carboxylicacid) H-bonding (H-bonding ability including H-bonddonoracceptor related features) PSA (molecular polar sur-face area related features) lipophilicity (including M log119875and delta log119863) and molecular charge (including chargeand topological electronic index related features)

The following is observed

(1) Interestingly AbsCarboxy was also the most signif-icant feature which occurred 36 times in total 50models This may indicate that the carboxylic acidgroup plays an important role in the BBB penetrationwhich is consistent with the study of Abraham et al[51]

3633

28

1612

0

5

10

15

20

25

30

35

40

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 7 Top features for all 6-feature models (50 in all)

(2) H-bonding (H-bond donoracceptor) related surfacearea polar surface area log119875 related features and

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

6 BioMed Research International

4 6 8 10 12 14 1604

05

06

07

08

09

10

Number of features

CVr2

CV r2

Test r2

Train r2

(a)

0 200 400 600 800 1000

01

02

03

04

05

06

07

08

MSE

(10-

fold

CV

)

Generation

MSE (10-fold CV)R2 (test set)

(b)

Figure 5 (a) Performance comparison of models with different number of features (b) Evolution of the best 6-feature model

Table 2 Comparison of most relevant QSAR studies on BBB permeability

Descriptors 119873train 119873test Methods 119903train2 Predictive accuracy

on test set Reference

Δlop119875 log119875and log119875cyc 20 mdash LinearRegression 069 mdash Young et al [77]

Excess molar refractiondipolaritypolarisability H-bond acidityand basicitySolute McGowan volume

148 30 LFER 075 119903test2= 073 Platts et al [66]

Δ119866

11988255 mdash Linear

Regression 082 mdash Lombardo et al [78]

PSA the octanolwater partitioncoefficient and the conformationalflexibility

56 7 MLR 085 119903test2= 080 Iyer et al [79]

CODESSADRAGON (482) 200 110 PLSSVM

083097

119903test2= 081

119903test2= 096

Golmohammadi et al [62]

Molecular (CODESSA-PRO) descriptors(5) 113 19 MLR 078 119903test

2= 077 Katritzky et al [15]

Molecular fragment (ISIDA) descriptors 112 19 MLR 090 119903test2= 083 Katritzky et al [15]

PSA log119875 the number of H-bondacceptors E-state and VSA 144 10

CombinatorialQSAR (KNN

SVM)091 119903test

2= 08 Zhang et al [17]

Abraham solute descriptors andindicators 328 mdash LFER 075 mdash Abraham et al [51]

Abraham solute descriptors andindicators 164 164 LFER 071 119904 = 025 MAE = 020 Abraham et al [51]

CODESSAMarvinindicator (6) 260 63 GA based SVM 083 119903test2 = 084 RMSE =

023

This research GASVMfinal model

119862 = 133573 120574 = 0715761 120576= 0151289

CODESSAMarvinindicator (236) 260 63 GA based SVM 097 119903test2 = 055 RMSE =

031

This research GridSVM119862 = 80 120574 = 0015625 120576 =

00625

CODESSAMarvinindicator (6) 260 63 GA based SVM 086 119903test2 = 058 RMSE =

029This research GridSVM119862 = 80 120574 = 10 120576 = 0125

BioMed Research International 7

minus2 minus1 0 1 2minus2

minus1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(a)

minus2 minus1minus2

minus1

0 1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(b)

Figure 6 Prediction accuracy of the final model on training set (a) and test set (b)

on training set increased from 0829 to 0919 while theprediction accuracy on test set only slightly increased from0840 to 0870 accordingly Take all these into considerationthe 6-feature model (Table 1 Figure 5(a)) was chosen as ourfinal model (Figure 6) of which the prediction accuracy onboth test set and training set was similar (119903train

2minus119903test2= 011)

and high enough (gt082)Figure 5(b) showed the evolution of the prediction per-

formance of the model (MSE and 119903test2) In the first 100

generations the MSE decreased very fast followed with aplatform stage from about 100 to 650 generations Anotherdecrease occurred at about 650 generations and then theevolution became stable at about 850 generationsThemodelsdid not improve much in the last 150 generations which mayimply a convergence

A tabular presentation of relevant studies regarding theprediction of the blood-brain distribution is shown inTable 2These models were constructed by using different statisticallearning methods yielding different prediction capabilitywith 119877test

2 ranging from 05 to over 09 Generally regressionby SVM appears to be more robust than traditional linearapproaches such as PLS andMLR with respect to the nonlin-ear effects induced bymultiple potentially cooperative factorsgoverning the BBB permeability For example the SVMmodel by Golmohammadi et al [62] yielded the highest119877test

2

on a test set containing 110 molecules However it shouldbe noted that direct comparison with results from previousstudies is usually inappropriate because of differences in theirdatasets In this study a combination of both in vivo andin vitro data compiled by Abraham et al [51] was used fordeveloping BBB prediction model which is of high dataquality and covers large chemical diversity space In additionto the data source kernel parameter optimization and featureselection are two crucial factors influencing the predictionaccuracy of SVM models To reduce the computational costmost of the existing models addressed the feature selectionand parameter optimization procedures separately In thisstudy we used a GA scheme to perform the kernel parameteroptimization and feature selection simultaneously which ismore efficient at searching the optimal feature subset space

Abrahamrsquos model [51] is the best model that is currentlypublicly available A comparison of our models with Abra-hamrsquos models was shown in Table 2 In Table 2 the last 3rows are models in our study The same dataset was used inAbrahamrsquos research and this study but the data set was splitinto different training set and test set (our model traintest =26063 Abrahamrsquosmodel trainingtest = 164164) 7 variableswere used in Abrahamrsquos model compared to 6 in our finalmodel The 1199032 values for training set in Abrahamrsquos 164164model and 3280 model were 071 and 075 respectivelycompared with 083 for our model It has to be noted that thesize of our training set (260) is bigger than Abrahamrsquos (164)

Our model was also compared with grid method imple-mented with Python toolkit (gridpy) shipped with libsvm[61] for parameter optimization First since grid methodcannot be used to select feature subset all 326 features wereused to construct a BBB prediction model The predictionaccuracy of the training set was very high (119903train

2= 097)

but that of the test set was disappointing (119903test2= 055) Then

we used the same feature set as our final model (6 features)The result was slightly better but still too bad for test setprediction (119903train

2= 086 119903test

2= 058)

So compared with the grid-based method our GA-basedmethod could get better accuracy with fewer features whichsuggested that GA could get much better combination ofparameters and feature subset This was also observed inotherrsquos study [48]

32 Feature Analysis An examination of the descriptors usedin the model could provide an insight into the molecularproperties that are most relevant to BBB penetration Table 3showed the features used in the final 6-feature model andtheir meanings In order to explore the relative importanceand the underlying molecular properties of the descriptorsthe most frequently used features in all 50 6-feature modelswere analyzed Table 4 showed the top 10 most frequentlyused features Interestingly AbsCarboxy an indicator of theexistence of carboxylic acidwas themost significant propertySome features with similar meaning were also found to occurin the models such as PSA related features (M PSA 74

8 BioMed Research International

Table 3 Features used in the final model

Name MeaningM log119875 log119875 (Marvin)HA dependent HDSA-2 [Zefirovrsquos PC] H-bond donor surface area related (CODESSA)M PSA 74 PSA at pH 74 (Marvin)AbsCarboxy Carboxylic acid indicator (Abraham)HA dependent HDCA-2SQRT(TMSA) [Zefirovrsquos PC] H-bond donor charged area related (CODESSA)Average Complementary Information content (order 0) Topology descriptor (CODESSA)

Table 4 The most frequently used features for all 6-feature modelsa

Number Feature name Occurrence(50 models) Meaning

11 AbsCarboxy 36 Indicator for carboxylic aciddagger

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

14 H-acceptor surface areatotal molecularsurface area

101 Topographic electronic index (all bonds) Zefirovrsquos PC 12 Topological electronic index for allbonded pairs of atomsbDagger

8 M PSA 74 11 PSA at pH 74csect

267 ESP-HASA H-acceptors surface area Quantum-Chemical PC 10 H-acceptor surface area

5 delta log119863 9 log119863 (pH 65) minus log119863 (pH 74)deand

7 M PSA 70 9 PSA at pH 70sect

138 HA dependent HDCA-2 [Zefirovrsquos PC] 9 H-donors charged surface area

6 M PSA 65 8 PSA at pH 65sect

1 M log119875 7 log119875andaRows with the same symbol could be categorized into the same groupbTopological electronic index is a feature to characterize the distribution of molecular charge 119879 = sum119873119861

(119894lt119895)(|119902119894 minus 119902119895|119903

2119894119895) where 119902119894 is net charge on 119894th atom and

119903119894119895 is the distance between two bonded atomsc74 is the pH in bloodd65 is the pH in intestinee119863 is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound inwater For neutral compounds log119863 is equal to log119875

M PSA 70 andM PSA 65) andH-bond related descriptors(number 267 268 and 138) So we decided to put similardescriptors into the same group (Figure 7) to find the under-lying properties affecting BBB penetration

According to the associated molecular properties ofthe features those frequently used features are catego-rized into 5 groups AbsCarboxy (indicator of carboxylicacid) H-bonding (H-bonding ability including H-bonddonoracceptor related features) PSA (molecular polar sur-face area related features) lipophilicity (including M log119875and delta log119863) and molecular charge (including chargeand topological electronic index related features)

The following is observed

(1) Interestingly AbsCarboxy was also the most signif-icant feature which occurred 36 times in total 50models This may indicate that the carboxylic acidgroup plays an important role in the BBB penetrationwhich is consistent with the study of Abraham et al[51]

3633

28

1612

0

5

10

15

20

25

30

35

40

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 7 Top features for all 6-feature models (50 in all)

(2) H-bonding (H-bond donoracceptor) related surfacearea polar surface area log119875 related features and

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

BioMed Research International 7

minus2 minus1 0 1 2minus2

minus1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(a)

minus2 minus1minus2

minus1

0 1

0

1

Experiment logBB

Pred

icte

d lo

g BB

(b)

Figure 6 Prediction accuracy of the final model on training set (a) and test set (b)

on training set increased from 0829 to 0919 while theprediction accuracy on test set only slightly increased from0840 to 0870 accordingly Take all these into considerationthe 6-feature model (Table 1 Figure 5(a)) was chosen as ourfinal model (Figure 6) of which the prediction accuracy onboth test set and training set was similar (119903train

2minus119903test2= 011)

and high enough (gt082)Figure 5(b) showed the evolution of the prediction per-

formance of the model (MSE and 119903test2) In the first 100

generations the MSE decreased very fast followed with aplatform stage from about 100 to 650 generations Anotherdecrease occurred at about 650 generations and then theevolution became stable at about 850 generationsThemodelsdid not improve much in the last 150 generations which mayimply a convergence

A tabular presentation of relevant studies regarding theprediction of the blood-brain distribution is shown inTable 2These models were constructed by using different statisticallearning methods yielding different prediction capabilitywith 119877test

2 ranging from 05 to over 09 Generally regressionby SVM appears to be more robust than traditional linearapproaches such as PLS andMLR with respect to the nonlin-ear effects induced bymultiple potentially cooperative factorsgoverning the BBB permeability For example the SVMmodel by Golmohammadi et al [62] yielded the highest119877test

2

on a test set containing 110 molecules However it shouldbe noted that direct comparison with results from previousstudies is usually inappropriate because of differences in theirdatasets In this study a combination of both in vivo andin vitro data compiled by Abraham et al [51] was used fordeveloping BBB prediction model which is of high dataquality and covers large chemical diversity space In additionto the data source kernel parameter optimization and featureselection are two crucial factors influencing the predictionaccuracy of SVM models To reduce the computational costmost of the existing models addressed the feature selectionand parameter optimization procedures separately In thisstudy we used a GA scheme to perform the kernel parameteroptimization and feature selection simultaneously which ismore efficient at searching the optimal feature subset space

Abrahamrsquos model [51] is the best model that is currentlypublicly available A comparison of our models with Abra-hamrsquos models was shown in Table 2 In Table 2 the last 3rows are models in our study The same dataset was used inAbrahamrsquos research and this study but the data set was splitinto different training set and test set (our model traintest =26063 Abrahamrsquosmodel trainingtest = 164164) 7 variableswere used in Abrahamrsquos model compared to 6 in our finalmodel The 1199032 values for training set in Abrahamrsquos 164164model and 3280 model were 071 and 075 respectivelycompared with 083 for our model It has to be noted that thesize of our training set (260) is bigger than Abrahamrsquos (164)

Our model was also compared with grid method imple-mented with Python toolkit (gridpy) shipped with libsvm[61] for parameter optimization First since grid methodcannot be used to select feature subset all 326 features wereused to construct a BBB prediction model The predictionaccuracy of the training set was very high (119903train

2= 097)

but that of the test set was disappointing (119903test2= 055) Then

we used the same feature set as our final model (6 features)The result was slightly better but still too bad for test setprediction (119903train

2= 086 119903test

2= 058)

So compared with the grid-based method our GA-basedmethod could get better accuracy with fewer features whichsuggested that GA could get much better combination ofparameters and feature subset This was also observed inotherrsquos study [48]

32 Feature Analysis An examination of the descriptors usedin the model could provide an insight into the molecularproperties that are most relevant to BBB penetration Table 3showed the features used in the final 6-feature model andtheir meanings In order to explore the relative importanceand the underlying molecular properties of the descriptorsthe most frequently used features in all 50 6-feature modelswere analyzed Table 4 showed the top 10 most frequentlyused features Interestingly AbsCarboxy an indicator of theexistence of carboxylic acidwas themost significant propertySome features with similar meaning were also found to occurin the models such as PSA related features (M PSA 74

8 BioMed Research International

Table 3 Features used in the final model

Name MeaningM log119875 log119875 (Marvin)HA dependent HDSA-2 [Zefirovrsquos PC] H-bond donor surface area related (CODESSA)M PSA 74 PSA at pH 74 (Marvin)AbsCarboxy Carboxylic acid indicator (Abraham)HA dependent HDCA-2SQRT(TMSA) [Zefirovrsquos PC] H-bond donor charged area related (CODESSA)Average Complementary Information content (order 0) Topology descriptor (CODESSA)

Table 4 The most frequently used features for all 6-feature modelsa

Number Feature name Occurrence(50 models) Meaning

11 AbsCarboxy 36 Indicator for carboxylic aciddagger

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

14 H-acceptor surface areatotal molecularsurface area

101 Topographic electronic index (all bonds) Zefirovrsquos PC 12 Topological electronic index for allbonded pairs of atomsbDagger

8 M PSA 74 11 PSA at pH 74csect

267 ESP-HASA H-acceptors surface area Quantum-Chemical PC 10 H-acceptor surface area

5 delta log119863 9 log119863 (pH 65) minus log119863 (pH 74)deand

7 M PSA 70 9 PSA at pH 70sect

138 HA dependent HDCA-2 [Zefirovrsquos PC] 9 H-donors charged surface area

6 M PSA 65 8 PSA at pH 65sect

1 M log119875 7 log119875andaRows with the same symbol could be categorized into the same groupbTopological electronic index is a feature to characterize the distribution of molecular charge 119879 = sum119873119861

(119894lt119895)(|119902119894 minus 119902119895|119903

2119894119895) where 119902119894 is net charge on 119894th atom and

119903119894119895 is the distance between two bonded atomsc74 is the pH in bloodd65 is the pH in intestinee119863 is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound inwater For neutral compounds log119863 is equal to log119875

M PSA 70 andM PSA 65) andH-bond related descriptors(number 267 268 and 138) So we decided to put similardescriptors into the same group (Figure 7) to find the under-lying properties affecting BBB penetration

According to the associated molecular properties ofthe features those frequently used features are catego-rized into 5 groups AbsCarboxy (indicator of carboxylicacid) H-bonding (H-bonding ability including H-bonddonoracceptor related features) PSA (molecular polar sur-face area related features) lipophilicity (including M log119875and delta log119863) and molecular charge (including chargeand topological electronic index related features)

The following is observed

(1) Interestingly AbsCarboxy was also the most signif-icant feature which occurred 36 times in total 50models This may indicate that the carboxylic acidgroup plays an important role in the BBB penetrationwhich is consistent with the study of Abraham et al[51]

3633

28

1612

0

5

10

15

20

25

30

35

40

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 7 Top features for all 6-feature models (50 in all)

(2) H-bonding (H-bond donoracceptor) related surfacearea polar surface area log119875 related features and

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

8 BioMed Research International

Table 3 Features used in the final model

Name MeaningM log119875 log119875 (Marvin)HA dependent HDSA-2 [Zefirovrsquos PC] H-bond donor surface area related (CODESSA)M PSA 74 PSA at pH 74 (Marvin)AbsCarboxy Carboxylic acid indicator (Abraham)HA dependent HDCA-2SQRT(TMSA) [Zefirovrsquos PC] H-bond donor charged area related (CODESSA)Average Complementary Information content (order 0) Topology descriptor (CODESSA)

Table 4 The most frequently used features for all 6-feature modelsa

Number Feature name Occurrence(50 models) Meaning

11 AbsCarboxy 36 Indicator for carboxylic aciddagger

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

14 H-acceptor surface areatotal molecularsurface area

101 Topographic electronic index (all bonds) Zefirovrsquos PC 12 Topological electronic index for allbonded pairs of atomsbDagger

8 M PSA 74 11 PSA at pH 74csect

267 ESP-HASA H-acceptors surface area Quantum-Chemical PC 10 H-acceptor surface area

5 delta log119863 9 log119863 (pH 65) minus log119863 (pH 74)deand

7 M PSA 70 9 PSA at pH 70sect

138 HA dependent HDCA-2 [Zefirovrsquos PC] 9 H-donors charged surface area

6 M PSA 65 8 PSA at pH 65sect

1 M log119875 7 log119875andaRows with the same symbol could be categorized into the same groupbTopological electronic index is a feature to characterize the distribution of molecular charge 119879 = sum119873119861

(119894lt119895)(|119902119894 minus 119902119895|119903

2119894119895) where 119902119894 is net charge on 119894th atom and

119903119894119895 is the distance between two bonded atomsc74 is the pH in bloodd65 is the pH in intestinee119863 is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound inwater For neutral compounds log119863 is equal to log119875

M PSA 70 andM PSA 65) andH-bond related descriptors(number 267 268 and 138) So we decided to put similardescriptors into the same group (Figure 7) to find the under-lying properties affecting BBB penetration

According to the associated molecular properties ofthe features those frequently used features are catego-rized into 5 groups AbsCarboxy (indicator of carboxylicacid) H-bonding (H-bonding ability including H-bonddonoracceptor related features) PSA (molecular polar sur-face area related features) lipophilicity (including M log119875and delta log119863) and molecular charge (including chargeand topological electronic index related features)

The following is observed

(1) Interestingly AbsCarboxy was also the most signif-icant feature which occurred 36 times in total 50models This may indicate that the carboxylic acidgroup plays an important role in the BBB penetrationwhich is consistent with the study of Abraham et al[51]

3633

28

1612

0

5

10

15

20

25

30

35

40

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 7 Top features for all 6-feature models (50 in all)

(2) H-bonding (H-bond donoracceptor) related surfacearea polar surface area log119875 related features and

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

BioMed Research International 9

Table 5 Most frequently used features for all top models (number of features range from 4 to 15)lowast

Number Descriptor name Occurrence(120 models) Meaning

11 AbsCarboxy 84 Indicator for carboxylic aciddagger

5 delta log119863 35 log119863 (pH 65) minus log119863 (pH 74)and

7 M PSA 70 32 PSA at pH 70sect

1 M log119875 28 log119875and

138 HA dependent HDCA-2 [Zefirovrsquos PC] 27 H-donors charged surface area

268ESP-FHASA Fractional HASA (HASATMSA) Quantum-Chemical PC

27 H-acceptor surface areatotal molecularsurface area

6 M PSA 65 25 PSA at pH 65sect

167 PPSA-1 Partial positive surface area [Quantum-Chemical PC] 25 Partial positive surface areasect

101 Topographic electronic index (all bonds) Zefirovrsquos PC 23 Topological electronic index for allbonded pairs of atomsDagger

8 M PSA 74 21 PSA at pH 74sectlowastRows with the same symbol could be categorized into the same group

84

54

103 157

63

23

0

20

40

60

80

100

120

AbsCarboxy H-bonding PSA Lipophilicity Molecularcharge

Occ

urre

nce

Feature groups

Figure 8 The most frequently used features for all top models

topological electronic index related features are alsosignificant

In order to further confirm the previous finding all top10 models from features = 4 to 15 were further analyzedand the result was almost the same (Table 5 Figure 8) Thetop 10 most frequently used feature sets were very similarCompared to the 6-feature models the only difference wasthat H-bond related feature number 267 is replaced with PSArelated feature number 167

Again the descriptors were analyzed by group Thecomposites of the groups were almost the same Given thatthere are 4 descriptors in PSA group AbsCarboxy was alsothe most significant property followed by PSA lipophilicityand H-bonding having similar occurences then followedby molecular charge with relatively low frequencies Thehigh consistency suggested that these groups of features hadsignficant impact on the BBB penetration ability

PSA andH-bonding descriptors are highly relevant prop-erties PSA is the molecular areas contributed by polar atoms

(nitrogen sulphur oxygen and phosphorus) andmost of thetime these polar atoms can be H-bond acceptor or donorIf PSA and H-bonding were merged into one group (119899 =157) they will become the most significant property groupof features

33 Properties Relevant to BBB Penetration

331 Carboxylic Acid Group It was proposed by Abraham etal [51] that carboxylic acid group played an important role inBBB penetration While it was commonly believed that themost important molecular properties related to BBB pene-tration were H-bonding ability lipophilicity and molecularcharge [63 64] However our study confirmed Abrahamrsquosconclusion and showed that the importance of carboxylicacid group in BBB penetration could be underestimated

In the models by Abraham et al [51] the indicatorvariable of carboxylic acid group has the largest negativecoefficient indicating its importance in BBB penetrationand is consistent with observations in our model that theindicator of carboxylic acid group is the most frequentlyused descriptor Zhao et al [23] tried to classify compoundsinto BBB positive or BBB negative groups using H-bondingrelated descriptors and the indicator of carboxylic acid groupwas also found to be important in their model Furthermoreour results are consistent with the fact that basic moleculeshave a better BBB penetration than the acid molecules [10]

The carboxylic acid groupmay affect the BBB penetrationthroughmolecular charge interactions since inmost cases thecarboxylic acid group will exist in the ionized form carryinga negative chargeThe carboxylic acid group could also affectthe BBB penetration by formingH-bondwith BBB and henceweaken the BBB penetration abilities of molecules

Abraham et al suggested that the presence of carboxylicacid group which acted to hinder BBB penetration wasnot only simply due to the intrinsic hydrogen bonding andpolarity properties of neutral acids [51] There were some

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

10 BioMed Research International

other ways in which the carboxylic acid groups could affectthe BBB penetration such as acidic drugs which couldbind to albumin [65] the ionization of the carboxylic acidgroups which could increase the excess molar refraction andhydrogen bonding basicity and finally the carboxylic acidgroups which may be removed from brain by some effluxmechanism [66]

332 Polar Surface Area and H-Bonding Ability As pointedout in our previous analysis PSA and H-bonding abilityare actually two highly correlated properties If these twogroups are merged they will be the most significant groupof properties even more significant than the carboxylic acidindicator (Figure 8) Norinder and Haeberlein [6] concludedthat hydrogen bonding term is a cornerstone in BBB penetra-tion prediction In Zhao et alrsquos study [23] PSA AbsCarboxynumber of H-bonding donors and positively charged formfraction at pH 74 were all treated as H-bonding descriptorsand were found to be important in the final model

Furthermore almost all published models make use ofmolecular polar andorH-bonding ability related descriptorssuch as PSA [23 67 68] high-charged PSA [22] number ofhydrogen donors and acceptors [23 69] and hydrogen bondaciditybasicity [23] And in these models PSA andor H-bond ability are all negatively correlated with log119861119861 whichis in agreement with Abraham et alrsquos study [51] in which thecoefficients of the H-bond acidity and H-bond basicity areboth negative

After review of many previous works Norinder andHaeberlein [6] proposed that if the sum of number ofnitrogen and oxygen atoms (N + O) in a molecule was fiveor less it had a high chance of entering the brain As we allknow nitrogen and oxygen atoms have great impact on PSAandH-bonding Norinder and Haeberlein [6] also concludedthat BBB penetration could be increased by lowering theoverall hydrogen bonding ability of a compound such asby encouraging intramolecular hydrogen bonding After ananalysis of the CNS activity of 125 marketed drugs van deWaterbeemd et al [70] suggested that the upper limit for PSAin a molecule that is desired to penetrate the brain shouldbe around 90 A2 while Kelder et al [71] analyzed the PSAdistribution of 776 orally administered CNS drugs that havereached at least phase II studies and suggested that the upperlimit should be 60ndash70 A2

Having in mind that molecules mainly cross the BBBby passive diffusion we think it may be because moleculeswith strong H-bonding ability have a greater tendency toformH-bonds with the polar environment (the blood) henceweakening their ability to cross the BBB by passive diffusion

333 Lipophilicity Lipophilicity is another property widelyrecognized as being important in BBB penetration and mostof the current models utilize features related to lipophilicity[22 64 67 72] Lipophilicity was thought to be positivelycorrelated with log119861119861 that is increase the lipophilicity of amolecule will increase the BBB penetration of the moleculeNorinder andHaeberlein [6] also proposed that if log119875minus(N+O) gt 0 then log119861119861 was positive Van de Waterbeemd et

al [70] suggested that the log119863 of the molecule should bein [1 3] for good BBB penetration These observations areconsistent with the fact that the lipid bilayer is lipophilic innature and lipophilic molecules could cross the BBB and getinto the brain more easily than hydrophobic molecules

34 Molecular Charge From the viewpoint of computationalchemistry the distribution of molecular charge is a veryimportant property that affects the molecule propertiesgreatly It is the uncharged form that can pass the BBBby passive diffusion Fischer et al [73] have shown thatacid molecules with p119870a lt 4 and basic molecules withp119870a gt 10 could not cross the BBB by passive diffusionUnder physiological conditions (pH = 74) acid moleculeswith p119870a lt 4 and basic molecules with p119870a gt 10 will beionized completely and carry net charges As mentioned inour previous analysis the carboxylic acid group may affectthe BBB penetration through molecular charge interactionsMahar Doan et al [74] compared physicochemical propertiesof 93 CNS (119899 = 48) and non-CNS (119899 = 45) drugs and showedthat 0 of 48 CNS drugs have a negative charge and CNSdrugs tend to have less positive charge These are reasonablyconsistent with the study of Abraham et al [51] in which thecoefficient of the carboxylic acid indicator is negative

It has to be noted that all these molecular propertiesare not independent and they are related to each other Forexample the carboxylic acid group is related to both PSA andH-bonding ability for theOatom in the carboxylic acid groupis a polar atom and has a strong ability to form H-bondsthe carboxylic acid group which carries charge under mostconditions is also related to molecular charge The PSAH-bonding ability is also correlated tomolecular charge becausein many cases atoms could contribute to PSA or form H-bonds which could probably carry charges Lipophilicity isalso related to molecular charge

We can get a conclusion that the most important prop-erties for a molecule to penetrate BBB are carboxylic acidgroup PSAH-bonding ability lipophilicity and charge BBBpenetration is positively correlated with the lipophilicityand negatively correlated with the other three properties Acomparison of the physicochemical properties of 48 CNSdrugs and 45 non-CNS suggested that compared to non-CNSdrugs CNS drugs tend to be more lipophilic and more rigidand have fewer hydrogen-bond donors fewer charges andlower PSA (lt80 A2) [74] which is in reasonable consistencywith our finding except that the molecular flexibility is notimportant in our model

There are some other properties utilized in some existingmodels such as molecular weight molecular shape andmolecular flexibility It is suggested by van de Waterbeemdet al [70] that molecular weight should be less than 450 forgood BBB penetration while Hou and Xu [22] suggested thatthe influence of molecular bulkiness would be obvious whenthe size of themolecule was larger than a threshold and foundthat molecular weight made a negative contribution to theBBB penetration when the molecular weight is greater 360This is not widely observed in other studies In Zhao et alrsquosstudy [23] molecular weight was found to be not importantcompared to hydrogen bond properties

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

BioMed Research International 11

Lobell et al [68] proposed that spherical shapes have asmall advantage compared with rod-like shapes with regardto BBB penetration they attributed this to the membranesthat are largely made from rod-shaped molecules and rod-like shapemay becomemore easily trappedwithinmembranewithout exiting into the brain compartment However inRose et alrsquos model [75] based on electrotopological statedescriptors showed that BBB penetration increased with lesssketch branching Crivori et al [76] tried to correlate descrip-tors derived from 3D molecular fields and BBB penetrationand concluded that the size and shape descriptors had nomarked impact on BBB penetration

Iyer et al [67] found that increasing the solute conforma-tional flexibility would increase log119861119861 while in the study ofMahar Doan et al [74] CNS drugs tend to be more rigidHowever the roles of molecular weight molecular shapeand molecular flexibility in BBB penetration seem to be stillunclear and not well received Further studies are still needed

4 Conclusion

In this study we have developed a GASVM model forthe BBB penetration prediction which utilized GA to dokernel parameters optimization and feature selection simul-taneously for SVM regression The results showed that ourmethod could get better performance than addressing thetwo problems separately The same GASVM method can beextended to be used on other QSAR modeling applications

In addition the most important properties (carboxylicacid group PSAH-bond ability lipophilicity and molecu-lar charge) governing the BBB penetration were illustratedthrough analyzing the SVM model The carboxylic acidgroup and PSAH-bond ability have the strongest effect Theexistence of carboxylic acid group (AbsCarboxy) PSAH-bonding and molecular charge is all negatively correlatedwith BBB penetration ability while the lipophilicity enhancesthe BBB penetration ability

The BBB penetration is a highly complex process andis a result of many cooperative effects In order to clarifythe factors that affect the BBB penetration further effortsare needed to investigate the mechanistic nature of the BBBand as pointed out by Goodwin and Clark [7] the mostfundamental need is for more high quality data both in vivoand in vitro upon which the next generation of predictivemodel can be built

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Daqing Zhang Jianfeng Xiao and Nannan Zhou contributedequally to this work

Acknowledgments

The authors gratefully acknowledge financial supportfrom the Hi-Tech Research and Development Program ofChina (Grant 2014AA01A302 to Mingyue Zheng and Grant2012AA020308 to Xiaomin Luo) National Natural ScienceFoundation of China (Grants 21210003 and 81230076 toHualiang Jiang and Grant 81430084 to Kaixian Chen) andNational SampT Major Project (Grant 2012ZX09301-001-002 toXiaomin Luo)

References

[1] K Lanevskij J Dapkunas L Juska P Japertas and R Didzi-apetris ldquoQSAR analysis of blood-brain distribution the influ-ence of plasma and brain tissue bindingrdquo Journal of Pharma-ceutical Sciences vol 100 no 6 pp 2147ndash2160 2011

[2] K Lanevskij P Japertas and R Didziapetris ldquoImproving theprediction of drug disposition in the brainrdquo Expert Opinion onDrug Metabolism amp Toxicology vol 9 no 4 pp 473ndash486 2013

[3] A R Mehdipour and M Hamidi ldquoBrain drug targeting acomputational approach for overcoming blood-brain barrierrdquoDrug Discovery Today vol 14 no 21-22 pp 1030ndash1036 2009

[4] J Bicker G Alves A Fortuna and A Falcao ldquoBlood-brainbarrier models and their relevance for a successful developmentof CNS drug delivery systems a reviewrdquo European Journal ofPharmaceutics andBiopharmaceutics vol 87 no 3 pp 409ndash4322014

[5] K Nagpal S K Singh and D N Mishra ldquoDrug targeting tobrain a systematic approach to study the factors parametersand approaches for prediction of permeability of drugs acrossBBBrdquo Expert Opinion on Drug Delivery vol 10 no 7 pp 927ndash955 2013

[6] U Norinder andM Haeberlein ldquoComputational approaches tothe prediction of the blood-brain distributionrdquo Advanced DrugDelivery Reviews vol 54 no 3 pp 291ndash313 2002

[7] J T Goodwin and D E Clark ldquoIn silico predictions of blood-brain barrier penetration considerations to lsquokeep in mindrsquordquoJournal of Pharmacology and Experimental Therapeutics vol315 no 2 pp 477ndash483 2005

[8] J A Nicolazzo S A Charman and W N Charman ldquoMethodsto assess drug permeability across the blood-brain barrierrdquoJournal of Pharmacy and Pharmacology vol 58 no 3 pp 281ndash293 2006

[9] L Di E H Kerns and G T Carter ldquoStrategies to assess blood-brain barrier penetrationrdquo Expert Opinion on Drug Discoveryvol 3 no 6 pp 677ndash687 2008

[10] Y Fan R Unwalla R A Denny et al ldquoInsights for predictingblood-brain barrier penetration of CNS targeted moleculesusing QSPR approachesrdquo Journal of Chemical Information andModeling vol 50 no 6 pp 1123ndash1133 2010

[11] M H Abraham ldquoThe factors that influence permeation acrossthe blood-brain barrierrdquo European Journal of Medicinal Chem-istry vol 39 no 3 pp 235ndash240 2004

[12] I F Martins A L Teixeira L Pinheiro and A O Falcao ldquoAbayesian approach to in silico blood-brain barrier penetrationmodelingrdquo Journal of Chemical Information and Modeling vol52 no 6 pp 1686ndash1697 2012

[13] M Adenot and R Lahana ldquoBlood-brain barrier permeationmodels discriminating between potential CNS and non-CNSdrugs including P-glycoprotein substratesrdquo Journal of Chemical

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

12 BioMed Research International

Information and Computer Sciences vol 44 no 1 pp 239ndash2482004

[14] M Muehlbacher G M Spitzer K R Liedl and J KornhuberldquoQualitative prediction of blood-brain barrier permeability on alarge and refined datasetrdquo Journal of Computer-AidedMolecularDesign vol 25 no 12 pp 1095ndash1106 2011

[15] A R Katritzky M Kuanar S Slavov et al ldquoCorrelation ofblood-brain penetration using structural descriptorsrdquo Bioor-ganic amp Medicinal Chemistry vol 14 no 14 pp 4888ndash49172006

[16] O Obrezanova J M R Gola E J Champness and MD Segall ldquoAutomatic QSAR modeling of ADME propertiesblood-brain barrier penetration and aqueous solubilityrdquo Journalof Computer-Aided Molecular Design vol 22 no 6-7 pp 431ndash440 2008

[17] L Zhang H Zhu T I Oprea A Golbraikh and A TropshaldquoQSAR modeling of the bloodndashbrain barrier permeability fordiverse organic compoundsrdquo Pharmaceutical Research vol 25no 8 pp 1902ndash1914 2008

[18] S van Damme W Langenaeker and P Bultinck ldquoPrediction ofblood-brain partitioning a model based on ab initio calculatedquantum chemical descriptorsrdquo Journal of Molecular Graphicsand Modelling vol 26 no 8 pp 1223ndash1236 2008

[19] D A Konovalov D Coomans E Deconinck and Y V HeydenldquoBenchmarking of QSAR models for blood-brain barrier per-meationrdquo Journal of Chemical Information and Modeling vol47 no 4 pp 1648ndash1656 2007

[20] T S Carpenter D A Kirshner E Y Lau S E WongJ P Nilmeier and F C Lightstone ldquoA method to predictblood-brain barrier permeability of drug-like compounds usingmolecular dynamics simulationsrdquo Biophysical Journal vol 107no 3 pp 630ndash641 2014

[21] J Zah G TerrersquoBlanche E Erasmus and S F Malan ldquoPhysic-ochemical prediction of a brain-blood distribution profile inpolycyclic aminesrdquo Bioorganic and Medicinal Chemistry vol 11no 17 pp 3569ndash3578 2003

[22] T J Hou and X J Xu ldquoADME evaluation in drug discovery3 Modeling blood-brain barrier partitioning using simplemolecular descriptorsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 6 pp 2137ndash2152 2003

[23] Y H Zhao M H Abraham A Ibrahim et al ldquoPredicting pen-etration across the blood-brain barrier from simple descriptorsand fragmentation schemesrdquo Journal of Chemical Informationand Modeling vol 47 no 1 pp 170ndash175 2007

[24] S R Mente and F Lombardo ldquoA recursive-partitioning modelfor blood-brain barrier permeationrdquo Journal of Computer-AidedMolecular Design vol 19 no 7 pp 465ndash481 2005

[25] P Garg and J Verma ldquoIn silico prediction of blood brain barrierpermeability an artificial neural network modelrdquo Journal ofChemical Information and Modeling vol 46 no 1 pp 289ndash2972006

[26] A Guerra J A Paez and N E Campillo ldquoArtificial neuralnetworks in ADMET modeling prediction of blood-brainbarrier permeationrdquoQSARampCombinatorial Science vol 27 no5 pp 586ndash594 2008

[27] Z Wang A Yan and Q Yuan ldquoClassification of blood-brain barrier permeation by Kohonenrsquos self-organizing neuralnetwork (KohNN) and support vector machine (SVM)rdquo QSARamp Combinatorial Science vol 28 no 9 pp 989ndash994 2009

[28] A Yan H Liang Y Chong X Nie and C Yu ldquoIn-silicoprediction of blood-brain barrier permeabilityrdquo SAR and QSARin Environmental Research vol 24 no 1 pp 61ndash74 2013

[29] J Shen F Cheng Y Xu W Li and Y Tang ldquoEstimationof ADME properties with substructure pattern recognitionrdquoJournal of Chemical Information and Modeling vol 50 no 6pp 1034ndash1041 2010

[30] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structure-activity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[31] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[32] K Heikamp and J Bajorath ldquoSupport vector machines for drugdiscoveryrdquo Expert Opinion on Drug Discovery vol 9 no 1 pp93ndash104 2014

[33] C W Hsu C C Chang and C J Lin A Practical Guide toSupport Vector Classication National Taiwan University TaipeiTaiwan 2006

[34] OChapelle VVapnikO Bousquet and SMukherjee ldquoChoos-ing multiple parameters for support vector machinesrdquoMachineLearning vol 46 no 1ndash3 pp 131ndash159 2002

[35] K-M Chung W-C Kao C-L Sun L-L Wang and C-J LinldquoRadius margin bounds for support vector machines with theRBF kernelrdquoNeural Computation vol 15 no 11 pp 2643ndash26812003

[36] F Friedrichs and C Igel ldquoEvolutionary tuning of multiple SVMparametersrdquoNeurocomputing vol 64 no 1ndash4 pp 107ndash117 2005

[37] J H Yang and V Honavar ldquoFeature subset selection usinggenetic algorithmrdquo IEEE Intelligent Systems amp Their Applica-tions vol 13 no 2 pp 44ndash48 1998

[38] M L Raymer W F Punch E D Goodman L A Kuhn andAK Jain ldquoDimensionality reduction using genetic algorithmsrdquoIEEE Transactions on Evolutionary Computation vol 4 no 2pp 164ndash171 2000

[39] S Salcedo-Sanz M Prado-Cumplido F Perez-Cruz and CBousono-Calzon ldquoFeature selection via genetic optimizationrdquoin Artificial Neural NetworksmdashIcann 2002 vol 2415 pp 547ndash552 Springer 2002

[40] ZQWang andDX Zhang ldquoFeature selection in text classifica-tion via SVM and LSIrdquo in Advances in Neural NetworksmdashISNN2006 vol 3971 of Lecture Notes in Computer Science part 1 pp1381ndash1386 Springer Berlin Germany 2006

[41] M Fernandez J Caballero L Fernandez andA Sarai ldquoGeneticalgorithm optimization in drug design QSAR bayesian-regularized genetic neural networks (BRGNN) and geneticalgorithm-optimized support vectors machines (GA-SVM)rdquoMolecular Diversity vol 15 no 1 pp 269ndash289 2011

[42] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[43] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[44] K Z Mao ldquoFeature subset selection for support vectormachines through discriminative function pruning analysisrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 34 no 1 pp 60ndash67 2004

[45] K-Q Shen C-J Ong X-P Li and E P V Wilder-SmithldquoFeature selection via sensitivity analysis of SVM probabilisticoutputsrdquoMachine Learning vol 70 no 1 pp 1ndash20 2008

[46] Y-W Chen and C-J Lin ldquoCombining SVMs with variousfeature selection strategiesrdquo in Feature Extraction vol 207 of

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 13: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

BioMed Research International 13

Studies in Fuzziness and Soft Computing pp 315ndash324 SpringerBerlin Germany 2006

[47] J Weston S Mukherjee O Chapelle M Pontil and V VapnikldquoFeature selection for SVMsrdquo inAdvances in Neural InformationProcessing Systems 13 pp 668ndash674 MIT Press 2000

[48] C-L Huang and C-J Wang ldquoA GA-based feature selection andparameters optimizationfor support vector machinesrdquo ExpertSystems with Applications vol 31 no 2 pp 231ndash240 2006

[49] X R Zhang and L C Jiao ldquoSimultaneous feature selectionand parameters optimization for SVM by immune clonalalgorithmrdquo in Advances in Natural Computation vol 3611of Lecture Notes in Computer Science pp 905ndash912 SpringerBerlin Germany 2005

[50] C Gold A Holub and P Sollich ldquoBayesian approach to featureselection and parameter tuning for support vector machineclassifiersrdquo Neural Networks vol 18 no 5-6 pp 693ndash701 2005

[51] MHAbrahamA Ibrahim Y Zhao andWEAgree Jr ldquoA database for partition of volatile organic compounds anddrugs frombloodplasmaserum to brain and anLFER analysis of the datardquoJournal of Pharmaceutical Sciences vol 95 no 10 pp 2091ndash21002006

[52] A R Katritzky V S Lobanov and M Karelson CODESSAReference Manual University of Florida 1996

[53] Marvin 405 ChemAxon 2006 httpwwwchemaxoncom[54] AMPAC 8 1992ndash2004 Semichem Shawnee Kan USA[55] I Guyon and A Elisseeff ldquoAn introduction to variable and

feature selectionrdquoTheJournal ofMachine LearningResearch vol3 pp 1157ndash1182 2003

[56] R W Kennard and L A Stone ldquoComputer aided design ofexperimentsrdquo Technometrics vol 11 no 1 pp 137ndash148 1969

[57] MDaszykowski BWalczak andD LMassart ldquoRepresentativesubset selectionrdquoAnalytica Chimica Acta vol 468 no 1 pp 91ndash103 2002

[58] R Collobert and S Bengio ldquoSVMTorch support vectormachines for large-scale regression problemsrdquo Journal ofMachine Learning Research vol 1 no 2 pp 143ndash160 2001

[59] A J Smola and B Scholkopf ldquoA tutorial on support vectorregressionrdquo Statistics and Computing vol 14 no 3 pp 199ndash2222004

[60] R G Brereton and G R Lloyd ldquoSupport Vector Machines forclassification and regressionrdquo Analyst vol 135 no 2 pp 230ndash267 2010

[61] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001

[62] H Golmohammadi Z Dashtbozorgi and W E Acree JrldquoQuantitative structurendashactivity relationship prediction ofblood-to-brain partitioning behavior using support vectormachinerdquo European Journal of Pharmaceutical Sciences vol 47no 2 pp 421ndash429 2012

[63] D E Clark ldquoIn silico prediction of blood-brain barrier perme-ationrdquo Drug Discovery Today vol 8 no 20 pp 927ndash933 2003

[64] S Vilar M Chakrabarti and S Costanzi ldquoPrediction ofpassive blood-brain partitioning straightforward and effectiveclassificationmodels based on in silico derived physicochemicaldescriptorsrdquo Journal of Molecular Graphics and Modelling vol28 no 8 pp 899ndash903 2010

[65] FHerve S Urien E Albengres J-CDuche and J-P TillementldquoDrug binding in plasma A summary of recent trends in thestudy of drug and hormone bindingrdquoClinical Pharmacokineticsvol 26 no 1 pp 44ndash58 1994

[66] J A Platts M H Abraham Y H Zhao A Hersey L Ijazand D Butina ldquoCorrelation and prediction of a large blood-brain distribution data setmdashan LFER studyrdquo European Journalof Medicinal Chemistry vol 36 no 9 pp 719ndash730 2001

[67] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing blood-brain barrier partitioning of organic moleculesusing membrane-interaction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

[68] M Lobell L Molnar and G M Keseru ldquoRecent advancesin the prediction of blood-brain partitioning from molecularstructurerdquo Journal of Pharmaceutical Sciences vol 92 no 2 pp360ndash370 2003

[69] Y N Kaznessis M E Snow and C J Blankley ldquoPredictionof blood-brain partitioning using Monte Carlo simulationsof molecules in waterrdquo Journal of Computer-Aided MolecularDesign vol 15 no 8 pp 697ndash708 2001

[70] H van de Waterbeemd G Camenisch G Folkers J RChretien andO A Raevsky ldquoEstimation of blood-brain barriercrossing of drugs using molecular size and shape and H-bonding descriptorsrdquo Journal of Drug Targeting vol 6 no 2 pp151ndash165 1998

[71] J Kelder P D J Grootenhuis DM Bayada L P CDelbressineand J-P Ploemen ldquoPolar molecular surface as a dominatingdeterminant for oral absorption and brain penetration ofdrugsrdquo Pharmaceutical Research vol 16 no 10 pp 1514ndash15191999

[72] F Ooms P Weber P-A Carrupt and B Testa ldquoA simplemodel to predict blood-brain barrier permeation from 3Dmolecular fieldsrdquo Biochimica et Biophysica ActamdashMolecularBasis of Disease vol 1587 no 2-3 pp 118ndash125 2002

[73] H Fischer R Gottschlich and A Seelig ldquoBlood-brain barrierpermeation molecular parameters governing passive diffu-sionrdquo Journal of Membrane Biology vol 165 no 3 pp 201ndash2111998

[74] K M Mahar Doan J E Humphreys L O Webster et al ldquoPas-sive permeability and P-glycoprotein-mediated efflux differen-tiate central nervous system (CNS) and non-CNS marketeddrugsrdquo Journal of Pharmacology and ExperimentalTherapeuticsvol 303 no 3 pp 1029ndash1037 2002

[75] K Rose L H Hall and L B Kier ldquoModeling blood-brainbarrier partitioning using the electrotopological staterdquo Journalof Chemical Information and Computer Sciences vol 42 no 3pp 651ndash666 2002

[76] P Crivori G Cruciani P-A Carrupt and B Testa ldquoPredict-ing blood-brain barrier permeation from three-dimensionalmolecular structurerdquo Journal ofMedicinal Chemistry vol 43 no11 pp 2204ndash2216 2000

[77] R C Young R C Mitchell T H Brown et al ldquoDevelopmentof a new physicochemical model for brain penetration andits application to the design of centrally acting H2 receptorhistamine antagonistsrdquo Journal of Medicinal Chemistry vol 31no 3 pp 656ndash671 1988

[78] F Lombardo J F Blake and W J Curatolo ldquoComputationof brain-blood partitioning of organic solutes via free energycalculationsrdquo Journal of Medicinal Chemistry vol 39 no 24 pp4750ndash4755 1996

[79] M Iyer R Mishra Y Han and A J Hopfinger ldquoPredict-ing bloodndashbrain barrier partitioning of organic moleculesusing membranendashinteraction QSAR analysisrdquo PharmaceuticalResearch vol 19 no 11 pp 1611ndash1621 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 14: Research Article A Genetic Algorithm Based Support Vector ...downloads.hindawi.com › journals › bmri › 2015 › 292683.pdfGenetic Algorithms. Genetic algorithms (GA) [ ]are stochastic

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology