GPU based virtual screening techniques for faster drug discovery
Research Scholar: Jayaraj P. B (P100016CS)
Guided by Dr. K. Muralikrishnan & Dr. G. Gopakumar
Department of Computer Science & Engineering, NIT Calicut
12/15/2016
Outline
o Introduction to Drug Discovery
o Computer aided Drug Discovery
o Virtual screening in Drug Discovery
o Literature Survey
o Limitation of Virtual Screening
o Need for Parallelism
o GPU Computing
o Proposed GPU Parallel methods for Virtual Screening
  - Random Forest
  - Self Organizing Map
  - Maximum Common Subgraph
o Results & Comparison
o Conclusion & Future Work
o References
o Publications
Introduction to Drug Discovery
o Drug discovery is the process of developing a new drug that is therapeutically active against a disease-causing target molecule.
Target : a key molecule that is specific to a disease condition.
Ligand : a small molecule that binds tightly to its target. This binding results in a change of conformation of the target.
o It deals with the design of a molecule that is complementary in charge and shape to the target to which it binds.
Traditional Drug discovery pipeline[66]
Image Courtesy : www.cresset-group.co
o Requires a lot of wet-lab experiments with the target and a large number of molecules.
o The large exploration space demands a great deal of execution time and money.
Computer aided Drug Discovery[65]
o Breaks the bottleneck by using modern computational techniques:
o Data mining, machine learning, artificial intelligence, graph matching.
o It can avoid costly wet-lab experiments.
Image Courtesy : www.cresset-group.co
Virtual Screening[1,3]
Virtual Screening (VS) is a computational technique used in drug discovery for evaluating a large number of molecules to identify lead molecules that can be optimized to give a drug candidate.
Only with VS can drug discovery consider the enormous chemical space of over 10^60 conceivable compounds for screening [1].
This speeds up the drug discovery process while reducing the need for expensive wet-lab work.
Types of Virtual Screening
Image Courtesy : www.cresset-group.co
VS: Related Works
S. Ekins et al. [1] showed that the in-silico methods available in pharmacology can also be used for drug discovery. Amanda Schierz [4] described the shortcomings of virtual screening.
o VS using machine learning
  - J.C. Gertrudes et al. [2], Patrick Walters et al. [8], Peter Ripphausen et al. [13], A. Srinivas Reddy et al. [14] and Campbell McInnes et al. [15] reviewed the approaches used in virtual screening methods.
  - R. Burbidge et al. [11] showed that the support vector machine is well suited to QSAR analysis in virtual screening.
  - Vladimir Svetnik et al. [12] used random forest for compound classification and QSAR modeling.
o VS using similarity searching
  - Yiqun Cao et al. [52] proposed a new backtracking algorithm for finding the Maximum Common Subgraph between two given graphs and tested its performance.
  - Peter Willett [61], John W. Raymond et al. [62, 70] and Paul J. Durand [71] reviewed and analyzed several fingerprint-based and graph-based similarity methods for clustering chemical structures.
Issues in Virtual screening
o Since the input space for virtual screening is so large, serial computing is of little help.
o Training the model on millions of molecules can take too much time.
o Screening billions of molecules is also impractical with serial computing.
o Related work is carried out at the Open Source Drug Discovery Consortium of CSIR at the IISc campus.
o Hence the need for parallelism.
GPU Computing[17]
[Figure: the application code is divided between GPU and CPU. The GPU is used to parallelize the compute-intensive functions; the rest of the sequential code runs on the CPU.]
Image Courtesy : Nvidia CudaZone
CUDA - parallel computing platform & programming model invented by NVIDIA in 2006 [9,10].
Problem Statement
To design and develop efficient GPU-based parallel virtual screening algorithms for faster drug discovery.
GPU accelerated Virtual Screening
Proposed Methods
The contributions are the following data-parallel methods for virtual screening:
Method 1: Random Forest
Method 2: Self Organizing Map
Method 3: Maximum Common Subgraph
Proposed Method-1
GPU Based Random Forest Classifier for Virtual Screening
Given information about the reaction of a set of molecules to a target, predict whether new molecules will be active or inactive when they interact with the same target.
Random Forest[21]
Random Forest (RF) is a committee of weak learners for solving prediction problems.
In RF, the weak learner is a decision tree, CART (Classification And Regression Tree).
CART follows greedy, top-down, binary recursive partitioning, which divides the feature space into sets of disjoint rectangular regions.
Each internal node has an associated splitting predicate.
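As a minimal sketch of what a splitting predicate can look like, the snippet below picks a threshold on a single feature by minimizing the weighted Gini impurity of the two child nodes. The function names (`gini`, `best_threshold`) and the toy data are assumptions made for illustration. Gini impurity is one common CART criterion; the slides do not state which criterion the proposed work uses.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, impurity) for the predicate value >= threshold
    that minimizes the weighted Gini impurity of the two child nodes."""
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        if not left or not right:
            continue  # a split must send samples to both children
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Toy descriptor values (hypothetical) with active (1) / inactive (0) labels.
values = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(values, labels)
```

On this toy data the predicate `value >= 0.9` separates the two classes completely, so both child nodes are pure.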
Binary Decision Trees
[Figure: binary decision tree with numbered split nodes and leaf nodes; each split node compares a function of the feature vector v with a threshold.]
Image Courtesy : www. machinelearningmastery.com/
v : feature vector
f_n(v) : split function at node n
t_n : threshold at node n
Each split node n tests f_n(v) ≥ t_n.
Random Forest – Serial Algorithm[20]
Input and Parameters used:
N - number of training samples
n - number of decision trees in the forest
M - total number of features
m - number of features considered when splitting a node
Algorithm
1. Set the number of trees, n, and the number of features, m, to be used in the creation of the trees.
2. Using bootstrapping, create n training samples, one for each tree.
3. Grow the trees.
4. At each node, randomly select m of the M features; only these features are candidates for the split at the current node.
5. Split the node on the feature, among the m selected, that best separates the training samples in the node with regard to their output value.
6. For each instance in the test sample, let each tree vote for an output value.
7. The final output of the ensemble is the majority vote.
8. End
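The serial steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation from the proposed work: the function names, the Gini split criterion, the `max_depth` limit and the toy data are all hypothetical, and class labels are assumed not to be tuples (tuples mark internal nodes here).

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, m, rng, depth=0, max_depth=5):
    """Steps 3-5: grow one tree, trying m random features at each node."""
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]      # leaf: majority class
    best = None
    for f in rng.sample(range(len(X[0])), m):       # step 4: m of M features
        for t in set(row[f] for row in X):
            left = [i for i, row in enumerate(X) if row[f] < t]
            right = [i for i, row in enumerate(X) if row[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:     # step 5: best separation
                best = (score, f, t, left, right)
    if best is None:                                # no usable split found
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      m, rng, depth + 1, max_depth),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      m, rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):                  # internal node
        f, t, left, right = node
        node = right if row[f] >= t else left
    return node

def train_forest(X, y, n_trees, m, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 2: bootstrap sample (drawn with replacement) for this tree.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    votes = Counter(predict_tree(tree, row) for tree in forest)  # step 6
    return votes.most_common(1)[0][0]                            # step 7

# Toy data (hypothetical): two descriptors, 0 = inactive, 1 = active.
X = [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9], [0.9, 2.0], [1.0, 2.2], [1.1, 2.1]]
y = [0, 0, 0, 1, 1, 1]
forest = train_forest(X, y, n_trees=15, m=1)
```

Calling `predict_forest(forest, row)` then returns the majority class over the 15 trees for a new descriptor vector `row`.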
Molecule as a vector
• A molecule can be represented as a set of descriptors.
• 179 descriptors are considered in the proposed work.
• A molecule can be considered as a point in a multidimensional ‘descriptor space’.
Image Courtesy : www.cresset-group.co
RF Classifier for LBVS
Issues: Serial model creation and prediction for millions of molecules take too much time to complete the VS process.
Parallel RF – Related Works
Grahn et al. [27] presented CudaRF, a new parallel version of the Random Forest algorithm implemented using CUDA.
These methods seem to under-utilize the available parallelism of many-core machines.
Essen et al. [28] compared the effectiveness of FPGAs, GPUs and multi-core CPUs for accelerating Compact Random Forest (CRF) classifiers.
Liao et al. [29] introduced CudaTree, a GPU Random Forest implementation that adaptively switches between data and instruction parallelism.
Parallel Decision Tree construction
To grow a decision tree on the GPU, the proposed work develops a hybrid method.
The hybrid method combines depth-first and breadth-first construction.
Depth-first tree construction is used at the top of the tree.
Construction switches to breadth-first after a crossover threshold is reached.
This crossover threshold can be set as the number of nodes grown in the tree.
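The switch between the two construction modes can be sketched with a small CPU-side simulation. It tracks only sample counts, splits every node into two halves, and records which nodes are processed together. The function name `build_hybrid`, the halving rule and the `min_leaf` parameter are assumptions for illustration; the actual GPU kernels of the proposed work are not shown in the slides.

```python
def build_hybrid(n_samples, crossover_nodes, min_leaf=2):
    """Simulate hybrid tree construction on sample counts only.

    Each node holds a sample count and splits it into two halves until a
    node has at most min_leaf samples. Returns (log, grown), where log is a
    list of (mode, batch) pairs and batch lists the sample counts of the
    nodes processed together in one step.
    """
    log = []
    grown = 0
    stack = [n_samples]                      # depth-first work list (LIFO)
    # Phase 1: depth-first, one node per step, until the crossover threshold.
    while stack and grown < crossover_nodes:
        size = stack.pop()
        log.append(("depth-first", [size]))
        grown += 1
        if size > min_leaf:
            half = size // 2
            stack.extend([size - half, half])
    # Phase 2: breadth-first, all pending nodes handled as one batch per step
    # (on a GPU this would correspond to one launch over the whole frontier).
    frontier = stack
    while frontier:
        log.append(("breadth-first", list(frontier)))
        grown += len(frontier)
        next_frontier = []
        for size in frontier:
            if size > min_leaf:
                half = size // 2
                next_frontier.extend([half, size - half])
        frontier = next_frontier
    return log, grown

log, grown = build_hybrid(n_samples=16, crossover_nodes=3)
```

With 16 samples and a crossover threshold of 3 nodes, the first three nodes are grown one at a time and the remaining 12 nodes are grown in three breadth-first batches.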
Proposed Parallel Training Algorithm
Proposed Parallel Prediction Algorithm
Data set used
o Input – Bio-assay SDF file from NCBI PubChem [34]
o Feature extraction tool used: POWERMV [32]
  - 179 descriptors are generated for each data set [4]
Performance [serial and parallel]
[Tables: serial and parallel performance results]
Speed up in Training
o For smaller data sets, the speedup is lower because of the overhead of CPU-GPU data transfer.
o For larger data sets, there is a visible 10-fold computational speedup.
Speed up in Prediction
o The data sets for classification were taken from GDB17 [35], a chemical universe database of unknown compounds.
o A speedup of 5 to 60 times is achieved.
Proposed Method-2
GPU based Self Organizing Map for Virtual Screening.
Self Organizing Map[36]
Introduced by Prof. Teuvo Kohonen in 1982.
It is a type of artificial neural network (ANN) that produces a low-dimensional, discretized representation of the input space of the training samples.
It is implemented as an unsupervised system of competitive learners.
This makes SOM useful for visualizing low-dimensional views of high-dimensional data.
Working of SOM
o Input: p training vectors X, each of length n
  (x1,1, x1,2, ..., x1,i, ..., x1,n)
  (x2,1, x2,2, ..., x2,i, ..., x2,n)
  ...
  (xj,1, xj,2, ..., xj,i, ..., xj,n)
  ...
  (xp,1, xp,2, ..., xp,i, ..., xp,n)
o Output: a vector Y of length m: (y1, y2, ..., yi, ..., ym)
o There is one weight vector of length n associated with each output unit.
o Each of the p vectors in the training data is classified as falling in one of m clusters.
Image Courtesy : http://www.csbdu.in/
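To make the roles of the p input vectors, the m output units and their weight vectors concrete, here is a minimal serial SOM sketch in plain Python. It assumes a 1-D line of output units, a Gaussian neighbourhood, and linearly decaying learning rate and radius; these choices, the function names and the toy data are illustrative assumptions rather than the settings of the proposed work.

```python
import math
import random

def find_winner(weights, x):
    """Index of the unit whose weight vector is closest to x
    (squared Euclidean distance)."""
    dists = [sum((w_k - x_k) ** 2 for w_k, x_k in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def train_som(data, m, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM with m output units on a list of length-n vectors."""
    rng = random.Random(seed)
    n = len(data[0])
    # One weight vector of length n per output unit, randomly initialised.
    weights = [[rng.random() for _ in range(n)] for _ in range(m)]
    radius0 = max(m / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
        for x in data:
            winner = find_winner(weights, x)         # competitive step
            for j in range(m):
                # Gaussian neighbourhood on the 1-D line of output units.
                h = math.exp(-((j - winner) ** 2) / (2.0 * radius ** 2))
                for k in range(n):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
    return weights

# Toy data (hypothetical): p = 6 vectors of length n = 2, m = 2 clusters.
data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
        [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
weights = train_som(data, m=2)
clusters = [find_winner(weights, x) for x in data]
```

After training, `clusters` holds, for each of the p input vectors, the index of the output unit (cluster) it falls in.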
SOM - Serial Algorithm[36]
SOM: Related Works
o SOM software packages
  - SOM PAK [38] - developed at Helsinki University of Technology by Kohonen and his team
  - Viscovery [39] - for medicinal document classification
  - kohonen [40] - an R language based SOM package
o Paul Elzer et al. [41] used SOM for industrial pharmaceutical research.
o Dimitar Hristozov et al. [42] used SOM for fingerprint similarity based virtual screening.
o No work related to SOM based virtual screening could be found in the literature.
o Running the classical SOM algorithm serially for virtual screening of millions of molecules cannot complete within a limited time frame, even on a powerful computer [36].
Proposed iterative SOM for VS
o An iteration of the proposed algorithm consists of
  - a model building phase and
  - a prediction phase.
o The model building phase of the proposed algorithm combines
  - the unsupervised learning capability of the SOM with
  - a supervised labeling of the trained SOM neurons.
o Each successive iteration builds a better prediction model for the test data.
Proposed SOM for Virtual Screening
Issues: time consuming because of the compute-intensive winner neuron finding and neuron weight updating steps, and the iterations.
Neuron labels
a : active
i : inactive
nl : next level
u : undefined
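One way the four neuron labels could be assigned after training is sketched below. The labeling rule is an assumption made for illustration, since the slides list the labels but not the rule: a neuron is labeled `a` or `i` when every training sample it wins has that class, `nl` when it wins samples of both classes (and would be passed to the next iteration), and `u` when it wins no training sample. The function names and toy values are hypothetical.

```python
def find_winner(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    dists = [sum((w_k - x_k) ** 2 for w_k, x_k in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def label_neurons(weights, samples, classes):
    """Assign one of 'a', 'i', 'nl', 'u' to each trained neuron.

    ASSUMED rule (not stated in the slides): 'a' or 'i' when all training
    samples won by the neuron share that class, 'nl' when both classes are
    present, 'u' when the neuron wins no training sample.
    """
    hits = [set() for _ in weights]
    for x, c in zip(samples, classes):
        hits[find_winner(weights, x)].add(c)
    labels = []
    for seen in hits:
        if not seen:
            labels.append("u")
        elif seen == {"active"}:
            labels.append("a")
        elif seen == {"inactive"}:
            labels.append("i")
        else:
            labels.append("nl")
    return labels

# Hypothetical trained weights and labeled training samples.
weights = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [5.0, 5.0]]
samples = [[0.1, 0.0], [0.9, 1.0], [0.45, 0.5], [0.55, 0.5]]
classes = ["active", "inactive", "active", "inactive"]
labels = label_neurons(weights, samples, classes)
```

Under this assumed rule, a test molecule would take the label of its winner neuron, and molecules landing on `nl` neurons would be carried into the next iteration.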
Parallel SOM – Related Works
Raghavendra D Prabhu [43], Alexander Campbell et al. [44] and Peter Wittek et al. [45] described different parallel SOM implementations for many-core and multi-core architectures.
McConnell et al. [48] compared different parallel SOM implementations using OpenCL, CUDA and MPI.
Gavin Davidson [46] created a parallel version of the SOM algorithm using OpenCL.
Gaute Myklebust et al. [47] put forth ideas on node parallelism and training sample parallelism in SOM.
Node parallelism is used in the proposed work.
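Node parallelism means that the work for each neuron (its distance to the input and its weight update) is independent and can be given to a separate worker. The sketch below imitates this on the CPU with a thread pool, one task per neuron; on a GPU each task would correspond to one thread. The function names, the simplified neighbourhood (winner plus immediate neighbours on a 1-D map) and the toy values are assumptions for illustration, not the kernels of the proposed work.

```python
from concurrent.futures import ThreadPoolExecutor

def neuron_distance(args):
    """Work of one 'thread': squared distance from one neuron to x."""
    w, x = args
    return sum((w_k - x_k) ** 2 for w_k, x_k in zip(w, x))

def update_neuron(args):
    """Work of one 'thread': move one neuron toward x by its own rate."""
    w, x, rate = args
    return [w_k + rate * (x_k - w_k) for w_k, x_k in zip(w, x)]

def som_step_node_parallel(weights, x, lr, pool):
    # One task per neuron for the distance computation.
    dists = list(pool.map(neuron_distance, [(w, x) for w in weights]))
    winner = dists.index(min(dists))          # reduction over all neurons
    # One task per neuron for the update. In this simplified sketch only
    # the winner (rate lr) and its immediate neighbours (rate lr / 2) move.
    rates = [lr if j == winner else lr / 2 if abs(j - winner) == 1 else 0.0
             for j in range(len(weights))]
    new_weights = list(pool.map(update_neuron,
                                [(w, x, r) for w, r in zip(weights, rates)]))
    return winner, new_weights

with ThreadPoolExecutor(max_workers=4) as pool:
    weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    winner, weights = som_step_node_parallel(weights, [1.0, 0.0], 0.5, pool)
```

The only step that needs communication between neurons is the minimum search for the winner; everything else is per-neuron work, which is what makes the approach a natural fit for data-parallel hardware.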