Combining Least Absolute Shrinkage and Selection Operator (LASSO) and Heat
Map Visualization for Biomarkers Detection of LGL Leukemia
By: David Garcia
Table of Contents
• What is LASSO?
• How does LASSO Work?
• LASSO and Feature Selection
• LGL Leukemia
• Statistical Biomarker Discovery
• Methods and Results
• Questions
What is LASSO?
• LASSO = Least Absolute Shrinkage and Selection Operator
• Developed by Robert Tibshirani in 1996
• LASSO is a method of feature selection
What is LASSO?
• Estimates a regression coefficient $b_i$ for each feature $x_i$
• Penalizes the size of the coefficients, with the penalty strength set by a tuning parameter λ
• Sets coefficients of less relevant features to zero
How Does LASSO Work?
Regression Equation:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
$x_1, x_2, \dots, x_n$ are the variables/features
$\hat{y}$ is the predicted outcome
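The regression equation can be evaluated directly for one observation. The sketch below is only an illustration: the function name `predict` and the numeric values are hypothetical and do not come from the slides.

```python
def predict(b0, coefs, features):
    """Return y_hat = b0 + b1*x1 + ... + bn*xn for a single observation."""
    if len(coefs) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return b0 + sum(b * x for b, x in zip(coefs, features))


# Hypothetical intercept, coefficients, and feature values (illustration only).
y_hat = predict(1.0, [2.0, -0.5, 0.0], [3.0, 4.0, 10.0])
print(y_hat)  # 1 + 2*3 - 0.5*4 + 0*10 = 5.0
```

Note that the third feature has a coefficient of zero, so its value has no effect on the prediction.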
How Does LASSO Work?
$y_1 = b_0 + b_1 x_{11} + b_2 x_{12} + \dots + b_n x_{1n}$
$y_2 = b_0 + b_1 x_{21} + b_2 x_{22} + \dots + b_n x_{2n}$
$\vdots$
$y_m = b_0 + b_1 x_{m1} + b_2 x_{m2} + \dots + b_n x_{mn}$
How Does LASSO Work?
$y_1 = b_0 + b_1 x_{11} + b_2 x_{12} + \dots + b_n x_{1n} + e_1$
$y_2 = b_0 + b_1 x_{21} + b_2 x_{22} + \dots + b_n x_{2n} + e_2$
$\vdots$
$y_m = b_0 + b_1 x_{m1} + b_2 x_{m2} + \dots + b_n x_{mn} + e_m$
How Does LASSO Work?
• Goal: find $b_0, b_1, \dots, b_n$ that minimize the sum of the squared prediction errors
$e_1^2 + e_2^2 + \dots + e_m^2$
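In ordinary least squares, the quantity minimized is the sum of the squared errors, not the square of their sum, because positive and negative errors would otherwise cancel. The sketch below shows the difference on made-up numbers; the function names and data are hypothetical.

```python
def residuals(b0, coefs, X, y):
    """Return e_i = y_i - (b0 + b1*x_i1 + ... + bn*x_in) for every observation."""
    return [
        yi - (b0 + sum(b * x for b, x in zip(coefs, row)))
        for row, yi in zip(X, y)
    ]


def sum_squared_errors(b0, coefs, X, y):
    """Return e_1^2 + e_2^2 + ... + e_m^2."""
    return sum(e * e for e in residuals(b0, coefs, X, y))


# Made-up data: one feature, three observations.
X = [[1.0], [2.0], [3.0]]
y = [3.0, 3.0, 6.0]

errors = residuals(0.0, [2.0], X, y)
print(errors)                                # [1.0, -1.0, 0.0]
print(sum(errors))                           # 0.0, the errors cancel
print(sum_squared_errors(0.0, [2.0], X, y))  # 2.0, the misfit is still visible
```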
How Does LASSO Work?
• Correlated (mutually dependent) features $x_i$ lead to regression coefficients $b_i$ with very large variances
• A tuning parameter λ restricts the size of the regression coefficients; a larger λ corresponds to a smaller bound $c$
$|b_1| + |b_2| + \dots + |b_n| \le c$
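In the standard formulation the constraint is placed on the absolute values of the coefficients, and the intercept $b_0$ is conventionally left unconstrained. A sketch of the constrained problem in the notation of these slides (the summation indices $i$ and $j$ are added here for compactness):

```latex
\min_{b_0, b_1, \dots, b_n} \;
\sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^2
\quad \text{subject to} \quad
\sum_{j=1}^{n} |b_j| \le c
```

The same problem can be written in penalized form, which is where the tuning parameter λ appears:

```latex
\min_{b_0, b_1, \dots, b_n} \;
\sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^2
+ \lambda \sum_{j=1}^{n} |b_j|
```

Each bound $c$ corresponds to some λ ≥ 0; a larger λ plays the role of a smaller $c$, and λ = 0 recovers ordinary least squares.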
How Does LASSO Work?
[Figure: LASSO constraint region, with the axes marked at -c and c]
LASSO and Feature Selection
• A sufficiently large λ drives the less relevant $b_i$ to exactly zero
• LASSO can therefore be used to filter out features that contribute less to the expected result
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$
LASSO and Feature Selection
• For example, with λ = 0.5 the coefficients of $x_1$ and $x_3$ shrink to zero:
$\hat{y} = b_0 + 0 x_1 + b_2 x_2 + 0 x_3 + b_4 x_4$
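One common way to compute LASSO coefficients is coordinate descent with soft-thresholding. The sketch below is a minimal pure-Python version, not the implementation behind these slides. It assumes the objective is scaled as (1/(2m)) times the sum of squared errors plus λ times the sum of |b_j|, and it runs on synthetic data whose "true" coefficients (0.3, 3, 0.2, -2) are invented for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the threshold become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/(2m)) * sum(e_i^2) + lam * sum(|b_j|) by cyclic coordinate descent."""
    m, n = len(X), len(X[0])
    b0, b = 0.0, [0.0] * n
    for _ in range(n_sweeps):
        # The intercept is not penalized: set it to the mean of the current residuals.
        b0 = sum(y[i] - sum(b[j] * X[i][j] for j in range(n)) for i in range(m)) / m
        for j in range(n):
            rho = 0.0    # correlation of feature j with the partial residual
            scale = 0.0  # sum of squares of feature j
            for i in range(m):
                others = sum(b[k] * X[i][k] for k in range(n) if k != j)
                rho += X[i][j] * (y[i] - b0 - others)
                scale += X[i][j] ** 2
            b[j] = soft_threshold(rho / m, lam) / (scale / m)
    return b0, b


# Synthetic design: four mutually orthogonal, mean-zero +/-1 features (assumption).
X = [
    [1, 1, 1, 1], [-1, 1, -1, 1], [1, -1, -1, 1], [-1, -1, 1, 1],
    [1, 1, 1, -1], [-1, 1, -1, -1], [1, -1, -1, -1], [-1, -1, 1, -1],
]
# Invented "true" model: features 1 and 3 contribute only weakly.
y = [0.3 * r[0] + 3.0 * r[1] + 0.2 * r[2] - 2.0 * r[3] for r in X]

b0, b = lasso_coordinate_descent(X, y, lam=0.5)
print([round(v, 6) for v in b])  # [0.0, 2.5, 0.0, -1.5]
```

With λ = 0.5 the two weak coefficients are set to exactly zero, and the two strong ones are shrunk toward zero by 0.5 each.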
LASSO and Feature Selection
• LASSO can be used in bioinformatics to select the genes that may contribute most to the presence of a disease
λ = 0.5
$\hat{y} = b_0 + 0 x_1 + b_2 x_2 + 0 x_3 + b_4 x_4$
$x_i$ is the transcription level of gene $i$
$\hat{y}$ is the predicted presence or absence of disease
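Once the coefficients are fitted, the selected genes are simply those whose coefficient is nonzero. A small sketch, with placeholder gene labels and coefficient values that are hypothetical:

```python
def selected_features(names, coefs, tolerance=0.0):
    """Return the names whose coefficient magnitude exceeds the tolerance."""
    if len(names) != len(coefs):
        raise ValueError("need exactly one coefficient per feature name")
    return [name for name, b in zip(names, coefs) if abs(b) > tolerance]


# Placeholder labels and coefficients (illustration only).
genes = ["gene_1", "gene_2", "gene_3", "gene_4"]
coefs = [0.0, 2.5, 0.0, -1.5]
print(selected_features(genes, coefs))  # ['gene_2', 'gene_4']
```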
LGL Leukemia
• LGL = large granular lymphocytic
• Results from lack of programmed cell death
• No current standard treatment
Statistical Biomarker Discovery
• Other methods of biomarker detection select genes based on biomedical perspectives
• Proposed method uses a purely statistical approach
• Results need to be verified via further biomedical studies
Methods and Results
• Sample of 45 subjects with 10,444 attributes each
• 37 infected / 8 normal
• $y = 0$ for normal, $y = 1$ for infected
• Sample data standardized using z-scores
• Combination of heat map visualization and LASSO
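Z-score standardization rescales each attribute to mean 0 and standard deviation 1, so that the LASSO penalty treats all attributes on the same scale. The sketch below assumes the sample standard deviation (dividing by n - 1); the slides do not say which variant was used, and the data are made up.

```python
import math


def z_scores(values):
    """Return (v - mean) / sd for each value, using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # assumption: n - 1
    sd = math.sqrt(variance)
    if sd == 0.0:
        raise ValueError("constant attribute cannot be standardized")
    return [(v - mean) / sd for v in values]


def standardize_columns(rows):
    """Standardize every attribute (column) of a subjects-by-attributes table."""
    scaled_columns = [z_scores(list(column)) for column in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]


# Made-up table: 3 subjects, 2 attributes on very different scales.
table = [[2.0, 10.0], [4.0, 20.0], [6.0, 30.0]]
print(standardize_columns(table))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```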
Methods and Results
• Leave-one-out cross-validation used to choose the optimal λ, so each testing set contains one sample
• Authors choose the λ that gives the most shrinkage while keeping the mean squared error within one standard error of the minimum
• λ = 0.02868446
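The selection rule described above can be sketched as follows. This is an illustration of leave-one-out splitting and the one-standard-error rule, not the authors' code; the function names and the per-fold errors are hypothetical.

```python
import math


def leave_one_out_splits(n_samples):
    """Yield (training indices, test index); each sample is the test set exactly once."""
    for test in range(n_samples):
        yield [i for i in range(n_samples) if i != test], test


def one_standard_error_choice(cv_errors):
    """Pick the largest lambda whose mean CV error is within one SE of the minimum.

    cv_errors maps each candidate lambda to its list of per-fold squared errors.
    """
    summary = {}
    for lam, errors in cv_errors.items():
        k = len(errors)
        mean = sum(errors) / k
        variance = sum((e - mean) ** 2 for e in errors) / (k - 1)
        summary[lam] = (mean, math.sqrt(variance / k))  # (mean error, standard error)
    best_lam = min(summary, key=lambda lam: summary[lam][0])
    best_mean, best_se = summary[best_lam]
    limit = best_mean + best_se
    # Larger lambda means more shrinkage, so take the largest admissible one.
    return max(lam for lam, (mean, _) in summary.items() if mean <= limit)


print(list(leave_one_out_splits(3)))  # [([1, 2], 0), ([0, 2], 1), ([0, 1], 2)]

# Hypothetical per-fold squared errors for three candidate lambdas.
cv_errors = {
    0.01: [0.10, 0.20, 0.10, 0.20],  # lowest mean error
    0.03: [0.12, 0.22, 0.12, 0.22],  # slightly worse, but within one SE
    0.10: [0.30, 0.40, 0.30, 0.40],  # too much shrinkage
}
print(one_standard_error_choice(cv_errors))  # 0.03
```

With 45 subjects, leave-one-out cross-validation produces 45 folds, each trained on 44 samples and tested on the remaining one.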
Methods and Results
• 21 genes selected by the LASSO method
• "FCGBP", "KIT", "CD34", "NLGN2", "SPINK2", "HIPK1", "SNORA31", "NR4A3", "SNORA27", "CASK", "SNORA4", "ACSM3", "NELL2", "NAGPA", "VPS25", "LYZ", "DUSP2", "GOLGA8A", "PHGDH", "SERF1A", "TNFSF9"
Methods and Results
• Database for Annotation, Visualization and Integrated Discovery (DAVID) tool used to classify genes
• One gene shows potential as LGL leukemia biomarker
Questions