Combining Least Absolute Shrinkage and Selection Operator (LASSO) and Heat Map Visualization for Biomarkers Detection of LGL Leukemia By: David Garcia

Table of Contents

• What is LASSO?

• How does LASSO Work?

• LASSO and Feature Selection

• LGL Leukemia

• Statistical Biomarker Discovery

• Methods and Results

• Questions

What is LASSO?

• LASSO = Least Absolute Shrinkage and Selection Operator

• Developed by Robert Tibshirani in 1996

• LASSO is a method of feature selection

• Estimates regression coefficients bi for each feature xi

• Applies a penalty to the coefficients, controlled by a tuning parameter λ

• Sets coefficients of less relevant features to zero

How Does LASSO Work?

Regression Equation:

ŷ = b0 + b1x1 + b2x2 + … + bnxn

x1, x2, ..., xn are the variables/features

ŷ is the predicted outcome


For m samples, each observed outcome yj equals the prediction for that sample plus an error term ej (xji is the value of feature i in sample j):

y1 = b0 + b1x11 + b2x12 + … + bnx1n + e1

y2 = b0 + b1x21 + b2x22 + … + bnx2n + e2

⋮

ym = b0 + b1xm1 + b2xm2 + … + bnxmn + em
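The m equations with error terms can be written more compactly. This is only a notational sketch: the sample index j is introduced here for illustration and does not appear on the slides.

```latex
y_j = b_0 + \sum_{i=1}^{n} b_i x_{ji} + e_j, \qquad j = 1, \dots, m
```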

• GOAL:

find b0, b1, …, bn that minimize the sum of the squared prediction errors

e1² + e2² + … + em²
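As a rough illustration of this goal, the sketch below computes the usual least-squares objective, the sum of squared errors e1² + … + em², for two candidate sets of coefficients. The function names and the toy data are made up for illustration and are not taken from the study.

```python
def predict(b0, b, x):
    """Return y-hat = b0 + b1*x1 + ... + bn*xn for one sample x."""
    return b0 + sum(bi * xi for bi, xi in zip(b, x))


def sum_squared_errors(b0, b, X, y):
    """Sum of e_j**2, where e_j = y_j - y-hat_j is the error on sample j."""
    return sum((yj - predict(b0, b, xj)) ** 2 for xj, yj in zip(X, y))


# Toy data (hypothetical): 3 samples, 2 features each.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 4.0, 8.0]

# Two candidate coefficient sets; the one with the smaller value fits better.
good = sum_squared_errors(0.0, [2.0, 1.5], X, y)
bad = sum_squared_errors(0.0, [1.0, 1.0], X, y)
```

Regression chooses the coefficients for which this quantity is as small as possible; here the first candidate is clearly the better fit.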


• Strongly correlated (mutually dependent) features (xi) lead to regression coefficients (bi) with very large variances

• Tuning parameter λ used to restrict the size of the regression coefficients

|b1| + |b2| + … + |bn| ≤ c (the intercept b0 is typically left unrestricted)


[Figure: region allowed by the constraint, with axis ticks at -c and c]
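For reference, the restricted problem is usually written in one of two equivalent forms. What follows is the standard textbook formulation rather than something taken from the slides; the sample index j is notation introduced here, and the intercept b_0 is left unpenalized, as is conventional.

```latex
\min_{b_0, b_1, \dots, b_n} \; \sum_{j=1}^{m} \Bigl( y_j - b_0 - \sum_{i=1}^{n} b_i x_{ji} \Bigr)^2
\quad \text{subject to} \quad \sum_{i=1}^{n} |b_i| \le c

\min_{b_0, b_1, \dots, b_n} \; \sum_{j=1}^{m} \Bigl( y_j - b_0 - \sum_{i=1}^{n} b_i x_{ji} \Bigr)^2 + \lambda \sum_{i=1}^{n} |b_i|
```

Each bound c corresponds to some λ ≥ 0: a tighter bound (smaller c) corresponds to a larger λ and therefore to more shrinkage.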

LASSO and Feature Selection

• Use of λ drives less relevant bi to zero

• LASSO can be used to filter features that contribute less to the expected result

ŷ = b0 + b1x1 + b2x2 + b3x3 + b4x4


For example, with λ = 0.5:

ŷ = b0 + 0x1 + b2x2 + 0x3 + b4x4


• LASSO can be used in bioinformatics to select genes that may contribute more to the presence of disease

xi is the transcription level of gene i

ŷ is the presence or absence of disease
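The way the penalty drives coefficients to exactly zero can be sketched with a small coordinate-descent solver. This is a minimal illustration, not the authors' implementation: it assumes the objective (1/(2m)) · (sum of squared errors) + λ · (|b1| + … + |bn|), the function names are hypothetical, and the data are synthetic, built so that features 2 and 4 matter much more than features 1 and 3.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimize (1/(2m)) * SSE + lam * sum(|b_i|) by cyclic coordinate descent."""
    m, n = len(X), len(X[0])
    b0, b = 0.0, [0.0] * n
    for _ in range(n_iter):
        # The intercept is not penalized: set it to the mean residual.
        b0 = sum(y[j] - sum(b[i] * X[j][i] for i in range(n)) for j in range(m)) / m
        for i in range(n):
            rho, z = 0.0, 0.0
            for j in range(m):
                # Residual for sample j with feature i left out of the prediction.
                partial = y[j] - b0 - sum(b[k] * X[j][k] for k in range(n) if k != i)
                rho += X[j][i] * partial
                z += X[j][i] ** 2
            b[i] = soft_threshold(rho / m, lam) / (z / m) if z > 0 else 0.0
    return b0, b


# Synthetic data (hypothetical): 8 samples, 4 mutually uncorrelated features.
X = [
    [1, 1, 1, 1], [1, 1, -1, 1], [1, -1, 1, -1], [1, -1, -1, -1],
    [-1, 1, 1, -1], [-1, 1, -1, -1], [-1, -1, 1, 1], [-1, -1, -1, 1],
]
# Outcome depends strongly on features 2 and 4, weakly on 1, not at all on 3.
y = [1 + 0.1 * r[0] + 2.0 * r[1] - 1.5 * r[3] for r in X]

b0, b = lasso_coordinate_descent(X, y, lam=0.5)
selected = [i + 1 for i, bi in enumerate(b) if bi != 0.0]  # 1-based feature indices
```

With these data the weak and irrelevant features end up with coefficients of exactly zero, so only features 2 and 4 remain in `selected`. Note that the numerical scale of λ depends on how the objective is normalized, so a value such as 0.5 is not directly comparable across software packages.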

LGL Leukemia

• LGL = large granular lymphocytic

• Results from lack of programmed cell death

• No current standard treatment

Statistical Biomarker Discovery

• Other methods of biomarker detection select genes based on biomedical perspectives

• Proposed method uses a purely statistical approach

• Results need to be verified via further biomedical studies

Methods and Results

• Sample of 45 subjects with 10,444 attributes each

• 37 infected / 8 normal

• y = 0 for normal, y = 1 for infected

• Sample data standardized using z-scores

• Heat map visualization combined with LASSO
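The z-score standardization step can be sketched as follows. The slides do not say whether the sample or population standard deviation was used; this sketch assumes the sample standard deviation, and the function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant feature carries no information
    return [(v - mu) / sigma for v in values]


def standardize_columns(X):
    """Standardize each attribute (column) across all subjects (rows)."""
    scaled_columns = [z_scores(list(col)) for col in zip(*X)]
    return [list(row) for row in zip(*scaled_columns)]


# Toy matrix (hypothetical): 3 subjects, 2 attributes on very different scales.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
Z = standardize_columns(X)
```

Standardizing puts all attributes on a common scale, which matters for LASSO because the penalty treats every coefficient alike.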

• Each testing set contains one sample

• Leave-one-out cross-validation used to choose the optimal λ

• Authors choose the λ that gives the most shrinkage while keeping the mean squared error within one standard error of the minimum

• λ = 0.02868446
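The selection rule described above, often called the one-standard-error rule, can be sketched as follows. The sketch assumes the per-fold squared errors have already been computed for each candidate λ (with leave-one-out cross-validation there is one fold per subject). The function name and the error values are made up for illustration and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def one_standard_error_rule(cv_errors):
    """cv_errors maps each candidate lambda to its list of per-fold squared errors.

    Returns (lambda with the minimum mean error,
             largest lambda whose mean error is within one SE of that minimum).
    """
    summary = {}
    for lam, errs in cv_errors.items():
        mse = mean(errs)
        se = stdev(errs) / sqrt(len(errs))  # standard error of the mean
        summary[lam] = (mse, se)

    lam_min = min(summary, key=lambda lam: summary[lam][0])
    threshold = summary[lam_min][0] + summary[lam_min][1]

    # A larger lambda means more shrinkage, so take the largest eligible one.
    eligible = [lam for lam, (mse, _) in summary.items() if mse <= threshold]
    return lam_min, max(eligible)


# Hypothetical per-fold squared errors for three candidate values of lambda.
cv_errors = {
    0.01: [0.10, 0.30, 0.20, 0.20],
    0.03: [0.10, 0.20, 0.30, 0.28],
    0.10: [0.50, 0.40, 0.60, 0.50],
}
lam_min, lam_1se = one_standard_error_rule(cv_errors)
```

In this toy example the smallest mean error occurs at λ = 0.01, but λ = 0.03 is statistically indistinguishable from it and shrinks more, so the rule prefers 0.03.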

• 21 genes selected by the LASSO method

• "FCGBP", "KIT", "CD34", "NLGN2", "SPINK2", "HIPK1", "SNORA31", "NR4A3", "SNORA27", "CASK", "SNORA4", "ACSM3", "NELL2", "NAGPA", "VPS25", "LYZ", "DUSP2", "GOLGA8A", "PHGDH", "SERF1A", "TNFSF9"

• Database for Annotation, Visualization and Integrated Discovery (DAVID) tool used to classify genes

• One gene shows potential as LGL leukemia biomarker

Questions
