
DATA ANALYSIS

Module Code: CA660

Lecture Block 6: Alternative estimation methods and their implementation


MAXIMUM LIKELIHOOD ESTIMATION

• Recall general points: estimation, and the definition of the Likelihood function L(θ | x) for a vector of parameters θ and a set of observed values x. The MLE is the value of θ that maximises the Likelihood function.

Also defined: the Log-likelihood (Support function S(θ)) and its derivative, the Score, together with the Information content per observation, which for a single-parameter likelihood is given by

\[ I(\theta) \;=\; E\!\left[\left(\frac{d\,\log L(x\mid\theta)}{d\theta}\right)^{2}\right] \;=\; -\,E\!\left[\frac{d^{2}\log L(x\mid\theta)}{d\theta^{2}}\right] \]

• Why MLE? (Need to know the underlying distribution.) Properties: consistency; sufficiency; asymptotic efficiency (linked to variance); unique maximum; invariance and, hence, the most convenient parameterisation; usually MVUE; amenable to conventional optimisation methods.
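As a quick numerical illustration of the two equivalent forms of I(θ): a minimal Python sketch (this example is not from the notes; it assumes a single Bernoulli(θ) observation, for which both expressions reduce to 1/(θ(1-θ))).

```python
# Minimal sketch (illustrative, not from the notes): check the two forms of the
# Information for a Bernoulli(theta) observation, where
#   log L = x*log(theta) + (1-x)*log(1-theta)
#   score = x/theta - (1-x)/(1-theta)
#   d2 log L / dtheta2 = -x/theta**2 - (1-x)/(1-theta)**2
# Both expectations should equal 1 / (theta*(1-theta)).
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
x = rng.binomial(1, theta, size=200_000)        # simulated 0/1 observations

score = x / theta - (1 - x) / (1 - theta)
second_deriv = -x / theta**2 - (1 - x) / (1 - theta)**2

print("E[score^2]          ~", np.mean(score**2))
print("-E[d2 logL/dtheta2] ~", -np.mean(second_deriv))
print("analytic 1/(theta(1-theta)) =", 1 / (theta * (1 - theta)))
```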


VARIANCE, BIAS & CONFIDENCE

• Variance of an Estimator - usual form, or, for the mean of k independent estimates,

\[ \hat{\sigma}^{2}_{\hat{\theta}} \;=\; \frac{1}{k^{2}}\sum_{i=1}^{k}\hat{\sigma}^{2}_{\hat{\theta}_{i}} \]

• For a large sample, the variance of the MLE can be approximated by

\[ \hat{\sigma}^{2}_{\hat{\theta}} \;\approx\; \frac{1}{n\,I(\theta)} \]

and it can also be estimated empirically, using re-sampling* techniques.

• Variance of a linear function (of several estimates) - a common need in genomics analysis (e.g. heritability) and in risk analysis.

• Recall the Bias of the estimator, \( E(\hat{\theta})-\theta \); the Mean Square Error is then defined to be \( MSE(\hat{\theta}) = E[(\hat{\theta}-\theta)^{2}] \), which expands to

\[ E\bigl\{[\hat{\theta}-E(\hat{\theta})]+[E(\hat{\theta})-\theta]\bigr\}^{2} \;=\; \sigma^{2}_{\hat{\theta}} + [E(\hat{\theta})-\theta]^{2} \]

i.e. variance + (bias)^2, so we have the basis for C.I.s and tests of hypothesis.
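The large-sample approximation and the re-sampling route can be compared directly. A minimal sketch (my own illustration, not from the notes), assuming a binomial 0/1 sample, for which 1/(nI(θ)) = θ(1-θ)/n:

```python
# Sketch: variance of the MLE of a binomial proportion, two ways --
# (1) the large-sample formula 1/(n*I(theta_hat)) = theta_hat*(1-theta_hat)/n,
# (2) an empirical bootstrap (re-sampling) estimate.
import numpy as np

rng = np.random.default_rng(1)
n, theta_true = 200, 0.3
x = rng.binomial(1, theta_true, size=n)           # raw 0/1 data
theta_hat = x.mean()                              # MLE = x/n

var_asymptotic = theta_hat * (1 - theta_hat) / n  # 1 / (n * I(theta_hat))

B = 5000                                          # bootstrap replicates
boot = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(B)])
var_bootstrap = boot.var(ddof=1)

print("asymptotic variance:", var_asymptotic)
print("bootstrap variance :", var_bootstrap)
```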


COMMONLY-USED METHODS of obtaining MLE

• Analytical - solving \( dL/d\theta = 0 \) or \( dS/d\theta = 0 \), when simple solutions exist
• Grid search or likelihood-profile approach
• Newton-Raphson iteration methods
• EM (Expectation-Maximisation) algorithm

N.B. Work with the Log-likelihood, because it is maximised at the same value of θ as the Likelihood, it is easier to compute, and there is a close relationship between the statistical properties of the MLE and the Log-likelihood.


MLE Methods in outline

Analytical: recall the Binomial example from earlier, where setting the Score to zero gives the MLE directly:

\[ \text{Score} \;=\; \frac{dS}{d\theta} \;=\; \frac{x}{\theta} - \frac{n-x}{1-\theta} \;=\; 0 \;\;\Rightarrow\;\; \hat{\theta} = \frac{x}{n} \]

• Example: for the Normal, the MLEs of the mean and variance (taking derivatives w.r.t. the mean and the variance separately) are the sample mean and the "actual" variance (i.e. dividing by N); the latter is unbiased if the mean is known, biased if not.

• Invariance: one-to-one relationships are preserved.
• Used when the MLE has a simple solution.
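A quick check of the analytical result against a numerical optimiser (my own sketch; x = 7 successes in n = 20 trials is an arbitrary illustrative choice):

```python
# Sketch: the analytical binomial MLE x/n versus direct numerical maximisation
# of the support (log-likelihood) S(theta) = x*log(theta) + (n-x)*log(1-theta).
import numpy as np
from scipy.optimize import minimize_scalar

x, n = 7, 20                                     # illustrative counts

def neg_support(theta):
    return -(x * np.log(theta) + (n - x) * np.log(1 - theta))

res = minimize_scalar(neg_support, bounds=(1e-6, 1 - 1e-6), method="bounded")

print("analytical MLE x/n:", x / n)              # 0.35
print("numerical  MLE    :", res.x)              # ~0.35
```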


MLE Methods in outline contd.

Grid Search - Computational
Plot the likelihood or log-likelihood vs the parameter. Various features:
• Relative Likelihood = Likelihood / Max. Likelihood (so the ML point is set = 1). The peak of the R.L. can be identified visually or sought algorithmically, e.g. for the support

\[ S(\theta) \;=\; \log\bigl[\theta^{20}(1-\theta)^{80} + \theta^{80}(1-\theta)^{20}\bigr] \]

plotting the likelihood over the parameter space 0 ≤ θ ≤ 1 gives 2 peaks, symmetrical around θ = 0.5 (the likelihood profile for e.g. the well-known mixed linkage-phase analysis problem, or for the similar example of populations following known proportion splits).

If we now constrain 0 ≤ θ ≤ 0.5 (e.g. θ = R.F. between genes, with possible mixed linkage phase), the MLE solution is unique, θ̂ = 0.2.
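A minimal grid-search sketch over this two-peak support (assuming the reconstructed S(θ) above; the grid spacing of 0.001 is an arbitrary choice):

```python
# Sketch: grid search for the MLE of the two-peak support
#   S(theta) = log[ theta^20*(1-theta)^80 + theta^80*(1-theta)^20 ].
# Unconstrained, two symmetric peaks appear near 0.2 and 0.8;
# constraining 0 <= theta <= 0.5 leaves a unique maximum near 0.2.
import numpy as np

def support(theta):
    return np.log(theta**20 * (1 - theta)**80 + theta**80 * (1 - theta)**20)

grid = np.linspace(0.001, 0.999, 999)
S = support(grid)

print("unconstrained argmax:", grid[np.argmax(S)])                  # ~0.2 or ~0.8
constrained = grid[grid <= 0.5]
print("constrained  argmax :", constrained[np.argmax(support(constrained))])  # ~0.2
```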


MLE Methods in outline contd.

• Graphic/numerical implementation - start from an initial estimate of θ. The direction of search is determined by evaluating the likelihood on both sides of θ.

The search moves in the direction giving an increase, because we are looking for a maximum. Initial search increments are large, e.g. 0.1; when the likelihood change starts to decrease or becomes negative, stop and refine the increment (see the sketch below).
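A sketch of that coarse-to-fine search, reusing the two-peak support from the previous slide (the 0.1 starting increment follows the text; the stopping tolerance and starting value are my own choices):

```python
# Sketch: coarse-to-fine hill search. Step in the direction of increasing
# support; when no improving step exists, shrink the increment and repeat.
import numpy as np

def support(theta):
    return np.log(theta**20 * (1 - theta)**80 + theta**80 * (1 - theta)**20)

theta, step = 0.05, 0.1          # initial estimate and coarse increment
while step > 1e-6:
    moved = False
    for cand in (theta + step, theta - step):
        if 0 < cand < 0.5 and support(cand) > support(theta):   # constrained search
            theta, moved = cand, True
            break
    if not moved:
        step /= 10               # refine the increment
print("MLE by coarse-to-fine search:", round(theta, 4))          # ~0.2
```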

Issues:
• Multiple peaks - can miss the global maximum; computationally intensive; see e.g. http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html
• Multiple parameters - grid search; interpretation of likelihood profiles can be difficult, e.g. http://blogs.sas.com/content/iml/2011/10/12/maximum-likelihood-estimation-in-sasiml/


Example in outline

• Data e.g. used to show a linkage relationship (non-independence) between e.g. a marker and a given disease gene, or (e.g.) between sex and purchase of computer games.

Escapes = individuals who are susceptible but show no disease phenotype under the experimental conditions (express interest but have no purchase record). So define β and θ as the proportion of escapes and the R.F. respectively.

1 - β is the penetrance for the disease trait, or of purchasing, i.e. P{an individual with the susceptible genotype has the disease phenotype}, P{an individual of the given sex who is interested actually buys}.

The purpose of the experiment is typically to estimate the R.F. θ between marker and gene, or the proportion of a sex that purchases.

• Use: Support function = Log-likelihood. Often quite complex; for the above example we might have a support of the form

\[ S(\theta,\beta) \;=\; \sum_{i=1}^{4} k_{i}\,\ln p_{i}(\theta,\beta) \]

with the k_i the observed class counts and the p_i their expected proportions, each involving both θ and β.


Example contd.

• Set the 1st derivatives (Scores) w.r.t. θ and w.r.t. β to zero.

• The expected value of the Score (w.r.t. θ) is zero (see the analogies in classical sampling/hypothesis testing); similarly for β. Here, however, there is no simple analytical solution, so we cannot solve directly for either parameter.

• Using a grid search, the likelihood reaches its maximum at approximately θ̂ = 0.20, β̂ = 0.22.

• In general, this type of experiment tests H0: independence between the factors (marker and gene), (sex and purchase), i.e. θ = 0.5,

• and H0: no escapes, i.e. β = 0. It uses Likelihood Ratio Test statistics (the MLE equivalent of the χ² test).


MLE Methods in outline contd.

Newton-Raphson Iteration
We have Score(θ) = 0 from previously. N-R consists of replacing the Score by the linear terms of its Taylor expansion: if θ'' is a solution and θ' the first guess,

\[ \frac{dS(\theta)}{d\theta} \;\approx\; \frac{dS(\theta')}{d\theta} + (\theta''-\theta')\,\frac{d^{2}S(\theta')}{d\theta^{2}} \;=\; 0 \]

so that

\[ \theta'' \;=\; \theta' - \frac{dS(\theta')/d\theta}{d^{2}S(\theta')/d\theta^{2}} \]

Repeat with θ'' replacing θ'. Each iteration fits a parabola to the Likelihood function.

• Problems - multiple peaks, zero Information, extreme estimates.
• Multiple parameters - need matrix notation, where the Score vector S has elements equal to the derivatives of S(θ, β) w.r.t. θ and β respectively. Similarly, the Information matrix has terms of the form

\[ -E\!\left[\frac{\partial^{2}S(\theta,\beta)}{\partial\theta^{2}}\right],\; -E\!\left[\frac{\partial^{2}S(\theta,\beta)}{\partial\theta\,\partial\beta}\right],\;\text{etc.} \]

Estimates are updated as

\[ \hat{\boldsymbol{\theta}}_{N+1} \;=\; \hat{\boldsymbol{\theta}}_{N} + \mathbf{I}^{-1}(\boldsymbol{\theta})\,\mathbf{S}(\boldsymbol{\theta}) \]

[Figure: likelihood function (L.F.) with successive parabola fits at the 1st and 2nd guesses; the curvature of the Log-likelihood S(θ) relates to the variance.]
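A one-parameter Newton-Raphson sketch (my own illustration, reusing the binomial support S(θ) = x ln θ + (n - x) ln(1 - θ), for which the answer x/n is known):

```python
# Sketch: Newton-Raphson for the binomial MLE, iterating
#   theta'' = theta' - S'(theta') / S''(theta')
# where S is the support (log-likelihood), S' the Score, S'' its derivative.
x, n = 7, 20
theta = 0.1                                                # first guess

for i in range(20):
    score = x / theta - (n - x) / (1 - theta)              # S'(theta)
    d2 = -x / theta**2 - (n - x) / (1 - theta)**2          # S''(theta)
    step = -score / d2
    theta += step
    if abs(step) < 1e-10:                                  # converged
        break

print("Newton-Raphson MLE:", theta, " analytic x/n:", x / n)
```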


MLE Methods in outline contd.

Expectation-Maximisation (EM) Algorithm - iterative; for incomplete data.
(Much genomic, financial and other data fit this situation, e.g. linkage analysis with marker genotypes of F2 progeny: usually 9 categories are observed for the 2-locus, 2-allele model, but 16 would give complete information, while 14 give information on linkage. Some categories are hidden, but if the linkage parameter were known, the expected frequencies could be predicted and the complete data restored using expectation.)

• Steps: (1) Expectation - estimate the statistics of the complete data, given the observed incomplete data.
• (2) Maximisation - use the estimated complete data to give the MLE.
• Iterate until convergence (no further change).


E-M contd.

Implementation

• An initial guess, θ', is chosen (e.g. θ' = 0.25, say, for the R.F.).
• Taking this as "true", the complete data are estimated via distributional statements, e.g. P(individual is recombinant, given the observed genotype) for R.F. estimation.
• The MLE estimate θ'' is computed; for the R.F. this is the sum of recombinants / N. Thus the MLE, for observed counts f_i, is

\[ \hat{\theta} \;=\; \frac{1}{N}\sum_{i} f_{i}\,P(R \mid G_{i}) \]

• Convergence: θ'' = θ', or |θ'' - θ'| < tolerance (e.g. 0.00001).
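As a concrete EM sketch (not the notes' dataset: these are the classic genetic-linkage counts used as an illustration by Dempster, Laird and Rubin (1977), where the first of four observed phenotype classes is a mixture of two latent classes and p plays the role of the linkage parameter):

```python
# Sketch: EM for the classic 4-class linkage example. Observed counts y have
# cell probabilities (1/2 + p/4, (1-p)/4, (1-p)/4, p/4); the first cell is a
# mixture of a "1/2" part and a "p/4" part, so its split is the missing data.
y = [125, 18, 20, 34]
p = 0.25                                          # initial guess

for _ in range(100):
    # E-step: expected count of the latent "p/4" part of cell 1
    x2 = y[0] * (p / 4) / (0.5 + p / 4)
    # M-step: binomial-type MLE using the completed counts
    p_new = (x2 + y[3]) / (x2 + y[1] + y[2] + y[3])
    if abs(p_new - p) < 1e-8:                     # convergence check
        p = p_new
        break
    p = p_new

print("EM estimate of the linkage parameter:", round(p, 4))   # ~0.6268
```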


LIKELIHOOD: C.I. and H.T.

• Likelihood Ratio Tests - c.f. with χ².
• The principal advantage of G is power when unknown parameters are involved in the hypothesis test.

We have: the likelihood of θ taking the value θ_A which maximises it (i.e. its MLE), and the likelihood under H0: θ = θ_N (e.g. θ_N = 0.5).

• Form of the L.R. test statistic:

\[ G \;=\; -2\,\log\!\left[\frac{L(\theta_{N}\mid x)}{L(\theta_{A}\mid x)}\right] \qquad\text{or, conventionally,}\qquad G \;=\; 2\,\log\!\left[\frac{L(\theta_{A}\mid x)}{L(\theta_{N}\mid x)}\right] \]

- choose whichever is easier to interpret.
• Distribution of G ~ approx. χ² (d.o.f. = difference in dimension of the parameter spaces for L(θ_A), L(θ_N)).
• Goodness of fit: with notation as for χ², G ~ χ²_{n-1}:

\[ G \;=\; 2\sum_{i=1}^{n} O_{i}\,\log\!\left(\frac{O_{i}}{E_{i}}\right) \]

• Independence: notation again as for χ²:

\[ G \;=\; 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij}\,\log\!\left(\frac{O_{ij}}{E_{ij}}\right) \]
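A goodness-of-fit sketch (the observed counts are my own illustrative numbers, testing a 9:3:3:1 Mendelian ratio, with the Pearson χ² comparison alongside):

```python
# Sketch: likelihood-ratio (G) goodness-of-fit test versus Pearson chi-square,
# for observed counts against expected 9:3:3:1 proportions.
import numpy as np
from scipy.stats import chi2

O = np.array([102, 26, 35, 13])                    # illustrative observed counts
p0 = np.array([9, 3, 3, 1]) / 16.0                 # H0 proportions
E = O.sum() * p0                                   # expected counts under H0

G = 2 * np.sum(O * np.log(O / E))                  # G = 2 * sum O*log(O/E)
X2 = np.sum((O - E)**2 / E)                        # Pearson chi-square
df = len(O) - 1

print("G  =", round(G, 3), " p =", round(chi2.sf(G, df), 3))
print("X2 =", round(X2, 3), " p =", round(chi2.sf(X2, df), 3))
```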


Likelihood C.I.'s - graphical method

• Example: consider the likelihood function

\[ L(\theta) \;=\; \theta^{\,b}(1-\theta)^{a} \]

where θ is the unknown parameter and a, b are observed counts.
• For 4 observed data sets, A: (a, b) = (8, 2), B: (a, b) = (16, 4), C: (a, b) = (80, 20), D: (a, b) = (400, 100).

• Likelihood estimates can be plotted against possible parameter values, with the MLE at the peak value, e.g. MLE θ̂ = 0.2, with Lmax = 0.0067 for A and Lmax = 0.0045 for B, etc.

Set A: Log Lmax - Log L = Log(0.0067) - Log(0.00091) = 2 gives the 95% C.I., so θ = (0.035, 0.496), corresponding to L = 0.00091, is the 95% C.I. for A.

Similarly, manipulating this expression, the likelihood value corresponding to the 95% confidence interval is given as L = (7.389)^(-1) Lmax.

Note: usually the Log-likelihood is plotted against the parameter, rather than the Likelihood. As the sample size increases, the C.I. becomes narrower and more symmetric.
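A sketch for data set A, assuming the likelihood L(θ) = θ^b (1-θ)^a as reconstructed above and the 2-unit log-likelihood drop used on the slide:

```python
# Sketch: likelihood-based interval for data set A, (a, b) = (8, 2).
# Find where log L falls 2 units below its maximum, i.e. L = Lmax / e^2.
import numpy as np
from scipy.optimize import brentq

a, b = 8, 2

def logL(theta):
    return b * np.log(theta) + a * np.log(1 - theta)

theta_hat = b / (a + b)                              # MLE = 0.2
target = logL(theta_hat) - 2.0                       # 2-unit drop

lower = brentq(lambda t: logL(t) - target, 1e-6, theta_hat)
upper = brentq(lambda t: logL(t) - target, theta_hat, 1 - 1e-6)

print("MLE:", theta_hat, " Lmax:", round(np.exp(logL(theta_hat)), 4))
print("interval: (", round(lower, 3), ",", round(upper, 3), ")")   # ~ (0.035, 0.5)
```

Note that the 2-unit drop corresponds to L = Lmax / e^2 = Lmax / 7.389, as on the slide; a drop of 1.92 (half of χ²₁,0.95 = 3.84) gives the usual asymptotic 95% interval, so 2 is a slightly conservative round figure.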


Maximum Likelihood Benefits

• Strong estimator properties - sufficiency, efficiency, consistency, non-bias etc., as before.

• Good confidence intervals: coverage probability realised and intervals meaningful.
• MLE a good estimator of θ:
  MSE consistent: \( \lim_{n\to\infty} E(\hat{\theta}-\theta)^{2} = 0 \)
  Absence of bias, \( E(\hat{\theta}) = \theta \), does not "stand alone" - minimum variance is also important.
  Asymptotically Normal: \( (\hat{\theta}-\theta)/\sigma_{\hat{\theta}} \sim N(0,1) \) as \( n \to \infty \).
  Precise for large samples, so inferences are valid and ranges realistic.
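A small simulation of the asymptotic normality claim (my own sketch, again using the binomial MLE, with σ_θ̂ taken from the 1/(nI(θ)) formula):

```python
# Sketch: standardised binomial MLEs, (theta_hat - theta) / sigma, should
# look approximately N(0,1) for large n.
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.3, 500, 20_000

x = rng.binomial(n, theta, size=reps)
theta_hat = x / n
z = (theta_hat - theta) / np.sqrt(theta * (1 - theta) / n)   # standardised MLEs

print("mean of z:", round(z.mean(), 3), " (~ 0)")
print("var  of z:", round(z.var(), 3), " (~ 1)")
print("P(|z| < 1.96):", round(np.mean(np.abs(z) < 1.96), 3), " (~ 0.95)")
```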