Rank-Based Approach to Optimal Score via Dimension Reduction
Shao-Hsuan Wang
National Taiwan University, Taiwan
Nov 2015
Rank-based measures
Kendall’s $\tau$
Concordance Index
Rank correlation
Widely used in medical statistics, epidemiology, economics, sociology, etc.
Rank-based measures
Regression Model
$Y$: a univariate response
$Z = (Z_1, \dots, Z_p)$: multiple covariates
Rank-based measures
Response: $Y \in \mathbb{R}$
Composite score: $\beta^T Z \in \mathbb{R}$
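Rank-based measures
For a pair of observations $(Y_1, \beta^T Z_1)$ and $(Y_2, \beta^T Z_2)$:
concordant: $Y_1 > Y_2$ and $\beta^T Z_1 > \beta^T Z_2$, or $Y_1 < Y_2$ and $\beta^T Z_1 < \beta^T Z_2$
discordant: $Y_1 > Y_2$ and $\beta^T Z_1 < \beta^T Z_2$, or $Y_1 < Y_2$ and $\beta^T Z_1 > \beta^T Z_2$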
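Rank-based measures
Kendall’s $\tau$: $\tau = P\big((Y_1 - Y_2)(\beta^T Z_1 - \beta^T Z_2) > 0\big) - P\big((Y_1 - Y_2)(\beta^T Z_1 - \beta^T Z_2) < 0\big)$
Rank correlation: $rc = P(Y_1 > Y_2,\ \beta^T Z_1 > \beta^T Z_2)$
Concordance Index: $CI = P(\beta^T Z_1 > \beta^T Z_2 \mid Y_1 > Y_2)$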
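The three measures can be estimated from a sample by counting pairs. The sketch below is only an illustration: the function names `linear_score` and `rank_measures`, the toy data, and the handling of ties (tied pairs count as neither concordant nor discordant) are assumptions, not part of the talk.

```python
from itertools import combinations

def linear_score(beta, Z):
    """Composite score beta^T z for each covariate row z."""
    return [sum(b * z for b, z in zip(beta, row)) for row in Z]

def rank_measures(y, score):
    """Empirical Kendall's tau, rank correlation, and concordance index."""
    n = len(y)
    conc = disc = 0
    for i, j in combinations(range(n), 2):
        s = (y[i] - y[j]) * (score[i] - score[j])
        if s > 0:
            conc += 1      # both differences have the same sign
        elif s < 0:
            disc += 1      # the differences have opposite signs
    n_pairs = n * (n - 1) // 2
    # Unordered pairs with y_i != y_j; each gives one ordered pair with y_i > y_j.
    y_untied = sum(1 for i, j in combinations(range(n), 2) if y[i] != y[j])
    tau = (conc - disc) / n_pairs
    rc = conc / (2 * n_pairs)   # share of ordered pairs with y_i > y_j and s_i > s_j
    ci = conc / y_untied        # same count, conditional on y_i > y_j
    return tau, rc, ci

# Toy data (hypothetical): four observations, two covariates.
y = [1.0, 2.0, 3.0, 4.0]
Z = [[0.1, 0.0], [0.2, 0.2], [0.3, 0.0], [0.5, 0.4]]
beta = [1.0, 1.0]
tau, rc, ci = rank_measures(y, linear_score(beta, Z))
print(tau, rc, ci)
```

In this toy sample five of the six pairs are concordant and one is discordant, so all three measures are driven by the same pair counts and differ only in their normalisation.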
Rank-based measures
Response: $Y \in \mathbb{R}$
Composite score: $\beta^T Z \in \mathbb{R}$
Rank-based measures
A monotonic association between $Y$ and $\beta^T Z$ may not exist!
Motivation
Composite score: $\beta^T Z \rightarrow g(Z)$, where $g$ ranges over measurable functions
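C-max
Response: $Y \in \mathbb{R}$
Composite score: $g(Z) \in \mathbb{R}$
Concordance-index function: $C(g) = P(g(Z_1) > g(Z_2) \mid Y_1 > Y_2)$
C-max: $C_{\max} = \sup_{g \in \mathcal{F}_c} C(g)$
Optimal score: $m(Z)$ such that $C(m) = \sup_{g \in \mathcal{F}_c} C(g)$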
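To see why the score is allowed to be a general function $g(Z)$, it may help to compare two candidate scores on simulated data. The sketch below assumes a hypothetical non-monotonic model $Y = Z^2 + 0.1\,\varepsilon$ with a single standard normal covariate; the model, the seed, and the function name `concordance_index` are illustrative choices, not taken from the talk.

```python
import random

def concordance_index(y, score):
    """Empirical C(g): among ordered pairs with y_i > y_j, the share with g_i > g_j."""
    num = den = 0
    n = len(y)
    for i in range(n):
        for j in range(n):
            if y[i] > y[j]:
                den += 1
                if score[i] > score[j]:
                    num += 1
    return num / den

rng = random.Random(0)          # fixed seed so the run is reproducible
n = 400
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Assumed model: the response depends on z only through z**2.
y = [zi ** 2 + 0.1 * rng.gauss(0.0, 1.0) for zi in z]

c_linear = concordance_index(y, z)                      # g(z) = z
c_square = concordance_index(y, [zi ** 2 for zi in z])  # g(z) = z**2
print(c_linear, c_square)
```

Under this assumed model the linear score should sit near 1/2, the value expected from an uninformative score, while the quadratic score should come much closer to 1.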
Intrinsic model behind Rank-based measures
M1 Distributional assumption: Generalized Regression Model (Han 1987)
M2 Structural assumption: Dimension Reduction (Li 1991, Cook 1991)
Intrinsic model behind Rank-based measures
M1
$Y = D \circ G(m_{d_0}(Z), \varepsilon)$
$D$: a non-degenerate monotonic function on $\mathbb{R}$
$G$: an unspecified bivariate function, strictly increasing in each component when the other is held fixed
Intrinsic model behind Rank-based measures
M2
$Y = D \circ G(m_{d_0}(Z), \varepsilon)$
$m_{d_0}$: a multivariate polynomial of unknown degree $d_0$
Intrinsic model behind Rank-based measures
M2 Dimension Reduction
$m_{d_0}(Z) = m_{d_0 k_0}(B_0^T Z)$
(1) $d_0$ is the smallest degree such that $Y \perp\!\!\!\perp Z \mid m_{d_0}(Z)$
(2) $B_0 = \{\beta_{01}, \dots, \beta_{0k_0}\}$ is a basis of the central subspace (CS)
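Model Flexibility
Linear regression model: $Y = \beta_0^T Z + \varepsilon$
Binary choice model: $Y = I(\beta_0^T Z + \varepsilon > 0)$
Accelerated failure time model: $\log(Y) = \beta_0^T Z + \varepsilon$
Generalized linear regression model (GLM)
Non-monotonic regression model: $Y = (\beta_0^T Z)^2 + \varepsilon$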
Types of covariates
Covariates may all be discrete; they need not be continuous
Covariates whose moments may not exist
Theories
Propositions:
(1) Existence: $m_{d_0}(Z) \in \arg\max_g C(g)$
(2) Uniqueness: $f_{d_0}(Z) \in \arg\max_g C(g) \Rightarrow f_{d_0}(Z) = c_1 m_{d_0}(Z) + c_2$, for a polynomial $f_{d_0}(z)$ of degree $d_0$
(3) Optimality: $g(Z) \in \arg\max_g C(g) \Rightarrow g(Z) = T(m_{d_0}(Z))$ for some monotonic function $T$
Summary
$\beta^T Z$ may not be the best composite score
Model flexibility
Various types of covariates
Optimal score: existence, uniqueness, and optimality
How to estimate
$d_0$: structural degree
$k_0$: structural dimension
$S(B_0)$: the central subspace
$m_{d_0 k_0}(B_0^T Z)$: the optimal score
$C_{\max}$: the C-max
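Estimation Procedure
Step 1 Derive $m_d(Z)$ by maximizing the concordance index function via the generalized single-index form of the polynomial
Tips:
(1) $m_d(Z) = \sum_{r_0 + r_1 + \cdots + r_p = d} c_r \prod_{j=1}^{p} Z_j^{r_j} = \theta^T \tilde{Z}$, where $\tilde{Z}$ collects the monomials of $Z$ and $\theta$ their coefficients
(2) $C_n(m_d(Z)) = C_n(\theta) = \dfrac{\sum_{i=1}^{n} \sum_{j=1}^{n} I(\theta^T \tilde{Z}_i > \theta^T \tilde{Z}_j,\ Y_i > Y_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} I(Y_i > Y_j)}$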
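The single-index form in Step 1 can be sketched in a few lines: expand each observation into its monomials, take a linear combination, and evaluate the empirical concordance index. The names `monomials` and `c_n`, the toy data, and the choice to leave out the constant monomial (it does not change any ranking) are assumptions made for illustration; the maximization over the coefficients is not shown.

```python
from itertools import combinations_with_replacement
from math import prod

def monomials(z, d):
    """Monomials of z with total degree 1..d (the constant term is omitted)."""
    terms = []
    for degree in range(1, d + 1):
        for idx in combinations_with_replacement(range(len(z)), degree):
            terms.append(prod(z[i] for i in idx))
    return terms

def c_n(theta, Z, y, d):
    """Empirical concordance index of the polynomial score theta^T monomials(z)."""
    scores = [sum(t * m for t, m in zip(theta, monomials(z, d))) for z in Z]
    num = den = 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                den += 1
                if scores[i] > scores[j]:
                    num += 1
    return num / den

# Two covariates, degree 2: z1, z2, z1^2, z1*z2, z2^2.
expanded = monomials([2.0, 3.0], 2)

# Toy data (hypothetical): one covariate, response equal to its square.
Z = [[0.0], [1.0], [-2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
c_quadratic = c_n([0.0, 1.0], Z, y, d=2)   # score = z^2
c_linear = c_n([1.0, 0.0], Z, y, d=2)      # score = z
print(expanded, c_quadratic, c_linear)
```

Because the objective is a step function of the coefficients, a gradient-free or smoothed optimizer would be needed to maximize it in practice; that choice is left open here.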
Estimation Procedure
Step 2 Apply the outer gradient approach to obtain $B_k$
Tips:
(1) $m_{d_0}(u) = m_{d_0 k_0}(B_0^T u)$
(2) $S(B_0) = \operatorname{col}\left( \int_{u \in \mathbb{R}^p} \nabla m_{d_0}(u) \, (\nabla m_{d_0}(u))^T \, dW(u) \right)$
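The idea behind Step 2 is that every gradient of a score of the form $m(B^T u)$ lies in the column space of $B$, so averaging outer products of gradients recovers that space. The sketch below illustrates this for a single direction only ($k = 1$), with a hypothetical score, finite-difference gradients, an empirical average standing in for the weighting $W$, and plain power iteration; none of these choices come from the talk.

```python
import random

def gradient(m, u, h=1e-5):
    """Central-difference approximation of the gradient of m at u."""
    grad = []
    for a in range(len(u)):
        up, down = list(u), list(u)
        up[a] += h
        down[a] -= h
        grad.append((m(up) - m(down)) / (2 * h))
    return grad

def average_outer_gradient(m, points):
    """Average of grad m(u) grad m(u)^T over the given points."""
    p = len(points[0])
    M = [[0.0] * p for _ in range(p)]
    for u in points:
        g = gradient(m, u)
        for a in range(p):
            for b in range(p):
                M[a][b] += g[a] * g[b] / len(points)
    return M

def leading_direction(M, iters=200):
    """Unit leading eigenvector of a symmetric matrix by power iteration."""
    p = len(M)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(M[a][c] * v[c] for c in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical score depending on u only through one unit direction b.
b = [1 / 3, 2 / 3, -2 / 3]
def m(u):
    s = sum(bi * ui for bi, ui in zip(b, u))
    return s ** 2 + s

rng = random.Random(1)
points = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
direction = leading_direction(average_outer_gradient(m, points))
cosine = abs(sum(x * y for x, y in zip(direction, b)))
print(direction, cosine)
```

For $k > 1$ one would keep the $k$ leading eigenvectors instead of one, which calls for a full eigendecomposition rather than power iteration.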
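Estimation Procedure
Step 3 Derive the estimator of $m_{dk}(B_k^T Z)$
Tips:
(1) $\tilde{Z}$: the vector of monomials of $B_k^T Z$, with coefficient vector $\theta$
(2) $\hat{\theta} = \arg\max_{\theta} \dfrac{\sum_{i=1}^{n} \sum_{j=1}^{n} I(\theta^T \tilde{Z}_i > \theta^T \tilde{Z}_j,\ Y_i > Y_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} I(Y_i > Y_j)}$
(3) $\hat{m}_{dk}(B_k^T Z) = \hat{\theta}^T \tilde{Z}$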
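Estimation Procedure
Step 4 Adopt the concordance-based generalized BIC to estimate $d_0$, $k_0$, $S(B_0)$, $m_{d_0 k_0}(B_0^T Z)$, and $C_{\max}$
Tips:
(1) $IC(d, k) = n\, C_n\big(\hat{m}_{dk}(\hat{B}_k^T Z)\big) - \dfrac{\log n}{2} \left( \binom{k + d}{d} - 1 \right)$, with $IC(0, k) = 1/2$
(2) $(\hat{d}, \hat{k}) = \arg\max_{d \ge 0,\ 1 \le k \le p - 1} IC(d, k)$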
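The selection rule in Step 4 trades fit against model size. The sketch below assumes the criterion has the form $n\,C_n - \tfrac{\log n}{2}\,(\text{number of non-constant monomials})$, where a polynomial of degree at most $d$ in $k$ variables has $\binom{k+d}{d} - 1$ such monomials; that reading of the slide, the function names, and the fitted values in the example are all assumptions for illustration, not results from the talk.

```python
from math import comb, log

def information_criterion(n, c_n_value, d, k):
    """Assumed form: n * C_n minus (log n / 2) times the number of polynomial terms."""
    n_terms = comb(k + d, d) - 1   # non-constant monomials of degree <= d in k variables
    return n * c_n_value - 0.5 * log(n) * n_terms

def select_model(n, fitted):
    """Pick the (d, k) pair with the largest criterion value.

    `fitted` maps (d, k) to the maximized empirical concordance index.
    """
    return max(fitted, key=lambda dk: information_criterion(n, fitted[dk], dk[0], dk[1]))

# Hypothetical maximized concordance values for four candidate models.
fitted = {(1, 1): 0.700, (2, 1): 0.800, (2, 2): 0.802, (3, 2): 0.803}
best = select_model(500, fitted)
print(best)
```

In this made-up example the larger models improve the concordance index only marginally, so the penalty favours the smaller quadratic single-index model.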
Asymptotic results
Consistent model selection: the parsimonious model among the class of correct models, $(d_0, k_0)$
$\sqrt{n}$-consistency of the estimators of $S(B_0)$ and $m_{d_0 k_0}(B_0^T Z)$
Asymptotic normality of the estimator of $C_{\max}$
Wine Data
• Vinho verde wine: red wine and white wine (from the Minho region of northern Portugal)
• Collected from May 2004 to February 2007
• Red wine: sample size n = 1599; white wine: n = 4898
• Physicochemical and sensory tests
Wine data
Response (Y): preference, from 0 (bad) to 10 (excellent)
11 covariates (Z): fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol
Thank You!