1. Solving a classification problem first may be wasteful
2. Need to address class distribution drift in test sets
Quantification Performance Measures
1. Capture quantification goals directly, OR
2. Balance quantification and classification goals (hybrid)
3. Challenging to optimize on voluminous, streaming data
1. Receive a data point
2. Fix dual variables, take SGD step to update model
3. Fix model, take SGD steps to update dual variables
4. Updates extremely cheap: closed form for dual variables
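The alternating pattern in the steps above can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm: it assumes a linear model, a hinge-style surrogate reward, and the harmonic mean of TPR and TNR as the concave measure. For a differentiable concave measure the minimizing dual variables at a point equal its gradient there, which is what gives the closed-form dual step used here. The names `hmean_dual` and `online_primal_dual`, the step size, and the smoothing pseudo-counts are all hypothetical.

```python
import math
import random

def hmean_dual(tpr, tnr, eps=1e-12):
    """Closed-form dual variables for Psi(P, N) = 2PN / (P + N):
    the gradient of Psi evaluated at the current (tpr, tnr)."""
    s = tpr + tnr + eps
    return 2.0 * tnr * tnr / (s * s), 2.0 * tpr * tpr / (s * s)

def online_primal_dual(stream, dim, lr=0.5):
    """One pass over (x, y) pairs with y in {+1, -1}; returns (w, (alpha, beta))."""
    w = [0.0] * dim
    # Running surrogate estimates of TPR / TNR. Each class starts with one
    # pseudo-observation of reward 0.5 (an assumption, to avoid 0/0 early on).
    reward_sum = {+1: 0.5, -1: 0.5}
    count = {+1: 1, -1: 1}
    alpha, beta = hmean_dual(0.5, 0.5)
    for t, (x, y) in enumerate(stream, start=1):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Dual variables fixed: one stochastic ascent step on the model,
        # weighting the surrogate reward by alpha (positives) or beta (negatives).
        if margin < 1.0:
            step = lr / math.sqrt(t) * (alpha if y > 0 else beta)
            w = [wi + step * y * xi for wi, xi in zip(w, x)]
        # Model fixed: refresh the running estimates and the dual variables.
        reward_sum[y] += min(1.0, max(0.0, margin))
        count[y] += 1
        tpr = reward_sum[+1] / count[+1]
        tnr = reward_sum[-1] / count[-1]
        alpha, beta = hmean_dual(tpr, tnr)
    return w, (alpha, beta)
```

Each data point costs one inner product and one vector update, plus a constant-time dual refresh, which is the sense in which the updates are cheap.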
Goal: Estimate the relative prevalence of classes of interest in large unlabeled populations in online, streaming settings
Applications of Quantification
Sentiment Analysis
KatyCipriano (2 hours ago): The best part of the meal is the dessert, which they don’t make themselves – just sayin. @bouzagloabc
JuliaChild (1 hour ago): Loved the food – worth the 45 minute wait! Can’t wait for my Sunday brunch at ABC. @bouzagloabc
GordonRamsay (3 days ago): It was RAAAAW. @bouzagloabc
PaulaDeen (2 days ago): @GordonRamsay Samy the owner threw me out just for pointing that out! Disastrous service
Several applications directly require estimates of class ratios
a.k.a. Counting, Class probability re-estimation, Class prior estimation
Epidemiology
Challenges
Online Optimization Methods for the Quantification Problem
Purushottam Kar¹, Shuai Li², Harikrishna Narasimhan³, Sanjay Chawla⁴, Fabrizio Sebastiani⁴
¹IIT Kanpur, India, ²University of Insubria, Italy, ³Harvard University, USA, ⁴QCRI-HBKU, Qatar
Full Paper: http://tinyurl.com/quantonline
22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Quantification Performance Measures
† Quantification Performance Measure, ‡ Hybrid Performance Measure
Measure families: Nested Concave Measures, Pseudo-concave Measures
NegKLD†
QMeasure‡
BAKLD‡
CQReward‡
BKReward‡
Normalized Square Score†
Nested Concave Measures
1. Dual computation of nested functions is difficult and makes updates costly
2. Solution: apply duality to nested functions in a nested manner!
Key Idea
1. Use the level set function as a proxy objective function
2. Exploit the fact that the level set functions are concave
Key Idea
Fenchel Duality
Level Set Structure
Level sets are convex
[Equation annotations: any ccv function, Fenchel “dual”, dual variables]
Linear in TPR and TNR for fixed values of dual variables!
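The duality step can be written out as follows. This is a sketch in notation chosen here, not necessarily the poster's: $\Psi$ is any closed concave function of two arguments, $\Psi^*$ its concave Fenchel conjugate, $(\alpha, \beta)$ the dual variables, and $w$ the model. The last line shows the nested case, assuming the outer function $\Psi$ is nondecreasing in each argument and the inner functions $\zeta_1, \zeta_2$ are themselves concave in (TPR, TNR).

```latex
% Concave Fenchel conjugate and the resulting variational form
\Psi^*(\alpha, \beta) = \inf_{u, v} \bigl\{ \alpha u + \beta v - \Psi(u, v) \bigr\},
\qquad
\Psi(u, v) = \inf_{\alpha, \beta} \bigl\{ \alpha u + \beta v - \Psi^*(\alpha, \beta) \bigr\}.

% Plugging in u = TPR(w), v = TNR(w): for fixed (alpha, beta) the bracket
% is linear in TPR and TNR
\Psi\bigl(\mathrm{TPR}(w), \mathrm{TNR}(w)\bigr)
  = \inf_{\alpha, \beta} \bigl\{ \alpha\, \mathrm{TPR}(w) + \beta\, \mathrm{TNR}(w) - \Psi^*(\alpha, \beta) \bigr\}.

% Nested case: dualize the outer function, then each inner function
\Psi\bigl(\zeta_1(w), \zeta_2(w)\bigr)
  = \inf_{\gamma_1, \gamma_2 \ge 0} \Bigl\{ \sum_{i=1}^{2} \gamma_i
      \inf_{\alpha_i, \beta_i} \bigl\{ \alpha_i\, \mathrm{TPR}(w) + \beta_i\, \mathrm{TNR}(w) - \zeta_i^*(\alpha_i, \beta_i) \bigr\}
      - \Psi^*(\gamma_1, \gamma_2) \Bigr\}.
```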
NEMSIS (streaming)
Pseudo-concave Measures
CAN (non-streaming)
Guarantee for NEMSIS, SCAN
1. Execute E and M steps approximately in “streaming epochs”
2. E epochs use streaming data to estimate 
3. M epochs execute NEMSIS on streaming data - optimize proxy
4. Epochs made progressively longer: more accurate E, M steps
SCAN (streaming)
Find new level / Optimize proxy
Progress in proxy provably linked to progress in perf.
[Figure labels: level function; ccv, cvx, ccv; epochs E, M; schedule E M E M E M …]
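The alternation between finding a new level and optimizing a proxy can be illustrated on a toy one-dimensional problem. The sketch below is not the CAN procedure itself: it assumes a ratio objective with a concave numerator and a convex denominator, and takes the level function to be numerator minus level times denominator, which is concave for nonnegative levels. The objective `x / (1 + x^2)`, the interval `[0, 3]`, and all function names are hypothetical choices made so that each proxy maximization has a closed form.

```python
def ratio(x):
    """Toy pseudo-concave measure: concave numerator x over convex denominator 1 + x^2."""
    return x / (1.0 + x * x)

def argmax_level_function(v, lo=0.0, hi=3.0):
    """Maximise the level function x - v * (1 + x^2) over [lo, hi].
    For v > 0 it is concave in x with unconstrained maximiser 1 / (2v)."""
    if v <= 0.0:
        return hi  # the level function is nondecreasing in x when v <= 0
    return min(hi, max(lo, 1.0 / (2.0 * v)))

def alternate(x0=0.5, iters=20):
    """Alternate 'optimize proxy at the current level' and 'find the new level'."""
    x, v = x0, ratio(x0)
    for _ in range(iters):
        x = argmax_level_function(v)  # optimize the concave proxy
        v = ratio(x)                  # new level = performance of the new point
    return x, v

x_final, level = alternate()
print(x_final, level)
```

On this toy problem the maximum of the ratio is 0.5, attained at x = 1, and the alternation approaches it within a handful of rounds.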
Guarantee for CAN
Experimental Results
ccv: concave, cvx: convex
Superior accuracies and training times across quant and hybrid measures as well as datasets
NS: dual updates made using actual TPR/TNR values not surrogates
[Plot panels: KDD08, PPI, Covertype, Adult, Cod-RNA]
Attractive trade-off between quantification and classification performance using the BAKLD performance measure
Robustness to drift in class proportions (smaller is better in PosKLD)
Theoretical Guarantees
Classification accuracy: 50%
But … #False pos. = #False neg. ⇒ Perfect quantification (Perfect classification impossible)
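A small worked example makes the point concrete. The four labels and predictions below are made up for illustration, and the prevalence estimate simply counts predicted positives (often called classify-and-count); the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of points whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def prevalence(labels):
    """Fraction of points labelled positive (1)."""
    return sum(1 for v in labels if v == 1) / len(labels)

# One true positive, one false negative, one false positive, one true negative.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]

print(accuracy(y_true, y_pred))  # half of the points are misclassified
print(prevalence(y_true))        # true share of positives
print(prevalence(y_pred))        # estimated share of positives
```

Because the single false positive cancels the single false negative, the estimated prevalence matches the true prevalence even though only half of the individual predictions are correct.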
Balanced Accuracy (BA)
Observation: All quantification measures are naturally nested concave or pseudo-concave – exploit to optimize scalably?
Psephology
Cause-specific Mortality analysis
Transfer Learning