Interactive Identification of Information Needs and Its Application to Medical Informatics
Rey-Long Liu
劉瑞瓏
Dept. of Information Management
Chung Hua University
中華大學資訊管理學系
Outline
Introduction
  Information Need Identification (INI): What & Why
  Interactive INI
INEED: Incremental Mining for Interactive INI
  The profile miner
  The information need identifier
Empirical evaluation
Application to Medical Informatics
Conclusion
Introduction
Information Need Identification (INI) for
  Information portals
  Online service guidance
  Internet search engines
  People finding
Interactive INI, which needs to consider
  Precision (P)
  Precision Effectiveness (PE)
  Recall (R)
  Recall Effectiveness (RE)
[Figure: a category hierarchy whose nodes are labeled C with index subscripts.]
Introduction (Cont.)
Main Challenges
  Each information space has its own content and structure.
  Each information space is intrinsically dynamic.
  Users are often unable (or unwilling) to precisely express their information needs (INs). Their queries are often quite short.
  Users prefer simpler and fewer interactions.
INEED
[Figure: INEED architecture. Components: Interface, Information Provider, Information Storage, and INEED (Profile Miner, IN Identifier, Category Profile). Labeled flows: (0) Content & Taxonomy, (1) Interaction, (2) Request, (3) Information, (4) Information Required.]
The Profile Miner
Incremental profile mining
Given: The document d to be added to category c.
Effect: Updating the profiles of c and related categories.
Procedure:
(1) While c is not the root of the text hierarchy, do
    (1.1) For each distinct word w in d, do
        (1.1.1) If w is not a profile term for c, add <w, s_{w,c}> to the profile of c (strength s_{w,c} is unknown);
    (1.2) For each pair <w, s_{w,c}> in the profile of c, do
        (1.2.1) s_{w,c} = P(w|c) × (B_c / Σ_i P(w|c_i));
        (1.2.2) For each sibling b of c, update s_{w,b} in the profile of b;
    (1.3) c ← father of c.
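The procedure above can be sketched in Python. This is a minimal illustration, not the INEED implementation: the `Category` class, the relative-frequency estimate of P(w|c), and the reading of B_c as the size of c's sibling group (c included) are all assumptions made for the example.

```python
from collections import Counter


class Category:
    """A node of the text hierarchy with word counts and a term profile."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.counts = Counter()  # words of all documents added at or below this node
        self.profile = {}        # word -> strength s_{w,c}
        if parent is not None:
            parent.children.append(self)

    def prob(self, w):
        # Assumption: P(w|c) is the relative frequency of w in c's documents.
        total = sum(self.counts.values())
        return self.counts[w] / total if total else 0.0


def strength(w, c):
    # Assumption: B_c counts c together with its siblings,
    # and the sum in the denominator runs over that same group.
    group = c.parent.children
    denom = sum(b.prob(w) for b in group)
    return c.prob(w) * len(group) / denom if denom else 0.0


def add_document(words, c):
    while c.parent is not None:                # (1) stop at the root
        c.counts.update(words)
        for w in set(words):                   # (1.1) register new profile terms
            c.profile.setdefault(w, None)
        for w in c.profile:                    # (1.2) refresh strengths
            c.profile[w] = strength(w, c)      # (1.2.1)
            for b in c.parent.children:        # (1.2.2) siblings that hold w
                if b is not c and w in b.profile:
                    b.profile[w] = strength(w, b)
        c = c.parent                           # (1.3) climb to the father


# Hypothetical two-category hierarchy.
root = Category("root")
qa = Category("QA", root)
mfg = Category("Manufacturing", root)
add_document(["quality", "management"], qa)
add_document(["production", "management"], mfg)
```

In this toy run, "management" occurs equally often in both siblings and ends with strength 1 in each profile, while "quality" and "production" each occur in one sibling only and end with strength 2.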
The Profile Miner (Cont.)
Updating the profiles of related categories once a document is added
[Figure: hierarchy with a new document added to category f; annotations mark where the s-values of the profile terms are updated.]
The Profile Miner (Cont.)
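An example (organizational units, each followed by its responsibilities):
  Managers: decision making, coordination and integration
  Sales Division: market planning, product promotion
  Administration Division: internal administration, performance management
  R&D Division: integration assessment, process formulation
  Marketing Department: marketing materials, advertising
  Customer Department: order management, sales analysis
  Quality Assurance Department: quality maintenance, product testing
  Manufacturing Department: production, design and manufacturing
  Administrative Department: operations management
  Information Department: system planning, development and maintenance
  Personnel Section: staff recruitment, talent development
  Accounting Section: account management, budgeting
  Cashier Section: receipts and payments
  Computer Integration Section: production information, information use
  Information Management Section: system administration, office automation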
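The Profile Miner (Cont.)
Profile terms for each unit:
  Managers
  Sales Division: market, planning, sales
  Marketing Department: marketing, advertising, promotion
  Customer Department: order, management, analysis
  Administration Division: internal affairs, administration, management
  R&D Division: R&D, production, process
  Quality Assurance Department: quality, management, testing
  Information Department: information, system, construction
  Computer Integration Section: production, integration, use
  ......
A profile term w of category c should be
  Representative: P(w|c) is high
  Discriminative: P(w|c) × (B_c / Σ_i P(w|c_i)) is high
Strength: s = P(w|c) × (B_c / Σ_i P(w|c_i))
Example requests:
  "Information related to production management?"
  "Construction and maintenance of production management systems"
  "Production quality maintenance"
Figure label: context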
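A small numeric sketch of the strength formula may help separate "representative" from "discriminative". The probabilities below are hypothetical, the three units are assumed to be siblings for the sake of the example, and B_c is assumed to be the size of the sibling group including c.

```python
def strength(p_w_given_c, group_probs):
    """s = P(w|c) * (B_c / sum_i P(w|c_i)).

    group_probs holds P(w|c_i) for c and each of its siblings, so
    len(group_probs) plays the role of B_c (an assumption of this sketch).
    """
    total = sum(group_probs)
    return p_w_given_c * len(group_probs) / total if total else 0.0


# Hypothetical P(w|c) values for three sibling units.
p_management = {"Customer": 0.2, "Quality Assurance": 0.2, "Manufacturing": 0.2}
p_testing = {"Customer": 0.0, "Quality Assurance": 0.3, "Manufacturing": 0.0}

# "management" is frequent everywhere: representative, but not discriminative.
s_management = strength(p_management["Quality Assurance"], list(p_management.values()))

# "testing" is concentrated in one unit: both representative and discriminative.
s_testing = strength(p_testing["Quality Assurance"], list(p_testing.values()))
```

Under these assumptions the strength is the ratio of P(w|c) to the average P(w|c_i) over the sibling group, so a word spread evenly over the siblings scores about 1 and a word found in only one sibling scores about B_c.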
The IN Identifier
The IN Identifier (Cont.)
(1) For each category c, HitScore_c ← 0;
(2) For each pair (w, c), where w is a word in the query Q and c is a category,
    (2.1) If s_{w,c} > 1 and Support(w, c) ≥ minSupport,
        (2.1.1) ns ← (s_{w,c} – 1) / (number of siblings of c);
        (2.1.2) HitScore_c ← HitScore_c + ns × TF(w, Q);
(3) S ← the set of all categories;
(4) While the target category has not been identified and interaction is still allowed, do
    (4.1) Let p1 and p2 be two pedigrees (in S) with the highest average HitScore;
    (4.2) Let t1 and t2 be the categories with the highest HitScore in p1 and p2;
    (4.3) Display t1 and t2 (and their basic information) for the user to select;
    (4.4) If either t1 or t2 is exactly the target, return the space under the target;
    (4.5) Else if neither t1 nor t2 is of interest, S ← S – {the categories under t1 and t2};
    (4.6) Else if both t1 and t2 are of interest, g ← ClimbUp(common ancestor of t1 and t2), and return the space under g;
    (4.7) Else
        (4.7.1) Let t be the category that is of interest;
        (4.7.2) If t is a leaf, g ← ClimbUp(father of t), and return the space under g;
        (4.7.3) Else S ← {the categories under t};
(5) Return S;
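Steps (1) and (2), the scoring part of the procedure, can be sketched as follows. The dictionary-based inputs, the function name `hit_scores`, and the guard against a zero sibling count are assumptions for illustration; the interactive loop of step (4) is not covered. TF(w, Q) is taken to be the number of occurrences of w in the query.

```python
from collections import Counter


def hit_scores(query_words, strength, support, n_siblings, min_support):
    """Score every category against a query, following steps (1) and (2).

    strength:   {(word, category): s_{w,c}}
    support:    {(word, category): Support(w, c)}
    n_siblings: {category: number of siblings of the category}
    """
    tf = Counter(query_words)                      # TF(w, Q)
    scores = {c: 0.0 for c in n_siblings}          # (1)
    for (w, c), s in strength.items():             # (2)
        if w not in tf:
            continue
        if s > 1 and support.get((w, c), 0.0) >= min_support:  # (2.1)
            ns = (s - 1) / max(1, n_siblings[c])   # (2.1.1), guarded against zero
            scores[c] += ns * tf[w]                # (2.1.2)
    return scores


# Hypothetical profiles for two categories.
strength = {
    ("production", "Manufacturing"): 3.0,
    ("management", "Manufacturing"): 1.0,
    ("production", "Quality"): 2.0,
    ("quality", "Quality"): 3.0,
}
support = {key: 0.01 for key in strength}
n_siblings = {"Manufacturing": 2, "Quality": 2}

scores = hit_scores(
    ["production", "management", "production"],
    strength, support, n_siblings, min_support=0.001,
)
```

With these made-up numbers, "management" contributes nothing because its strength is not above 1, and "Manufacturing" outscores "Quality" because "production" is the stronger term there.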
The IN Identifier (Cont.)
Finding two candidate categories for interaction
[Figure: five panels, (1) to (5), showing the pedigrees p1 and p2 and the candidate categories t1 and t2.]
The IN Identifier (Cont.)
Function ClimbUp(f), where f is a category to start climbing
(1) If f is the root, return f;
(2) While the target category has not been identified and interaction is still allowed,
    (2.1) f_sibling ← a sibling of f;
    (2.2) f_uncle ← a sibling of the father of f;
    (2.3) Display f_sibling and f_uncle (and their basic information) for the user to select;
    (2.4) If either f_sibling or f_uncle is exactly the target, return the target;
    (2.5) Else if neither f_sibling nor f_uncle is of interest, return f;
    (2.6) Else if both f_sibling and f_uncle are of interest,
        (2.6.1) f ← grandfather of f;
        (2.6.2) If f is the root, return f;
    (2.7) Else if f_sibling is of interest, return father of f;
    (2.8) Else return {f, f_uncle};
(3) Return f;
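A Python sketch of ClimbUp follows. Several details are assumptions that the pseudocode leaves open: the `Node` class, the `ask` callback that stands in for the user, picking the first available sibling and uncle, stopping when no sibling or uncle exists, and returning a list in every case so that step (2.8) has the same type as the other results.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def siblings(n):
    if n.parent is None:
        return []
    return [c for c in n.parent.children if c is not n]


def climb_up(f, ask, max_interactions=5):
    """ask(a, b) returns (target_or_None, set_of_nodes_of_interest)."""
    if f.parent is None:                           # (1)
        return [f]
    for _ in range(max_interactions):              # (2)
        sibs, uncles = siblings(f), siblings(f.parent)
        if not sibs or not uncles:                 # assumption: nothing to display
            break
        f_sibling, f_uncle = sibs[0], uncles[0]    # (2.1), (2.2): first candidates
        target, liked = ask(f_sibling, f_uncle)    # (2.3)
        if target is not None:                     # (2.4)
            return [target]
        if not liked:                              # (2.5)
            return [f]
        if f_sibling in liked and f_uncle in liked:  # (2.6)
            f = f.parent.parent                    # (2.6.1)
            if f.parent is None:                   # (2.6.2)
                return [f]
        elif f_sibling in liked:                   # (2.7)
            return [f.parent]
        else:                                      # (2.8)
            return [f, f_uncle]
    return [f]                                     # (3)


# Hypothetical hierarchy: root -> A, B; A -> A1, A2; A1 -> x, y.
root = Node("root")
a, b = Node("A", root), Node("B", root)
a1, a2 = Node("A1", a), Node("A2", a)
x, y = Node("x", a1), Node("y", a1)

only_sibling = climb_up(x, lambda s, u: (None, {s}))
both = climb_up(x, lambda s, u: (None, {s, u}))
neither = climb_up(x, lambda s, u: (None, set()))
```

In this toy hierarchy, interest in the sibling alone generalizes x to its father A1, interest in both jumps to the grandfather A and stops there because A has no uncle to display, and no interest keeps x.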
The IN Identifier (Cont.)
Generalization by climbing the hierarchy
[Figure, two panels: "Finding two categories for generalization" (f, f_sibling, f_uncle) and "Possible results of generalization" (outcomes of ClimbUp steps 2.4, 2.5, 2.6, and 2.7).]
Experiment
Experimental Data
  Source: Yahoo! (http://www.yahoo.com)
  Coverage: Computers & Internet, Society and Culture, and Science
  Size: 214 categories; depth: 8
  Training data: 2216 documents
  Test data: 168 queries extracted from another set of site summaries
Experiment (Cont.)
Each system could conduct at most 5 interactions for each query.

System | Description | Note
INEED | As described, with two settings for minSupport: 0.001 and 0.0005. | INEED-0.001, INEED-0.0005
BruteForce | As in most search engines, the whole information space is considered (no INI is conducted). |
RandomCN | The system employs top-down navigation. At each level, two categories are randomly selected for the user to confirm. | Repeat 10 times
IdealCN | The system employs top-down navigation. At each level, the target is always in the candidates identified by the system. |
NB | The output category is determined by the conditional probabilities of the query terms occurring in the categories, with two feature set sizes: 5000 and 8000. | NB-5000, NB-8000
Experiment (Cont.)
Precision
  BruteForce was poor
  Interaction is good for precision
  INEED improved 14%~20% w.r.t. NB
[Figure: precision (0 to 1) vs. maximum number of interactions allowed (0 to 5) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000.]
Recall
  INEED was good in both precision and recall
  BruteForce and CN achieved 100% recall
  INEED achieved 100% recall using only 2 interactions
[Figure: recall (0.92 to 1) vs. maximum number of interactions allowed (0 to 5) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000.]
Experiment (Cont.)
Precision-effectiveness
  BruteForce was excluded
  INEED improved more (19%~32%) w.r.t. NB
  Interactions by INEED were more effective
[Figure: precision-effectiveness (0 to 0.8) vs. maximum number of interactions allowed (1 to 5) for INEED-0.001, INEED-0.0005, RandomCN, IdealCN, NB-5000, and NB-8000.]
Recall-effectiveness
  INEED performed best
  INEED improved 2%~20% w.r.t. NB
[Figure: recall-effectiveness (0 to 1) vs. maximum number of interactions allowed (1 to 5) for INEED-0.001, INEED-0.0005, RandomCN, IdealCN, NB-5000, and NB-8000.]
Experiment (Cont.)
Precision vs. Recall
  BruteForce and CN always achieved 100% recall
  INEED performed best (its curve lay in the upper right corner)
[Figure: recall (0.92 to 1) vs. precision (0 to 1) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000.]
[Figure: bar chart of precision and recall (0 to 0.8) for INEED-0.001, INEED-0.0005, NB-5000, and NB-8000; data labels in extraction order: 0.448, 0.64, 0.418, 0.646, 0.469, 0.465, 0.437, 0.468.]
When no interaction is allowed
  INEED improved recall by 38% w.r.t. NB
Precision of INEED improved 62% in the first interaction (NB only improved 29%)
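The 38% recall figure can be checked against the bar-chart data labels with a short calculation. The pairing of labels to systems is an assumption here: 0.64 and 0.646 are read as the INEED recall values and 0.465 and 0.468 as the NB recall values, which the extracted slide does not state explicitly.

```python
def relative_improvement(new, base):
    """Improvement of `new` over `base`, as a fraction of `base`."""
    return (new - base) / base


# Assumed pairing of the chart's data labels (not confirmed by the slide).
ineed_recall = {"INEED-0.001": 0.640, "INEED-0.0005": 0.646}
nb_recall = {"NB-5000": 0.465, "NB-8000": 0.468}

# Compare every INEED setting with every NB setting.
gains = {
    (i, n): relative_improvement(ri, rn)
    for i, ri in ineed_recall.items()
    for n, rn in nb_recall.items()
}
lowest, highest = min(gains.values()), max(gains.values())
```

Under that reading, every pairing gives a relative recall gain between roughly 37% and 39%, which is in line with the 38% quoted on the slide.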
Experiment (Cont.)
An example:
Test query: Virtual world featuring 3-D ray-traced graphics. Wander around, meet other netizens, and try to solve some puzzles. Features animation and sound clips,
Correct target identified by INEED: Computers and Internet → Multimedia → Virtual Reality → Exhibits
Erroneous category identified by NB: Computers and Internet → Software → Operating Systems → Windows → Windows 95
Application to Medical Informatics
Medical knowledge management
  People finding
  Knowledge finding
Medical information portal
  Online navigation guidance
  Cost-effective retrieval of information
Application to Medical Informatics (Cont.)
Medical e-community
  Community establishment & retention
  Information recommendation
Medical decision support
  Assimilation of new cases
  Retrieval & analysis of similar cases
Conclusion
Interactive INI as an essential component for the sharing, navigation, and recommendation of medical information and knowledge
INEED as an effective tool for interactive INI
  Exactly identify the information space that may satisfy the user’s information needs
  Effectively interact with the user
  Intelligently reduce the user’s load in query formation and result cognition
Thanks