DiQuest.com
Intelligent Dialog Interface Solution for friendly User Interactions in Internet WEB Environment
March, 2001
Next generation web search and Question-answering technology
Oct, 2001
Gary Geunbae Lee
Dept. of CSE, Postech
& DiQuest.com
Contents
Commercial e-solutions: search, QA, CRM
Natural Language Processing Technology
Information Retrieval Technology
Intelligent QA solutions
Conclusions
Conventional search engine
Directory based
  Yahoo: everything
  AOL search: web + AOL contents
  Directhit: click monitoring to rank popular sites at the top
  Looksmart: human-compiled web site directory
Search based
  Altavista: you know
  Excite: you know
  Lycos: from search directory service
  Fastsearch: first to index 0.2 billion web pages
  Inktomi: highly scalable indexing system
  Google: link analysis (high precision)
Current trends: directory + search integration
Recent NL search and QA systems
Internet search with natural language and intelligence
  askjeeves: horizontal question-answering
  Northernlight: natural language and phrasal search (clustering)
  Empas: Korean natural language search (?)
  Lexiquest: lexipacks, ontology/dictionary for a specific domain (context search)
  Oingo: meaning-oriented search (big ontology)
Natural language question answering
  Neuromedia (nativeminds): chatter bot (Eliza technology)
  Easyask: database question answering
  Brightware: web and email question answering (FAQ finding), recommendation
  inquizit technology: natural language semantic analysis (concept engine)
  YY-software: automatic email answering
  Answerlogic: WordNet-based question-answering
  Answers.com: FAQ finding
Interaction with customers for e-business
Internet users: over 130 million, up to 350 million by 2003 (eMarketer)
Internet commerce: $1.3 trillion by 2003 (Forrester Research)
From e-commerce to e-business
[Figure: e-business sophistication plotted against time, with stages contents, transactions, communications, and intelligent CRM. Items listed with intelligent CRM: customer history, purchase likelihood, staffing requirements, prior information history, corporate policy about service, etc.]
Customer interaction channel
CRM architecture – 3 different views
Integration of data warehousing & data mining, web call-center, automatic sales and marketing
Web-enabled
Operational
  • Sales Force Automation
  • Marketing Automation
  • Field Service Automation
  • Customer Service/Support
Analytical
  • Data Warehouse
  • Data Mart
  • Marketing Automation
  • Data Marketing
Collaborative
  • Voice (IVR, CTI, ACD)
  • e-Mail
  • Fax/Direct Mail
  • Web Site
Source: META Group, June 1999
World wide CRM market
Year 2003 CRM market
  • Application License: $8.3 billion
  • Implementation: $5.2 billion
  • SW Maintenance: $3.2 billion
Call Center solutions: integration of media
[Figure: call center (inbound calls, outbound calls) and contact center integrating telephone, fax, WWW / email, kiosk, direct mail, and sales force automation.]
Contents
Commercial e-solutions: search, QA, CRM
Natural Language Processing Technology
Information Retrieval Technology
Intelligent QA solutions
Conclusions
NLP technology: eliza scripting
<heading-0> "Rule Heading"
a:0.2 the rule activation level
p:35 *what*keyword* the pattern priority and word pattern
r:robot's reply
<work-0>
a:0.5
p:60 Wh *your*job*
r:I’m a full time Verbot
<leasure-2>
a:0.4
p:30 What time * your * job over.
r:I don’t get any time off, I always have to be here available for you.
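The rule format above can be read as a prioritized wildcard match. The following is a minimal Python sketch of that reading, not the actual Verbot engine. It assumes that `*` matches any run of characters, that spaces around `*` are insignificant, that matching is case-insensitive, and that the matching rule with the highest `p:` value wins. The names `Rule`, `to_regex` and `reply` are illustrative, and the activation level `a:` is stored but not interpreted.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    activation: float  # "a:" value; kept for completeness, not used below
    priority: int      # "p:" value; assumed here: higher priority wins
    pattern: str       # word pattern; assumed here: "*" matches any run of characters
    response: str      # "r:" value

def to_regex(pattern):
    # Split on "*", trim spaces around each literal piece, rejoin with ".*".
    pieces = [re.escape(p.strip()) for p in pattern.split("*")]
    return re.compile(".*".join(pieces), re.IGNORECASE)

RULES = [
    Rule("work-0", 0.5, 60, "Wh *your*job*", "I'm a full time Verbot"),
    Rule("leasure-2", 0.4, 30, "What time * your * job over.",
         "I don't get any time off, I always have to be here available for you."),
]

def reply(utterance, rules=RULES, default="(no rule matched)"):
    # Collect every rule whose pattern covers the whole utterance.
    matches = [r for r in rules if to_regex(r.pattern).fullmatch(utterance.strip())]
    if not matches:
        return default
    return max(matches, key=lambda r: r.priority).response
```

Under these assumptions `reply("What is your job?")` selects the `work-0` rule. Because `work-0` also covers "What time is your job over." and has the higher priority, the second rule only fires when the first is absent, which suggests the real engine uses the activation level or a different priority convention.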
POS tagging (with morpheme analysis)
포항공대 이근배 교수님께서 신을 신고 신고하러 가신다 . (Professor Geunbae Lee of POSTECH puts on his shoes and goes to file a report.)
[ 0,0 ( 0,0 )] 등 1.000000e+00(1.000000e+00) s< 문장시작 >([)
[ 1,10( 1,1 )] 미 8.288423e-11(6.102822e-13) MPO< 포항공대 >( 포항공대 )
[11,11( 2,2 )] 등 8.736421e-02(2.559207e-20) s<#>(#)
[12,18( 3,3 )] 미 9.236515e-08(7.008548e-24) MPN< 이근배 >( 이근배 )
[19,19( 4,4 )] 등 8.736421e-02(2.939022e-31) s<#>(#)
[20,23( 5,5 )] 등 4.469725e+00(1.564634e-25) MC< 교수 >( 교수 )
[24,26( 6,6 )] 등 1.373613e+02(1.504397e-25) -< 님 >( 님 )
[27,30( 7,7 )] 등 1.307859e+01(1.831031e-25) jC< 이 >( 께서 )
[31,31( 8,8 )] 등 8.736421e-02(7.678394e-33) s<#>(#)
[32,34( 9,9 )] 등 3.250709e+00(3.667919e-27) MC< 신 >( 신 )
[35,37(10,10)] 등 1.264760e+01(3.865534e-27) jC< 을 >( 을 )
[38,38(11,11)] 등 8.736421e-02(1.621005e-34) s<#>(#)
[39,41(12,12)] 등 5.807344e+00(1.021970e-28) DR< 신 >( 신 )
[42,43(13,13)] 등 3.936314e+01(1.918250e-28) eCC< 고 >( 고 )
[44,44(14,14)] 등 8.736421e-02(8.044147e-36) s<#>(#)
[45,49(15,15)] 등 8.588220e-04(1.297090e-33) MC< 신고 >( 신고 )
[50,51(16,16)] 등 2.626376e+01(1.404345e-33) y< 하 >( 하 )
[52,56(17,19)] 등 1.445488e+03(1.043073e-31) eCC< 러 >( 러 )
[52,56(17,19)] 등 1.445488e+03(1.043073e-31) s<#>(#)
[52,56(17,19)] 등 1.445488e+03(1.043073e-31) DI< 가 >( 가 )
[57,58(20,20)] 등 4.657808e+01(1.348953e-31) eGS< 시 >( 시 )
[59,61(21,21)] 등 1.841659e+01(4.754894e-31) eGE< 는다 >( ㄴ다 )
[62,64(22,22)] 등 1.250000e-07(1.365400e-38) s.<.>(.)
[65,65(23,23)] 등 2.500000e-05(1.638481e-49) s< 문장끝 >(])
NLP technology: postag example
Features
  Statistics + rule combination
  Tight coupling with morpheme analysis
  Morpheme graph representation
  Pattern dictionary concepts for unknown words
  100,000 morpheme dic.
  1,500 morpheme pattern dic.
[Figure: POSTAG architecture. Input sentence → Morph. Analyzer → Morpheme graph → POS tagger → Error corrector → Error corrected morpheme graph → Parser, application. Resources shown: Morpheme dic, Morpheme pattern dic, Morph adjacency table, POS Bigram, Syllable Trigram, Error correction rules.]
Unknown word guessing
  morpheme pattern dic
  Syllable constraints for each part of speech in Korean
  Lexical probabilities for unknown words
  Syllable tri-gram equations
Pattern dic for unknown words
  POS<lemma> (allomorph) [connect info.]
  HIㅂ<ZV*ZVㅂ> (ZV*워) [축약>]
  DIㅅ<ZV*ZV젓> (ZV*젓) [규>]
  HIㅂ<ZV*ZVㅂ> (ZV*워) [축약>]
Syllable tri-gram equation for the unknown proper noun 박종만 (tag MPN):
  Pr(MPN | 박종만) / Pr(MPN) ≈ Pr(박 | #, #) Pr(종 | #, 박) Pr(만 | 박, 종) Pr(# | 종, 만)
[Figure: morpheme analysis with unknown word guessing. Input sentence → Morph. Analyzer → Filter → Filtered morpheme graph → POS tagger. Resources shown: Morpheme dic, Morpheme pattern dic, Morph adjacency table, Filtering info.]
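The syllable tri-gram idea on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not POSTAG's implementation: the model is trained on a toy list of known names for a single tag, `#` pads the word boundaries, and add-one smoothing (an assumption of this sketch) keeps unseen tri-grams from scoring zero. The function names are hypothetical.

```python
import math
from collections import Counter

BOUNDARY = "#"

def train(words):
    """Count syllable tri-grams and their two-syllable contexts, padding with '#'."""
    tri, ctx = Counter(), Counter()
    vocab = {BOUNDARY}
    for w in words:
        syl = [BOUNDARY, BOUNDARY, *w, BOUNDARY]  # one Hangul syllable per character
        vocab.update(w)
        for a, b, c in zip(syl, syl[1:], syl[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx, len(vocab)

def log_prob(word, model):
    """log of Pr(s1|#,#) Pr(s2|#,s1) ... Pr(#|s_{n-1},s_n), with add-one smoothing."""
    tri, ctx, v = model
    syl = [BOUNDARY, BOUNDARY, *word, BOUNDARY]
    total = 0.0
    for a, b, c in zip(syl, syl[1:], syl[2:]):
        total += math.log((tri[(a, b, c)] + 1) / (ctx[(a, b)] + v))
    return total
```

For a three-syllable word such as 박종만 the loop multiplies exactly four tri-gram terms, one per syllable plus the closing boundary. Training one such model per part of speech and comparing the scores gives a tag-specific guess for an unknown word.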
Syntactic parsing
엄마가 아이에게 심부름을 시켰다 (The mother sent the child on an errand.)
Lexical categories (leaves of the parse tree)
  complete/end                          s<문장시작>([)
  np                                    MCC<엄마>(엄마)
  (v/(v\np[j가]))\np                    jC<이>(가)
  np                                    TCH<아이>(애)
  (v/(v\np[j에게]))\np                  jC<에게>(에게)
  np                                    MCC<심부름>(심부름)
  (v/(v\np[j를]))\np                    jC<을>(을)
  v[D]\{np[j가],np[j를],np[j에게]}      DR<시키>(시켜)
  vp[었]\v                              eGSt<었>(ㅆ)
  s[서술]\vp                            eGEs<다>(다)
  s\s                                   s.<.>(.)
  end\X                                 s<문장끝>(])
[Figure: parse tree. Each noun phrase combines with its case marker into v/(v\np[j가]), v/(v\np[j에게]) or v/(v\np[j를]); the verb category is reduced through v[D]\{np[j가],np[j에게]} and v[D]\{np[j가]} to v[D]; the endings build vp[었] and s[서술]; the root is complete.]
Syntactic parsing: pospar example
Korean CCG
Functional application
  X/(Args ∪ {Y})  Y  ⇒  X/Args
  Y  X\(Args ∪ {Y})  ⇒  X\Args
Composition
  X/(X\ArgsX)  Y/(Y\ArgsY)  ⇒  X/(X\(ArgsX ∪ ArgsY))
  Y\ArgsY  X\(ArgsX ∪ {Y})  ⇒  X/(X\(ArgsX ∪ ArgsY))
Coordination
  X  CONJ  X  ⇒  X
Variable category
  $v, $vp
Featured category
  v : D, H, I, E
  vp : 었, 었었, 고있, 어있, 겠, 더, 시
  s : 평서 (declarative), 의문 (interrogative), 명령 (imperative), 청유 (propositive), 약속 (promissive), 문장 (sentence)
  np : j이, j를, j에게
Syntactic dic. and Syntactic pattern dic.
  Syntactic dic. (morpheme, category)
    DIㄹ<날>[*]      v\{np[j가]}
    DI여<하>[*]      v\{np[j가], np[j를]}
    eGEm<어라>[*]    s[명령]\v
  Syntactic pattern dic.
    MC*<*>[*]        n
    MC*<*>[*]        np
[Figure: POSPAR architecture. Morpheme graph → Syntactic Analyzer → Parse tree → Semantic Analyzer. Resources shown: Syntax dic, Syntax pattern dic, Syntactic Category Trigram, Korean CCG.]
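The functional application rules with argument sets can be sketched as follows. This is a simplified Python illustration, not the POSPAR implementation: categories are reduced to a result symbol, a slash direction and an unordered set of argument names, arguments are treated as atomic saturated categories (no type raising or composition), and all names (`Cat`, `forward_apply`, `backward_apply`) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Cat:
    result: str                         # X, e.g. "v"
    slash: str = ""                     # "/" looks right, "\\" looks left, "" = saturated
    args: FrozenSet[str] = frozenset()  # unordered argument set

def _consume(functor: Cat, arg: Cat) -> Optional[Cat]:
    # In this sketch only a saturated category can be consumed as an argument.
    if arg.slash or arg.result not in functor.args:
        return None
    rest = functor.args - {arg.result}
    return Cat(functor.result, functor.slash if rest else "", rest)

def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """X/(Args ∪ {Y})  Y  =>  X/Args"""
    return _consume(left, right) if left.slash == "/" else None

def backward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Y  X\\(Args ∪ {Y})  =>  X\\Args"""
    return _consume(right, left) if right.slash == "\\" else None

# Dictionary entry from the slide: DI여<하>[*]  v\{np[j가], np[j를]}
verb = Cat("v", "\\", frozenset({"np[j가]", "np[j를]"}))
step1 = backward_apply(Cat("np[j를]"), verb)   # object consumed first
step2 = backward_apply(Cat("np[j가]"), step1)  # then the subject
```

Because the arguments form a set rather than an ordered list, the subject and object can be consumed in either order, which is one plausible reading of why the rules on the slide are stated over `Args ∪ {Y}`.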
Semantic analysis: example
자연어처리를 전공한 교수가 가르치는 과목은 ?(What is the course name that a professor whose major is NLP teaches?)
--------------------------- Semantic Result --------------------------
Scope: [0, 17][ques, [contra, term(<quant,bare,sing>,X7, [and, [course,X7], [teach,EV3, term(<def,bare,sing>,X6, [and, [professor,X6], [major,EV1,X6, term(<quant,bare,sing>,X1, [NLP,X1])]]),X7,\_:p[j 에 ];0F]])]]
Semantic analysis: system overview
[Figure: Input Sentence → Morphological Analyzer → POS Tagger → K-CCG Parser → Syntactic Trees → Semantic Analyzer → QLF Structures → Semantic-based Applications (Slot-Filler Generator, Topic/Subject Extractor, ...). Resources shown: Semantic Dictionaries (base/dom/pat/user/rel), Thesauri.]
NLP technology: Korean WordNet
Map Korean words to other existing thesaurus (WordNet)
  Using bi-lingual dictionary
  Automatic mapping tools using WSD techniques
[Figure: a Korean word (kwi_j) is linked to English words ew1 … ewm, which are linked to WordNet synsets ws1, ws2, …, wsk, …, wsn.]
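The mapping on this slide (Korean word → English words → WordNet synsets) can be sketched as a two-step lookup. The Python below is a toy illustration: the dictionaries hold placeholder symbols borrowed from the figure (`kw`, `ew1`, `ws1`, ...) rather than real lexical data, no real WordNet API is called, and counting how many translations reach a synset is only a simple stand-in for the WSD heuristics, which the slide does not spell out.

```python
from collections import Counter

# Placeholder data using the figure's symbols; not real dictionary entries.
BILINGUAL = {"kw": ["ew1", "ew2"]}            # Korean word -> English translations
SYNSETS = {"ew1": ["ws1", "ws2"],             # English word -> WordNet synsets
           "ew2": ["ws2", "ws3"]}

def candidate_synsets(korean_word, bilingual=BILINGUAL, synsets=SYNSETS):
    """Return candidate synsets with the number of translations that reach each one."""
    votes = Counter()
    for english in bilingual.get(korean_word, []):
        for synset in synsets.get(english, []):
            votes[synset] += 1
    return votes
```

A synset reached through several translations (here `ws2`) is a stronger candidate than one reached through a single translation; a real system would combine this kind of evidence with other heuristics.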
NLP technology: Korean WordNet
Multiple heuristics for WSD
  Maximum similarity
  Prior probability
  Sense ordering
  IS-A relation
  Word match
  Cooccurrence
Combining heuristics with machine learning techniques
  Decision tree
  Logistic regression
Contents
Commercial e-solutions: search, QA, CRM
Natural Language Processing Technology
Information Retrieval Technology
Intelligent QA solutions
Conclusions
Several gaps in the search
[Figure: task → info need → verbal form → query → search engine (web) → results, with a query refinement loop. Gaps shown along the way: mis-conception, mis-translation, mis-formulation, polysemy/synonymy.]
Approaches shown
  Interactive QA (askjeeve)
  NLP query (easyask, lexiquest)
  Queries in context (domain) (autonomy, verity)
  Clustering (northernlight)
Web search vs classical IR
Classical IR
  Fixed document corpus
  Document relevancy is the goal
  Contexts (domain) and individual users (preferences) ignored
Web search
  Public web: static + dynamic (generated from RDB)
  High quality ranking is the goal (meet the user need given poor queries and the heterogeneity of the web)
  Various needs such as informational, navigational, transactional
Search engine techniques
First generation
  TF/IDF from standard IR
  Use only page data (text data)
  HTML parsing for weighting
Second generation
  Use off-page and web-specific data
  Such as link (connectivity) analysis, click-through data (relevance feedback), anchor-text data
Third generation
  Answer the need behind the query
  Semantic analysis, context determination, dynamic corpus from RDB, validity (authority), cross-lingual/cross-media, question-answering, specific enterprise site search, etc.
ⓝ constructing high-quality corpus: Web ROBOT
Web-page Filtering
  • URL Domain Filtering
  • File Type Filtering
  • URL Name Filtering
File Collection & Management
  • manage independent site
  • saved by URL hierarchy
  • make log files
Smart Updating
  • save new web page
  • overwrite updated page
Various user-input options: filtering constraints
[Figure: first-try target web document → filtered target URL → file info (date, size, link) → collected and saved files per domain site (Domain Site 1, Domain Site 2), updating the same web document → Result File Pool, with Result File Manager and WEBtagger.]
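The three filtering constraints listed on this slide (URL domain, file type, URL name) can be sketched as one predicate applied to each candidate URL. The Python below is an assumption-laden illustration, not the Web ROBOT's code: the function name, its parameters and the choice to treat extension-less paths as pages are all hypothetical.

```python
import posixpath
from urllib.parse import urlparse

def accept_url(url, allowed_domains, allowed_exts, blocked_substrings=()):
    """Return True if the URL passes domain, file type and name filtering."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    # URL domain filtering: host must equal or be a subdomain of an allowed domain.
    if not any(host == d or host.endswith("." + d) for d in allowed_domains):
        return False
    # File type filtering: check the path extension; no extension is treated as a page.
    ext = posixpath.splitext(parts.path)[1].lower()
    if ext and ext not in allowed_exts:
        return False
    # URL name filtering: reject URLs containing any blocked substring.
    return not any(s in url for s in blocked_substrings)
```

For example, with `allowed_domains={"example.com"}` and `allowed_exts={".html", ".htm"}`, a page such as `http://www.example.com/a/index.html` is accepted while an image on the same host is rejected.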
High quality corpus: WEB preprocessing
[Figure: Web document input → nROBOT (Web Crawler) → Result File Manager → Result File Pool → HTML Refiner → Sentence Extractor → Word Spacing Corrector → XML DOC. (Form A, Form B, Form C, Form D) → POSTAG, TTS System, SAA, POSNIR.]
Sub-modules and resources shown
  • Tag Corrector & Parser
  • Garbage String Filter
  • Regexp Patterns Rule
  • Heuristic Rule
  • Abbreviation Dic
  • Symbol-Delimiter DB
  • C4.5 Rule
  • Entry Dic, Pattern Dic
  • Postag Noun Trie Dic.
High quality corpus: Automatic indexing
Indexing architecture
  Based on general morpheme tagging
  Documents → Morpheme Analysis → POS Tagging → Term Extraction → Term weighting → index DB
Term Extraction
  nominals: single terms
  compound noun generation
    – using rules automatically learned
    – filtering through precision (preventing over-generation)
  compound noun segmentation
    – based on mutual information
Term weighting
  for document ranking
  based on TF, IDF measures
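As an illustration of term weighting based on TF and IDF, here is a minimal Python sketch. It uses raw term frequency times log(N / df), which is only one common variant; the slide does not say which formula the indexer uses, and the function name is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of term lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights
```

A term that occurs in every document gets weight 0 under this variant, so it does not influence ranking, while a term concentrated in a few documents is weighted up.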
Compound nouns in indexing
POSNIR features
Noun extraction
  ex) 철수는 회의에서 그 사건을 보고할지도 모른다. (Chulsoo may report the accident at the meeting): bogo (O)
  ex) 지도를 보고 길을 찾는다. (look at a map and find the road): bogo (X)
Compound noun segmentation
  Compound noun patterns plus statistical collocation (mutual information)
  ex) 대학생선교회 (undergraduate missionary): 대학생 / 선교회 (O); 대학 (university) / 생선 (fish) / 교회 (church) (X)
Compound noun indexing (phrasal indexing)
  Using automatically acquired extraction rules
  Broad coverage of compound noun pattern recognition
  ex) 증기로 움직이는 기관차 (locomotive operating by steam): 증기 (steam) / 기관차 (locomotive)
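The segmentation example above (대학생 / 선교회 preferred over 대학 / 생선 / 교회) can be reproduced with a small mutual-information sketch in Python. Everything numeric here is invented for illustration: the counts are toy values, the scoring (average pointwise mutual information of adjacent nouns, with add-one on pair counts) is an assumption of this sketch rather than the POSNIR method, and the function names are hypothetical.

```python
import math

def segmentations(word, lexicon):
    """All ways to split word into entries of the lexicon."""
    if not word:
        return [[]]
    out = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            out += [[head] + rest for rest in segmentations(word[i:], lexicon)]
    return out

def pmi(a, b, unigram, bigram, total):
    # Add-one on the pair count so unseen pairs get a low but finite score.
    p_ab = (bigram.get((a, b), 0) + 1) / (total + 1)
    return math.log(p_ab / ((unigram[a] / total) * (unigram[b] / total)))

def best_segmentation(word, unigram, bigram, total):
    def score(seg):
        pairs = list(zip(seg, seg[1:]))
        return sum(pmi(a, b, unigram, bigram, total) for a, b in pairs) / max(len(pairs), 1)
    return max(segmentations(word, unigram), key=score)

# Toy counts, invented for this example.
UNIGRAM = {"대학생": 50, "선교회": 20, "대학": 200, "생선": 80, "교회": 100}
BIGRAM = {("대학생", "선교회"): 10}
TOTAL = 10000
```

With these toy counts the nouns 대학생 and 선교회 co-occur far more often than chance, so their split scores higher than the split through 생선 (fish).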
Dealing with user queries: NL query
[Figure: NL Query → NLP Engine (Morpheme Analysis, Tagging) → Tagged Sequence → Query Term Extraction and Boolean Formulation → DB Search (Boolean Operation and Ranking, over the DB) → Search Result.]
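The path from a tagged sequence to a Boolean search can be sketched as below. This Python illustration makes several assumptions that are not in the slides: tags beginning with `M` are treated as nominals and kept as query terms, the terms are combined with AND, and hits are ranked by summed term frequency. The tagged example, the index contents and the function names are hypothetical.

```python
def extract_terms(tagged, nominal_prefix="M"):
    """Keep lemmas whose tag starts with the assumed nominal prefix."""
    return [lemma for lemma, tag in tagged if tag.startswith(nominal_prefix)]

def search(terms, index, tf):
    """Boolean AND over an inverted index {term: set(doc_ids)}, ranked by summed tf."""
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits, key=lambda d: (-sum(tf[d].get(t, 0) for t in terms), d))

# Hypothetical tagged form of the query "삼성그룹 회장은 ?"
tagged_query = [("삼성그룹", "MPO"), ("회장", "MC"), ("은", "jX")]
index = {"삼성그룹": {1, 2, 4}, "회장": {2, 3, 4}}
tf = {2: {"삼성그룹": 1, "회장": 1}, 4: {"삼성그룹": 3, "회장": 2}}

terms = extract_terms(tagged_query)
results = search(terms, index, tf)
```

Dropping particles and endings before the Boolean step is what keeps the query from failing on function morphemes that rarely appear in the index.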
Humans extract meaning at many linguistic levels, but current web search only counts words. Is it enough?
Part of words – morpheme
Word order
Word lexicals
Text structure or document structure
Clue words/cue phrase
Pronunciation/prosody
World knowledge
NLP helps high-precision web search
Information retrieval dilemma
  Hard to ask the right questions
  Too much information
  Irrelevant information
  No information (phrase mismatch)
NLP tools that help avoid the information dilemma
  Context of words: collocations
  Syntax cues: how a word is used
  Concept mapping with clustering
  Interactivity through clarifying dialog
Other Related Technology
[Figure: technology map built around the Korean NLP Core Engine, with the areas IR Application, XML & KM, Machine Learning, and Q/A Application.]
Components shown
  Domain Ontology Mgmt Tool
  Contents Auto-Builder
  Text Preprocessor
  Intelligent Web Robot
  Text Summarizer
  Text Categorizer
  Document Categorization
  Similar Text Clustering
  Information Extraction
  Wrapper Induction
  K-Wordnet Auto-Builder
  Answer Suggester
  Multi-Lingual IR Engine
  Fuzzy-SQL Generator
  Shopping Aid Agent Solution
  NL-Query Analyzer
  FAQ Finder Solution
  DBQ/A Solution
  POS-Tagging
  Syntactic Analysis
  Semantic-Discourse Analysis
  Comp-Noun Analyzer
Contents
Commercial e-solutions: search, QA, CRM
Natural Language Processing Technology
Information Retrieval Technology
Intelligent QA solutions
Conclusions
The third generation search engine
Natural language question-answering: answer-providing for dialog questions
Answer sentence extraction (DiQuest d-Answer)
  Pre-defined question types
  Semantic-level processing of NL query
Answer finding from FAQ (DiQuest e-Answer)
  Systematic construction of FAQ
  Finding semantically same questions from FAQ list
  Email/Web call center applications
Answer finding from R-Database (DiQuest db-Answer)
  Finding answers from R-DB attributes
  SQL conversion from natural language query
Companies
  Neuromedia, Answerfriend, Answers.com, Brightware, Answerlogic, Easyask, etc.
Easy Interface with Natural Dialogues
DiQuest Q/A: Total dialog information retrieval solutions
  Easy and accurate information retrieval using natural language dialog
  Retrieval from any information source including internet/intranet web documents, FAQ knowledge, databases
[Figure: DiQuest Q/A Solution comprising DiQuest d-Answer, DiQuest e-Answer, DiQuest db-Answer, and other NLP applications.]
Why Dialog Web Interface?
Customer-Side
  Efficiency: no need for web surfing
  Accuracy: exact description of search
  Convenience: using everyday dialog sentences
  Customer satisfaction guaranteed!
Company-Side
  Easy to catch customers’ needs in natural language queries (not easy with keyword-only queries)
  Customer-oriented Web content management
  Customer-oriented FAQ K/B construction and maintenance
  Personal profile management for each customer (CRM)
Spectrum of Products
CRM/KM/E-commerce (Service)
  shopping mall retrieval
  Wireless question answering
  Email/web call center
  Intranet question answering
Information retrieval (Package)
  FAQ finding
  NL-SQL conversion
  Vertical IR
  Answer sentence extraction
Document processing (Component)
  Answer indexing
  Complex term indexing
  Question type processing
  Structure indexing
Language processing (Library)
  Morphology
  Syntax
  Semantics
  Dialogs
Branded Products
dAnswer
  properties
    • Vertical retrieval
    • High speed indexing
    • Answer sentence extraction
    • Optimized retrieval
  performance
    • 0.1 million doc. answer sentence extraction (about 1 sec response)
    • 1 million doc. vertical IR (about 0.3 sec)
    • platform: Linux, Solaris, HPUX
  applications
    • Document search for KM/internet/portals
    • Answer finding for KM/intranet
    • High precision search for wireless application
  competitors
    • Verity
    • Askjeeves
eAnswer
  properties
    • Answer finding from FAQ knowledge base
    • Real time FAQ construction/indexing
    • Possible fusion with d-Answer/db-Answer
  performance
    • Over 10,000 FAQ doc. (about 0.3 sec response)
    • More than 1000 simultaneous accesses
    • platform: Linux, Solaris
  applications
    • Email call center
    • Web call center
    • Automatic FAQ knowledge base construction
    • CRM analysis
  competitors
    • Brightware
    • Egains
dbAnswer
  properties
    • SQL feature computation
    • Automatic vocabulary construction
    • Optimized for given RDB schema
  performance
    • 100% retrieval accuracy
    • Over 100,000 records (0.3 sec response)
    • platform: Linux, Solaris
  applications
    • Product search for e-commerce (B2B, B2C, B2G)
    • Employee portal/business portal
    • Intranet/KM DB search
  competitors
    • Easyask
    • ELF/Microsoft
DiQuest d-Answer: Vertical IR agent with answer extraction
High precision optimizable IR engine
  Horizontal IR limitations: focusing on high speed indexing, sacrificing high precision
  Why Vertical IR?
    User intention analysis using language processing
    Optimization possible for specific domain/portal
Intelligent IR engine for answer sentence extraction
  Conventional natural language IR (e.g. askjeeves) limitations
    Only provides documents which possibly include query terms
    It is the USER who needs to find exact information in the documents
  Why Q/A System?
    Provide direct answers (information) rather than thousands of documents
    Towards true meaning of information retrieval (next generation IR)
DiQuest d-Answer
[Figure: DiQuest d-Answer flow from query analysis to result (file and answer); example answers shown: “R Building”, “Engineering Building, 7th floor”]
DiQuest d-Answer merits
Spectrum of solutions, from high-precision IR to an intelligent question-answering system with natural language dialog queries
Web site question-answering engine: extracts sentences that contain possible answers, as well as documents, for users’ questions
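DiQuest d-Answer: question examples
“삼성그룹 회장은?” (Who is the chairman of Samsung group?)
“야후코리아의 홈페이지 주소와 김경희 팀장의 이메일은?” (What are Yahoo Korea's homepage address and team leader Kim Kyung-hee's email address?)
“야후코리아의 사장은 누구인가” (Who is the president of Yahoo Korea?)
“윈도우 미의 가격?” (What is the price of Windows ME?)
“물건 반납에 관한 것을 상담하려면 어디에 전화해야 하나요?” (Where should I call to ask about returning a product?)
“화공과는 어디에 있나요” (Where is the CE dept.?)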
DiQuest d-Answer Preview
Answer Suggestions
DiQuest SiteQ – Natural Language Answer Extraction
System Architecture
DiQuest e-Answer
FAQ finding engine
FAQ: frequently asked question knowledge base (question/answer pairs)
80% of user questions can be processed using well-constructed FAQ lists
Automatically finds optimized answers from FAQ lists
Reduces email/phone calls through automatic FAQ finding (increased customer satisfaction)
Finding semantically identical questions in the FAQ knowledge base
Exact pinpointing of users’ question intentions
Structural analysis of sentences to find same-meaning questions
Highly precise retrieval using specialized analysis of the question and answer parts of the FAQ KB
Conventional keyword IR techniques cannot retrieve semantically identical questions!
Intelligent answering agent with FAQ
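The point about keyword IR missing same-meaning questions can be illustrated with a small sketch. The slides name structural sentence analysis as the technique but give no details, so the sketch swaps in something much simpler: mapping words to shared concept labels through a hand-written synonym table, then comparing questions by Jaccard overlap. The table, the threshold and the FAQ entries are all assumptions made for the example.

```python
import re

# Hypothetical synonym table: surface word -> shared concept label.
SYNONYMS = {
    "refund": "return", "returning": "return", "return": "return",
    "contact": "call", "phone": "call", "call": "call",
    "cost": "price", "price": "price",
}
STOPWORDS = {"a", "an", "the", "is", "of", "to", "do", "i", "what",
             "where", "who", "should", "about", "how", "for"}

def normalize(question):
    """Turn a question into a set of concept labels."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return {SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS}

def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def find_faq(user_question, faq, threshold=0.3):
    """Return the best (question, answer) pair, or None below the threshold."""
    query = normalize(user_question)
    best_score, best_pair = 0.0, None
    for question, answer in faq:
        score = jaccard(query, normalize(question))
        if score > best_score:
            best_score, best_pair = score, (question, answer)
    return best_pair if best_score >= threshold else None

# Made-up FAQ entries, for illustration only.
faq = [
    ("Where do I call to return a product?", "Call the customer center."),
    ("What is the price of shipping?", "Shipping is free."),
]
match = find_faq("Who should I contact about a refund?", faq)
print(match)
```

The user question shares no content word with the stored FAQ question, so a raw keyword match scores zero. After concept normalization the two overlap on "call" and "return", and the stored pair is found.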
DiQuest e-Answer Preview (1)
Answer Suggestion
DiQuest e-Answer Preview (2)
e-Answer combined with d-Answer
Answer Suggestion
System Architecture
DiQuest FAQ Finder
DiQuest DB-Answer
Database search engine
Information retrieval in a relational database (using SQL computation)
Automatic term indexing by analyzing the running database
Translates users’ natural language questions into standard SQL for relational database computing
Recursive natural language queries (automatic query refinement)
Fusion solutions with e-Answer and d-Answer
Integrated search over product description texts and the product database
Integrated search over web documents and highly variable data in a structured database
Intelligent RDB search engine
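Translating a natural language question into SQL can be sketched as follows. The slides do not say how DB-Answer performs the translation, so this is only a toy stand-in. It assumes a handful of hand-written question patterns, each paired with a parameterized SQL template, and runs them against an in-memory SQLite table. The patterns, the `products` schema and the prices are invented for the example.

```python
import re
import sqlite3

# Hypothetical question patterns, each paired with a parameterized SQL template.
PATTERNS = [
    (re.compile(r"what is the price of (?P<name>.+?)\??$", re.I),
     "SELECT price FROM products WHERE name = ?"),
    (re.compile(r"which products cost less than (?P<limit>\d+)\??$", re.I),
     "SELECT name FROM products WHERE price < ? ORDER BY name"),
]

def translate(question):
    """Return (sql, params) for the first matching pattern, else None."""
    for pattern, sql in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            # Numeric slots become integers; everything else stays text.
            params = tuple(int(v) if v.isdigit() else v
                           for v in match.groupdict().values())
            return sql, params
    return None

def answer(conn, question):
    """Translate the question and run it; unknown questions yield no rows."""
    translated = translate(question)
    if translated is None:
        return []
    sql, params = translated
    return conn.execute(sql, params).fetchall()

# Made-up table contents, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("Windows ME", 209), ("Mouse", 15), ("Keyboard", 30)])

print(answer(conn, "What is the price of Windows ME?"))
print(answer(conn, "Which products cost less than 50?"))
```

Slot values are passed as bound parameters rather than spliced into the SQL string, which keeps user text out of the query itself. A real system would also need the term indexing and query refinement listed on the slide, which this sketch leaves out.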
DiQuest DB-Answer Preview (1)
SQL generation after natural language query analysis
DiQuest DB-Answer Preview (2)
Maintaining discourse over previous results
DiQuest DBQ – Natural Language SQL Interface
System Architecture
Web Total Q/A System Architecture
E-commerce applications
SAA (Shopping Aid Agent) – web mining back-end solution
Web robots
Category-specific web crawling (removes duplicates)
Categorizer
Categorizes web documents into pre-defined domain classes
Extractor
Web information extraction to build a relational DB
Extraction using mDTD (modified Document Type Definition)
Sequential mDTD learning to generate new mDTD rules
Natural language queries against the automatically constructed RDB
Comparison-based shopping, automatic job search, continued education
Whizbanglabs.com (from CMU)
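The categorizer step can be sketched with a k-nearest-neighbour classifier, which is the method the SAA architecture diagram labels for this component. The slides give no further detail (including what "bi-categorizing" means there), so the sketch assumes plain bag-of-words vectors, cosine similarity and a majority vote. The training snippets and class names are invented, loosely following the shopping, job search and education domains mentioned above.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_categorize(text, labeled_docs, k=3):
    """Label a document by majority vote among its k most similar neighbours."""
    query = vectorize(text)
    ranked = sorted(labeled_docs,
                    key=lambda pair: cosine(query, vectorize(pair[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up training documents, for illustration only.
training = [
    ("laptop price discount buy now free shipping", "shopping"),
    ("camera sale price order online store", "shopping"),
    ("software engineer job opening apply resume", "jobs"),
    ("hiring manager position salary apply", "jobs"),
    ("evening course enrollment tuition certificate", "education"),
]
label = knn_categorize(
    "apply for this engineer position, salary negotiable", training)
print(label)
```

With `k=3` the two job postings outvote the nearest unrelated document, so the page is routed to the jobs domain before any extraction rules are applied.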
E-commerce SAA
Basic idea
[Figure: basic idea. Labels: SGML documents; DTD (DocType Definition); analysis & encoding; training documents (structured HTML); mDTD; learning; extraction; web documents (structured and semi-structured)]
E-commerce SAA
[Figure: SAA architecture. Components: Web Robot (seed URLs, HTML gathering, HTML documents); Categorizer (kNN bi-categorizing, domain documents); Sequential mDTD Learner (seed mDTDs, structured documents, mDTD parsing, example extraction, sequential learning, learned mDTD); Extractor (structured/semi-structured documents, mDTD parsing with the learned mDTD, extraction and slot filling, DB building, domain DB for AV)]
Contents
Commercial e-solutions: search, QA, CRM
Natural Language Processing Technology
Information Retrieval Technology
Intelligent QA solutions
Conclusions
NLP QA/vertical search applications
Internet/intranet vertical retrieval
eCRM/web-based CRM (automated call center)
Comparison based e-shopping mall/meta mall
WAP enabled PDA/cell phone retrieval
KMS embedded solutions
Voice enabled retrieval/ voice portal retrieval
Future perspectives
Long-term future
Apple’s bow-tied man -- new millennium dream
SF films -- “Angel” in the movie “Disclosure”
HAL in “2001: A Space Odyssey” (forever dream?)
Short-term future
General Magic’s Portico system (http://www.genmagic.com/portico/portico_home.shtml)
Microsoft Persona project -- Peedy (http://msdn.microsoft.com/workshop/c-frame.htm#/workshop/imedia/agent/default.asp)
DiQuest.com – total QA solution (www.diquest.com demo)