Center of Signal and Image Processing, Georgia Institute of Technology
Discovering Knowledge in and Extracting Information from
Multimedia Patterns
Chin-Hui Lee, School of ECE, Georgia Institute of Technology
Atlanta, GA 30332, USA, chl@ece.gatech.edu
(Most work was done at Bell Labs; some was done while visiting NUS in 2001-2002)
ISIMP2004, HK PolyU, Oct. 21, 2004
Outline
• Rich content of heterogeneous media patterns
– Text, audio, video, speech, image, object, graphics, sketch, etc.
– The Web is becoming the largest multimedia database and playground
• 4M in human information processing technology
– Multimedia, multi-modal, multi-lingual, multi-disciplinary
• Technology dimensions (more language engineering)
– Parametrization, feature extraction, modeling, segmentation, etc.
– Coding, synthesis, recognition, verification, understanding, etc.
• Knowledge discovery and information extraction
– From spotting cues and events to understanding media patterns
• Summary and emerging opportunities
Evolution of Language and Media
[Figure: historic flow of knowledge and civilization. Milestones shown: spoken language; written language (3000 BC); paper; print (1450 AD); telegraph & telephone; electronic media (1900 AD): radio, recording media, TV; computer & digital processing; Internet & WWW; hyper & virtual media? (21st century)]
Growth in Network Traffic
[Figure: voice and data traffic in terabytes per day, 1996 to 2001]
Voice traffic: 576 TB/day
Data traffic: 1178 TB/day at YE’00, 2136 TB/day at YE’01
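The data-traffic figures imply roughly 81% growth in a single year. A minimal check of that arithmetic, assuming YE’00 and YE’01 are consecutive year-end snapshots:

```python
def growth_rate(start: float, end: float) -> float:
    """Fractional change from start to end (0.81 means +81%)."""
    return end / start - 1.0


# Data-traffic figures quoted on the slide, in TB/day.
data_ye00 = 1178
data_ye01 = 2136

rate = growth_rate(data_ye00, data_ye01)
print(f"Data traffic growth, YE'00 to YE'01: {rate:.1%}")  # 81.3%
```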
The Internet Explosion
[Figure: growth of Internet hosts and traffic]
• Traffic CAGR since 1998: 100%
• Internet hosts: 75,000,000
• Web pages: 2,000,000,000
• Worldwide users: 275,000,000
Heterogeneous Multimedia Pages
Rich Content:
• Audio
• Video
• Image
• Speech
• Graphics
• Objects
• Comic Strips
• Files.xxx
• Links
• Multilingual
Picasso’s “Parade” (1917)
A Picture is worth more than a thousand words?
On display at IFC, HK
Ubiquitous Wireless Access (Mobile Info Access and Transactions)
[Figure: devices reach services over an air access interface and a network: the Internet, corporate networks, LANs]
Any wireless device, any air interface, any desired network, any service
Multimodal Access of Multimedia DBs (Research & Business Opps for Info Intelligence)
[Figure: multimodal access architecture. Components: user input (keyboard, speech, MM-pad), user model, speech recognizer, text processing, user intent understanding, raw A/V database, multimedia processing, audio/video recognizer, indexing (video, multimedia, audio, text), indexed A/V database, info fusion & retrieval, Q&A dialogue, network, multimedia presentation, audio/video rendering, A/V browser, information appliance, user feedback]
Human Information Technologies & 4M
• Multimedia Documents
– Audio, video, speech, image, text, chart, map, etc.
– Indexing, retrieval, presentation, rendering, etc.
• Multi-Modal Human-Machine Interface (HCI)
– Speech, gesture, point ‘n’ click, pen, MM sketch pad, etc.
– Multiple sensory inputs and feedback
• Multilingual Information Sources
– Multilingual human language understanding
– Multilingual presentation, cross-language referencing
• Multidisciplinary Collaborative Research
– Engineers, scientists, artists, psychologists, etc.
– Human factors, behavior science, a wide range of soft topics
Human Language Engineering Abstraction
• Modeling of Input-Output Relationship
– Shannon’s Channel Modeling and Decoding Paradigm
• Signal Processing of Linguistic Features
– e.g., latent semantic analysis and vector space representation
• Similarity Measures between Documents
– Clustering and modeling of linguistic events
• Machine Learning Techniques for Classifier Design
• Document Classification, Verification, Understanding
– Many research and business opportunities
Vector Space Representation of Queries & Documents (Latent Semantic Indexing)
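[Figure: a query vector q and document vectors d_i in the vector space; example document classes: Credit Card Services, Deposit Services, Consumer Lending, Home Equity Service, Loan Servicing]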
Query Vector Feature Extraction
• Text Pre-processing (SMART, Salton, 1971)
– extract the root form of a word, e.g., “check” for “checking”
– remove ignore words, e.g., “um”, “uh”
– remove stop words, e.g., “I would like to”
– count occurrences of the remaining key terms
[Figure: speech → ASR → text → morphological filtering (stop/ignore list) → query-vector extraction (key term list) → query vector]
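A minimal Python sketch of the pre-processing steps above, applied to an already transcribed query. The ignore list, stop phrase, key-term list, and crude suffix-stripping stemmer are hypothetical stand-ins, not SMART's actual lists or stemmer; stop phrases are removed before stemming so they match their surface forms.

```python
from collections import Counter

# Hypothetical lists for illustration only; a deployed system would use
# curated ignore/stop lists and a key-term list derived from training data.
IGNORE_WORDS = {"um", "uh"}
STOP_PHRASES = [("i", "would", "like", "to")]
KEY_TERMS = {"check", "balance", "loan", "card"}


def crude_stem(word: str) -> str:
    """Rough suffix stripping; an assumption, not the SMART stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def remove_stop_phrases(tokens, phrases):
    """Drop every occurrence of any multi-word stop phrase."""
    out, i = [], 0
    while i < len(tokens):
        for phrase in phrases:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                i += len(phrase)
                break
        else:  # no phrase matched at position i
            out.append(tokens[i])
            i += 1
    return out


def query_vector(text: str) -> Counter:
    """Key-term counts for one (transcribed) query."""
    tokens = [t for t in text.lower().split() if t not in IGNORE_WORDS]
    tokens = remove_stop_phrases(tokens, STOP_PHRASES)
    stems = [crude_stem(t) for t in tokens]
    return Counter(s for s in stems if s in KEY_TERMS)


print(query_vector("um I would like to check checking and loans"))
# Counter({'check': 2, 'loan': 1})
```

Only key terms are counted, so indexing the counter against the key-term list yields the sparse M×1 query vector used in the following slides.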
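LSA Based Feature Extraction
• LSA Matrix (also known as Routing Matrix) C, with entries $c_{ij}$
– $n_{ij}$: number of times word $w_i$ occurs in document $A_j$
– $n_j = \sum_i n_{ij}$ (column sum): total number of words present in $A_j$
– $n_i = \sum_j n_{ij}$ (row sum): total number of times $w_i$ occurs in corpus A
– $\eta_i = 1 - \varepsilon_i$: “indexing” power of $w_i$ in corpus A
– $\varepsilon_i$: normalized entropy of $w_i$ over the N documents
$$c_{ij} = (1 - \varepsilon_i)\,\frac{n_{ij}}{n_j}$$
$$\varepsilon_i = -\frac{1}{\log N}\sum_{j=1}^{N}\frac{n_{ij}}{n_i}\log\frac{n_{ij}}{n_i}, \qquad 0 \le \varepsilon_i \le 1$$
$$\varepsilon_i = \begin{cases} 0 & \text{if } n_{ij} = n_i \text{ for some } j \text{ (maximum indexing power)} \\ 1 & \text{if } n_{ij} = n_i / N \text{ for all } j \text{ (no indexing power, equally probable)} \end{cases}$$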
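A NumPy sketch of the weighting above, assuming a dense M×N count matrix with N > 1 and no all-zero rows or columns. The toy counts are made up to show both extremes of $\varepsilon_i$: a word confined to one document keeps full weight, and a word spread evenly gets weight 0.

```python
import numpy as np


def lsa_matrix(counts) -> np.ndarray:
    """Entropy-weighted LSA matrix with c_ij = (1 - eps_i) * n_ij / n_j.

    counts[i, j] = n_ij, the number of times word w_i occurs in document A_j
    (an M x N array). Assumes N > 1 and no all-zero rows or columns.
    """
    counts = np.asarray(counts, dtype=float)
    N = counts.shape[1]
    n_j = counts.sum(axis=0)            # column sums: words in A_j
    n_i = counts.sum(axis=1)            # row sums: occurrences of w_i in A
    p = counts / n_i[:, None]           # n_ij / n_i
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)   # convention: 0 log 0 = 0
    eps = -plogp.sum(axis=1) / np.log(N)              # normalized entropy in [0, 1]
    return (1.0 - eps)[:, None] * counts / n_j[None, :]


counts = np.array([[4, 0, 0],    # w_0 occurs in one document only: eps = 0
                   [1, 1, 1],    # w_1 spread evenly: eps = 1, row weighted to 0
                   [2, 2, 0]])   # w_2 in between
C = lsa_matrix(counts)
print(np.round(C, 3))
```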
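LSA Feature Space
• Mapping into the latent semantic space S through a rank-R singular value decomposition (SVD) of the M×N matrix C:
$$C \approx T\,S\,D^{t}$$
where T is M×R, S is the R×R diagonal matrix of singular values, and D is N×R (typical sizes: M ≈ 10,000, N ≈ 100,000, R ≈ 150 to 200)
– each document vector $a_j$ (one of the N column vectors of C) is mapped to a 1×R vector $v_j = d_j S$, where $d_j$ is the j-th row of D
– each term vector $b_i$ (one of the M row vectors of C) is mapped to a 1×R vector $u_i = t_i S$, where $t_i$ is the i-th row of T
– each query vector q (a new M×1 vector) is mapped to a 1×R vector through the pseudo-document vector $d_q = q^{t}\,T\,S^{-1}$, giving $v_q = d_q S = q^{t}\,T$
– closeness in the S space is much easier to measure, for both document-document and term-term comparisons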
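A NumPy sketch of the mapping, using `numpy.linalg.svd` and truncating to R dimensions; the random matrix is only a stand-in for a real weighted matrix C. Folding an existing column of C back in through the pseudo-document formula reproduces that document's vector $v_j$, which makes a handy sanity check.

```python
import numpy as np


def lsa_space(C: np.ndarray, R: int):
    """Rank-R SVD C ~ T S D^t. Returns T (M x R), s (the R singular values), D (N x R)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :R], s[:R], Vt[:R, :].T


def map_terms(T, s):
    return T * s                  # row i is u_i = t_i S


def map_documents(D, s):
    return D * s                  # row j is v_j = d_j S


def map_query(q, T):
    # pseudo-document d_q = q^t T S^{-1}; scaling it by S gives v_q = q^t T
    return q @ T


def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


rng = np.random.default_rng(0)
C = rng.random((6, 4))            # toy 6-term x 4-document matrix
T, s, D = lsa_space(C, R=2)
V = map_documents(D, s)
vq = map_query(C[:, 1], T)        # fold in document 1 as if it were a query
print(cosine(vq, V[1]))           # 1.0 up to rounding
```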
Confidence Scoring
• Inner Product: $s(x, y) = x \cdot y^{t}$
• Cosine: $s(x, y) = \dfrac{x \cdot y^{t}}{\|x\|\,\|y\|}$, or the angle $\cos^{-1}[s(x, y)]$
• Confidence Scoring: sigmoid function fitting, $\mathrm{Conf}(s; \alpha, \beta) = \left[1 + e^{-(\alpha s + \beta)}\right]^{-1}$
• Other Scores: Euclidean, Manhattan, etc.
• Generalized Scores between any two vectors: $s(x, y) = f(x, y; \Gamma)$
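A plain-Python sketch of the scores and of sigmoid fitting. Fitting α and β by gradient descent on the log loss is one reasonable choice (an assumption; the slide does not say how the sigmoid is fitted), and the labelled scores are toy values.

```python
import math


def inner(x, y):
    """s(x, y) = x . y^t"""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """s(x, y) = x . y^t / (||x|| ||y||)"""
    return inner(x, y) / math.sqrt(inner(x, x) * inner(y, y))


def confidence(s, alpha, beta):
    """Conf(s; alpha, beta) = 1 / (1 + exp(-(alpha * s + beta))), computed stably."""
    z = alpha * s + beta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid(scores, labels, lr=0.5, iters=5000):
    """Fit alpha, beta by gradient descent on the log loss (labels are 0 or 1)."""
    alpha, beta = 1.0, 0.0
    n = len(scores)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = confidence(s, alpha, beta) - y   # d(log loss)/dz for this sample
            grad_a += err * s
            grad_b += err
        alpha -= lr * grad_a / n
        beta -= lr * grad_b / n
    return alpha, beta


# Toy cosine scores for incorrect (0) and correct (1) decisions.
scores = [0.10, 0.20, 0.30, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 1, 1]
alpha, beta = fit_sigmoid(scores, labels)
print(round(confidence(0.85, alpha, beta), 3), round(confidence(0.15, alpha, beta), 3))
```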
Term Clustering
• $C C^{t}$ characterizes all co-occurrences between terms; the (i, j) cell of $C C^{t}$ reflects the similarity between $w_i$ and $w_j$
• Define a distance measure on the term vectors $u_i = t_i S$:
$$K(w_i, w_j) = \cos(t_i S, t_j S) = \frac{t_i\, S^{2}\, t_j^{t}}{\|t_i S\|\,\|t_j S\|}, \qquad D(w_i, w_j) = \cos^{-1} K(w_i, w_j)$$
• Given D, one can perform word clustering using any clustering algorithm, e.g., K-means
• For document clustering, use $C^{t} C$ instead
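A NumPy sketch of clustering with the angular distance D. Plain K-means averages in Euclidean space, so this uses a spherical K-means variant (an assumption): rows are normalized, assigned to the nearest centroid by D, and centroids are re-estimated as renormalized means. The toy term vectors stand in for rows $u_i = t_i S$.

```python
import numpy as np


def angular_distance(X, Y):
    """D = arccos(cos) between every row of X and every row of Y (rows must be nonzero)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return np.arccos(np.clip(Xn @ Yn.T, -1.0, 1.0))


def cluster_terms(U, k, iters=50, seed=0):
    """Spherical K-means: assign by angular distance, re-estimate unit-length centroids.

    U holds one LSA term vector per row (u_i = t_i S).
    """
    rng = np.random.default_rng(seed)
    units = U / np.linalg.norm(U, axis=1, keepdims=True)
    centers = units[rng.choice(len(U), size=k, replace=False)]
    for _ in range(iters):
        labels = angular_distance(units, centers).argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = units[labels == c]
            if len(members):                      # keep the old center if a cluster empties
                mean = members.mean(axis=0)
                new_centers[c] = mean / np.linalg.norm(mean)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return angular_distance(units, centers).argmin(axis=1), centers


U = np.array([[1.0, 0.1], [1.0, 0.0], [0.9, 0.1],
              [0.0, 1.0], [0.1, 1.0], [0.1, 0.9]])
labels, _ = cluster_terms(U, k=2)
print(labels)
```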
K-Means Term Clustering Example
• 9492 words into 100 clusters (one example)
oub bank Singapore cent uob db account share singtel trade Bangkok manage save entity annual ocbc tangible debt sti keppel custom transact currency deposit card sixth citibank integer subscribe handset creation loan auditor merger autom merge sharehold attract uncondi asx optu sembawang ibra restructur singland landlord uic yaw sgx
Document Clustering Example
• 2000 documents into 100 clusters (one example)
N Korea Proposes Resumed Talks with S Korea-Yonhap
North Korea Proposes Resuming Talks with Seoul
South Korea Set for Key Vote on Approach to North
Korea to Replace Four to Eight Ministers on Friday
S.Korea to Push North Policy Despite Kim Setback
……
Conventional View on PR
[Figure: an unknown pattern d_j is passed to classifiers T_1, …, T_i, …, T_m; the i-th classifier assigns the label L_i(d_j) with a true/false decision]
Modeling and recognition units are the same!
Shannon’s Channel Modeling Paradigm: An Information-Theoretic Perspective
[Figure: input I → channel P(O|I) → output O → channel decoder → estimate Î]
$$\hat{I} = \arg\max_{I \in \Gamma} P(I \mid O) = \arg\max_{I \in \Gamma} \frac{P(O \mid I)\,P(I)}{P(O)}$$
• The channel input is hidden (unobserved), while the output is observed and used to infer the input; in many speech, language, and MM processing problems, the input is approximated by a structural Markov model
• Channel modeling uses (I, O) pairs in training
• Modeling units are usually smaller than recognition units
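A toy Python sketch of channel decoding for an OCR-style task. The prior P(I), the per-character channel model P(O|I), and all probabilities are hypothetical; the point is only that the decoder maximizes P(O|I)P(I) and never needs P(O).

```python
import math

# Hypothetical toy models for an OCR-style channel (illustration only).
PRIOR = {"clear": 0.6, "clean": 0.4}          # P(I): a tiny word "language model"
CONFUSIONS = {("1", "l"): 0.2}                # P(observed char | true char) for known errors
P_SAME, P_OTHER = 0.9, 0.001


def channel(observed: str, hidden: str) -> float:
    """P(O | I) as a product of independent per-character probabilities."""
    if len(observed) != len(hidden):
        return 0.0
    p = 1.0
    for o, i in zip(observed, hidden):
        p *= P_SAME if o == i else CONFUSIONS.get((o, i), P_OTHER)
    return p


def decode(observed: str, candidates=PRIOR) -> str:
    """I_hat = argmax_I P(O | I) P(I); P(O) is the same for every I and is dropped."""
    def score(i):
        p = channel(observed, i) * candidates[i]
        return math.log(p) if p > 0 else -math.inf
    return max(candidates, key=score)


print(decode("c1ear"))   # clear
print(decode("clea?"))   # clear: "?" scores equally against r and n, so the prior decides
```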
Other Applications in Pattern Recognition
Application: input I, output O; P(I); P(O|I)
• Optical Char. Recognition: I = actual letters, O = noisy letters; P(I): character (letter) LM; P(O|I): OCR error model
• Part-of-Speech Tagging: I = POS tag sequence, O = word sequence; P(I): POS tag LM; P(O|I): tagging model
• Parsing: I = parse tree, O = word sequence; P(I): LM of derivations; P(O|I): parsing model
• Text Understanding: I = semantic concept, O = word sequence; P(I): concept LM; P(O|I): semantic model
• Machine Translation: I = source sentence, O = target sentence; P(I): source LM; P(O|I): translation model
• Bioinformatics: I = actual DNA sequence, O = noisy DNA sequence; P(I): LM of nucleotides; P(O|I): bio-genetic model
Modeling Input-Output Associations
• Artificial Neural Network (ANN)
– MLP functional approximation and input-output mapping
• Classification and Regression Tree (CART)
– Multi-layer tree approximation
• Support Vector Machine (SVM) and LVQ
• Kernel-based, mixture of experts, Bayesian network
• Other Machine Learning Techniques
• Many New Applications
– Rule induction, statistical parsing, machine translation, etc.
– Pronunciation modeling and multilingual transliteration
– Information retrieval, text categorization, and call routing
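Hidden Markov Model (HMM): Dynamic Time or Space Warping
$$P_\Lambda(X \mid C) = \sum_{q} P_\Lambda(X, q \mid C)$$
$$P_\Lambda(X, q \mid C) = a_{q_0} \prod_{t=1}^{T} a_{q_{t-1} q_t}\, b_{q_t}(x_t), \qquad X = (x_1, x_2, x_3, \ldots, x_T)$$
where $a_{q_0}$ is the initial-state probability, $a_{q_{t-1} q_t}$ the transition probabilities, and $b_{q_t}(x_t)$ the observation probabilities.
• Each state represents a process of measurable observations.
• Inter-process transition is governed by a finite-state Markov chain.
• Processes are stochastic, and individual observations do not immediately identify the hidden state.
HMM models spectral and temporal variations simultaneously!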
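A NumPy sketch of the forward recursion for a discrete-output HMM, following the formula above with $q_0$ treated as a non-emitting initial state drawn with probability $a_{q_0}$ (a reading of the slide's notation that the slide itself does not spell out). A brute-force sum over all state sequences is included to check that the recursion computes the same $P_\Lambda(X \mid C)$; the model parameters are toy values.

```python
import itertools

import numpy as np


def forward_likelihood(a0, A, B, obs):
    """P(X | C) = sum_q a_{q_0} prod_t a_{q_{t-1} q_t} b_{q_t}(x_t), via the forward recursion.

    a0:  (N,) probabilities of the initial state q_0 (non-emitting here)
    A:   (N, N) transition matrix, A[i, j] = a_ij
    B:   (N, K) discrete output probabilities, B[j, k] = b_j(k)
    obs: symbol indices x_1, ..., x_T
    """
    alpha = (a0 @ A) * B[:, obs[0]]            # alpha_1(j)
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]          # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij b_j(x_t)
    return float(alpha.sum())


def brute_force_likelihood(a0, A, B, obs):
    """Same quantity by enumerating every state sequence q_0, ..., q_T (exponential cost)."""
    N, total = len(a0), 0.0
    for q in itertools.product(range(N), repeat=len(obs) + 1):
        p = a0[q[0]]
        for t, x in enumerate(obs, start=1):
            p *= A[q[t - 1], q[t]] * B[q[t], x]
        total += p
    return total


a0 = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
X = [0, 1, 2]
print(forward_likelihood(a0, A, B, X), brute_force_likelihood(a0, A, B, X))
```

The recursion costs O(N²T) instead of the O(N^(T+1)) enumeration, which is what makes summing over all hidden state paths practical.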
Text Categorization: Training Classifiers
[Figure: a training set for each category C_i, i = 1, …, m (positive P_i + negative N_i) passes through (1) feature extraction & reduction, giving documents in a new feature space, then (2) classifier learning, giving classifier T_i for category C_i]
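A sketch of the per-category training loop, with a perceptron standing in for step (2) (an assumption; any binary learner could be plugged in). Each category $C_i$ gets its own positive set $P_i$ and negative set $N_i$ (all other documents), and each learned classifier is a linear discriminant of the form $\sum_i w_i x_i - w_0$. The toy 3-dimensional feature vectors stand in for the output of step (1).

```python
import numpy as np


def perceptron(X, y, epochs=1000):
    """Learn W and w_0 for f(X, W) = sum_i w_i x_i - w_0 (y holds +1 / -1)."""
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            if t * (x @ w - w0) <= 0:          # wrong side of, or on, the boundary
                w += t * x
                w0 -= t
                mistakes += 1
        if mistakes == 0:                      # training set is separated
            break
    return w, w0


def train_one_vs_rest(X, labels):
    """One classifier T_i per category C_i: P_i = docs labelled C_i, N_i = all others."""
    return {c: perceptron(X, np.where(labels == c, 1.0, -1.0))
            for c in np.unique(labels)}


def accepts(model, x):
    w, w0 = model
    return bool(x @ w - w0 > 0)


# Toy documents already in a (hypothetical) 3-dimensional feature space.
X = np.array([[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 2, 1], [0, 0, 3], [1, 0, 2]], dtype=float)
labels = np.array(["a", "a", "b", "b", "c", "c"])
models = train_one_vs_rest(X, labels)
print({c: accepts(m, np.array([2.5, 0.5, 0.0])) for c, m in models.items()})
```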
Related Work on Classifier Design
• Decision Tree: simple, popular, and powerful classifier; many available tools (C4.5, CART, ID3)
• Linear discriminative function:
$$f(X, W) = \sum_{i=1}^{D} w_i x_i - w_0$$
• Support Vector Machine (SVM)
• Naïve Bayes: simple distributions for each class
• K-Nearest Neighbor (kNN)
• Semantic Perceptron Net (SPN)
• Hidden Markov Model (HMM)
• Discriminative Training
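Of the methods listed, kNN is the simplest to sketch. A minimal version with cosine similarity and majority voting; the toy training vectors and the choice k = 3 are assumptions for illustration.

```python
from collections import Counter

import numpy as np


def knn_classify(train_X, train_labels, x, k=3):
    """Label x by majority vote among its k most cosine-similar training documents."""
    sims = train_X @ x / (np.linalg.norm(train_X, axis=1) * np.linalg.norm(x))
    nearest = np.argsort(-sims)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_X = np.array([[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 2, 1], [0, 0, 3], [1, 0, 2]], dtype=float)
train_labels = ["a", "a", "b", "b", "c", "c"]
print(knn_classify(train_X, train_labels, np.array([2.5, 0.2, 0.0])))   # a
```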
Reading Tables in Documents (TTS)
| COMPANY | TODAY'S OPEN | TODAY'S CHANGE | YESTERDAY'S OPEN | YESTERDAY'S CHANGE |
|---|---|---|---|---|
| BLUE INC | 75 1/2 | +1 1/8 | 74 9/16 | -4 1/4 |
| GREEN.COM | 89 1/4 | +2 | 88 5/8 | -2 13/16 |
| RED INC | 22 1/4 | +5/16 | 21 13/16 | -3/8 |
| YELLOW LTD | 103 3/8 | -1 13/16 | 101 | -4 |
| PURPLE INC | 27 11/16 | -2 5/8 | 27 5/8 | -1 1/8 |
| BROWN.COM | 68 | +11/16 | 66 11/16 | -1 5/8 |
| PINK LTD | 130 7/16 | +1 1/16 | 130 | -2 3/8 |

Document understanding is needed before rendering!
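As a small illustration of the understanding step, the sketch below parses the fractional quotes in such a table into exact values and produces a simple spoken form. The "up"/"down" wording and the denominator names are assumptions about how a TTS front end might render these cells; a real system would also expand the digits into words.

```python
from fractions import Fraction

# Spoken names for the denominators that appear in the table (an assumption).
DENOMINATORS = {2: ("half", "halves"), 4: ("quarter", "quarters"),
                8: ("eighth", "eighths"), 16: ("sixteenth", "sixteenths")}


def parse_quote(text: str) -> Fraction:
    """Parse '75 1/2', '+ 1 1/8', '- 3/8' or '101' into an exact signed Fraction."""
    tokens = text.split()
    sign = 1
    if tokens and tokens[0] in ("+", "-"):
        sign = -1 if tokens[0] == "-" else 1
        tokens = tokens[1:]
    return sign * sum((Fraction(t) for t in tokens), Fraction(0))


def say_fraction(frac: Fraction) -> str:
    singular, plural = DENOMINATORS[frac.denominator]
    if frac.numerator == 1:
        article = "an" if singular[0] in "aeiou" else "a"
        return f"{article} {singular}"
    return f"{frac.numerator} {plural}"


def speak_quote(text: str) -> str:
    """Very simple spoken form; digits are left for a later number-expansion step."""
    value = parse_quote(text)
    direction = "down " if value < 0 else ("up " if text.strip().startswith("+") else "")
    whole, frac = divmod(abs(value), 1)
    parts = [str(whole)] if whole else []
    if frac:
        parts.append(say_fraction(frac))
    return direction + (" and ".join(parts) or "0")


for cell in ["75 1/2", "+ 1 1/8", "74 9/16", "- 4 1/4", "+ 2"]:
    print(cell, "->", speak_quote(cell))
```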
Web Information Access & Presentation
[Figure: a news page (HTML) with the headline “Sampras volunteers for Davis Cup doubles duty”, with regions marked as news content (text), summary, and links]
• Web data mining
• Web content extraction
• Topic detection and automatic summarization
• Information rendering and presentation
• Q&A construction for natural interface
Image Segmentation & Annotation
• Concept definition needed?
• What is image understanding?
“Building, sky, lake, tree, landscape”
Concept vs. Content Based Search
[Figure: Google search results shown alongside ATR concept search based on a query taxonomy]
Multilingual IA (IIS/Taiwan): top 4 keywords for each image [images not reproduced]
彩虹 (Rainbow), 天氣 (Weather), 花 (Flower), 自然 (Nature)
向日葵 (Sunflower), 花 (Flower), 植物 (Plant), 沙漠 (Desert)
海豹 (Seal), 哺乳類 (Mammal), 海岸 (Coast), 動物 (Animal)
太陽系 (Solar System), 慧星 (Comet), 熱帶魚 (Tropical Fish), 太空 (Universe)
瀑布 (Waterfall), 地形 (Landform), 自然 (Nature), 蟑螂 (Cockroach)
狗 (Dog), 哺乳類 (Mammal), 穿山甲 (Pangolin), 羊 (Sheep)
Cross-Language Web Search (IIS/Taiwan)
• A Web search service that allows users to query in one language and search documents written or indexed in another language.
Audio Segmentation & Annotation (DP-based, often involving segmental models such as HMMs)
[Figure: an audio stream segmented into labeled audio and speech regions]
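A sketch of DP-based segmentation: Viterbi decoding of a two-state HMM (non-speech vs. speech) over frame-level symbols, followed by collapsing the state path into segments. The model, its probabilities, and the quantization of frames into low/high energy are hypothetical.

```python
import numpy as np


def viterbi(log_pi, log_A, log_B, obs):
    """Most likely state path for discrete observations (dynamic programming in the log domain)."""
    T, N = len(obs), len(log_pi)
    delta = log_pi + log_B[:, obs[0]]          # best score ending in each state at frame 0
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A        # scores[i, j]: best path into i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):              # follow the back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def to_segments(path, names):
    """Collapse a frame-level state path into (label, start_frame, end_frame) runs."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((names[path[start]], start, t))
            start = t
    return segments


# Hypothetical 2-state model: 0 = non-speech, 1 = speech; symbols 0 = low, 1 = high frame energy.
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
log_B = np.log([[0.8, 0.2], [0.2, 0.8]])
frames = [0, 0, 0, 1, 1, 1, 1, 0, 0]
print(to_segments(viterbi(log_pi, log_A, log_B, frames), ["non-speech", "speech"]))
# [('non-speech', 0, 3), ('speech', 3, 7), ('non-speech', 7, 9)]
```

The sticky self-transitions (0.9) act as a segment-length prior: isolated frames that disagree with their neighbors are absorbed rather than producing spurious one-frame segments.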
SpeechFind: Speech & Speaker Annotation
Fully searchable online database of spoken-word collections spanning the 20th century
http://svoice.colorado.edu (Bowen Zhou)
Video & Audio Segmentation (Story Segmentation of Audiovisual Documents)
Video Clip Browsing over IP on 3G
From Web Search to Web Mining
• Exploring the Development of Advanced IR Techniques through Web Mining
[Figure: weblogs, texts, images, … feed a search engine and a knowledge discovery & info extraction stage (language info, speaker profile, image semantics, background info, term extraction, face/object ID, etc.)]
• Cross-Language IR
• Concept Search
• Personalized Search
• Multimedia Search
• Anchor Texts
• Query Term Logs
• Query Session Logs
• Audio/Image Banks
Personal Media: A New Scenario
[Figure: personal media scenario with the stages specification, media space, composition, and presentation; components shown include media mining, content analysis, semantic analysis, an authored story, a media server, and navigation]
Summary
• Rich content of heterogeneous media patterns
– Text, audio, video, speech, image, object, graphics, sketch, etc.
– The Web is becoming the largest multimedia database and playground
• 4M in human information processing technology
– Multimedia, multi-modal, multi-lingual, multi-disciplinary
• Technology dimensions
– Parametrization, feature extraction, modeling, segmentation, etc.
– Coding, synthesis, recognition, verification, understanding, etc.
• Knowledge discovery and information extraction
– Spotting cues/events embedded in unconstrained media patterns
• Many emerging research opportunities