Towards Learning Dialogue Structures from Speech Data and Domain Knowledge:
Challenges to Conceptual Clustering using Multiple and
Complex Knowledge Source
Jens-Uwe Moller
Natural Language Systems Division,
Dept. of Computer Science, Univ. of Hamburg
Overview
Dialog modeling is based on a set of units called dialog acts. Dialog acts taken from theory do not fit a specific domain, and labeling dialogs by hand is time-consuming and subjective. The approach is therefore to learn application-specific dialog acts from speech data using conceptual clustering.
The learning task
Learning dialog acts from turns. Unsupervised classification (no prior definition of dialog acts is given). Hierarchical classification with inspectable classifying rules.
Features
Domain knowledge: the structure of the task, with task knowledge represented by goals and plans. Word recognizer: word hypotheses. Prosodic data: pauses and stress mark important units. Lexical semantics. Syntax (less important in spoken dialog). Semantics (larger units of lexical semantics).
COWEB
A symbolic machine learning algorithm that builds a classification tree. Distinctions between subnodes are made from a function over all attributes. It supports probabilistic data, supports multiple overlapping hierarchies (for ambiguous cases), and can handle multiple entries of one attribute (e.g. a stream of words).
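The slide does not say which function over all attributes COWEB uses. Assuming it follows the category utility of COBWEB for nominal attributes, a minimal sketch of scoring one candidate split of a node might look like this. The instance encoding, the attribute names, and the function names are hypothetical, and COWEB's extensions for probabilistic data and repeated attribute entries are not modeled.

```python
from collections import Counter
from typing import Dict, List

Instance = Dict[str, str]  # attribute name -> nominal value (hypothetical encoding)

def expected_correct_guesses(instances: List[Instance]) -> float:
    """Sum over attributes and values of P(attribute = value)^2 within a set of instances."""
    n = len(instances)
    counts = Counter((attr, val) for inst in instances for attr, val in inst.items())
    return sum((c / n) ** 2 for c in counts.values())

def category_utility(partition: List[List[Instance]]) -> float:
    """COBWEB-style category utility of splitting a parent node into the given children."""
    parent = [inst for child in partition for inst in child]
    n = len(parent)
    baseline = expected_correct_guesses(parent)
    # Weighted gain in predictability of attribute values, averaged over the children.
    gain = sum(
        (len(child) / n) * (expected_correct_guesses(child) - baseline)
        for child in partition
        if child
    )
    return gain / len(partition)

# Hypothetical turns described by two nominal attributes.
q1 = {"mood": "question", "topic": "date"}
q2 = {"mood": "question", "topic": "date"}
s1 = {"mood": "statement", "topic": "place"}
s2 = {"mood": "statement", "topic": "place"}

good = category_utility([[q1, q2], [s1, s2]])
bad = category_utility([[q1, s1], [q2, s2]])
print(good, bad)  # 0.5 0.0
```

A split that groups similar turns together scores higher, so a COBWEB-like learner would prefer it when deciding where a new turn goes in the tree.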
COWEB (2)
Learns from simultaneous events. Learns from structured data: conceptual graphs. Learns case descriptions from terminological descriptions. Subsumption serves as the correlation criterion over structured data, e.g. subsumption of individuals to classes.
Metrics for Measuring Domain Independence of
Semantic Classes
Andrew Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee
Dialogue Systems Research Dept., Bell Labs, Lucent Technologies Murray Hill,
NJ, USA
Introduction
Employ semantic classes (concepts) from another domain. This requires identifying domain-independent concepts based on a comparison across domains. Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains.
Comparing concepts across domains
Concept-comparison method
Concept-projection method
Concept-comparison method
Find the similarity between all pairs of concepts across the two domains. Two concepts are similar if their respective bigram contexts are similar, as measured with left- and right-context bigram language models.
Kullback-Leibler (KL) distance
Compare how San Francisco and Newark are used in the Travel domain with how comedies and westerns are used in the Movie domain.
Distance between two concepts
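The slides do not spell out the distance formula. As a minimal sketch, assuming each concept is summarized by counts of its left and right neighbouring words, that the two context distributions are compared with a symmetric KL distance under add-0.5 smoothing, and that the left and right distances are simply summed, the computation could look like this. Multi-word values such as san francisco are pre-joined into one token, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

def context_counts(sentences: Iterable[List[str]], concept: Set[str]) -> Tuple[Counter, Counter]:
    """Count the left and right neighbours of every token that belongs to the concept."""
    left, right = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", *tokens, "</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] in concept:
                left[padded[i - 1]] += 1
                right[padded[i + 1]] += 1
    return left, right

def symmetric_kl(c1: Counter, c2: Counter, alpha: float = 0.5) -> float:
    """KL(p||q) + KL(q||p) with add-alpha smoothing over the union of observed contexts."""
    vocab = set(c1) | set(c2)
    n1 = sum(c1.values()) + alpha * len(vocab)
    n2 = sum(c2.values()) + alpha * len(vocab)
    total = 0.0
    for w in vocab:
        p = (c1[w] + alpha) / n1
        q = (c2[w] + alpha) / n2
        total += (p - q) * math.log(p / q)  # equals p*log(p/q) + q*log(q/p)
    return total

def concept_distance(sents_a, concept_a, sents_b, concept_b) -> float:
    """Sum of the left-context and right-context symmetric KL distances."""
    la, ra = context_counts(sents_a, concept_a)
    lb, rb = context_counts(sents_b, concept_b)
    return symmetric_kl(la, lb) + symmetric_kl(ra, rb)

travel = [["fly", "to", "san_francisco"], ["fly", "to", "newark"]]
movie = [["show", "me", "comedies"], ["show", "me", "westerns"]]

same_domain = concept_distance(travel, {"san_francisco"}, travel, {"newark"})
cross_domain = concept_distance(travel, {"san_francisco", "newark"}, movie, {"comedies", "westerns"})
```

In this toy data same_domain is 0.0 because both cities have identical contexts, while cross_domain is positive because the cities follow "to" and the genres follow "me".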
Concept-projection method
Measures how well a single concept from one domain is represented in another domain, e.g. how the words comedies and westerns are used in both domains. Useful for identifying the degree of domain independence of a particular concept.
Result: Concept-comparison
Result: Concept-projection
Concept Example
Semi-Automatic Acquisition of Domain-Specific Semantic
Structures
Siu K.C., Meng H.M.
Human-Computer Communications Laboratory
Department of Systems Engineering
and Engineering Management
The Chinese University of Hong Kong
Grammar induction
Uses unannotated corpora and is portable across domains and languages. The output grammar has reasonable coverage of within-domain data and rejects out-of-domain data. It is amenable to interactive refinement by a human and supports optional injection of prior knowledge.
Spatial clustering
Uses the Kullback-Leibler distance over left and right contexts. Words w1, w2 (later classes c1, c2) are compared pairwise, considering only words that occur at least a preset minimum number of times (set to 5).
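A minimal sketch of one spatial-clustering step, assuming the distance between two words is the symmetric KL distance of their left-context distributions plus that of their right-context distributions, with add-0.5 smoothing (the exact formulation in the paper may differ). Only words seen at least MIN_COUNT times are compared, and the closest pair is returned. The names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

MIN_COUNT = 5  # preset minimum occurrence, as on the slide

def word_contexts(corpus):
    """Word frequencies plus left/right neighbour counts for every word."""
    freq = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for tokens in corpus:
        padded = ["<s>", *tokens, "</s>"]
        for i in range(1, len(padded) - 1):
            w = padded[i]
            freq[w] += 1
            left[w][padded[i - 1]] += 1
            right[w][padded[i + 1]] += 1
    return freq, left, right

def symmetric_kl(c1, c2, alpha=0.5):
    """KL(p||q) + KL(q||p) with add-alpha smoothing over the union of contexts."""
    vocab = set(c1) | set(c2)
    n1 = sum(c1.values()) + alpha * len(vocab)
    n2 = sum(c2.values()) + alpha * len(vocab)
    d = 0.0
    for w in vocab:
        p, q = (c1[w] + alpha) / n1, (c2[w] + alpha) / n2
        d += (p - q) * math.log(p / q)
    return d

def closest_pair(corpus, min_count=MIN_COUNT):
    """Return (distance, w1, w2) for the closest pair among sufficiently frequent words."""
    freq, left, right = word_contexts(corpus)
    candidates = sorted(w for w, c in freq.items() if c >= min_count)
    best = None
    for w1, w2 in combinations(candidates, 2):
        d = symmetric_kl(left[w1], left[w2]) + symmetric_kl(right[w1], right[w2])
        if best is None or d < best[0]:
            best = (d, w1, w2)
    return best

corpus = [["fly", "to", "boston"]] * 5 + [["fly", "to", "denver"]] * 5 + [["book", "a", "hotel"]] * 5
print(closest_pair(corpus))  # (0.0, 'boston', 'denver')
```

In the iterative procedure described on the slides, the chosen pair would then be replaced by a class label (c1) in the corpus before the next round; that rewriting step is left out here.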
Temporal clustering
Uses mutual information (MI). The N highest-MI pairs are clustered (N = 5 in the experiment).
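A minimal sketch of one temporal-clustering step, assuming MI is measured as pointwise mutual information between adjacent words using maximum-likelihood unigram and bigram estimates (the paper's exact MI variant may differ). The N highest-scoring pairs are then joined into phrase tokens. Function names and the underscore joining convention are hypothetical.

```python
import math
from collections import Counter

def top_mi_pairs(corpus, n=5):
    """Rank adjacent word pairs by pointwise mutual information and keep the n highest."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {
        (w1, w2): math.log2((c / n_bi) / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)))
        for (w1, w2), c in bigrams.items()
    }
    # Highest MI first; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda pair: (-scores[pair], pair))[:n]

def merge_pairs(corpus, pairs):
    """Join each selected adjacent pair into one phrase token, scanning left to right."""
    selected = set(pairs)
    merged = []
    for tokens in corpus:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in selected:
                out.append(f"{tokens[i]}_{tokens[i + 1]}")
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged

corpus = [["fly", "to", "new", "york"]] * 3 + [["fly", "to", "boston"]] * 3
best = top_mi_pairs(corpus, n=1)
print(best, merge_pairs(corpus, best)[0])  # [('new', 'york')] ['fly', 'to', 'new_york']
```

Plain pointwise MI tends to favour rare pairs, which is one reason a frequency threshold like the one used for spatial clustering is often applied in practice.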
Spatial and temporal clustering are applied iteratively, and the result is post-processed by a human.
Automatic Concept Identification in Goal-Oriented Conversations
Ananlada Chotimongkol and Alexander I. Rudnicky
Language Technologies Institute, Carnegie Mellon University
Concept identification
A first step towards the goal of automatically inferring domain ontologies. Goal-oriented human-human conversation has a clear structure, and this structure can be used to automatically identify domain topics, e.g. for dialog classification.
Clustering algorithm
Hierarchical clustering with two alternative criteria. Mutual-information based: merge so as to minimize the loss of average mutual information. Kullback-Leibler based: merge the word pair with the minimum distance.
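A minimal sketch of the mutual-information criterion, assuming it works like a naive greedy bottom-up clustering over a class bigram model: every word starts in its own class, and at each step the merge that loses the least average mutual information between adjacent classes is applied. This recomputes everything per candidate, so it is only practical for toy vocabularies; all names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def average_mi(bigrams, cls):
    """Average mutual information between adjacent classes under the word->class map cls."""
    joint = Counter()
    for (w1, w2), c in bigrams.items():
        joint[(cls[w1], cls[w2])] += c
    total = sum(joint.values())
    left, right = Counter(), Counter()
    for (c1, c2), c in joint.items():
        left[c1] += c
        right[c2] += c
    # sum of P(c1,c2) * log2( P(c1,c2) / (P_left(c1) * P_right(c2)) )
    return sum(
        (c / total) * math.log2(c * total / (left[c1] * right[c2]))
        for (c1, c2), c in joint.items()
    )

def relabel(cls, keep, drop):
    """Merge class `drop` into class `keep`."""
    return {w: (keep if c == drop else c) for w, c in cls.items()}

def best_merge(bigrams, cls):
    """Return (mi_loss, keep, drop) for the merge that loses the least average MI."""
    base = average_mi(bigrams, cls)
    best = None
    for a, b in combinations(sorted(set(cls.values())), 2):
        loss = base - average_mi(bigrams, relabel(cls, a, b))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best

def cluster(corpus, num_classes):
    """Start with one class per word and merge greedily until num_classes remain."""
    bigrams = Counter()
    for tokens in corpus:
        bigrams.update(zip(tokens, tokens[1:]))
    cls = {w: w for pair in bigrams for w in pair}
    while len(set(cls.values())) > max(num_classes, 1):
        _, keep, drop = best_merge(bigrams, cls)
        cls = relabel(cls, keep, drop)
    return cls

corpus = [["fly", "to", "boston"], ["fly", "to", "denver"], ["go", "to", "boston"], ["go", "to", "denver"]]
bigrams = Counter(pair for tokens in corpus for pair in zip(tokens, tokens[1:]))
first = best_merge(bigrams, {w: w for tokens in corpus for w in tokens})
print(first)  # (0.0, 'boston', 'denver')
```

On a toy corpus several merges can tie at zero loss; the sketch simply takes the first pair in sorted order.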
Evaluation metrics
Reference concepts come from a class-based n-gram model. Each cluster's concept is its majority concept. Metrics: precision, recall, singularity score (SS), and quality score (QS).
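A minimal sketch of the majority-concept labeling with per-cluster precision and recall, assuming precision is the fraction of a cluster's members that belong to its majority concept and recall is the fraction of that concept's reference words captured by the cluster. These definitions are assumptions; SS and QS are not sketched because the slide does not define them. The reference map and word lists are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

def evaluate_clusters(
    clusters: List[List[str]], reference: Dict[str, str]
) -> List[Tuple[Optional[str], float, float]]:
    """Return (majority_concept, precision, recall) for each cluster."""
    concept_sizes = Counter(reference.values())
    results = []
    for members in clusters:
        labels = Counter(reference[w] for w in members if w in reference)
        if not labels:
            results.append((None, 0.0, 0.0))
            continue
        concept, hits = labels.most_common(1)[0]
        precision = hits / len(members)  # words without a reference concept count as errors
        recall = hits / concept_sizes[concept]
        results.append((concept, precision, recall))
    return results

reference = {"boston": "CITY", "denver": "CITY", "newark": "CITY", "monday": "DAY", "friday": "DAY"}
scores = evaluate_clusters([["boston", "denver", "monday"], ["friday"]], reference)
```

Here the first cluster is labeled CITY with precision and recall of 2/3, and the second is labeled DAY with precision 1.0 and recall 0.5.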