Lecture 6
Engineering Knowledge Base Query Agents
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Silei Xu, Sina Semnani, Monica Lam
Stanford CS224v Course: Conversational Virtual Assistants with Deep Learning
Genie: Open Pretrained Assistant
[Architecture diagram: pretrained language models and generic dialogue models (transaction dialogue state machine, DB access dialogues, abstract API dialogues) are combined with grounding primitives (APIs, DB schemas, form-filling instructions) to build agents such as customer support, web-form filling, restaurants, playing songs, and turning on lights.]
Summary: Methodology
• Complete everything that the computer can do
• What if the computer is missing functionality?
John F. Kennedy: “Ask not what your country can do for you but what you can do for your country.”

Our Motto: “Ask not what your user wants to know, but what the computer can tell the user.”
Summary: ThingTalk Query Representation
• Completeness, by deliberately mapping SQL to ThingTalk
  • Drop rename: we can run a join on columns with two different names
  • Unions, differences: supported through an inheritance supertable
• Matches the user's mental model: more natural for humans to understand too
  • SQL is not the most intuitive
• Precise, unambiguous, canonicalized
  • Facilitates training accuracy
• Interoperates with action APIs (to be discussed later)
Summary: Paraphrasing Methodology
• Method
  • Synthesize canonical sentences (synthetic text, logical form)
  • Humans provide paraphrases (paraphrased text, logical form)
  • No need to annotate
• Big plus: reduces the cost of data acquisition
• Limitations:
  • Still expensive
  • Inaccurate paraphrases
  • Lack of variety: resembles the original terminology
  • Does not work with real input
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Why is NL so Hard?
• Alternatives for just 1 fact: “Dr. Smith is Ann’s doctor”

| Relation | Part of Speech (POS) | Unknown: Ann | Unknown: Dr. Smith |
|---|---|---|---|
| Doctor | Has-a | Who has Dr. Smith as a doctor? | Who does Ann have as a doctor? |
| Doctor | Is-a | Who is Dr. Smith a doctor of? | Who is a doctor of Ann? |
| Doctor | Active verb | Whom does Dr. Smith treat? | Who treats Ann? |
| Doctor | Passive verb | Who is treated by Dr. Smith? | By whom is Ann treated? |
| Patient | Has-a | Who does Dr. Smith have as a patient? | Who has Ann as a patient? |
| Patient | Is-a | Who is a patient of Dr. Smith? | Who is Ann a patient of? |
| Patient | Active verb | Who consults with Dr. Smith? | With whom does Ann consult? |
| Patient | Passive verb | By whom is Dr. Smith consulted? | Who is consulted by Ann? |
Why is NL so Hard?
• Alternatives for just 1 fact: “Dr. Smith is Ann’s doctor”
• Type-based terminology
  • Example: the operation “>=”
    • Date/time: “later than”, “after”, …
    • Temperature: “higher than”, “warmer than”, “hotter than”, “over”, …
    • Weight: “heavier than”, “over”, “more than”, …
    • Distance: “farther than”, “longer than”, …
Why is NL so Hard?
• Alternatives for just 1 fact: “Dr. Smith is Ann’s doctor”
• Type-based terminology: “>=”
• Alternative phrasing
  • Example: “restaurants with the highest rating”, “restaurants with best reviews”, “top-rated restaurants”, “best restaurants”
• Domain-specific “shortcuts” in NL
  • Word-level: “my father’s brother” → uncle
  • Sentence-level: “Send a message to the sender of some message” → “reply”
Why is NL so Hard?
• Expressiveness: all database queries
• Variety in saying the same thing:

| | Property Level | Sentence Level |
|---|---|---|
| Domain-independent | Idea 1 | Idea 1 |
| Domain-dependent | Idea 2 | Idea 3 |
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
DB Constructs: Ground English to ThingTalk

| Operation | English Template | ThingTalk | Example |
|---|---|---|---|
| Selection | table with fname equal to value | table, fname = value | restaurants with rating equal to 3 |
| Projection | the fname of table | [fname] of table | the cuisine of restaurants |
| Subquery | the table1 of table2 | table1, in_array(id, any(table1 of table2)) | reviews of restaurant X |
| Join | table1 with their table2 | table1 join table2 | restaurants with their reviews |
| Aggregate | the number of table | count(table) | the number of restaurants |
| Aggregate | the op fname in table | op(fname of table) | the average rating of restaurants |
| Aggregate & group by | the number of table in each fname | count(table by fname) | the number of restaurants |
| Aggregate & group by | the op fname1 in table in each fname2 | op(fname1 of table by fname2) | the average rating of restaurants |
| Ranking | the n table with the min fname | sort(fname asc of table)[1:n] | the 3 restaurants with the min rating |
| Quantifier | table1 with table2 | table1, contains(table2, any(table2)) | restaurants with review with … |
| Quantifier | table1 with no table2 | table1, !contains(table2, any(table2)) | restaurants with no review with … |
| Row-wise function | the distance of table from location | [distance(geo, location)] of table | the distance of restaurants from here |
| Row-wise function | the number of fname in table | [count(fname)] of table | the number of reviews in restaurants |

Canonical English templates: cover all queries.
Idea 1
Add more domain-independent templates for variety
Standard Variation in NL

| Property Type | Comparison |
|---|---|
| Weight | lighter, heavier |
| Height | taller, shorter |
| Age | older, younger |
| Length | shorter, longer |
| Size | smaller, bigger |
| Price | cheaper, more expensive |
| Speed | slower, faster |
| Temperature | colder, hotter |
| Time | earlier, before, later, after |
| Duration | shorter, longer |
| Distance | closer, nearer, farther, more distant |

| Subject Type | Interrogative Word |
|---|---|
| People | Who |
| Object | What |
| Time | When |
| Location | Where |

| Sentence Purpose | Example |
|---|---|
| Declarative | I am looking for … |
| Imperative | Search for … |
| Interrogative | What is … |
| Exclamatory | — |
From English grammar books
NL: Connectives
All of the following express the same two filters, cuisine == “Italian” and rating == 5:
• restaurant that serves Italian cuisine and was rated 5 stars
• restaurant with rating 5 and Italian cuisine
• 5-star restaurant that serves Italian cuisine
• Italian restaurant with 5 stars
• 5-star Italian restaurant
Property-Level Templates
Annotations for the alumniOf property in the people table:

| POS | Annotation | Template | Example utterance |
|---|---|---|---|
| Is-a noun | alumni of <value> | table who are [noun phrase] value | people who are alumni of Stanford |
| Has-a noun | a <value> degree | table with a value [noun phrase] | people with a Stanford degree |
| Active verb | graduated from <value> | table who [verb phrase] value | people who graduated from Stanford |
| Passive verb | educated at <value> | table [passive verb phrase] value | people educated at Stanford |
| Adjective | <value> | value table | Stanford people |
| Prepositional | from <value> | table [prepositional phrase] value | people from Stanford |
Based on POS (Part-of-Speech)
Genie Templates for English
• Kinds of templates
  • Canonical templates: to ground English to all possible queries
  • Templates for attributes based on POS
  • Templates based on kinds of sentences, types, connectives
• Total: 900 templates
Quiz
• 900 templates! How much work is it?
• Is it worth it?
• How many templates are there in other languages?
Synthesis: Using Property Annotations (POS)

| POS | Annotation |
|---|---|
| Is-a noun | alumni of <value> |
| Has-a noun | a <value> degree |
| Active verb | graduated from <value> |
| Passive verb | educated at <value> |
| Adjective | <value> |
| Prepositional | from <value> |

With <table> = people and <value> = Stanford, the POS-based templates expand each annotation into <filtered table> phrases:

| Template | Synthesized phrase |
|---|---|
| <is-a noun> | alumni of Stanford |
| <table> who are <is-a noun> | people who are alumni of Stanford |
| <table> with <has-a noun> | people with a Stanford degree |
| <table> who have <has-a noun> | people who have a Stanford degree |
| <table> who <active verb> | people who graduated from Stanford |
| <table> <passive verb> | people educated at Stanford |
| <table> who were <passive verb> | people who were educated at Stanford |
| <adjective> <table> | Stanford people |
| <table> <prepositional> | people from Stanford |
| … | … |
Synthesis: Multiple Filters

<filtered table> (1): alumni of Stanford; people who are alumni of Stanford; people with a Stanford degree; people who have a Stanford degree; people who graduated from Stanford; people educated at Stanford; people who were educated at Stanford; Stanford people; people from Stanford; …

<filtered table> (2): employee of Apple; Apple as their employer; works for Apple; employed by Apple; …

<filtered table> (1) & (2):
alumni of Stanford who are employee of Apple
alumni of Stanford who have Apple as their employer
alumni of Stanford who works for Apple
alumni of Stanford employed by Apple
alumni of Stanford who are employed by Apple
people who are alumni of Stanford and employee of Apple
people who are alumni of Stanford and have Apple as their employer
people who are alumni of Stanford and works for Apple
people who are alumni of Stanford and are employed by Apple
employee of Apple with a Stanford degree
people with a Stanford degree who have Apple as their employer
people with a Stanford degree that works for Apple
…
Synthesis: Add <search> To Get to Full Questions

<filtered table>: alumni of Stanford; people who are alumni of Stanford; people with a Stanford degree; people who have a Stanford degree; people who graduated from Stanford; people educated at Stanford; people who were educated at Stanford; Stanford people; people from Stanford; …

<generic verb for search>: search for; find; get; …

<generic verb for search> <filtered table> → <questions>:
Search for alumni of Stanford
Search for people who are alumni of Stanford
Search for people who have a Stanford degree
Search for people who graduated from Stanford
Search for people educated at Stanford
Search for Stanford people
Search for people from Stanford
Find alumni of Stanford
Find people who are alumni of Stanford
Find people who have a Stanford degree
Find people who graduated from Stanford
Find people educated at Stanford
Find Stanford people
Find people from Stanford
Get alumni of Stanford
Get people who are alumni of Stanford
Get people who have a Stanford degree
Get people who graduated from Stanford
Get people educated at Stanford
Get Stanford people
Get people from…
Template Syntax

Target: the target non-terminal for the template; $root is the top-level non-terminal for a command.
Expansion: a list of literals or non-terminals that compose the target.
Semantic function: builds the ThingTalk abstract syntax tree of the target.

$root : “show me” $filtered_table => return filtered_table;
$filtered_table : $table “who” $verb_filter => addFilter(table, verb_filter)

(Non-terminals appear on the left; natural language literals appear in quotes; the semantic function after => builds the ThingTalk abstract syntax tree.)
Template Syntax
$filtered_table : $table “who” $verb_filter => addFilter(table, verb_filter)
$root : “show me” $filtered_table => return filtered_table;

Derivation:
“people” → @people()
“graduated from Stanford” → alumniOf == entity(“Stanford”)
“people who graduated from Stanford” → @people() filter alumniOf == entity(“Stanford”)
“show me people who graduated from Stanford” → @people() filter alumniOf == entity(“Stanford”)
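To make the derivation concrete, here is a minimal Python sketch of how the two semantic functions compose the ThingTalk abstract syntax tree; the Table class and the code layout are illustrative only, not the actual Genie implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Table:
    """Toy stand-in for a ThingTalk table expression, e.g. @people()."""
    name: str
    filter: Optional[str] = None

    def __str__(self):
        return f"{self.name} filter {self.filter}" if self.filter else self.name

def add_filter(table: Table, verb_filter: str) -> Table:
    """Semantic function for: $filtered_table : $table "who" $verb_filter."""
    return Table(table.name, verb_filter)

def root(filtered_table: Table) -> Table:
    """Semantic function for: $root : "show me" $filtered_table."""
    return filtered_table

# Mirror the derivation on the slide.
table_nl, table_tt = "people", Table("@people()")
verb_nl, verb_tt = "graduated from Stanford", 'alumniOf == entity("Stanford")'

filtered_nl = f"{table_nl} who {verb_nl}"
filtered_tt = add_filter(table_tt, verb_tt)

print(f"show me {filtered_nl}")   # show me people who graduated from Stanford
print(root(filtered_tt))          # @people() filter alumniOf == entity("Stanford")
```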
Synthesis Algorithm (bottom-up generation)

1. Load templates and manifest
2. depth = 0
3. while depth < max_depth {
     depth++
     for each template whose non-terminals have been expanded {
       1. exhaustively generate NL–ThingTalk pairs using non-terminals from a lower depth
       2. apply the semantic function and filter out rejected results
       3. sample from the generated results based on pruning_size
       4. save them for the target non-terminal
     }
   }
4. for each phrase generated for $root {
     for each constant placeholder {
       replace it with real values sampled from the parameter dataset of matching type
     }
   }

Notes:
• At least one non-terminal must come from depth − 1 to avoid duplicates.
• All results from lower depths are memoized to save time (at the cost of memory).
• The semantic function may reject an expansion, e.g., conflicting filters like “rating > 3 && rating > 4”.
• The final step augments the synthetic data with real-world values.
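A rough Python sketch of this bottom-up loop, assuming the simplified template representation below; it omits the "at least one constituent from depth − 1" optimization and the real ThingTalk data structures:

```python
import itertools
import random

def synthesize(templates, primitives, max_depth, pruning_size, parameter_dataset):
    """templates: list of (target, expansion, semantic_fn);
    primitives: {nonterminal: [(utterance, ast), ...]} loaded from the manifest.
    Returns a list of (utterance, ast) pairs for $root with values filled in."""
    # derivations[nonterminal] -> all (utterance, ast) pairs found so far (memoized)
    derivations = {nt: list(pairs) for nt, pairs in primitives.items()}

    for _ in range(max_depth):
        new = {}
        for target, expansion, semantic_fn in templates:
            slots = []
            for symbol in expansion:
                if symbol.startswith("$"):              # non-terminal: use lower-depth results
                    slots.append(derivations.get(symbol, []))
                else:                                    # literal words
                    slots.append([(symbol, None)])
            generated = []
            for combo in itertools.product(*slots):
                ast = semantic_fn(*[a for _, a in combo if a is not None])
                if ast is None:                          # rejected, e.g. "rating > 3 && rating > 4"
                    continue
                generated.append((" ".join(u for u, _ in combo), ast))
            if len(generated) > pruning_size:            # prune by sampling
                generated = random.sample(generated, pruning_size)
            new.setdefault(target, []).extend(generated)
        for nt, pairs in new.items():
            derivations.setdefault(nt, []).extend(pairs)

    # augment $root phrases: replace constant placeholders with real values
    dataset = []
    for utterance, ast in derivations.get("$root", []):
        for placeholder, values in parameter_dataset.items():
            if placeholder in utterance:
                utterance = utterance.replace(placeholder, random.choice(values))
        dataset.append((utterance, ast))
    return dataset
```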
Neural Semantic Parser Model
• Pre-trained BERT encoder
• LSTM decoder

Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web. Silei Xu, Giovanni Campagna, Jian Li, and Monica S. Lam. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, October 2020.
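A toy PyTorch sketch of the encoder/decoder shape only; the actual Schema2QA parser also uses attention and a pointer/copy mechanism, and the model name and vocabulary are placeholders:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertLSTMParser(nn.Module):
    """Minimal BERT-encoder / LSTM-decoder sketch (no attention or copying)."""
    def __init__(self, target_vocab_size, hidden=768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.embed = nn.Embedding(target_vocab_size, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, target_vocab_size)

    def forward(self, input_ids, attention_mask, target_ids):
        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # initialize the decoder state from the [CLS] representation
        h0 = enc.last_hidden_state[:, 0].unsqueeze(0)        # (1, batch, hidden)
        c0 = torch.zeros_like(h0)
        dec_in = self.embed(target_ids[:, :-1])              # teacher forcing
        dec_out, _ = self.decoder(dec_in, (h0, c0))
        logits = self.out(dec_out)                           # next ThingTalk token
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            target_ids[:, 1:].reshape(-1))
```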
Comparison with SEMPRE

| SEMPRE (Overnight paper) | Genie Schema2QA |
|---|---|
| Manual: annotate properties (same POS) | Manual: annotate properties (different POS) |
| Automatic: grammar-based synthesis (canonical only) | Automatic: grammar-based synthesis (with 900 templates) |
| Manual: paraphrase synthesized sentences | Manual: paraphrase 2% of synthesized sentences |
| Train with only paraphrased sentences | Train with synthesized + few-shot paraphrased data |
Overnight Dataset [Wang 2015]
• Train & evaluation: manual paraphrase
• 8 domains
• 26K examples
Results: Comparison with SEMPRE
Evaluate on Realistic User Input
• Schema2QA [Xu 2020a]
• Based on real-world Schema.org crawls
• Evaluation: much more realistic user input
[Diagram: the properties of a Schema.org restaurant (name, cuisine, reviews, …) are annotated, and real user questions are collected against them for evaluation.]
Evaluate on Realistic User Input
• Schema2QA [Xu 2020a]
• Based on real-world Schema.org crawls
• Evaluation: much more realistic user input
• Over 2/3 of the questions have at least 2 properties in them
• Contains values unseen during training

| Set | Restaurant | People | Movie | Book | Music | Hotel | Average |
|---|---|---|---|---|---|---|---|
| Dev | 528 | 499 | 389 | 362 | 326 | 433 | 424.5 |
| Test | 524 | 500 | 413 | 410 | 288 | 528 | 443.8 |
Training Data
| | Restaurant | People | Movie | Book | Music | Hotel | Average |
|---|---|---|---|---|---|---|---|
| # of Properties | 25 | 13 | 16 | 15 | 19 | 18 | 17.7 |
| Schema2QA: Human Annotations | 122 | 95 | 111 | 96 | 103 | 83 | 101.7 |
| Schema2QA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
| Schema2QA: Human Paraphrase | 6.4K | 7.1K | 3.8K | 3.9K | 3.6K | 3.3K | 4.7K |
Evaluation Result
[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Baseline: templates only; Templates + manual annotation & paraphrases.]
Quiz
• Template-based generation: 900 templates
• Is it worth the work?
• Do we need to repeat for every language?
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Idea 2
Property Annotation with an Automatic Paraphraser
Auto-Annotator
Example: the alumniOf property in the people table.

1. Generate a canonical annotation based on the property name, and assign its type with a POS tagger
2. Construct simple example sentences with templates
3. Paraphrase them with a neural paraphrase model
4. Parse the paraphrases with a POS-based parser to extract annotations

| POS | Annotation | Template | Example utterance |
|---|---|---|---|
| Is-a noun | alumni of <value> | table who are [noun phrase] value | people who are alumni of Stanford |
| Has-a noun | a <value> degree | table with a value [noun phrase] | people with a Stanford degree |
| Active verb | graduated from <value> | table who [verb phrase] value | people who graduated from Stanford |
| Passive verb | educated at <value> | table [passive verb phrase] value | people educated at Stanford |
| Adjective | <value> | value table | Stanford people |
| Prepositional | from <value> | table [prepositional phrase] value | people from Stanford |
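A rough Python sketch of the four steps above, under strong simplifying assumptions: `paraphrase` stands in for the neural paraphrase model, and the spaCy-based extraction is only a toy version of AutoQA's POS-based annotation extractor:

```python
import re
import spacy

nlp = spacy.load("en_core_web_sm")   # small English POS tagger

def canonical_annotation(property_name):
    # 1. split camelCase: "alumniOf" -> "alumni of"
    return re.sub(r"(?<!^)(?=[A-Z])", " ", property_name).lower()

def auto_annotate(property_name, table_name, sample_value, paraphrase):
    canonical = canonical_annotation(property_name)
    # 2. construct a simple example sentence with a template and a real value
    seed = f"show me {table_name} with {canonical} {sample_value}"
    annotations = {(("NOUN",), f"{canonical} <value>")}
    # 3. paraphrase the seed sentence with a neural paraphrase model
    for sent in paraphrase(seed):
        if table_name not in sent or sample_value not in sent:
            continue                               # table or value was dropped: unusable
        # 4. POS-tag the phrase between the table word and the value, and keep it
        between = sent.split(table_name, 1)[1].split(sample_value, 1)[0].strip()
        if not between:
            continue
        pos_pattern = tuple(tok.pos_ for tok in nlp(between))
        annotations.add((pos_pattern, f"{between} <value>"))
    return annotations

# e.g. a paraphrase "show me people who graduated from Stanford" yields
# (('PRON', 'VERB', 'ADP'), 'who graduated from <value>')
```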
Auto-Annotator
• Use simple sentences: one property at a time
  • Fewer mistakes
  • Focus on the property to obtain more variety
• Always use real-world values
  • More context for the language model
• All annotations are amplified with templates to compose different questions
Training
| | Restaurant | People | Movie | Book | Music | Hotel | Average |
|---|---|---|---|---|---|---|---|
| # of Properties | 25 | 13 | 16 | 15 | 19 | 18 | 17.7 |
| Schema2QA: Manual Annotations | 122 | 95 | 111 | 96 | 103 | 83 | 101.7 |
| Schema2QA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
| Schema2QA: Human Paraphrase | 6.4K | 7.1K | 3.8K | 3.9K | 3.6K | 3.3K | 4.7K |
| AutoQA: Auto Annotations | 151 | 121 | 157 | 150 | 144 | 160 | 147.2 |
| AutoQA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
Evaluation on Schema2QA Data
[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Templates Only; With Auto-Annotator; With Manual Annotations & Paraphrases.]
Evaluation on Schema2QA Dataset
[Same chart, with callout: accuracy goes up by ~19% with the Auto-Annotator compared to templates only.]
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Neural Paraphrasing
Paraphraser
• Fine-tune a seq2seq model on a paraphrasing dataset — how hard can it be?
• A paraphrasing dataset: sentence pairs (Xi, Yi) where Xi and Yi are paraphrases of each other
• What is the loss function in training?
  • Predict the next token in the gold sentence
  • Negative log-likelihood
• What is the metric for evaluation?
BLEU Score: Validation Metric
• Bilingual Evaluation Understudy (2002)
• Compares machine generations to one or several human-written references
• Computes a similarity score by matching n-grams of the generated text against the references:

  BLEU = β · ∏_{n=1}^{k} p_n^{1/k}

• β is a function of the length of the generated text, to penalize short ones
• n-grams: overlapping spans of n words
• p_n is n-gram precision: (# matched n-grams) / (# n-grams in the generated text)
  • an n-gram in the reference can be matched only once
• k is usually 4
• A popular, but lame, metric
• BERTScore replaces exact match with soft matching, using BERT pre-trained representations
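A short Python sketch of this geometric mean of clipped n-gram precisions, with the brevity penalty simplified to the standard exp(1 − r/c) form; real implementations such as sacrebleu handle more details:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, k=4):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, k + 1):
        cand_counts, ref_counts = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
        # each reference n-gram can only be matched once (clipping)
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(matched / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # brevity penalty beta: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / k)

print(bleu("people who graduated from Stanford",
           "people who were educated at Stanford"))
```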
Paraphrase Generation Datasets
• How the model performs depends on the dataset
• What defines good paraphrases?
What is a Paraphrase?
1. Two sentences are paraphrases if their representations are similar, according to some model (Lewis et al, 2020)
2. Two sentences that describe the same picture are paraphrases (Prakash et al, 2016)
a dog makes a face while rolling on the ground.
a brown and white dog laying on his back smiling
a dog is on it's back on the grass with an open mouth.
a dog laying on its back in the grass with its mouth open.
a dog with its mouth open lays in the grass

(MSCOCO dataset)
What is a Paraphrase?
3. Translations from the same sentence: the ParaBank 2 dataset (Hu et al, 2019)
• Example — French: "L'homme est né libre, et partout il est dans les fers" (Rousseau)
• Google Translations:
  • Man is born free, but everywhere he is in chains
  • Man was born free, and everywhere he is in chains
• Human translations:
  • Man is born free, and everywhere he is in shackles
  • People are born free, but they are in iron chains throughout
The ParaBank 2 Dataset
• Created using a bilingual (English–Czech) corpus of news, books, movie subtitles, etc.
• English translations of the Czech side → paraphrases of the English counterpart
• Lots of tricks to improve the grammaticality and diversity of the machine-translated sentences
• Scores ~90% on grammaticality and 84% on semantic similarity of pairs, according to human judges
Our Paraphraser
• Fine-tune a BART pretrained seq2seq model with the ParaBank 2 dataset
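A minimal sketch of this fine-tuning step with Hugging Face Transformers; the data loading, batch size, and learning rate are placeholders rather than the exact AutoQA setup:

```python
import torch
from torch.utils.data import DataLoader
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def collate(batch):
    # batch: list of (source_sentence, paraphrase) pairs from ParaBank 2 (loading omitted)
    src, tgt = zip(*batch)
    enc = tokenizer(list(src), padding=True, truncation=True, return_tensors="pt")
    labels = tokenizer(list(tgt), padding=True, truncation=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100   # ignore padding in the loss
    enc["labels"] = labels
    return enc

def finetune(pairs, epochs=1, batch_size=16):
    loader = DataLoader(pairs, batch_size=batch_size, shuffle=True, collate_fn=collate)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss        # standard seq2seq cross-entropy
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```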
BART Pre-trained Seq2Seq Model
• BERT: masked model, not generative
• GPT: next-word prediction, not bidirectional
• BART: seq2seq, denoising model, bidirectional, generative

Graphic courtesy of: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Mike Lewis, Yinhan Liu, Naman Goyal, et al., arXiv:1910.13461.
BART: Pretrained Model
• Denoising objectives:
  • Token masking
  • Token deletion
  • Text span masking: replace a span with one mask token
  • Sentence permutation
  • Document rotation: start at a random position
• Model:
  • 6-layer transformer each for the encoder and decoder
  • BART-large: 12 layers
• Useful for downstream tasks: question answering, entailment, summarization, response generation, translation
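For intuition, a small Python sketch of these noising operations over word tokens; the real implementation works on subword tokens and draws span lengths from a Poisson distribution:

```python
import random

MASK = "<mask>"

def token_mask(tokens, p=0.15):
    return [MASK if random.random() < p else t for t in tokens]

def token_delete(tokens, p=0.15):
    return [t for t in tokens if random.random() >= p]

def span_mask(tokens, span_len=3):
    # replace a whole span with a single mask token
    if len(tokens) <= span_len:
        return [MASK]
    start = random.randrange(len(tokens) - span_len)
    return tokens[:start] + [MASK] + tokens[start + span_len:]

def sentence_permute(sentences):
    return random.sample(sentences, len(sentences))

def document_rotate(tokens):
    # start the document at a random position
    i = random.randrange(len(tokens))
    return tokens[i:] + tokens[:i]
```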
Quiz
• In the task of generating paraphrases for ThingTalk, what is the best definition of a paraphrase?
• What properties do we want the paraphrases to have?
• Pre-training improves the performance on many downstream tasks. Does pre-training (e.g. BART) help paraphrase generation?
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Idea 3
Automatic sentence-level paraphrases
Auto-Paraphrase

A BART-based paraphraser maps each synthetic sentence to a paraphrased sentence:

| Synthetic | Paraphrased |
|---|---|
| Search some cafeteria that have greater star than 3, and do not have smoking. | Search for a restaurant that has more than 3 stars and doesn't smoke. |
| Find restaurants close to my home. | Find restaurants near me. |
| Search for people who are employed by Stanford. | [greedy] Look for people employed at Stanford. |
| | [temperature=0.3] Look for people who work at Stanford. |
| | [temperature=1.2] Find people at Stanford. |
| | [temperature=1.5] Actually, look for those who are currently employed at Stanford. |
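A small sketch of how the greedy and temperature-sampled outputs above could be produced with a fine-tuned BART paraphraser in Hugging Face Transformers; the model path is a placeholder:

```python
from transformers import BartForConditionalGeneration, BartTokenizerFast

MODEL = "path/to/finetuned-paraphraser"   # placeholder path
tokenizer = BartTokenizerFast.from_pretrained(MODEL)
model = BartForConditionalGeneration.from_pretrained(MODEL)

def paraphrase(sentence, temperature=None):
    inputs = tokenizer(sentence, return_tensors="pt")
    if temperature is None:        # greedy decoding
        out = model.generate(**inputs, max_length=64)
    else:                          # sampling: higher temperature = more diverse but noisier
        out = model.generate(**inputs, max_length=64, do_sample=True, temperature=temperature)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(paraphrase("Search for people who are employed by Stanford."))
print(paraphrase("Search for people who are employed by Stanford.", temperature=1.2))
```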
Self-Training
We need to filter out the noisy paraphrases.

1. Train a parser with the synthetic dataset
2. Generate potentially noisy paraphrases of the synthetic dataset
3. Use the parser from (1) to parse the paraphrases from (2)
4. Remove all paraphrases where the new parse does not match the label
5. Add the filtered paraphrases to the training set
6. Repeat

[Diagram: Auto-Paraphraser — the synthetic dataset feeds the paraphraser; semantic parser i parses the paraphrases; a filter keeps only paraphrases whose parse matches the original logical form; the filtered paraphrases + original logical forms are used to TRAIN the next parser.]
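A compact Python sketch of this filtering loop; `train_parser`, `paraphrase`, and `parse` are placeholders for the real training, paraphrasing, and parsing calls:

```python
def self_training(synthetic, rounds, train_parser, paraphrase, parse):
    """synthetic: list of (sentence, logical_form) pairs.
    Returns a parser trained on synthetic data plus filtered paraphrases."""
    train_set = list(synthetic)
    parser = train_parser(train_set)                   # 1. train on synthetic data
    for _ in range(rounds):
        accepted = []
        for sentence, logical_form in synthetic:
            for candidate in paraphrase(sentence):     # 2. possibly noisy paraphrases
                # 3./4. keep only paraphrases the current parser maps to the original label
                if parse(parser, candidate) == logical_form:
                    accepted.append((candidate, logical_form))
        train_set.extend(accepted)                     # 5. add filtered paraphrases
        parser = train_parser(train_set)               # 6. repeat: retrain
    return parser
```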
Genie Summary

[Pipeline diagram: the database schema and values feed the Auto-Annotator (a paraphraser followed by POS-based annotation extraction), which produces attribute annotations. The template-based data synthesizer combines those annotations with English-grammar-based comprehensive templates to produce synthetic data. The Auto-Paraphraser (paraphraser → semantic parser i → paraphrase filter) then paraphrases the synthetic data, and the filtered paraphrases plus the original logical forms are used to train the parser.]
Quiz
• Why bother with self-training if we only accept paraphrases that are already parsed correctly?
• Do we need to filter noise on property-level paraphrases?
• Can we skip property-level paraphrases?
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Results on Overnight Dataset
[Bar chart: accuracy on the Overnight domains (Basketball*, Blocks, Calendar, Housing, Publications, Recipes, Restaurants, Social, Average), testing on paraphrased data. Series: SOTA with out-of-domain human data (Herzig and Berant, 2018); Genie (no human data); SOTA with in-domain human data (Cao et al., 2019).]
* Herzig and Berant did not report Basketball numbers.

Building a Semantic Parser Overnight. Yushi Wang, Jonathan Berant, Percy Liang. In Proceedings of the 53rd Annual Meeting of the ACL, 2015.
AutoQA: From Databases to Q&A Semantic Parsers with Only Synthetic Training Data. Silei Xu*, Sina J. Semnani*, Giovanni Campagna, Monica S. Lam. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 2020.
Schema2QA: Training Set
• Training takes about 3 hours on a V100 for 30K iterations

| | Restaurant | People | Movie | Book | Music | Hotel | Average |
|---|---|---|---|---|---|---|---|
| # of Properties | 25 | 13 | 16 | 15 | 19 | 18 | 17.7 |
| Schema2QA: # of Annotations | 122 | 95 | 111 | 96 | 103 | 83 | 101.7 |
| Schema2QA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
| Schema2QA: Human Paraphrase | 6.4K | 7.1K | 3.8K | 3.9K | 3.6K | 3.3K | 4.7K |
| AutoQA: # of Annotations | 151 | 121 | 157 | 150 | 144 | 160 | 147.2 |
| AutoQA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
| AutoQA: Auto Paraphrase | 281K | 299K | 331K | 212K | 341K | 285K | 292K |
Evaluation Result
[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Templates Only; Auto-Annotator; Auto-Annotator + Naive Paraphraser; AutoQA (Auto-Annotator + Auto-Paraphraser); (Schema2QA) Manual Annotations & Paraphrases.]
Evaluation Result
[Same chart, with callout: accuracy goes up by ~19% with the Auto-Annotator compared to templates only.]
Evaluation Result
[Same chart, with callout: with the naive paraphraser (no filtering on paraphrases), accuracy goes down by ~10%.]
Evaluation Result
[Same chart, with callout: with the Auto-Paraphraser, accuracy goes up by ~8%.]
Evaluation Result
[Same chart, with callout: there is a ~6% gap between AutoQA and manual annotations & paraphrases.]
Auto-Annotator & Auto-Paraphraser are Complementary
• Auto-annotator: phrase-level, generic
• Auto-paraphraser: sentence-level, value-specific

[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Auto-Annotator; Auto-Paraphraser; Auto-Annotation + Auto-Paraphraser.]
Change the BERT-LSTM to Fine-Tuning BART
[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Templates Only; With Auto-Annotator; With Auto-Annotator + Naive Paraphraser; With Auto-Annotator + Auto-Paraphraser; With Manual Annotations & Paraphrases; With Auto-Annotator + Auto-Paraphraser on BART.]
Quiz
• Now we can automate everything, should we generate as much data as possible?
Quiz
• Now we can automate everything, how much data should we generate?
• Accuracy grows logarithmically with the amount of data
• [Oren et al 2021] On the Schema2QA dataset, a carefully sampled dataset with 5K examples can achieve accuracy (83.4%) comparable to a model trained with 1M examples (85%)
• Find the sweet spot that balances accuracy and computation cost!
Quiz
• Is the performance good enough?
• How do we improve the performance?
Conclusions
• Paraphraser: BART fine-tuned on the ParaBank 2 dataset
• Self-training: use model i to label more data to train model i+1
• Data synthesis:
  1. Property-level paraphrases to extract POS
  2. Domain-independent templates (900)
  3. Sentence-level paraphrases, with noise filtering via self-training
• It is important to test with real data
• Fully automatic tool: schema → question semantic parser
References
• [Wang 2015] Building a Semantic Parser Overnight
• [Su 2017] Cross-domain Semantic Parsing via Paraphrasing
• [Campagna 2019] Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
• [Xu 2020a] Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web
• [Xu 2020b] AutoQA: From Databases to Q&A Semantic Parsers with Only Synthetic Training Data
• [Marion 2021] Structured Context and High-Coverage Grammar for Conversational Question Answering over Knowledge Graphs
• [Oren et al 2021] Finding Needles in a Haystack: Sampling Structurally-Diverse Training Sets from Synthetic Data for Compositional Generalization