Lecture 6
Engineering Knowledge Base Query Agents
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Silei Xu, Sina Semnani, Monica Lam
Stanford CS224v Course: Conversational Virtual Assistants with Deep Learning
Genie: Open Pretrained Assistant
[Architecture diagram: pretrained language models and generic dialogue models (transaction dialogue state machine, DB access dialogues, abstract API dialogues) are combined with grounding primitives (APIs, DB schemas, form-filling instructions) to build agents such as customer support, web-form filling, restaurants, playing songs, and turning on lights.]
Summary: Methodology
• Complete everything that the computer can do
• What if the computer is missing functionality?
John F. Kennedy: “Ask not what your country can do for you but what you can do for your country.”

Our Motto: “Ask not what your user wants to know, but what the computer can tell the user.”
Summary: ThingTalk Query Representation
• Completeness, by deliberately mapping SQL to ThingTalk
  • Drop rename: we can run a join on columns with two different names
  • Unions, differences: supported through an inheritance supertable
• Matches the user's mental model: more natural for humans to understand too
  • SQL is not the most intuitive
• Precise, unambiguous, canonicalized
  • Facilitates training accuracy
• Interoperates with action APIs (to be discussed later)
Summary: Paraphrasing Methodology
• Method
  • Synthesize canonical sentences (synthetic text, logical form)
  • Humans provide paraphrases (paraphrased text, logical form)
  • No need to annotate
• Big plus: reduces the cost of data acquisition
• Limitations:
  • Still expensive
  • Inaccurate paraphrases
  • Lack of variety: resembles the original terminology
  • Does not work with real input
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Why is NL so Hard?
• Alternatives for just 1 fact: “Dr. Smith is Ann’s doctor”

| Relation | Part of Speech (POS) | Unknown: Ann | Unknown: Dr. Smith |
|---|---|---|---|
| Doctor | Has-a | Who has Dr. Smith as a doctor? | Who does Ann have as a doctor? |
| Doctor | Is-a | Who is Dr. Smith a doctor of? | Who is a doctor of Ann? |
| Doctor | Active verb | Whom does Dr. Smith treat? | Who treats Ann? |
| Doctor | Passive verb | Who is treated by Dr. Smith? | By whom is Ann treated? |
| Patient | Has-a | Who does Dr. Smith have as a patient? | Who has Ann as a patient? |
| Patient | Is-a | Who is a patient of Dr. Smith? | Who is Ann a patient of? |
| Patient | Active verb | Who consults with Dr. Smith? | With whom does Ann consult? |
| Patient | Passive verb | By whom is Dr. Smith consulted? | Who is consulted by Ann? |
Why is NL so Hard?
• Alternatives for just 1 fact: “Dr. Smith is Ann’s doctor”
• Type-based terminology
  • Example: the operation “>=”
    • Date/time: “later than”, “after”, …
    • Temperature: “higher than”, “warmer than”, “hotter than”, “over”, …
    • Weight: “heavier than”, “over”, “more than”, …
    • Distance: “farther than”, “longer than”, …
Why is NL so Hard?
• Alternatives for just 1 fact: “Dr. Smith is Ann’s doctor”
• Type-based terminology: “>=”
• Alternative phrasing
  • Example: “restaurants with the highest rating”, “restaurants with best reviews”, “top-rated restaurants”, “best restaurants”
• Domain-specific “shortcuts” in NL
  • Word-level: “my father’s brother” → uncle
  • Sentence-level: “Send a message to the sender of some message” → “reply”
Why is NL so Hard?
• Expressiveness: all database queries
• Variety in saying the same thing:

| | Property Level | Sentence Level |
|---|---|---|
| Domain-independent | Idea 1 | Idea 1 |
| Domain-dependent | Idea 2 | Idea 3 |
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
DB Constructs: Ground English to ThingTalk

| Operation | English Template | ThingTalk | Example |
|---|---|---|---|
| Selection | table with fname equal to value | table, fname = value | restaurants with rating equal to 3 |
| Projection | the fname of table | [fname] of table | the cuisine of restaurants |
| Subquery | the table1 of table2 | table1, in_array(id, any(table1 of table2)) | reviews of restaurant X |
| Join | table1 with their table2 | table1 join table2 | restaurants with their reviews |
| Aggregate | the number of table | count(table) | the number of restaurants |
| Aggregate | the op fname in table | op(fname of table) | the average rating of restaurants |
| Aggregate & group by | the number of table in each fname | count(table by fname) | the number of restaurants |
| Aggregate & group by | the op fname1 in table in each fname2 | op(fname1 of table by fname2) | the average rating of restaurants |
| Ranking | the n table with the min fname | sort(fname asc of table)[1:n] | the 3 restaurants with the min rating |
| Quantifier | table1 with table2 | table1, contains(table2, any(table2)) | restaurants with review with … |
| Quantifier | table1 with no table2 | table1, !contains(table2, any(table2)) | restaurants with no review with … |
| Row-wise function | the distance of table from location | [distance(geo, location)] of table | the distance of restaurants from here |
| Row-wise function | the number of fname in table | [count(fname)] of table | the number of reviews in restaurants |

Canonical English templates: cover all queries.
Idea 1
Add more domain-independent templates for variety
Standard Variation in NL

| Property Type | Comparison |
|---|---|
| Weight | lighter, heavier |
| Height | taller, shorter |
| Age | older, younger |
| Length | shorter, longer |
| Size | smaller, bigger |
| Price | cheaper, more expensive |
| Speed | slower, faster |
| Temperature | colder, hotter |
| Time | earlier, before, later, after |
| Duration | shorter, longer |
| Distance | closer, nearer, farther, more distant |

| Subject Type | Interrogative Word |
|---|---|
| People | Who |
| Object | What |
| Time | When |
| Location | Where |

| Sentence Purpose | Example |
|---|---|
| Declarative | I am looking for … |
| Imperative | Search for … |
| Interrogative | What is … |
| Exclamatory | — |
From English grammar books
NL: Connectives
All of the following express the same two filters, cuisine == “Italian” and rating == 5:
• restaurant that serves Italian cuisine and was rated 5 stars
• restaurant with rating 5 and Italian cuisine
• 5-star restaurant that serves Italian cuisine
• Italian restaurant with 5 stars
• 5-star Italian restaurant
Property-Level Templates
Annotations for the alumniOf property in the people table:

| POS | Annotation | Template | Example utterance |
|---|---|---|---|
| Is-a noun | alumni of <value> | table who are [noun phrase] value | people who are alumni of Stanford |
| Has-a noun | a <value> degree | table with a value [noun phrase] | people with a Stanford degree |
| Active verb | graduated from <value> | table who [verb phrase] value | people who graduated from Stanford |
| Passive verb | educated at <value> | table [passive verb phrase] value | people educated at Stanford |
| Adjective | <value> | value table | Stanford people |
| Prepositional | from <value> | table [prepositional phrase] value | people from Stanford |
Based on POS (Part-of-Speech)
Genie Templates for English
• Kinds of templates
  • Canonical templates: to ground English to all possible queries
  • Templates for attributes based on POS
  • Templates based on kinds of sentences, types, connectives
• Total: 900 templates
Quiz
• 900 templates! How much work is it?
• Is it worth it?
• How many templates are there in other languages?
Synthesis: Using Property Annotations (POS)

| POS | Annotation |
|---|---|
| Is-a noun | alumni of <value> |
| Has-a noun | a <value> degree |
| Active verb | graduated from <value> |
| Passive verb | educated at <value> |
| Adjective | <value> |
| Prepositional | from <value> |

With <table> = people and <value> = Stanford, the POS-based templates expand each annotation into <filtered table> phrases:

| Template | Synthesized phrase |
|---|---|
| <is-a noun> | alumni of Stanford |
| <table> who are <is-a noun> | people who are alumni of Stanford |
| <table> with <has-a noun> | people with a Stanford degree |
| <table> who have <has-a noun> | people who have a Stanford degree |
| <table> who <active verb> | people who graduated from Stanford |
| <table> <passive verb> | people educated at Stanford |
| <table> who were <passive verb> | people who were educated at Stanford |
| <adjective> <table> | Stanford people |
| <table> <prepositional> | people from Stanford |
| … | … |
Synthesis: Multiple Filters

<filtered table> (1): alumni of Stanford; people who are alumni of Stanford; people with a Stanford degree; people who have a Stanford degree; people who graduated from Stanford; people educated at Stanford; people who were educated at Stanford; Stanford people; people from Stanford; …

<filtered table> (2): employee of Apple; Apple as their employer; works for Apple; employed by Apple; …

<filtered table> (1) & (2):
alumni of Stanford who are employee of Apple
alumni of Stanford who have Apple as their employer
alumni of Stanford who works for Apple
alumni of Stanford employed by Apple
alumni of Stanford who are employed by Apple
people who are alumni of Stanford and employee of Apple
people who are alumni of Stanford and have Apple as their employer
people who are alumni of Stanford and works for Apple
people who are alumni of Stanford and are employed by Apple
employee of Apple with a Stanford degree
people with a Stanford degree who have Apple as their employer
people with a Stanford degree that works for Apple
…
Synthesis: Add <search> To Get to Full Questions

<filtered table>: alumni of Stanford; people who are alumni of Stanford; people with a Stanford degree; people who have a Stanford degree; people who graduated from Stanford; people educated at Stanford; people who were educated at Stanford; Stanford people; people from Stanford; …

<generic verb for search>: search for; find; get; …

<generic verb for search> <filtered table> → <questions>:
Search for alumni of Stanford
Search for people who are alumni of Stanford
Search for people who have a Stanford degree
Search for people who graduated from Stanford
Search for people educated at Stanford
Search for Stanford people
Search for people from Stanford
Find alumni of Stanford
Find people who are alumni of Stanford
Find people who have a Stanford degree
Find people who graduated from Stanford
Find people educated at Stanford
Find Stanford people
Find people from Stanford
Get alumni of Stanford
Get people who are alumni of Stanford
Get people who have a Stanford degree
Get people who graduated from Stanford
Get people educated at Stanford
Get Stanford people
Get people from…
Template Syntax

Target: the target non-terminal for the template; $root is the top-level non-terminal for a command.
Expansion: a list of literals or non-terminals that compose the target.
Semantic function: builds the ThingTalk abstract syntax tree of the target.

$root : “show me” $filtered_table => return filtered_table;
$filtered_table : $table “who” $verb_filter => addFilter(table, verb_filter)

(Non-terminals appear on the left; natural language literals appear in quotes; the semantic function after => builds the ThingTalk abstract syntax tree.)
Template Syntax
$filtered_table : $table “who” $verb_filter => addFilter(table, verb_filter)
$root : “show me” $filtered_table => return filtered_table;

Derivation:
“people” → @people()
“graduated from Stanford” → alumniOf == entity(“Stanford”)
“people who graduated from Stanford” → @people() filter alumniOf == entity(“Stanford”)
“show me people who graduated from Stanford” → @people() filter alumniOf == entity(“Stanford”)
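To make the derivation concrete, here is a minimal Python sketch of how the two semantic functions compose the ThingTalk abstract syntax tree; the Table class and the code layout are illustrative only, not the actual Genie implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Table:
    """Toy stand-in for a ThingTalk table expression, e.g. @people()."""
    name: str
    filter: Optional[str] = None

    def __str__(self):
        return f"{self.name} filter {self.filter}" if self.filter else self.name

def add_filter(table: Table, verb_filter: str) -> Table:
    """Semantic function for: $filtered_table : $table "who" $verb_filter."""
    return Table(table.name, verb_filter)

def root(filtered_table: Table) -> Table:
    """Semantic function for: $root : "show me" $filtered_table."""
    return filtered_table

# Mirror the derivation on the slide.
table_nl, table_tt = "people", Table("@people()")
verb_nl, verb_tt = "graduated from Stanford", 'alumniOf == entity("Stanford")'

filtered_nl = f"{table_nl} who {verb_nl}"
filtered_tt = add_filter(table_tt, verb_tt)

print(f"show me {filtered_nl}")   # show me people who graduated from Stanford
print(root(filtered_tt))          # @people() filter alumniOf == entity("Stanford")
```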
Synthesis Algorithm (bottom-up generation)

1. Load templates and manifest
2. depth = 0
3. while depth < max_depth {
     depth++
     for each template whose non-terminals have been expanded {
       1. exhaustively generate NL–ThingTalk pairs using non-terminals from a lower depth
       2. apply the semantic function and filter out rejected results
       3. sample from the generated results based on pruning_size
       4. save them for the target non-terminal
     }
   }
4. for each phrase generated for $root {
     for each constant placeholder {
       replace it with real values sampled from the parameter dataset of matching type
     }
   }

Notes:
• At least one non-terminal must come from depth − 1 to avoid duplicates.
• All results from lower depths are memoized to save time (at the cost of memory).
• The semantic function may reject an expansion, e.g., conflicting filters like “rating > 3 && rating > 4”.
• The final step augments the synthetic data with real-world values.
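A rough Python sketch of this bottom-up loop, assuming the simplified template representation below; it omits the "at least one constituent from depth − 1" optimization and the real ThingTalk data structures:

```python
import itertools
import random

def synthesize(templates, primitives, max_depth, pruning_size, parameter_dataset):
    """templates: list of (target, expansion, semantic_fn);
    primitives: {nonterminal: [(utterance, ast), ...]} loaded from the manifest.
    Returns a list of (utterance, ast) pairs for $root with values filled in."""
    # derivations[nonterminal] -> all (utterance, ast) pairs found so far (memoized)
    derivations = {nt: list(pairs) for nt, pairs in primitives.items()}

    for _ in range(max_depth):
        new = {}
        for target, expansion, semantic_fn in templates:
            slots = []
            for symbol in expansion:
                if symbol.startswith("$"):              # non-terminal: use lower-depth results
                    slots.append(derivations.get(symbol, []))
                else:                                    # literal words
                    slots.append([(symbol, None)])
            generated = []
            for combo in itertools.product(*slots):
                ast = semantic_fn(*[a for _, a in combo if a is not None])
                if ast is None:                          # rejected, e.g. "rating > 3 && rating > 4"
                    continue
                generated.append((" ".join(u for u, _ in combo), ast))
            if len(generated) > pruning_size:            # prune by sampling
                generated = random.sample(generated, pruning_size)
            new.setdefault(target, []).extend(generated)
        for nt, pairs in new.items():
            derivations.setdefault(nt, []).extend(pairs)

    # augment $root phrases: replace constant placeholders with real values
    dataset = []
    for utterance, ast in derivations.get("$root", []):
        for placeholder, values in parameter_dataset.items():
            if placeholder in utterance:
                utterance = utterance.replace(placeholder, random.choice(values))
        dataset.append((utterance, ast))
    return dataset
```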
Neural Semantic Parser Model
• Pre-trained BERT encoder
• LSTM decoder

Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web. Silei Xu, Giovanni Campagna, Jian Li, and Monica S. Lam. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, October 2020.
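A toy PyTorch sketch of the encoder/decoder shape only; the actual Schema2QA parser also uses attention and a pointer/copy mechanism, and the model name and vocabulary are placeholders:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertLSTMParser(nn.Module):
    """Minimal BERT-encoder / LSTM-decoder sketch (no attention or copying)."""
    def __init__(self, target_vocab_size, hidden=768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.embed = nn.Embedding(target_vocab_size, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, target_vocab_size)

    def forward(self, input_ids, attention_mask, target_ids):
        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # initialize the decoder state from the [CLS] representation
        h0 = enc.last_hidden_state[:, 0].unsqueeze(0)        # (1, batch, hidden)
        c0 = torch.zeros_like(h0)
        dec_in = self.embed(target_ids[:, :-1])              # teacher forcing
        dec_out, _ = self.decoder(dec_in, (h0, c0))
        logits = self.out(dec_out)                           # next ThingTalk token
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            target_ids[:, 1:].reshape(-1))
```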
Comparison with SEMPRE

| SEMPRE (Overnight paper) | Genie Schema2QA |
|---|---|
| Manual: annotate properties (same POS) | Manual: annotate properties (different POS) |
| Automatic: grammar-based synthesis (canonical only) | Automatic: grammar-based synthesis (with 900 templates) |
| Manual: paraphrase synthesized sentences | Manual: paraphrase 2% of synthesized sentences |
| Train with only paraphrased sentences | Train with synthesized + few-shot paraphrased data |
Overnight Dataset [Wang 2015]
• Train & evaluation: manual paraphrase
• 8 domains
• 26K examples
Results: Comparison with SEMPRE
Evaluate on Realistic User Input
• Schema2QA [Xu 2020a]
• Based on real-world Schema.org crawls
• Evaluation: much more realistic user input
[Diagram: the properties of a Schema.org restaurant (name, cuisine, reviews, …) are annotated, and real user questions are collected against them for evaluation.]
Evaluate on Realistic User Input
• Schema2QA [Xu 2020a]
• Based on real-world Schema.org crawls
• Evaluation: much more realistic user input
• Over 2/3 of the questions have at least 2 properties in them
• Contains values unseen during training

| Set | Restaurant | People | Movie | Book | Music | Hotel | Average |
|---|---|---|---|---|---|---|---|
| Dev | 528 | 499 | 389 | 362 | 326 | 433 | 424.5 |
| Test | 524 | 500 | 413 | 410 | 288 | 528 | 443.8 |
Training Data
| | Restaurant | People | Movie | Book | Music | Hotel | Average |
|---|---|---|---|---|---|---|---|
| # of Properties | 25 | 13 | 16 | 15 | 19 | 18 | 17.7 |
| Schema2QA: Human Annotations | 122 | 95 | 111 | 96 | 103 | 83 | 101.7 |
| Schema2QA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
| Schema2QA: Human Paraphrase | 6.4K | 7.1K | 3.8K | 3.9K | 3.6K | 3.3K | 4.7K |
Evaluation Result
[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Baseline: templates only; Templates + manual annotation & paraphrases.]
Quiz
• Template-based generation: 900 templates
• Is it worth the work?
• Do we need to repeat for every language?
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Idea 2
Property Annotation with an Automatic Paraphraser
Auto-Annotator
Example: the alumniOf property in the people table.

1. Generate a canonical annotation based on the property name, and assign its type with a POS tagger
2. Construct simple example sentences with templates
3. Paraphrase them with a neural paraphrase model
4. Parse the paraphrases with a POS-based parser to extract annotations

| POS | Annotation | Template | Example utterance |
|---|---|---|---|
| Is-a noun | alumni of <value> | table who are [noun phrase] value | people who are alumni of Stanford |
| Has-a noun | a <value> degree | table with a value [noun phrase] | people with a Stanford degree |
| Active verb | graduated from <value> | table who [verb phrase] value | people who graduated from Stanford |
| Passive verb | educated at <value> | table [passive verb phrase] value | people educated at Stanford |
| Adjective | <value> | value table | Stanford people |
| Prepositional | from <value> | table [prepositional phrase] value | people from Stanford |
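A rough Python sketch of the four steps above, under strong simplifying assumptions: `paraphrase` stands in for the neural paraphrase model, and the spaCy-based extraction is only a toy version of AutoQA's POS-based annotation extractor:

```python
import re
import spacy

nlp = spacy.load("en_core_web_sm")   # small English POS tagger

def canonical_annotation(property_name):
    # 1. split camelCase: "alumniOf" -> "alumni of"
    return re.sub(r"(?<!^)(?=[A-Z])", " ", property_name).lower()

def auto_annotate(property_name, table_name, sample_value, paraphrase):
    canonical = canonical_annotation(property_name)
    # 2. construct a simple example sentence with a template and a real value
    seed = f"show me {table_name} with {canonical} {sample_value}"
    annotations = {(("NOUN",), f"{canonical} <value>")}
    # 3. paraphrase the seed sentence with a neural paraphrase model
    for sent in paraphrase(seed):
        if table_name not in sent or sample_value not in sent:
            continue                               # table or value was dropped: unusable
        # 4. POS-tag the phrase between the table word and the value, and keep it
        between = sent.split(table_name, 1)[1].split(sample_value, 1)[0].strip()
        if not between:
            continue
        pos_pattern = tuple(tok.pos_ for tok in nlp(between))
        annotations.add((pos_pattern, f"{between} <value>"))
    return annotations

# e.g. a paraphrase "show me people who graduated from Stanford" yields
# (('PRON', 'VERB', 'ADP'), 'who graduated from <value>')
```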
Auto-Annotator
• Use simple sentences: one property at a time
  • Fewer mistakes
  • Focus on the property to obtain more variety
• Always use real-world values
  • More context for the language model
• All annotations are amplified with templates to compose different questions
Training
| | Restaurant | People | Movie | Book | Music | Hotel | Average |
|---|---|---|---|---|---|---|---|
| # of Properties | 25 | 13 | 16 | 15 | 19 | 18 | 17.7 |
| Schema2QA: Manual Annotations | 122 | 95 | 111 | 96 | 103 | 83 | 101.7 |
| Schema2QA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
| Schema2QA: Human Paraphrase | 6.4K | 7.1K | 3.8K | 3.9K | 3.6K | 3.3K | 4.7K |
| AutoQA: Auto Annotations | 151 | 121 | 157 | 150 | 144 | 160 | 147.2 |
| AutoQA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
Evaluation on Schema2QA Data
[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Templates Only; With Auto-Annotator; With Manual Annotations & Paraphrases.]
Evaluation on Schema2QA Dataset
[Same chart, with callout: accuracy goes up by ~19% with the Auto-Annotator compared to templates only.]
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Neural Paraphrasing
Paraphraser
• Fine-tune a seq2seq model on a paraphrasing dataset — how hard can it be?
• A paraphrasing dataset: sentence pairs (Xi, Yi) where Xi and Yi are paraphrases of each other
• What is the loss function in training?
  • Predict the next token in the gold sentence
  • Negative log-likelihood
• What is the metric for evaluation?
BLEU Score: Validation Metric
• Bilingual Evaluation Understudy (2002)
• Compares machine generations to one or several human-written references
• Computes a similarity score by matching n-grams of the generated text against the references:

  BLEU = β · ∏_{n=1}^{k} p_n^{1/k}

• β is a function of the length of the generated text, to penalize short ones
• n-grams: overlapping spans of n words
• p_n is n-gram precision: (# matched n-grams) / (# n-grams in the generated text)
  • an n-gram in the reference can be matched only once
• k is usually 4
• A popular, but lame, metric
• BERTScore replaces exact match with soft matching, using BERT pre-trained representations
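A short Python sketch of this geometric mean of clipped n-gram precisions, with the brevity penalty simplified to the standard exp(1 − r/c) form; real implementations such as sacrebleu handle more details:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, k=4):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, k + 1):
        cand_counts, ref_counts = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
        # each reference n-gram can only be matched once (clipping)
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(matched / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # brevity penalty beta: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / k)

print(bleu("people who graduated from Stanford",
           "people who were educated at Stanford"))
```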
Paraphrase Generation Datasets
• How the model performs depends on the dataset
• What defines good paraphrases?
What is a Paraphrase?
1. Two sentences are paraphrases if their representations are similar, according to some model (Lewis et al, 2020)
2. Two sentences that describe the same picture are paraphrases (Prakash et al, 2016)
a dog makes a face while rolling on the ground.
a brown and white dog laying on his back smiling
a dog is on it's back on the grass with an open mouth.
a dog laying on its back in the grass with its mouth open.
a dog with its mouth open lays in the grass

(MSCOCO dataset)
What is a Paraphrase?
3. Translations from the same sentence: the ParaBank 2 dataset (Hu et al, 2019)
• Example — French: "L'homme est né libre, et partout il est dans les fers" (Rousseau)
• Google Translations:
  • Man is born free, but everywhere he is in chains
  • Man was born free, and everywhere he is in chains
• Human translations:
  • Man is born free, and everywhere he is in shackles
  • People are born free, but they are in iron chains throughout
The ParaBank 2 Dataset
• Created using a bilingual (English–Czech) corpus of news, books, movie subtitles, etc.
• English translations of the Czech side → paraphrases of the English counterpart
• Lots of tricks to improve the grammaticality and diversity of the machine-translated sentences
• Scores ~90% on grammaticality and 84% on semantic similarity of pairs, according to human judges
Our Paraphraser
• Fine-tune a BART pretrained seq2seq model with the ParaBank 2 dataset
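A minimal sketch of this fine-tuning step with Hugging Face Transformers; the data loading, batch size, and learning rate are placeholders rather than the exact AutoQA setup:

```python
import torch
from torch.utils.data import DataLoader
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def collate(batch):
    # batch: list of (source_sentence, paraphrase) pairs from ParaBank 2 (loading omitted)
    src, tgt = zip(*batch)
    enc = tokenizer(list(src), padding=True, truncation=True, return_tensors="pt")
    labels = tokenizer(list(tgt), padding=True, truncation=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100   # ignore padding in the loss
    enc["labels"] = labels
    return enc

def finetune(pairs, epochs=1, batch_size=16):
    loader = DataLoader(pairs, batch_size=batch_size, shuffle=True, collate_fn=collate)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss        # standard seq2seq cross-entropy
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```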
BART Pre-trained Seq2Seq Model
• BERT: masked model, not generative
• GPT: next-word prediction, not bidirectional
• BART: seq2seq, denoising model, bidirectional, generative

Graphic courtesy of: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Mike Lewis, Yinhan Liu, Naman Goyal, et al., arXiv:1910.13461.
BART: Pretrained Model
• Denoising objectives:
  • Token masking
  • Token deletion
  • Text span masking: replace a span with one mask token
  • Sentence permutation
  • Document rotation: start at a random position
• Model:
  • 6-layer transformer each for the encoder and decoder
  • BART-large: 12 layers
• Useful for downstream tasks: question answering, entailment, summarization, response generation, translation
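For intuition, a small Python sketch of these noising operations over word tokens; the real implementation works on subword tokens and draws span lengths from a Poisson distribution:

```python
import random

MASK = "<mask>"

def token_mask(tokens, p=0.15):
    return [MASK if random.random() < p else t for t in tokens]

def token_delete(tokens, p=0.15):
    return [t for t in tokens if random.random() >= p]

def span_mask(tokens, span_len=3):
    # replace a whole span with a single mask token
    if len(tokens) <= span_len:
        return [MASK]
    start = random.randrange(len(tokens) - span_len)
    return tokens[:start] + [MASK] + tokens[start + span_len:]

def sentence_permute(sentences):
    return random.sample(sentences, len(sentences))

def document_rotate(tokens):
    # start the document at a random position
    i = random.randrange(len(tokens))
    return tokens[i:] + tokens[:i]
```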
Quiz
• In the task of generating paraphrases for ThingTalk, what is the best definition of a paraphrase?
• What properties do we want the paraphrases to have?
• Pre-training improves the performance on many downstream tasks. Does pre-training (e.g. BART) help paraphrase generation?
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Idea 3
Automatic sentence-level paraphrases
Auto-Paraphrase

A BART-based paraphraser maps each synthetic sentence to a paraphrased sentence:

| Synthetic | Paraphrased |
|---|---|
| Search some cafeteria that have greater star than 3, and do not have smoking. | Search for a restaurant that has more than 3 stars and doesn't smoke. |
| Find restaurants close to my home. | Find restaurants near me. |
| Search for people who are employed by Stanford. | [greedy] Look for people employed at Stanford. |
| | [temperature=0.3] Look for people who work at Stanford. |
| | [temperature=1.2] Find people at Stanford. |
| | [temperature=1.5] Actually, look for those who are currently employed at Stanford. |
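A small sketch of how the greedy and temperature-sampled outputs above could be produced with a fine-tuned BART paraphraser in Hugging Face Transformers; the model path is a placeholder:

```python
from transformers import BartForConditionalGeneration, BartTokenizerFast

MODEL = "path/to/finetuned-paraphraser"   # placeholder path
tokenizer = BartTokenizerFast.from_pretrained(MODEL)
model = BartForConditionalGeneration.from_pretrained(MODEL)

def paraphrase(sentence, temperature=None):
    inputs = tokenizer(sentence, return_tensors="pt")
    if temperature is None:        # greedy decoding
        out = model.generate(**inputs, max_length=64)
    else:                          # sampling: higher temperature = more diverse but noisier
        out = model.generate(**inputs, max_length=64, do_sample=True, temperature=temperature)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(paraphrase("Search for people who are employed by Stanford."))
print(paraphrase("Search for people who are employed by Stanford.", temperature=1.2))
```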
Self-Training
We need to filter out the noisy paraphrases.

1. Train a parser with the synthetic dataset
2. Generate potentially noisy paraphrases of the synthetic dataset
3. Use the parser from (1) to parse the paraphrases from (2)
4. Remove all paraphrases where the new parse does not match the label
5. Add the filtered paraphrases to the training set
6. Repeat

[Diagram: Auto-Paraphraser — the synthetic dataset feeds the paraphraser; semantic parser i parses the paraphrases; a filter keeps only paraphrases whose parse matches the original logical form; the filtered paraphrases + original logical forms are used to TRAIN the next parser.]
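A compact Python sketch of this filtering loop; `train_parser`, `paraphrase`, and `parse` are placeholders for the real training, paraphrasing, and parsing calls:

```python
def self_training(synthetic, rounds, train_parser, paraphrase, parse):
    """synthetic: list of (sentence, logical_form) pairs.
    Returns a parser trained on synthetic data plus filtered paraphrases."""
    train_set = list(synthetic)
    parser = train_parser(train_set)                   # 1. train on synthetic data
    for _ in range(rounds):
        accepted = []
        for sentence, logical_form in synthetic:
            for candidate in paraphrase(sentence):     # 2. possibly noisy paraphrases
                # 3./4. keep only paraphrases the current parser maps to the original label
                if parse(parser, candidate) == logical_form:
                    accepted.append((candidate, logical_form))
        train_set.extend(accepted)                     # 5. add filtered paraphrases
        parser = train_parser(train_set)               # 6. repeat: retrain
    return parser
```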
Genie Summary

[Pipeline diagram: the database schema and values feed the Auto-Annotator (a paraphraser followed by POS-based annotation extraction), which produces attribute annotations. The template-based data synthesizer combines those annotations with English-grammar-based comprehensive templates to produce synthetic data. The Auto-Paraphraser (paraphraser → semantic parser i → paraphrase filter) then paraphrases the synthetic data, and the filtered paraphrases plus the original logical forms are used to train the parser.]
Quiz
• Why bother with self-training if we only accept paraphrases that are already parsed correctly?
• Do we need to filter noise on property-level paraphrases?
• Can we skip property-level paraphrases?
Outline
1. Why is natural language (English) so hard?
2. Idea 1: Domain-Independent Templates
3. Idea 2: Automatic Property-Level Annotation
4. Neural Paraphrasing
5. Idea 3: Automatic Sentence-Level Paraphraser
6. Evaluation
Results on Overnight Dataset
[Bar chart: accuracy on the Overnight domains (Basketball*, Blocks, Calendar, Housing, Publications, Recipes, Restaurants, Social, Average), testing on paraphrased data. Series: SOTA with out-of-domain human data (Herzig and Berant, 2018); Genie (no human data); SOTA with in-domain human data (Cao et al., 2019).]
* Herzig and Berant did not report Basketball numbers.

Building a Semantic Parser Overnight. Yushi Wang, Jonathan Berant, Percy Liang. In Proceedings of the 53rd Annual Meeting of the ACL, 2015.
AutoQA: From Databases to Q&A Semantic Parsers with Only Synthetic Training Data. Silei Xu*, Sina J. Semnani*, Giovanni Campagna, Monica S. Lam. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 2020.
Schema2QA: Training Set
• Training takes about 3 hours on a V100 for 30K iterations

| | Restaurant | People | Movie | Book | Music | Hotel | Average |
|---|---|---|---|---|---|---|---|
| # of Properties | 25 | 13 | 16 | 15 | 19 | 18 | 17.7 |
| Schema2QA: # of Annotations | 122 | 95 | 111 | 96 | 103 | 83 | 101.7 |
| Schema2QA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
| Schema2QA: Human Paraphrase | 6.4K | 7.1K | 3.8K | 3.9K | 3.6K | 3.3K | 4.7K |
| AutoQA: # of Annotations | 151 | 121 | 157 | 150 | 144 | 160 | 147.2 |
| AutoQA: Synthetic | 270K | 270K | 270K | 270K | 270K | 270K | 270K |
| AutoQA: Auto Paraphrase | 281K | 299K | 331K | 212K | 341K | 285K | 292K |
Evaluation Result
[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Templates Only; Auto-Annotator; Auto-Annotator + Naive Paraphraser; AutoQA (Auto-Annotator + Auto-Paraphraser); (Schema2QA) Manual Annotations & Paraphrases.]
Evaluation Result
[Same chart, with callout: accuracy goes up by ~19% with the Auto-Annotator compared to templates only.]
Evaluation Result
[Same chart, with callout: with the naive paraphraser (no filtering on paraphrases), accuracy goes down by ~10%.]
Evaluation Result
[Same chart, with callout: with the Auto-Paraphraser, accuracy goes up by ~8%.]
Evaluation Result
[Same chart, with callout: there is a ~6% gap between AutoQA and manual annotations & paraphrases.]
Auto-Annotator & Auto-Paraphraser are Complementary
• Auto-annotator: phrase-level, generic
• Auto-paraphraser: sentence-level, value-specific

[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Auto-Annotator; Auto-Paraphraser; Auto-Annotation + Auto-Paraphraser.]
Change the BERT-LSTM to Fine-Tuning BART
[Bar chart: accuracy per domain (Restaurants, People, Movies, Books, Music, Hotels, Average). Series: Templates Only; With Auto-Annotator; With Auto-Annotator + Naive Paraphraser; With Auto-Annotator + Auto-Paraphraser; With Manual Annotations & Paraphrases; With Auto-Annotator + Auto-Paraphraser on BART.]
Quiz
• Now we can automate everything, should we generate as much data as possible?
Quiz
• Now we can automate everything, how much data should we generate?
• Accuracy grows logarithmically with the amount of data
• [Oren et al 2021] On the Schema2QA dataset, a carefully sampled dataset with 5K examples can achieve accuracy (83.4%) comparable to a model trained with 1M examples (85%)
• Find the sweet spot that balances accuracy and computation cost!
Quiz
• Is the performance good enough?
• How do we improve the performance?
Conclusions
• Paraphraser: BART fine-tuned on the ParaBank 2 dataset
• Self-training: use model i to label more data to train model i+1
• Data synthesis:
  1. Property-level paraphrases to extract POS
  2. Domain-independent templates (900)
  3. Sentence-level paraphrases, with noise filtering via self-training
• It is important to test with real data
• Fully automatic tool: schema → question semantic parser
References
• [Wang 2015] Building a Semantic Parser Overnight
• [Su 2017] Cross-domain Semantic Parsing via Paraphrasing
• [Campagna 2019] Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
• [Xu 2020a] Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web
• [Xu 2020b] AutoQA: From Databases to Q&A Semantic Parsers with Only Synthetic Training Data
• [Marion 2021] Structured Context and High-Coverage Grammar for Conversational Question Answering over Knowledge Graphs
• [Oren et al 2021] Finding Needles in a Haystack: Sampling Structurally-Diverse Training Sets from Synthetic Data for Compositional Generalization