
ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser

Zhi Chen1, Lu Chen1∗, Yanbin Zhao1, Ruisheng Cao1, Zihan Xu1, Su Zhu2 and Kai Yu1∗

1X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University

Shanghai Jiao Tong University, Shanghai, China
State Key Lab of Media Convergence Production Technology and Systems, Beijing, China

2AISpeech Co., Ltd., Suzhou, China
{zhenchi713, chenlusz, kai.yu}@sjtu.edu.cn

Abstract

Given a database schema, Text-to-SQL aims to translate a natural language question into the corresponding SQL query. Under the cross-domain setup, traditional semantic parsing models struggle to adapt to unseen database schemas. To improve the model's generalization capability for rare and unseen schemas, we propose a new architecture, ShadowGNN, which processes schemas at the abstract and semantic levels. By ignoring the names of semantic items in databases, abstract schemas are exploited in a well-designed graph projection neural network to obtain delexicalized representations of the question and the schema. Based on these domain-independent representations, a relation-aware transformer is utilized to further extract the logical linking between question and schema. Finally, a SQL decoder with a context-free grammar is applied. On the challenging Text-to-SQL benchmark Spider, empirical results show that ShadowGNN outperforms state-of-the-art models. When the annotated data is extremely limited (only 10% of the training set), ShadowGNN achieves an absolute performance gain of over 5%, which shows its powerful generalization ability. Our implementation will be open-sourced at https://github.com/WowCZ/shadowgnn.

1 Introduction

Recently, Text-to-SQL has drawn a great deal of attention from the semantic parsing community (Berant et al., 2013; Cao et al., 2019, 2020). The ability to query a database with natural language (NL) engages the majority of users, who are not familiar with the SQL language, in accessing large databases. A number of neural approaches have been proposed to translate questions into executable SQL queries. On public Text-to-SQL benchmarks (Zhong et al., 2017; Krishnamurthy et al., 2017), exact match accuracy even exceeds 80%.

∗The corresponding authors are Lu Chen and Kai Yu.

[Figure 1: An example to demonstrate the impact of domain information. (a) A domain-aware example: NL Question "What are the names of teams that do not have match season record?", its Semantic Schema (tables team, match season, player and their columns), and the SQL Query "SELECT name FROM team WHERE team id NOT IN (SELECT team FROM match season)". (b) The human-labeled abstract representation of the Text-to-SQL content from (a): Abstract Question "What are the col_1 of tab_1 that do not have tab_2 record?", the Abstract Schema (tab_1, tab_2, tab_3 with columns col_1 ... col_9), and the Abstract Query "SELECT col_1 FROM tab_1 WHERE col_9 NOT IN (SELECT col_8 FROM tab_2)". The green nodes and orange nodes represent columns and tables respectively.]

However, the cross-domain problem for Text-to-SQL is a practical challenge that these prior datasets ignore. To clarify, a database schema is regarded as a domain. The domain information consists of two parts: the semantic information (e.g., the table names) of the schema components and the structure information (e.g., the primary-key relation between a table and a column) of the schema.

The recently released dataset, Spider (Yu et al., 2018), hides the database schemas of the test set, which are totally unseen in the training set. In this cross-domain setup, domain adaptation is challenging for two main reasons. First, the semantic information of the domains in the test and development sets is unseen in the training set. On the given development set, 35% of the words in database schemas do not occur in the schemas of the training set. It is therefore hard to match the domain representations in the question and the schema.


Second, there is a considerable discrepancy among the structures of the database schemas. In particular, the database schemas always contain semantic information, which makes it difficult to obtain a unified representation of the database schema. Under the cross-domain setup, the essential challenge is to alleviate the impact of the domain information.

First, it is necessary to figure out which role the semantic information of the schema components plays when translating an NL question into a SQL query. Consider the example in Fig. 1(a): for the Text-to-SQL model, the basic task is to find all the mentioned columns (name) and tables (team, match season) by looking up the schema with semantic information (named the semantic schema). Once the mentioned columns and tables in the NL question are exactly matched with schema components, we can abstract the NL question and the semantic schema by replacing the specific schema components with their general component types. As shown in Fig. 1(b), we can still infer the structure of the SQL query from the abstract NL question and the schema structure. With the correspondence between the semantic schema and the abstract schema, we can restore the abstract query to an executable SQL query with domain information. Inspired by this phenomenon, we decompose the encoder of the Text-to-SQL model into two modules. First, we propose a Graph Projection Neural Network (GPNN) to abstract the NL question and the semantic schema, where the domain information is removed as much as possible. Then, we use a relation-aware transformer to obtain unified representations of the abstract NL question and the abstract schema.

Our approach, named ShadowGNN, is evaluated on the challenging cross-domain Text-to-SQL dataset, Spider. Our contributions are summarized as follows:

• We propose ShadowGNN to alleviate the impact of the domain information by abstracting the representations of the NL question and the SQL query. It is also a meaningful method to apply to similar cross-domain tasks.

• To validate the generalization capability of our proposed ShadowGNN, we conduct experiments with limited annotated data. The results show that ShadowGNN obtains an absolute accuracy gain of over 5% compared with the state-of-the-art model when the annotated data amounts to only 10% of the training set.

• The empirical results show that our approach outperforms state-of-the-art models (66.1% accuracy on the test set) on the challenging Spider benchmark. The ablation studies further confirm that GPNN is important for abstracting the representations of the NL question and the schema.

2 Background

In this section, we first introduce the relational graph convolution network (R-GCN) (Schlichtkrull et al., 2018), which is the basis of our proposed GPNN. Then, we introduce the relation-aware transformer, a transformer variant that considers relation information when calculating attention weights.

2.1 Relational Graph Convolution Network

Before describing the details of R-GCN, we first give the notations of a relational directed graph. We denote this kind of graph as $G = (\mathcal{V}, \mathcal{E}, \mathcal{R})$ with nodes (schema components) $v_i \in \mathcal{V}$ and directed labeled edges $(v_i, r, v_j) \in \mathcal{E}$, where $v_i$ is the source node, $v_j$ is the destination node, and $r \in \mathcal{R}$ is the edge type from $v_i$ to $v_j$. $N^r_i$ represents the set of neighbor indices of node $v_i$ under relation $r$, where $v_i$ plays the role of the destination node.

Each node of the graph has an input feature $x_i$, which can be regarded as the initial hidden state $h^{(0)}_i$ of the R-GCN. The hidden state of each node in the graph is updated layer by layer with the following steps.

Sending Message. At the $l$-th R-GCN layer, each edge $(v_i, r, v_j)$ of the graph sends a message from the source node $v_i$ to the destination node $v_j$. The message is calculated as

$$m^{(l)}_{ij} = W^{(l)}_r h^{(l-1)}_i, \quad (1)$$

where $r$ is the relation from $v_i$ to $v_j$ and $W^{(l)}_r$ is a linear transformation, i.e., a trainable matrix. Following Equation 1, the number of message-calculating parameters is proportional to the number of edge types. To increase scalability, R-GCN regularizes the message-calculating parameters with the basis decomposition method, defined as

$$W^{(l)}_r = \sum_{b=1}^{B} a^{(l)}_{rb} V^{(l)}_b, \quad (2)$$

where $B$ is the number of bases and $a^{(l)}_{rb}$ is the coefficient of the basis transformation $V^{(l)}_b$. The basis transformations are shared across different edge types, and only the coefficient $a^{(l)}_{rb}$ depends on $r$.

Aggregating Message. After the message-sending process, all the incoming messages of each node are aggregated. Combining Equations 1 and 2, R-GCN simply averages these incoming messages as

$$g^{(l)}_i = \sum_{r \in \mathcal{R}} \sum_{j \in N^r_i} \frac{1}{c_{i,r}} \left( \sum_{b=1}^{B} a^{(l)}_{rb} V^{(l)}_b \right) h^{(l-1)}_j, \quad (3)$$

where $c_{i,r} = |N^r_i|$.

Updating State. After aggregating messages, each node updates its hidden state from $h^{(l-1)}_i$ to $h^{(l)}_i$:

$$h^{(l)}_i = \sigma\big(g^{(l)}_i + W^{(l)}_0 h^{(l-1)}_i\big), \quad (4)$$

where $\sigma$ is an activation function (e.g., ReLU) and $W^{(l)}_0$ is a weight matrix. For each layer of R-GCN, the update process can be simply denoted as

$$Y = \text{R-GCN}(X, G), \quad (5)$$

where $X = \{h_i\}_{i=1}^{|G|}$, $|G|$ is the number of nodes, and $G$ is the graph structure.
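For concreteness, the following is a minimal PyTorch sketch of one R-GCN layer with basis decomposition (Equations 1-4). It is an illustration only: the dense per-relation adjacency tensor, the tensor shapes, and all names are our own assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """One R-GCN layer with basis decomposition (Eqs. 1-4), illustrative sketch."""
    def __init__(self, hidden: int, num_rels: int, num_bases: int):
        super().__init__()
        self.V = nn.Parameter(torch.randn(num_bases, hidden, hidden) * 0.02)  # basis matrices V_b
        self.a = nn.Parameter(torch.randn(num_rels, num_bases) * 0.02)        # coefficients a_rb
        self.W0 = nn.Linear(hidden, hidden)                                    # self-loop weight W_0

    def forward(self, h, adj):
        # h:   [num_nodes, hidden]               node states h^{(l-1)}
        # adj: [num_rels, num_nodes, num_nodes]  adj[r, i, j] = 1 iff (v_j, r, v_i) is an edge
        W = torch.einsum('rb,bho->rho', self.a, self.V)        # W_r = sum_b a_rb V_b   (Eq. 2)
        msg = torch.einsum('rho,jh->rjo', W, h)                # message W_r h_j        (Eq. 1)
        deg = adj.sum(dim=-1).clamp(min=1).unsqueeze(-1)       # c_{i,r} = |N_i^r|
        g = torch.einsum('rij,rjo->io', adj / deg, msg)        # mean over neighbours   (Eq. 3)
        return torch.relu(g + self.W0(h))                      # update with self-loop  (Eq. 4)

# Example: 5 schema nodes, 3 relation types, one primary-key-style edge from node 0 to node 1.
layer = RGCNLayer(hidden=512, num_rels=3, num_bases=2)
h = torch.randn(5, 512)
adj = torch.zeros(3, 5, 5); adj[0, 1, 0] = 1
out = layer(h, adj)   # [5, 512]
```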

2.2 Relation-aware Transformer

With the success of large-scale language models, the transformer architecture has been widely used in natural language processing (NLP) tasks to encode a sequence $X = [x_i]_{i=1}^{n}$ with the self-attention mechanism. As introduced in Vaswani et al. (2017), a transformer is a stack of self-attention layers, where each layer transforms $x_i$ to $y_i$ with $H$ heads as follows:

$$e^{(h)}_{ij} = \frac{x_i W^{(h)}_Q (x_j W^{(h)}_K)^\top}{\sqrt{d_z / H}}, \quad (6)$$
$$\alpha^{(h)}_{ij} = \text{softmax}_j\{e^{(h)}_{ij}\}, \quad (7)$$
$$z^{(h)}_i = \sum_{j=1}^{n} \alpha^{(h)}_{ij} x_j W^{(h)}_V, \quad (8)$$
$$z_i = \text{Concat}(z^{(1)}_i, \dots, z^{(H)}_i), \quad (9)$$
$$y_i = \text{LayerNorm}(x_i + z_i), \quad (10)$$
$$y_i = \text{LayerNorm}(y_i + \text{FC}(\text{ReLU}(\text{FC}(y_i)))), \quad (11)$$

where $h$ is the head index, $d_z$ is the hidden dimension of $z^{(h)}_i$, $\alpha^{(h)}_{ij}$ is the attention probability, Concat denotes the concatenation operation, LayerNorm is layer normalization (Ba et al., 2016), and FC is a fully connected layer. The transformer function can be simply denoted as

$$Y = \text{Transformer}(X), \quad (12)$$

where $Y = \{y_i\}_{i=1}^{|X|}$, $X = \{x_i\}_{i=1}^{|X|}$, and $|X|$ is the sequence length.

The relation-aware transformer (RAT) (Shaw et al., 2018) is an important extension of the traditional transformer, which regards the input sequence as a labeled, directed, fully-connected graph. The pairwise relations between input elements are considered in RAT, which incorporates relation information into Equation 6 and Equation 8. The edge from element $x_i$ to element $x_j$ is represented by vectors $r_{ij,K}$ and $r_{ij,V}$, which act as biases incorporated into the self-attention layer as follows:

$$e^{(h)}_{ij} = \frac{x_i W^{(h)}_Q (x_j W^{(h)}_K + r_{ij,K})^\top}{\sqrt{d_z / H}}, \quad (13)$$
$$\alpha^{(h)}_{ij} = \text{softmax}_j\{e^{(h)}_{ij}\}, \quad (14)$$
$$z^{(h)}_i = \sum_{j=1}^{n} \alpha^{(h)}_{ij} (x_j W^{(h)}_V + r_{ij,V}), \quad (15)$$

where $r_{ij,K}$ and $r_{ij,V}$ are shared across attention heads. For each layer of RAT, the update process can be simply represented as

$$Y = \text{RAT}(X, R), \quad (16)$$

where $R = \{R_{ij}\}_{i=1,j=1}^{|X|,|X|}$ is the relation matrix among the sequence tokens and $R_{ij}$ is the relation type between the $i$-th token and the $j$-th token.
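The following is a single-head PyTorch sketch of relation-aware self-attention (Equations 13-15), written only to make the bias terms concrete; the paper uses $H$ heads with the relation embeddings shared across heads, and all names here are our own.

```python
import math
import torch
import torch.nn as nn

class RelationAwareAttention(nn.Module):
    """Single-head relation-aware self-attention (Eqs. 13-15), illustrative sketch."""
    def __init__(self, hidden: int, num_rel_types: int):
        super().__init__()
        self.q = nn.Linear(hidden, hidden, bias=False)    # W_Q
        self.k = nn.Linear(hidden, hidden, bias=False)    # W_K
        self.v = nn.Linear(hidden, hidden, bias=False)    # W_V
        self.rel_k = nn.Embedding(num_rel_types, hidden)  # r_{ij,K}
        self.rel_v = nn.Embedding(num_rel_types, hidden)  # r_{ij,V}

    def forward(self, x, rel):
        # x:   [n, hidden] token states; rel: [n, n] long, relation type R_ij
        q, k, v = self.q(x), self.k(x), self.v(x)
        rk, rv = self.rel_k(rel), self.rel_v(rel)                        # [n, n, hidden]
        # e_ij = q_i (k_j + r_{ij,K})^T / sqrt(d)                          (Eq. 13)
        e = torch.einsum('id,ijd->ij', q, k.unsqueeze(0) + rk) / math.sqrt(x.size(-1))
        a = torch.softmax(e, dim=-1)                                     # (Eq. 14)
        # z_i = sum_j a_ij (v_j + r_{ij,V})                                (Eq. 15)
        return torch.einsum('ij,ijd->id', a, v.unsqueeze(0) + rv)

# Example: 7 tokens, 10 relation types.
attn = RelationAwareAttention(hidden=512, num_rel_types=10)
z = attn(torch.randn(7, 512), torch.randint(0, 10, (7, 7)))   # [7, 512]
```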

Both R-GCN and RAT have been successfully applied to Text-to-SQL tasks. Bogin et al. (2019a) utilize R-GCN to encode the structure of the semantic schema and obtain global representations of the nodes. Wang et al. (2020) consider not only the schema structure but also the schema linking between the schema and the NL question, and propose a unified framework to model the representations of the schema and the question with RAT. However, they do not explicitly explore the impact of the domain information. In the next section, we introduce our proposed GPNN and explain how to use it to obtain abstract representations of the schema and the question.

[Figure 2: The structure of our proposed ShadowGNN. ShadowGNN has three kinds of input: abstract schema, semantic schema, and natural language question. The encoder of ShadowGNN consists of two modules: a stack of graph projection layers and a stack of relation-aware self-attention layers. To clarify the introduction of the GPNN layer, the pretrained model RoBERTa is omitted from the figure.]

3 Method

Text-to-SQL models take the NL question $Q = \{q_i\}_{i=1}^{n}$ and the semantic schema $G = \{s_j\}_{j=1}^{m}$ as input. In our proposed ShadowGNN, the encoder is decomposed into two modules. The first module filters out the specific domain information with a well-designed graph projection neural network (GPNN). The second module leverages a relation-aware transformer to further obtain unified representations of the question and the schema. This two-phase encoder of ShadowGNN simulates the inference process of a human translating a question into a SQL query under the cross-domain setup: abstracting and inferring.

3.1 Graph Projection Neural Network

In this subsection, we introduce the structure of GPNN. As discussed above, the schema consists of database structure information and domain semantic information, and GPNN looks at the schema from these two perspectives. Thus, GPNN has three kinds of inputs: abstract schema, semantic schema, and NL question. The input of the abstract schema is the type (table or column) of each schema node without any domain information, which can be regarded as a projection of the semantic schema. Each node in the abstract schema is represented by a one-hot vector $a^{(0)}_j$, which has two dimensions.

For the semantic schema and the NL question, we first use the pretrained language model RoBERTa (Liu et al., 2019) to initialize their representations. We directly concatenate the NL question and the semantic schema in the format "[CLS] question [SEP] tables columns [SEP]". Each node name in the semantic schema may be tokenized into several sub-tokens or sub-words. We add an average pooling layer behind the final layer of RoBERTa to align the sub-tokens to the corresponding node. We denote the initial representations of the NL question and the semantic schema as $q^{(0)}_i$ and $s^{(0)}_j$.
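The sketch below shows one plausible way to build that input with the HuggingFace transformers library and to average-pool the sub-tokens back to schema nodes. It uses roberta-base only to keep the example light (the paper uses RoBERTa-large), and the delimiter handling and span bookkeeping are our own assumptions rather than the released implementation.

```python
import torch
from transformers import RobertaModel, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
encoder = RobertaModel.from_pretrained("roberta-base")

question = "What are the names of teams that do not have match season record ?"
schema_nodes = ["team", "match season", "team id", "name"]  # tables followed by columns

# Build "[CLS] question [SEP] node_1 ... node_k [SEP]" by hand so we know which
# sub-token span belongs to which schema node.
q_ids = tokenizer(question, add_special_tokens=False)["input_ids"]
ids = [tokenizer.cls_token_id] + q_ids + [tokenizer.sep_token_id]
spans = []
for node in schema_nodes:
    piece = tokenizer(" " + node, add_special_tokens=False)["input_ids"]
    spans.append((len(ids), len(ids) + len(piece)))
    ids.extend(piece)
ids.append(tokenizer.sep_token_id)

hidden = encoder(torch.tensor([ids])).last_hidden_state[0]       # [len(ids), hidden]
q0 = hidden[1:1 + len(q_ids)]                                     # initial question states q^(0)
s0 = torch.stack([hidden[a:b].mean(dim=0) for a, b in spans])     # one pooled vector per node s^(0)
```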

The main motivation of GPNN is to abstract the representations of the question and the schema. The abstract schema has already been distilled from the semantic schema; the essential challenge lies in abstracting the question representation. There are two separate operations in each GPNN layer: Projection Attention and Character Encoding. The projection attention of GPNN takes the semantic schema as a bridge: the question updates its representation using the abstract schema, while the attention weights are calculated with the vectors of the semantic schema. The character encoding augments the structure representations of the question sentence and the schema graph.

Projection Attention. In each GPNN layer, there is first an attention operation between the NL question and the semantic schema:

$$e_{ij} = q^{(l)}_i W^{(l)}_Q (s^{(l)}_j W^{(l)}_K)^\top, \quad (17)$$
$$\alpha_{ij} = \text{softmax}_j\{e_{ij}\}, \quad (18)$$

where $W^{(l)}_Q$ and $W^{(l)}_K$ are trainable parameters at the $l$-th projection layer and $e_{n \times m} = \{e_{ij}\}_{i=1,j=1}^{n,m}$ is the matrix of weight scores. $n$ is the length of the question, and $m$ is the number of schema nodes.

Before the attention operation, inspired by Bogin et al. (2019a), we first calculate the maximum values $u$ of the attention probabilities,

$$u_j = \max_i\{\alpha_{ij}\}, \quad (19)$$

where the physical meaning of $u_j$ is the maximum probability that the $j$-th component of the schema is mentioned by the question. We augment the $l$-th layer abstract schema representation $a^{(l)}$ by multiplying it with $u$ element-wise: $a^{(l)} = a^{(l)} \cdot u$.

When updating the question representation, we take the representation of the augmented abstract schema $a^{(l)}$ as the attention value at the $l$-th layer of GPNN,

$$b_i = \sum_{j=1}^{m} \alpha_{ij} a^{(l)}_j W^{(l)}_V, \quad (20)$$
$$q^{(l+1)}_i = \text{gate}(b_i) * b_i + (1 - \text{gate}(b_i)) * q^{(l)}_i, \quad (21)$$

where $\text{gate}(\cdot) = \text{sigmoid}(\text{Linear}(\cdot))$ and $W^{(l)}_V$ is a trainable weight. When updating the semantic schema, we take the transpose of the above attention matrix as the attention from schema to question,

$$e_{m \times n} = (e_{n \times m})^\top = \{e_{ij}\}_{i=1,j=1}^{m,n}. \quad (22)$$

Similar to the update process of the question in Equations 17–21, the update process of the semantic schema $s^{(l+1)}$ takes $e_{m \times n}$ as the attention score and $q^{(l)}$ as the attention value. Note that only the augmented abstract schema is used to update the question representation; in this way, the domain information contained in the question representation is removed. The update process of the abstract schema $a^{(l+1)}$ is the same as the semantic schema update, and the attention weights $e_{m \times n}$ on the question $q^{(l)}$ are shared. Note that the input of the attention operation for the abstract schema is the augmented abstract representation $a$.

Character Encoding. We have used the projection attention mechanism to update the three kinds of vectors. Then, we take the structural character of the schema and the NL question into account and continue encoding them with the R-GCN(·) and Transformer(·) functions respectively, as shown in Fig. 2:

$$a^{(l+1)} = \text{R-GCN}(a^{(l+1)}, G), \quad (23)$$
$$s^{(l+1)} = \text{R-GCN}(s^{(l+1)}, G), \quad (24)$$
$$q^{(l+1)} = \text{Transformer}(q^{(l+1)}). \quad (25)$$

This completes the description of the projection layer. The graph projection neural network (GPNN) is a stack of projection layers. After the GPNN module, we obtain the abstract representations of the schema and the question, denoted $a^{(N)}$ and $q^{(N)}$.
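To make the projection attention concrete, here is a minimal sketch of one projection layer (Equations 17-22, with the optional schema-linking bias of Equation 26). It is an illustration under our own assumptions: the character-encoding step (Equations 23-25) is omitted, and direction-specific parameters are collapsed into one set for brevity.

```python
import torch
import torch.nn as nn

class ProjectionLayer(nn.Module):
    """One GPNN projection-attention step (Eqs. 17-22), illustrative sketch."""
    def __init__(self, hidden: int):
        super().__init__()
        self.wq = nn.Linear(hidden, hidden, bias=False)   # W_Q
        self.wk = nn.Linear(hidden, hidden, bias=False)   # W_K
        self.wv = nn.Linear(hidden, hidden, bias=False)   # W_V
        self.gate = nn.Linear(hidden, 1)                  # gate(.) = sigmoid(Linear(.))

    def forward(self, q, s, a, prior=None):
        # q: [n, hidden] question, s: [m, hidden] semantic schema, a: [m, hidden] abstract schema
        e = self.wq(q) @ self.wk(s).t()                    # scores e_{n x m}        (Eq. 17)
        if prior is not None:
            e = e + prior                                  # schema-linking bias     (Eq. 26)
        alpha = torch.softmax(e, dim=-1)                   # attention over schema   (Eq. 18)
        u = alpha.max(dim=0).values.unsqueeze(-1)          # u_j = max_i alpha_ij    (Eq. 19)
        a_hat = a * u                                      # augmented abstract schema
        b = alpha @ self.wv(a_hat)                         # attend over ABSTRACT schema (Eq. 20)
        g = torch.sigmoid(self.gate(b))
        q_new = g * b + (1 - g) * q                        # gated question update   (Eq. 21)
        # Schema side: reuse the transposed scores (Eq. 22) to attend over the question;
        # the same gated update is applied to the semantic and the abstract schema.
        beta = torch.softmax(e.t(), dim=-1)
        ctx = beta @ q
        gs = torch.sigmoid(self.gate(ctx))
        s_new = gs * ctx + (1 - gs) * s
        a_new = gs * ctx + (1 - gs) * a_hat
        return q_new, s_new, a_new

# Example shapes: a 12-word question against a 6-node schema with 512-d states.
layer = ProjectionLayer(hidden=512)
q, s, a = layer(torch.randn(12, 512), torch.randn(6, 512), torch.randn(6, 512))
```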

3.2 Schema Linking

Schema linking (Guo et al., 2019; Lei et al., 2020) can be regarded as a kind of prior knowledge, where related representations between the question and the schema are tagged according to their matching degree. There are 7 tags in total: Table Exact Match, Table Partial Match, Column Exact Match, Column Partial Match, Column Value Exact Match, Column Value Partial Match, and No Match. The column values are stored in the databases. As described above, the schema linking can be represented as $D = \{d_{ij}\}_{i=1,j=1}^{n,m}$, where $d_{ij}$ denotes the match degree between the $i$-th word of the question and the name of the $j$-th schema node. To integrate the schema linking information into the GPNN module, we calculate a prior attention score $p_{n \times m} = \text{Linear}(\text{Embedding}(\mathbf{d}_{ij}))$, where $\mathbf{d}_{ij}$ is the one-hot representation of the match type $d_{ij}$. The attention score in Equation 17 is then updated as

$$e_{ij} = q^{(l)}_i W^{(l)}_Q (s^{(l)}_j W^{(l)}_K)^\top + p_{ij}, \quad (26)$$

where $p_{ij}$ is the prior score from $p_{n \times m}$. The prior attention score is shared among all the GPNN layers.
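A sketch of how the 7 linking tags could be turned into the scalar bias $p_{n \times m}$ following Linear(Embedding($d_{ij}$)); the embedding dimension and names are our own choices.

```python
import torch
import torch.nn as nn

NUM_TAGS = 7  # Table/Column {Exact, Partial} Match, Column Value {Exact, Partial} Match, No Match

class LinkingPrior(nn.Module):
    """Maps schema-linking tags d_ij to a scalar attention bias p_ij (Sec. 3.2 sketch)."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.emb = nn.Embedding(NUM_TAGS, dim)
        self.proj = nn.Linear(dim, 1)

    def forward(self, tags):                              # tags: [n, m] long
        return self.proj(self.emb(tags)).squeeze(-1)      # p_{n x m}, added to e_ij in Eq. 26

# The resulting prior is computed once and shared across all GPNN layers.
prior = LinkingPrior()(torch.randint(0, NUM_TAGS, (12, 6)))
```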

3.3 RAT

If we split the schema into tables and columns, there are three kinds of inputs: question, table, and column. RATSQL (Wang et al., 2020) leverages the relation-aware transformer to unify the representations of the three inputs. RATSQL defines all the relations $R = \{R_{ij}\}_{i=1,j=1}^{(n+m),(n+m)}$ among the three inputs and uses the RAT(·) function to obtain unified representations of the question and the schema. The details of the defined relations among the three components are introduced in RATSQL (Wang et al., 2020). The schema linking relations are a subset of $R$. In this paper, we leverage RAT to further unify the abstract representations of the question $q^{(N)}$ and the schema $a^{(N)}$ generated by the preceding GPNN module. We concatenate the question sequence $q^{(N)}$ and the schema sequence $a^{(N)}$ into one longer sequence, which is the initial input of the RAT module. After the RAT module, the final unified representation of the question and the schema is given by

$$f^{(M)} = \text{RAT}(\text{concat}(q^{(N)}, a^{(N)}), R). \quad (27)$$

3.4 Decoder with SemQL Grammar

To effectively constrain the search space during synthesis, IRNet (Guo et al., 2019) designed a context-free SemQL grammar as the intermediate representation between the NL question and SQL, which is essentially an abstract syntax tree (AST). SemQL recovers the tree nature of SQL. To simplify the grammar tree, SemQL in IRNet did not cover all the keywords of SQL. For example, the columns contained in a GROUP BY clause can be inferred from the SELECT clause or from the primary key of a table to one of whose columns an aggregate function is applied. In our system, we improve the SemQL grammar so that each keyword in the SQL sentence corresponds to a SemQL node. During training, the labeled SQL needs to be transferred into an AST; during evaluation, the AST needs to be recovered into the corresponding SQL. The recovery success rate is the fraction of recovered SQL queries that exactly equal the labeled SQL. Our improved grammar raises the recovery success rate from 89.6% to 99.9% on the dev set.

We leverage the coarse-to-fine approach (Dong and Lapata, 2018) to decompose the decoding process of a SemQL query into two stages, similar to IRNet. The first stage predicts a skeleton of the SemQL query with a skeleton decoder. Then, a detail decoder fills in the missing details of the skeleton by selecting columns and tables.

4 Experiments

In this section, we evaluate the effectiveness of our proposed ShadowGNN against other strong baselines. We further conduct experiments with limited annotated training data to validate the generalization capability of the proposed ShadowGNN. Finally, we ablate other design choices to understand their contributions.

4.1 Experiment Setup

Dataset & Metrics. We conduct the experiments on Spider (Yu et al., 2018), which is a large-scale, complex, and cross-domain Text-to-SQL benchmark.

Approaches                                      Dev.     Test
Global-GNN (Bogin et al., 2019b)                52.7%    47.4%
R-GCN + Bertrand-DR (Kelkar et al., 2020)       57.9%    54.6%
IRNet v2 (Guo et al., 2019)                     63.9%    55.0%
RATSQL v3 + BERT-large (Wang et al., 2020)      69.7%    65.6%
RATSQL♣ + RoBERTa-large                         70.2%    64.0%
GPNN + RoBERTa-large                            69.9%    65.7%
ShadowGNN + RoBERTa-large                       72.3%    66.1%

Table 1: The exact match accuracy on the development set and test set. ♣ means the model is implemented by us, where the only difference from the proposed ShadowGNN model is the encoder.

The databases in Spider are split into 146 for training, 20 for development, and 40 for test. The human-labeled question-SQL query pairs are divided into 8625/1034/2147 for train/development/test. The test set is not publicly available, as in other competition challenges. We report results with the same metrics as Yu et al. (2018): exact match accuracy and component match accuracy.

Baselines. The main contribution of this paper lies in the encoder of the Text-to-SQL model. For the decoder of our evaluated models, we improve the SemQL grammar of IRNet (Guo et al., 2019), raising the recovery success rate from 89.6% to 99.9%. The SQL query is first represented by an abstract syntax tree (AST) following the well-designed grammar (Lin et al., 2019). Then, the AST is flattened into a sequence (named the SemQL query) by depth-first search (DFS). During decoding, it is still predicted step by step with an LSTM decoder. We also apply the coarse-to-fine approach to the decoder as in IRNet: a skeleton decoder first outputs a skeleton of the SemQL query, and a detail decoder then fills in the missing details of the skeleton by selecting columns and tables. R-GCN (Bogin et al., 2019a; Kelkar et al., 2020) and RATSQL (Wang et al., 2020) are two other strong baselines, which improve the representation ability of the encoder.

Implementations. We implement ShadowGNN and our baseline approaches with PyTorch (Paszke et al., 2019). We use the pretrained RoBERTa model from the PyTorch transformer repository (Wolf et al., 2019). We use Adam with default hyperparameters for optimization. The learning rate is set to 2e-4, with a 0.1 decay factor applied to the learning rate of the pretrained model. The hidden sizes of the GPNN layer and the RAT layer are set to 512. The dropout rate is 0.3. The batch size is set to 16. The numbers of GPNN and RAT layers in the ShadowGNN encoder are both set to 4.
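A small sketch of the two-learning-rate setup described above, assuming the pretrained encoder lives in a submodule whose parameter names start with "roberta"; the grouping criterion and names are illustrative, not taken from the released code.

```python
import torch

def build_optimizer(model, base_lr: float = 2e-4, pretrained_scale: float = 0.1):
    """Adam with a 10x smaller learning rate for the pretrained RoBERTa parameters (sketch)."""
    pretrained, scratch = [], []
    for name, param in model.named_parameters():
        (pretrained if name.startswith("roberta") else scratch).append(param)
    return torch.optim.Adam([
        {"params": scratch, "lr": base_lr},
        {"params": pretrained, "lr": base_lr * pretrained_scale},
    ])
```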

[Figure 3: The cosine similarity between the token representations of two questions, "What is name and capacity of stadium with most concert after year ?", in which the positions of "name" and "capacity" are exchanged.]

4.2 Experimental Results

For a fair comparison with our proposed ShadowGNN, we implement RATSQL (Wang et al., 2020) with the same coarse-to-fine decoder and RoBERTa augmentation as the ShadowGNN model. We also report the performance of the GPNN encoder on the test set. The detailed implementations of these two baselines are as follows:

• RATSQL♣: The RATSQL model replaces the four projection layers with another four relation-aware self-attention layers. There are eight relation-aware self-attention layers in total in the encoder, which is consistent with the original RATSQL setup (Wang et al., 2020).

• GPNN: Compared with ShadowGNN, the GPNN model directly removes the relation-aware transformer. There are only four projection layers in the encoder, which yields better performance than eight layers.

Table 1 presents the exact match accuracy of the models on the development set and test set. Compared with the state-of-the-art RATSQL, our proposed ShadowGNN obtains absolute improvements of 2.6% and 0.5% on the development set and test set with RoBERTa augmentation. Compared with our implemented RATSQL♣, ShadowGNN still stays ahead, with absolute improvements of 2.1% and 2.1% on the development set and test set. ShadowGNN, which improves the encoder and the SemQL grammar of IRNet, obtains an absolute 11.1% accuracy gain on the test set.

[Figure 4: The exact match accuracy of GPNN, RATSQL and ShadowGNN on the limited training datasets. The limited training datasets are randomly sampled from the full training dataset with 10%, 50% and 100% sampling probability.]

As shown in Table 1, our proposed pure GPNN model achieves performance comparable to the state-of-the-art approach on the test set. Compared with other GNN-based models (Global-GNN and R-GCN), GPNN obtains over 10% improvement on the development set and test set. To the best of our knowledge, our proposed GPNN achieves the best performance on the Spider dataset among all GNN-based models.

4.3 Generalization Capability

We design an experiment to validate the effectiveness of the graph projection neural network (GPNN). Consider the preprocessed question "What is name and capacity of stadium with most concert after year ?", where "name" and "capacity" are column names. We exchange their positions and calculate the cosine similarity of the representations from the final GPNN layer of the ShadowGNN model. Interestingly, we find that "name" is most similar to "capacity", as shown in Figure 3. The semantic meaning of the two column names seems to have been removed, so that their representations depend only on the positions where they occur. This indicates that GPNN can obtain an abstract representation of the question.

To further validate the generalization ability of our proposed ShadowGNN, we conduct experiments on limited annotated training datasets, which are sampled from the full training dataset with 10%, 50% and 100% sampling rates. As shown in Figure 4, there is a large performance gap between RATSQL and ShadowGNN when the annotated data is extremely limited, occupying only 10% of the full training dataset.

Approaches                    Easy     Medium   Hard     Extra Hard   All
R-GCN (Kelkar et al., 2020)   70.4%    54.1%    35.6%    28.2%        50.7%
R-GCN♣                        78.9%    63.2%    46.6%    29.8%        58.7%
R-GCN+RAT                     85.0%    70.9%    56.3%    32.7%        65.6%
GPNN                          87.5%    74.9%    59.2%    41.6%        69.9%
RATSQL♣                       87.1%    74.9%    57.5%    46.4%        70.2%
ShadowGNN                     87.5%    78.0%    61.5%    45.8%        72.3%

Table 2: The match accuracy of the ablation methods at four hardness levels on the development set. ♣ means the model is implemented by us.

ShadowGNN outperforms RATSQL and GPNN by over 5% accuracy on the development set. Under this limited-training-data setup, we also observe an interesting phenomenon: the convergence speed of ShadowGNN is much faster than that of the other two models. As described in Section 3, the two-phase encoder of ShadowGNN simulates the inference process of a human translating a question into a SQL query: abstracting and inferring. The experiments on limited annotated training datasets show that both phases are necessary, as they not only improve performance but also speed up convergence.

4.4 Ablation Studies

We conduct ablation studies to analyze the contributions of the well-designed graph projection neural network (GPNN). Besides the RATSQL and GPNN models, we implement two other ablation models: R-GCN and R-GCN+RAT. First, we introduce the implementations of the ablation models.

• R-GCN♣: We directly remove the projection part of the GPNN. When updating the question representation, we use the representation of the semantic schema as the attention value instead of the abstract representation.

• R-GCN+RAT: In this model, there are four R-GCN layers and four relation-aware self-attention layers. To be comparable, the initial input of the R-GCN is the sum of the semantic schema and the abstract schema.

The decoder parts of these four ablation models are the same as the decoder of ShadowGNN. We present the accuracy of the ablation models at the four hardness levels on the development set, as defined in Yu et al. (2018). As shown in Table 2, ShadowGNN obtains the best performance at three hardness levels. Compared with R-GCN (Kelkar et al., 2020), our implemented R-GCN based on the SemQL grammar achieves higher performance. Compared with the R-GCN+RAT model, ShadowGNN still performs better even though the initial input information is exactly the same. This shows that it is both necessary and effective to abstract the representations of the question and the schema explicitly.

5 Related Work

Text-to-SQL. Recent models evaluated on Spider have pointed out several interesting directions for Text-to-SQL research. An AST-based decoder (Yin and Neubig, 2017) was first proposed for generating general-purpose programming languages. IRNet (Guo et al., 2019) used a similar AST-based decoder to decode a more abstracted intermediate representation (IR), which is then transformed into a SQL query. RAT-SQL (Wang et al., 2020) introduced a relation-aware transformer encoder to improve the joint encoding of question and schema, and reached the best performance on the Spider (Yu et al., 2018) dataset. BRIDGE (Lin et al., 2020) leverages the database content to augment the schema representation. RYANSQL (Choi et al., 2020) formulates the Text-to-SQL task as a slot-filling task to predict each SELECT statement. EditSQL (Zhang et al., 2019), IGSQL (Cai and Wan, 2020) and R2SQL (Hui et al.) consider the dialogue context when translating an utterance into a SQL query. GAZP (Zhong et al., 2020) proposes a zero-shot method to adapt an existing semantic parser to new domains. PIIA (Li et al., 2020) proposes a human-in-the-loop method to enhance Text-to-SQL performance.

Graph Neural Network. Graph neural networks (GNN) (Li et al., 2015) have been widely applied to various NLP tasks, such as text classification (Chen et al., 2020b; Lyu et al., 2021), text generation (Zhao et al., 2020), dialogue state tracking (Chen et al., 2020a; Zhu et al., 2020) and dialogue policy (Chen et al., 2018a,b, 2019, 2020c,d). They have also been used to encode the schema in a more structured way. Prior work (Bogin et al., 2019a) constructed a directed graph of foreign-key relations in the schema and then obtained the corresponding schema representation with a GNN. Global-GNN (Bogin et al., 2019b) also employed a GNN to derive the representation of the schema and softly select a set of schema nodes that are likely to appear in the output query; it then discriminatively re-ranks the top-K queries output by a generative decoder. We propose the Graph Projection Neural Network (GPNN), which is able to extract abstract representations of the NL question and the semantic schema.

Generalization Capability. To improve the compositional generalization of sequence-to-sequence models, the SCAN (Lake and Baroni, 2018) dataset (Simplified version of the CommAI Navigation tasks) has been published. The SCAN task requires models to generalize knowledge gained about other primitive verbs ("walk", "run" and "look") to the unseen verb "jump". Russin et al. (2019) separate syntax from semantics in the question representation, where the attention weights are calculated based on syntax vectors but the hidden representation of the decoder is the weighted sum of the semantic vectors. Different from that work, we look at the semi-structured schema from two perspectives (schema structure and schema semantics). Our proposed GPNN aims to use the schema semantics as a bridge to obtain abstract representations of the question and the schema.

6 Conclusion

In this paper, we propose a graph projection neural network (GPNN) to abstract the representations of the question and the schema with a simple attention mechanism. We further unify the abstract representations of the question and the schema output by GPNN with a relation-aware transformer (RAT). The experiments demonstrate that our proposed ShadowGNN achieves excellent performance on the challenging Text-to-SQL task. Especially when the annotated training data is limited, our proposed ShadowGNN obtains larger gains in exact match accuracy and convergence speed. The ablation studies further indicate the effectiveness of our proposed GPNN. Recently, we have noticed that some Text2SQL-specific pretrained models have been proposed, e.g., TaBERT (Yin et al., 2020) and GraPPa (Yu et al., 2020). In future work, we will evaluate our proposed ShadowGNN with these adaptive pretrained models.

Acknowledgements

We thank the anonymous reviewers for their thoughtful comments. This work has been supported by the No. SKLMCPTS2020003 Project.

References

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450.

Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544.

Ben Bogin, Jonathan Berant, and Matt Gardner. 2019a. Representing schema structure with graph neural networks for text-to-SQL parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4560–4565.

Ben Bogin, Matt Gardner, and Jonathan Berant. 2019b. Global reasoning over database structures for text-to-SQL parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3650–3655.

Yitao Cai and Xiaojun Wan. 2020. IGSQL: Database schema interaction graph based neural model for context-dependent text-to-SQL generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6903–6912.

Ruisheng Cao, Su Zhu, Chen Liu, Jieyu Li, and Kai Yu. 2019. Semantic parsing with dual learning. In Proceedings of ACL, pages 51–64, Florence, Italy.

Ruisheng Cao, Su Zhu, Chenyu Yang, Chen Liu, Rao Ma, Yanbin Zhao, Lu Chen, and Kai Yu. 2020. Unsupervised dual paraphrasing for two-stage semantic parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6806–6817.

Lu Chen, Cheng Chang, Zhi Chen, Bowen Tan, Milica Gašić, and Kai Yu. 2018a. Policy adaptation for deep reinforcement learning-based dialogue management. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6074–6078. IEEE.

Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gašić, and Kai Yu. 2019. AgentGraph: Toward universal dialogue management with structured deep reinforcement learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(9):1378–1391.

Lu Chen, Boer Lyu, Chi Wang, Su Zhu, Bowen Tan, and Kai Yu. 2020a. Schema-guided multi-domain dialogue state tracking with graph attention neural networks. In AAAI, pages 7521–7528.

Lu Chen, Bowen Tan, Sishan Long, and Kai Yu. 2018b. Structured dialogue policy with graph neural networks. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), pages 1257–1268.

Lu Chen, Yanbin Zhao, Boer Lyu, Lesheng Jin, Zhi Chen, Su Zhu, and Kai Yu. 2020b. Neural graph matching networks for Chinese short text matching. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6152–6158.

Zhi Chen, Lu Chen, Xiaoyuan Liu, and Kai Yu. 2020c. Distributed structured actor-critic reinforcement learning for universal dialogue management. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:2400–2411.

Zhi Chen, Xiaoyuan Liu, Lu Chen, and Kai Yu. 2020d. Structured hierarchical dialogue policy with graph neural networks. arXiv preprint arXiv:2009.10355.

DongHyun Choi, Myeong Cheol Shin, EungGyun Kim, and Dong Ryeol Shin. 2020. RYANSQL: Recursively applying sketch-based slot fillings for complex text-to-SQL in cross-domain databases. arXiv preprint arXiv:2004.03125.

Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 731–742.

Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4524–4535.

Binyuan Hui, Ruiying Geng, Qiyu Ren, Binhua Li, Yongbin Li, Jian Sun, Fei Huang, Luo Si, Pengfei Zhu, and Xiaodan Zhu. Dynamic hybrid relation network for cross-domain context-dependent semantic parsing. arXiv preprint arXiv:2101.01686.

Amol Kelkar, Rohan Relan, Vaishali Bhardwaj, Saurabh Vaichal, and Peter Relan. 2020. Bertrand-DR: Improving text-to-SQL using a discriminative re-ranker. arXiv preprint arXiv:2002.00557.

Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner. 2017. Neural semantic parsing with type constraints for semi-structured tables. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1516–1526.

Brenden Lake and Marco Baroni. 2018. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In International Conference on Machine Learning, pages 2873–2882. PMLR.

Wenqiang Lei, Weixin Wang, Zhixin Ma, Tian Gan, Wei Lu, Min-Yen Kan, and Tat-Seng Chua. 2020. Re-examining the role of schema linking in text-to-SQL. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6943–6954, Online. Association for Computational Linguistics.

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493.

Yuntao Li, Bei Chen, Qian Liu, Yan Gao, Jian-Guang Lou, Yan Zhang, and Dongmei Zhang. 2020. "What do you mean by that?" - A parser-independent interactive approach for enhancing text-to-SQL. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6913–6922.

Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, and Matt Gardner. 2019. Grammar-based neural text-to-SQL generation. arXiv preprint arXiv:1905.13326.

Xi Victoria Lin, Richard Socher, and Caiming Xiong. 2020. Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4870–4888, Online. Association for Computational Linguistics.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Boer Lyu, Lu Chen, Su Zhu, and Kai Yu. 2021. LET: Linguistic knowledge enhanced graph transformer for Chinese short text matching. arXiv preprint arXiv:2102.12671.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8026–8037.

Jake Russin, Jason Jo, Randall C. O'Reilly, and Yoshua Bengio. 2019. Compositional generalization in a deep seq2seq model by separating syntax and semantics. arXiv preprint arXiv:1904.09708.

Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer.

Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 464–468.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2020. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7567–7578, Online. Association for Computational Linguistics.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450.

Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. 2020. TaBERT: Pretraining for joint understanding of textual and tabular data. arXiv preprint arXiv:2005.08314.

Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, and Caiming Xiong. 2020. GraPPa: Grammar-augmented pre-training for table semantic parsing. arXiv preprint arXiv:2009.13845.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921.

Rui Zhang, Tao Yu, Heyang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019. Editing-based SQL query generation for cross-domain context-dependent questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5341–5352.

Yanbin Zhao, Lu Chen, Zhi Chen, Ruisheng Cao, Su Zhu, and Kai Yu. 2020. Line graph enhanced AMR-to-text generation with mix-order graph attention networks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 732–741.

Victor Zhong, Mike Lewis, Sida I. Wang, and Luke Zettlemoyer. 2020. Grounded adaptation for zero-shot executable semantic parsing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6869–6882, Online. Association for Computational Linguistics.

Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103.

Su Zhu, Jieyu Li, Lu Chen, and Kai Yu. 2020. Efficient context and schema fusion networks for multi-domain dialogue state tracking. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 766–781.