
Learning to Generate Pseudo-code from Source Code using Statistical Machine Translation


Page 1

Learning to Generate Pseudo-code from Source Code using Statistical Machine Translation

Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

IEEE/ACM ASE, November 13, 2015

Page 2

Summary of This Study

● This presentation introduces the key techniques used in the Pseudogen tool [Fudaba+2015].

● Goal:

– Generating natural language sentences which describe the behavior of each statement in source code.

– We call these output sentences "pseudo-code."

● Approach:

– We used two different frameworks of statistical machine translation (SMT).

Page 3

Contribution of Pseudo-code

● Pseudo-code aids code reading for programming beginners.

● Programmers can double-check their code through pseudo-code.

(Figure: two use cases. Assisting code reading: source code is shown alongside pseudo-code in natural language. Debugging: the source code "if x / 5 == 0:" yields the pseudo-code "if x divided by 5 is 0"; spotting the mismatch with the intended behavior, the programmer fixes the code to "if x % 5 == 0:".)

Page 4

Pseudo-code in This Study

● Line-to-line Assumption

– Each statement in source code can be written as one phrase in natural language with the same meaning.

● This assumption represents a minimal relationship between programming and natural language.

– For now, we ignore more complicated cases (e.g. snippets, functions, documents).

Python                      English (to be generated)
if x % 5 == 0: (body)   →   if x is divisible by 5,
    y = 'foo'           →   assign a string 'foo' to y.
(if ...) else: (body)   →   if not,
    print('bar')        →   print a string 'bar' to the output stream.

Page 5

Related Work for Sentence Generation

● Rule-based methods, e.g. [Buse+ '08], [Sridhara+ '10], [Sridhara+ '11], [Moreno+ '13]

– Can use detailed information, but the rule tables are costly to maintain.

(Figure: a rule-based system looks up entries such as os.print(・) → "print ・ to output stream" and msg → "message" in its rule table and combines them into "print message to output system"; a data-based system searches its knowledge base for code similar to os.print(msg) and proposes the stored description.)

● Data (IR)-based methods, e.g. [Haiduc+ '10], [Eddy+ '13], [Wong+ '13], [Rodeghero+ '14]

– Can use large corpora from the real world, but sometimes suffer from search errors.

Page 6

Statistical Machine Translation (SMT)

Page 7

Statistical Machine Translation (SMT)

● Key idea: combine the good parts of rule-based and data-based methods.

1. Training: extract transformation rules between the two languages from a large corpus.

2. Generating: search for an accurate combination of rules for the input.

● Merit

1. Automated: Most translation rules are automatically obtained.

2. Scalable: Increasing the size of the corpus improves translation quality.

● We used two different SMT frameworks:

1. Phrase-based machine translation (PBMT)

2. Tree-to-string machine translation (T2SMT)

(Figure: in training, a translator is learned from the parallel corpus; in generation, it maps a source sentence to a target sentence.)

Page 8

Phrase-based Machine Translation (PBMT)

● Uses token strings to generate output.

Python: if x % 5 == 0:
English: if x is divisible by 5

1. Tokenize: if | x | % | 5 | == | 0 | :

2. Select phrase pairs: "if" → "if", "x" → "x", "% 5" → "by 5", "== 0 :" → "is divisible"

3. Reorder: "is divisible" and "by 5" are swapped to match English word order.

4. Synthesize target sentence: if x is divisible by 5

+ Simple method: we only need tokenizers.
- Cannot capture source structures.
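To make these four steps concrete, here is a minimal sketch in Python. The phrase table and the reordering step are hand-written for this single example and are purely hypothetical; a real PBMT system learns phrase pairs from the corpus and scores reorderings statistically.

    import io
    import tokenize

    # Toy phrase table for this one example (hypothetical; a real system
    # learns these pairs from the parallel corpus).
    PHRASE_TABLE = {
        ("if",): ["if"],
        ("x",): ["x"],
        ("%", "5"): ["by", "5"],
        ("==", "0", ":"): ["is", "divisible"],
    }

    def pbmt_translate(src):
        # 1. Tokenize the Python statement with Python's own tokenizer.
        toks = [t.string for t in tokenize.generate_tokens(io.StringIO(src).readline)
                if t.string.strip()]
        # 2. Select phrase pairs, greedily preferring the longest match.
        phrases, i = [], 0
        while i < len(toks):
            for n in range(len(toks) - i, 0, -1):
                key = tuple(toks[i:i + n])
                if key in PHRASE_TABLE:
                    phrases.append(PHRASE_TABLE[key])
                    i += n
                    break
            else:
                phrases.append(toks[i:i + 1])  # pass unknown tokens through
                i += 1
        # 3. Reorder: swap the last two phrases to match English word order
        #    (a real system scores many permutations with a language model).
        phrases[-2], phrases[-1] = phrases[-1], phrases[-2]
        # 4. Synthesize the target sentence.
        return " ".join(w for p in phrases for w in p)

    print(pbmt_translate("if x % 5 == 0:"))  # -> if x is divisible by 5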

Page 9

Tree-to-string Machine Translation (T2SMT)

● Uses syntax trees to generate output.

Python: if x % 5 == 0:
English: if x is divisible by 5

1. Parse: build a parse tree for the statement, e.g. (if (cmp (binop x % 5) == 0) : body).

2. Select subtrees: match tree fragments to target templates, e.g. "if X :" → "if X" and "(binop Y % Z) == 0" → "Y is divisible by Z".

3. Synthesize target sentence: if x is divisible by 5

+ Can capture source structures.
- Complicated method: we need tree handling.
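As a contrast, here is a hypothetical tree-to-string sketch for the same example: we parse with Python's ast module and hand-code the two rules from the figure ("if X" and "Y is divisible by Z"); a real T2SMT system extracts such rules automatically, as shown on page 13.

    import ast

    # Two hand-coded tree-to-string rules (hypothetical; a real system
    # extracts thousands of rules from an aligned corpus).
    def t2smt_translate(src):
        stmt = ast.parse(src).body[0]
        # Rule: "if <test>: ..." -> "if " + translation of <test>
        if isinstance(stmt, ast.If):
            return "if " + translate_expr(stmt.test)
        raise NotImplementedError("no rule for this statement")

    def translate_expr(e):
        # Rule: "(Y % Z) == 0" -> "Y is divisible by Z"
        if (isinstance(e, ast.Compare)
                and isinstance(e.ops[0], ast.Eq)
                and isinstance(e.comparators[0], ast.Constant)
                and e.comparators[0].value == 0
                and isinstance(e.left, ast.BinOp)
                and isinstance(e.left.op, ast.Mod)):
            y = translate_expr(e.left.left)
            z = translate_expr(e.left.right)
            return f"{y} is divisible by {z}"
        if isinstance(e, ast.Name):       # variables translate to their names
            return e.id
        if isinstance(e, ast.Constant):   # literals translate to their values
            return str(e.value)
        raise NotImplementedError("no rule for this expression")

    print(t2smt_translate("if x % 5 == 0:\n    pass"))  # -> if x is divisible by 5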

Page 10

Training Process of SMT Methods

(Figure: training pipeline. From the source and target corpora, word alignment captures token-level relationships; rule extraction then derives translation rules and statistics (phrase-level relationships), which are combined with other features into the translation model. Separately, a target language model is trained to evaluate the fluency of the output.)

Page 11

Word Alignment

● Making word alignment (token-level relationships)

– Using a statistical model.

(Figure: alignment links between the Python tokens "if x % 5 == 0 :" and the English tokens "if x is divisible by 5".)
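The slide does not name the particular alignment model, so treat the following as illustrative: a minimal sketch of one classic choice, IBM Model 1, trained with EM over (Python, English) token pairs.

    from collections import defaultdict

    def ibm_model1(pairs, iterations=10):
        # t[(e, f)] approximates P(target token e | source token f).
        t = defaultdict(lambda: 1.0)  # flat initialization
        for _ in range(iterations):
            count = defaultdict(float)  # expected co-occurrence counts (E-step)
            total = defaultdict(float)
            for f_sent, e_sent in pairs:
                for e in e_sent:
                    z = sum(t[(e, f)] for f in f_sent)  # normalization
                    for f in f_sent:
                        c = t[(e, f)] / z
                        count[(e, f)] += c
                        total[f] += c
            for (e, f), c in count.items():  # M-step: re-estimate t
                t[(e, f)] = c / total[f]
        return t

    # Two toy sentence pairs; real training uses thousands of pairs.
    pairs = [("if x % 5 == 0 :".split(), "if x is divisible by 5".split()),
             ("if x % 3 == 0 :".split(), "if x is divisible by 3".split())]
    t = ibm_model1(pairs)
    # English "5" should come to prefer the source token "5" over, say, "%".
    print(round(t[("5", "5")], 3), round(t[("5", "%")], 3))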

Page 12

Rule Extraction (PBMT)

● Given the word alignment, phrase pairs are extracted according to the aligned words, as sketched below.

Extracted pairs for "if x % 5 == 0 :" / "if x is divisible by 5":

if x → if x
% 5 → by 5
== 0 : → is divisible
5 == 0 → is divisible by 5
x % 5 == → x is divisible by 5
...and so on
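The standard extraction heuristic keeps every phrase pair that is consistent with the alignment (no link crosses the pair's boundary). A sketch, using hypothetical alignment links for the running example:

    def extract_phrases(src, tgt, links, max_len=4):
        # Extract phrase pairs consistent with the word alignment:
        # a pair is kept only if no alignment link leaves it.
        pairs = []
        for i1 in range(len(src)):
            for i2 in range(i1, min(i1 + max_len, len(src))):
                # Target span covered by links from src[i1..i2].
                tgt_idx = [t for s, t in links if i1 <= s <= i2]
                if not tgt_idx:
                    continue
                j1, j2 = min(tgt_idx), max(tgt_idx)
                # Consistency: every link into tgt[j1..j2] must start in src[i1..i2].
                if all(i1 <= s <= i2 for s, t in links if j1 <= t <= j2):
                    pairs.append((" ".join(src[i1:i2 + 1]),
                                  " ".join(tgt[j1:j2 + 1])))
        return pairs

    src = "if x % 5 == 0 :".split()
    tgt = "if x is divisible by 5".split()
    # Hypothetical (source index, target index) links; "0" and ":" unaligned.
    links = [(0, 0), (1, 1), (2, 4), (3, 5), (4, 2), (4, 3)]
    for pair in extract_phrases(src, tgt, links):
        print(pair)  # includes ("if x", "if x"), ("% 5", "by 5"), ...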

Page 13

Rule Extraction (T2SMT)

● Given the word alignment, tree-to-string rules are extracted according to the aligned words and the source parse tree.

(Figure: the subtree covering "x % 5 == 0" (cmp/binop) aligns to "x is divisible by 5"; abstracting the aligned leaves x and 5 into variables yields the rule "X % Y == 0" → "X is divisible by Y".)

Page 14

SMT for Pseudo-code Generation

Page 15

Requirements for SMT Methods

PBMT:

● Tokenizer for natural language
– Use NLP tools (English: Stanford Tokenizer; Japanese: MeCab).

● Tokenizer for programming language
– Use the tokenizer provided by the programming language itself.

T2SMT:

● Tokenizer for natural language
– Same as PBMT.

● Parser for programming language
– The parser should generate parse trees that include all tokens as leaf nodes, to be used for word alignment.
– But most programming languages provide only an AST parser.
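A quick look at the two input representations using Python's standard library; note that the AST output contains operator nodes like Mod and Eq instead of the surface tokens "%", "==", and ":", which is exactly the mismatch discussed on the next page.

    import ast
    import io
    import tokenize

    code = "if x % 5 == 0:\n    pass"
    # Token strings for PBMT, from the language's own tokenizer.
    print([t.string for t in tokenize.generate_tokens(io.StringIO(code).readline)
           if t.string.strip()])
    # AST for T2SMT, from the language's own parser.
    print(ast.dump(ast.parse(code)))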

Page 16

Problem of AST

● Problem: mismatching of token nodes.

(Figure: the Python AST for "if x % 5 == 0:" — If(test=Compare(left=BinOp(Name x, Mod, Num 5), ops=[Eq], comparators=[Num 0]), body=...) — cannot be aligned directly to the English tokens "if x is divisible by 5".)

– There are redundant nodes (e.g. the Load context under Name).

– Some words in natural language are aligned to inner nodes in the AST.

● Our approach: apply simple transformation rules to avoid token mismatching.

Page 17

Parse-like Tree (1): Head Insertion

1. Insert HEAD leaves (= the label of each node).

(Figure: every AST node — If, Compare, BinOp, Name, Num — gains an extra HEAD leaf carrying its own label.)

Page 18

Parse-like Tree (2): Pruning

1. Insert HEAD leaves (= the label of each node).

2. Delete redundant nodes.

(Figure: redundant nodes, such as the Load context and the Body wrapper, are removed from the tree.)

Page 19

Parse-like Tree (3): Simplification

1. Insert HEAD leaves (= the label of each node).

2. Delete redundant nodes.

3. Integrate some nodes.

(Figure: single-child wrapper nodes such as Name and Num are merged away, leaving the leaves x, 5, and 0 directly under their parents.)

Page 20

Parse-like Tree (4): Final Tree

● Finally, we obtain the parse-like tree below.

(Figure: the final parse-like tree for "if x % 5 == 0:" — an If node with HEAD leaf "if" and a Compare/BinOp subtree whose leaves are x, %, 5, ==, 0 — which can now be aligned token-by-token with the English "if x is divisible by 5".)
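As an illustration, a hypothetical sketch of steps 1 and 2 (head insertion and pruning) on a Python AST; step 3's node integration and the restoration of operator tokens such as "%" are omitted for brevity, and the pruning list is an assumption.

    import ast

    PRUNE_FIELDS = {"ctx", "body", "orelse"}  # assumed list of redundant fields

    def to_parse_like_tree(node):
        # Step 1: insert the node's label as a HEAD leaf.
        # Step 2: prune redundant fields (e.g. Load contexts).
        if isinstance(node, ast.AST):
            children = [("HEAD", type(node).__name__)]
            for field, value in ast.iter_fields(node):
                if field in PRUNE_FIELDS or value is None:
                    continue
                values = value if isinstance(value, list) else [value]
                children += [to_parse_like_tree(v) for v in values]
            return children
        return node  # plain leaves: identifiers, numbers, strings

    stmt = ast.parse("if x % 5 == 0:\n    pass").body[0]
    print(to_parse_like_tree(stmt))
    # -> nested lists with HEAD leaves for If, Compare, BinOp, ...
    #    and plain leaves x, 5, 0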

Page 21

Experiments

Page 22

Corpus Summaries

● We gathered two corpora with different language pairs.

1. Python-to-English

– Python ... extracted from the Django framework
– English ... hand-written by one human
– Amount ... 18,805 pairs
– Usage ... 17,000 for training, 1,805 for evaluation

2. Python-to-Japanese

– Python ... extracted from student code for a programming exercise
– Japanese ... hand-written by one human
– Amount ... 722 pairs
– Usage ... 10-fold cross-validation (9/10 for training, 1/10 for evaluation)

Page 23

Evaluated Methods

Method         | Framework      | Input data structure
PBMT           | Phrase-based   | Token strings generated by the tokenize module
Raw-T2SMT      | Tree-to-string | AST generated by the ast module
Modified-T2SMT | Tree-to-string | Parse-like tree (AST with transformation rules)

Page 24

Evaluation Setting

● We examined two points: intrinsic evaluation (translation quality) and extrinsic evaluation (code understanding).

Intrinsic evaluation: translation quality

● Apply evaluation metrics used in machine translation studies.
– Automatic evaluation: BLEU
– Human evaluation: Acceptability

Extrinsic evaluation: code understanding

● Examine our generator in an actual task: subjects read Python code with its pseudo-code, answer a readability question (0-5 scale), and we record the reading time.

● Three conditions: Python + no pseudo-code, Python + generated pseudo-code, Python + human-written pseudo-code.
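For reference, a hypothetical illustration of BLEU scoring with NLTK (the slides do not specify the exact evaluation scripts); multiplying by 100 gives percentages like those reported on the next page.

    from nltk.translate.bleu_score import corpus_bleu

    # One list of references per sample, plus the generated hypotheses.
    refs = [["if x is divisible by 5".split()],
            ["assign a string 'foo' to y .".split()]]
    hyps = ["if x is divisible by 5".split(),   # exact match
            "assign 'foo' to y .".split()]      # partial match
    print(corpus_bleu(refs, hyps))  # corpus-level BLEU in [0, 1]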

Page 25

Results: Intrinsic Evaluation

● BLEU and Acceptability show the same tendency:

Modified-T2SMT > Raw-T2SMT > PBMT

● The Modified-T2SMT method has the best performance in all settings.

– 72% of test samples achieve the highest Acceptability grade (= grammatically correct & fluent).

Automatic Evaluation: BLEU% [Papineni et al. 2002]

Generator      | English | Japanese
PBMT           | 25.71   | 51.67
Raw-T2SMT      | 49.74   | 55.66
Modified-T2SMT | 54.08   | 62.88

(do not compare scores between English and Japanese)

Human Evaluation: Acceptability [Goto et al. 2013] (Python-Japanese)

(Figure: cumulative Acceptability chart, grades 1-5; the share of outputs reaching the highest grade is 50% for PBMT, 63% for Raw-T2SMT, and 72% for Modified-T2SMT.)

Page 26

Results: Code Understanding

● Generated pseudo-code can improve code readability compared with no pseudo-code.

● But reading time increases.

– This comes from generation errors (oracle, i.e. human-written, pseudo-code decreases reading time).

Code Readability and Reading Time (Python-Japanese, Modified-T2SMT)

Group                    | Pseudo-code   | Readability (6-grade Likert) | Mean Reading Time [s]
Experienced (8 people)   | No            | 2.55 | 41.37
                         | Generated     | 2.71 | 46.48
                         | Human-written | 3.05 | 35.65
Inexperienced (6 people) | No            | 1.32 | 24.99
                         | Generated     | 1.81 | 39.52
                         | Human-written | 2.10 | 24.97

Page 27

Conclusion / Future Work

● Summary:

– Generating natural language sentences (which we call pseudo-code) from source statements using statistical machine translation (SMT).

– For the tree-to-string (T2SMT) method, we apply transformation rules to make a parse-like tree.

● Results:

– SMT can generate acceptable sentences.

● 54% BLEU in English; 62% BLEU and 72% highest Acceptability in Japanese.

– Generated sentences can aid code readability.

● However, reading time is slower than with human-written pseudo-code; there is still room for improvement.

● Future Work:

– Considering more complicated generation

● Input: snippets, functions, classes

● Output: multiple sentences, documents

– Applying to more language pairs

– Automating preprocessing