Code recognition & CL modeling through AST

Xingzhong Xu, Hong Man


Outline

• Introduction of AST in SSP

• AST for Code Recognition

• AST for Cognitive Linguistic Modeling

• Summary and Future Work

Semantic Signal Processing, Stevens


Introduction of AST in SSP

• Most language applications use an Abstract Syntax Tree (AST) as an Intermediate Representation (IR) to help the computer understand code semantically in the programming domain.*

• Signal processing code:
  o How can we analyze it semantically?
  o How can we model it semantically?

for (i = 0; i < n; i++) {
    acc0 += d_taps[i] * input[i];
}

*Terence Parr, The Definitive ANTLR Reference: Building Domain-Specific Languages, Pragmatic Programmers, 2007.
**ANTLR


Code Recognition 

• To perform code re-hosting and other semantic code analysis, we first need to recognize the functionality of each code segment.

• In computer science, there are two approaches to code recognition:
1. AST-based recognition [Gabel, 2008] [Roy, 2009]
  o Generate the AST
  o Run a tree matcher
2. Random-test-based recognition [Jiang, 2009] [Bertran, 2005]
  o Segment the code
  o Test the I/O behavior
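
The random-test approach can be illustrated with a minimal sketch: feed two candidate code segments the same random inputs and compare their outputs. This is only an illustration of the idea, not the procedure of the cited papers. The names `fir_plain`, `fir_unrolled`, `sum_only` and `same_io_behavior`, the input range, and the tolerance are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Vec = std::vector<double>;
using Segment = std::function<double(const Vec&, const Vec&)>;

// Candidate segment 1: plain multiply-accumulate loop.
double fir_plain(const Vec& taps, const Vec& input) {
    assert(taps.size() == input.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < taps.size(); i++) acc += taps[i] * input[i];
    return acc;
}

// Candidate segment 2: the same computation, unrolled four ways.
double fir_unrolled(const Vec& taps, const Vec& input) {
    assert(taps.size() == input.size());
    const std::size_t n = taps.size();
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += taps[i + 0] * input[i + 0];
        acc1 += taps[i + 1] * input[i + 1];
        acc2 += taps[i + 2] * input[i + 2];
        acc3 += taps[i + 3] * input[i + 3];
    }
    for (; i < n; i++) acc0 += taps[i] * input[i];  // leftover samples
    return acc0 + acc1 + acc2 + acc3;
}

// Candidate segment 3: a loop with accumulation but no multiply (not a filter).
double sum_only(const Vec& taps, const Vec& input) {
    assert(taps.size() == input.size());
    double acc = 0.0;
    for (double x : input) acc += x;
    return acc;
}

// Hypothetical helper: true if both segments agree (within tol) on every
// randomly generated input pair. Agreement is evidence, not proof.
bool same_io_behavior(const Segment& a, const Segment& b, std::size_t len,
                      int trials, unsigned seed, double tol = 1e-9) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (int t = 0; t < trials; t++) {
        Vec taps(len), input(len);
        for (std::size_t i = 0; i < len; i++) {
            taps[i] = dist(rng);
            input[i] = dist(rng);
        }
        if (std::fabs(a(taps, input) - b(taps, input)) > tol) return false;
    }
    return true;
}
```

Calling `same_io_behavior(fir_plain, fir_unrolled, 16, 100, 42u)` compares the two filter variants on 100 random inputs of length 16. A tolerance is used because unrolling changes the order of the floating-point additions.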


Code Recognition 

• The AST represents the source code in the programming domain.
• Radio and computational primitives have characteristic features in the AST.
  o Filter ≈ LOOP + ACCUMULATION + MULTIPLY


for (i = 0; i < n; i++) {
    acc0 += d_taps[i] * input[i];
}
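
The LOOP + ACCUMULATION + MULTIPLY feature can be sketched as a small tree matcher over a toy AST. The `Node` type, the node kinds (`"FOR"`, `"+="`, `"*"`, `"INDEX"`) and all function names below are assumptions made for this example. They are not ANTLR's tree API and not the demo's actual grammar.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy AST node: a token kind plus child subtrees (hypothetical layout).
struct Node {
    std::string kind;
    std::vector<Node> children;
};

// True if the subtree rooted at n contains a node of the given kind.
bool contains(const Node& n, const std::string& kind) {
    if (n.kind == kind) return true;
    for (const Node& c : n.children)
        if (contains(c, kind)) return true;
    return false;
}

// ACCUMULATION + MULTIPLY: a "+=" whose right-hand side contains a "*".
bool is_multiply_accumulate(const Node& n) {
    if (n.kind != "+=") return false;
    assert(n.children.size() == 2);  // assumed invariant: target and value
    return contains(n.children[1], "*");
}

// True if any node in the subtree is a multiply-accumulate.
bool contains_mac(const Node& n) {
    if (is_multiply_accumulate(n)) return true;
    for (const Node& c : n.children)
        if (contains_mac(c)) return true;
    return false;
}

// LOOP: collect every "FOR" node that has a multiply-accumulate inside it.
void find_filters(const Node& n, std::vector<const Node*>& out) {
    if (n.kind == "FOR" && contains_mac(n)) out.push_back(&n);
    for (const Node& c : n.children) find_filters(c, out);
}

// AST for: for (i = 0; i < n; i++) { acc0 += d_taps[i] * input[i]; }
Node example_filter_loop() {
    Node init{"=", {Node{"i", {}}, Node{"0", {}}}};
    Node cond{"<", {Node{"i", {}}, Node{"n", {}}}};
    Node step{"++", {Node{"i", {}}}};
    Node mul{"*", {Node{"INDEX", {Node{"d_taps", {}}, Node{"i", {}}}},
                   Node{"INDEX", {Node{"input", {}}, Node{"i", {}}}}}};
    Node body{"+=", {Node{"acc0", {}}, mul}};
    return Node{"FOR", {init, cond, step, body}};
}

// AST for: for (i = 0; i < n; i++) { count += 1; }  (no multiply)
Node example_counting_loop() {
    Node init{"=", {Node{"i", {}}, Node{"0", {}}}};
    Node cond{"<", {Node{"i", {}}, Node{"n", {}}}};
    Node step{"++", {Node{"i", {}}}};
    Node body{"+=", {Node{"count", {}}, Node{"1", {}}}};
    return Node{"FOR", {init, cond, step, body}};
}
```

In this sketch `find_filters` reports the first loop and skips the second. A nested loop would be reported at every enclosing `FOR`, so a real matcher would likely need a tighter pattern.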


Code Recognition Result

• To test the idea, I designed a code recognition demo (not yet fully debugged).

• Source: GNU Radio 3.2.2 (C++)
• Objective: Recognize and print the filter code.
• Platform: Ubuntu 10.04 + Java SE 1.6 + ANTLR 3.2
• Process:
  o Generate an AST for each C++ file.
  o Match the filter sub-tree pattern.
  o Print the matched code segment.


Code Recognition Result

• Result:
  o 932 C++ source files in total in GNU Radio.
  o 689 files successfully analyzed so far (to be continued).
  o 59 filter patterns found.

for (i = 0; i < n; i += N_UNROLL) {
    acc0 += d_taps[i + 0] * input[i + 0];
    acc1 += d_taps[i + 1] * input[i + 1];
    acc2 += d_taps[i + 2] * input[i + 2];
    acc3 += d_taps[i + 3] * input[i + 3];
}

for (int j = 0; j < d_len; j++) {
    if (j != 0)
        d_pn = 2.0*d_reference->next_bit()-1.0;
    sum += *in++ * d_pn;
}

for (i = 0; i < d_ff_taps.size(); i++)
    acc += conj(d_ff_delayline[(i+d_ff_index) & ff_mask]) * d_ff_taps[i];


CL Modeling 

• Intermediate Representation:
  o AST (programming domain)
  o CL Modeling (signal processing domain)


k = N - i;


CL Modeling


k = N - i;

• Rewrite and map the structure and tokens of the AST into the CL Modeling Tree.
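
The rewrite step can be sketched on the statement `k = N - i;`. The slides do not specify the shape of the CL Modeling Tree, so the target labels (`Assign`, `Subtract`, `Variable:`) and the names `Node`, `rewrite` and `to_sexpr` are purely hypothetical. The sketch only shows the mechanism of a structure-preserving token mapping.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy tree node shared by the AST and the rewritten tree (hypothetical).
struct Node {
    std::string label;
    std::vector<Node> children;
};

// Rewrite an AST into a tree with signal-processing-style labels.
// Operators are renamed via an assumed lookup table; every leaf is treated
// as a variable, which is enough for this example but ignores literals.
Node rewrite(const Node& ast) {
    assert(!ast.label.empty());  // every node is expected to carry a token
    static const std::map<std::string, std::string> kOperatorNames = {
        {"=", "Assign"}, {"+", "Add"}, {"-", "Subtract"}, {"*", "Multiply"}};

    Node out;
    if (ast.children.empty()) {
        out.label = "Variable:" + ast.label;
    } else {
        auto it = kOperatorNames.find(ast.label);
        out.label = (it != kOperatorNames.end()) ? it->second : ast.label;
    }
    for (const Node& c : ast.children) out.children.push_back(rewrite(c));
    return out;
}

// Print a tree in parenthesized prefix form, e.g. (= k (- N i)).
std::string to_sexpr(const Node& n) {
    if (n.children.empty()) return n.label;
    std::string s = "(" + n.label;
    for (const Node& c : n.children) s += " " + to_sexpr(c);
    return s + ")";
}

// AST for: k = N - i;
Node example_ast() {
    return Node{"=", {Node{"k", {}},
                      Node{"-", {Node{"N", {}}, Node{"i", {}}}}}};
}
```

With these assumed labels, `to_sexpr(example_ast())` gives `(= k (- N i))` and `to_sexpr(rewrite(example_ast()))` gives `(Assign Variable:k (Subtract Variable:N Variable:i))`.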


CL Modeling Result

• To test our idea, I designed a CL Modeling demo based on the AST.*

• A tree rewriter translates and modifies the current AST into a CL Modeling Tree.

• The CL Modeling XML file is then printed from the CL Modeling Tree.
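
Printing XML from a tree can be sketched as a recursive walk. The element names and the `Node` layout below are assumptions for illustration. The actual CL Modeling XML schema is not given in these slides.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy tree node: element name, optional text, child elements (hypothetical).
struct Node {
    std::string tag;
    std::string text;
    std::vector<Node> children;
};

// Escape the characters that are not allowed verbatim in XML text.
std::string escape_xml(const std::string& s) {
    std::string out;
    for (char ch : s) {
        switch (ch) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += ch;
        }
    }
    return out;
}

// Serialize the tree as indented XML: leaves print their text on one line,
// inner nodes print their children between an opening and a closing tag.
std::string to_xml(const Node& n, int depth = 0) {
    assert(!n.tag.empty());  // an element needs a name
    const std::string pad(depth * 2, ' ');
    if (n.children.empty()) {
        return pad + "<" + n.tag + ">" + escape_xml(n.text) +
               "</" + n.tag + ">\n";
    }
    std::string out = pad + "<" + n.tag + ">\n";
    for (const Node& c : n.children) out += to_xml(c, depth + 1);
    return out + pad + "</" + n.tag + ">\n";
}

// A hand-built tree for: k = N - i;
Node example_tree() {
    return Node{"Assign", "",
                {Node{"Variable", "k", {}},
                 Node{"Subtract", "",
                      {Node{"Variable", "N", {}}, Node{"Variable", "i", {}}}}}};
}
```

Calling `to_xml(example_tree())` returns a string with an `<Assign>` root that holds one `<Variable>` element and a nested `<Subtract>` element.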

https://sites.google.com/site/stevensxingzhong/home/clmb


*Terence Parr, Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, Pragmatic Programmers, 2010.


Summary & Future Work 

• The programming-domain AST is a key interface for language applications. In the SSP project it is used for:
  o Code Recognition: determining the functionality of a code segment.
  o Cognitive Linguistic Modeling: serving as an intermediate form for modeling the radio code.

• Future Work:
  o Cover more code: C++, Matlab, VHDL, etc.
  o Discover more computational and radio primitives.
  o Fully support CL Modeling.


References

1. Jiang, L., and Su, Z. 2009. Automatic mining of functionally equivalent code fragments via random testing. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis.

2. Gabel, M., Jiang, L., and Su, Z. 2008. Scalable detection of semantic clones. In Proceedings of the 30th International Conference on Software Engineering.

3. Roy, C. K., Cordy, J. R., and Koschke, R. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming.

4. Bertran, M., Babot, F., and Climent, A. 2005. An input/output semantics for distributed program equivalence reasoning. Electron. Notes Theor. Comput. Sci. 137, 1 (Jul. 2005).

5. Parr, T. 2007. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Programmers.

6. Parr, T. 2010. Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages. Pragmatic Programmers.
