Source-code Plagiarism

Presented by Merin Paul
Mtech CS-IS S1
Guide: Ms Sangeetha Jamal, Dept of Computer Science
05/24/2022

Plagiarism introduction


04/13/2023 1



Contents

Introduction
Types of Source-code Plagiarism
    Textual Similarity
    Functional Similarity
Source Code Detection Algorithms
Detecting Techniques
Tools Used for Code-based Plagiarism
Conclusion


Introduction

Plagiarism in source-code files occurs when source code is copied and edited without proper acknowledgment of the original author.

Techniques for plagiarism: lexical changes and structural changes.

Lexical changes: changes that can be made to the source code without affecting the parsing of the program, such as editing comments or renaming identifiers.


Introduction (contd.)

Structural changes: changes made to the source code that affect the parsing of the code and involve program debugging.

Reasons for code copying:
    Code reuse.
    Programmer limitation.
    Coincidental implementation using the same logic.


TYPES OF SOURCE CODE PLAGIARISM

Textual Similarity

Functional Similarity


Textual Similarity

Two source-code fragments are textually similar when their textual content looks alike.

Textual content means the words, letters, variable names, etc.

Three types are distinguished: Type I, Type II and Type III.


Type I

The copied code fragment is the same as the original, with no modification except to white space, comments and line layout.

    int a; // counter
    // count five times
    for(a = 0; a < 5; a++){
        printf("a = %d", a); // print value of a
    }
    return 0;


Type I (contd.)

The same fragment after changes to comments and layout only:

    int a;
    /* Loop increasing of a and print a value of it */
    for(a = 0; a < 5; a++){
    printf("a = %d", a);
    }
    return 0;
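As a rough illustration of why Type I copies are the easiest to catch, the sketch below strips comments and white space from the two Type I fragments and compares what is left. It is written in Python for brevity. The function name and the regex-based comment stripping are assumptions made for this example, and the regexes do not handle comment markers that appear inside string literals.

```python
import re

FRAGMENT_1 = '''int a; // counter
// count five times
for(a = 0; a < 5; a++){
    printf("a = %d", a); // print value of a
}
return 0;'''

FRAGMENT_2 = '''int a;
/* Loop increasing of a and print a value of it */
for(a = 0; a < 5; a++){
printf("a = %d", a);
}
return 0;'''


def strip_comments_and_whitespace(source):
    """Remove C-style comments and all white space (hypothetical helper)."""
    # Drop /* ... */ block comments, which may span several lines.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    # Drop // line comments up to the end of the line.
    source = re.sub(r"//[^\n]*", "", source)
    # Delete every white-space character so layout no longer matters.
    return re.sub(r"\s+", "", source)


if __name__ == "__main__":
    same = (strip_comments_and_whitespace(FRAGMENT_1)
            == strip_comments_and_whitespace(FRAGMENT_2))
    print("Type I match:", same)
```

Deleting all white space is a deliberately crude choice: it also merges `int a` into `inta`, so a real tool would normalize on tokens rather than characters.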


Type II

The same as Type I, and also with modifications to variable names, function names and other user-defined identifiers.

    if(a > b){
        a = a - 1;
        b = b * a; // comment 1
    }
    else{
        b = a; // comment 2
        a = 0;
    }


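Type II (contd.)

The copied fragment, with renamed identifiers (a literal is changed as well):

    if(m > n)
    {
        m = m - 5;
        n = n * m; // my comment 1
    }
    else
    {
        n = m; // my comment 2
        m = 0;
    }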
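One way to see through Type II edits is to replace every user-defined identifier and every numeric literal with a placeholder before comparing. The sketch below does this for the two Type II fragments. The keyword set, the placeholder names `ID` and `NUM`, and the function name are assumptions for illustration only, not the behaviour of any particular tool.

```python
import re

ORIGINAL = """if(a > b){ a = a - 1; b = b * a; // comment 1
}else{ b = a; // comment 2
a = 0; }"""

COPY = """if(m > n)
{m=m - 5;
n=n*m; //my comment 1
}
else
{n=m; //my comment 2
m=0;
}"""

# Small keyword set, enough for the fragments above (an assumption).
C_KEYWORDS = {"if", "else", "for", "while", "return", "int"}
# An identifier, an integer literal, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_identifiers(source):
    """Return the token list with identifiers and numbers replaced."""
    source = re.sub(r"//[^\n]*", "", source)  # comments carry no structure
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in C_KEYWORDS:
            tokens.append(tok)      # keywords are kept as they are
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")     # any user-defined name
        elif tok.isdigit():
            tokens.append("NUM")    # any integer literal
        else:
            tokens.append(tok)      # punctuation and operators
    return tokens


if __name__ == "__main__":
    print(normalize_identifiers(ORIGINAL) == normalize_identifiers(COPY))
```

Mapping every name to the same `ID` placeholder is permissive. A stricter variant could number identifiers by order of first appearance, so that only consistent renamings match.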


Type III

The copied code fragment is modified further by inserting or removing statements.

    if(a > b) {
        a = a - 1;
        b = b * a;
    }
    else {
        b = a;
        a = 0;
    }


Type III (contd.)

    if(a > b)
    {
        a = a - 1;
        c = 0; // this statement is added
        b = b * a;
    }
    else
    {
        b = a;
        a = 0;
    }
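Because a Type III copy contains extra or missing statements, an exact comparison fails and a similarity score is needed instead. The sketch below uses Python's standard `difflib.SequenceMatcher` on the statement lists of the two Type III fragments. Treating each line as one unit, and the two helper names, are assumptions chosen for this example.

```python
import difflib

ORIGINAL = ["if(a > b)", "{", "a = a - 1;", "b = b * a;", "}",
            "else", "{", "b = a;", "a = 0;", "}"]

COPY = ["if(a > b)", "{", "a = a - 1;", "c = 0;", "b = b * a;", "}",
        "else", "{", "b = a;", "a = 0;", "}"]


def statement_similarity(a, b):
    """Ratio in [0, 1]: twice the matched statements over the total."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def inserted_statements(a, b):
    """Statements present in b that have no counterpart in a."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [line
            for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
            if tag == "insert"
            for line in b[j1:j2]]


if __name__ == "__main__":
    print(round(statement_similarity(ORIGINAL, COPY), 3))
    print(inserted_statements(ORIGINAL, COPY))
```

A score close to 1 flags the pair for manual inspection. Where to set the threshold is a policy decision that this sketch does not attempt to answer.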


Functional similarity

This refers to code fragments that have the same semantics or functionality, even when their text differs.

Fragment 1:

    int i, j = 1;
    for(i = 1; i <= VALUE; i++)
        j = j * i;

Fragment 2:

    int factorial(int n)
    {
        if(n == 0) return 1;
        else return factorial(n - 1) * n;
    }
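Functionally similar fragments cannot be matched on text, but they can be probed by running them on the same inputs. The sketch below translates the two factorial fragments into Python (the translation and all function names are assumptions for this example) and checks that they agree on a sample of inputs.

```python
def factorial_iterative(value):
    """Python rendering of fragment 1: multiply 1..value in a loop."""
    j = 1
    for i in range(1, value + 1):
        j = j * i
    return j


def factorial_recursive(n):
    """Python rendering of fragment 2: recursive definition."""
    if n == 0:
        return 1
    return factorial_recursive(n - 1) * n


def same_behaviour(f, g, inputs):
    """True if f and g return equal results on every sampled input."""
    return all(f(x) == g(x) for x in inputs)


if __name__ == "__main__":
    print(same_behaviour(factorial_iterative, factorial_recursive, range(10)))
```

Agreement on sampled inputs is only evidence of functional similarity, not a proof, since two programs can still differ on inputs that were never tried.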


Source Code Detection Algorithms

    Text-based
    Token-based
    Parse tree-based
    PDG-based
    Metrics-based
    Hybrid approaches


CONTD...

Text-based
    Find a textual match between two source codes.
    Simple and fast.

Token-based
    Use a lexer to convert the program into tokens.
    Find a match in the token sequences.
    More robust to simple text replacements.


CONTD...

Parse trees
    Build and compare parse trees.
    A parse tree contains the complete information about the source code.
    Tree comparison can normalize conditional statements.

Program Dependency Graphs (PDGs)
    Capture the actual control and data dependencies in a program.
    Allow higher-level equivalences to be located.
    More complex.
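The parse-tree approach listed above can be sketched with Python's standard `ast` module, which parses Python source into an abstract syntax tree (a condensed form of the parse tree). The example is limited to Python sources because that is what `ast` accepts. The class name, the `ID` placeholder and the sample programs are assumptions made for illustration.

```python
import ast

PROGRAM_A = "a = [1, 2, 3, 4]\nfor i in range(len(a)):\n    a[i] = a[i] + 1\n"
PROGRAM_B = ("b = [1, 2, 3, 4]  # data\n"
             "for j in range(len(b)):\n"
             "        b[j] = b[j] + 1\n")
PROGRAM_C = "a = [1, 2, 3, 4]\nfor i in range(len(a)):\n    a[i] = a[i] * 2\n"


class RenameIdentifiers(ast.NodeTransformer):
    """Replace every name in the tree with the placeholder 'ID'."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="ID", ctx=node.ctx), node)


def tree_signature(source):
    """Dump the syntax tree after renaming; comments and layout vanish."""
    tree = RenameIdentifiers().visit(ast.parse(source))
    # ast.dump omits line and column attributes by default.
    return ast.dump(tree)


if __name__ == "__main__":
    print(tree_signature(PROGRAM_A) == tree_signature(PROGRAM_B))
    print(tree_signature(PROGRAM_A) == tree_signature(PROGRAM_C))
```

Comparing whole-tree dumps only detects trees that are identical after renaming. Matching similar subtrees, which is what tolerates inserted statements, needs a tree-edit or subtree-hashing step that is not shown here.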


CONTD...

Metrics
    Capture 'scores' of code segments according to certain criteria.
    Metrics are simple to calculate.
    Can lead to false positives.

Hybrid
    A combination of two or more of the previous techniques.


Detecting Techniques

Detection via Lexical Similarities
    Lexical analysis takes source code and converts it into a stream of lexical tokens.
    The source code undergoes a series of transformations.
    Identifying reserved words, identifiers and numbers is beneficial for plagiarism detection.


CONTD...

    int[] A = {1, 2, 3, 4};
    for(int i = 0; i < A.length; i++) {
        A[i] = A[i] + 1;
    }

    int[] B = {1, 2, 3, 4};
    for(int j = 0; j < B.length; j++) {
        B[j] = B[j] + 1;
    }


CONTD...

Both fragments produce the same token stream:

    LITERAL_int LBRACK RBRACK IDENT ASSIGN LCURLY NUM_INT COMMA NUM_INT COMMA NUM_INT COMMA NUM_INT RCURLY SEMI
    LITERAL_for LPAREN LITERAL_int IDENT ASSIGN NUM_INT SEMI IDENT LT IDENT DOT IDENT SEMI IDENT INC RPAREN LCURLY
    IDENT LBRACK IDENT RBRACK ASSIGN IDENT LBRACK IDENT RBRACK PLUS NUM_INT SEMI
    RCURLY
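A minimal lexer that produces a token stream of this kind can be sketched as follows. It covers only the symbols needed for the two array fragments. The token names follow the stream shown above where they appear there, `PLUS` is an assumed name for the `+` operator, and the function and table names are hypothetical.

```python
import re

# (token name, regex) pairs; longer operators come before shorter ones.
TOKEN_SPEC = [
    ("NUM_INT", r"\d+"),
    ("WORD", r"[A-Za-z_]\w*"),
    ("INC", r"\+\+"),
    ("PLUS", r"\+"),
    ("ASSIGN", r"="),
    ("LT", r"<"),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("LCURLY", r"\{"),
    ("RCURLY", r"\}"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA", r","),
    ("SEMI", r";"),
    ("DOT", r"\."),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))
RESERVED = {"int", "for"}

FRAGMENT_A = ("int[] A = {1,2,3,4};\n"
              "for(int i = 0; i < A.length; i++) {\n"
              "    A[i] = A[i] + 1;\n}")
FRAGMENT_B = ("int[] B = {1, 2, 3, 4};\n"
              "for(int j = 0; j < B.length; j++) {\n"
              "    B[j] = B[j] + 1;\n}")


def tokenize(source):
    """Return token names only; unknown characters are silently skipped."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "WORD":
            word = match.group()
            kind = "LITERAL_" + word if word in RESERVED else "IDENT"
        tokens.append(kind)
    return tokens


if __name__ == "__main__":
    print(" ".join(tokenize(FRAGMENT_A)))
    print(tokenize(FRAGMENT_A) == tokenize(FRAGMENT_B))
```

Because identifiers collapse to `IDENT` and white space is dropped, renaming `A` to `B` and `i` to `j` leaves the stream unchanged, which is the point of the comparison.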


Detection via Parse Tree Similarities


Detection via Metrics

Calculate and compare attribute counts.

Programs with similar attribute counts are potentially similar programs.

Counts of operators and operands are typically used to construct the attribute counts.
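The attribute counts described above can be illustrated with a few lines of Python. The sketch counts total and distinct operators and operands in a C-like fragment. The choice of these four counts, the regexes, the keyword set and the function name are assumptions for this example.

```python
import re

# Two-character operators are listed first so that they win over one-character ones.
OPERATOR_RE = re.compile(r"\+\+|--|<=|>=|==|!=|[-+*/%<>=]")
OPERAND_RE = re.compile(r"[A-Za-z_]\w*|\d+")
KEYWORDS = {"int", "for", "if", "else", "return"}

ORIGINAL = "if(a > b){ a = a - 1; b = b * a; } else { b = a; a = 0; }"
COPY = "if(m > n){ m = m - 5; n = n * m; } else { n = m; m = 0; }"


def attribute_counts(source):
    """Count operators and operands (comments must be removed beforehand)."""
    operators = OPERATOR_RE.findall(source)
    operands = [w for w in OPERAND_RE.findall(source) if w not in KEYWORDS]
    return {
        "total_operators": len(operators),
        "unique_operators": len(set(operators)),
        "total_operands": len(operands),
        "unique_operands": len(set(operands)),
    }


if __name__ == "__main__":
    print(attribute_counts(ORIGINAL))
    print(attribute_counts(ORIGINAL) == attribute_counts(COPY))
```

Equal counts only make two programs candidates for closer inspection, since unrelated programs can share the same counts. This is the source of the false positives mentioned for metric-based detection.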


Tools Used for Code-based Plagiarism

JPlag
    Finds similarities among multiple sets of source-code files.
    JPlag operates in two phases.
    First phase: all programs to be compared are parsed and converted into token strings.
    Second phase: the token strings are compared in pairs to determine the similarity of each pair.
    Because it compares token strings, it is more robust to simple text replacements than text-based matching.
    It supports Java, C#, C, C++ and natural-language text.


CONTD...

MOSS (Measure Of Software Similarity)
    MOSS was developed in 1994 by Alex Aiken.
    It analyzes code written in languages such as C, C++, Python, Visual Basic, JavaScript, FORTRAN, Lisp and Ada.
    It is provided as an Internet service that is given a list of source files to compare.


CONTD...

YAP (Yet Another Plague)
    Token-based system.
    YAP works in two phases.
    The first phase generates a token file for each submission.
    The second phase compares pairs of token files using a token-matching algorithm, Running-Karp-Rabin Greedy-String-Tiling (RKR-GST).
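The second phase can be illustrated with a simplified Greedy String Tiling routine. This sketch is not the RKR-GST algorithm itself: it omits the Running-Karp-Rabin hashing speed-up and searches for matches by brute force, so it is only suitable for short token lists. The function names, the `min_match` parameter and the sample token lists are assumptions for this example.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) of non-overlapping matches."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        # Find the longest common runs of tokens that are still unmarked.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k == max_match:
                    matches.append((i, j, k))
                elif k > max_match:
                    matches = [(i, j, k)]
                    max_match = k
        if not matches:
            break
        # Turn the longest matches into tiles, skipping any that overlap
        # a tile already placed in this round.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for offset in range(k):
                marked_a[i + offset] = True
                marked_b[j + offset] = True
            tiles.append((i, j, k))
    return tiles


def tile_similarity(a, b, min_match=3):
    """Share of tokens covered by tiles: 2 * coverage / (len(a) + len(b))."""
    coverage = sum(k for _i, _j, k in greedy_string_tiling(a, b, min_match))
    return 2 * coverage / (len(a) + len(b))


# Token lists for "a = a - 1; b = b * a;" and the same code with
# "c = 0;" inserted in the middle.
TOKENS_A = ["ID", "=", "ID", "-", "NUM", ";",
            "ID", "=", "ID", "*", "ID", ";"]
TOKENS_B = ["ID", "=", "ID", "-", "NUM", ";",
            "ID", "=", "NUM", ";",
            "ID", "=", "ID", "*", "ID", ";"]

if __name__ == "__main__":
    print(greedy_string_tiling(TOKENS_A, TOKENS_B))
    print(round(tile_similarity(TOKENS_A, TOKENS_B), 3))
```

Because tiles may be found in any order and position, the measure tolerates reordered and inserted code, while `min_match` keeps very short coincidental matches from counting.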


Conclusion

Plagiarism in programming assignments is an inevitable issue for most academics teaching programming.

Plagiarism detection systems are built for only a few languages.

Most detection software checks submissions against a repository held within an organization.

As the number of digital copies goes up, the repository must grow, and the plagiarism detection software should be able to handle its size.


Conclusion (contd.)

Most popular plagiarism detection algorithms convert programs into token strings and compare them by string matching.

The tokens of each document are compared on a pair-wise basis to determine similar source-code segments between the files.

String-matching systems are language-dependent: they handle only the programming languages supported by their parsers.




THANK U!!!