8/10/2019 AUTOMATEDSCORINGSYSTEMFORESSAYS.docx (1)
AUTOMATED SCORING SYSTEM FOR ESSAYS

By

ARUNA P 2009103010
DHIVYA PRIYA R 2009103528
DIVYA HARSHINI R 2009103530
A project report submitted to the
FACULTY OF INFORMATION AND
COMMUNICATION ENGINEERING
in partial fulfillment of the requirements
for the award of the degree of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
ANNA UNIVERSITY
CHENNAI - 600025
April 2012
CERTIFICATE
Certified that this project report titled "Automated Essay Scoring System" is the bonafide work of Aruna P (2009103010), Dhivya Priya R (2009103528) and Divya Harshini R (2009103530), who carried out the project work under my supervision, for the partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering in Computer Science and Engineering. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other thesis or dissertation on the basis of which a degree or an award was conferred on an earlier occasion on this or any other candidate.
Place: Chennai                                Prof. Dr. K. S. Easwarakumar
Date:                                         Professor and Head,
                                              Department of Computer Science and Engineering,
                                              Anna University, Chennai - 600025
COUNTERSIGNED
Head of the Department
Department of Computer Science and Engineering
Anna University
Chennai - 600025
ACKNOWLEDGEMENTS
We express our deep gratitude to our guide, Prof. Dr. K. S. Easwarakumar, for guiding us through every phase of the project. We appreciate his thoroughness, tolerance and ability to share his knowledge with us. We thank him for being easily approachable and quite thoughtful. Apart from adding his own input, he has encouraged us to think on our own and give form to our thoughts. We owe him for harnessing our potential and bringing out the best in us. Without his immense support through every step of the way, we could never have done it to this extent.
We are extremely grateful to Prof. Dr. K. S. Easwarakumar, Professor of Computer Science and Engineering, Anna University, Chennai - 600025, for extending the facilities of the Department towards our project and for his unstinting support.
We express our thanks to the panel of reviewers, Dr. Arul Siromoney, Dr. Madhan Karky, Dr. A. P. Shanthi and Ms. Suganya, for their valuable suggestions and critical reviews throughout the course of our project.
We thank our parents, family, and friends for bearing with us throughout the course of our
project and for the opportunity they provided us in undergoing this course in such a prestigious
institution.
Aruna P Dhivya Priya R Divya Harshini R
ABSTRACT
The objective of an automated essay scoring system is to assign scores to essays written in an educational setting. It is a method of educational assessment and an application of Natural Language Processing, using a word-based document vector construction method and adopting Content Vector Analysis (CVA). CVA can be used in this case because the distribution of words in corpus datasets can be expected to be random in nature. Our system uses a model-based approach in order to overcome the large space, storage and training requirements of memory-based methods. Here, we calculate the deviation of the student's essay with respect to the ideally scored essays.

The system evaluates the essays on established rubrics, viz. surface features, spelling errors, grammar mistakes and correlation with the topic. The individual raw scores so determined are then assigned weights depending on their salience. The weighted scores are subjected to regression techniques, using which the final score is calculated. In this manner, we ensure that the essays are graded uniformly, with equity and less fatigue.
Contents

Certificate
Acknowledgements
Abstract (English)
Abstract (Tamil)
List of Figures
List of Tables

1 INTRODUCTION
1.1 Basic Cryptography
1.1.1 Symmetric Key Cryptography
1.1.2 Public Key Cryptography
1.1.2.1 Encryption
1.2 Other Flavours of Cryptography
1.2.1 Identity-Based Cryptography
1.3 Provable Security

2 PRELIMINARIES
2.1 Definitions
2.1.1 Bilinear Pairing
2.1.2 Hardness Assumptions
2.1.2.1 Discrete Logarithm Problem
2.1.2.2 Computational Diffie-Hellman Problem
2.1.2.3 Decisional Diffie-Hellman Problem
2.1.2.4 Bilinear Diffie-Hellman Problem
2.1.2.5 Decisional Bilinear Diffie-Hellman Problem

3 RELATED WORK
3.1 Encryption
3.1.1 Formal Model of Encryption
3.1.2 Security of Encryption Schemes
3.1.2.1 IND-CPA Game
3.1.2.2 IND-CCA Game
3.1.2.3 IND-CCA2 Game
3.2 Identity-Based Encryption
3.2.1 Boneh-Franklin IBE
3.2.2 Other ID-Based Schemes
3.3 Proxy Re-Encryption
3.4 Digital Signatures

4 REQUIREMENT ANALYSIS
4.1 Product Perspective
4.2 Product Functionality
4.3 User Characteristics
4.4 Class Diagram
4.5 Sequence Diagram

5 SYSTEM DESIGN
5.1 System Architecture
5.2 Encryption Scheme - A Variant of the Twin Boneh-Franklin Scheme
5.3 Signature Algorithm - Hess Signature Scheme
5.4 Proxy Re-encryption Scheme

6 SYSTEM DEVELOPMENT
6.1 Scheme Implementation
6.1.1 Tools Used for Implementation
6.2 Testing

7 RESULTS AND DISCUSSIONS
7.1 Screenshots

8 CONCLUSIONS
8.1 Contributions

References
List of Figures

4.1 Use Case Diagram
4.2 Class Diagram
4.3 Registration Process
4.4 Encrypting Questionnaire
4.5 Conducting Exam
4.6 Encrypting Answerscript
4.7 Re-encryption and Evaluation Phase
5.1 System Architecture
6.1 Accessing a Different Center's Question
6.2 Tampering Marks
6.3 Unsuccessful Decryption
7.1 User Interface
7.2 Login Page
7.3 Student Interface
7.4 Questions Display
7.5 Answer Script with Dummy ID
7.6 Faculty Interface
7.7 View Result
List of Tables

3.1 Encryption Schemes
3.2 Signature Schemes
6.1 Test Cases
CHAPTER 1
INTRODUCTION
Writing tests are increasingly being included in large-scale assessment programs and high-stakes decisions. However, Automated Essay Scoring (AES) systems, developed to overcome issues of marker inconsistency, volume, speed, cost and so on, also raise issues of score validity. In order to fill a crucial gap identified in the current approaches used to evaluate AES systems, we propose a framework, drawing upon current validation theory, for assessing the validity of scores produced by AES systems in a systematic and comprehensive manner.
MOTIVATION:

With the advent of online examinations like the GRE, GMAT and CET4, there has been an increasing call for automation in the scoring process. Scoring of objective questions is comparatively simple and has existed for years, but essay evaluation has, in most cases, been carried out only manually. This is because of the high complexity involved in programming a system that performs as well as a human in its cognition. With the evolution of advanced text database practices and Natural Language Processing (NLP) techniques, this has of late become possible.
Any automated essay grading system should offer several salient features, most
importantly, the following:
1. Speed: Score generated in a matter of seconds as against the time-consuming
manual correction.
2. Ease/Less fatigue: Process made easy by automation as against the laborious
manual task.
3. Equity: No place for any unjust favoring or unfair preference; all scores are generated without bias.
4. Uniformity: Overcomes the problem of the different mindsets or attitudes of different evaluators; ensures all essays are graded with a uniform outlook.
SCOPE:
As automation has become the order of the day, with most jobs being automated, evaluating essays has long remained an open issue. Systems for evaluating essays automatically have been developed since the late 1960s. To address the shortcomings of the earlier systems, we adopt a new approach and propose a versatile system. All phases of the system, namely content discovery, analysis and grading of content, operate in an unsupervised fashion with no need for manual assessment.
LITERATURE REVIEW:

Lin Bin, Lu Jun, Yao Jian-Min and Zhu Qiao-Ming [1], in AUTOMATED ESSAY SCORING USING KNN ALGORITHM, propose a methodology that transforms essays into vectors: the training set of essays is converted into vectors of word frequencies, which are then transformed into word weights, and these weight vectors occupy the training space. To score a test essay, it is likewise converted into a weight vector, and a search is conducted for the training vectors most similar to it, as measured by the cosine between the test and training vectors. The closest matches among the training set are used to assign a score to the test essay, along the lines of Burstein.
Feature Selection for KNN:

After eliminating the stop words, the features of the essays, viz. words, phrases and arguments, are chosen. The value of each vector element is expressed by the term frequency-inverse document frequency (TF-IDF) weight. Similarity between essays is calculated with the cosine measure in the KNN algorithm. Term frequency (TF) is used to select features by predetermined thresholds. To find the highest-information features, we calculate the information gain for each word. Information gain (IG) for classification is a measure of how common a feature is in a particular class compared to how common it is in all other classes.

K-Nearest Neighbor Algorithm for Text Categorization:

We determine the k most similar features as nearest neighbors to a given feature, and assign individual scores according to the distances of the neighbors, calculated with suitable measures such as the Euclidean distance or the cosine relation. The final score is the weighted sum.
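As a concrete illustration of the KNN scheme described above, the sketch below builds TF-IDF weight vectors, ranks training essays by cosine similarity to the test essay, and returns a similarity-weighted score. The tokenization and exact weighting formula are our assumptions, not details taken from [1].

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weight vector (a dict) for each tokenized document."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency per word
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({w: (f / len(doc)) * math.log(n / df[w])
                        for w, f in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(wt * v.get(w, 0.0) for w, wt in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_score(test_doc, train_docs, train_scores, k=3):
    """Score the test essay as the similarity-weighted mean of the
    scores of its k most similar training essays."""
    vecs = tfidf_vectors(train_docs + [test_doc])
    train_vecs, test_vec = vecs[:-1], vecs[-1]
    nearest = sorted(((cosine(test_vec, tv), s)
                      for tv, s in zip(train_vecs, train_scores)),
                     reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    return sum(sim * s for sim, s in nearest) / total if total else 0.0
```

The memory cost of keeping every training vector around at scoring time is exactly the limitation discussed next.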
The major issues arise because of the following limitations of the KNN algorithm:

1. Memory-based: large space is required to store the entire dataset.
2. Unreliable neighborhood (lack of overlapping results): since the dataset usually gives a sparse matrix, there are few overlapping values, but similarity measures require high overlap for reliability.
3. Unsuitable for corpus datasets, due to sparseness: by the above argument, since corpus datasets are usually sparse, KNN is less suitable for them.
In [2], AUTOMATED ESSAY SCORING SYSTEM FOR CET4, Yali Li and Yonghong Yan present a methodology involving the following score-determining components. The surface features comprise the number of characters in the document (Chars), the number of words in the document (Words), the number of different words (Diffwds), the fourth root of the number of words in the document, as suggested by Page (Rootwds), the number of sentences in the document (Sents), the average word length (Wordlen = Chars/Words), the average sentence length (Sentlen = Words/Sents), and the number of words longer than five characters (BW5). Grammar checking uses ALEK (Assessment of Lexical Knowledge), a tool in which bigrams and trigrams of part-of-speech tag sequences are used. For sentence error detection, part-of-speech tag analysis is used. To determine the relation to the topic, two approaches are used, viz. simple comparison of keywords and Content Vector Analysis. The final score is computed by linear regression as the linear weighted sum of the several components.
The major limitations are due to linear regression and are as follows:

Incomplete description of the relationships among variables:
1. Extremes are ignored.
2. Only the mean is considered.

Sensitive to outliers.

The precision attained by this methodology is 70.125%.
In [3], AUTOMATED ESSAY SCORING USING GENERALIZED LATENT SEMANTIC ANALYSIS by Md. Monjurul Islam and A. S. M. Latiful Hoque, information retrieval by Latent Semantic Analysis using Singular Value Decomposition is achieved by the process depicted in the block diagram. It uses an n-gram-by-document matrix.
The performance issues that occur due to SVD are:

1. Very high algorithmic complexity: O(n^2 k^3).
2. A normal distribution of terms is required: words must be normally distributed across the documents, but in corpus datasets the distribution is sparse.
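For reference, the truncated-SVD step that LSA/GLSA relies on can be sketched with NumPy on a toy term-by-document matrix; the counts below are illustrative, not taken from [3].

```python
import numpy as np

# Toy term-by-document matrix (rows = terms, columns = documents).
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 2.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                         # keep the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of A

# Documents are then compared in the reduced k-dimensional "semantic" space:
doc_coords = np.diag(s[:k]) @ Vt[:k, :]       # one column per document
```

The O(n^2 k^3) cost cited above is incurred in computing U, s and Vt for large matrices.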
Yali Li and Yonghong Yan's hypothesis in [4], AN EFFECTIVE AUTOMATED ESSAY SCORING SYSTEM USING SUPPORT VECTOR REGRESSION, follows dataset construction using character n-grams over words. The key idea is to use Content Vector Analysis (CVA) rather than Latent Semantic Analysis (LSA). It uses a Support Vector Machine (SVM), which is model-based and popular in text classification problems, where very high-dimensional spaces are the norm. Support Vector Regression is used for the final score calculation. Evaluation of rhetorical arguments is also possible by treating each argument as a mini-document. The process involves vector construction for each document by extraction of words and subsequent morphological analysis, followed by frequency vector construction, and finally weight assignment based on salience (relative frequency and inverse relative frequency). In CVA, the cosine relation between the test vector and each document/class vector is computed, and the class with the highest correlation is selected.
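The CVA classification step just described, cosine correlation between the test vector and per-class vectors, can be sketched as follows. Pooling raw word frequencies into one class vector per score level is our simplifying assumption.

```python
import math
from collections import Counter

def cos_sim(a, b):
    dot = sum(wt * b.get(w, 0) for w, wt in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cva_classify(test_tokens, class_essays):
    """Return the score class whose pooled word vector best matches the
    test essay. class_essays maps a score label to the tokenized essays
    graded at that level."""
    test_vec = Counter(test_tokens)
    best_label, best_sim = None, -1.0
    for label, essays in class_essays.items():
        class_vec = Counter()
        for essay in essays:
            class_vec.update(essay)        # pooled class frequency vector
        sim = cos_sim(test_vec, class_vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim
```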
PROBLEM DEFINITION:

From the survey of the related literature, it is apparent that the development of a simple system that grades user essays with an accuracy similar to manual correction remains a great challenge in the arena of educational data mining. Manual evaluation has its own drawbacks: it is time-consuming and requires the arduous task of reading and evaluating when the corpus is very large. There is also the possibility of unduly favoring a preferred candidate. When essays are graded by different evaluators, the problem of differing mindsets is likely to arise, so essays are not graded with a uniform outlook. The existing automated systems are more complex and require a large amount of training before deployment. Hence, evaluation systems must be improved to support automated grading in a faster, simpler, more scalable and more equitable manner.
CONTRIBUTIONS:

Using a model-based approach:

Our system uses a model-based approach because the memory-based approach requires a large training dataset. The model-based approach is also popular in text classification problems, where very high-dimensional spaces are the norm. In comparison to the memory-based approach, with its huge space, storage and training requirements, our model-based approach of calculating the deviation of the examined essay from the ideally scored essays is preferable.
Salience-based correlation method:

To assess the consistency of essays with the topic, we use Content Vector Analysis (CVA) in preference to Latent Semantic Analysis (LSA). This is because LSA involves the higher algorithmic complexity of O(n^2 k^3) for SVD, and words are required to exhibit a normal distribution for good performance. CVA can be used in this case, as the distribution of words in corpus datasets can be expected to be random in nature. Salience is determined by the relative frequency of a word in the document and its inverse relative frequency over the other documents. For example, the word 'the' may appear very frequently in a given document, but its salience is very low because it appears in all the documents. If the word 'metamorphosis' occurs even a few times, it will have a high salience, because relatively few documents contain this word.
Using Ridge Regression for final score consolidation:

A complete relationship among the different variables (here, the individual scores resulting from the various predefined rubrics) is established by means of ridge regression. This method ensures that the mean is taken into consideration along with the extremes. Ridge regression is L2-regularized linear regression, in which the final prediction is the result of a wider variety of inputs, as against a single input in the case of plain linear regression. This tends to make the system more robust for generalization.
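A minimal sketch of this consolidation step uses the closed-form ridge solution w = (X'X + lambda*I)^(-1) X'y. The component scores, target scores and regularization strength below are hypothetical, chosen only to illustrate the mechanics.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^(-1) X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical per-essay component scores
# (columns: surface features, spelling, grammar, content correlation)
# with human-assigned final scores as the regression target.
X = np.array([[0.9, 0.8, 0.7, 0.9],
              [0.5, 0.6, 0.4, 0.5],
              [0.2, 0.3, 0.3, 0.1],
              [0.7, 0.7, 0.6, 0.8]])
y = np.array([9.0, 5.0, 2.0, 7.5])

w = ridge_fit(X, y, lam=0.1)                        # learned component weights
final_score = np.array([0.8, 0.7, 0.6, 0.7]) @ w    # score a new essay
```

Because the L2 penalty shrinks all four weights together, no single rubric can dominate the final score, which is the robustness property claimed above.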
ORGANIZATION OF THIS THESIS

The remainder of this thesis is structured as follows. Chapter 2 gives a requirement analysis of the proposed system, covering all functional and non-functional requirements, after which the system use cases are presented. Chapter 3 gives an overview of the system design and its architecture. In Chapter 4, the algorithms and techniques employed are discussed. Chapter 5 discusses the performance evaluation of the proposed system in comparison with baseline methods. The thesis ends by summarizing the conclusions obtained, along with pointers to future research, in Chapter 6. The references for this research are presented in Chapter 7, followed by the snapshots in Appendix A.
REQUIREMENT ANALYSIS:

FUNCTIONAL REQUIREMENTS:
The proposed system is designed to have the following features. It evaluates the essay on four dimensions, viz. grammar, spelling, correlation with the topic and surface features.

For grammar checking, we use jlinkgrammar, a grammatical system that classifies natural languages by designating links between sequences of words. Instead of using rule-based part-of-speech tags to parse sentences, it uses links to create a syntactic structure for the language.

Spelling mistakes are identified, and thereby automatically corrected, using Peter Norvig's spell correction method, which uses probabilistic and Bayesian reasoning in its implementation.
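Norvig's corrector is compact enough to sketch. The toy word-frequency model below is a stand-in for the large text corpus his published version trains on; everything else follows his candidate-generation scheme.

```python
from collections import Counter

# Toy language model: in Norvig's original, WORDS is built from a large corpus.
WORDS = Counter("the essay scoring system checks the spelling of the essay".split())

def edits1(word):
    """All strings one delete/transpose/replace/insert away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    return {w for w in words if w in WORDS}

def correct(word):
    """Most frequent known candidate, preferring the smallest edit distance."""
    candidates = (known([word]) or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=WORDS.get)
```

The Bayesian reading: candidates at smaller edit distance are assumed more probable error models, and among equally distant candidates the corpus frequency acts as the language-model prior.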
NONFUNCTIONAL REQUIREMENTS:

User Interface Design:

For information retrieval, stemming and stop-word removal do not require any specialized interfaces. The system operates as a stand-alone application. The system provides interfaces for questions and answers to assess the learner's competency level. Interfaces are also provided for presenting the details of the score assignment process to the learner in the form of JFrames. In addition, the interface includes provision for specifying the target corpus.
2.2.2 Documentation

The system is properly documented. All requirements (functional and non-functional), use case diagrams and their descriptions, the various packages used and the relevant tools employed form part of the system documentation. The source code listing has also been documented to serve as a reference for future developers and contributors.
2.2.3 Hardware Considerations

The following hardware considerations are identified:

Operating System: Ubuntu
Processor: Pentium 2.0 GHz or higher
RAM: 256 MB or more
Hard Drive Space: 10 GB or more
2.2.4 Performance Requirements

The performance of the system is evaluated against the baseline method of manual essay grading with respect to four parameters, viz. grammar, spelling, correlation with the topic and surface features.
Error Handling

The following errors are possible in each of the modules:

Null entry in the text area: when the user enters nothing in the essay text area, a message box appears the first time, prompting the user to key in the essay. This ensures that the user does not submit an essay unknowingly by clicking the Submit button. If the action is repeated, the null essay is accepted and assigned a score of zero.
Content Vector Analysis:
In case of a total absence of relation between the essay question and the candidate's answer, a score of zero is assigned, notwithstanding the performance on the grammar, spelling and surface feature aspects.
CONSTRAINTS AND ASSUMPTIONS:

Language constraint: the language under consideration is English.

Assumption: a manually pre-scored corpus of reference essays is assumed to be present before the candidate enters the essay for any particular question, so that a prompt score can be produced. The manually corrected essays are assumed to have been evaluated, error-free, on the basis of the four holistic rubrics mentioned above.
SYSTEM MODELS:

Use Case Model and Scenarios

The various use cases in Figure x.x are elaborated in this section.
Use case: Create Essay Question
ID: 001
TITLE: Create essay question
DESCRIPTION: The question which the candidate is required to answer is created by the administrator.
ACTORS: Admin
PRE-CONDITIONS: The admin should have logged in.
POST-CONDITIONS: The question flashes on the user interface.
Use case: Input Test Essay
ID: 002
TITLE: Input Test essay
DESCRIPTION: The student enters his response to the required question in the text area.
ACTORS: Student
PRE-CONDITIONS: The student should have logged in and the question should have been prompted on the screen.
POST-CONDITIONS: The essay, upon submission, gets stored in the required file.
Use case: Store Essay
ID: 003
TITLE: Store Essay
DESCRIPTION: The essay that the student enters gets stored at the required file location.
ACTORS: Test essay db
PRE-CONDITIONS: The student should have entered the essay and clicked the Submit button.
POST-CONDITIONS: The essay contents get copied to the file in the desired location.
Use case: Input Reference Essay
ID: 004
TITLE: Input Reference Essay
DESCRIPTION: The admin enters the name of the folder containing the pre-scored, manually graded essays.
ACTORS: Admin
PRE-CONDITIONS: The reference essays must be available in the folder after having been graded manually.
POST-CONDITIONS: The reference essays will be ready for comparison with the test essay.
Use case: Text Processing
ID: 005
TITLE: Text Processing
DESCRIPTION: Stemming and stop-word removal of the reference and test essays are done.
ACTORS: Reference essay db, Test essay db
PRE-CONDITIONS: The simple feature extraction, spell check and auto-correction, and grammar check should have been performed.
POST-CONDITIONS: The keywords of the reference and test essays are displayed.
Use case: Generate Individual Score
ID: 006
TITLE: Generate Individual Score
DESCRIPTION: Depending on the rubrics, the essays are evaluated and suitable marks are awarded for each.
ACTORS: Reference essay db, Test essay db
PRE-CONDITIONS: The text from the test and reference essays must have been processed.
POST-CONDITIONS: The individual score components are recorded.
Use case: Display Grade
ID: 007
TITLE: Display Grade
DESCRIPTION: The individual scores are combined and the overall score range is estimated.
ACTORS: Score db
PRE-CONDITIONS: The individual scores must have been generated.
POST-CONDITIONS: The final score range is displayed on the screen.
CHAPTER 3
DESIGN
This chapter gives the detailed design description of modules in the system.
3.1 Data Flow Diagram:

The Data Flow Diagram in Figure x.x lists the various stages involved in the implementation of the system. The relationships between them are also shown in Figure 3.1.

FIGURE x.x.1: DFD Level-0
FIGURE x.x.2: DFD Level-1

FIGURE x.x.3: DFD Level-2

3.2 USER INTERFACE DESIGN
The system uses JFrame for essay input. It also uses JFrames for displaying the different stages of the evaluation and for displaying the score. Text areas, text boxes, message boxes and a tabbed pane are used for user interactivity.
3.3 OVERALL SYSTEM ARCHITECTURE
The proposed system shown in Figure x.x is composed of the following modules:
FIGURE 3.2: System Architecture
3.4 MODULE DESCRIPTIONS
3.4.1 SIMPLE FEATURE EXTRACTION:

INPUT: Test essay
OUTPUT: Text complexity feature score (Component 1)

The system first evaluates text complexity features, such as the number of characters in the document (Chars), the number of words in the document (Words), the number of different words (Diffwds), the fourth root of the number of words in the document, as suggested by Page (Rootwds), the number of sentences in the document (Sents), the average word length (Wordlen = Chars/Words), the average sentence length (Sentlen = Words/Sents) and the number of words longer than five characters (BW5). Each feature has its own use. For example, the number of words represents the length of the essay, given a length requirement of, say, 250-300 words. This feature can detect an empty essay, or one so ridiculously short that it cannot be processed, and reject it immediately; otherwise a score can be assigned accordingly.
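These features can be computed directly; in the sketch below, the regex tokenization, the sentence split on .!? and the 250-word threshold are simplifying assumptions rather than the report's exact rules.

```python
import re

def surface_features(text, min_words=250):
    """Compute the surface features listed above for one essay."""
    words = re.findall(r"[A-Za-z']+", text)
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(len(w) for w in words)
    return {
        "Chars": chars,
        "Words": len(words),
        "Diffwds": len({w.lower() for w in words}),
        "Rootwds": len(words) ** 0.25,                   # Page's fourth root
        "Sents": len(sents),
        "Wordlen": chars / len(words) if words else 0.0,
        "Sentlen": len(words) / len(sents) if sents else 0.0,
        "BW5": sum(1 for w in words if len(w) > 5),
        "too_short": len(words) < min_words,             # flag for rejection
    }
```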
3.4.2 GRAMMAR/SPELL CHECK:

INPUT: Test essay
OUTPUT: Spell check score (Component 2)

Once the essay passes the feature extraction process, the next step is to check it for spelling mistakes. The number of spelling mistakes is recorded and the errors are auto-corrected.

INPUT: Auto-corrected test essay
OUTPUT: Grammar check score (Component 3)

The essay is then checked for grammatical mistakes using jlinkgrammar, which works on the basic principle of linking. It uses probabilistic parsing and deduces the number of linkage errors in the passage sent through the batch file, from which the potential grammar errors are identified; based on this, a score component is assigned.
3.4.4 FINAL SCORE USING REGRESSION:

The individual raw scores, namely from the feature extraction process, the grammar/spell check process and the content vector analysis process, are taken, and weights are assigned to each component. The scores are then subjected to ridge regression, using which the final score is calculated.
IMPLEMENTATION
This chapter explains the details of implementation of all modules of the proposed
system.
4.1 IMPLEMENTATION DETAILS
The system is implemented in Java.
Java JFrames were used for the user interface design of the system. Netbeans was the IDE of choice for Java. For grammar checking, the system was coded using the features of the tool jlinkgrammar. The Stanford Tagger is used to find the number of verbs for surface feature analysis.

Tools used in the implementation:

Packages: Stanford Parts-of-Speech Tagger
Dictionary: Peter Norvig's essay (spelling correction)
IDE for Java: Netbeans
Grammar check: jlinkgrammar
TEXT PROCESSING DETAILS:

3.1 Stop Word Removal

Many of the most frequently used words in English are useless in Information Retrieval (IR) and text mining. These words are called 'stop words'. Stop words, which are language-specific functional words, are frequent words that carry little information (e.g., pronouns, prepositions, conjunctions). In English there are about 400-500 stop words; examples include 'the', 'of', 'and' and 'to'. The first step during preprocessing is to remove these stop words, which has proven very important. The present work uses a stop word list customized by us.
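A minimal sketch of this preprocessing step follows. The stop-word list is a tiny illustrative subset, not the customized 400-500-word list used in the present work, and the suffix stripping is a crude stand-in for the stemmer described in the next section.

```python
# Illustrative stand-ins only; not the report's actual list or stemmer.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "is", "it", "are", "for", "on"}

def strip_suffix(word):
    """Crude suffix stripping (not a full Porter stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop stop words, then reduce each token to a crude stem."""
    tokens = [w.lower() for w in text.split()]
    return [strip_suffix(w) for w in tokens if w not in STOP_WORDS]
```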
3.2 Stemming

Stemming techniques are used to find the root/stem of a word. Stemming converts words to their stems, which incorporates a great deal of language-dependent linguistic knowledge. The hypothesis behind stemming is that words with the same stem or word root mostly describe the same or closely related concepts in text, so such words can be conflated by using their stems. For example, the words 'user', 'users', 'used' and 'using' can all be stemmed to the word 'use'. In the present work, the stemmer algorithm is defined and used.

3.3 Document Indexing
The main objective of document indexing is to increase efficiency by extracting from each document a selected set of terms to be used for indexing it. Document indexing consists of choosing an appropriate set of keywords based on the whole corpus of documents, and assigning weights to those keywords for each particular document, thus transforming each document into a vector of keyword weights. The weight is normally related to the frequency of occurrence of the term in the document and to the number of documents that use the term.

3.3.1 Term Weighting

In the vector space model, documents are represented as vectors. Term weighting is an important concept that determines the success or failure of a classification system. Since different terms have different levels of importance in a text, a term weight is associated with every term as an importance indicator.

The main components that affect the importance of a term in a document are the term frequency (TF) factor and the inverse document frequency (IDF) factor. The term frequency of a word in a document (TF) is a weight that depends on the distribution of the word within documents; it expresses the importance of the word in the document. The inverse document frequency of a word in the document database (IDF) is a weight that depends on the distribution of the word across the document database; it expresses the importance of the word in the document database. TF/IDF is a technique that uses both TF and IDF to determine the weight of a term. The TF/IDF scheme is very popular in the text classification field, and almost all other weighting schemes are variants of it.
Given a document collection 'D', a word 'w', and an individual documentd D, the weight w is
calculated using Equation x.
The result of TF/IDF is a vector of the terms together with their weights. The pseudocode for the TF/IDF calculation is shown in Fig. 2.
for each word in the WordList
    determine TF, calculate its corresponding weight and store it in the weight matrix (WM)
    determine IDF
    if IDF == zero then
        remove the word from the WordList
        remove the corresponding TF from the WM
    else
        calculate TF/IDF and store the normalized TF/IDF in the corresponding element of the WM

Fig. 2 Algorithm TF/IDF
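The procedure of Fig. 2 can be sketched in Python as follows. The function assumes documents arrive already tokenized (stop words removed, stems applied); normalization of the final weights is omitted for brevity, and the sample documents are illustrative:

```python
import math

def tf_idf(docs):
    """TF/IDF weights for a list of tokenized documents.

    Mirrors Fig. 2: terms with IDF == 0 (i.e. terms present in every
    document) are removed from consideration.
    """
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    weights = {}
    for term in vocab:
        df = sum(1 for doc in docs if term in doc)  # document frequency
        idf = math.log(n / df)
        if idf == 0:  # term occurs in all documents: drop it
            continue
        # One weight per document: raw term frequency times IDF.
        weights[term] = [doc.count(term) * idf for doc in docs]
    return weights

docs = [["score", "essay", "essay"], ["score", "grammar"], ["score", "style"]]
wm = tf_idf(docs)  # "score" is dropped because its IDF is zero
```

Each remaining vocabulary term maps to a column of per-document weights, which together form the weight matrix used for the later similarity computation.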
TEST RESULTS AND ANALYSIS
Test case Id: AES1
Module being tested: Essay_entry UI
Test case Description:
This test case verifies if the essay is keyed in or not.
Flow of Events:
1. A question pertaining to a topic appears in the interface.
2. If the user's answer is null, the user is prompted again to make an entry.
Expected Results:
The essay, if null, gets invalidated.
Exceptions:
If the user submits a null entry a second time, it is accepted and given a score of 0.
Test Result: PASS
Comments and Bugs(if any) identified: NIL
Test case Id: AES2
Module being tested: Response Recording
Test case Description:
This test case verifies if the essay entered by the user gets stored in a dedicated text file.
Pre-condition:
The user provides an answer to the essay question.
Flow of Events:
1. The user submits the essay after entering it in the UI.
2. If it is non-null, the essay is recorded in a text file.
Expected Results:
The contents of the essay entered in the UI are copied to a text file at the desired location.
Exceptions:
If the essay is null, the process terminates after a score of zero is assigned.
Test Result: PASS
Comments and Bugs(if any) identified: NIL
Test case Id: AES4
Module being tested: Grammar and spell check
Test case Description:
This test case verifies if the essay entered by the user is evaluated according to the number of spelling and grammar errors.
Pre-condition:
The user provides an answer to the essay question.
Flow of Events:
1. The essay is checked for spelling errors and auto-corrected using Peter Norvig's method.
2. The number of spelling errors is recorded and contributes a proportional negative score component.
3. The test essay is then checked for grammar errors using Link Grammar.
4. The number of grammar mistakes is recorded and a corresponding negative score is allotted.
Expected Results:
The score components for Grammar and spelling are recorded.
Exceptions:
NIL
Test Result: PASS
Comments and Bugs(if any) identified: NIL
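The candidate-generation step at the heart of Norvig's spelling-correction method can be sketched as below. KNOWN is an illustrative stand-in for the real dictionary, and the word-frequency ranking of the full method is omitted:

```python
# Minimal sketch of Peter Norvig's spelling corrector: enumerate every
# string at edit distance 1 from the input (deletes, transposes,
# replaces, inserts), then keep only known words. KNOWN is a hypothetical
# stand-in dictionary; the frequency-based ranking is omitted.
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

KNOWN = {"essay", "score", "grammar"}

def correct(word):
    """Return the word unchanged if known, else a known edit-1 candidate."""
    if word in KNOWN:
        return word
    candidates = edits1(word) & KNOWN
    return min(candidates) if candidates else word

print(correct("esay"))  # -> essay
```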
Test case Id: AES5
Module being tested: CVA
Test case Description:
This test case verifies if the most similar essay is determined using CVA.
Pre-condition:
The reference corpus contains the prescored essays.
Flow of Events:
1. The documents are indexed along with the test essay document.
2. Keywords are extracted after stemming and stop-word removal.
3. The raw frequencies of the keywords in all the documents are determined.
4. The relative frequencies are found using TF-IDF computation, and the weighted term-document matrix (TDM) is built.
5. The similarity between the test essay and each reference essay is computed as the cosine of the corresponding vectors, where each column of the weighted TDM is treated as a vector.
6. The most similar document is identified and the corresponding score is allotted.
Expected Results:
A specific component of the score is allotted based on the relation with the reference documents.
Exceptions:
If the user's response is totally off-topic, it is given a score of 0.
Test Result: PASS
Comments and Bugs(if any) identified: NIL
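The cosine comparison in step 5 of the flow above can be sketched as follows; each essay is assumed to be already reduced to a weighted term vector (a column of the weighted TDM), and the sample vectors are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(test_vec, reference_vecs):
    """Index of the reference essay whose vector is closest to the test essay."""
    return max(range(len(reference_vecs)),
               key=lambda i: cosine(test_vec, reference_vecs[i]))

# Illustrative TF-IDF column vectors taken from a weighted TDM.
refs = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
print(most_similar([2.0, 0.0, 1.0], refs))  # -> 0
```

The index returned selects the pre-scored reference essay whose score is then allotted to the test essay.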
CHAPTER 6
CONCLUSION

6.1 OVERALL CONCLUSION

In this work, we have presented a novel framework for the automatic evaluation of essays. For the grammar checking component, we use linkages to detect errors, which is more reliable than the traditional method of defining explicit grammar rules: instead of parsing sentences with rule-based part-of-speech tags, the link approach builds a syntactic structure for the language from links between words. For the topic detection component, we use the CVA model and find that it can effectively detect whether an essay is off-topic, especially over a large number of essays. The final score, computed using ridge regression, is influenced by a number of factors rather than being dominated by any single one.
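As an illustration of combining the component scores with ridge regression, the closed-form solution w = (X^T X + lambda*I)^-1 X^T y can be written out by hand for two features. All numbers below are made-up examples, not values from the actual system:

```python
# Illustrative sketch: combining two component scores with ridge
# regression via the closed-form solution, with the 2x2 matrix inverse
# expanded by hand. The data are hypothetical examples.
def ridge_fit_2d(xs, ys, lam=1.0):
    """Fit ridge weights for two features given (x1, x2) rows and targets."""
    a = sum(x1 * x1 for x1, _ in xs) + lam          # (X^T X + lam*I)[0][0]
    b = sum(x1 * x2 for x1, x2 in xs)               # off-diagonal term
    d = sum(x2 * x2 for _, x2 in xs) + lam          # (X^T X + lam*I)[1][1]
    c1 = sum(x1 * y for (x1, _), y in zip(xs, ys))  # (X^T y)[0]
    c2 = sum(x2 * y for (_, x2), y in zip(xs, ys))  # (X^T y)[1]
    det = a * d - b * b
    return (d * c1 - b * c2) / det, (a * c2 - b * c1) / det

# Features: (grammar score, topic-similarity score); targets: human scores.
xs = [(0.9, 0.8), (0.3, 0.4), (0.7, 0.9)]
ys = [8.0, 3.5, 7.5]
w1, w2 = ridge_fit_2d(xs, ys, lam=0.1)
combined = w1 * 0.8 + w2 * 0.7  # predicted score for a new essay
```

The regularization term lambda shrinks the weights so that no single component dominates the combined score.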
6.2 FUTURE WORK

In our work we have restricted ourselves to the English language, but the system can be further extended to cater to other languages using suitable dictionaries. Also, for descriptive science essays,
provision to include and evaluate equations and formulae can be made as an additional enhancement.

Also, in addition to checking the correlation between documents using exact words, similar words can be derived with the aid of a thesaurus, and the checking performed on them for greater accuracy.
REFERENCES

[1] Lin Bin, Lu Jun, Yao Jian-Min and Zhu Qiao-Ming, "Automated Essay Scoring Using the KNN Algorithm", International Conference on Computer Science and Software Engineering, 2008.

[2] Yali Li and Yonghong Yan, "Automated Essay Scoring System for CET4", Second International Conference on Education Technology and Computer Science, 2010.

[3] Md. Monjurul Islam and A. S. M. Latiful Hoque, "Automated Essay Scoring Using Generalized Latent Semantic Analysis", 13th International Conference on Computer and Information Technology, 2010.

[4] Yali Li and Yonghong Yan, "An Effective Automated Essay Scoring System Using Support Vector Regression", Fifth International Conference on Intelligent Computation Technology and Automation, 2012.

[5] S. Dikli, "An Overview of Automated Scoring of Essays", Journal of Technology, Learning, and Assessment, 5(1), retrieved from http://www.jtla.org, 2006.

[6] J. Burstein, K. Kukich, S. Wolff, C. Lu, M. Chodorow, L. Braden-Harder and M. Dee Harris, "Automated Scoring Using a Hybrid Feature Identification Technique", in Proceedings of the Annual Meeting of the Association for Computational Linguistics, 1998.
SNAPSHOTS: