32
Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005, Sunderland, UK.

Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Embed Size (px)

Citation preview

Page 1: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Assessment and Checking of SQL: Detecting and Preventing Plagiarism

Dr. Gordon Russell, Andrew Cumming.

Napier University, Edinburgh, Scotland.

TLAD 2005, Sunderland, UK.

Page 2: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Introduction• We have an ILE for teaching Database modules.• Part of this system teaches SQL using an interactive system.• It allows users to:

– Follow about 70 problem-solving exercises.– Each exercise needs SQL statements to solve them.– The student can enter and run SQL statements.– The SQL is automatically assessed and feedback given.– Students use this feedback to learn

• The site also offers incremental assessments to students, with automatic feeback.

Page 3: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,
Page 4: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Tutorial Index

Page 5: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,
Page 6: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,
Page 7: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Accuracy

• The accuracy measure used is, put simply, how similar is the output of executing the student’s SQL to the SQL I wrote as the sample solution.

• The basic algorithm takes the table produced by the student’s SQL statement, and then compares each cell of that table against the sample solution.– If the cell is in both tables then score+1– If the cell is not in the sample solution score-1– Divide the final score by the number of cells in the

biggest of the sample solution table and the student’s SQL table.

Page 8: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Example: Accuracy 4/9 or 45%

SELECT *

FROM people

ID Lastname

1 Russell

7 Kennedy

ID Firstname Lastname

1 Gordon Russell

4 Andrew Cumming

7 Jessie Kennedy

SELECT ID, Lastname

FROM people

WHERE ID IN (1,7)

Student’s Attempt Correct Answer

Page 9: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Assessments

• When students complete a “tutorial” they can take an assessment.

• There are 4 assessments to complete.• Each assessment has 2 questions.• The assessments can be taken at any time, and

thus are unsupervised.• It is important that we are confident that

assessment results are measuring that student’s SQL ability…

Page 10: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,
Page 11: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

System Evolution

• Many changes have been made to the system to– Improve student results– Manage a lack of student attendance at

tutorials– Keep good student interested– Give an impression of one-to-one attention– Removing the need to plagiarise

Page 12: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Why Plagiarise

• An online environment suffers from a number of issues:– Effort: “I am just cheating a machine”– Instant feedback: “It doesn’t say 100% yet”– Peers: “I got 100%, what did you get?”– Time: “Not enough time left to do it myself”

Page 13: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

What is plagiarism?

• Here is simply refers to two or more students who have submitted SQL which has a common electronic source.– An emailed solution– Shared answer then distributed

• Some plagiarists have verbatim copies, while others spend time disguising the copies.

Page 14: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Detecting Plagiarism

• We have written a number of algorithms to detect similarity between SQL statements.

• Keeping algorithms separate aids understanding.• Combining the results of 2 algorithms is problematic.• So detection of likely plagiarists is done in first phase, result

in an XML report.• XML is processed and analysis made in phase 2.• Phase 2 produces “high confidence” plagiarist identification.

Page 15: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Equality Testing

• EQUALITY was our first detection algorithm• It looks at the SQL as a single string, and uses

string equality as the test.• It returns a number, with 10 being “they are

identical”, down to (currently) 4 “identical ignoring spaces, brackets, and case”.

• This test usually is all that’s needed.• It really detects “copy and paste” electronic

copying.

Page 16: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Shuffle Match

• SHUFFLE is our “intelligent” matcher.• It ignores all student-generated formatting.• Reformats SQL using a standard format.• It tolerates switching lines around.• It tolerates moving operands around an operator.

WHERE A = BWHERE B = A

Page 17: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Histogram Analysis

• There are only so many ways of writing SQL.• Students argue that copies are just co-incidence.• HISTOGRAM produces evidence to help stop this

argument.• It detects things irrelevant to the SQL standard.

– Trailing spaces at the end of a line– Use of spaces around commas and operators– Different cases of operands or clauses

Page 18: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Histogram

SELECT a,b FROM c where a=(select b from hia);

WHERE a= b AND c= d

CREATE View unique123

Page 19: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

<equality quality='9' />

select distinct(d.d_name)from dressmaker d, construction c, material m, \_ dress_order o, order_line olwhere d.d_no = c.maker and c.order_ref = ol.order_ref and c.line_ref = ol.line_no and ol.order_ref = o.order_no and ol.ol_material = m.material_no and c.finish_date is not null and m.fabric = 'Silk' and o.completed = 'Y'

select distinct(d.d_name)from dressmaker d, construction c, material m, \_ dress_order o, order_line olwhere d.d_no = c.maker and c.order_ref = ol.order_ref and c.line_ref = ol.line_no and ol.order_ref = o.order_no and ol.ol_material = m.material_no and c.finish_date is not null and m.fabric = 'Silk' and o.completed = 'Y'

Page 20: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

<equality quality='8' /> <signature op='0' comma='1' trail='0' /> <histogram mode='all'> <word str='grm' left='3' right='3' /> <word str='mtr' left='2' right='2' /> <word str='cost' left='3' right='3' /> <word str='Cost' left='1' right='1' /> </histogram>

SELECT distinct g.description, (g.labour_cost + q.quantity * m.cost) "Total \_ Cost" FROM garment g, material m, quantities q WHERE q.size_q = 8 AND m.fabric = 'Cotton' AND g.style_no = q.style_q AND (g.labour_cost + q.quantity * m.cost) > (\_ SELECT AVG(grm.labour_cost + (qty.quantity * mtr.cost)) FROM material mtr, quantities qty, garment grm WHERE qty.size_q = 8 AND grm.style_no = qty.style_q)

SELECT distinct g.description, (g.labour_cost + q.quantity * m.cost) "Total \_ Cost" FROM garment g, material m, quantities q WHERE q.size_q = 8 AND m.fabric = 'Cotton' AND g.style_no = q.style_q AND (g.labour_cost + q.quantity * m.cost) > (\_ SELECT AVG(grm.labour_cost + (qty.quantity * mtr.cost)) FROM material mtr, quantities qty, garment grm WHERE qty.size_q = 8 AND grm.style_no = qty.style_q)

Page 21: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

<shuffle score='0.888888888888889' restarts="0" /> <signature op='0' comma='1' trail='1' /> <histogram mode='all'> <word str='labour_cost' left='2' right='2' /> <word str='TRAIL:1' left='2' right='2' /> <word str='LABOUR_COST' left='1' right='1' /> </histogram>

select a.description, b.fabric, b.colour,\_ b.pattern from garment a, material b,quantities c, \_order_line d where a.style_no = c.style_qand c.style_q = d.ol_styleand d.ol_material = b.material_noand (a.labour_cost+c.quantity*b.cost) > (select g.labour_cost from garment g where g.LABOUR_COST > 80);

select a.description, b.fabric, b.colour,\_ b.pattern from garment a, material b,quantities c, \_order_line d where a.style_no = c.style_qand c.style_q = d.ol_styleand d.ol_material = b.material_noand (a.labour_cost+c.quantity*b.cost) > (select g.labour_cost from garment g where g.LABOUR_COST >= 80);

Page 22: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

<shuffle score='0.863636363636364' restarts="0" /> <signature op='0' comma='0' trail='1' /><histogram mode='all'> <word str='TRAIL:1' left='1' right='1' /></histogram>

select fabric,round(sum(quantity*cost*1.15),2) from order_line join quantities on(\_ ol_style=style_q and ol_size=size_q) join \_ material on (material_no=ol_material),\_ constructionwhere line_no=line_refand order_line.order_ref=construction.order_ref and FINISH_DATE between '01 jan 2002' and '31 dec 2002' group by fabric;

select fabric,round(sum(quantity*cost*1.15),2)as constructionfrom order_linejoin quantities on(ol_style=style_q and ol_size=size_q) join material on (material_no=ol_material),\_ construction

where line_no=line_refand order_line.order_ref=construction.order_ref and FINISH_DATE between '01 jan 2002' and '31 dec 2002' group by fabric;

Page 23: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Understanding Plagiarism

• With random question banks it is hard to understand plagiarism.

• Everyone doing the same questions it is easy!• Can they copy off of their lab buddy?• What about a friend of a friend?• Do students really have friends-of-friends-of-friends.• Increasing question bank size should help, but this is

expensive, and maintaining equal question difficulty is hard.• Definitely we only want the minimum bank size which

prevents plagiarism oppertunities.

Page 24: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

System in 2002

• Students did the assessments when they liked.• They had 15 questions, and did any 5 they wanted

to.• Q1-5 12% each, Q6-10 16% each, Q11-15 20%.• Argument was that this would allow weak students

to do minimum, and good students to shine, without anyone feeling they failed something.

Page 25: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Collusion in 2002

Page 26: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Moving to 2003

• Target: Eliminate big groups.• Switch to random banks.• Force student to do tutorials before assessments.• Have 4 assessments (1 per tutorial).• Each assessment has 2 questions.• Each student will use use 40% of the bank

contents.

Page 27: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Collusion in 2003

Page 28: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Into 2004

• Target: Improve student moral• Students hated failing to get Ass 3 and 4.• Change difficulty level of questions slightly.• Now students get 1 question from previous

difficulty level in every assessment.• Result: Students do more assessments!

Page 29: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Collusion in 2004

Page 30: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Summary

• Plagiarism groups are back!• However, the nodes are quite far apart.• Big blob is an off-campus “support group”.• 2nd biggest is a local support network.• Mostly each student copied only from 1 other.• A slightly bigger bank size may fix this.• We are going to use a completely new bank in 2005/6 to

analyse the impact of answer sharing between year groups.

Page 31: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

StatisticsCohort

2002 2003 2004

Total Number of students 312 301 263

Average number of questions attempted

5.7 4.6 5.8

Average Score 60% 51% 64%

Plagiarised questions 208 120 138

Percentage of dishonest answers submitted

12% 9% 9%

Page 32: Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005,

Conclusion

• From the table plagiarism is actually holding steady.

• Student marks have increased.• Question attempts have increased too.• This data covers semester 1. In semester 2,

plagiarism has decreased as a result of students realising the effectiveness of the new plagiarism detector.

• 2005/6 will be the big test.