Using data to improve student research

Easybib Open Analytics NYC


Page 1: Easybib Open Analytics NYC

Using data to improve student research

Page 2: Easybib Open Analytics NYC

EasyBib is an automatic bibliography composer.

Students use it to cite sources for their research.

Page 3: Easybib Open Analytics NYC

We teach information literacy.

18% of all student papers include plagiarism (1)

50% likelihood of using a credible vs. non-credible source (1)

4% increase in the use of paper mills and cheating sites (1)

~16% of students are adequately prepared for college (2)

Source: (1) TurnItIn; (2) Both Sides Now: Librarians Looking at Information Literacy from High School and College.

Page 4: Easybib Open Analytics NYC

That’s how we felt too.

Page 5: Easybib Open Analytics NYC

The problem is becoming bigger.

Page 6: Easybib Open Analytics NYC

Unprepared students make for unprepared adults.

It’s not just students who plagiarize:

•Pal Schmitt, former president of Hungary

•German education minister

•Jayson Blair (former New York Times writer)

•Jonah Lehrer, journalist and author

•Fareed Zakaria (reporter, author, host)

Page 7: Easybib Open Analytics NYC

We are in the right place to figure it out.

Over half of all students in the US (40M)

Over half a billion citations

Page 8: Easybib Open Analytics NYC

We asked ourselves the following questions:

•What are students using in their research?

•How good are their sources?

•How can we help them?

Page 9: Easybib Open Analytics NYC

We started with the basics.

_gaq.push([ 'citations._trackEvent', citationTitle, citationPublisher, citationId]);
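The call above uses the classic ga.js command queue, where `_trackEvent` takes a category, an action, and an optional label. A minimal sketch of wrapping it, assuming the citation is a plain object; the `trackCitation` helper and the sample citation are hypothetical and not from the slides:

```javascript
// Classic ga.js command queue; a plain array until the library loads.
var _gaq = _gaq || [];

// Hypothetical helper (not from the slides): queue one event per citation.
// ga.js reads the arguments after the command name as category, action, label.
function trackCitation(queue, citation) {
  queue.push([
    'citations._trackEvent', // "citations" is the named tracker used on the slide
    citation.title,          // category
    citation.publisher,      // action
    String(citation.id)      // label; ga.js expects a string here
  ]);
  return queue;
}

// Sample data for illustration only.
trackCitation(_gaq, { title: 'The World Factbook', publisher: 'CIA', id: 42 });
```

Converting the ID to a string is a deliberate choice in this sketch: the label slot of `_trackEvent` is a string, while the fourth optional slot (value) is the one that takes an integer.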

Page 10: Easybib Open Analytics NYC

Here’s what we found.

Top sources 2010

•Wikipedia

•Google

1. The New York Times
2. CIA World Factbook
3. Oracle Thinkquest
4. Buzzle
5. US BLS
6. Dictionary.com
7. CDC
8. PBS
9. eHow

Source: EasyBib Google Analytics Oct 2010-Nov 2010 data.

Page 11: Easybib Open Analytics NYC

What could we do?

•Warn them when their source’s credibility is in question

•Analyze the quality of their full bibliography

•Make it easier to not plagiarize

•Suggest better sources

Page 12: Easybib Open Analytics NYC

Define credibility.

Page 13: Easybib Open Analytics NYC

Improve citation quality

Page 14: Easybib Open Analytics NYC

Gave students access to their own analytics

Page 15: Easybib Open Analytics NYC

To combat plagiarism, we built an audit trail for notes

Page 16: Easybib Open Analytics NYC

So after all this... Does it blend (tm)?

1. Wikipedia
2. Bio.com
3. History.com
4. PBS
5. Mayo Clinic
6. CDC
7. The New York Times
8. BBC
9. CNN
10. WebMD
11. US BLS

•Wikipedia still on top, but ...

•No content farms, no Google.

•WebMD is questionable, but its credibility can be argued for.

Source: Apr-May 2013 Google Analytics data

Page 17: Easybib Open Analytics NYC

We have to admit, it’s getting better...


Page 18: Easybib Open Analytics NYC

Help students find better sources

Page 19: Easybib Open Analytics NYC

How does the Research engine currently work?

•Cloudant (CouchDB)

•MySQL

•Lucene/Solr

Slow, asynchronous, lots of moving parts.

Page 20: Easybib Open Analytics NYC

Starting to do a bit more

StatsD::increment($metrics);

$response = $rediska->publish( array('realtime'), $citation );
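The PHP above increments a StatsD counter and publishes the citation to a Redis channel. As a rough illustration of the StatsD half only, here is a sketch in JavaScript (to match the deck's other snippets) of what a counter increment sends over the wire. The metric name, host, and port are assumptions; 8125 is the usual StatsD default.

```javascript
const dgram = require('dgram');

// StatsD counters travel as plain-text datagrams: "<name>:<delta>|c".
function formatCounter(name, delta = 1) {
  return `${name}:${delta}|c`;
}

// Fire-and-forget UDP send; the socket is closed once the datagram is handed off.
// Host and port are assumptions for this sketch.
function increment(name, host = '127.0.0.1', port = 8125) {
  const socket = dgram.createSocket('udp4');
  const payload = Buffer.from(formatCounter(name));
  socket.send(payload, port, host, () => socket.close());
}
```

Calling `increment('citations.created')` (a hypothetical metric name) would send the datagram `citations.created:1|c`. Because UDP is connectionless, the call does not block the request and does not fail if no StatsD daemon is listening.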

Page 21: Easybib Open Analytics NYC

There’s a lot more we can do, and data will help us.

Page 22: Easybib Open Analytics NYC

Cloudant Search

•Full-text search integrated into Cloudant

•Lucene syntax

•Indexing is easy

function(doc){
  index("title", doc.title, {"store": "yes"});
}

•Grouping of sources via chained map-reduce

map: function(doc){
  if (doc.title){
    emit({"title": doc.title}, 1);
  }
}
reduce: _sum
dbcopy: citationGroup
------
map: function(doc){
  if (doc.title && doc.key.title){
    emit(doc.value, doc.key.title);
  }
}
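To make the two stages concrete, here is a plain JavaScript emulation that runs without a database. It is a sketch of the idea, not Cloudant's API: the first function stands in for the map plus `_sum` reduce, and the second for the map over the copied results. The function names and the descending sort are assumptions of this sketch.

```javascript
// Stage 1: emulate map (emit({title}, 1)) followed by a _sum reduce grouped by key.
function countByTitle(docs) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.title) {
      counts.set(doc.title, (counts.get(doc.title) || 0) + 1);
    }
  }
  // Each row mirrors the shape the second map reads: {key: {title}, value}.
  return Array.from(counts, ([title, value]) => ({ key: { title }, value }));
}

// Stage 2: emulate the second map (emit(value, key.title)), most-cited first.
// A real view sorts keys ascending; descending order here is a choice of this sketch.
function rankByCount(rows) {
  return rows
    .filter((row) => row.key && row.key.title)
    .map((row) => ({ key: row.value, value: row.key.title }))
    .sort((a, b) => b.key - a.key);
}

const ranked = rankByCount(countByTitle([
  { title: 'Wikipedia' },
  { title: 'CDC' },
  { title: 'Wikipedia' },
  { note: 'no title, skipped' }
]));
```

Re-keying by count in the second stage is what makes "most-cited sources" a simple sorted read rather than a scan over every citation.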

Page 23: Easybib Open Analytics NYC

Live data analysis. Crowdsourcing.

•Use Cloudant Search to power feedback on sources (# of times cited in real time, quality of bibliographies derived from)

•Allow users to submit their own credibility evaluations and aggregate results
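One simple way to aggregate user-submitted evaluations is a per-source count and mean. The sketch below assumes each evaluation is an object with a `source` and a numeric `rating`; that shape, the rating scale, and the function name are hypothetical and not described in the slides.

```javascript
// Aggregate crowd evaluations into a count and mean rating per source.
// Input shape ({source, rating}) is an assumption of this sketch.
function aggregateEvaluations(evaluations) {
  const totals = new Map();
  for (const { source, rating } of evaluations) {
    const entry = totals.get(source) || { count: 0, sum: 0 };
    entry.count += 1;
    entry.sum += rating;
    totals.set(source, entry);
  }
  const result = {};
  for (const [source, { count, sum }] of totals) {
    result[source] = { count, mean: sum / count };
  }
  return result;
}

// Sample data for illustration only.
const summary = aggregateEvaluations([
  { source: 'cdc.gov', rating: 5 },
  { source: 'cdc.gov', rating: 4 },
  { source: 'ehow.com', rating: 1 }
]);
```

Keeping the count next to the mean lets a consumer discount sources that have only a handful of evaluations.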

Page 24: Easybib Open Analytics NYC

SourceRank!

Credibility weighting + crowdsourcing

Synchronous & realtime via Cloudant Search

Value nodes based on nearest neighbors

And other things...
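The slides do not say how SourceRank is computed. Purely as an illustration of "value nodes based on nearest neighbors", the sketch below blends each source's own score with the average score of its neighbors in one pass. The graph shape, the `alpha` weight, and the function name are all assumptions, and this should not be read as EasyBib's actual algorithm.

```javascript
// graph: { [source]: { score: number in [0, 1], neighbors: string[] } }
// One smoothing pass: keep (1 - alpha) of a node's own score and take
// alpha from the mean score of its known neighbors.
function smoothScores(graph, alpha = 0.5) {
  const out = {};
  for (const [name, node] of Object.entries(graph)) {
    const known = node.neighbors.filter((n) => n in graph);
    if (known.length === 0) {
      out[name] = node.score; // no neighbors: score is unchanged
      continue;
    }
    const neighborMean =
      known.reduce((sum, n) => sum + graph[n].score, 0) / known.length;
    out[name] = (1 - alpha) * node.score + alpha * neighborMean;
  }
  return out;
}

// Sample graph for illustration only.
const smoothed = smoothScores({
  a: { score: 1, neighbors: ['b'] },
  b: { score: 0, neighbors: ['a'] },
  c: { score: 0.5, neighbors: [] }
});
```

In this sample, a highly rated source linked to a poorly rated one is pulled toward the middle, while an isolated source keeps its own score.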

Page 25: Easybib Open Analytics NYC

Driving growth

We have the largest UGC citation set. Making this searchable creates a “moat.”

The more people that use EasyBib, the better the tool becomes.

Page 26: Easybib Open Analytics NYC

What about other data analytics tools?

Too stretched to learn more complex tools (looking for easy answers)

Costs (GA is free!)

EMR, Hadoop, Redshift, Cloudant Search: This is what’s next.

Page 27: Easybib Open Analytics NYC

Questions?