Presentation given at the dataTEL Marketplace at the RecSysTEL workshop at the RecSys and EC-TEL conferences, Barcelona, Spain.
Free the data

Why?

Because we will get new insights

Photo by Tom Raftery, http://www.flickr.com/photos/traftery/4773457853/
Hendrik Drachsler
Centre for Learning Sciences and Technology @ Open University of the Netherlands

Issues and Considerations regarding Sharable Data Sets for Recommender Systems in TEL

29.09.2010, RecSysTEL workshop at the RecSys and EC-TEL conferences, Barcelona, Spain
#dataTEL
What is dataTEL?

• dataTEL is a Theme Team funded by the STELLAR Network of Excellence.
• It addresses two STELLAR Grand Challenges: 1. Connecting Learners, 2. Contextualisation.
Recommenders in TEL
TEL recommenders are a bit like this...
We need to select for each application an appropriate recommender system that fits its needs.

But...

“The performance results of different research efforts in recommender systems are hardly comparable.” (Manouselis et al., 2010)

TEL recommender experiments lack transparency. They need to be repeatable in order to:
• test validity
• allow verification
• compare results

Photo by Kaptain Kobold, http://www.flickr.com/photos/kaptainkobold/3203311346/
How others compare their recommenders
What kind of data sets can we expect in TEL?
Find an appropriate recommender system for a set of goals, tasks, limitations, and constraints. More accurately: choose the most appropriate system from a set of candidates. Major claim: there is no silver bullet, that is, no system that is at once the most accurate, the fastest, the cheapest, ...

We need to select for each application an appropriate recommender system that fits its needs.

Graphic by J. Cross, Informal Learning, Pfeiffer (2006)
The Long Tail of Learning

Formal Data / Informal Data

Graphic by Wilkins, D. (2009); long tail concept by Anderson, C. (2004)
Guidelines for Data Set Aggregation

1. Create a data set that realistically reflects the variables of the learning setting.
2. Use a sufficiently large set of user profiles.
3. Create data sets that are comparable to others (see the sketch below).

Justin Marshall, Coded Ornament by rootoftwo, http://www.flickr.com/photos/rootoftwo/267285816
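To make guideline 3 concrete: one practical way to keep data sets comparable is to publish them in the widely used user / item / rating / timestamp layout of the MovieLens sets. The Python sketch below only illustrates that idea under assumed field names; it is not a format prescribed by dataTEL.

    import csv

    # Minimal sketch: write interaction events in the MovieLens-style
    # user / item / rating / timestamp layout that most recommender
    # toolkits can read out of the box.
    def export_tsv(events, out_path):
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f, delimiter="\t")
            writer.writerow(["user_id", "item_id", "rating", "timestamp"])
            writer.writerows(events)

    # Hypothetical events from a TEL system (learning objects as items)
    export_tsv(
        [("u1", "lo-42", 4, 1285747200),
         ("u2", "lo-42", 5, 1285747260)],
        "dataset.tsv",
    )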
Standardize research on recommender systems in TEL

Five core questions:
1. How can data sets be shared according to privacy and legal protection rights?
2. How to develop a policy to use and share data sets?
3. How to pre-process data sets to make them suitable for other researchers?
4. How to define common evaluation criteria for TEL recommender systems?
5. How to develop overview methods to monitor the performance of TEL recommender systems on data sets?
dataTEL::Objectives
1. Protection Rights

O V E R S H A R I N G

Were the founders of PleaseRobMe.com actually allowed to grab the data from the web and present it in that way?

Are we allowed to use data from social handles and reuse it for research purposes?

And there are more company protection rights on top of that!
2. Policies to use Data
3. Pre-Process Data Sets

For formal data sets from an LMS:
1. Data storing scripts
2. Anonymisation scripts (a minimal sketch follows below)

For informal data sets:
1. Collect data
2. Process data
3. Share data
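As an illustration of what an anonymisation script could look like, the sketch below replaces the user identifiers in an LMS log with salted hashes before the data is shared; records stay linkable per user without exposing identities. The column name user_id and the CSV layout are assumptions for the example.

    import csv
    import hashlib

    # Minimal sketch: pseudonymise the user_id column of an LMS log.
    def anonymise_log(in_path, out_path, salt):
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            reader = csv.DictReader(src)
            writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                digest = hashlib.sha256((salt + row["user_id"]).encode()).hexdigest()
                row["user_id"] = digest[:16]  # stable pseudonym per user
                writer.writerow(row)

    anonymise_log("lms_log.csv", "lms_log_anon.csv", salt="keep-this-secret")

Note that hashing is only pseudonymisation; depending on the protection rights discussed above, stronger measures such as aggregation may be needed before sharing.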
4. Evaluation Criteria

Combined approach by Drachsler et al. 2008:
• 1. Accuracy, 2. Coverage, 3. Precision (see the sketch below)
• 1. Effectiveness of learning, 2. Efficiency of learning, 3. Drop-out rate, 4. Satisfaction

Kirkpatrick model by Manouselis et al. 2010:
• 1. Reaction of learner, 2. Learning improved, 3. Behaviour, 4. Results
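To make the first set of criteria concrete, here is a minimal Python sketch of precision@k and catalog coverage on toy data. It illustrates the kind of measures meant, not the exact evaluation protocol of the cited papers.

    def precision_at_k(recommended, relevant, k):
        # Fraction of the top-k recommended items that are relevant.
        hits = sum(1 for item in recommended[:k] if item in relevant)
        return hits / k

    def catalog_coverage(recommendation_lists, catalog):
        # Share of the catalog that appears in at least one list.
        recommended = {item for recs in recommendation_lists for item in recs}
        return len(recommended & set(catalog)) / len(catalog)

    # Toy example: two users, a five-item catalog
    catalog = ["lo-1", "lo-2", "lo-3", "lo-4", "lo-5"]
    recs = {"u1": ["lo-1", "lo-3", "lo-5"], "u2": ["lo-2", "lo-3"]}
    relevant = {"u1": {"lo-1", "lo-2"}, "u2": {"lo-3"}}

    for user in recs:
        print(user, precision_at_k(recs[user], relevant[user], k=3))
    print("coverage:", catalog_coverage(recs.values(), catalog))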
5. Data Set Framework to Monitor Performance

Formal data sets ↔ Informal data sets

                      Data A               Data B               Data C
Algorithms:           Algorithm A, B, C    Algorithm D, E       Algorithm B, D
Models:               Learner Model A, B   Learner Model C, E   Learner Model A, C
Measured attributes:  Attribute A, B, C    Attribute A, B, C    Attribute A, B, C
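A minimal sketch of what such a monitoring framework could look like in code: run every algorithm on every data set and record the measured attributes in one results table. All names here (benchmark, the toy popularity recommender, the attribute function) are hypothetical illustrations of the idea, not dataTEL software.

    # Minimal sketch of a cross-data-set benchmark loop.
    def benchmark(datasets, algorithms, attributes):
        results = []
        for ds_name, dataset in datasets.items():
            for algo_name, algorithm in algorithms.items():
                recs = algorithm(dataset)  # train/recommend on this data set
                row = {"dataset": ds_name, "algorithm": algo_name}
                for attr_name, measure in attributes.items():
                    row[attr_name] = measure(recs, dataset)
                results.append(row)
        return results

    # Toy example: one popularity-style recommender, one attribute
    datasets = {"Data A": [("u1", "lo-1"), ("u1", "lo-2"), ("u2", "lo-1")]}
    algorithms = {"popularity": lambda ds: sorted({item for _, item in ds})}
    attributes = {"n_recommended": lambda recs, ds: len(recs)}
    print(benchmark(datasets, algorithms, attributes))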
Join us for a Coffee ...
http://www.teleurope.eu/pg/groups/9405/datatel/
Upcoming event ...

Workshop: dataTEL - Data Sets for Technology Enhanced Learning
Date: March 30th to March 31st, 2011
Location: Ski resort La Clusaz in the French Alps, Massif des Aravis
Funding: Food and lodging for 3 nights for 10 selected participants
Submissions: To be considered for funding, please send an extended abstract to http://www.easychair.org/conferences/?conf=datatel2011
Deadline for submissions: October 25th, 2010
CfP: http://www.teleurope.eu/pg/pages/view/46082/
This slide deck is available at: http://www.slideshare.com/Drachsler
Email: [email protected]
Skype: celstec-hendrik.drachsler
Blogging at: http://www.drachsler.de
Twittering at: http://twitter.com/HDrachsler
Many thanks for your interest!