Web Science & Technologies
University of Koblenz ▪ Landau, Germany

Micro-Macro-Implications

Steffen Staab

Slides by Klaas Dellschaft

Micro-Macro-Implications, http://west.uni-koblenz.de

Figure: The Web as a micro-macro system. Micro level: producing and consuming; cognition, emotion, behavior, socialization, knowledge; observable micro-interactions on the Web. Macro level (WWW): applications, protocols, data & information, governance; observable macro-effects on the Web.


Collaborative Tagging Systems

Objectives of tag recommenders:
• Improve the indexing quality (and thereby the retrieval results)
• Reduce the tagging effort


Evaluation of Tag Recommenders

How to measure the influence of tag recommenders?
Which influences of a recommender are positive, which are negative?

Dimensions for evaluating tag recommenders:
• Does a recommender improve the indexing quality?
  • Do recommenders lead to more consistent tagging?
• Does a recommender reduce the tagging effort of a user?
  • How much time did the users spend on tagging?
  • How many recommendations get accepted?
  • How many of all assigned tags were recommended?
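The tagging-effort questions above translate into simple ratios over a tagging log. The following is a minimal sketch, assuming a hypothetical log in which each post records the suggested tags, the finally assigned tags and the time spent; the `Post` record and its fields are illustrative and not the data format of the experiment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Post:
    """One user tagging one resource (hypothetical log record)."""
    recommended: set[str]  # tags shown as suggestions
    assigned: set[str]     # tags the user finally assigned
    seconds: float         # time spent on tagging this resource


def effort_measures(posts: list[Post]) -> dict[str, float]:
    shown = sum(len(p.recommended) for p in posts)
    assigned = sum(len(p.assigned) for p in posts)
    accepted = sum(len(p.recommended & p.assigned) for p in posts)
    return {
        # How many recommendations get accepted?
        "acceptance_rate": accepted / shown if shown else 0.0,
        # How many of all assigned tags were recommended?
        "recommended_share": accepted / assigned if assigned else 0.0,
        # How much time did the users spend on tagging?
        "avg_seconds": sum(p.seconds for p in posts) / len(posts) if posts else 0.0,
    }


# Hypothetical example log with two posts
posts = [
    Post({"news", "humor"}, {"news", "satire"}, 30.0),
    Post({"news", "science"}, {"news", "science", "bbc"}, 40.0),
]
print(effort_measures(posts))
```

A high acceptance rate alone does not show that the tags are good; it only shows that users followed the suggestions, which is why the slides treat effort and indexing quality as separate dimensions.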


Outline

Measures of indexing quality
  • What is meant by “indexing quality”?
  • Inter-resource consistency
  • Inter-indexer consistency

Evaluation of the measures
  • Are the measures correlated with each other?
  • User study: Apply the measures to two recommenders

Evaluation results

Conclusions


Measures of Indexing Quality


Influence of Indexing Quality on Retrieval Results

• Tags describe aspects of a resource
• Resources are retrieved together if they share common tags

High recall:
• All important aspects of a resource are described
• The same aspect is always described by the same tag

High precision:
• Only the important aspects of a resource are described
• The same tag is not used for describing different aspects

Precision and recall during retrieval depend on …
• … a consistent set of aspects for indexing the set of resources
• … a consistent vocabulary for describing the aspects
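To make the precision/recall argument concrete, here is a minimal sketch that assumes retrieval by a single query tag (a resource is returned if its tag set contains that tag). The index, page names and relevance judgments are hypothetical. Describing the same aspect with a synonym ("fun" instead of "humor") lowers recall; using the same tag for an unrelated aspect lowers precision.

```python
def retrieve(index: dict[str, set[str]], query_tag: str) -> set[str]:
    """Return all resources whose tag set contains the query tag."""
    return {r for r, tags in index.items() if query_tag in tags}


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"page_a", "page_b"}  # hypothetical: the pages users consider humorous

consistent = {
    "page_a": {"humor", "patents"},
    "page_b": {"humor", "news"},
    "page_c": {"news", "science"},
}
inconsistent = {  # same aspect, different tag on page_b -> lower recall
    "page_a": {"humor", "patents"},
    "page_b": {"fun", "news"},
    "page_c": {"news", "science"},
}
ambiguous = {  # "humor" also used for a non-humorous page -> lower precision
    "page_a": {"humor", "patents"},
    "page_b": {"humor", "news"},
    "page_c": {"humor", "news", "science"},
}

for name, index in [("consistent", consistent), ("inconsistent", inconsistent), ("ambiguous", ambiguous)]:
    print(name, precision_recall(retrieve(index, "humor"), relevant))
```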


What does “indexing quality” mean?

Indexing quality: How well do the tag vectors describe the resources?
• Which are the relevant aspects of a resource?
• Are common aspects of resources described by common tags?

How does indexing quality influence the results during retrieval?

Figure: Tag vectors describe resources. Resource r1 is described by the tag vector v1 = (1.0, 2.0, 4.0, 3.0), resource r2 by v2 = (0.0, 3.0, 5.0, 2.0). The similarity of r1 and r2 is reflected by sim(v1, v2).
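The slide does not fix a similarity function. Assuming cosine similarity, a common choice for tag vectors, the example vectors v1 = (1.0, 2.0, 4.0, 3.0) and v2 = (0.0, 3.0, 5.0, 2.0) give:

```latex
\mathrm{sim}(v_1, v_2)
  = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}
  = \frac{1.0 \cdot 0.0 + 2.0 \cdot 3.0 + 4.0 \cdot 5.0 + 3.0 \cdot 2.0}{\sqrt{30}\,\sqrt{38}}
  = \frac{32}{\sqrt{1140}}
  \approx 0.948
```

Whether 0.948 is "right" cannot be decided from the vectors alone; it has to be compared with how similar r1 and r2 actually are.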


What does “indexing quality” mean?

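Figure: Tag vectors over the tags (patents, humor, news, science) describe the resources r1, r2 and r3: v1 = (0, 0, 10, 4), v2 = (0, 6, 8, 0), v3 = (5, 9, 0, 0). The tag-vector similarities sim(v1, v2) and sim(v2, v3) are compared with the user-perceived similarity of the resources.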
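A short sketch of the comparison in the figure, again assuming cosine similarity (the slide does not name the function). With these vectors, r2 comes out closer to r1 than to r3; whether users perceive the resources the same way is exactly what indexing quality asks.

```python
import math

# Tag vectors from the figure over the tags (patents, humor, news, science)
v1 = (0, 0, 10, 4)
v2 = (0, 6, 8, 0)
v3 = (5, 9, 0, 0)


def cosine(u, v):
    """Cosine similarity (assumed similarity function)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


sim12 = cosine(v1, v2)
sim23 = cosine(v2, v3)
print(f"sim(v1, v2) = {sim12:.3f}")  # 0.743
print(f"sim(v2, v3) = {sim23:.3f}")  # 0.524
```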


Measures of indexing quality

Inter-resource consistency
• Compare resource similarity to the tag-vector distance
• Requires external knowledge about the similarity of resources
• Direct but sophisticated measure of indexing quality

Inter-indexer consistency
• Do users agree on a common description for a resource?
• Assumption: Users select tags independently of each other
• Indirect but easy measure of indexing quality

Which measure to use for evaluating tag recommenders?


Research Hypotheses

Hypothesis: Inter-indexer consistency does not measure the influence of tag recommenders on the indexing quality!

Popular Tags: Suggest the most popular tags of a resource
• H1a: Popular Tags decrease the inter-resource consistency
• H1b: Popular Tags increase the inter-indexer consistency

User Tags: Suggest all tags previously applied by the user
• H2a: User Tags increase the inter-resource consistency
• H2b: User Tags lead to a decreased or unchanged inter-indexer consistency

The measures do not correlate when evaluating tag recommenders


Measuring Inter-Resource Consistency

Idea: Compare resource similarity and tag-vector distance
• a_i: Average distance to resources in the same cluster
• b_i: Average distance to resources in the closest other cluster

$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$

Figure: A resource and clusters of similar resources, with s_i on a scale from -1 over 0 to +1 (labels: inconsistent, consistent, even more consistent).
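The formula s_i = (b_i - a_i) / max(a_i, b_i) is the silhouette coefficient. A minimal sketch for one user's clustering, assuming 1 - cosine similarity as the tag-vector distance (the slides do not specify the distance) and the usual convention s_i = 0 for resources in a singleton cluster. The vectors and cluster labels are hypothetical.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """Assumed distance: 1 - cosine similarity of two tag vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0


def silhouette(vectors, labels):
    """Return s_i = (b_i - a_i) / max(a_i, b_i) for every resource i."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    def mean_dist(i, members):
        return sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        same = [j for j in clusters[label] if j != i]
        others = [m for lbl, m in clusters.items() if lbl != label]
        if not same or not others:
            scores.append(0.0)  # singleton cluster or only one cluster: s_i = 0 by convention
            continue
        a_i = mean_dist(i, same)                    # average distance within own cluster
        b_i = min(mean_dist(i, m) for m in others)  # average distance to closest other cluster
        denom = max(a_i, b_i)
        scores.append((b_i - a_i) / denom if denom else 0.0)
    return scores


# Hypothetical tag vectors over (patents, humor, news, science)
vectors = [(0, 0, 10, 4), (0, 2, 8, 2), (5, 9, 0, 0), (6, 7, 0, 1)]
consistent = silhouette(vectors, [0, 0, 1, 1])    # clustering that the tag vectors reflect
inconsistent = silhouette(vectors, [0, 1, 0, 1])  # clustering that cuts across the tag vectors
print([round(s, 2) for s in consistent])
print([round(s, 2) for s in inconsistent])
```

Averaging s_i over resources and users yields a single inter-resource consistency value per condition, in the spirit of the E(s_i) values reported later.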


Measuring Inter-Indexer Consistency

Idea: Do users agree on a common description for a resource?

Tag Reuse Rate
• Average number of users who apply a tag
• Used in the related work

Figure: Three example resources with tag counts for (patents, fun, humor, news) of (0, 2, 4, 8), (0, 2, 6, 8) and (0, 0, 6, 8); their tag reuse rates are 4.7, 5.3 and 7.
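The tag reuse rate of the example can be reproduced in a few lines: count the distinct users per tag of a resource and average over the tags that were actually used. This is a sketch; expanding the tag counts into (user, tag) pairs with made-up user names is an assumption for illustration.

```python
from collections import defaultdict


def tag_reuse_rate(assignments):
    """Average number of distinct users per tag, for the (user, tag) pairs of ONE resource."""
    users_per_tag = defaultdict(set)
    for user, tag in assignments:
        users_per_tag[tag].add(user)
    if not users_per_tag:
        return 0.0
    return sum(len(users) for users in users_per_tag.values()) / len(users_per_tag)


# Tag counts of the three example resources (patents, fun, humor, news)
examples = [
    {"patents": 0, "fun": 2, "humor": 4, "news": 8},
    {"patents": 0, "fun": 2, "humor": 6, "news": 8},
    {"patents": 0, "fun": 0, "humor": 6, "news": 8},
]
rates = []
for counts in examples:
    # Expand counts into hypothetical (user, tag) assignments: n distinct users per tag
    tas = [(f"user{k}", tag) for tag, n in counts.items() for k in range(n)]
    rates.append(round(tag_reuse_rate(tas), 1))
print(rates)  # [4.7, 5.3, 7.0]
```

Tags with a count of 0 never appear in the assignments and therefore do not dilute the average, which is why the third resource reaches 7 with only two tags.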


Evaluation


Experimental Setup

Objective: Are inter-resource and inter-indexer consistency correlated if tag recommendations are given?

Task given to users: Assign keywords to 10 web pages. After tagging, cluster the web pages according to their similarity (used for measuring the inter-resource consistency).

Three different experimental conditions:
1) No Suggestions
2) User Tags
3) Popular Tags

Each condition was further divided into an English and a German user group.


Suggestion of Popular Tags – Screenshot


Clustering of Similar Web Pages – Screenshot


Results



Sizes of the Tagging Data Set

German User Group:

Condition       #Users  #Tags  #TAS  #TAS/#User  Imitated TAS  Avg. Duration
No Suggestions      74    706  2134       28.84            --            37s
User Tags           79    466  1507       19.08           26%            29s
Popular Tags        78    531  2228       28.56           64%            35s

English User Group:

Condition       #Users  #Tags  #TAS  #TAS/#User  Imitated TAS  Avg. Duration
No Suggestions     115    973  3150       27.39            --            34s
User Tags          118    819  2919       24.74           24%            29s
Popular Tags       118    550  3003       25.45           73%            29s

TAS = tag assignments


The Clustering Data Set

On average, each user identified 4.59 clusters. Overall, 146 distinct clusters were identified; the 11 most frequent clusters cover 70% of the data.

The web pages cover ~7 topics; 3 web pages lie on the border between two topics.




Differences in the Topical Clusters

English Popular Tags condition must be excluded

Figure: Cluster probabilities in the English experiment for the clusters “The Onion + BBC News” and “The Onion + Patents (Humor)”, per condition (No Suggestions, Popular Tags, User Tags).


Differences in the Topical Clusters (I)


Differences in the Topical Clusters (II)

Significant differences between the German and the English experiment: the German and English variants are not comparable to each other

Significant differences for the English Popular Tags condition: this condition is not comparable to the other conditions

Figure: Percentage of users who identified the clusters “The Onion + BBC (News)” and “The Onion + Patents (Humor)”, in the German and the English experiment, per condition (No Suggestions, User Tags, Popular Tags).


Measuring the Inter-Resource Consistency

H1a: Popular Tags decrease the inter-resource consistency
H2a: User Tags increase the inter-resource consistency

Expectation: $E(s_{pt,i}) < E(s_{ns,i}) < E(s_{ut,i})$
(pt = Popular Tags, ns = No Suggestions, ut = User Tags)

                $E(s_{pt,i})$  $E(s_{ns,i})$  $E(s_{ut,i})$
German Users           0.1474         0.1847         0.2367
English Users             N/A         0.1713         0.1915

(All differences are significant!)


Measuring the Inter-Indexer Consistency

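H1b: Popular Tags increase the inter-indexer consistency
H2b: User Tags lead to a decreased or unchanged inter-indexer consistency

Expectation: $E(\mathit{tr}_{pt,i}) > E(\mathit{tr}_{ns,i}) \geq E(\mathit{tr}_{ut,i})$
(tr = tag reuse rate; pt = Popular Tags, ns = No Suggestions, ut = User Tags)

                $E(\mathit{tr}_{pt,i})$  $E(\mathit{tr}_{ns,i})$  $E(\mathit{tr}_{ut,i})$
German Users                       3.60                      2.44                     2.39*
English Users                      4.67                      2.76                     2.68*

* Differences between $E(\mathit{tr}_{ns,i})$ and $E(\mathit{tr}_{ut,i})$ are not significant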


Conclusions

Measures of indexing quality
• Inter-resource consistency
• Inter-indexer consistency
• The measures do not correlate if recommendations are given
• Only inter-resource consistency can be used for evaluating tag recommenders

Popular Tags
• Do not lead to consistent descriptions across resources
• Are rather counterproductive for indexing resources

User Tags
• Lead to consistent descriptions across resources
• Consolidate the personomy of users


Open Questions

Popular tags may improve understanding of web pages (humor!)

Would this help in some way, e.g., for inconsistent clusterings?


Experimental interface: http://userpages.uni-koblenz.de/~klaasd/experiment/

Data set: http://west.uni-koblenz.de/Research/DataSets/tagging-experiment/


What about something else?

Which music do you prefer? Why do you prefer it?

[Salganik and Watts 2009a, 2009b]


Music Lab App

• App users listen to unknown bands
• Users rate the bands

9 parallel “worlds”:
• 1 world: people do not see the ratings of others
• 8 worlds: people see the ratings from the world they were randomly assigned to
• Initially: no ratings at all

Hypothesis: If people know what they like regardless of others, seeing what others do should not affect their choice



Music Lab App

Findings:
• In the 8 social influence worlds, popular songs were more popular than in the baseline condition (no ratings of others visible); conversely, unpopular songs were less popular!
• The 8 different worlds had different top hits!
• The web page layout (list vs. grid) affected the ratings, too (the list emphasized the dynamics more than the grid)

Implications
• Recommendations increase inequality
• Predictability is reduced
• Unpredictability is inherent to the overall system!
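The cumulative-advantage effect behind these findings can be illustrated with a toy simulation. This is not the Music Lab procedure: the choice rule (a listener picks a song with probability proportional to quality × (1 + social_weight × earlier picks)), the 48 songs, the 2,000 listeners and all parameter values are assumptions for illustration. The independent world (social_weight = 0) stands in for the world without visible ratings, eight seeded social worlds for the influence worlds, and inequality is measured with the Gini coefficient.

```python
import random


def gini(values):
    """Gini coefficient: 0 = perfectly equal, (n-1)/n = everything on one item."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    abs_diffs = sum(abs(x - y) for x in values for y in values)
    return abs_diffs / (2 * n * total)


def simulate_world(quality, n_listeners, social_weight, seed):
    """Toy model: each listener picks one song; earlier picks boost a song's weight."""
    rng = random.Random(seed)
    counts = [0] * len(quality)
    for _ in range(n_listeners):
        weights = [q * (1 + social_weight * c) for q, c in zip(quality, counts)]
        song = rng.choices(range(len(quality)), weights=weights)[0]
        counts[song] += 1
    return counts


quality_rng = random.Random(0)
quality = [quality_rng.uniform(0.5, 1.5) for _ in range(48)]  # hypothetical song qualities

independent = simulate_world(quality, 2000, social_weight=0.0, seed=1)
social = [simulate_world(quality, 2000, social_weight=1.0, seed=s) for s in range(2, 10)]

print(f"independent world: Gini = {gini(independent):.2f}")
for k, counts in enumerate(social, start=1):
    top = max(range(len(counts)), key=counts.__getitem__)
    print(f"social world {k}: Gini = {gini(counts):.2f}, top hit = song {top}")
```

With parameters like these, the social worlds typically show higher Gini values than the independent world, and their top hits can differ from world to world although all worlds share the same song qualities; the exact numbers depend on the seeds and weights.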


Documents
