Lubos palcomaio2010





Retention of visitors coming to Palco 3.0 via the news in SAPO: Decoy optimization

Jindřich Tandler, Luboš Popelínský, Jan Kocourek

Knowledge Discovery Lab, FI, Masaryk University, Brno, Czech Republic

Readers of SAPO are presented with Palco news.

Occasionally, news items appear that redirect the user to PalcoPrincipal.

Problem: people who come to Palco in this way usually don't stay there long enough.

Question: How can we help Palco to retain them?


Main task

Task: reduce the current bounce rate of users who come to Palco through the news

Hypothesis: Shared interests are more common among the users who clicked on a specific link (are viewing the same content) than among users who didn't (are not).

The link(s) clicked are probably the only information we can know about users coming from SAPO

What else could be done?

Improving the loyalty of already registered users by exploiting the social network

Hypothesis: socially related users share interests

e.g. we can identify users of Palco who showed interest in particular items and use their activity to make suggestions to socially related users
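The idea above can be illustrated with a minimal sketch. It assumes friendship ties are available as pairs of user ids and interests as a mapping from user id to a set of items; the function name, the sample ids and the band names are hypothetical and are not taken from the Palco database.

```python
from collections import Counter, defaultdict

def suggest_from_friends(friend_pairs, likes, user, top_n=3):
    """Rank items liked by a user's friends that the user has not liked yet."""
    friends = defaultdict(set)
    for a, b in friend_pairs:
        # friendship is treated as symmetric (an assumption)
        friends[a].add(b)
        friends[b].add(a)
    already = likes.get(user, set())
    votes = Counter()
    for friend in friends[user]:
        for item in likes.get(friend, set()):
            if item not in already:
                votes[item] += 1
    # most-voted items first; ties broken by name for a stable order
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical sample data
friend_pairs = [(1, 2), (1, 3), (2, 3), (3, 4)]
likes = {
    1: {"band_a"},
    2: {"band_a", "band_b"},
    3: {"band_b", "band_c"},
    4: {"band_d"},
}
suggestions = suggest_from_friends(friend_pairs, likes, user=1)
```

Counting votes per item is only one possible ranking; weighting by how recent or how strong a tie is would be a natural variation once the second hypothesis has been checked.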

Current work

Data has been collected into the database:

Current work

Social network structure has been extracted from the database to Pajek format
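As a rough illustration of such an export, the sketch below renders a list of friendship ties as Pajek .net text. It assumes ties are undirected pairs of user ids; the function name and the sample ids are hypothetical, and this is not the script that was actually used.

```python
def to_pajek(edges):
    """Render undirected edges as Pajek .net text (vertices numbered from 1)."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {node: i for i, node in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    # one line per vertex: Pajek number followed by the original id as a label
    lines += [f'{index[node]} "{node}"' for node in nodes]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]}" for a, b in edges]
    return "\n".join(lines) + "\n"

# Hypothetical ties between three user ids
ties = [(101, 205), (101, 317), (205, 317)]
pajek_text = to_pajek(ties)
```

If the direction of an invitation matters, the `*Edges` section could be replaced by `*Arcs`, which Pajek uses for directed links.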

Using R and the igraph library, we can compute the assortative mixing coefficient

This can be used for verification of the second hypothesis (i.e. socially related users share interests)
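To make the quantity concrete, here is a small stand-alone sketch of Newman's assortativity coefficient for a categorical node attribute, written in plain Python rather than R. Using a genre label as the attribute is an assumption made for illustration; in practice igraph's own assortativity functions would be the natural choice.

```python
from collections import defaultdict

def attribute_assortativity(edges, attribute):
    """Newman's assortativity coefficient for a categorical node attribute.

    Returns 1 when ties only join nodes of the same category and negative
    values when ties mostly join different categories. Undefined (division
    by zero) when every node has the same category.
    """
    e = defaultdict(float)   # e[(x, y)]: fraction of edge ends joining x to y
    total = 2 * len(edges)   # each undirected edge is counted in both directions
    for u, v in edges:
        e[(attribute[u], attribute[v])] += 1 / total
        e[(attribute[v], attribute[u])] += 1 / total
    a = defaultdict(float)   # a[x]: fraction of edge ends attached to category x
    for (x, _), frac in e.items():
        a[x] += frac
    observed = sum(frac for (x, y), frac in e.items() if x == y)
    expected = sum(frac * frac for frac in a.values())
    return (observed - expected) / (1 - expected)

# Hypothetical users labelled with a favourite genre
genre = {1: "rock", 2: "rock", 3: "jazz", 4: "jazz"}
r_same = attribute_assortativity([(1, 2), (3, 4)], genre)    # ties within genres
r_cross = attribute_assortativity([(1, 3), (2, 4)], genre)   # ties across genres
```

A value clearly above zero on the real network would support the hypothesis that socially related users share interests.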

R in combination with igraph will also be used to explore the properties of the social network and to better understand its specific features

Current work – network properties

Exploring social network properties to understand its specific features

Small-world phenomenon, degree distribution, degree correlations, community structure, ...
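Two of these properties can be sketched without any library: the degree distribution and the average local clustering coefficient, the latter being one common ingredient of small-world analysis. The edge list below is hypothetical, and the helper names are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def degree_distribution(adj):
    """Map each degree k to the number of nodes with that degree."""
    return Counter(len(neigh) for neigh in adj.values())

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for neigh in adj.values():
        k = len(neigh)
        if k < 2:
            continue
        # number of ties that exist among this node's neighbours
        links = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# Hypothetical friendship ties: a triangle 1-2-3 with node 4 attached to 3
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
degrees = degree_distribution(adj)
clustering = average_clustering(adj)
```

On a network the size of Palco's, the same measures would normally be computed with igraph; the sketch only shows what is being measured.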

A new student has been assigned to help with this task (Pavel Kocourek)

Possible future work


Division of users into groups with similar characteristics can be used for content recommendation

Association rule mining for discovering direct or indirect relationships between web pages in users' browsing behavior
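A minimal sketch of the direct, pairwise case follows: it derives rules of the form "visited A implies visited B" with their support and confidence from a list of browsing sessions. The page names, thresholds and function name are assumptions made for illustration; indirect relationships and longer itemsets would need a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import permutations

def pairwise_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for page pairs."""
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for pages in sessions:
        unique = set(pages)  # a page counts once per session
        page_count.update(unique)
        pair_count.update(permutations(sorted(unique), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n                # share of sessions containing both pages
        confidence = count / page_count[a] # share of sessions with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)

# Hypothetical sessions (one list of visited pages per visitor)
sessions = [
    ["news", "artist", "player"],
    ["news", "artist"],
    ["news", "charts"],
    ["artist", "player"],
]
rules = pairwise_rules(sessions)
```

In this toy data every session that reaches the player page also contains an artist page, so the rule from "player" to "artist" gets the highest confidence.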

Data needed

Actual rate of bouncing, time spent on the pages, etc. (Google Analytics?)

Clickstream data – browsing history of particular users (important for the main task)

Clickstream data assigned to the registered Palco users could be useful for mining as well (privacy issues?)
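Should raw clickstream data become available, the bounce rate and the time spent could be derived directly from it. The sketch below assumes each event is a tuple of session id, timestamp in seconds and page path, and it treats a session with a single page view as a bounce; both the record layout and that definition are assumptions, and the sample events are hypothetical.

```python
from collections import defaultdict

def session_stats(events):
    """Return (bounce_rate, mean_duration_seconds) from clickstream events.

    events: iterable of (session_id, timestamp_seconds, page).
    A bounce is a session with exactly one page view; duration is measured
    only for sessions with at least two page views.
    """
    sessions = defaultdict(list)
    for session_id, timestamp, _page in events:
        sessions[session_id].append(timestamp)
    bounces = sum(1 for stamps in sessions.values() if len(stamps) == 1)
    durations = [max(s) - min(s) for s in sessions.values() if len(s) > 1]
    bounce_rate = bounces / len(sessions)
    mean_duration = sum(durations) / len(durations) if durations else 0.0
    return bounce_rate, mean_duration

# Hypothetical events for four sessions
events = [
    ("s1", 0, "/news/42"),
    ("s2", 5, "/news/42"),
    ("s2", 65, "/artist/x"),
    ("s3", 10, "/news/7"),
    ("s2", 125, "/player"),
    ("s4", 20, "/news/7"),
    ("s4", 50, "/charts"),
]
bounce_rate, mean_duration = session_stats(events)
```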

Some other questions

To what extent is it possible to change the layout and content of Palco pages?

What does it take to change something? (time, people involved, …)

What algorithm is currently being used for content recommendation?

Is it possible to perform A/B testing of the site with the old and a new layout/content?
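If A/B testing turns out to be possible, the bounce rates of the two variants could be compared with a two-proportion z-test, sketched below using only the standard library. The visit and bounce counts are invented for illustration and the function name is hypothetical.

```python
import math

def two_proportion_z_test(bounces_a, visits_a, bounces_b, visits_b):
    """Two-sided z-test for a difference between two bounce rates."""
    p_a = bounces_a / visits_a
    p_b = bounces_b / visits_b
    # pooled bounce rate under the null hypothesis of no difference
    pooled = (bounces_a + bounces_b) / (visits_a + visits_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: old layout (A) versus new layout (B)
z, p_value = two_proportion_z_test(700, 1000, 650, 1000)
```

The normal approximation behind this test is reasonable only when each variant receives a fairly large number of visits.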

Data overview

Data – statistics*

Query               No. of results
users               68314
listeners           50663 (74%)
artists             17660 (26%)
friendship ties     168342 (avg 2.5)
mainstream bands    114062
fans                42228
comments            817370
tags                55018

* February 2010

Data – users

User: CreatedAt, LastLogin, IsActive

User_profile: genre_id, Name, Slug, Category, Type, Culture, ModerationStatus, About, CreatedAt, UpdatedAt

Artist: MusicUploads, GooglePageRank,


Data – users

Friend: InviterUserId, InvitedUserId, StatusId, CreatedAt,

Listener_data: GenderId, BirthDate

Data – contact

Contact: country_id, city_id, Place, Address, PhoneNumber,





Data – activities

Owner_activities: ActionId, OwnerId, TargetId, CreatorId, CreatedAt,

Activity_stream_action: ActionTag (new event added, band music added, ...), PrivacyTagTitle, PrivacySubscriberTagTitle, ActionTitle

Data – music

Playlist: listener_id, photo_id, Name, Description, Hash, Slug, Position, CreatedAt, UpdatedAt

Playlist_music: playlist_id, track_id, Position, CreatedAt

Track: album_id, Name, CreatedAt, UpdatedAt, Complete, Downloadable, Position, IdS3, FilenameS3, LinkS3, UploadedS3, Slug, Processed


Data – tags, comments

Tag: Culture, Name, Hash, Slug, CleanSlug, Length,

Comments: AuthorId, OwnerId, CommentableModel, CommentableId, Aproved, Text, CreatedAt

Data – bands

mainstreamBand: Summary, About, Name, CreatedAt, UpdatedAt, LastFmLink, Hash, Slug

fans: mainstreamBand_id, user_id

Albums: artist_id, photo_id, Name, Description, ReleaseDate, CreatedAt, UpdatedAt, Hash, Slug
