Evaluation in Web Mining. Tutorial at ECML/PKDD 2004, Pisa, Italy, September 20th, 2004. http://www.wiwi.hu-berlin.de/~berendt/evaluation04/ Myra Spiliopoulou, Bettina Berendt, Ernestina Menasalvas


Evaluation in Web Mining

Tutorial at ECML/PKDD 2004, Pisa, Italy, September 20th, 2004

http://www.wiwi.hu-berlin.de/~berendt/evaluation04/

Myra Spiliopoulou

Bettina Berendt

Ernestina Menasalvas


ECML/PKDD 2004 Tutorial "Evaluation in Web Mining"© Myra Spiliopoulou, Bettina Berendt, Ernestina

Menasalvas


The Presenters

Myra Spiliopoulou

Research group KMD: Knowledge Management & Discovery in Information Systems, Otto-von-Guericke-Universitaet Magdeburg

Bettina Berendt

Institute of Information Systems, Humboldt University Berlin, Berlin, Germany

Ernestina Menasalvas

Department of Computer Science, Facultad de Informática, Universidad Politécnica de Madrid, Spain


Agenda

Part I: Foundations and Principles of Web Mining

Part II: Web Mining as a Project

Part III: Evaluation Methods and Measures

Part IV: Case Study

Part V: Infrastructure for Web Mining Deployment

Part VI: Outlook


Part I: Foundations and Principles of Web Mining

A quick tour of Web usage mining

Motivation and Background


What is Web Mining?

• Despite its success, one problem of the current WWW is that much of the knowledge it holds lies dormant in the data.

• Web mining tries to overcome this problem by applying data mining techniques to the content, (hyperlink) structure, and usage of Web resources.

Web Mining Areas

• Web content mining
• Web structure mining
• Web usage mining

Goals include

• the improvement of site design and site structure,
• the generation of dynamic recommendations,
• and the improvement of marketing.


Application problems and goals (1)

Top-level goal 1: The Web exists in order to be used.

Evaluation focusses on usage.

Goals of usage depend on stakeholder and viewpoint.

Note:

There are other top-level goals, e.g. “The Web exists in order to allow new forms of access to knowledge”.

These require a different evaluation focus.

In this tutorial, we focus on the above “top-level goal 1”.


Application problems and goals (2)

Stakeholders

Site users

Site owners / sponsors (technical, marketing, management, ...)

Viewpoints: a Web site / a collection of Web sites or pages as ...

... a piece of software → usability?

... a distribution channel for a business or organization → profitability?; market analysis; recommendations for cross-selling; ...

... a collection of documents → frequency of use / public perception?; competition analysis

... a medium for a given content and tasks (e.g., e-Learning) → cf. distribution channel

... a Web of connections (e.g., a social network) → what properties does the network have?


Evaluation

Evaluation - act of ascertaining or fixing the value or worth of

(Programming) evaluation - The process of examining a system or system component to determine the extent to which specified properties are present.

( http://www.webster-dictionary.org/definition/evaluation )

Refine the definition:

the act of ascertaining the value of an object according to specified criteria, operationalised in terms of measures.


Evaluation and data analysis – Data analysis for evaluation

Object of evaluation: a Web site

Criteria:

– quality as interactive software,
– quality as a retailer's distribution channel,
– quality as a (mass) communication medium,
– ...

The measures include:

– usability metrics,
– business metrics,
– ...

These measures' values are derived from analysing the data.

[Diagram: nodes "Web resource", "data", "mining procedure" and "goal"; edge labels "gives rise to", "measures / describes", "uses" and "contributes to".]


Evaluation and data analysis – Evaluation of data analysis

Objects of evaluation:

– one or more data analysis procedures
  • specific algorithms or comprehensive software/information systems solutions
  • existing / in place or suggested / planned
– patterns (results of a data analysis)

Criteria: quality and performance of a procedure; interestingness of a pattern

Measures include:

– accuracy of a classification algorithm,
– impact of introducing a data mining software solution on tasks, resources, staffing,
– interestingness measures

These measures' values are derived from theoretical analyses and from the data.

[Diagram: nodes "Web resource", "data", "mining procedure", "patterns" and "goal"; edge labels "gives rise to", "measures / describes" (twice), "uses", "outputs" and "contributes to".]


Forms of evaluation and their main foci

Mode | Summative | Formative
Purpose | assess concrete achievements; give results and evidence | understand how something works; analyze strengths and weaknesses towards improvement, give feedback
Conceptualisation | Independent and dependent variables | Holistic interdependent system
Design | Experimental design | Naturalistic inquiry
Relationship to prior knowledge | Confirmatory, hypothesis testing | Exploratory, hypothesis generating (pattern discovery)
Sampling | Random, probabilistic | Purposeful, key informants (in mining: interesting patterns)
Analysis | Descriptive and inferential statistics | Case studies, content and pattern analysis

Partly based on [Patt97]


Part I: Foundations and Principles of Web Mining

A quick tour* of Web usage mining

Motivation and Background

* for a detailed introduction, see [SMB02]


Web Usage Mining: Basics and data sources

Definition of Web usage mining:

discovery of meaningful patterns from data generated by client-server transactions on one or more Web servers

Typical Sources of Data

automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies

e-commerce and product-oriented user events (e.g., shopping cart changes, ad or product click-throughs, etc.)

user profiles and/or user ratings

meta-data, page attributes, page content, site structure. This includes semantics / ontologies of site content and services, cf. [BS00, OBHG03, BSH04].
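To make the first data source concrete, here is a minimal sketch of reading one entry of a server access log. It assumes the widely used Common/Combined Log Format; the tutorial does not prescribe a log format, and the sample line, the field names and the helper `parse_log_line` are illustrative assumptions.

```python
import re
from datetime import datetime

# Assumed layout (Combined Log Format):
# host ident user [time] "method url protocol" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the textual fields that later steps need as typed values.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

# Hypothetical log entry, for illustration only.
sample = ('192.0.2.7 - - [20/Sep/2004:10:15:32 +0200] '
          '"GET /products/item42.html HTTP/1.1" 200 5120 '
          '"http://www.example.org/index.html" "Mozilla/4.0"')
record = parse_log_line(sample)
print(record["host"], record["url"], record["status"], record["referrer"])
```

The referrer and agent fields are optional in the pattern, so plain Common Log Format lines without them are parsed as well.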


The Web Usage Mining Process

[Diagram: Raw Usage Data → Preprocessing → Preprocessed Clickstream Data → Pattern Discovery → Rules, Patterns, and Statistics → Pattern Analysis → "Interesting" Rules, Patterns, and Statistics. Content and Structure Data is a further input.]
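One part of the preprocessing step in this process, turning raw usage data into a clickstream per visit, can be sketched as follows. This is a simplified illustration, not the tutorial's own procedure: the 30-minute idle timeout is a commonly used heuristic assumed here, and the visitor identifiers, timestamps (in seconds) and the function name `sessionize` are hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; assumed heuristic, not taken from the tutorial

def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (visitor_id, timestamp_seconds, url) requests into sessions.

    A new session starts whenever a visitor is idle for longer than `timeout`.
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp, url in requests:
        by_visitor[visitor].append((timestamp, url))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort()  # order each visitor's requests by time
        current = [hits[0][1]]
        last_time = hits[0][0]
        for timestamp, url in hits[1:]:
            if timestamp - last_time > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(url)
            last_time = timestamp
        sessions.append((visitor, current))
    return sessions

# Hypothetical raw usage data.
raw_usage_data = [
    ("visitor_a", 0, "/index.html"),
    ("visitor_a", 60, "/products.html"),
    ("visitor_b", 90, "/index.html"),
    ("visitor_a", 4000, "/index.html"),    # more than 30 minutes later
    ("visitor_a", 4030, "/contact.html"),
]
sessions = sessionize(raw_usage_data)
for visitor, pages in sessions:
    print(visitor, pages)
```

Real preprocessing also involves data cleaning (e.g., removing robot and image requests) and visitor identification, which this sketch leaves out.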


Web Usage and E-Business Analytics

Basic Framework for E-Commerce Data Analysis

[Diagram components: Web/Application Server Logs; Data Cleaning / Sessionization Module; Site Map; Site Dictionary; Site Content; Content Analysis Module; Operational Database (customers, orders, products); Data Integration Module; Integrated Sessionized Data; E-Commerce Data Mart; Data Cube; Data Mining Engine; OLAP Tools; Session Analysis / Static Aggregation; Pattern Analysis; OLAP Analysis.]


Application problems and typical pattern discovery techniques

Application problems:

• Prediction of next event
• Discovery of associated events/application objects
• Discovery of visitor groups with common properties & interests
• Discovery of visitor groups with common behaviour
• Characterization of visitors with respect to a set of predefined classes
• Card fraud detection

Typical pattern discovery techniques:

• Sequence mining
• Markov chains
• Association rules
• Clustering
• Session clustering
• Classification

These are only some examples! For more information, cf. [WebKDD]
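As an illustration of one of the listed techniques, the sketch below computes the three standard statistics of a single association rule (support, confidence, lift) over a set of sessions. The session data and the function name `rule_metrics` are hypothetical, and a real miner (e.g., an Apriori-style algorithm) would search for rules rather than score one given rule.

```python
def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(sessions)
    n_a = sum(1 for s in sessions if antecedent <= set(s))
    n_c = sum(1 for s in sessions if consequent <= set(s))
    n_both = sum(1 for s in sessions if (antecedent | consequent) <= set(s))
    support = n_both / n                          # P(A and C)
    confidence = n_both / n_a if n_a else 0.0     # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift

# Hypothetical sessions: the application objects viewed or bought in each visit.
sessions = [
    {"camera", "memory_card"},
    {"camera", "memory_card", "tripod"},
    {"camera"},
    {"memory_card"},
    {"tripod"},
]
support, confidence, lift = rule_metrics(sessions, {"camera"}, {"memory_card"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the consequent occurs more often in sessions containing the antecedent than in sessions overall.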


End of Part I

Questions thus far?


Further Readings on Part I (1)

[BS00] Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75.

[BSH04] Berendt, B., Stumme, G., & Hotho, A. (in press). Usage mining for and on the Semantic Web. In H. Kargupta, A. Joshi, K. Sivakumar, & Y. Yesha (Eds.), Data Mining: Next Generation Challenges and Future Directions (pp. 467-486). Menlo Park, CA: AAAI/MIT Press.

[OBHG03] Oberle, D., Berendt, B., Hotho, A., & Gonzalez, J. (2003). Conceptual user tracking. In E.M. Ruiz, J. Segovia, & P.S. Szczepaniak (Eds.), Web Intelligence, First International Atlantic Web Intelligence Conference, AWIC 2003, Madrid, Spain, May 5-6, 2003, Proceedings (pp. 155-164). Berlin: Springer, LNCS 2663.


Further Readings on Part I (2)

[SMB02] Spiliopoulou, M., Mobasher, B., & Berendt, B. (2002). Web Usage Mining for E-Business Applications. Tutorial at the 13th European Conference on Machine Learning (ECML'02) / 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02), Helsinki, Finland, 19 August 2002. http://ecmlpkdd.cs.helsinki.fi/pdf/berendt-2.pdf

[WebKDD] WebKDD Workshop series at SIGKDD.

http://www.wiwi.hu-berlin.de/~myra/WEBKDD99

http://robotics.stanford.edu/~ronnyk/WEBKDD2000

http://robotics.stanford.edu/~ronnyk/WEBKDD2001

http://db.cs.ualberta.ca/webkdd02

http://db.cs.ualberta.ca/webkdd03

http://maya.cs.depaul.edu/webkdd04


Agenda

Part I: Foundations and Principles of Web Mining

Part II: Web Mining as a Project

Part III: Evaluation Methods and Measures

Part IV: Case Study

Part V: Infrastructure for Web Mining Deployment

Part VI: Outlook


A Project-Oriented View to Web Mining

Web Mining is a resource-intensive process. Its execution requires:

Objectives

Personnel

Time schedule and milestones

Budget

Reporting

Quality control

as is typical for projects.

Project management has delivered many results we can build upon. Here, we concentrate on:

• Data Mining projects
• IT projects
• CRM projects


Part II: Web Mining as a Project

Project Management Methods

Models for Data Mining Process Management

Cost Estimation

The CRISP-DM Reference Model


Cross-Industry Standard Process for Data Mining: CRISP-DM


CRISP-DM

CRISP-DM has been an international project, funded by the EU, intended to develop a knowledge discovery process that is neutral with respect to / independent of industries, tools and applications.

CRISP-DM became a standard process model for data mining.

CRISP-DM is supported and promoted by

data mining software vendors

practitioners in data mining and in data warehousing

CRISP-DM has a special interest group, in which vendors, consultants and practitioners are involved.


CRISP-DM and Web Mining

CRISP-DM has been an international project, funded by the EU, intended to develop a knowledge discovery process that is neutral with respect to / independent of industries, tools and applications.

Web Mining is for institutions that need to process Web data:

• companies
• authorities
• non-governmental organizations
• teaching and learning institutions
• research institutions

Applications include:

• CRM
• Process optimisation
• Education/Training
• Business intelligence
• Cybercrime prevention


The CRISP-DM Process

The CRISP-DM process is

a never-ending circle of iterations

a non-sequential process, where backtracking to previous phases is usually necessary

Here is a sequential instantiation:

Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment → Business Understanding


Evaluation and Business Understanding

Evaluation requires a well-defined notion of success, which must be in place before

the evaluation takes place

the data mining phase starts

any work with the data starts

i.e. already during the business understanding process.


Business Understanding in the CRISP-DM Process

Tasks of the Business Understanding phase and their outputs:

Determine Business Objectives: Background; Business Objectives; Business Success Criteria

Assess Situation: Inventory of Resources; Requirements, Assumptions & Constraints; Risks & Contingencies; Terminology; Costs & Benefits

Determine Data Mining Goals: Data Mining Goals; Data Mining Success Criteria

Produce Project Plan: Project Plan; Initial Assessment of Tools & Techniques


Our focus within Business Understanding in the CRISP-DM Process

Of the outputs of the Business Understanding phase, we focus on:

• Business Objectives
• Business Success Criteria
• Costs & Benefits
• Data Mining Goals
• Data Mining Success Criteria


Business Understanding, objectives and success (1)

Business objectives:

What is the customer's primary objective? E.g.

– Increase the lifetime value of valuable customers

– Maximize the revenue from online course material

– Help students learn better using online course material

– Optimise the process of information extraction for the department X (human-genome research department, competitive intelligence department, security department)

– Minimise credit card fraud

Note: These are different objectives.

Note: Minimising credit card fraud is not equivalent to minimising the number of credit card frauds!


Business Understanding, objectives and success (2)

Business objectives:

What is the customer's primary objective? E.g.

– Increase the lifetime value of valuable customers

– Help students learn better using online course material

– Minimise credit card fraud

Business success criteria:

What constitutes a successful outcome of the project? E.g.

– Reduction of customer churn

– Students learning online get on average the same grades as those attending classes

– Reduction of the expenses caused by credit card fraud


Business Understanding, objectives and success (3)

Business objectives:

What is the customer's primary objective?

Business success criteria:

What constitutes a successful outcome of the project?

Costs & Benefits:

Perform a cost-benefit analysis:

– Compute the benefits of the project, if it is successful

– Compute the costs of the project (equipment, human resources...)

– Quantify the risk that the project fails

– Juxtapose those values
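One simple way to juxtapose these values is an expected net benefit. The sketch below is only an illustration under stated assumptions: costs are incurred whether or not the project succeeds, benefits materialise only on success, and the risk is expressed as a single failure probability. The function name and all figures are hypothetical.

```python
def expected_net_benefit(benefit_if_success, project_cost, failure_probability):
    """Expected benefit minus cost, assuming costs are incurred either way
    and benefits only materialise if the project succeeds."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    expected_benefit = (1.0 - failure_probability) * benefit_if_success
    return expected_benefit - project_cost

# Hypothetical figures, for illustration only.
net = expected_net_benefit(
    benefit_if_success=200_000,   # benefits of the project, if it is successful
    project_cost=120_000,         # equipment, human resources, ...
    failure_probability=0.25,     # quantified risk that the project fails
)
print(net)
```

A positive value suggests the project is worthwhile under these assumptions; a fuller analysis would also consider partial success and the timing of costs and benefits.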


Business Understanding, objectives and success (4)

Business objectives:

What is the customer's primary objective?

Business success criteria:

What constitutes a successful outcome of the project?

Costs & Benefits:

Perform a cost-benefit analysis

Data mining goals:

Translate the customer's primary objective into a data mining goal, e.g.

– Increase purchases due to cross- and up-sales

– Build a cost-based prediction model for credit card fraud


Business Understanding, objectives and success (5)

Business objectives:

Business success criteria:

Costs & Benefits:

Data mining goals:

Translate the customer's primary objective into a data mining goal, e.g.

– Increase purchases due to cross- and up-sales

– Build a cost-based prediction model for credit card fraud

Data mining success criteria:

Determine success in technical terms, e.g.

– Translate the notion of increase in up-sales to statistics associated with the confidence, support and lift of association rules

– Build a classification cost model, assigning costs/weights to true/false negatives and positives.
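The classification cost model mentioned above can be sketched as follows. This is a minimal illustration, assuming one fixed cost per outcome type; the cost values, the two example models and the function name `total_cost` are hypothetical and not taken from the tutorial.

```python
def total_cost(actual, predicted, costs):
    """Sum the cost of every prediction, given a cost per outcome type."""
    total = 0.0
    for is_fraud, flagged in zip(actual, predicted):
        if is_fraud and flagged:
            outcome = "true_positive"
        elif is_fraud and not flagged:
            outcome = "false_negative"
        elif not is_fraud and flagged:
            outcome = "false_positive"
        else:
            outcome = "true_negative"
        total += costs[outcome]
    return total

# Assumed costs: checking a flagged transaction costs 5, a missed fraud costs 500.
costs = {
    "true_positive": 5.0,
    "false_positive": 5.0,
    "false_negative": 500.0,
    "true_negative": 0.0,
}
actual = [True, False, False, True, False, False]
model_a = [True, False, False, False, False, False]  # 1 error: one missed fraud
model_b = [True, True, True, True, False, False]     # 2 errors: two false alarms

print(total_cost(actual, model_a, costs))
print(total_cost(actual, model_b, costs))
```

In this example the model with more misclassifications has the lower total cost, which is why a cost-based success criterion can rank models differently from plain accuracy.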


Part II: Web Mining as a Project

Project Management Methods

Models for Data Mining Process Management

Cost Estimation

The CRISP-DM Reference Model

• Waterfall model
• Spiral model
• RUP
• XP
• CommonKADS


Project Management

Why? In order to organize the development process and to produce a project plan.

How? Establish how the process is going to be developed:

– Sequential
– Incremental

LIFECYCLE MODELS

• Way of doing things
• Independent of the process being developed

What? Establish how the process is split into phases and define the tasks to be carried out in each step:

– RUP
– XP
– CommonKADS

METHODOLOGY

• Particular tasks
• Detail of tasks to be developed

Data Mining is also a process.


Sequential Lifecycle: Waterfall Model

Traditional model for large-scale software development processes.

Sequential steps in which well-defined tasks produce the final required piece of software.

Each phase is connected with the next one by means of its outputs.

Usually highly structured, with a fixed sequence of activities.

Drawbacks:

Each task has to be completely finished before the next one starts.

Risks are not properly dealt with.

Data Mining is an iterative process.

Maybe not appropriate for Data Mining


Project Management: Waterfall

[Diagram: waterfall phases and the output each phase passes to the next]

Requirements Analysis → Software's Specification

Design → Components' Architecture and Design

Implementation and Unit Test → Implemented Components

System Integration and Testing → Integrated System

Operation and Maintenance

Callouts: Too structured; Feedback; Risks; Data Mining?


Part II: Web Mining as a Project

Project Management Methods

Models for Data Mining Process Management

Cost Estimation

The CRISP-DM Reference Model

• Waterfall model
• Spiral model
• RUP
• XP
• CommonKADS


Progressive Lifecycle: Spiral Model

Iterative, risk-driven process model generator with two distinguishing features:

a cyclic approach for incrementally growing a system's degree of definition and implementation while decreasing its degree of risk;

a set of anchor point milestones for ensuring stakeholder commitment to feasible and mutually satisfactory system solutions.

An improvement over the traditional waterfall model, as it is able to deal with changes without affecting the previous outputs of the process.

Incorporates quality goals and risk management.

Dealing with new requirements is less costly than in the waterfall model.

Data Mining process features

Appropriate for CRISP-DM


Spiral Model

[Diagram: spiral with four quadrants, traversed once per cycle:

– Objective setting: determine objectives, alternatives, constraints
– Risk assessment and reduction: evaluate alternatives, identify and resolve risk
– Development and validation: develop, verify next-level product
– Planning: plan next phase

Labels along the spiral: Risk analysis (in every cycle); Prototype 1, Prototype 2, Prototype 3, Final Prototype; Operations Concepts; Simulations, models, benchmarks; Requirements; Requirement validation; Product design; Design validation & verification; Detailed design; Code; Unit test; Integration test; Acceptance test; Implementation; Requirements plan; Life cycle plan; Development plan; Integration and test plan; Review.]

Life cycle appropriate for Data Mining


Part II: Web Mining as a Project

Project Management Methods

Models for Data Mining Process Management

Cost Estimation

The CRISP-DM Reference Model

• Waterfall model
• Spiral model
• RUP
• XP
• CommonKADS


Project Management Methodologies

Software Methodologies:

RUP

XP

COMMONKADS

Data Mining Methodologies:

CRISP-DM

CRM-Catalyst

Knowledge Intensive

Not Knowledge Intensive

• Not a Real Methodology

• Process Model

Include Cost Estimation

Cost Estimation is needed


RUP®: Rational Unified Process

Complete software-development process framework that comes with several out-of-the-box instances.

Architecture-centred model that makes it possible, in an iterative and incremental way, to develop a software product of any scale or size.

The outputs of each iteration can be components, modules, or any software part that will be integrated in the next iteration, so that the final product is complete at the end.

Appropriate for Web Mining projects in which:

– Requirements change as a consequence of already obtained patterns

– Outputs (patterns) of each step are integrated into the global solution


RUP®: Rational Unified Process

The phases of a RUP-based project are:

Inception

Elaboration

Construction

Transition.

Each phase contains one or more iterations.

In each iteration, effort is spent in varying amounts on each of several disciplines (or workflows) such as Requirements, Analysis and Design, Testing, and so forth.

The key driver for RUP is risk mitigation.

Can be integrated with CRISP-DM


RUP flow through a typical iteration


RUP in the enterprise


Part II: Web Mining as a Project

Project Management Methods

Models for Data Mining Process Management

Cost Estimation

The CRISP-DM Reference Model

• Waterfall model
• Spiral model
• RUP
• XP
• CommonKADS

Methods for Project Management: XP

XP is a lightweight code-centric process for small projects.

It was developed by Kent Beck and came to the software industry’s attention through the C3 payroll project at Chrysler Corporation around 1997.

Like the RUP, it is based upon iterations that embody several practices such as Small Releases, Simple Design, Testing, and Continuous Integration.

Because software is produced quickly, users come into contact with the product early. Consequently, changes are possible with a low degree of risk to the final product.

For a small project team working in a relatively high-trust environment, where the user is an integral part of the team, XP can work very well.

Small Data Mining Projects

Expert Development Team

A typical XP lifecycle

XP vs RUP for Web Mining Projects

Different philosophies:

RUP is a framework of process components, methods, and techniques that you can apply to any specific software project; we expect the user to specialize RUP.

XP, on the other hand, is a more constrained process that needs additions to make it fit a complete development project.

These differences explain how the two communities perceive the processes:

the big system people see RUP as the answer to their problems

the small system community sees XP as the solution to their problems.

Both philosophies can be applied to Web mining projects. The choice depends on:

The size of the project

The expertise of the development team

CommonKADS for knowledge management projects

CommonKADS:

is a methodology for the design and implementation of knowledge management projects.

is designed as a process, consisting of tasks that must be executed and milestones, where decisions must be taken.

encompasses many aspects of project management, including the specification of objectives and milestones, involvement of key personnel, feasibility testing and budgeting

Web mining:

is intended to discover knowledge from web-related data

builds upon background knowledge, owned by key personnel

should be designed as a project with goals, milestones and budget

CommonKADS: The role of knowledge systems

The typical role of a Knowledge System should be that of an intelligent assistant.

Automation is not an appropriate objective.

The tasks under observation are usually too complex for modeling, let alone automation.

Process improvement is a more appropriate objective.

CommonKADS: The knowledge modeling process

Step 1: Scoping and Feasibility Study

Tool: Organization model of CommonKADS (OM)

Step 2: Impact and Improvement Study

Tools: Task Model (TM) and Agent Model (AM), intended to zoom in on and refine the OM

Each study consists of:

– an analysis part

– a "constructive", decision-making part

CommonKADS: Analysis and Synthesis in Step 1

Scoping and Feasibility Study:

Step 1a: Analysis

Identify problem/opportunity areas and potential solutions

Put them into a wider organizational perspective

Step 1b: Synthesis

Decide about economic, technical and project feasibility

Select the most promising focus area and target solution

CommonKADS: Analysis and Synthesis in Step 2

Impact and Improvement Study:

Step 2a: Analysis

Study interrelationships between the task, the agents involved, and the use of knowledge for successful performance

Identify improvements that may be achieved

Step 2b: Synthesis

Decide about organizational measures and task changes

Ensure organizational acceptance and integration of a knowledge system solution

CommonKADS: Overall process of business analysis

[Figure: flowchart of the context analysis, with the following nodes]

START

OM-1: problems, solutions, context

OM-2: description of organization focus area

REFINE: OM-3: process breakdown; OM-4: knowledge assets

INTEGRATE: OM-5: judge feasibility (Decision Document)

[If NOT feasible] STOP

[If feasible] TM-1: task analysis

REFINE: TM-2: knowledge item analysis; AM-1: agent model

INTEGRATE: OTA-1: assets, impacts and changes (Decision Document). Integrate, comparing both the old and the new situations

STOP: Context Analysis Ready

CommonKADS: Worksheet OM-1

Organization Model

Problems and Opportunities Worksheet OM-1

Problems and opportunities: Make a shortlist of perceived problems and opportunities, based on interviews, brainstorm and visioning meetings, discussions with managers, etc.

Organizational context: Indicate in a concise manner key features of the wider organizational context, so as to put the listed opportunities and problems into proper perspective. Important features to consider are:

1. Mission, vision, goals of the organization

2. Important external factors the organization has to deal with

3. Strategy of the organization

4. Its value chain and the major value drivers

Solutions: List possible solutions for the perceived problems and opportunities, as suggested by the interviews and discussions held, and the above features of the organizational context

Organization Model

[Figure: coverage of the Organization Model worksheets]

OM-1: Problems & Opportunities; General Context (Mission, Strategy, Environment, CSFs, ...); Potential Solutions

OM-2: Organization Focus Area Description: Structure, Process, People, Culture & Power, Resources, Knowledge

OM-3: Process Breakdown

OM-4: Knowledge Assets

CommonKADS: OM-5 for decision taking (1)

Organization model

Checklist for Feasibility Decision Document: Worksheet OM-5

Business feasibility

For a given problem/opportunity area and a suggested solution, the following questions have to be answered:

1. What are the expected benefits for the organization from the considered solution? Both tangible economic and intangible business benefits should be identified here.

2. How large is this expected added value?

3. What are the expected costs for the considered solution?

4. How does this compare to possible alternative solutions?

5. Are organizational changes required?

6. To what extent are economic and business risks and uncertainties involved regarding the considered solution direction?

CommonKADS: OM-5 for decision taking (2)

Organization model

Checklist for Feasibility Decision Document: Worksheet OM-5

Technical feasibility

For a given problem/opportunity area and a suggested solution, the following questions have to be answered:

1. How complex, in terms of knowledge stored and reasoning processes to be carried out, is the task to be performed by the considered knowledge-system solution? Are state-of-the-art methods and techniques available and adequate?

2. Are there critical aspects involved, relating to time, quality, needed resources, or otherwise? If so, how to go about them?

3. Is it clear what the success measures are and how to test for validity, quality, and satisfactory performance?

4. How complex is the required interaction with end users (user interfaces)? Are state-of-the-art methods and techniques available and adequate?

5. How complex is the interaction with other information systems and possible other resources (interoperability, systems integration)? Are state-of-the-art methods and techniques available and adequate?

6. Are there further technical risks and uncertainties?

CommonKADS: OTA-1 for decision taking (1)

Org. Task, Agent Models

Worksheet OTA-1: Checklist for Impact and Improvement Decision Document

Impacts and Changes in Organization

Describe which impacts and changes the considered knowledge system solution brings with respect to the organization, by comparing the differences between the organization model (worksheet OM-2) in the current situation, and how it will look in the future. This has to be done for all (variant) components in a global fashion (specific aspects for individual tasks or staff members are dealt with below).

1. Structure

2. Process

3. Resources

4. People

5. Knowledge

6. Culture & power

CommonKADS: OTA-1 for decision taking (3)

Org. Task, Agent Models

Worksheet OTA-1: Checklist for Impact and Improvement Decision Document

Attitudes and Commitments

Consider how the individual actors and stakeholders involved will react to the suggested changes, and whether there will be a sufficient basis to successfully carry through these changes

Proposed actions

This is the part of the impacts and improvements decision document that is directly subject to managerial commitment and decision-making. It weighs and integrates the previous analysis results into recommended concrete steps for action:

1. Improvements: What are the recommended changes, with respect to the organization, as well as individual tasks, staff members, and systems?

2. Accompanying measures: What supporting measures are to be taken to facilitate these changes (e.g., training, facilities)?

3. What further project action is recommended with respect to the undertaken knowledge system solution?

4. Expected results, costs, benefits: reconsider items from the earlier feasibility decision document

5. If circumstances inside or outside the organization change, under what conditions is it wise to reconsider the proposed decisions?

Approaches

Vendor independent:

CRISP-DM

Based on the commercial tools:

CATs

SEMMA

CRM Methodology:

CRM Catalyst

Model Process

Not Real Methodology

Based on CRISP-DM

Global CRM process

Does not concentrate on Data Mining step

Web Mining as a project: CATs

CATs: Clementine Application Templates [CATs]

Specific libraries of best practices that provide immediate value right out of the box

They follow the CRISP-DM standard: every CAT stream is assigned to a CRISP-DM phase

They provide long term value as they can always be used with a new data set for new insight in other projects.

What is a CAT? [CATs]

Examples of CATs [CATs]

The best practice templates, available as an add-on module to Clementine, include:

Telco CAT - improve retention and cross-selling efforts by leveraging our knowledge of the best data mining practices for telecommunications

CRM CAT - understand and predict customer migration between segments, so you understand how to move customers into more profitable segments and reduce the risk of attrition

Microarray CAT - accelerate biological discoveries, find genes for therapeutic targets, classify diseases based on genes and predict outcomes and find or refine biological classes

Fraud CAT - predict and detect instances of fraud in financial transactions, claims, tax returns …

CATs for Web Mining [CATs]

Web CAT - helps to discover clickstream sequences, access and merge Web log data, and make recommendations based on visit profiles.

Includes modules for:

Cleaning and sessionizing Web logs

Enriching and Combining Web logs

Creating visit records and modeling visits

Creating visitor records and modeling visitors

Discovering product associations

Performing sequence analysis

Augmenting Web logs
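
The first module in the list, cleaning and sessionizing Web logs, can be illustrated with a minimal sketch. It is not the Web CAT implementation: the record layout (visitor id, timestamp in seconds, URL), the list of filtered file suffixes and the 30-minute inactivity timeout are all assumptions introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical settings, not taken from the Web CAT.
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def clean(hits):
    """Drop requests for embedded resources such as images and stylesheets."""
    return [h for h in hits if not h[2].lower().endswith(IGNORED_SUFFIXES)]


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor, timestamp, url) hits into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for visitor, ts, url in hits:
        by_visitor[visitor].append((ts, url))

    sessions = []
    for visitor, requests in by_visitor.items():
        requests.sort()
        current = [requests[0][1]]
        for (prev_ts, _), (ts, url) in zip(requests, requests[1:]):
            if ts - prev_ts > timeout:
                # The gap is too long: close the session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(url)
        sessions.append((visitor, current))
    return sessions


# Invented example log.
log = [
    ("10.0.0.1", 0, "/index.html"),
    ("10.0.0.1", 5, "/logo.gif"),
    ("10.0.0.1", 60, "/products.html"),
    ("10.0.0.1", 4000, "/index.html"),
    ("10.0.0.2", 10, "/index.html"),
]
sessions = sessionize(clean(log))
```

In this sketch the first visitor ends up with two sessions, because more than 30 minutes pass between the second and third page request; the resulting visit records are the input that the later modules (visit modeling, sequence analysis) would work on.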

Web Mining as a project: SEMMA (1)

SEMMA (Sample, Explore, Modify, Model, Assess): [SEMMA]

Is not a data mining methodology

Rather a logical organization of the functional tool set of SAS Enterprise Miner for carrying out the core tasks of data mining.

Enterprise Miner can be used as part of any iterative data mining methodology adopted by the client.

Naturally, steps such as formulating a well-defined business or research problem and assembling quality, representative data sources are critical to the overall success of any data mining project.

Web Mining as a project: SEMMA (2)

SEMMA is focused on the model development aspects of data mining: [SEMMA]

Sample the data by extracting a portion of a large data set big enough to contain significant information, yet small enough to manipulate quickly.

Explore the data by searching for unanticipated trends and anomalies in order to gain understanding and ideas.

Modify the data by creating, selecting, and transforming the variables to focus the model selection problem.

Model the data by allowing the software to search automatically for a combination of data that reliably predicts a desired outcome. Modelling techniques include neural networks, tree-based classifiers, statistical models, etc.

Assess the data by evaluating the usefulness and reliability of the findings from the data mining process and estimating how well it performs.
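
The five SEMMA steps can be walked through on a toy data set. The following is only a sketch in plain Python, not SAS Enterprise Miner: the visit records, the 40/60 split, the rescaling and the one-variable threshold model are all assumptions chosen to keep the example small.

```python
import random
import statistics

# Hypothetical visit records (pages_viewed, bought), invented for this sketch.
visits = [(pages, int(pages >= 5)) for pages in range(1, 11)] * 10

# Sample: take a portion of the data; the rest is held back for assessment.
rng = random.Random(0)
shuffled = visits[:]
rng.shuffle(shuffled)
sample, holdout = shuffled[:40], shuffled[40:]

# Explore: a simple summary to get a feel for the data.
mean_pages = statistics.mean(pages for pages, _ in sample)


# Modify: transform the input variable (here: rescale page counts to (0, 1]).
def modify(rows):
    return [(pages / 10, bought) for pages, bought in rows]


# Model: search for the cut-off that best separates buyers from non-buyers.
def fit_threshold(rows):
    candidates = sorted({x for x, _ in rows})
    return max(candidates,
               key=lambda t: sum((x >= t) == bool(y) for x, y in rows))


# Assess: estimate how well the model performs on data it has not seen.
def accuracy(rows, threshold):
    hits = sum((x >= threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


threshold = fit_threshold(modify(sample))
holdout_accuracy = accuracy(modify(holdout), threshold)
```

Assessing on the held-back records rather than on the sample itself is a design choice of this sketch; it keeps the performance estimate from being inflated by the data the threshold was fitted on.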

Web Mining as a project: SEMMA (3)

Iterative process:

By assessing the results gained from each stage of the SEMMA process, you can determine how to model new questions raised by the previous results, and thus proceed back to the exploration phase for additional refinement of the data.

Deployment:

– Once the champion model is developed, it needs to be deployed.

– It is the final phase, in which the ROI from the mining process is realized.

Progressive Lifecycle model

Evaluation?

Methods for Project Management: CRM Catalyst (1)

Developed jointly by CustomISe, MACS and SalesPathways. Together they have formed the Catalyst Foundation http://www.crmmethodology.com/

Motivations:

CRM projects are difficult to execute successfully because of the wide range of factors influencing their success. So it can take a long time to make CRM work properly for an organisation.

Solution: CRM Catalyst.

The methodology acts as a catalyst for CRM projects, enabling them to achieve their objectives more reliably and in less time.

It provides a project life cycle with a set of defined phases, broken down into steps with clearly stated inputs and outputs.

Data Mining Project Management: CRM Catalyst (2)

The five major phases are:

Discovery. Establishing the business goals for CRM.

Orientation. Defining the system and organisational changes (specific technical solutions) necessary to meet the goals. This leads to a definition of top-level system requirements.

Navigation. The CRM system requirements are defined more precisely, the system is scoped, system and vendor assessment criteria are defined, and a system is selected and contracted.

Implementation. Planning and managing the CRM project. It is during this phase that the system is built and put into use.

Post-implementation. Monitoring performance and continuous improvement. A CRM project never ends, because CRM must constantly evolve to keep pace with the changing business and its environment.

Methods for Project Management: CRM Catalyst (3)

Implementation requires

Data Mining development process

Implementation is Knowledge intensive

The results are obtained in a progressive way

Progressive Lifecycle Model

In some steps a Knowledge Intensive Methodology could be appropriate

Estimation Process

Determine Objectives. Who needs what data for what purpose(s)

Gather Data. Focus Should Be Given To ‘Hard’ Data

Well-Defined Requirements

Available Resources

Analyze Data using a variety of methods

Re-estimate Costs throughout the project

Effective Monitoring

Refine and Make Changes As Necessary

Compare end Costs with Estimated Costs.

How It’s Done: Models, Methods, Tools

Cost Estimation is independent of the domain

Techniques depend on the process whose cost is being estimated
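
The last step of the estimation process, comparing end costs with estimated costs, can be made concrete with a small sketch. The magnitude of relative error used here is a common accuracy measure in software estimation, but the slides do not prescribe it; the function name and the figures are hypothetical.

```python
def relative_error(actual_cost, estimated_cost):
    """Magnitude of relative error: |actual - estimate| / actual."""
    return abs(actual_cost - estimated_cost) / actual_cost


# Invented figures: a project estimated at 80 person-days that took 100.
error = relative_error(actual_cost=100, estimated_cost=80)
```

Recording this value for every finished project is one simple way to build up the historical data that later estimates can draw on.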

What is needed for cost estimation?

A Combination of Models, Methods, and Tools

Gathering/Improving of Historical Data

Well-defined and Well-controlled Software Development Processes

Better Managing of Requirements

Experienced Project Managers, Estimators, and Team Members

All of this carries over to Data Mining cost estimation

Software cost estimation methods [JPGL]

Algorithmic methods. Mathematical equations are used to perform the software estimation. Models: COCOMO & COCOMO II, Putnam, ESTIMACS and SPQR/20.

Estimating by analogy. The proposed project is compared to previously completed similar projects for which the development information is known.

Expert judgment method. Technique: the Delphi technique, a group consensus technique. Empirical and subjective.

Top-down method. A cost estimate is derived from the global properties of the software project, and then the project is partitioned into various low-level components.

Bottom-up method. The cost of each software component is estimated, and the results are then combined to arrive at an estimated cost for the overall project.
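
Two of the methods above are simple enough to sketch in a few lines. The component names, the effort figures and the linear scaling used for the analogy estimate are assumptions made for this illustration; real analogy-based estimation would adjust for more than size.

```python
# Hypothetical component estimates in person-months (invented for illustration).
components = {"data preparation": 3.0, "modelling": 2.0, "deployment": 1.5}


def bottom_up(estimates):
    """Bottom-up: estimate each component, then add the results."""
    return sum(estimates.values())


def by_analogy(past_effort, past_size, new_size):
    """Analogy: scale the known effort of a similar project by relative size.

    Linear scaling is an assumption of this sketch.
    """
    return past_effort * new_size / past_size


total = bottom_up(components)
analogy = by_analogy(past_effort=10.0, past_size=40, new_size=50)
```

Computing both figures for the same project and comparing them is a cheap consistency check: a large gap suggests that a component was forgotten or that the reference project is not really comparable.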

How to select the estimation method

No one method is necessarily better or worse than the other

Use several techniques or cost models, compare the results.

Document the assumptions made when preparing the project plan.

Monitor the project to detect when assumptions that turn out to be wrong jeopardize the accuracy of the estimate.

Maintain a historical database.

Can be translated to the Data Mining estimation process

Problem: Historical Project database

COCOMO Model

COCOMO (COnstructive COst MOdel):

Model to estimate the development cost and schedule of a software project

Introduced by Barry Boehm of USC-CSE in 1981.

Primarily based on the software development practices prior to the 1980s (i.e., based on the Waterfall model)

The effort equation is the basis of the COCOMO II model.

The nominal effort for a project of a given size is given by the equation:

\[ PM_{nominal} = A \times (Size)^{B} \]

$PM_{nominal}$ is the nominal effort in person-months

$A$ is the multiplicative effect of cost drivers

$B$ is the constant representing the effect of scale factors
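
A short sketch shows how the nominal effort equation behaves. The default values of A and B below are illustrative only, and measuring Size in thousands of source lines of code (KSLOC) is an assumption; neither is stated on the slide.

```python
def nominal_effort(size_ksloc, a=2.94, b=1.10):
    """Nominal effort in person-months: PM = A * Size**B.

    a and b are illustrative defaults, not calibrated values
    taken from the tutorial.
    """
    return a * size_ksloc ** b


small = nominal_effort(10)   # a 10 KSLOC project
large = nominal_effort(20)   # a project twice that size
```

With an exponent B greater than 1, doubling the size more than doubles the effort, which is how the scale factors express diseconomies of scale.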

COCOMO II

COCOMO II:

Improvement of the original COCOMO model.

The Scale Drivers in COCOMO II replace the development modes of COCOMO 81

Cost drivers revised

Cost Drivers.

– Are used in the model to adjust the nominal effort in the software project.

– Multiplicative factors required to determine the effort required to complete the software project.

– Ratings range from Very Low (VL) through Low (L), Nominal (N), High (H) and Very High (VH) to Extra High (EH).

– Model has 17 cost drivers divided into 4 categories: Product, Computer, Personnel and Project.
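
The way cost drivers adjust the nominal effort can be sketched as a product of multipliers, one per driver, selected by its rating. The two drivers and all multiplier values below are hypothetical; the actual COCOMO II tables are published in [Betal.00] and differ per driver.

```python
import math

# Hypothetical multipliers per rating for two example drivers.
MULTIPLIERS = {
    "product complexity": {"L": 0.9, "N": 1.0, "H": 1.2},
    "analyst capability": {"L": 1.2, "N": 1.0, "H": 0.8},
}


def adjusted_effort(nominal_pm, ratings):
    """Multiply the nominal effort by the multiplier of every rated driver."""
    factors = [MULTIPLIERS[driver][rating] for driver, rating in ratings.items()]
    return nominal_pm * math.prod(factors)


# A complex product built by a weak analyst team (invented ratings).
effort = adjusted_effort(
    10.0, {"product complexity": "H", "analyst capability": "L"}
)
```

A driver rated Nominal contributes a factor of 1.0 and leaves the estimate unchanged, so only the drivers that deviate from the norm need to be argued about.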

SW Cost Estimation Conclusion

Although lots of research has been done in the area of SW Cost Estimation, it’s not an exact science yet (and probably never will be).

What about Data Mining Cost estimation ?

No tool

No technique

Not enough historical cases

Data Mining Estimation Methods

Kleinberg (1999) [KT99]

Macroeconomic viewpoint

– Game theory

– Combinatorial optimization problems

Masand (1996) [MP96]

Business Model: customer value

Domingos (1998) [D98]

Decision process to follow with a project in “Machine Learning” applications

\[ NPV = C_0 + \sum_{t \geq 1} \frac{C_t}{(1 + r)^t} \]
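
The net present value formula on this slide, NPV = C_0 + sum over t of C_t / (1 + r)^t, can be evaluated with a few lines of code. In the usual reading, C_0 is the initial cash flow (typically a negative investment), C_t the cash flow in period t and r the discount rate per period; the finite list of cash flows and the numbers below are assumptions of this sketch.

```python
def npv(c0, cash_flows, r):
    """Net present value: c0 + sum of C_t / (1 + r)**t for t = 1, 2, ..."""
    return c0 + sum(c / (1 + r) ** t for t, c in enumerate(cash_flows, start=1))


# Invented example: invest 100 now, receive 50 in each of three periods,
# discounted at 10% per period.
value = npv(-100.0, [50.0, 50.0, 50.0], 0.10)
```

Under this criterion a project is worth pursuing when the result is positive, that is, when the discounted future returns outweigh the initial investment.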

Data Mining Estimation Model

Establishing a parametric estimation model for Data Mining (Marban’03)

DMCOMO (Data Mining COst MOdel)

Data Mining Cost Estimation

Main factors in a Data Mining project

Data Sources (number, kind, nature, …)

Data mining problem to be solved (descriptive, predictive, …)

Development platform

Available tools

Expertise of the development team

Drivers: Data Drivers, Model Drivers, Platform Drivers, Tools and Techniques Drivers, Project Drivers, People Drivers

End of Part II

Questions thus far?

Further Readings on Part II (1)

[B00] Kent Beck, Extreme Programming Explained, Addison-Wesley, 2000

[BF01] Kent Beck, Martin Fowler, Planning Extreme Programming, Addison-Wesley, 2001

[Betal.00] Barry W. Boehm et al, Software Cost Estimation with COCOMO II, Prentice Hall PTR, 2000

[D98] P. Domingos. How to Get a Free Lunch: A Simple Cost Model for Machine Learning Applications. Proceedings of the AAAI-98/ICML-98 Workshop on the Methodology of Applying Machine Learning (pp. 1-7), 1998. Madison, WI: AAAI Press.

[JAH01] Ron Jeffries, Ann Anderson, Chet Hendrickson, Extreme Programming Installed, Addison-Wesley, 2001

[JPGL] Jerrall J. Prakash, Harprit S. Grewal, Leo Chen. Software Cost Estimation

[K00] Philippe Kruchten, The Rational Unified Process: An Introduction, Second Edition, Addison-Wesley, 2000

Further readings on Part II (2)

[KT99] J. Kleinberg, E. Tardos. Approximation Algorithms for Classification Problems with Pairwise Relationships: Metric Labeling and Markov Random Fields. Proc. 40th IEEE Symposium on Foundations of Computer Science (1999), 14-23

[MP96] Masand, B., and Piatetsky-Shapiro, G. A comparison of approaches for maximizing business payoff of prediction models. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 195-201. Portland, OR: AAAI Press. New York, NY: Wiley. 1996.

[RM01] Robert C. Martin, James W. Newkirk, Extreme Programming in Practice, Addison-Wesley, 2001

[Setal.02] Schreiber et al., Knowledge Engineering and Management – The CommonKADS Methodology, MIT Press, 2002

[SM01] Giancarlo Succi, Michele Marchesi, Extreme Programming Examined, Addison-Wesley, 2001

[VK94] Vigder, M.R. and Kark, A.W. Software Cost Estimation and Control. (1994). National Research Council Canada.


Further readings on Part II (3)

CATs of Clementine:

Clementine Application Templates: Get reliable results from Clementine —faster — with built-in best practices

http://www.spss.com/PDFs/CLMCATINS-0802.pdf.

CRISP-DM:

http://www.crisp-dm.org

CRM Catalyst

CRM Catalyst Methodology Description

http://www.crmmethodology.com/

SEMMA of SAS

http://www.sas.com/technologies/analytics/datamining/miner/semma.html


Further readings on Part II (4)

Data Mining standards

[AG03] ECML/PKDD-2003 Tutorial Knowledge Discovery Standards by Sarab Anand, Marko Grobelnik, Dietrich Wettschereck

COCOMO

Software Cost Estimation, Hong, Danfeng, The University of Calgary, Canada, 1998. http://pages.cpsc.ucalgary.ca/~hongd/SENG/621/report2.html

USC-CSE, COCOMO, 2002. http://sunset.usc.edu/research/COCOMOII/

COCOMO II Model Definition Manual. Abts, Chris, Brad Clark, Sunita Chulani, Ellis Horowitz, Ray Madachy, Don Reifer, Rick Selby, Bert Steece. http://my.raex.com/FC/B1/phess/coco/Modelman.pdf

Software Cost Estimation in 2002. Jones, Capers (2002). STSC CrossTalk. http://www.stsc.hill.af.mil/crosstalk/2002/06/jones.html


Agenda

Part I: Foundations and Principles of Web Mining

Part II: Web Mining as a Project

Part III: Evaluation Methods and Measures

Part IV: Case Study

Part V: Infrastructure for Web Mining Deployment

Part VI: Outlook


Evaluation and data mining algorithms

Data mining algorithms are intended to extract non-trivial, actionable patterns from the data.

Evaluation is an essential part of algorithm design:

How good is the algorithm in finding non-trivial patterns?

How good is the algorithm in finding actionable patterns?

How good is the algorithm for the application?

– Is it fast enough?

– Is it scalable enough? With respect to data size? With respect to feature space size?

– Is it robust enough?

Is the algorithm better than other algorithms?

Goal of the evaluation is to help the knowledge discoverer in finding a good (preferably the best) algorithm.


Evaluation and implicit assumptions

Knowledge discovery is a GIGO process:

The quality of the data has a drastic influence on the quality of the patterns.

Robustness towards poor-quality data is an indicator of a good algorithm. However, no algorithm can completely compensate for poor data quality.

Important:

A mining algorithm can extract patterns from any data.

implying that:

We should not attempt knowledge discovery upon inappropriate data.


Evaluation in a non-Webmining example (1)

Problem specification:

An online shop wants to identify the characteristic properties of those transactions that involve a stolen credit card.

subject to the following facts:

If the shop permits a transaction with a stolen credit card, the shop will lose money: the value of the transaction, V1.

If the shop prohibits a transaction with a valid credit card, the shop may lose the customer: an approximation of the revenue by the customer, V2.

The process of checking if a credit card is stolen costs money: V3.


Evaluation in a non-Webmining example (2)

Assuming that this problem is modelled as a classification problem:

A good pattern is a classifier that minimises the loss of money.

A good algorithm is a classification algorithm that

– produces good classifiers,

– satisfies further application-specific criteria, e.g. understandability of the results or robustness against arbitrary skew.

How to build a bad dataset:

– Oversample the positives without reporting the oversampling

– Aggregate the transactions at day level

Payoff matrix (rows: reality; columns: classifier):

Reality \ Classifier | fraudulent | not fraudulent

fraudulent | true pos: -V3 | false neg: -V1

not fraudulent | false pos: -V2-V3 | true neg: V1
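
The payoff matrix on this slide can be turned into a single score for comparing classifiers. The sketch below is only an illustration: it assumes one constant value each for V1, V2 and V3 (in practice they vary per transaction and customer), and the function name and all numbers are hypothetical.

```python
def total_payoff(tp, fn, fp, tn, v1, v2, v3):
    """Total payoff of a fraud classifier under the slide's payoff matrix.

    tp, fn, fp, tn: counts of true positives, false negatives,
    false positives and true negatives ("positive" = fraudulent).
    v1: value of a transaction, v2: approximate revenue by the customer,
    v3: cost of checking a card. One constant per V is a simplification.
    """
    return tp * (-v3) + fn * (-v1) + fp * (-v2 - v3) + tn * v1


# Hypothetical counts and values, for illustration only.
counts_a = dict(tp=8, fn=2, fp=30, tn=960)  # cautious classifier
counts_b = dict(tp=4, fn=6, fp=5, tn=985)   # permissive classifier
values = dict(v1=50.0, v2=200.0, v3=1.0)

payoff_a = total_payoff(**counts_a, **values)
payoff_b = total_payoff(**counts_b, **values)
print(payoff_a, payoff_b)
```

With these made-up numbers the permissive classifier scores higher although it misses more fraud, because each false positive costs V2 + V3. This is why the slide defines a good pattern by the loss of money rather than by accuracy.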


Evaluation in knowledge discovery

Goal of knowledge discovery is the extraction of non-trivial, actionable patterns from the data.

Evaluation plays a central role in knowledge discovery:

Evaluation towards non-triviality: Do the patterns tell us something new? Something we did not already know?

Evaluation towards actionability: Do the patterns tell us something for which we can and want to design an action?

presupposing quality in statistical terms: How confident can we be about each pattern?

Goal of the evaluation is to supply the knowledge discoverer with good results.


So, what do we evaluate?

In Web Mining, we evaluate:

The patterns (results) acquired with a web mining algorithm to solve a particular problem

The web mining algorithm in its capability of delivering good patterns to solve the problem

The data sources that deliver the data, upon which the web mining algorithm is applied to solve the problem:

– The Web site and its server

– The data warehouse of the Web site owner

– The external sources used for data enrichment

The environment, including tools and processes, in which the particular problem has emerged


Part III: Evaluation methods and measures

Application-centric measures associated with the application objectives

User-centric measures associated with Web usability


The object of evaluation: usability

The effectiveness, efficiency, and satisfaction with which specified users achieve specified goals in particular environments.

• Effectiveness: The accuracy and completeness with which specified users can achieve specified goals in particular environments.

• Efficiency: The resources expended in relation to the accuracy and completeness of goals achieved.

• Satisfaction: The comfort and acceptability of the work system to its users and other people affected by its use.

(ISO 9241, after Alan Dix, Janet Finlay, Gregory Abowd, Russell Beale. Human Computer Interaction. Prentice Hall Europe 1998. Cited after http://www.tau-web.de/hci/space/i7.html )


Usability on the Web

Usability is a special concern on the Web because

“In product design and software design, customers pay first and experience usability later.

On the Web, users experience usability first and pay later.”

[Niel00, pp. 10f.]


The criteria: design objectives

Learnability

The ease with which new users can begin effective interaction and achieve maximal performance.

Flexibility

The multiplicity of ways the user and system exchange information.

Robustness

The level of support provided to the user in determining successful achievement and assessment of goals.

(Dix et al., 1998, cited after http://www.tau-web.de/hci/space/x12.html )


How can usability be measured? – Data collection methods

Data for usability testing are collected with different methods [Shne98, Jane99]:

Reactive methods

– Expert reviews and surveys ask for attitudes / assessments.

– Usability testing employs experimental methods to investigate behavior and self-reports.

Non-reactive methods

– Based on data collection via Web log files

• To assess user behavior

• To simulate expected / measured user behavior [CPCP01]

Continuing assessments to parallel changes!

Issues: cost, practicality, expressiveness of results


The measures: Examples of usability metrics (from ISO 9241, after [Dix et al., 1998])

Usability Objective | Effectiveness Measures | Efficiency Measures | Satisfaction Measures

Suitability for the task | Percentage of goals achieved | Time to complete a task | Rating scale for satisfaction

Appropriate for trained users | Number of "power features" used | Relative efficiency compared with an expert user | Rating scale for satisfaction with "power features"

Learnability | Percentage of functions learned | Time to learn criterion | Rating scale for "ease of learning"

Error tolerance | Percentage of errors corrected successfully | Time spent on correcting errors | Rating scale for error handling


The measures: Examples of usability metrics (from ISO 9241, after [Dix et al., 1998])

The same table of metrics, annotated with what must be known in order to measure each usability objective:

Suitability for the task: the users' task / intentions. Assumptions can be made if there is background knowledge about site and users.

Appropriate for trained users: the users' level of expertise. This requires (1) target-group specific logins, (2) induction from requested content, or (3) other methods, usually involving reactive data collection.

Learnability: definitions of what there is to learn, and measures of what the users learned. This usually requires methods involving reactive data collection.

Error tolerance: a definition of what an error is, or what indicates an error. This usually requires a detailed knowledge of users' tasks and intentions, i.e. reactive data collection.


Design decisions that influence usability

Page design concerns issues like:

– Screen real estate, links, graphics + animation, cross-platform design; content design (writing for hypermedia)

Site design / Information architecture concerns issues like:

– Hierarchical / network-like content organization, metaphors

– Navigation

• Where am I? Where have I been? Where can I go?

• Navigation is user-controlled!

– Search engines

For “Top Ten Mistakes in Web Usability” and their development, see [Nielsen 1996, 1999, 2002, 2003]

Further issues: International audiences [KB04], personalized sites


Principles of successful navigation

Navigation that works should [Flem98, pp. 13f.]

Be easily learned

Remain consistent

Provide feedback

Appear in context

Offer alternatives

Require an economy of action and time

Provide clear visual messages

Use clear and understandable labels

Be appropriate to the site’s purpose

Support users’ goals and behaviors


Site-specific usability issues: Example I

Criterion: Navigation should “require an economy of action and time.”

Pages that are frequently accessed together should be reachable with one or very few clicks.

[KNY00] compared the following measures:

page co-occurrence in user paths / support of 2-page itemsets (x axis)

hyperlink distance (y axis; -1 = distance > 5)

Results help to identify

linkage candidates (top right)

redundant links (bottom left)

Action: modify site design
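
The comparison of co-occurrence and hyperlink distance can be sketched as a simple rule over page pairs. This is not the method of [KNY00] itself, only an illustration of the idea: the function name, the two thresholds and the sample pairs are assumptions, and a distance of -1 is read as "more than 5 clicks", as on the slide.

```python
def classify_page_pairs(pairs, min_support, max_close_distance):
    """Label page pairs as linkage candidates or redundant-link candidates.

    pairs: iterable of (page_a, page_b, support, distance), where support is
    the support of the 2-page itemset and distance is the hyperlink distance
    (-1 meaning "more than 5 clicks").
    min_support and max_close_distance are hypothetical cut-offs.
    """
    linkage, redundant = [], []
    for a, b, support, distance in pairs:
        far = distance == -1 or distance > max_close_distance
        if support >= min_support and far:
            # Often visited together, but many clicks apart.
            linkage.append((a, b))
        elif support < min_support and not far:
            # Close in the link structure, but rarely visited together.
            redundant.append((a, b))
    return linkage, redundant


# Hypothetical page pairs, for illustration only.
pairs = [
    ("home", "faq", 0.30, 4),
    ("home", "jobs", 0.01, 1),
    ("news", "shop", 0.25, -1),
    ("home", "news", 0.40, 1),
]
linkage, redundant = classify_page_pairs(pairs, min_support=0.2, max_close_distance=2)
print(linkage, redundant)
```

Pairs that are both frequent and close (such as the last one) need no action, which is why they end up in neither list.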


Site-specific usability issues: Example II

Criterion: Navigation should “support users’ goals and behaviors.”

Search criteria that are popular should be easy to find and use.

[BS00,Ber02] investigated search behavior in an online catalog with support of pages / of sequences as measures:

Search using selection interfaces (clickable map, drop-down menu) was most popular.

Search by location was most popular.

The most efficient search by location (type in city name) was not used much.

Action: modify page design.


An important part of usability: accessibility

"The usability of a product, service, environment or facility by people with the widest range of capabilities" (ISO TS 16071, cited after http://www.usability-forum.com/bereiche/accessibility.shtml )

Web Content Accessibility Guidelines 1.0 – W3C Recommendation 5-May-1999

EU Commission adopted the Communication 'eEurope 2002: Accessibility of Public Web Sites and their Content' (Sept. 2001): http://europa.eu.int/information_society/topics/citizens/accessibility/web/wai_2002/text_en.htm

USA: "Section 508" (1998)

Japan: WCAG 1.0


Web Content Accessibility Guidelines 1.0 http://www.w3.org/TR/1999/WAI-WEBCONTENT

„These guidelines explain how to make Web content accessible to people with disabilities. ...

The primary goal of these guidelines is to promote accessibility. However, following them will also

make Web content more available to all users,

whatever user agent they are using (e.g., desktop browser, voice browser, mobile phone, automobile-based personal computer, etc.)

or constraints they may be operating under (e.g., noisy surroundings, under- or over-illuminated rooms, in a hands-free environment, etc.).

Following these guidelines will also help people find information on the Web more quickly.

These guidelines do not discourage content developers from using images, video, etc., but rather explain how to make multimedia content more accessible to a wide audience.“


Web Content Accessibility Guidelines 1.0 http://www.w3.org/TR/1999/WAI-WEBCONTENT

1. Provide equivalent alternatives to auditory and visual content.

2. Don't rely on color alone.

3. Use markup and style sheets and do so properly.

4. Clarify natural language usage.

5. Create tables that transform gracefully.

6. Ensure that pages featuring new technologies transform gracefully.

7. Ensure user control of time-sensitive content changes.

8. Ensure direct accessibility of embedded user interfaces.

9. Design for device-independence.

10. Use interim solutions.

11. Use W3C technologies and guidelines.

12. Provide context and orientation information.

13. Provide clear navigation mechanisms.

14. Ensure that documents are clear and simple.


Mining for usability assessment: Caveats for interpretation

Care should be taken when interpreting Web log data as indicative of users’ experience with the site:

+ Users act in a natural environment, and in a natural way.

– little or no control of variables that may influence behavior:

User intentions and intervening factors (work environment, …)

Context (e.g., online + offline competition, market developments)

Often, several characteristics of the site are changed simultaneously, e.g., product offerings and page design.

Causality is hard to assess!

Use mining as an exploratory method, to be complemented by other methods that allow for more control.


Part III: Evaluation methods and measures

Application-centric measures associated with the application objectives

User-centric measures associated with Web usability


The notion of a "good" Web site

The objective of a Web site is NOT

the maximisation of the number of visitors accessing it

the prolongation of the visitors' stay time

the inspection of a maximum number of pages/items/products

the satisfaction of the visitors (... but visitor satisfaction is often a prerequisite for site success)

In general, the (abstract) objective of a Web site is

the contribution to the business objectives of its owner

with respect to the target groups accessing it

in a cost-effective way.

The "success" of a Web site is a measure of the degree to which the site satisfies its objective.


What does Success mean?

Before talking of success:

Why does the site exist?

Why should someone visit it?

Why should someone return to it?

After answering these questions:

Does the site satisfy its owner?

Does the site satisfy its users?

ALL the users?

Related notions: business goals, value creation*, sustainable value, user-centric measures, application-centric measures, user types

* Value creation: [Kuhl96]


Business goals of a site (I)

1. Sale of products/services on-line

Amazon sells books (etc.) online.

The site should help the users find the most suitable books for their needs (personalisation), identify more related products of interest (cross/up-selling) and, finally, purchase them in a secure and intuitive way (site design).

2. Marketing for products/services to be acquired off-line

Insurances, banks, application service providers etc.: providers of services based on a long-term relationship with the customer do not sell on-line to unknown users.

The site should demonstrate to the users the quality of the product/service and the trustworthiness of its owner and initiate an off-line contact.


Business goals of a site (II)

3. Reduction of internal costs

Some banks offer online banking. Some insurances support case registration online. This reduces the need for human preprocessing and the likelihood of typing errors.

The site should help the users locate and fill in the right forms and submit them in a secure and intuitive way.

4. Information dissemination

Google, IMDB etc. offer information by means of a search engine over a voluminous archive of high-quality data.

The site should help the users find what they search for, assure them of the quality (precision and completeness) of the information provided, and also motivate them to access the products/services of the sponsors.


From business goals to application-centric measures

Business venues evaluate their achievements on the basis of industry- and application-specific measures.

Some of these measures have been adapted for Web applications:

Marketing, sales & after-sales support

– e-Marketing measures for online sales of products/services

– e-Marketing measures for commodities that are sold offline

Operations and process optimisation

Security

whereby some e-measures are adjusted to other business goals.


A process overview for sales of products/services

The interaction of the potential customer with the company goes through three phases:

Information Acquisition → Negotiation & Transaction → After Sales Support

The ratio of persons going from one phase to the next is the basis for a set of positive and negative measures:

Positive: Contact, Conversion, Retention

Negative: Abandonment, Attrition, Churn


The "Customer Life Cycle Funnel" of Cutler & Sterne [CS00]

Cutler & Sterne introduce the notion of "customer life cycle funnel" across the phases of acquisition, persuasion and conversion [CS00]:

Each phase encompasses different measures.

Ineffective measures of one phase lead to bottlenecks, which limit the effectiveness in subsequent phases.

In particular:

1. Ineffective measures in the acquisition phase: Untargeted promotions that attract the wrong people.

2. Ineffective measures in the persuasion phase: The targeting during acquisition is good but the persuasion is ineffective.

3. Ineffective measures in the conversion phase: Good targeting and good persuasion but poor conversion.

© http://www.targeting.com/emetrics.pdf


From the Funnel to the Hourglass [Ste03]

The complete interaction path between person and company encompasses Reach, Acquisition, Conversion and Retention [CS00].

After the conversion phase, the retention of loyal customers pays off, because loyal customers can become promoters of the company.

How to measure loyalty?

Recency of purchases

Frequency of purchases

Recommendation propensity

Product improvement synergy

Hourglass stages: Acquisition → Persuasion → Conversion → Interaction → Participation → Promotion


Ten Supplementary Analyses: The proposal of BlueMartini

• Foundation and Data Audit: bot analysis, univariate data analysis

• Operational: session timeout analysis, form error analysis, micro-conversions analysis

• Tactical: search analysis, real-estate usage analysis, market basket analysis

• Strategic: analysis of migratory customers, geographical analysis


What does Success mean?

Before talking of success:

Why does the site exist?

Why should someone visit it?

Why should someone return to it?

After answering these questions:

Does the site satisfy its owner?

Does the site satisfy its users?

ALL the users?

Related notions: business goals, value creation, sustainable value, user-centric measures, application-centric measures, user types


From the visitor to the loyal customer: The model of Berthon et al [BPW96]

Early realisation of the marketing measures for Web sites [BPW96]:

Conversion efficiency := Customers / Active investigators

Retention efficiency := Loyal Customers / Customers

whereby:

– Active investigators are visitors that stay long in the site.

– Customers are visitors that buy something.

– Loyal customers are customers that come to buy again.

Figure: site users, divided into short-time visitors, active investigators, customers and loyal customers.
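
The two efficiencies defined on this slide are plain ratios of group sizes. A minimal sketch, assuming the three groups have already been counted from the site's data; the function name and the counts are hypothetical.

```python
def berthon_efficiencies(active_investigators, customers, loyal_customers):
    """Conversion and retention efficiency as defined on the slide.

    conversion efficiency := customers / active investigators
    retention efficiency  := loyal customers / customers
    """
    if active_investigators <= 0 or customers <= 0:
        raise ValueError("both denominators must be positive")
    return {
        "conversion_efficiency": customers / active_investigators,
        "retention_efficiency": loyal_customers / customers,
    }


# Hypothetical counts, for illustration only.
efficiencies = berthon_efficiencies(
    active_investigators=400, customers=40, loyal_customers=10
)
print(efficiencies)
```

How long a visitor must stay to count as an active investigator is left open by the slide, so that cut-off has to be fixed before the counts can be produced.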


From the visitor to the loyal customer: The micro-conversion rates of Lee et al [LPSH01]

The model of Lee et al [LPSH01] distinguishes among four steps until the purchase of a product:

Product impression

Click through

Basket placement

Product purchase

and introduces micro-conversion rates for them:

look-to-click rate: click throughs / product impressions

click-to-basket rate: basket placements / click throughs

basket-to-buy rate: product purchases / basket placements

look-to-buy rate: product purchases / product impressions

A session is a set of click operations performed during one visit. Clicks leading to product impressions and those corresponding to basket placements and purchases are uniquely identified as such.
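
The four micro-conversion rates can be computed once every click has been labelled with its step. The sketch below assumes such labels already exist; the label strings, the function name and the event counts are hypothetical and not taken from [LPSH01].

```python
from collections import Counter

# Hypothetical labels for the four steps named on the slide.
STEPS = ("impression", "click_through", "basket", "purchase")


def micro_conversion_rates(events):
    """Compute the four micro-conversion rates from labelled click events.

    events: iterable of step labels; labels outside STEPS are ignored.
    A rate is None when its denominator is zero.
    """
    counts = Counter(step for step in events if step in STEPS)

    def rate(numerator, denominator):
        if counts[denominator] == 0:
            return None
        return counts[numerator] / counts[denominator]

    return {
        "look_to_click": rate("click_through", "impression"),
        "click_to_basket": rate("basket", "click_through"),
        "basket_to_buy": rate("purchase", "basket"),
        "look_to_buy": rate("purchase", "impression"),
    }


# Hypothetical event log, for illustration only.
events = (
    ["impression"] * 200
    + ["click_through"] * 50
    + ["basket"] * 10
    + ["purchase"] * 5
    + ["other"] * 30
)
rates = micro_conversion_rates(events)
print(rates)
```

By construction the look-to-buy rate equals the product of the other three rates, so a low overall rate can be traced to the step where most visitors drop out.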


The role of the site in application-centric measures

Dreze & Zufryden [DZ97] define site efficiency in terms of:

– Number of page requests

– Duration of site visits (sessions)

Sullivan [Sul97] defines site quality in terms of:

– Response time

– Number of supported navigation modes

– Discoverability of a page: discovering that a certain page exists

– Accessibility of a page: finding the page, after discovering that it exists

– Pages per visitor

– Visitors per page


Site-oriented measures and business goals

Site-oriented measures are

statistics on the traffic of the Web site

values based on the characteristics of the site from a designer's perspective

trying to capture the user perception of the site, without asking the user.

They do not consider the owner's intentions, i.e. the business goals of the site.

Combination of purely customer-oriented and site-oriented measures

The e-metrics model of Cutler & Sterne [CS00]

The e-metrics model of Cutler & Sterne [CS00] is designed to

compute values for customer-oriented measures, by

allowing for an application-dependent definition of concepts

– customer

– conversion

– loyalty

– customer lifetime value

and by

associating these concepts with site-oriented measures

upon regions of the site

with some emphasis on online merchandising.

The e-metrics model of Cutler & Sterne [CS00] (contd.)

The e-metrics model of [CS00] encompasses:

Site-centric measures for regions of a site, including:

– Stickiness := Total time spent in the region / Number of visitors in the region
– Focus := Avg. number of visited pages in the region / Number of pages in the region
– Slipperiness := the opposite of Stickiness (low stickiness)

"Desirable value ranges" for each measure, depending on the purpose/objective of the region:

– A region used during information acquisition should be sticky.
– The pages accessed during the negotiation and transaction phase should be slippery.

How is a region defined?
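The two ratio definitions on this slide (Stickiness as total time over visitors, Focus as average visited pages over pages in the region) can be sketched as follows. This is an illustration only: it assumes a region is simply a known set of pages and that the per-visitor figures have already been aggregated; the function names and the numbers are hypothetical.

```python
def stickiness(total_time_in_region, visitors_in_region):
    """Total time spent in the region divided by the number of its visitors."""
    return total_time_in_region / visitors_in_region


def focus(pages_visited_per_visitor, pages_in_region):
    """Average number of region pages a visitor saw, divided by the region size."""
    avg_visited = sum(pages_visited_per_visitor) / len(pages_visited_per_visitor)
    return avg_visited / pages_in_region


# Hypothetical region with 20 pages and 4 visitors who spent 1200 seconds in total
region_stickiness = stickiness(total_time_in_region=1200, visitors_in_region=4)
region_focus = focus(pages_visited_per_visitor=[2, 4, 6, 8], pages_in_region=20)
```

Whether a given value is good depends on the purpose of the region: the same stickiness can be desirable for an information region and undesirable for a checkout region.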

Conversion metrics inside the site [SP01]

The model of [SP01] analyses conversion at page/concept level:

– An active investigator is a user who invokes an action page.
– A customer is an active investigator who invokes a target page.

whereby:

Target page := any page corresponding to the fulfilment of the site's objectives
– purchase of a product
– registration to a service

Action page := any page that must be visited before invoking a target page
– product impression
– catalog search

so that rates like customer conversion and click-to-buy can be computed at the level of individual target/action pages or page concepts.
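The definitions of active investigator and customer translate into a simple classification of sessions. The sketch below assumes each session is a list of page names and that the sets of action and target pages are given; the page names, the label "visitor" for everyone else, and the final ratio are illustrative assumptions rather than the exact measures of [SP01].

```python
def classify_session(pages, action_pages, target_pages):
    """Label a session following the slide's definitions.

    A session without any action page is a plain "visitor", even if it
    somehow contains a target page (target pages are assumed to be
    reachable only via action pages).
    """
    visited = set(pages)
    if not (visited & action_pages):
        return "visitor"
    if visited & target_pages:
        return "customer"
    return "active investigator"


# Hypothetical page names
action_pages = {"search", "product"}
target_pages = {"order"}
sessions = [
    ["home"],
    ["home", "search"],
    ["home", "search", "product", "order"],
    ["home", "product"],
]
labels = [classify_session(s, action_pages, target_pages) for s in sessions]

# Share of active investigators (customers included) who became customers
investigators = [label for label in labels if label != "visitor"]
customer_share = investigators.count("customer") / len(investigators)
```

Restricting `action_pages` and `target_pages` to a single page or page concept yields the same ratio at that finer level.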

Conversion metrics for multi-channel customer conversion [TB03,TBG03]

For a multi-channel retailer, customer conversion is composed of
– conversion associated with online purchases
– conversion effected through the acquisition of information about non-online purchase opportunities, e.g. locations of brick-and-mortar stores

The information, pages, services associated with online vs offline purchase are mapped into page concepts.

1. A session is modelled as a vector in the feature space of the page concepts.

2. The concept value in the vector space can be
– dichotomised: 0/1
– weighted: number of visits

3. Conversion rate is defined across paths from a concept A to a concept B.
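One plausible reading of a conversion rate "across paths from a concept A to a concept B" is: among the sessions that reach A, the share that reach B afterwards. The sketch below implements that reading for sessions given as ordered lists of concept names. The interpretation, the function name and the concept names are assumptions made for illustration, not the definition used in [TB03,TBG03].

```python
def conversion_rate(sessions, concept_a, concept_b):
    """Share of sessions containing concept_a that later also contain concept_b."""
    with_a = 0
    converted = 0
    for session in sessions:
        if concept_a in session:
            with_a += 1
            first_a = session.index(concept_a)
            # Only occurrences of concept_b after the first concept_a count
            if concept_b in session[first_a + 1:]:
                converted += 1
    return converted / with_a if with_a else None


# Hypothetical sessions as ordered concept sequences
sessions = [
    ["home", "catalog", "purchase"],
    ["home", "catalog", "catalog", "store_locator"],
    ["home", "store_locator"],
    ["catalog", "purchase"],
]
catalog_to_purchase = conversion_rate(sessions, "catalog", "purchase")
```

Here three sessions reach the catalog and two of them continue to a purchase, so the rate is 2/3.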

More in Part IV...

What does Success mean?

Before talking of success:

Why does the site exist?

Why should someone visit it?

Why should someone return to it?

After answering these questions:

Does the site satisfy its owner?

Does the site satisfy its users?

ALL the users?

Business goals

Value creation

Sustainable value

User-centric measures

Application-centric measures

User types

User Segmentation

Truisms:

A site owner does not welcome all users equally.

A site cannot satisfy all users accessing it.

Hence, sites

are designed for some types of users

serve different user types to different degrees

User types are the result of:

User segmentation according to criteria of the site owner

User segmentation on the basis of personal characteristics

User segmentation with respect to recorded behaviour

User Segmentation in Predefined Business Segments

A company may partition its customers on the basis of
– the revenue it obtains or expects from them
– the (cost of) services it must offer them to obtain the revenue

There are different segmentation schemes, based on
– the characteristics of the customers
– the company portfolio
and producing a set of predefined classes.

For a Web application this means:

For each visitor: (1) assign her to the right segment asap (2) motivate her to move to a segment of higher revenue

1. Classification
2. Association rules for cross selling & up selling
3. Recommendations & Personalisation

User Segmentation in Unknown Segments

Web site visitors can be grouped on the basis of their interests, characteristics and navigational behaviour without assuming predefined groups.

There is much research on user grouping based on
– the properties and contents of the objects being visited
– the declared or otherwise known characteristics of the visitor
– (the order of the requests)

For a Web application this means:

For each visitor: (1) assign her to the right segment asap (2) make suggestions based on the contents of the segment

1. Clustering
2. Recommendations & Personalisation

User Segmentation on navigational behaviour

Web site visitors exhibit different types of navigational behaviour.

Model I (simplistic): Some users navigate across links. Others prefer a search engine.

Model II [FGL+00]: based on criteria like active time spent on-line and per page, pages and domains accessed etc.
Segments: Simplifiers, Surfers, Bargainers, Connectors, Routiners, Sportsters

Model III [Moe] for merchandising sites: based on criteria like purchase intention, time spent on the site, number of searches initiated, types of pages visited etc.
Segments: Direct buying, Knowledge building, Search/Deliberation, Hedonic browsing

Placing Success of a site into the business context

Before talking of success:

Why does the site exist?

Why should someone visit it?

Why should someone return to it?

After answering these questions:

Does the site satisfy its owner?

Does the site satisfy its users?

ALL the users?

Business goals

Value creation

Sustainable value

User-centric measures

Application-centric measures

User types

Associating Success to CRM-goals

Customer wishes, expectations and characteristics play a central role for all activities of an institution.

Customer Relationships (and their Management) take a prominent position in the specification of the institution's strategy.

The customer perspective must be incorporated into the institution's strategy and the actions implementing it.

The customer perspective must be incorporated into any evaluation measures associated with the success of the institution's strategy.

Web-sites and applications, being important channels of interaction with the customers, should be evaluated on their success.

Strategy

Evaluation of Actions

Actions

Evaluation of Web-Applications

Evaluation instruments for strategic management: The Balanced Scorecard [KN92]

The BSC is a well-established instrument for strategic management. It takes a holistic view of the organisation, considering four perspectives around its Vision & Strategy:

– Financial: "To succeed financially, how should we appear to our shareholders?"
– Customer: "To achieve our vision, how should we appear to our customers?"
– Internal business processes: "To satisfy our shareholders and customers, what business processes must we excel at?"
– Learning & Growth: "To achieve our vision, how will we sustain our ability to change and improve?"

For each perspective, the scorecard lists Objectives, Measures, Targets and Initiatives.

Balanced Scorecard meets Web site evaluation?

Incorporating application-centric measures into evaluation instruments for strategic management:

Option 1: Incorporation of the application-centric measures into higher level measures of the company's BSC, e.g. customer conversion across all channels for customer interaction, encompassing
– Customer conversion rate of the site
– Customer conversion rate due to TV ads
– Customer conversion rate of a brick-and-mortar store

Option 2: Integration of all application-centric measures into a BSC for the evaluation of the site

The Web Scorecard of SAS for Customer Relationship Management [HSB02]: Three perspectives

The Web Scorecard takes a holistic view of the company's Web presence, considering three perspectives:

System perspective: It focusses on the objectives emanating from the need for 24/7 availability.

Perspective of the offered products/services: It focusses on the optimal exploitation of the products and services that the company offers via the site.

Customer perspective: It focusses on customer segmentation and customer satisfaction for the target segments.

The Web Scorecard of SAS for Customer Relationship Management [HSB02]: Example instantiation (1)

1. System perspective

| Objectives | Measures | Indicators |
|---|---|---|
| Optimal resource usage | Reduce/increase number of servers | Server load |
| Performance towards customer | Minimise number of objects per page | Avg. response time, access errors |

2. Product/service perspective

| Objectives | Measures | Indicators |
|---|---|---|
| Maximise awareness | Design marketing actions & questionnaires | Page hits, number of sessions |
| Maximise stickiness | Page redesign, personalisation | Avg. session duration |

The Web Scorecard of SAS for Customer Relationship Management [HSB02]: Example instantiation (2)

3. Customer perspective

| Objectives | Measures | Indicators |
|---|---|---|
| Increase customer satisfaction | Personalisation | Usability rate / Navigation, ... |
| Increase conversion rate | Marketing campaigns, clickstream analysis of entry/exit paths | Conversion rate, response rate |
| Increase revenue | Personalised special offerings, cross-selling | Avg. number of purchases in the last 3 months |
| Increase effectiveness of marketing measures | Evaluation of campaign management | Response rate per mailing |

The Web Scorecard of SAS for Customer Relationship Management [HSB02]: Example implementation

Presentation metaphor: Radar chart

Each axis represents one indicator.

The outer horizon shows the target values of the indicators.

The inner horizon shows the current values of the indicators.

The difference between inner and outer horizon summarises the position of the institution with respect to its target.

The difference between inner and outer horizon for each indicator allows for a prioritisation of the measures to be improved/consolidated.
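The per-indicator gap between current and target value can be turned into a priority list. A minimal sketch, assuming that every indicator is "higher is better", that targets are non-zero, and that the gap is measured relative to the target; the function name, the indicator keys and all numbers are hypothetical.

```python
def indicator_gaps(current, target):
    """Return (indicator, relative gap to target) pairs, largest gap first."""
    gaps = {}
    for name, target_value in target.items():
        # Relative shortfall: 0.0 means the target is met, 0.25 means 25% short
        gaps[name] = (target_value - current[name]) / target_value
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Hypothetical current (inner horizon) and target (outer horizon) values
current = {"retention_rate": 0.30, "avg_basket_sum": 80.0, "avg_purchase_sum": 45.0}
target = {"retention_rate": 0.40, "avg_basket_sum": 100.0, "avg_purchase_sum": 50.0}
ranked = indicator_gaps(current, target)
```

In this example the retention rate is furthest from its target (25% short), so it would be the first candidate for improvement. Normalising by the target is one choice among several; it makes indicators with different units comparable on the same chart.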

[Radar chart; axes shown include: Avg. sum in market basket, Avg. sum of purchases, Retention rate]

Revisited: Balanced Scorecard meets Web site evaluation?

Option 1: Incorporation of the application-centric measures into higher level measures of the company's BSC

+ Evaluation of the Web site with a familiar instrument
+ Contribution of site success to the business targets
- The view over the site, as interaction channel and marketing instrument, is fragmented across multiple measures and indicators.

Option 2: Integration of all application-centric measures into a BSC for the evaluation of the site

+ Evaluation of the Web site with a familiar instrument
+ Identification of multiple facets of web success
- Detachment of site success from overall evaluation
- Additional BSC instantiation needed

End of Part III

Questions thus far ?

Further Readings on Part III (1)

[BPW96] P. Berthon, L.F. Pitt and R.T. Watson. The World Wide Web as an advertising medium. Journal of Advertising Research, 36(1), pp. 43-54, 1996.

[Ber02] Berendt, B. (2002). Using site semantics to analyze, visualize, and support navigation. Data Mining and Knowledge Discovery, 6, 37-59.

[BS00] Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75.

[CPCP01] Chi, E.H., Pirolli, P., Pitkow, J.E. (2000). The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site. In Proceedings CHI 2000 (pp. 161-168).

[CS00] M. Cutler and J. Sterne. E-metrics — Business metrics for the new economy. Technical report, NetGenesis Corp., http://www.netgen.com/emetrics (access date: July 22, 2001)

[DFAB98] Alan Dix, Janet Finlay, Gregory Abowd, Russell Beale. Human Computer Interaction. Prentice Hall Europe 1998. Cited after http://www.tau-web.de/hci/space/i7.html and http://www.tau-web.de/hci/space/x12.html.

[DZ97] X. Dreze and F. Zufryden. Testing web site design and promotional content. Journal of Advertising Research,37(2), pp. 77-91, 1997.

Further Readings on Part III (2)

[FGL+00] J. Forsyth and T. McGuire and J. Lavoie. All visitors are not created equal. McKinsey marketing practice. McKinsey & Company. Whitepaper. 2000.

[Flem98] Fleming, J. (1998). Web Navigation. Designing the User Experience. Sebastopol, CA: O'Reilly.

[HSB02] K.-P. Huber, F. Säuberlich, C. Böhm. Kennzahlenbasiertes Web Controlling mit einer Web Scorecard. In "Handbuch Web Mining im Marketing" (eds. H. Hippner, M. Merzenich, K. Wilde). Vieweg. 2002 (in German)

[KNY00] Kato, H., Nakayama, T., & Yamane, Y. (2000). Navigation analysis tool based on the correlation between contents distribution and access patterns. In Working Notes of the Workshop "Web Mining for E-Commerce - Challenges and Opportunities." at SIGKDD-2000. Boston, MA (pp. 95-104).

[KN92] R.S. Kaplan, D.P. Norton. The Balanced Scorecard: Translating Strategy to Action. Boston MA. 1992

[KP03] Kohavi, R. and Parekh, R. Ten Supplementary Analyses to Improve E-Commerce Web Sites. In Proceedings of the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. at SIGKDD-2003. August 27th, 2003, Washington DC, USA (pp. 29-36).

Further Readings on Part III (3)

[KB04] A. Kralisch and B. Berendt. Cultural determinants of search behaviour on websites. In V. Evers, E. del Galdo, D. Cyr & C. Bonanni (Eds.), Designing for Global Markets 6. Proceedings of the IWIPS 2004 Conference on Culture, Trust, and Design Innovation. Vancouver, Canada, 8 - 10 July, 2004. Vancouver, BC: Product & Systems Internationalisation, Inc., pp. 61-74, 2004.

[Kuhl96] R. Kuhlen. Informationsmarkt: Chancen und Risiken der Kommerzialisierung von Wissen. 2nd edition, 1996 (in German)

[LPS+00] Juhnyoung Lee, M. Podlaseck, E. Schonberg, R. Hoch and S. Gomory. Analysis and visualization of metrics for online merchandizing. In "Advances in Web Usage Mining and User Profiling: Proc. of the WEBKDD'99 Workshop", LNAI 1836, Springer Verlag, pp. 123-138, 2000.

[Moe] W. Moe. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. In Journal of Consumer Psychology.

[Niel00] Nielsen, J. (2000). Designing Web Usability: The Practice of Simplicity. New Riders Publishing.

Further Readings on Part III (4)

[SF99] M. Spiliopoulou, L.C. Faulstich. WUM: A Tool for Web Utilization Analysis. In: Extended version of Proc. EDBT Workshop WebDB’98, LNCS 1590. Springer Verlag, Berlin Heidelberg New York, pp 184–203, 1999.

[Shne98] Shneiderman, B. (1998). Designing the User Interface: Strategies for Effective Human-Computer Interaction. 3rd edition. Reading, MA: Addison-Wesley.

[Ste03] Sterne, J. WebKDD in the Business World. Invited talk in the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. at SIGKDD-2003. August 27th, 2003, Washington DC, USA.

[Spi99] M. Spiliopoulou. The laborious way from data mining to Web mining. Int. Journal of Comp. Sys., Sci. & Eng., Special Issue on ”Semantics of the Web”, 14, pp. 113–126, 1999.

[SP01] M. Spiliopoulou, C. Pohle. Data mining for measuring and improving the success of Web sites. In Journal of Data Mining and Knowledge Discovery, Special Issue on E-commerce, 5, pp. 85–114. Kluwer Academic Publishers. 2001

[Sul97] T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. Proc. of the Web Conference'97, 1997.

Agenda

Part I: Foundations and Principles of Web Mining

Part II: Web Mining as a Project

Part III: Evaluation Methods and Measures

Part IV: Case Study

Part V: Infrastructure for Web Mining Deployment

Part VI: Outlook

Case study "Multi-channel e-tailer" [TB03,TBG03]

Objectives of the application

General objectives: "Standard e-tailer goals" – attract users/shoppers and convert them into customers

Specific objectives: assess the success of the Web site – in relation to other distribution channels

Questions of the evaluation:

• What business metrics can be calculated from Web usage data, transaction and demographic data for determining online success?

• Are there cross-channel effects between a company's e-shop and its physical stores?

Background: Internet market shares [BCG 2002]

[Bar chart, 0% to 100%, for 1999, 2000, 2001 and 2002 (proj.); series: Pure Internet companies, Multi-channel businesses; data labels: 52, 54, 67, 69 and 48, 46, 33, 31]

Description of the site and its services

The retailer operates an e-shop and more than 5000 retail shops in over 10 European countries

It sells a wide range of consumer electronics

Online customers can pay, pick-up/deliver and return both online and offline

Web pages provide for all tasks in the customer buying process:

[Figure: page types along the buying process. Labels: Home (Acquisition), Product Impression, Product Click-Through, Offline info, Transaction, Purchase]

Outline of the KDD process

Business understanding: see previous 2 slides

Data:
Web server sessions, transaction info.

Data understanding – main step:
modelling the semantics of the site in terms of a hierarchy of service concepts

Data preparation:
Session IDs; usual data cleaning steps
Linking of sessions & transaction information (anonymized)

Modelling / pattern discovery:
Web metrics, cluster analysis, association rules, sequence mining + correlation analysis, questionnaire study, qualitative market analysis

Evaluation: Interesting patterns

Data and data preparation

Data sources and sample:

92,467 sessions from the company’s Web logs from 21 days in 2002

anonymized transaction information of 13,653 customers who bought online over a period of 8 months in 2001/02.

621 transaction records (21 days) were linked to Web-usage records

Data preparation:

Sessions were determined by session IDs

Robot visits eliminated, usual data cleaning steps

Each URL request mapped to a service concept from {c1,...,cn}

Session representation: s = [w1, ..., wn], with wi = weight of ci, indicating whether or not the concept was visited (1/0), or how often it was visited

Customer record: feature vector incl. session and transaction data
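The mapping of URL requests to service concepts and the two session representations (visited yes/no versus visit counts) can be sketched as follows. The URL prefixes and the prefix-matching rule are purely hypothetical, since the site's real mapping is not given; the concept names are the abbreviations used in the result tables of this case study.

```python
from collections import Counter

# Hypothetical URL-prefix-to-concept mapping; the real mapping is site-specific.
CONCEPT_BY_PREFIX = {
    "/catalog/": "Infocat",
    "/product/": "Infprod",
    "/stores/": "Offinfo",
    "/cart/": "Transact",
}
CONCEPTS = ["Home", "Infocat", "Infprod", "Offinfo", "Transact"]


def concept_of(url):
    """Map one requested URL to a service concept, or None if unmapped."""
    for prefix, concept in CONCEPT_BY_PREFIX.items():
        if url.startswith(prefix):
            return concept
    return "Home" if url == "/" else None


def session_vector(urls, weighted=False):
    """Represent a session as s = [w1, ..., wn] over CONCEPTS."""
    counts = Counter(concept_of(url) for url in urls)
    if weighted:
        return [counts[c] for c in CONCEPTS]  # how often each concept was visited
    return [1 if counts[c] > 0 else 0 for c in CONCEPTS]  # visited or not


urls = ["/", "/catalog/tv", "/catalog/audio", "/product/123", "/stores/berlin"]
binary = session_vector(urls)                   # [1, 1, 1, 1, 0]
weighted = session_vector(urls, weighted=True)  # [1, 2, 1, 1, 0]
```

The weighted form keeps the intensity of use per concept, which is what a cluster analysis of "weighted sessions" operates on; the binary form is sufficient when only the presence of a concept matters.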

Site semantics: A service concept hierarchy

760,535 page requests were mapped onto the concepts from this hierarchy:

[Figure: concept hierarchy rooted at "Any". Node labels: Information, Transaction, Services, Information Product, Fulfillment/Service, Customer Data, Shopping Cart, Payment, Company Infos, Registration, Other, Acquisition, Offline Referrer, Advertiser, Other, Store Locator, Information Catalog, Home, Game, Offline Service and Support. A legend marks the multi-channel concepts.]

Types of patterns

Conversion rates (~ confidence of content-specified sequential association rules) for assessing business success

Cluster analysis for customer segmentation

Association rule and sequence analysis for understanding online/offline preferences and their temporal development

Correlation analysis for investigating the relationship between demographic indicators and online/offline preferences

In this case study, all patterns were discovered using straightforward algorithms. Algorithms were not compared or evaluated.

Evaluation scheme for the site

Summative evaluation was designed to quantify the site's success in converting users to customers, online users to physical-branch shoppers, etc.:

Life-cycle business metrics (Reach, Acquisition, Conversion, Retention) & multi-channel metrics (Offline Payment rate, Payment Migration rate, Deliveries to stores rate, Deliveries migration rate) [CS00,LPSH01,TB03]

Degree of user satisfaction with features of the site (questionnaire method)

Formative evaluation was designed to increase understanding of the market and to give feedback on how multi-channel options are used:

What subgroups (identified by behavioural focus) exist in the e-tailer's market?

Are there market segments with specific patterns of online/offline preferences?

How "consistent" and stable over time are customers' online/offline preferences?

How do demographic factors affect online/offline preferences?

Evaluation scheme for patterns

The value of a business metric is interesting if ...

... it {deviates from | agrees with} industry averages, or with the same shop's earlier values on these metrics

A cluster is interesting if ...

... its internal structure shows a clear behavioural focus (heavy use of specific pages)

An association rule is interesting if ...

... its support is particularly large or small (size of customer groups)

... its confidence is particularly large or small ((in)stability of preferences)

A correlation is interesting if ...

... it is statistically significant
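Support and confidence, the two quantities used above to judge association rules, can be computed as follows. A minimal sketch, assuming each transaction is a set of item names; the function name and the example data are hypothetical and are not the case-study data.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = share of all transactions containing both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent in t)
    with_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = with_both / n if n else None
    confidence = with_both / with_antecedent if with_antecedent else None
    return support, confidence


# Hypothetical transactions
transactions = [
    {"online_payment", "direct_delivery"},
    {"online_payment", "direct_delivery"},
    {"in_store_payment", "in_store_pickup"},
    {"in_store_payment", "in_store_pickup"},
    {"online_payment", "in_store_pickup"},
]
support, confidence = support_and_confidence(
    transactions, "online_payment", "direct_delivery"
)
```

Here the rule holds in 2 of 5 transactions (support 0.4) and in 2 of the 3 transactions with online payment (confidence 2/3). Support thus reflects the size of the customer group, confidence the stability of the preference within it.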

Evaluation scheme for the environment

The evaluation of the specific e-tailer focused on the described aspects of the online-offline channel mix.

The environment within the company was not evaluated.

The environment of the company – the Internet multi-channel market – was evaluated with a check-list method:

Online service mix at the world's 50 largest e-retailers in 2002 [GM03]. 43 of the 50 are multi-channel.

Results: Market segments

Cluster centers of the weighted purchase sessions with direct delivery preference

| Cluster | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Home | 2 | 1 | 2 | 2 | 2 |
| Infocat | 7 | 2 | 4 | 23 | 16 |
| Offinfo | 3 | 0 | 1 | 2 | 0 |
| Infprod | 6 | 3 | 12 | 21 | 5 |
| Service | 10 | 3 | 2 | 4 | 4 |
| Transact | 6 | 2 | 3 | 4 | 4 |
| Number of cases | 29 | 188 | 45 | 15 | 37 |

Cluster centers of the weighted purchase sessions with pick-up in store preference

| Cluster | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Home | 1 | 4 | 18 | 1 | 4 |
| Infocat | 22 | 30 | 6 | 1 | 6 |
| Offinfo | 1 | 7 | 1 | 5 | 19 |
| Infprod | 1 | 27 | 5 | 22 | 8 |
| Service | 5 | 4 | 0 | 0 | 12 |
| Transact | 3 | 7 | 3 | 3 | 4 |
| Number of cases | 25 | 15 | 147 | 40 | 55 |

Annotations in the figure: Tend to arrive with prior knowledge; Tend to be "true multi-channel users"; Tend to be "true online users"; Largest group visits all concepts except offline information.

Results: “Internal consistency” of preferences – payment and delivery preferences

Online payment -> Direct delivery (s=0.27, c=0.97): fewer than 1/3 are traditional online users!

Online payment -> In-store pickup (s=0.02, c=0.03)

Cash on delivery -> Direct delivery (s=0.02, c=0.03)

In-store payment -> In-store pickup (s=0.69, c=0.94)

Site is primarily used to collect information.

s: support, c: confidence of the sequence
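The s and c values can be read as follows: support is the share of all transactions in which both sides of a rule occur, confidence is the share of transactions with the left-hand side that also show the right-hand side. A minimal sketch under that standard definition; the record layout (one payment/delivery pair per transaction) and the toy data are assumptions, not the e-tailer's data.

```python
from typing import Iterable, Tuple


def support_confidence(
    records: Iterable[Tuple[str, str]], antecedent: str, consequent: str
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent.

    Each record is a hypothetical (payment, delivery) pair for one transaction.
    """
    records = list(records)
    n_total = len(records)
    n_antecedent = sum(1 for a, _ in records if a == antecedent)
    n_both = sum(1 for a, b in records if a == antecedent and b == consequent)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy transactions, invented for illustration only.
transactions = [
    ("online", "direct"), ("online", "direct"), ("online", "pickup"),
    ("in_store", "pickup"), ("in_store", "pickup"), ("in_store", "pickup"),
    ("in_store", "pickup"), ("in_store", "direct"),
]

s, c = support_confidence(transactions, "in_store", "pickup")
print(f"In-store payment -> In-store pickup (s={s:.2f}, c={c:.2f})")
```

A rule with high confidence but low support describes a stable preference of a small customer group; both numbers are needed to judge it.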

Results: “Internal consistency” of preferences – return preferences

Return -> In-store (s=0.06, c=0.87)

Return -> Mail-in (s=0.04, c=0.13)

Customers may wish personal assistance.

(a result supported by the service mix analysis of different multi-channel retailers and by questionnaire results)

s: support, c: confidence of the association rule

Results: Development of preferences over time

Direct delivery -> In-store pickup in 1 following transaction (s=0.001, c=0.15)

Direct delivery -> Direct delivery in all following transactions (s=0.003, c=0.85)

In-store pickup -> Direct delivery in 1 following transaction (s=0.001, c=0.10) (*)

In-store pickup -> In-store pickup in all following transactions (s=0.004, c=0.90)

Results for payment migration are similar.

90% of repeat customers did not change transaction preferences at all.

Rule (*) as an indicator of the development of trust?!

s: support, c: confidence of the sequence
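A minimal sketch of how such migration figures could be computed from per-customer transaction histories. It assumes each repeat customer is represented by the ordered list of delivery choices, and it treats "changed in a following transaction" as "changed at least once"; the slide does not spell out either point, and the data is invented.

```python
from typing import Dict, List, Tuple


def migration_confidence(
    histories: Dict[str, List[str]], first_choice: str
) -> Tuple[float, float]:
    """Among repeat customers whose first transaction used `first_choice`,
    return (share that kept it in all following transactions,
            share that switched at least once)."""
    repeat = [h for h in histories.values() if len(h) > 1 and h[0] == first_choice]
    if not repeat:
        return 0.0, 0.0
    stayed = sum(1 for h in repeat if all(choice == first_choice for choice in h[1:]))
    return stayed / len(repeat), (len(repeat) - stayed) / len(repeat)


# Toy histories, invented for illustration (customer id -> ordered delivery choices).
toy_histories = {
    "a": ["pickup", "pickup", "pickup"],
    "b": ["pickup", "direct"],
    "c": ["pickup", "pickup"],
    "d": ["direct", "direct"],
    "e": ["pickup"],  # single purchase: not a repeat customer
    "f": ["pickup", "pickup", "pickup"],
}

kept, switched = migration_confidence(toy_histories, "pickup")
print(f"kept pickup: {kept:.2f}, switched: {switched:.2f}")
```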

Results: Shop and Customer Distribution

(Maps: shops; customers, with red = pick-up and blue = direct delivery)

Results: Impact of demographics and of the offline distribution channel?!

(Histogram of customers over distance in km, axis from 0.0 to 70.0; Mean: 10.0, Std.Dev.: 9.32, N=13653)

A significant Pearson correlation exists between

the number of customers per zip code area, normalised by the number of residents per zip code, and the distance to the nearest store (r = -0.3, p < 0.001),

the number of residents per zip code and the distance to the nearest store (r = -0.01, p < 0.001).
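For reference, a minimal sketch of the Pearson coefficient r used above, written against the textbook formula. The per-zip-code numbers are invented, and the significance test behind the reported p-values is not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Toy values per zip code area, invented for illustration.
distance_km = [1.0, 5.0, 10.0, 20.0, 40.0]
customers_per_1000_residents = [9.0, 8.0, 6.0, 5.0, 2.0]

r = pearson_r(distance_km, customers_per_1000_residents)
print(f"r = {r:.2f}")  # negative: fewer customers per resident farther from a store
```

Normalising the customer count by the number of residents, as on the slide, keeps densely populated zip codes from dominating the correlation.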

End of Part IV

Questions thus far?

Further Readings on Part IV

[CS00] Cutler, M., Sterne, J. (2000). E-Metrics - Business Metrics for the New Economy. Technical Report, Netgenesis Corp., http://www.netgencom/emetrics.

[GM03] Gallo, R., McAlister, J. (2003). The Top 50 Retailers. Technical Report, August 2003. http://www.retailforward.com

[LPSH01] Lee, J., Podlaseck, M., Schonberg, E., & Hoch, R. (2001). Visualization and Analysis of Clickstream Data of Online Stores for Understanding Web Merchandising. Data Mining and Knowledge Discovery, 5(1/2), 59-84.

[TB03] Teltzrow, M., & Berendt, B. (2003). Web-Usage-Based Success Metrics for Multi-Channel Businesses. In Proceedings of the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications, at SIGKDD-2003. August 27th, 2003, Washington DC, USA (pp. 17-27).

[TBG03] Teltzrow, M., Berendt, B., & Günther, O. (2003). Consumer behaviour at multi-channel retailers. In Proceedings of the 4th IBM eBusiness Conference, School of Management, University of Surrey, 9th December 2003.

Part V: Infrastructure for Web Mining Deployment

Basic Functionalities

An Agent-based Architecture

Goals to achieve

Efficiency:

• User

• Stakeholder

– No loss of efficiency due to the web mining solution

Adaptability:

– The solution has to be adaptable to:

• New data captured by the operational systems

• Changes in the local or global environment

Integrated:

– Business logic

– Rest of subsystems

– Rest of communication channels

Flexible design, usable applications

Functionalities

Data acquisition and storage:

– Transactional

– Navigational

Business rules/logic acquisition and storage

Knowledge Discovery in data

Act according to knowledge and business goals

Monitoring

Results measurement

Improvement and refining actions

Part V: Infrastructure for Web Mining Deployment

Basic Functionalities

An Agent-based Architecture

Agent capabilities

Features [JW98]:

– Autonomy

– Ability to perceive, reason and act in the surrounding environment

– Capability to cooperate with other agents to solve complex problems

Facilitate the incorporation of reasoning capabilities within the business application logic

Permit the inclusion of learning and self-improvement capabilities

Can participate in high-level dialogues using protocols and built-in organizational knowledge

Can help address serious technological challenges: security, privacy, searching, interoperability

Web Mining infrastructure

Architecture: Decision Layer

Makes decisions depending on the semantic layer information:

User Agents:

– Represent each navigation on the site.

– The interactions between the user and the Interface Agent, and between the Interface Agent and the User Agent, together with the data already stored, make it possible to compute the user model.

Planning Agents (or strategy agents):

– Determine the strategy to follow in order to build a better relationship with the user while improving goal achievement.

– They collaborate with the Interface Agents and the CRM Services Provider Layer agents to work out the best action plan, depending on the problem to be solved and on the environment conditions.

Architecture: Semantic Layer

Contains agents related to the:

logic of the algorithms and

method used.

There will be different agents, each specializing in the application of one of the models needed for the decision-making process.

Models will be stored in a repository, where Refining Agents will update, delete or improve them when needed.

Architecture: CRM Services Provider Layer

It offers an interface, which will be used by any agent asking for a service.

Several agents specialized in particular services.

The Action Plan selected for a particular session at a particular moment will involve several agents that act, collaborate and interact with one another in order to reach the proposed goals.

These e-business agents should be “intelligent”

Intelligence: amount of learned behaviour and possible reasoning capacity that the agent can possess [P01]

Agents collaboration scheme

(Diagram labels: USER, with subjective and objective information; USER MODELLING AGENT; User Model; PLANNING AGENT; Plans; Action Plan; Operational Plan; INTERFACE AGENT; interaction elements, available services, communication channel; DOMAIN AGENT 1 ... DOMAIN AGENT N; Service 1 ... Service N)
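One possible reading of the collaboration scheme, as a minimal sketch: a user modelling agent turns user information into a user model, a planning agent selects an action plan for that model, and an interface agent reduces it to an operational plan over the services that domain agents actually offer. The direction of the arrows is not recoverable from the diagram labels, so this flow, all class and method names, and the segmentation rule are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserModel:
    segment: str


class UserModellingAgent:
    def build_model(self, subjective: Dict[str, str], objective: Dict[str, int]) -> UserModel:
        # Hypothetical rule: three or more past purchases mark a "loyal" user.
        segment = "loyal" if objective.get("purchases", 0) >= 3 else "new"
        return UserModel(segment=segment)


class PlanningAgent:
    def __init__(self, plans: Dict[str, List[str]]):
        self.plans = plans  # stored plans: segment -> list of service names

    def action_plan(self, model: UserModel) -> List[str]:
        return list(self.plans.get(model.segment, []))


class DomainAgent:
    def __init__(self, service: str):
        self.service = service

    def perform(self, user_id: str) -> str:
        return f"{self.service} for {user_id}"


class InterfaceAgent:
    def __init__(self, domain_agents: Dict[str, DomainAgent]):
        self.domain_agents = domain_agents  # available services

    def execute(self, user_id: str, action_plan: List[str]) -> List[str]:
        # Operational plan: the part of the action plan that available services cover.
        operational_plan = [s for s in action_plan if s in self.domain_agents]
        return [self.domain_agents[s].perform(user_id) for s in operational_plan]


modeller = UserModellingAgent()
planner = PlanningAgent({"loyal": ["recommend", "discount"], "new": ["welcome"]})
interface = InterfaceAgent({"recommend": DomainAgent("recommend"), "welcome": DomainAgent("welcome")})

model = modeller.build_model({"interest": "books"}, {"purchases": 5})
print(interface.execute("u1", planner.action_plan(model)))
```

Keeping the plan selection separate from the service dispatch mirrors the layering of the slides: the planning logic can change without touching the domain agents.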

A business agent example [P01]

Business Agent Typology [P01]

Conclusions

Agent-oriented technology can help enable the development of model-based solutions to e-business applications

Enhance enterprise modelling

Offer techniques to incorporate the knowledge extracted in a data mining project

Offer techniques to gather information to be used in the data mining project

Agents have to be organized depending on their functionality

Agent-based solutions are easy to scale

End of Part V

Questions thus far?

Further Readings on Part V

[JW98] Jennings and Wooldridge. Applications of Intelligent Agents. In Agent Technology: Foundations, Applications and Markets. Springer-Verlag, 1998.

[P00] Yang, J. and Papazoglou, M.P. Interoperation support for e-business. Commun. ACM 43, 6 (June 2000).

[P01] Papazoglou, M. Agent-Oriented Technology in Support of e-business. Commun. ACM 44, 4 (April 2001).

[PML01] S. Parent, B. Mobasher, and S. Lytinen. An adaptive agent for web exploration based on concept hierarchies. In Proceedings of the International Conference on Human Computer Interaction. New Orleans, LA, August 2001.

[PSS02] T. R. Payne, R. Singh, and K. Sycara. Browsing schedules - an agent-based approach to navigating the semantic web. ISWC 2002, LNCS 2342, pages 469–473, 2002.

Part VI: Outlook

Web mining for institutions with innovative organisational forms

How personal should and can web personalization be?

Web Mining Applications for new areas

Many Web Mining applications focus on:

Marketing measures to maximise customer lifetime value

User modelling, usually in the context of

– e-learning and training

– site optimisation

– personalisation in any B2C and A2C applications

There are many more domains where Web Mining can contribute:

1. Non-traditional business domains

2. Innovative applications in traditional domains

E-Business models

Timmers identified eleven business models [Tim99], putting emphasis on the B2B domain: e-shop, e-mall, e-auction, third-party marketplace, e-procurement, virtual community, collaboration platform, value-chain integrator, value-chain service provider, information service provider, trust service provider.

Conventional business models, mostly B2C, with focus on sales

Which business models are encountered by Amazon? Which business models do its partners have?

E-Business models

Further business models beyond the eleven of [Tim99]:

portals

application service providers

More innovative business models, mostly B2B with some focus on sales

Some Web Mining application areas for non-traditional business models (1)

1. Mining for Web communities and other forms of social networks:

e-auctions

third-party marketplaces and portals

virtual communities

information & trust service providers

raising questions like:

a. What is the impact of a community for its members, for the whole network and, ultimately, for the success of the business model?

b. Are there desirable and undesirable impacts? How can they be distinguished and influenced? How can impact be quantified?

Some Web Mining application areas for non-traditional business models (2)

2. Coordination and optimisation of inter-institutional processes:

third-party marketplaces

value-chain integrators

value-chain service providers and application service providers

raising questions like:

a. What is the impact of end-user requests and throughput upon the processes of the business partners? How can this impact be quantified and used for process optimisation?

b. The participants of a third-party marketplace and the partners of a value-chain service integrator often stand in competition or cooptation. Can undesirable participant strategies be identified and sanctioned?

Some Web Mining application areas for arbitrary business models

4. Threats from malicious communities:

a. Disclosure of private data / confidential information (cf. [Clif03])

b. Influence upon the outcome of a transaction through:

– integrity violation

– confidentiality breach

– repudiation

– other

where a community member may or may not be involved in the transaction.

How can threats be traced without compromising the anonymity of non-malicious participants?

How can their impact be quantified to assess the danger for the success of the business model?

How can malicious communities be dissolved?

Part VI: Outlook

Web mining for institutions with innovative organisational forms

How personal should and can web personalization be?

Internet users are worried about their privacy ...

(results from a meta-study of 30 questionnaire-based studies [TK03])

... but they are willing to exchange privacy for personalization benefits

Users would provide, in return for personalized content, information on their name (88%), education (88%), age (86%), hobbies (83%), salary (59%), or credit card number (13%).

27% of Internet users think tracking allows the site to provide information tailored to specific users.

73% of online users find it useful if a site remembers basic information such as name and address.

People are willing to give information to receive a personalized online experience: 51% or 40%, depending on the study.

[TK03]

User-centric evaluation: An experimental investigation of the effect of explaining the personalization-privacy tradeoff

[KT04] compared the effects of traditional privacy statements with those of a contextualized explanation on users’ willingness to answer questions about themselves and their (product) preferences.

In the contextualized-explanation condition, participants

answered 8.3% more questions (gave at least one answer) (p<0.001),

gave 19.6% more answers (p<0.001),

purchased 33% more often (p<0.07),

stated that their data had helped the Web store to select better books (p<0.035) – even though the recommendations were static and identical for both groups.

(screenshot from [TK04])

“The more data, the better the quality of personalization”?

In current personalization systems, users only get an “all-or-nothing” deal:

The user can only choose whether to pay nothing (e.g., accept no cookies from the site) or all (accept all cookies), in return for an unspecified increase in recommendation quality (“better”).

P3P and its variants (such as the contextualized add-on described on the previous slide) are limited to a qualitative specification of the exchange relation: In return for data of type X, services of type Y can be offered.

But: users want to know by how much the addition of a particular piece of data improves recommendation quality.

A part of the solution:

Evaluation of algorithms under different conditions (different availability of data on an individual user) can help to quantify the personalization benefits of data disclosure [overview: BT04]
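A minimal sketch of such an evaluation, assuming a deliberately simple recommender: it suggests the item most popular among other users who match the target user on the disclosed attributes, and falls back to overall popularity if nobody matches. Quality is measured as the leave-one-out hit rate for each disclosure level. The recommender, the attribute names and the toy users are all invented for illustration and are not taken from [BT04].

```python
from collections import Counter
from typing import Dict, List, Sequence

User = Dict[str, str]


def recommend(train: Sequence[User], disclosed: Dict[str, str]) -> str:
    """Most popular item among users matching all disclosed attribute values."""
    matches = [u for u in train if all(u[a] == v for a, v in disclosed.items())]
    pool = matches or list(train)  # no match: fall back to overall popularity
    counts = Counter(u["item"] for u in pool)
    # Ties are broken alphabetically so the result is deterministic.
    return min(counts, key=lambda item: (-counts[item], item))


def hit_rate(users: Sequence[User], attributes: Sequence[str]) -> float:
    """Leave-one-out share of users whose own item is recommended."""
    hits = 0
    for i, target in enumerate(users):
        train = list(users[:i]) + list(users[i + 1:])
        disclosed = {a: target[a] for a in attributes}
        if recommend(train, disclosed) == target["item"]:
            hits += 1
    return hits / len(users)


def make(n: int, age_group: str, hobby: str, item: str) -> List[User]:
    return [{"age_group": age_group, "hobby": hobby, "item": item} for _ in range(n)]


# Toy population, invented for illustration.
users = (
    make(5, "young", "sports", "sneakers")
    + make(2, "young", "music", "headphones")
    + make(3, "old", "sports", "bike")
    + make(1, "old", "music", "radio")
)

for level in ([], ["age_group"], ["age_group", "hobby"]):
    print(f"disclosed {level}: hit rate {hit_rate(users, level):.2f}")
```

On this toy population the hit rate rises with each additional disclosed attribute, which is the kind of per-attribute benefit figure the slide asks for; on real data the gain may be small, zero or negative for some attributes.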

Algorithm-centric evaluation: The impact of the availability of different levels of identity disclosure

The effect of cookies (“high” level of identity disclosure) vs. no cookies (“intermediate”) on the quality of data for personalization [SMBN03]

The effect of tracking over multiple sites (“user centric” – “maximal” level of disclosure) vs. tracking only on one site (“site centric” – “high” level) on page prediction as an indicator of personalization quality [PZK01]

Vision 1: Evaluation, communication design, and trade – detailed explanations of the privacy-personalization tradeoff

[BT04]

Vision 2: Technology – client-side profiling, pseudonymity, identity management, …

Client-side profiles [SH01]:

– Users let privacy agents record all interactions with all Web sites.

– At the user’s discretion, parts of that profile can be made available to marketers or peer networks -> managed via privacy metadata.

The privacy agent should also provide identity management [JM00]:

– Use new pseudonyms when entering sites, and/or re-use old ones

The user privacy agent should also

– monitor third-party services to bring problems to the user’s attention (privacy metadata)

Issues to be resolved:

– Need advanced interfaces to help users adopt a complex technology

– Requires a well-functioning system of market surveillance, which is fed back to the user agents => a large enough user + contributor base

– Contributions of privacy-preserving data mining? (overview: [Clif03])

Cf. [SDGR03, BGS04, KS03]
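A minimal sketch of the two client-side ideas above: the agent holds the full profile locally, releases only the fields the user allows for a given site, and presents a per-site pseudonym that can be re-used or replaced. The class, its methods and the example site name are hypothetical; the systems in [SH01] and [JM00] are not reproduced here.

```python
import secrets
from typing import Dict, Iterable


class PrivacyAgent:
    """Hypothetical client-side agent: local profile plus per-site pseudonyms."""

    def __init__(self, profile: Dict[str, str]):
        self._profile = dict(profile)          # never leaves the client as a whole
        self._pseudonyms: Dict[str, str] = {}  # site -> current pseudonym

    def pseudonym_for(self, site: str, fresh: bool = False) -> str:
        """Re-use the site's pseudonym, or create a new one if asked or needed."""
        if fresh or site not in self._pseudonyms:
            self._pseudonyms[site] = secrets.token_hex(8)
        return self._pseudonyms[site]

    def disclose(self, site: str, allowed_fields: Iterable[str]) -> Dict[str, str]:
        """Release only the allowed profile fields, under the site's pseudonym."""
        released = {k: self._profile[k] for k in allowed_fields if k in self._profile}
        released["pseudonym"] = self.pseudonym_for(site)
        return released


agent = PrivacyAgent({"name": "Alice", "hobbies": "chess", "salary": "undisclosed"})
first_visit = agent.disclose("shop.example", ["hobbies"])
print(first_visit)  # contains the hobbies field and a pseudonym, but no name or salary
```

Re-using a pseudonym lets a site personalize across visits, while `fresh=True` breaks the link between sessions; which of the two applies is the user's choice.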

Thank you for your attention!

Questions?

Further Readings on Part VI (1)

[BGS04] Berendt, B., Günther, O., & Spiekermann, S. (in press). Privacy in E-Commerce: Stated preferences vs. actual behavior. To appear in Communications of the ACM.

[BT04] Berendt, B. & Teltzrow, M. (in press). Addressing Users' Privacy Concerns for Improving Personalization Quality: Towards an Integration of User Studies and Algorithm Evaluation. To appear in B. Mobasher & S.S. Anand (Eds.), Intelligent Techniques for Web Personalization. Berlin etc.: Springer. LNAI.

[Clif03] Clifton, C. (2003). Privacy Preserving Data Mining. Tutorial at the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24, 2003, Washington, D.C. http://www.cs.purdue.edu/homes/clifton/DistDM/Clifton_PPDM.pdf

[JM00] Jendricke, U. and Gerd tom Markotten, D. (2000). Usability meets security - The Identity Manager as your personal security assistant for the Internet. In Proceedings of the 16th Annual Computer Security Applications Conference (New Orleans, LA, Dec.).

Further Readings on Part VI (2)

[KS03] Kobsa, A. & J. Schreck (2003): Privacy through Pseudonymity in User-Adaptive Systems. ACM Transactions on Internet Technology 3 (2), 149-183.

[KT04] Kobsa, A. & Teltzrow, M. (2004): Contextualized Communication of Privacy Practices and Personalization Benefits: Impacts on Users’ Data Sharing Behavior. To appear in Proceedings of the 2004 Workshop on Privacy Enabling Technologies, Toronto, Canada, Springer. Draft: http://www.ics.uci.edu/~kobsa/papers/2004-PET-preconference-kobsa.pdf

[SH01] Shearin, S. & Lieberman, H. (2001). Intelligent profiling by example. In Proceedings of the ACM Conference on Intelligent User Interfaces (Santa Fe, NM, January).

[SDGR03] Spiekermann, S., Dickinson, I., Günther, O., & Reynolds, D. (2003). User agents in E-commerce environments: Industry vs. Consumer perspectives on data exchange. In Proc. CAiSE 2003 (pp. 696-710). Springer LNCS.

Further Readings on Part VI (3)

[SMBN03] Spiliopoulou, M., Mobasher, B., Berendt, B., & Nakagawa, M. (2003). A framework for the evaluation of session reconstruction heuristics in Web-usage analysis. INFORMS Journal on Computing, 15, 171-190.

[TK03] Teltzrow, M. and A. Kobsa (2003): Impacts of User Privacy Preferences on Personalized Systems - a Comparative Study. In Proceedings of the CHI-2003 Workshop "Designing Personalized User Experiences for eCommerce: Theory, Methods, and Research", Fort Lauderdale, FL.

[TK04] Teltzrow, M. & Kobsa, A. (2004). Communication of Privacy and Personalization in E-Business. In Proceedings of the Workshop “WHOLES: A Multiple View of Individual Privacy in a Networked World”, Stockholm, Sweden.

[Tim99] Paul Timmers. Electronic Commerce. Wiley, 1999.