Evaluation in Web Mining
Tutorial at ECML/PKDD 2004, Pisa, Italy, September 20th, 2004
http://www.wiwi.hu-berlin.de/~berendt/evaluation04/
Myra Spiliopoulou
Bettina Berendt
Ernestina Menasalvas
ECML/PKDD 2004 Tutorial "Evaluation in Web Mining" © Myra Spiliopoulou, Bettina Berendt, Ernestina Menasalvas
The Presenters
Myra Spiliopoulou
Research group KMD: Knowledge Management & Discovery in Information Systems, Otto-von-Guericke-Universitaet Magdeburg
Bettina Berendt
Institute of Information Systems, Humboldt University Berlin, Berlin, Germany
Ernestina Menasalvas
Department of Computer Science, Facultad de Informática, Universidad Politécnica de Madrid, Spain
Agenda
Part I: Foundations and Principles of Web Mining
Part II: Web Mining as a Project
Part III: Evaluation Methods and Measures
Part IV: Case Study
Part V: Infrastructure for Web Mining Deployment
Part VI: Outlook
Part I: Foundations and Principles of Web Mining
A quick tour of Web usage mining
Motivation and Background
What is Web Mining?
• Despite its success, one problem of the current WWW is that much of the knowledge it contains lies dormant in the data.
• Web mining tries to overcome this problem by applying data mining techniques to the content, (hyperlink) structure, and usage of Web resources.
Web Mining Areas
Web content mining
Web structure mining
Web usage mining
• Goals include:
– improving site design and site structure,
– generating dynamic recommendations,
– and improving marketing.
Application problems and goals (1)
Top-level goal 1: The Web exists in order to be used.
Evaluation focusses on usage.
Goals of usage depend on stakeholder and viewpoint.
Note:
There are other top-level goals, e.g. “The Web exists in order to allow new forms of access to knowledge”.
These require a different evaluation focus.
In this tutorial, we focus on the above “top-level goal 1”.
Application problems and goals (2)
Stakeholders
Site users
Site owners / sponsors (technical, marketing, management, ...)
Viewpoints: a Web site / a collection of Web sites or pages as ...
... a piece of software → usability?
... a distribution channel for a business or organization → profitability?; market analysis; recommendations for cross-selling; ...
... a collection of documents → frequency of use / public perception?; competition analysis
... a medium for a given content and tasks (e.g., e-Learning) → cf. distribution channel
... a Web of connections (e.g., a social network) → what properties does the network have?
Evaluation
Evaluation - act of ascertaining or fixing the value or worth of
(Programming) evaluation - The process of examining a system or system component to determine the extent to which specified properties are present.
( http://www.webster-dictionary.org/definition/evaluation )
Refine the definition:
the act of ascertaining the value of an object according to specified criteria, operationalised in terms of measures.
Evaluation and data analysis – Data analysis for evaluation
Object of evaluation: a Web site
Criteria:
– quality as interactive software,
– quality as a retailer’s distribution channel,
– quality as a (mass) communication medium,
– ...
The measures include:
– usability metrics,
– business metrics,
– ...
The values of these measures are derived from analysing the data.
[Figure: diagram linking Web resource, data, mining procedure and goal via the relations "gives rise to", "uses", "contributes to" and "measures / describes".]
Evaluation and data analysis – Evaluation of data analysis
Objects of evaluation:
– one or more data analysis procedures
• specific algorithms or comprehensive software/information systems solutions
• existing / in place or suggested / planned
– patterns (results of a data analysis)
Criteria: quality and performance of a procedure; interestingness of a pattern
Measures include:
– accuracy of a classification algorithm,
– impact of introducing a data mining software solution on tasks, resources, staffing,
– interestingness measures
The values of these measures are derived from theoretical analyses and from the data.
[Figure: the same diagram extended with patterns, which the mining procedure outputs; relations: "gives rise to", "uses", "contributes to", "outputs", "measures / describes".]
Forms of evaluation and their main foci
Mode | Formative | Summative
Purpose | understand how something works; analyze strengths and weaknesses towards improvement; give feedback | assess concrete achievements; give results and evidence
Conceptualisation | holistic, interdependent system | independent and dependent variables
Design | naturalistic inquiry | experimental design
Relationship to prior knowledge | exploratory, hypothesis generating (pattern discovery) | confirmatory, hypothesis testing
Sampling | purposeful, key informants (in mining: interesting patterns) | random, probabilistic
Analysis | case studies, content and pattern analysis | descriptive and inferential statistics
Partly based on [Patt97]
Part I: Foundations and Principles of Web Mining
A quick tour* of Web usage mining
Motivation and Background
* for a detailed introduction, see [SMB02]
Web Usage Mining: Basics and data sources
Definition of Web usage mining:
discovery of meaningful patterns from data generated by client-server transactions on one or more Web servers
Typical Sources of Data
automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
e-commerce and product-oriented user events (e.g., shopping cart changes, ad or product click-throughs, etc.)
user profiles and/or user ratings
meta-data, page attributes, page content, site structure; this includes semantics / ontologies of site content and services, cf. [BS00, OBHG03, BSH04].
The Web Usage Mining Process
[Figure: Raw usage data → Preprocessing → Preprocessed clickstream data → Pattern discovery → Rules, patterns, and statistics → Pattern analysis → "Interesting" rules, patterns, and statistics; content and structure data serve as an additional input.]
Web Usage and E-Business Analytics: Basic Framework for E-Commerce Data Analysis
[Figure: components include Web/application server logs; a data cleaning / sessionization module; a content analysis module with site content, site map and site dictionary; an operational database (customers, orders, products); a data integration module yielding integrated sessionized data; an e-commerce data mart with a data cube; a data mining engine and OLAP tools; and the analysis stages session analysis / static aggregation, pattern analysis, and OLAP analysis.]
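The data cleaning / sessionization module groups raw log requests into sessions. A common heuristic (assumed here; the framework does not prescribe one) starts a new session when a visitor has been inactive for longer than a timeout. The record layout, the 30-minute default and the name `sessionize` are hypothetical, chosen only for illustration:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical log record layout: (visitor_id, timestamp, requested URL)
Request = Tuple[str, datetime, str]


def sessionize(requests: List[Request],
               timeout: timedelta = timedelta(minutes=30)) -> List[List[str]]:
    """Split requests into sessions with an inactivity-timeout heuristic.

    A new session starts whenever the visitor changes or the gap between
    two consecutive requests of the same visitor exceeds `timeout`.
    """
    sessions: List[List[str]] = []
    prev_visitor: Optional[str] = None
    prev_time: Optional[datetime] = None
    # Order requests by visitor first, then chronologically.
    for visitor, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        if visitor != prev_visitor or ts - prev_time > timeout:
            sessions.append([])  # open a new session
        sessions[-1].append(url)
        prev_visitor, prev_time = visitor, ts
    return sessions


# Illustrative (made-up) log entries.
EXAMPLE_LOG: List[Request] = [
    ("a", datetime(2004, 9, 20, 10, 0), "/index"),
    ("a", datetime(2004, 9, 20, 10, 5), "/courses"),
    ("a", datetime(2004, 9, 20, 11, 0), "/index"),  # 55-minute gap: new session
    ("b", datetime(2004, 9, 20, 10, 1), "/search"),
]

if __name__ == "__main__":
    print(sessionize(EXAMPLE_LOG))
```

On the example log this yields three sessions: visitor "a" is split in two by the long pause. Real preprocessing would also need to handle robots, caching and shared IP addresses, which this sketch ignores.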
Application problems and typical pattern discovery techniques
Techniques:
– sequence mining
– Markov chains
– association rules
– clustering
– session clustering
– classification
Application problems:
– prediction of next event
– discovery of associated events/application objects
– discovery of visitor groups with common properties & interests
– discovery of visitor groups with common behaviour
– characterization of visitors with respect to a set of predefined classes
– card fraud detection
These are only some examples! For more information, cf. [WebKDD].
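To make one technique/problem pairing concrete: a first-order Markov chain can be used for the prediction of the next event by estimating, from sessions, the probability of each page following the current one. This is a minimal sketch under that first-order assumption; the function names and the example sessions are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

TransitionModel = Dict[str, Dict[str, float]]


def fit_markov_chain(sessions: List[List[str]]) -> TransitionModel:
    """Estimate first-order transition probabilities P(next page | current page)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        # Count each pair of consecutive requests within a session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    model: TransitionModel = {}
    for page, successors in counts.items():
        total = sum(successors.values())
        model[page] = {nxt: n / total for nxt, n in successors.items()}
    return model


def predict_next(model: TransitionModel, page: str) -> Optional[str]:
    """Most probable next page, or None if `page` was never followed by another request."""
    successors = model.get(page)
    if not successors:
        return None
    return max(successors, key=successors.get)


# Illustrative (made-up) sessions, e.g. the output of a sessionization step.
EXAMPLE_SESSIONS = [
    ["/index", "/courses", "/enrol"],
    ["/index", "/courses", "/faq"],
    ["/index", "/search"],
    ["/courses", "/enrol"],
]

if __name__ == "__main__":
    model = fit_markov_chain(EXAMPLE_SESSIONS)
    print(predict_next(model, "/index"), predict_next(model, "/courses"))
```

Here "/courses" follows "/index" in two of three cases, so it is predicted after "/index". Higher-order chains or sequence mining would condition on longer histories.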
End of Part I
Questions thus far?
Further Readings on Part I (1)
[BS00] Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75.
[BSH04] Berendt, B., Stumme, G., & Hotho, A. (in press). Usage mining for and on the Semantic Web. In H. Kargupta, A. Joshi, K. Sivakumar, & Y. Yesha (Eds.), Data Mining: Next Generation Challenges and Future Directions (pp. 467-486). Menlo Park, CA: AAAI/MIT Press.
[OBHG03] Oberle, D., Berendt, B., Hotho, A., & Gonzalez, J. (2003). Conceptual user tracking. In E.M. Ruiz, J. Segovia, & P.S. Szczepaniak (Eds.), Web Intelligence, First International Atlantic Web Intelligence Conference, AWIC 2003, Madrid, Spain, May 5-6, 2003, Proceedings (pp. 155-164). Berlin: Springer, LNCS 2663.
Further Readings on Part I (2)
[SMB02] Spiliopoulou, M., Mobasher, B., & Berendt, B. (2002). Web Usage Mining for E-Business Applications. Tutorial at the 13th European Conference on Machine Learning (ECML'02) / 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02), Helsinki, Finland, 19 August 2002. http://ecmlpkdd.cs.helsinki.fi/pdf/berendt-2.pdf
[WebKDD] WebKDD Workshop series at SIGKDD.
http://www.wiwi.hu-berlin.de/~myra/WEBKDD99
http://robotics.stanford.edu/~ronnyk/WEBKDD2000
http://robotics.stanford.edu/~ronnyk/WEBKDD2001
http://db.cs.ualberta.ca/webkdd02
http://db.cs.ualberta.ca/webkdd03
http://maya.cs.depaul.edu/webkdd04
A Project-Oriented View to Web Mining
Web mining is a resource-intensive process. As is typical for projects, its execution requires:
Objectives
Personnel
Time schedule and milestones
Budget
Reporting
Quality control
Project management has delivered many results we can build upon. Here, we concentrate on:
• Data mining projects
• IT projects
• CRM projects
Part II: Web Mining as a Project
The CRISP-DM Reference Model
Project Management Methods
Models for Data Mining Process Management
Cost Estimation
Cross-Industry Standard Process for Data Mining: CRISP-DM
CRISP-DM
CRISP-DM was an international, EU-funded project intended to develop a knowledge discovery process that is neutral with respect to (independent of) industries, tools and applications.
CRISP-DM became a standard process model for data mining.
CRISP-DM is supported and promoted by
data mining software vendors
practitioners in data mining and in data warehousing
CRISP-DM has a special interest group, in which vendors, consultants and practitioners are involved.
CRISP-DM and Web Mining
CRISP-DM was an international, EU-funded project intended to develop a knowledge discovery process that is neutral with respect to (independent of) industries, tools and applications.
Web mining is for institutions that need to process Web data: companies, authorities, non-governmental organizations, institutions engaged in teaching/learning, and research institutions.
Applications include CRM, process optimisation, education/training, business intelligence, and cybercrime prevention.
The CRISP-DM Process
The CRISP-DM process is
a never-ending cycle of iterations
a non-sequential process, in which backtracking to previous phases is usually necessary
Here is a sequential instantiation:
Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment → Business Understanding
Evaluation and Business Understanding
Evaluation requires a well-defined notion of success, which must be in place before
the evaluation takes place,
the data mining phase starts, and
any work with the data starts,
i.e. already during the business understanding phase.
Business Understanding in the CRISP-DM Process
Tasks and their outputs:
Determine Business Objectives: Background; Business Objectives; Business Success Criteria
Assess Situation: Inventory & Resources; Requirements, Assumptions & Constraints; Risks & Contingencies; Terminology; Costs & Benefits
Determine Data Mining Goals: Data Mining Goals; Data Mining Success Criteria
Produce Project Plan: Project Plan; Initial Assessment of Tools & Techniques
Our focus within Business Understanding in the CRISP-DM Process
Of the Business Understanding outputs, we focus on:
Business Objectives
Business Success Criteria
Costs & Benefits
Data Mining Goals
Data Mining Success Criteria
Business Understanding, objectives and success (1)
Business objectives:
What is the customer's primary objective? E.g.
– Increase the lifetime value of valuable customers
– Maximize the revenue from online course material
– Help students learn better using online course material
– Optimise the process of information extraction for the department X (human-genome research department, competitive intelligence department, security department)
– Minimise credit card fraud
Note: maximizing the revenue from online course material and helping students learn better with it are different objectives.
Note: minimising credit card fraud is not equivalent to minimising the number of credit card frauds!
Business Understanding, objectives and success (2)
Business objectives:
What is the customer's primary objective? E.g.
– Increase the lifetime value of valuable customers
– Help students learn better using online course material
– Minimise credit card fraud
Business success criteria:
What constitutes a successful outcome of the project? E.g.
– Reduction of customer churn
– Students learning online obtain, on average, the same grades as those attending classes
– Reduction of the expenses caused by credit card fraud
Business Understanding, objectives and success (3)
Business objectives:
What is the customer's primary objective?
Business success criteria:
What constitutes a successful outcome of the project?
Costs & Benefits:
Perform a cost-benefit analysis:
– Compute the benefits of the project, if it is successful
– Compute the costs of the project (equipment, human resources, ...)
– Quantify the risk that the project fails
– Juxtapose those values
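One simple way to juxtapose benefits, costs and risk is a risk-weighted expected net benefit. The sketch below assumes that a failed project yields no benefit while its costs are always incurred; the class name and the figures are hypothetical, and a real analysis would typically also discount over time and account for intangible benefits:

```python
from dataclasses import dataclass


@dataclass
class ProjectEstimate:
    """Hypothetical inputs of a simple cost-benefit analysis (all amounts in one currency)."""
    benefit_if_successful: float  # benefits of the project if it succeeds
    cost: float                   # equipment, human resources, ...
    failure_risk: float           # estimated probability that the project fails

    def expected_net_benefit(self) -> float:
        """Risk-weighted benefit minus cost; assumes a failed project yields no benefit."""
        if not 0.0 <= self.failure_risk <= 1.0:
            raise ValueError("failure_risk must lie in [0, 1]")
        return (1.0 - self.failure_risk) * self.benefit_if_successful - self.cost


if __name__ == "__main__":
    # Made-up figures: 200,000 benefit, 80,000 cost, 25% risk of failure.
    estimate = ProjectEstimate(benefit_if_successful=200_000, cost=80_000, failure_risk=0.25)
    print(estimate.expected_net_benefit())  # 70000.0
```

A positive value suggests the expected benefits outweigh the costs under these assumptions; the result is only as good as the benefit and risk estimates fed into it.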
Business Understanding, objectives and success (4)
Business objectives:
What is the customer's primary objective?
Business success criteria:
What constitutes a successful outcome of the project?
Costs & Benefits:
Perform a cost-benefit analysis
Data mining goals:
Translate the customer's primary objective into a data mining goal, e.g.
– Increase purchases due to cross- and up-sales
– Build a cost-based prediction model for credit card fraud
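For the credit card goal, a cost-based model is judged by the total cost of its decisions rather than by accuracy alone, by assigning a cost to each of the four outcomes (true/false positives and negatives). The cost values and the example labels below are made up for illustration; in practice they would come from the business side:

```python
from typing import Dict, Sequence

# Hypothetical costs per outcome; "positive" means "flagged as fraud".
COSTS: Dict[str, float] = {
    "TP": 10.0,   # fraud caught: cost of a manual review
    "FP": 10.0,   # legitimate transaction flagged: also a review
    "FN": 500.0,  # fraud missed: average loss
    "TN": 0.0,    # legitimate transaction passed
}


def total_cost(y_true: Sequence[bool], y_pred: Sequence[bool],
               costs: Dict[str, float] = COSTS) -> float:
    """Sum the cost of each prediction (True = fraud)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            total += costs["TP"]
        elif predicted:
            total += costs["FP"]
        elif actual:
            total += costs["FN"]
        else:
            total += costs["TN"]
    return total


def accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of correct predictions, ignoring costs."""
    return sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels: two frauds among five transactions.
Y_TRUE = [True, False, False, True, False]
PRED_A = [True, True, False, False, False]  # misses one fraud
PRED_B = [True, True, True, True, False]    # catches both frauds, two false alarms

if __name__ == "__main__":
    print(accuracy(Y_TRUE, PRED_A), total_cost(Y_TRUE, PRED_A))  # 0.6 520.0
    print(accuracy(Y_TRUE, PRED_B), total_cost(Y_TRUE, PRED_B))  # 0.6 40.0
```

Both classifiers have the same accuracy, but under these assumed costs the second is far cheaper, which is why minimising fraud expenses differs from minimising the number of errors.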
Business Understanding, objectives and success (5)
Business objectives:
Business success criteria:
Costs & Benefits:
Data mining goals:
Translate the customer's primary objective into a data mining goal, e.g.
– Increase purchases due to cross- and up-sales
– Build a cost-based prediction model for credit card fraud
Data mining success criteria:
Determine success in technical terms, e.g.
– Translate the notion of increase in up-sales to statistics associated with the confidence, support and lift of association rules
– Build a classification cost model, assigning costs/weights to true/false negatives and positives.
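The association-rule statistics named above can be computed directly from transactions (e.g. shopping baskets or sessions). For a rule X → Y: support is the fraction of transactions containing X and Y, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y). The item names and baskets below are hypothetical:

```python
from typing import List, NamedTuple, Set


class RuleMeasures(NamedTuple):
    support: float
    confidence: float
    lift: float


def rule_measures(transactions: List[Set[str]],
                  antecedent: Set[str], consequent: Set[str]) -> RuleMeasures:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    if n == 0:
        raise ValueError("need at least one transaction")
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_y = sum(1 for t in transactions if consequent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    # Lift compares the confidence with the baseline frequency of the consequent.
    lift = confidence / (n_y / n) if n_y else 0.0
    return RuleMeasures(support, confidence, lift)


# Made-up baskets for illustration.
EXAMPLE_BASKETS = [
    {"camera", "memory_card"},
    {"camera", "memory_card", "bag"},
    {"camera"},
    {"bag"},
]

if __name__ == "__main__":
    print(rule_measures(EXAMPLE_BASKETS, {"camera"}, {"memory_card"}))
```

For camera → memory_card this gives support 0.5, confidence 2/3 and lift 4/3; a lift above 1 indicates that the antecedent raises the likelihood of the consequent, which is the kind of signal a cross- or up-sale criterion could be stated against.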
Part II: Web Mining as a Project
The CRISP-DM Reference Model
Project Management Methods: Waterfall model, Spiral model, RUP, XP, CommonKADS
Models for Data Mining Process Management
Cost Estimation
Project Management
Why? To organize the development process and to produce a project plan.
How? Establish how the process is going to be developed (lifecycle models: a way of doing things, independent of the process being developed):
Sequential
Incremental
What? Establish how the process is split into phases and define the tasks to be carried out in each step (methodologies: particular tasks, with the detail of the tasks to be carried out):
RUP
XP
CommonKADS
Data mining is also a process.
Sequential Lifecycle: Waterfall Model
Traditional model for large-scale software development processes
Sequential steps in which well-defined tasks produce the final required piece of software.
Each phase is connected to the next one by means of its outputs.
Usually highly structured, with a fixed sequence of activities
Drawbacks:
Each phase has to be completely finished before the next one begins.
Risks are not properly dealt with.
Data mining is an iterative process.
→ Maybe not appropriate for data mining
Project Management: Waterfall
Requirements Analysis → Software’s Specification
Design → Components’ Architecture and Design
Implementation and Unit Test → Implemented Components
System Integration and Testing → Integrated System
Operation and Maintenance
Issues: too structured; feedback; risks; data mining?
Part II: Web Mining as a Project
The CRISP-DM Reference Model
Project Management Methods: Waterfall model, Spiral model, RUP, XP, CommonKADS
Models for Data Mining Process Management
Cost Estimation
Progressive Lifecycle: Spiral Model
Iterative, risk-driven process model generator:
– a cyclic approach for incrementally growing a system's degree of definition and implementation while decreasing its degree of risk;
– a set of anchor point milestones for ensuring stakeholder commitment to feasible and mutually satisfactory system solutions.
Improvement over the traditional waterfall model, as it can deal with changes without affecting the previous outputs of the process
Incorporates quality goals and risk management
Dealing with new requirements is less costly than in the waterfall model
→ Appropriate for CRISP-DM and for the features of the data mining process
Spiral Model
[Figure: spiral of cycles, each passing through four quadrants: Objective setting (determine objectives, alternatives, constraints); Risk assessment and reduction (evaluate alternatives, identify and resolve risks: risk analysis, prototypes 1 to 3, final prototype; simulations, models, benchmarks); Development and validation (develop and verify the next-level product: operations concepts, requirements and requirement validation, product design and design validation & verification, detailed design, code, unit test, integration test, acceptance test, implementation); Planning (plan the next phase: requirements plan, life cycle plan, development plan, integration and test plan; review).]
Life cycle appropriate for data mining
Part II: Web Mining as a Project
The CRISP-DM Reference Model
Project Management Methods: Waterfall model, Spiral model, RUP, XP, CommonKADS
Models for Data Mining Process Management
Cost Estimation
Project Management Methodologies
Software methodologies:
• RUP
• XP
• CommonKADS
• …
→ Include cost estimation
Data mining methodologies:
• CRISP-DM
• CRM-Catalyst
→ Not a real methodology, but a process model
→ Cost estimation is needed
Knowledge intensive / not knowledge intensive
RUP®: Rational Unified Process
A complete software development process framework that comes with several out-of-the-box instances.
An architecture-centric model that makes it possible to develop a software product of any scale or size in an iterative and incremental way.
The outputs of each iteration can be components, modules, or any other software part, which are integrated in the next iteration so that the final product is complete at the end.
Appropriate for Web mining projects in which:
– requirements change as a consequence of already obtained patterns
– the outputs (patterns) of each step become part of the global solution
RUP®: Rational Unified Process
The phases of a RUP-based project are:
Inception
Elaboration
Construction
Transition.
Each phase contains one or more iterations.
In each iteration, you expend effort in varying amounts on each of several disciplines (or workflows) such as Requirements, Analysis and Design, Testing, and so forth.
The key driver for RUP is risk mitigation.
Can be integrated with CRISP-DM
RUP flow through a typical iteration
RUP in the enterprise
Part II: Web Mining as a Project
The CRISP-DM Reference Model
Project Management Methods: Waterfall model, Spiral model, RUP, XP, CommonKADS
Models for Data Mining Process Management
Cost Estimation
Methods for Project Management: XP
XP is a lightweight, code-centric process for small projects.
It originated with Kent Beck and came to the software industry’s attention on the C3 payroll project at Chrysler Corporation around 1997.
Like the RUP, it is based on iterations that embody several practices such as Small Releases, Simple Design, Testing, and Continuous Integration.
The speed at which software is produced allows early contact with the product; consequently, changes can be made with a low degree of risk to the final product.
For a small project team working in a relatively high-trust environment where the user is an integral part of the team, XP can work very well.
→ Small data mining projects
→ Expert development team
A typical XP lifecycle
XP vs RUP for Web Mining Projects
Different philosophies:
RUP is a framework of process components, methods, and techniques that you can apply to any specific software project; the user is expected to specialize RUP.
XP, on the other hand, is a more constrained process that needs additions to make it fit a complete development project.
These differences explain the perception of the community:
the big-system people see RUP as the answer to their problems,
the small-system community sees XP as the solution to theirs.
Both philosophies and ways of acting can be applied to Web mining projects:
the size of the project should determine the way of acting,
as should the expertise of the development team.
Part II: Web Mining as a Project
The CRISP-DM Reference Model
Project Management Methods: Waterfall model, Spiral model, RUP, XP, CommonKADS
Models for Data Mining Process Management
Cost Estimation
CommonKADS for knowledge management projects
CommonKADS:
is a methodology for the design and implementation of knowledge management projects.
is designed as a process, consisting of tasks that must be executed and milestones, where decisions must be taken.
encompasses many aspects of project management, including the specification of objectives and milestones, involvement of key personnel, feasibility testing and budgeting
Web mining:
is intended to discover knowledge from web-related data
builds upon background knowledge, owned by key personnel
should be designed as a project with goals, milestones and budget
CommonKADS: The role of knowledge systems
The typical role of a Knowledge System should be that of an intelligent assistant.
Automation is not an appropriate objective.
The tasks under observation are usually too complex for modeling, let alone automation.
Process improvement is a more appropriate objective.
CommonKADS: The knowledge modeling process
Step 1: Scoping and Feasibility Study
Tool: Organization Model of CommonKADS (OM)
Step 2: Impact and Improvement Study
Tools: Task Model (TM) and Agent Model (AM), intended to zoom in on / refine the OM
whereby each study consists of
– an analysis part
– a "constructive", decision-making part
CommonKADS: Analysis and Synthesis in Step 1
Scoping and Feasibility Study:
Step 1a: Analysis
Identify problem/opportunity areas and potential solutions
Put them into a wider organizational perspective
Step 1b: Synthesis
Decide about economic, technical and project feasibility
Select the most promising focus area and target solution
CommonKADS: Analysis and Synthesis in Step 2
Impact and Improvement Study:
Step 2a: Analysis
Study interrelationships between the task, the agents involved, and the use of knowledge for successful performance
Identify improvements that may be achieved
Step 2b: Synthesis
Decide about organizational measures and task changes
Ensure organizational acceptance and integration of a knowledge system solution
CommonKADS: Overall process of business analysis
[Figure: START → OM-1: problems, solutions, context → OM-2: description of organization focus area → OM-3: process breakdown → OM-4: knowledge assets → OM-5: judge feasibility (decision document). If not feasible: STOP. If feasible: refine with TM-1 (task analysis), TM-2 (knowledge item analysis) and AM-1 (agent model), then integrate, comparing both the old and the new situations → OTA-1: assets, impacts and changes (decision document) → STOP: context analysis ready.]
CommonKADS: Worksheet OM-1
Organization Model: Problems and Opportunities Worksheet OM-1
Problems and opportunities: Make a shortlist of perceived problems and opportunities, based on interviews, brainstorming and visioning meetings, discussions with managers, etc.
Organizational context: Indicate concisely the key features of the wider organizational context, so as to put the listed opportunities and problems into proper perspective. Important features to consider are:
1. Mission, vision, goals of the organization
2. Important external factors the organization has to deal with
3. Strategy of the organization
4. Its value chain and the major value drivers
Solutions: List possible solutions for the perceived problems and opportunities, as suggested by the interviews and discussions held and by the above features of the organizational context.
Organization Model
OM-1: Problems & Opportunities; General Context (Mission, Strategy, Environment, CSFs, ...); Potential Solutions
OM-2: Organization Focus Area Description: Structure, Process, People, Culture & Power, Resources, Knowledge
OM-3: Process Breakdown
OM-4: Knowledge Assets
CommonKADS: OM-5 for decision taking (1)
Organization model
Checklist for Feasibility Decision Document: Worksheet OM-5
Business feasibility: For a given problem/opportunity area and a suggested solution, the following questions have to be answered:
1. What are the expected benefits for the organization from the considered solution? Both tangible economic and intangible business benefits should be identified here.
2. How large is this expected added value?
3. What are the expected costs for the considered solution?
4. How does this compare to possible alternative solutions?
5. Are organizational changes required?
6. To what extent are economic and business risks and uncertainties involved regarding the considered solution direction?
CommonKADS:OM-5 for decision taking (2)
Organization model
Checklist for Feasibility Decision Document: Worksheet OM-5
Technical feasibility
For a given problem/opportunity area and a suggested solution, the following questions have to be answered:
1. How complex, in terms of knowledge stored and reasoning processes to be carried out, is the task to be performed by the considered knowledge-system solution? Are state-of-the-art methods and techniques available and adequate?
2. Are there critical aspects involved, relating to time, quality, needed resources, or otherwise? If so, how can they be addressed?
3. Is it clear what the success measures are and how to test for validity, quality, and satisfactory performance?
4. How complex is the required interaction with end users (user interfaces)? Are state-of-the-art methods and techniques available and adequate?
5. How complex is the interaction with other information systems and possibly other resources (interoperability, systems integration)? Are state-of-the-art methods and techniques available and adequate?
6. Are there further technical risks and uncertainties?
CommonKADS:OTA-1 for decision taking (1)
Org. Task, Agent Models
Worksheet OTA-1: Checklist for Impact and Improvement Decision Document
Impacts and Changes in Organization
Describe which impacts and changes the considered knowledge system solution brings with respect to the organization, by comparing the differences between the organization model (worksheet OM-2) in the current situation and how it will look in the future. This has to be done for all (variant) components in a global fashion (specific aspects for individual tasks or staff members are dealt with below).
1. Structure
2. Process
3. Resources
4. People
5. Knowledge
6. Culture & power
CommonKADS:OTA-1 for decision taking (3)
Org. Task, Agent Models
Worksheet OTA-1: Checklist for Impact and Improvement Decision Document
Attitudes and Commitments
Consider how the individual actors and stakeholders involved will react to the suggested changes, and whether there will be a sufficient basis to successfully carry through these changes.
Proposed actions
This is the part of the impacts and improvements decision document that is directly subject to managerial commitment and decision-making. It weighs and integrates the previous analysis results into recommended concrete steps for action:
1. Improvements: What are the recommended changes, with respect to the organization as well as individual tasks, staff members, and systems?
2. Accompanying measures: What supporting measures are to be taken to facilitate these changes (e.g., training, facilities)?
3. What further project action is recommended with respect to the undertaken knowledge system solution?
4. Expected results, costs, benefits: reconsider items from the earlier feasibility decision document.
5. If circumstances inside or outside the organization change, under what conditions is it wise to reconsider the proposed decisions?
Part II: Web Mining as a Project
Project Management Methods
Models for Data Mining Process Management
Cost Estimation
The CRISP-DM Reference Model
Approaches
Vendor independent:
CRISP-DM
Based on the commercial tools:
CATs
SEMMA
CRM Methodology:
CRM Catalyst
Model Process
Not a real methodology
Based on CRISP-DM
Global CRM process
Does not concentrate on the data mining step
Web Mining as a project: CATs
CATs: Clementine Application Templates [CATs]
Specific libraries of best practices that provide immediate value right out of the box
Following the CRISP-DM standard: every CAT stream is assigned to a CRISP-DM phase
They provide long-term value, as they can always be reused with a new data set for new insight in other projects.
What is a CAT?[CATs]
Examples of CATs[CATs]
The best practice templates, available as an add-on module to Clementine, include:
Telco CAT - improve retention and cross-selling efforts by leveraging our knowledge of the best data mining practices for telecommunications
CRM CAT - understand and predict customer migration between segments, so you understand how to move customers into more profitable segments and reduce the risk of attrition
Microarray CAT - accelerate biological discoveries, find genes for therapeutic targets, classify diseases based on genes and predict outcomes and find or refine biological classes
Fraud CAT - predict and detect instances of fraud in financial transactions, claims, tax returns …
CATs for Web Mining[CATs]
Web CAT: helps to discover clickstream sequences, access and merge Web log data, and make recommendations based on visit profiles.
Includes modules for:
Cleaning and sessionizing Web logs
Enriching and Combining Web logs
Creating visit records and modeling visits
Creating visitor records and modeling visitors
Discovering product associations
Performing sequence analysis
Augmenting Web logs
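The first module, cleaning and sessionizing Web logs, can be illustrated with a small sketch. This is not the Web CAT implementation: it assumes each request already carries a visitor key, drops requests for embedded resources (the suffix list is an assumption), and starts a new session whenever two consecutive requests of the same visitor are more than 30 minutes apart, a commonly used but arbitrary threshold. The names `Request`, `clean` and `sessionize` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed: requests for embedded resources are not page views and are dropped during cleaning.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")

@dataclass
class Request:
    visitor: str      # assumed visitor key, e.g. IP address plus user agent
    timestamp: float  # seconds since the epoch
    url: str

def clean(requests):
    """Drop requests for embedded resources such as images and style sheets."""
    return [r for r in requests if not r.url.lower().endswith(IGNORED_SUFFIXES)]

def sessionize(requests, timeout=30 * 60):
    """Split each visitor's requests into sessions at gaps longer than `timeout` seconds."""
    by_visitor = defaultdict(list)
    for r in requests:
        by_visitor[r.visitor].append(r)
    sessions = []
    for reqs in by_visitor.values():
        reqs.sort(key=lambda r: r.timestamp)
        current = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur.timestamp - prev.timestamp > timeout:
                sessions.append(current)  # inactivity gap: close the current session
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions

log = [
    Request("a", 0, "/index.html"),
    Request("a", 60, "/logo.gif"),
    Request("a", 120, "/products.html"),
    Request("a", 3720, "/index.html"),
    Request("b", 10, "/index.html"),
]
for s in sessionize(clean(log)):
    print(s[0].visitor, [r.url for r in s])
```

Visitor "a" is split into two sessions because of the one-hour pause; the image request disappears during cleaning. Real logs usually need more careful visitor identification (proxies, cookies) than this sketch assumes.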
Web Mining as a project: SEMMA(1)
SEMMA (Sample, Explore, Modify, Model, Assess):[SEMMA]
Is not a data mining methodology
Rather a logical organization of the functional tool set of SAS Enterprise Miner for carrying out the core tasks of data mining.
Enterprise Miner can be used as part of any iterative data mining methodology adopted by the client.
Naturally, steps such as formulating a well-defined business or research problem and assembling quality, representative data sources are critical to the overall success of any data mining project.
Web Mining as a project: SEMMA(2)
SEMMA is focused on the model development aspects of data mining:[SEMMA]
Sample the data by extracting a portion of a large data set that is big enough to contain significant information, yet small enough to manipulate quickly.
Explore the data by searching for anticipated trends and anomalies in order to gain understanding and ideas.
Modify the data by creating, selecting, and transforming the variables to focus the model selection problem.
Model the data by allowing the software to search automatically for a combination of data that reliably predicts a desired outcome. Modelling techniques include neural networks, tree classifiers, statistical models, etc.
Assess the data by evaluating the usefulness and reliability of the findings from the data mining process and estimating how well it performs.
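For the Sample step on Web data, the raw log is often too large to load before sampling. One standard technique, which SEMMA itself does not prescribe, is reservoir sampling (Algorithm R): it draws a uniform random sample of fixed size k in a single pass over a stream of unknown length. The function name and the commented file name are hypothetical.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Uniform random sample of k items from an iterable of unknown length, in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)       # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)     # inclusive bounds: item i is kept with probability k / (i + 1)
            if j < k:
                sample[j] = item      # replace a randomly chosen reservoir slot
    return sample

# Hypothetical usage on a large access log that does not fit in memory:
# with open("access.log") as log_file:
#     subset = reservoir_sample(log_file, 10_000, seed=42)
subset = reservoir_sample(range(100_000), 1_000, seed=1)
print(len(subset), len(set(subset)))
```

Sampling individual log lines breaks sessions apart; if the later steps work on visits, it is usually more sensible to sample whole sessions or visitors in the same way.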
Web Mining as a project: SEMMA(3)
Iterative process:
by assessing the results gained from each stage of the SEMMA process, you can determine how to model new questions raised by the previous results, and thus proceed back to the exploration phase for additional refinement of the data.
Deployment:
– Once the champion model is developed, it needs to be deployed.
– It is the final phase, in which the ROI from the mining process is realized.
Progressive Lifecycle model
Evaluation?
Methods for Project Management:CRM Catalyst(1)
Developed jointly by CustomISe, MACS and SalesPathways. Together they have formed the Catalyst Foundation http://www.crmmethodology.com/
Motivations:
CRM projects are difficult to execute successfully because of the wide range of factors influencing their success. So it can take a long time to make CRM work properly for an organisation.
Solution: CRM Catalyst.
Methodology acts as a catalyst for CRM projects enabling them to achieve their objectives more reliably and in less time.
It gives a project life cycle with a set of defined phases broken down into steps with clearly stated inputs and outputs.
Data Mining Project Management: CRM Catalyst (2)
The five major phases are:
Discovery. Establishing the business goals for CRM.
Orientation. Defining the system changes (specific technical solutions) and organisational changes necessary to meet the goals. This leads to a definition of top-level system requirements.
Navigation. The CRM system requirements are defined more precisely, the system is scoped, system and vendor assessment criteria are defined, and a system is selected and contracted.
Implementation. Planning and managing the CRM project. It is during this phase that the system is built and put into use.
Post-implementation. Monitoring performance and continuous improvement: a CRM project never ends, because CRM must constantly evolve to keep pace with the changing business and its environment.
Methods for Project Management: CRM Catalyst(3)
Implementation requires
Data Mining development process
Implementation is Knowledge intensive
The results are obtained in a progressive way
Progressive Lifecycle Model
In some steps a knowledge-intensive methodology could be appropriate
Part II: Web Mining as a Project
Project Management Methods
Models for Data Mining Process Management
Cost Estimation
The CRISP-DM Reference Model
Estimation Process
Determine Objectives. Who needs what data for what purpose(s)
Gather Data. Focus Should Be Given To ‘Hard’ Data
Well-Defined Requirements
Available Resources
Analyze Data using a variety of methods
Re-estimate Costs throughout the project
Effective Monitoring
Refine and Make Changes As Necessary
Compare final costs with estimated costs.
How it's done: models, methods, tools
Cost estimation is independent of the domain.
The techniques depend on the process whose cost is to be estimated.
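The last step of the process, comparing final with estimated costs, is commonly quantified in the software cost estimation literature by the magnitude of relative error, MRE = |actual - estimated| / actual, averaged over projects as MMRE. The sketch below assumes a record of (estimated, actual) pairs; the figures are invented for illustration.

```python
def mre(actual, estimated):
    """Magnitude of relative error of a single estimate."""
    if actual <= 0:
        raise ValueError("actual cost must be positive")
    return abs(actual - estimated) / actual

def mmre(pairs):
    """Mean MRE over (estimated, actual) pairs."""
    return sum(mre(actual, estimated) for estimated, actual in pairs) / len(pairs)

# Invented (estimated, actual) costs in person-days for three past projects.
history = [(100.0, 120.0), (80.0, 80.0), (50.0, 40.0)]
print(round(mmre(history), 3))
```

Keeping such pairs for every finished project is also a cheap way to start the historical database that the estimation methods rely on.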
What is needed for cost estimation?
A Combination of Models, Methods, and Tools
Gathering/Improving of Historical Data
Well-defined and Well-controlled Software Development Processes
Better Managing of Requirements
Experienced Project Managers, Estimators, and Team Members
All of this can be translated to data mining cost estimation
Software cost estimation methods[JPHGL]
Algorithmic methods. Designed to provide some mathematical equations to perform software estimation. Models: COCOMO & COCOMO II, Putnam, ESTIMACS and SPQR/20.
Estimating by analogy. Comparing the proposed project to previously completed similar projects for which the project development information is known.
Expert judgment method. Empirical and subjective; a typical technique is the Delphi technique, a group consensus technique.
Top-down method. A cost estimate is derived from the global properties of the software project, and the project is then partitioned into various low-level components.
Bottom-up method. The cost of each software component is estimated, and the results are combined to arrive at an estimated cost of the overall project.
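Estimating by analogy can be sketched as a nearest-neighbour lookup: describe each completed project by a few numeric features, find the most similar ones, and reuse their known effort. The feature names, the min-max normalised Euclidean distance, the choice k = 2 and the project data are all hypothetical; they are not taken from [JPHGL].

```python
import math

# Hypothetical completed projects: features and actual effort in person-months.
HISTORY = [
    {"sources": 2, "records_millions": 1.0, "team_size": 3, "effort": 6.0},
    {"sources": 5, "records_millions": 20.0, "team_size": 6, "effort": 18.0},
    {"sources": 3, "records_millions": 4.0, "team_size": 4, "effort": 9.0},
]
FEATURES = ("sources", "records_millions", "team_size")

def estimate_by_analogy(project, history=HISTORY, k=2):
    """Average the effort of the k most similar past projects."""
    lo = {f: min(p[f] for p in history) for f in FEATURES}
    hi = {f: max(p[f] for p in history) for f in FEATURES}

    def norm(p, f):
        # min-max normalisation so that no single feature dominates the distance
        span = hi[f] - lo[f]
        return (p[f] - lo[f]) / span if span else 0.0

    def distance(p):
        return math.sqrt(sum((norm(p, f) - norm(project, f)) ** 2 for f in FEATURES))

    nearest = sorted(history, key=distance)[:k]
    return sum(p["effort"] for p in nearest) / len(nearest)

print(estimate_by_analogy({"sources": 3, "records_millions": 3.0, "team_size": 4}))
```

The quality of such an estimate depends entirely on how many comparable past projects are on record, which is exactly the weak point for data mining projects.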
How to select the estimation method
No one method is necessarily better or worse than the other
Use several techniques or cost models, compare the results.
Document the assumptions made when preparing the project plan.
Monitor the project to detect when assumptions that turn out to be wrong jeopardize the accuracy of the estimate.
Maintain a historical database.
This can be translated to the data mining estimation process.
Problem: a historical project database is lacking
COCOMO Model
COCOMO (COnstructive COst MOdel):
Model to estimate the development cost and schedule of a software project
Introduced by Barry Boehm of USC-CSE in 1981.
Primarily based on the software development practices prior to the 1980s (i.e. based on the waterfall model)
The effort equation is the basis of the COCOMO II model.
The nominal effort of a project of a given size is given by the equation: PM(nominal) = A × (Size)^B
PM(nominal) is the nominal effort in person-months
Size is the estimated size of the software in thousands of source lines of code (KSLOC)
A is the multiplicative effect of cost drivers
B is the exponent representing the effect of scale factors
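The nominal effort equation is easy to evaluate directly. In the sketch below the values of A and B are illustrative placeholders, not calibrated COCOMO constants; real calibrations are given in the COCOMO II model definition.

```python
def nominal_effort(size_ksloc, a, b):
    """Nominal effort in person-months, PM = A * Size**B, with Size in KSLOC."""
    if size_ksloc <= 0:
        raise ValueError("size must be positive")
    return a * size_ksloc ** b

# Illustrative placeholder constants, not a calibration.
A, B = 2.94, 1.10
for size in (10, 20, 40):
    print(size, "KSLOC ->", round(nominal_effort(size, A, B), 1), "person-months")
```

Because B > 1 here, each doubling of size multiplies the effort by 2**1.10, roughly 2.14, i.e. more than doubles it; this is the diseconomy of scale that the scale factors are meant to capture.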
COCOMO II
COCOMO II:
Improvement of the original COCOMO model.
The Scale Drivers in COCOMO II replace the development modes of COCOMO 81
Cost drivers revised
Cost Drivers.
– Are used in the model to adjust the nominal effort in the software project.
– Multiplicative factors required to determine the effort required to complete the software project.
– Ratings range over Very Low (VL), Low (L), Nominal (N), High (H), Very High (VH) and Extra High (EH).
– Model has 17 cost drivers divided into 4 categories: Product, Computer, Personnel and Project.
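How the cost drivers adjust the nominal effort can be sketched as a simple product: each rated driver contributes an effort multiplier, and a Nominal rating corresponds to 1.0. The three drivers and their multiplier values below are hypothetical examples, not the published COCOMO II tables.

```python
from math import prod

def adjusted_effort(nominal_pm, effort_multipliers):
    """Adjust nominal effort by multiplying in the effort multiplier of every rated cost driver."""
    return nominal_pm * prod(effort_multipliers.values())

# Hypothetical ratings for three of the 17 cost drivers; drivers rated Nominal (1.0)
# leave the product unchanged and could be omitted.
multipliers = {
    "required reliability, rated H": 1.10,
    "analyst capability, rated H": 0.85,
    "use of software tools, rated N": 1.00,
}
print(round(adjusted_effort(40.0, multipliers), 2))
```

A demanding product (multiplier above 1) raises the estimate, while a capable team (multiplier below 1) lowers it; here the two effects almost cancel.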
SW Cost Estimation Conclusion
Although lots of research has been done in the area of SW Cost Estimation, it’s not an exact science yet (and probably never will be).
What about Data Mining Cost estimation ?
No tool
No technique
Not enough historical cases
Data Mining Estimation Methods
Kleinberg (1999) [KT99]
Macroeconomic viewpoint
– Game theory
– Combinatorial optimization problems
Masand (1996) [MP96]
Business model: customer value
Domingos (1998) [D98]
Decision process to follow with a project in “Machine Learning” applications
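Net present value (NPV) of a project:
$$NPV = -C_0 + \sum_{t=1}^{T} \frac{C_t}{(1+r)^t}$$
with initial cost C_0, net cash flow C_t in period t, discount rate r and horizon T.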
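The NPV criterion can be evaluated directly. This sketch assumes the standard net-present-value form NPV = -C_0 + Σ_{t=1..T} C_t / (1+r)^t, with one net cash flow per period; the project figures are invented.

```python
def npv(initial_cost, cash_flows, rate):
    """Net present value: -C_0 plus each later cash flow C_t discounted by (1 + r)**t, t = 1..T."""
    return -initial_cost + sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows, start=1))

# Invented example: a mining project costing 100 that returns 40 per year for three years,
# discounted at 10% per year.
value = npv(100.0, [40.0, 40.0, 40.0], 0.10)
print(round(value, 2))
```

At a 10% discount rate this example comes out slightly negative, so the project would not pay for itself even though its undiscounted returns (120) exceed its cost (100).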
Data Mining Estimation Model
Establishing a parametric estimation model for data mining (Marban '03)
DMCOMO (Data Mining COst MOdel)
Data Mining Cost Estimation
Main factors in a Data Mining project
Data Sources (number, kind, nature, …)
Data mining problem to be solved (descriptive, predictive, …)
Development platform
Available tools
Expertise of the development team
Drivers: data drivers, model drivers, platform drivers, tools and techniques drivers, project drivers, people drivers
End of Part II
Questions thus far?
Further readings on Part II (1)
[B00] Kent Beck, Extreme Programming Explained, Addison-Wesley, 2000
[BF01] Kent Beck, Martin Fowler, Planning Extreme Programming, Addison-Wesley, 2001
[Betal.00] Barry W. Boehm et al, Software Cost Estimation with COCOMO II, Prentice Hall PTR, 2000
[D98] P. Domingos. How to Get a Free Lunch: A Simple Cost Model for Machine Learning Applications. Proceedings of the AAAI-98/ICML-98 Workshop on the Methodology of Applying Machine Learning (pp. 1-7), 1998. Madison, WI: AAAI Press.
[JAH01] Ron Jeffries, Ann Anderson, Chet Hendrickson, Extreme Programming Installed, Addison-Wesley, 2001
[JPGL] Jerrall J. Prakash, Harprit S. Grewal, Leo Chen. Software Cost Estimation
[K00] Philippe Kruchten, The Rational Unified Process, An Introduction, Second Edition, Addison-Wesley,2000
Further readings on Part II (2)
[KT99] J. Kleinberg, E. Tardos. Approximation Algorithms for Classification Problems with Pairwise Relationships: Metric Labeling and Markov Random Fields. Proc. 40th IEEE Symposium on Foundations of Computer Science (1999), 14-23
[MP96] B. Masand, G. Piatetsky-Shapiro. A comparison of approaches for maximizing business payoff of prediction models. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 195-201. Portland, OR: AAAI Press, 1996.
[RM01] Robert C. Martin, James W. Newkirk, Extreme Programming in Practice, Addison-Wesley, 2001
[Setal.02] Schreiber et al., Knowledge Engineering and Management – The CommonKADS Methodology, MIT Press, 2002
[SM01] Giancarlo Succi, Michele Marchesi, Extreme Programming Examined, Addison-Wesley, 2001
[VK94] Vidger, M.R. and Kark, A.W. Software Cost Estimation and Control. (1994). National Research Council Canada.
Further readings on Part II (3)
CATs of Clementine:
Clementine Application Templates: Get reliable results from Clementine —faster — with built-in best practices
http://www.spss.com/PDFs/CLMCATINS-0802.pdf.
CRISP-DM:
http://www.crisp-dm.org
CRM Catalyst
CRM Catalyst Methodology Description
http://www.crmmethodology.com/
SEMMA of SAS
http://www.sas.com/technologies/analytics/datamining/miner/semma.html
Further readings on Part II (4)
Data Mining standards
[AG03] ECML/PKDD-2003 Tutorial Knowledge Discovery Standards by Sarab Anand, Marko Grobelnik, Dietrich Wettschereck
COCOMO
Software Cost Estimation, Hong, Danfeng, The University of Calgary, Canada, 1998. http://pages.cpsc.ucalgary.ca/~hongd/SENG/621/report2.html
USC-CSE, COCOMO, 2002. http://sunset.usc.edu/research/COCOMOII/
COCOMO II Model Definition Manual. Abts, Chris, Brad Clark, Sunita Chulani, Ellis Horowitz, Ray Madachy, Don Reifer, Rick Selby, Bert Steece. http://my.raex.com/FC/B1/phess/coco/Modelman.pdf
Software Cost Estimation in 2002. Jones, Capers (2002). STSC CrossTalk. http://www.stsc.hill.af.mil/crosstalk/2002/06/jones.html
Agenda
Part I: Foundations and Principles of Web Mining
Part II: Web Mining as a Project
Part III: Evaluation Methods and Measures
Part IV: Case Study
Part V: Infrastructure for Web Mining Deployment
Part VI: Outlook
Evaluation and data mining algorithms
Data mining algorithms are intended to extract non-trivial, actionable patterns from the data.
Evaluation is an essential part of algorithm design:
How good is the algorithm in finding non-trivial patterns?
How good is the algorithm in finding actionable patterns?
How good is the algorithm for the application?
– Is it fast enough?
– Is it scalable enough, with respect to data size and to feature space size?
– Is it robust enough?
Is the algorithm better than other algorithms?
Goal of the evaluation is to help the knowledge discoverer in finding a good (preferably the best) algorithm.
Evaluation and implicit assumptions
Knowledge discovery is a GIGO (garbage in, garbage out) process:
The quality of the data has a drastic influence on the quality of the patterns.
Robustness towards poor-quality data is an indicator of a good algorithm. However, no algorithm can completely compensate for poor data quality.
Important:
A mining algorithm can extract patterns from any data.
implying that:
We should not attempt knowledge discovery upon inappropriate data.
Evaluation in a non-Webmining example (1)
Problem specification:
An online shop wants to identify the characteristic properties of those transactions that involve a stolen credit card,
subject to the following facts:
If the shop permits a transaction with a stolen credit card, the shop will lose money: the value of the transaction (V1).
If the shop prohibits a transaction with a valid credit card, the shop may lose the customer: an approximation of the revenue from the customer (V2).
The process of checking if a credit card is stolen costs money (V3).
Evaluation in a non-Webmining example (2)
Assuming that this problem is modelled as a classification problem:
A good pattern is a classifier that minimises the loss of money.
A good algorithm is a classification algorithm that
– produces good classifiers,
– satisfies further application-specific criteria, e.g. understandability of the results or robustness against arbitrary skew.
How to build a bad dataset:
– Oversample the positives without reporting the oversampling
– Aggregate the transactions at day level
Reality \ Classifier | fraudulent | not fraudulent
fraudulent | true pos: -V3 | false neg: -V1
not fraudulent | false pos: -V2-V3 | true neg: V1
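The payoff table can be turned into a simple evaluation routine: given a classifier's confusion-matrix counts and estimates for V1 (transaction value), V2 (revenue from the customer) and V3 (cost of checking a card), the total payoff is the sum of count times payoff over the four outcomes. The function name, the two classifiers and all numbers are hypothetical.

```python
def total_payoff(tp, fn, fp, tn, v1, v2, v3):
    """Sum of payoffs over the four outcomes of a fraud classifier.

    tp: fraudulent, flagged          -> the check is paid for:                -v3
    fn: fraudulent, not flagged      -> the transaction value is lost:        -v1
    fp: valid, flagged               -> customer may be lost, check is paid:  -v2 - v3
    tn: valid, not flagged           -> the transaction value is earned:      +v1
    """
    return tp * (-v3) + fn * (-v1) + fp * (-v2 - v3) + tn * v1

# Two hypothetical classifiers on the same 1,000 transactions, 20 of them fraudulent.
costs = dict(v1=100.0, v2=500.0, v3=5.0)
a = total_payoff(tp=18, fn=2, fp=10, tn=970, **costs)   # accuracy 98.8%
b = total_payoff(tp=5, fn=15, fp=2, tn=978, **costs)    # accuracy 98.3%
print(a, b)
```

With these made-up values, classifier b has the lower accuracy but the higher payoff, because every false alarm risks the customer's revenue V2. This is why a good pattern here is a classifier that minimises the loss of money rather than the error rate.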
Evaluation in knowledge discovery
Goal of knowledge discovery is the extraction of non-trivial, actionable patterns from the data.
Evaluation plays a central role in knowledge discovery:
Evaluation towards non-triviality: Do the patterns tell us something new? Something we did not already know?
Evaluation towards actionability: Do the patterns tell us something, for which we can and want to design an action?
presupposing quality in statistical terms: How confident can we be about each pattern?
Goal of the evaluation is to supply the knowledge discoverer with good results.
So, what do we evaluate?
In Web Mining, we evaluate:
The patterns (results) acquired with a web mining algorithm to solve a particular problem
The web mining algorithm in its capability of delivering good patterns to solve the problem
The data sources that deliver the data, upon which the web mining algorithm is applied to solve the problem:
– The Web site and its server
– The data warehouse of the Web site owner
– The external sources used for data enrichment
The environment, including tools and processes, in which the particular problem has emerged
Part III: Evaluation methods and measures
Application-centric measures associated to the application objectives
User-centric measures associated with Web usability
The object of evaluation: usability
The effectiveness, efficiency, and satisfaction with which specified users achieve specified goals in particular environments.
• Effectiveness: The accuracy and completeness with which specified users can achieve specified goals in particular environments.
• Efficiency: The resources expended in relation to the accuracy and completeness of goals achieved.
• Satisfaction: The comfort and acceptability of the work system to its users and other people affected by its use.
(ISO 9241, after Alan Dix, Janet Finlay, Gregory Abowd, Russell Beale. Human Computer Interaction. Prentice Hall Europe 1998. Cited after http://www.tau-web.de/hci/space/i7.html )
Usability on the Web
Usability is a special concern on the Web because
“In product design and software design, customers pay first and experience usability later.
On the Web, users experience usability first and pay later.”
[Niel00, pp. 10f.]
The criteria: design objectives
Learnability
The ease with which new users can begin effective interaction and achieve maximal performance.
Flexibility
The multiplicity of ways the user and system exchange information.
Robustness
The level of support provided to the user in determining successful achievement and assessment of goals.
(Dix et al., 1998, cited after http://www.tau-web.de/hci/space/x12.html )
How can usability be measured? – Data collection methods
Data for usability testing are collected with different methods [Shne98, Jane99]:
Reactive methods
– Expert reviews and surveys ask for attitudes / assessments.
– Usability testing employs experimental methods to investigate behavior and self-reports.
Non-reactive methods
– Based on data collection via Web log files
• To assess user behavior
• To simulate expected / measured user behavior [CPCP01]
Continuing assessments to parallel changes!
Issues: cost, practicality, expressiveness of results
The measures: Examples of usability metrics (from ISO 9241, after [Dix et al., 1998])
Usability objective | Effectiveness measures | Efficiency measures | Satisfaction measures
Suitability for the task | Percentage of goals achieved | Time to complete a task | Rating scale for satisfaction
Appropriate for trained users | Number of "power features" used | Relative efficiency compared with an expert user | Rating scale for satisfaction with "power features"
Learnability | Percentage of functions learned | Time to learn criterion | Rating scale for "ease of learning"
Error tolerance | Percentage of errors corrected successfully | Time spent on correcting errors | Rating scale for error handling
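For the "suitability for the task" row, the three measures can be computed once each task attempt is known to have succeeded or failed, which typically comes from observation or self-report rather than from logs alone. The record layout, the 1-5 rating scale and the data are assumptions made for this sketch.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskAttempt:
    user: str
    goal_achieved: bool  # from observation or self-report; logs alone rarely show it
    seconds: float       # time spent on the task
    rating: int          # satisfaction on an assumed 1-5 rating scale

def suitability_for_task(attempts):
    """Effectiveness, efficiency and satisfaction measures for one task."""
    done = [a for a in attempts if a.goal_achieved]
    return {
        "percent_goals_achieved": 100.0 * len(done) / len(attempts),
        # efficiency is measured on successful attempts only (a design choice)
        "mean_time_to_complete": mean(a.seconds for a in done) if done else None,
        "mean_satisfaction": mean(a.rating for a in attempts),
    }

attempts = [
    TaskAttempt("u1", True, 95.0, 4),
    TaskAttempt("u2", False, 240.0, 2),
    TaskAttempt("u3", True, 125.0, 5),
    TaskAttempt("u4", True, 110.0, 4),
]
print(suitability_for_task(attempts))
```

Counting only successful attempts for the time measure avoids rewarding users who give up quickly; other choices (e.g. time until success or abandonment) are equally defensible.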
The measures: Examples of usability metrics (from ISO 9241, after [Dix et al., 1998])
What measuring each usability objective requires:
Suitability for the task: knowledge of the users' task / intentions. Assumptions can be made if there is background knowledge about the site and its users.
Appropriate for trained users: knowledge of the users' level of expertise, which requires (1) target-group-specific logins, (2) induction from requested content, or (3) other methods, usually involving reactive data collection.
Learnability: definitions of what there is to learn and measures of what the users learned; this usually requires methods involving reactive data collection.
Error tolerance: a definition of what an error is, or what indicates an error; this usually requires a detailed knowledge of users' tasks and intentions, i.e. reactive data collection.
Design decisions that influence usability
Page design concerns issues like:
– Screen real estate, links, graphics + animation, cross-platform design; content design (writing for hypermedia)
Site design / information architecture concerns issues like:
– Hierarchical / network-like content organization, metaphors
– Navigation
• Where am I? Where have I been? Where can I go?
• Navigation is user-controlled!
– Search engines
For “Top Ten Mistakes in Web Usability” and their development, see [Nielsen 1996, 1999, 2002, 2003]
Further issues: International audiences [KB04], personalized sites
Principles of successful navigation
Navigation that works should [Flem98, pp. 13f.]
Be easily learned
Remain consistent
Provide feedback
Appear in context
Offer alternatives
Require an economy of action and time
Provide clear visual messages
Use clear and understandable labels
Be appropriate to the site’s purpose
Support users’ goals and behaviors
Site-specific usability issues: Example I
Criterion: Navigation should “require an economy of action and time.”
Pages that are frequently accessed together should be reachable with one or very few clicks. [KNY00] compared the following measures:
page co-occurrence in user paths / support of 2-page itemsets(x axis)
hyperlink distance (y axis; -1 = distance > 5)
Results help to identify
linkage candidates (top right)
redundant links (bottom left)
Action: modify site design
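The comparison of co-occurrence and hyperlink distance can be sketched as follows. This is not the code of [KNY00]: it assumes sessions are lists of page IDs and the site is a dictionary of out-links, takes support as the share of sessions containing both pages, and takes distance as the shorter of the two directed shortest click paths found by breadth-first search. The thresholds `min_support` and `far` and the example site are hypothetical.

```python
from collections import deque
from itertools import combinations

def click_distances(links, start):
    """Breadth-first search: fewest clicks from `start` to every reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist

def hyperlink_distance(links, a, b):
    """Shorter of the directed distances a->b and b->a; None if neither is reachable."""
    found = [d for d in (click_distances(links, a).get(b), click_distances(links, b).get(a))
             if d is not None]
    return min(found) if found else None

def pair_support(sessions):
    """Share of sessions containing both pages of each unordered pair (support of 2-page itemsets)."""
    counts = {}
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / len(sessions) for pair, n in counts.items()}

def review_links(links, sessions, min_support=0.3, far=3):
    """Return (linkage candidates, possibly redundant links) as sorted page pairs."""
    support = pair_support(sessions)
    targets = {t for outs in links.values() for t in outs}
    pages = set(links) | targets | {p for s in sessions for p in s}
    candidates, redundant = [], []
    for a, b in combinations(sorted(pages), 2):
        dist = hyperlink_distance(links, a, b)
        sup = support.get((a, b), 0.0)
        if sup >= min_support and (dist is None or dist >= far):
            candidates.append((a, b))   # often visited together, but far apart or unlinked
        elif sup < min_support and dist == 1:
            redundant.append((a, b))    # directly linked, but rarely visited together
    return candidates, redundant

site = {"home": ["a", "b"], "a": ["c"], "c": ["d"]}
sessions = [
    ["home", "a", "c", "d"],
    ["home", "a", "c", "d"],
    ["b", "d"],
    ["b", "d"],
    ["home", "a", "c"],
]
print(review_links(site, sessions))
```

In this toy site, "b" and "d" are often visited together but not linked, and "d" lies three clicks from "home", so both pairs become linkage candidates; the direct link from "home" to "b" is never used together with "home" and is flagged for review.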
Site-specific usability issues: Example II
Criterion: Navigation should “support users’ goals and behaviors .”
Search criteria that are popular should be easy to find+use.
[BS00,Ber02] investigated search behavior in an online catalog with support of pages / of sequences as measures:
Search using selection interfaces (clickable map, drop-down menu) was most popular.
Search by location was most popular.
The most efficient search by location (type in city name) was not used much.
Action: modify page design.
An important part of usability: accessibility
“The usability of a product, service, environment or facility by people with the widest range of capabilities” (ISO TS 16071, cited after http://www.usability-forum.com/bereiche/accessibility.shtml )
Web Content Accessibility Guidelines 1.0 – W3C Recommendation 5-May-1999
EU Commission adopted the Communication 'eEurope 2002: Accessibility of Public Web Sites and their Content' (Sept. 2001): http://europa.eu.int/information_society/topics/citizens/accessibility/web/wai_2002/text_en.htm
USA: “Section 508” (1998)
Japan: WCAG 1.0
Web Content Accessibility Guidelines 1.0 http://www.w3.org/TR/1999/WAI-WEBCONTENT
“These guidelines explain how to make Web content accessible to people with disabilities. ...
The primary goal of these guidelines is to promote accessibility. However, following them will also
make Web content more available to all users,
whatever user agent they are using (e.g., desktop browser, voice browser, mobile phone, automobile-based personal computer, etc.)
or constraints they may be operating under (e.g., noisy surroundings, under- or over-illuminated rooms, in a hands-free environment, etc.).
Following these guidelines will also help people find information on the Web more quickly.
These guidelines do not discourage content developers from using images, video, etc., but rather explain how to make multimedia content more accessible to a wide audience.”
Web Content Accessibility Guidelines 1.0 http://www.w3.org/TR/1999/WAI-WEBCONTENT
1. Provide equivalent alternatives to auditory and visual content.
2. Don't rely on color alone.
3. Use markup and style sheets and do so properly.
4. Clarify natural language usage.
5. Create tables that transform gracefully.
6. Ensure that pages featuring new technologies transform gracefully.
7. Ensure user control of time-sensitive content changes.
8. Ensure direct accessibility of embedded user interfaces.
9. Design for device-independence.
10. Use interim solutions.
11. Use W3C technologies and guidelines.
12. Provide context and orientation information.
13. Provide clear navigation mechanisms.
14. Ensure that documents are clear and simple.
Mining for usability assessment: Caveats for interpretation
Care should be taken when interpreting Web log data as indicative of users’ experience with the site:
+ Users act in a natural environment, and in a natural way.
– Little or no control over variables that may influence behavior:
User intentions and intervening factors (work environment, …)
Context (e.g., online + offline competition, market developments)
Often, several characteristics of the site are changed simultaneously, e.g., product offerings and page design.
Causality is hard to assess!
Use mining as an exploratory method, to be complemented by other methods that allow for more control.
Part III: Evaluation methods and measures
Application-centric measures associated with the application objectives
User-centric measures associated with Web usability
The notion of a "good" Web site
The objective of a Web site is NOT
the maximisation of the number of visitors accessing it
the prolongation of the visitors' stay time
the inspection of a maximum number of pages/items/products
the satisfaction of the visitors
... but these are often prerequisites for site success.
In general, the (abstract) objective of a Web site is
the contribution to the business objectives of its owner
with respect to the target groups accessing it
in a cost-effective way.
The "success" of a Web site is a measure of the degree to which the site satisfies its objective.
What does Success mean?
Before talking of success:
Why does the site exist?
Why should someone visit it?
Why should someone return to it?
After answering these questions:
Does the site satisfy its owner?
Does the site satisfy its users?
ALL the users?
Business goals
* Value creation
Sustainable value
User-centric measures
Application-centric measures
User types
* Value creation: [Kuhl96]
Business goals of a site (I)
1. Sale of products/services online: Amazon sells books (etc.) online. The site should help the users find the most suitable books for their needs, identify more related products of interest and, finally, purchase them in a secure and intuitive way.
Personalisation
Cross-/up-selling
Site design
2. Marketing for products/services to be acquired offline: Insurance companies, banks, application service providers, etc., i.e. providers of services based on a long-term relationship with the customer, do not sell online to unknown users. The site should demonstrate to the users the quality of the product/service and the trustworthiness of its owner, and initiate an offline contact.
Business goals of a site (II)
3. Reduction of internal costs: Some banks offer online banking. Some insurance companies support online case registration. This reduces the need for human pre-processing and the likelihood of typing errors. The site should help the users locate and fill in the right forms and submit them in a secure and intuitive way.
4. Information dissemination: Google, IMDB, etc. offer information by means of a search engine over a voluminous archive of high-quality data. The site should help the users find what they search for, assure them of the quality (precision and completeness) of the information provided, and also motivate them to access the products/services of the sponsors.
From business goals to application-centric measures
Businesses evaluate their achievements on the basis of industry- and application-specific measures. Some of these measures have been adapted for Web applications:
Marketing, sales & after-sales support
– e-Marketing measures for online sales of products/services
– e-Marketing measures for commodities that are sold offline
Operations and process optimisation
Security
whereby some e-measures are adjusted to other business goals.
A process overview for sales of products/services
The interaction of the potential customer with the company goes through three phases:
Information Acquisition → Negotiation & Transaction → After-Sales Support
The ratio of persons going from one phase to the next is the basis for a set of positive and negative measures:
Positive: Contact, Conversion, Retention
Negative: Abandonment, Attrition, Churn
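The phase-to-phase ratios can be sketched as follows. The sketch assumes that each phase is summarised by the number of persons who reached it, and that these persons are a subset of those in the preceding phase. The loss rate is simply the complement of the pass-through rate. Which loss corresponds to abandonment, attrition or churn depends on the transition, and this sketch does not fix that mapping.

```python
def funnel_rates(phase_counts):
    """phase_counts: ordered mapping phase -> number of persons reaching that phase.
    Returns (from_phase, to_phase, pass-through rate, loss rate) for each transition."""
    phases = list(phase_counts.items())
    rates = []
    for (prev, n_prev), (nxt, n_next) in zip(phases, phases[1:]):
        passed = n_next / n_prev if n_prev else 0.0
        rates.append((prev, nxt, passed, 1.0 - passed))
    return rates
```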
The "Customer Life Cycle Funnel" of Cutler & Sterne [CS00]
Cutler & Sterne introduce the notion of "customer life cycle funnel" across the phases of acquisition, persuasion and conversion [CS00]:
Each phase encompasses different measures.
Ineffective measures of one phase lead to bottlenecks, which limit the effectiveness in subsequent phases.
In particular:
1. Ineffective measures in the acquisition phase: Untargeted promotions that attract the wrong people.
2. Ineffective measures in the persuasion phase: The targeting during acquisition is good but the persuasion is ineffective.
3. Ineffective measures in the conversion phase: Good targeting and good persuasion but poor conversion.
© http://www.targeting.com/emetrics.pdf
From the Funnel to the Hourglass [Ste03]
The complete interaction path between person and company encompasses Reach, Acquisition, Conversion and Retention [CS00].
After the conversion phase, the retention of loyal customers pays off, because loyal customers can become promoters of the company.
How to measure loyalty?
Recency of purchases
Frequency of purchases
Recommendation propensity
Product improvement synergy
[Figure: hourglass with the phases Acquisition, Persuasion, Conversion, Interaction, Participation, Promotion]
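Recency and frequency, the first two loyalty indicators, can be derived directly from purchase records. Recommendation propensity and product improvement synergy usually need data beyond the logs, such as surveys. Below is a minimal sketch that assumes purchases are given as hypothetical (customer_id, date) pairs.

```python
from datetime import date


def recency_frequency(purchases, today):
    """purchases: iterable of (customer_id, purchase_date).
    Returns customer_id -> (days since last purchase, number of purchases)."""
    last_purchase, frequency = {}, {}
    for customer, day in purchases:
        frequency[customer] = frequency.get(customer, 0) + 1
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
    return {c: ((today - last_purchase[c]).days, frequency[c]) for c in frequency}


if __name__ == "__main__":
    # Hypothetical purchase history of two customers
    history = [("a", date(2002, 1, 10)), ("a", date(2002, 3, 1)), ("b", date(2001, 12, 24))]
    print(recency_frequency(history, today=date(2002, 3, 31)))  # {'a': (30, 2), 'b': (97, 1)}
```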
Ten Supplementary Analyses: The proposal of Blue Martini [KP03]
• Foundation and data audit: bot analysis, univariate data analysis
• Operational: session timeout analysis, form error analysis, micro-conversions analysis
• Tactical: search analysis, real-estate usage analysis, market basket analysis
• Strategic: analysis of migratory customers, geographical analysis
What does Success mean?
Before talking of success:
Why does the site exist?
Why should someone visit it?
Why should someone return to it?
After answering these questions:
Does the site satisfy its owner?
Does the site satisfy its users?
ALL the users?
Business goals
Value creation
Sustainable value
User-centric measures
Application-centric measures
User types
From the visitor to the loyal customer: The model of Berthon et al. [BPW96]
Early realisation of the marketing measures for Web sites [BPW96]:
Conversion efficiency := Customers / Active investigators
Retention efficiency := Loyal Customers / Customers
where: Active investigators are visitors who stay long on the site. Customers are visitors who buy something. Loyal customers are customers who come back to buy again.
[Figure: visitor groups: Site users, Short-time visitors, Active investigators, Customers, Loyal customers]
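Here is a minimal sketch of the two efficiencies of [BPW96]. Several assumptions are not in the source. "Staying long" is operationalised by a hypothetical threshold, min_stay_seconds. Customers are counted only among active investigators, matching the ratio Customers / Active investigators. "Coming to buy again" is taken to mean purchases in at least two visits.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    stay_seconds: float   # total time spent on the site
    purchase_visits: int  # number of visits in which the visitor bought something


def berthon_efficiencies(visitors, min_stay_seconds=120):
    """Return (conversion efficiency, retention efficiency).
    min_stay_seconds is a hypothetical cut-off for 'active investigators'."""
    active = [v for v in visitors if v.stay_seconds >= min_stay_seconds]
    customers = [v for v in active if v.purchase_visits >= 1]
    loyal = [v for v in customers if v.purchase_visits >= 2]
    conversion = len(customers) / len(active) if active else 0.0
    retention = len(loyal) / len(customers) if customers else 0.0
    return conversion, retention
```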
From the visitor to the loyal customer: The micro-conversion rates of Lee et al. [LPSH01]
The model of Lee et al [LPSH01] distinguishes among four steps until the purchase of a product:
Product impression
Click through
Basket placement
Product purchase
and introduces micro-conversion rates for them:
look-to-click rate: click throughs / product impressions
click-to-basket rate: basket placements / click throughs
basket-to-buy rate: product purchases / basket placements
look-to-buy rate: product purchases / product impressions
A session is a set of click operations performed during one visit. Clicks leading to product impressions and those corresponding to basket placements and purchases are uniquely identified as such.
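The four rates can be computed per product from an event log. The sketch below assumes hypothetical (product_id, step) event pairs in which each click has already been classified as an impression, click-through, basket placement or purchase, as the slide requires. When all denominators are non-zero, look-to-buy equals the product of the three step rates.

```python
from collections import Counter

STEPS = ("impression", "click", "basket", "purchase")


def _rate(numerator, denominator):
    """Safe division: 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def micro_conversion_rates(events):
    """events: iterable of (product_id, step) pairs with step in STEPS.
    Returns product_id -> the four micro-conversion rates."""
    counts = Counter(events)
    rates = {}
    for product in {p for p, _ in counts}:
        imp, clk, bsk, buy = (counts[(product, step)] for step in STEPS)
        rates[product] = {
            "look_to_click": _rate(clk, imp),
            "click_to_basket": _rate(bsk, clk),
            "basket_to_buy": _rate(buy, bsk),
            "look_to_buy": _rate(buy, imp),
        }
    return rates
```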
The role of the site in application-centric measures
Dreze & Zufryden [DZ97] define site efficiency in terms of:
– Number of page requests
– Duration of site visits (sessions)
Sullivan [Sul97] defines site quality in terms of:
– Response time
– Number of supported navigation modes
– Discoverability of a page: discovering that a certain page exists
– Accessibility of a page: finding the page, after discovering that it exists
– Pages per visitor
– Visitors per page
Site-oriented measures and business goals
Site-oriented measures are
statistics on the traffic of the Web site
values based on the characteristics of the site from a designer's perspective
trying to capture the user perception of the site, without asking the user.
They do not consider the owner's intentions, i.e. the business goals of the site.
Combination of purely customer-oriented and site-oriented measures
The e-metrics model of Cutler & Sterne [CS00]
The e-metrics model of Cutler & Sterne [CS00] is designed to
compute values for customer-oriented measures, by
allowing for an application-dependent definition of concepts
– customer
– conversion
– loyalty
– customer lifetime value
and by
associating these concepts with site-oriented measures
upon regions of the site
with some emphasis on online merchandising.
The e-metrics model of Cutler & Sterne [CS00] (cntd.)
The e-metrics model of [CS00] encompasses:
Site-centric measures for regions of a site, including:
– Stickiness := Total time spent in the region / Number of visitors in the region
– Slipperiness := 1 / Stickiness
– Focus := Avg. number of visited pages in the region / Number of pages in the region
"Desirable value ranges" for each measure, depending on the purpose/objective of the region:
– A region used during information acquisition should be sticky.
– The pages accessed during the negotiation and transaction phase should be slippery.
How is a region defined?
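Here is a sketch of the three region measures. It assumes per-request records (visitor_id, page, seconds_on_page) and a region given as a set of pages. Two interpretations are assumptions. First, "average number of visited pages" is computed per visitor, counting distinct pages. Second, slipperiness is taken as the reciprocal of stickiness.

```python
def region_metrics(requests, region):
    """requests: iterable of (visitor_id, page, seconds_on_page); region: set of pages.
    Returns stickiness, slipperiness and focus for the region, or None if nobody visited it."""
    in_region = [(v, p, t) for v, p, t in requests if p in region]
    visitors = {v for v, _, _ in in_region}
    if not visitors:
        return None
    # Stickiness: total time spent in the region per visitor of the region
    stickiness = sum(t for _, _, t in in_region) / len(visitors)
    pages_seen = {}
    for v, p, _ in in_region:
        pages_seen.setdefault(v, set()).add(p)
    avg_pages = sum(len(pages) for pages in pages_seen.values()) / len(visitors)
    return {
        "stickiness": stickiness,
        "slipperiness": 1 / stickiness if stickiness else float("inf"),
        "focus": avg_pages / len(region),  # share of the region's pages seen on average
    }
```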
Conversion metrics inside the site [SP01]
The model of [SP01] analyses conversion at page/concept level:
– An active investigator is a user who invokes an action page.
– A customer is an active investigator who invokes a target page.
where:
Target page := any page corresponding to the fulfilment of the site's objectives, e.g.
– purchase of a product
– registration for a service
Action page := any page that must be visited before invoking a target page, e.g.
– product impression
– catalog search
so that rates like customer conversion and click-to-buy can be computed at the level of individual target/action pages or page concepts.
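Page-level conversion in the sense of [SP01] can be sketched as follows. The sketch works per session rather than per user, and assumes that sessions are ordered lists of pages. A session counts as converted for an action page if any target page is requested after the first request of that action page. The function and page names are hypothetical and are not taken from [SP01].

```python
def page_conversion(sessions, action_pages, target_pages):
    """Per action page: share of sessions requesting it (active investigators)
    that afterwards request any target page (customers)."""
    result = {}
    for action in action_pages:
        active = customers = 0
        for session in sessions:
            if action not in session:
                continue
            active += 1
            after = session[session.index(action) + 1:]
            if any(page in target_pages for page in after):
                customers += 1
        result[action] = customers / active if active else 0.0
    return result
```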
Conversion metrics for multi-channel customer conversion
[TB03,TBG03]
For a multi-channel retailer, customer conversion is composed of
– conversion associated with online purchases
– conversion effected through the acquisition of information about offline purchase opportunities, e.g. locations of brick-and-mortar stores.
The information, pages, services associated with online vs offline purchase are mapped into page concepts.
1. A session is modelled as a vector in the feature space of the page concepts.
2. The concept value in the vector space can be
– dichotomised: 0/1
– weighted: number of visits
3. Conversion rate is defined across paths from a concept A to a concept B.
More in Part IV...
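Here is a sketch of steps 1 to 3, assuming that sessions have already been mapped to concept labels. The concept list CONCEPTS is hypothetical and loosely modelled on the page types of the case study. "Conversion from A to B" is read as the share of sessions that reach A and later reach B. That reading is an assumption.

```python
# Hypothetical page concepts; a real site would use its own concept hierarchy
CONCEPTS = ("home", "impression", "click_through", "offline_info", "purchase")


def session_vector(session, concepts=CONCEPTS, weighted=False):
    """Map a session (list of concept labels) to a vector over `concepts`:
    0/1 if dichotomised, number of visits if weighted."""
    counts = [session.count(c) for c in concepts]
    return counts if weighted else [1 if n else 0 for n in counts]


def path_conversion(sessions, a, b):
    """Share of sessions that visit concept a and later visit concept b."""
    reached_a = [s for s in sessions if a in s]
    converted = [s for s in reached_a if b in s[s.index(a) + 1:]]
    return len(converted) / len(reached_a) if reached_a else 0.0
```

Comparing path_conversion(sessions, "click_through", "purchase") with path_conversion(sessions, "click_through", "offline_info") separates online conversion from conversion towards offline channels.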
What does Success mean?
Before talking of success:
Why does the site exist?
Why should someone visit it?
Why should someone return to it?
After answering these questions:
Does the site satisfy its owner?
Does the site satisfy its users?
ALL the users?
Business goals
Value creation
Sustainable value
User-centric measures
Application-centric measures
User types
User Segmentation
Truisms:
A site owner does not welcome all users equally.
A site cannot satisfy all users accessing it.
Hence, sites
are designed for some types of users
serve different user types to different degrees
User types are the result of:
User segmentation according to criteria of the site owner
User segmentation on the basis of personal characteristics
User segmentation with respect to recorded behaviour
User Segmentation in Predefined Business Segments
A company may partition its customers on the basis of
– the revenue it obtains or expects from them
– the (cost of) services it must offer them to obtain the revenue.
There are different segmentation schemes, based on
– the characteristics of the customers
– the company portfolio
and producing a set of predefined classes. For a Web application this means:
1. Classification
2. Association rules for cross-selling & up-selling
3. Recommendations & Personalisation
For each visitor: (1) assign her to the right segment as soon as possible; (2) motivate her to move to a segment of higher revenue.
User Segmentation in Unknown Segments
Web site visitors can be grouped on the basis of their interests, characteristics and navigational behaviour without assuming predefined groups.
There is much research on user grouping based on
– the properties and contents of the objects being visited
– the declared or otherwise known characteristics of the visitor
– (the order of the requests)
For a Web application this means:
1. Clustering
2. Recommendations & Personalisation
For each visitor: (1) assign her to the right segment as soon as possible; (2) make suggestions based on the contents of the segment.
User Segmentation on navigational behaviour
Web site visitors exhibit different types of navigational behaviour.
Model I (simplistic): Some users navigate across links. Others prefer a search engine.
Model II [FGL+00]:
based on criteria like active time spent online and per page, pages and domains accessed etc.
Segments: Simplifiers, Surfers, Bargainers, Connectors, Routiners, Sportsters
Model III [Moe] for merchandising sites:
based on criteria like purchase intention, time spent on the site, number of searches initiated, types of pages visited etc.
Segments: Direct buying, Knowledge building, Search/Deliberation, Hedonic browsing
Placing success of a site into the business context
Before talking of success:
Why does the site exist?
Why should someone visit it?
Why should someone return to it?
After answering these questions:
Does the site satisfy its owner?
Does the site satisfy its users?
ALL the users?
Business goals
Value creation
Sustainable value
User-centric measures
Application-centric measures
User types
Associating success with CRM goals
Customer wishes, expectations and characteristics play a central role for all activities of an institution.
Customer Relationships (and their Management) take a prominent position in the specification of the institution's strategy.
The customer perspective must be incorporated into the institution's strategy and the actions implementing it.
The customer perspective must be incorporated into any evaluation measures associated with the success of the institution's strategy.
Web sites and applications, being important channels of interaction with the customers, should also be evaluated with respect to their success.
Strategy
Evaluation of Actions
Actions
Evaluation of Web-Applications
Evaluation instruments for strategic management:
The Balanced Scorecard [KN92]
The BSC is a well-established instrument for strategic management. It takes a holistic view of the organisation, considering four perspectives around the central Vision & Strategy, each specified through Objectives, Measures, Targets and Initiatives:
Customer: "To achieve our vision, how should we appear to our customers?"
Internal business processes: "To satisfy our shareholders and customers, what business processes must we excel at?"
Financial: "To succeed financially, how should we appear to our shareholders?"
Learning & Growth: "To achieve our vision, how will we sustain our ability to change and improve?"
Balanced Scorecard meets Web site evaluation?
Incorporating application-centric measures into evaluation instruments for strategic management:
Option 1: Incorporation of the application-centric measures into higher-level measures of the company's BSC, e.g. customer conversion across all channels for customer interaction, encompassing
– Customer conversion rate of the site
– Customer conversion rate due to TV ads
– Customer conversion rate of a brick-and-mortar store
Option 2: Integration of all application-centric measures into a BSC for the evaluation of the site
The Web Scorecard of SAS for Customer Relationship Management [HSB02]
Three perspectives
The Web Scorecard takes a holistic view of the company's Web presence, considering three perspectives:
System perspective: It focusses on the objectives arising from the need for 24/7 availability.
Perspective of the offered products/services: It focusses on the optimal exploitation of the products and services that the company offers via the site.
Customer perspective: It focusses on customer segmentation and customer satisfaction for the target segments.
The Web Scorecard of SAS for Customer Relationship Management [HSB02]
Example instantiation (1)
1. System perspective
Objectives | Measures | Indicators
Optimal resource usage | Reduce/increase number of servers | Server load
Performance towards customer | Minimise number of objects per page | Avg. response time, Access errors
2. Product/service perspective
Objectives | Measures | Indicators
Maximise awareness | Design marketing actions & questionnaires | Page hits, Number of sessions
Maximise stickiness | Page redesign, Personalisation | Avg. session duration
The Web Scorecard of SAS for Customer Relationship Management [HSB02]
Example instantiation (2)
3. Customer perspective
Objectives | Measures | Indicators
Increase customer satisfaction | Personalisation | Usability rate / Navigation, ...
Increase conversion rate | Marketing campaigns, clickstream analysis of entry/exit paths | Conversion rate, response rate
Increase revenue | Personalised special offerings, cross-selling | Avg. number of purchases in the last 3 months
Increase effectiveness of marketing measures | Evaluation of campaign management | Response rate per mailing
The Web Scorecard of SAS for Customer Relationship Management [HSB02]
Example implementation
Presentation metaphor: Radar chart
Each axis represents one indicator.
The outer horizon shows the target values of the indicators.
The inner horizon shows the current values of the indicators.
The difference between inner and outer horizon summarises the position of the institution with respect to its target.
The difference between inner and outer horizon for each indicator allows for a prioritisation of the measures to be improved/consolidated.
[Figure: radar chart; example axes: Avg. sum in market basket, Avg. sum of purchases, Retention rate]
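The prioritisation read off the radar chart can be expressed as a relative gap per indicator. The sketch below assumes that larger values are better and that targets are positive. Using a relative gap normalises indicators with different units, such as currency and rates, so that they can be ranked against each other. The indicator values in the usage part are invented for illustration.

```python
def prioritise(indicators):
    """indicators: name -> (current value, target value); assumes larger is better, target > 0.
    Returns (names ordered by relative gap, largest first; relative gap per name)."""
    gaps = {name: (target - current) / target for name, (current, target) in indicators.items()}
    return sorted(gaps, key=gaps.get, reverse=True), gaps


if __name__ == "__main__":
    # Hypothetical current/target values for three radar-chart axes
    order, gaps = prioritise({
        "Retention rate": (0.30, 0.40),
        "Avg. sum in market basket": (45.0, 50.0),
        "Avg. sum of purchases": (120.0, 200.0),
    })
    print(order)
```

For indicators where smaller is better, such as response time, the sign of the gap would have to be flipped before ranking.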
Revisited: Balanced Scorecard meets Web site evaluation?
Option 1: Incorporation of the application-centric measures into higher-level measures of the company's BSC
+ Evaluation of the Web site with a familiar instrument
+ Contribution of site success to the business targets
- The view over the site, as interaction channel and marketing instrument, is fragmented across multiple measures and indicators.
Option 2: Integration of all application-centric measures into a BSC for the evaluation of the site
+ Evaluation of the Web site with a familiar instrument
+ Identification of multiple facets of Web success
- Detachment of site success from the overall evaluation
- Additional BSC instantiation needed
End of Part III
Questions thus far?
Further Readings on Part III (1)
[BPW96] P. Berthon, L.F. Pitt and R.T. Watson. The World Wide Web as an advertising medium. Journal of Advertising Research, 36(1), pp. 43-54, 1996.
[Ber02] Berendt, B. (2002). Using site semantics to analyze, visualize, and support navigation. Data Mining and Knowledge Discovery, 6, 37-59.
[BS00] Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75.
[CPCP01] Chi, E.H., Pirolli, P., Pitkow, J.E. (2000). The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site. In Proceedings CHI 2000 (pp. 161-168).
[CS00] M. Cutler and J. Sterne. E-metrics — Business metrics for the new economy. Technical report, NetGenesis Corp., http://www.netgen.com/emetrics (access date: July 22, 2001)
[DFAB98] Alan Dix, Janet Finlay, Gregory Abowd, Russell Beale. Human Computer Interaction. Prentice Hall Europe 1998. Cited after http://www.tau-web.de/hci/space/i7.html and http://www.tau-web.de/hci/space/x12.html.
[DZ97] X. Dreze and F. Zufryden. Testing web site design and promotional content. Journal of Advertising Research, 37(2), pp. 77-91, 1997.
Further Readings on Part III (2)
[FGL+00] J. Forsyth, T. McGuire and J. Lavoie. All visitors are not created equal. McKinsey marketing practice. McKinsey & Company. Whitepaper. 2000.
[Flem98] Fleming, J. (1998). Web Navigation. Designing the User Experience. Sebastopol, CA: O'Reilly.
[HSB02] K.-P. Huber, F. Säuberlich, C. Böhm. Kennzahlenbasiertes Web Controlling mit einer Web Scorecard. In "Handbuch Web Mining im Marketing" (eds. H. Hippner, M. Merzenich, K. Wilde). Vieweg, 2002 (in German).
[KNY00] Kato, H., Nakayama, T., & Yamane, Y. (2000). Navigation analysis tool based on the correlation between contents distribution and access patterns. In Working Notes of the Workshop "Web Mining for E-Commerce - Challenges and Opportunities." at SIGKDD-2000. Boston, MA (pp. 95-104).
[KN92] R.S. Kaplan, D.P. Norton. The Balanced Scorecard: Translating Strategy to Action. Boston MA. 1992
[KP03] Kohavi, R. and Parekh, R. Ten Supplementary Analyses to Improve E-Commerce Web Sites. In Proceedings of the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. at SIGKDD-2003. August 27th, 2003, Washington DC, USA (pp. 29-36).
Further Readings on Part III (3)
[KB04] A. Kralisch and B. Berendt. Cultural determinants of search behaviour on websites. In V. Evers, E. del Galdo, D. Cyr & C. Bonanni (Eds.), Designing for Global Markets 6. Proceedings of the IWIPS 2004 Conference on Culture, Trust, and Design Innovation. Vancouver, Canada, 8 - 10 July, 2004. Vancouver, BC: Product & Systems Internationalisation, Inc., pp. 61-74, 2004.
[Kuhl96] R. Kuhlen. Informationsmarkt: Chancen und Risiken der Kommerzialisierung von Wissen. 2nd edition, 1996 (in German)
[LPS+00] Juhnyoung Lee, M. Podlaseck, E. Schonberg, R. Hoch and S. Gomory. Analysis and visualization of metrics for online merchandizing. In "Advances in Web Usage Mining and User Profiling: Proc. of the WEBKDD'99 Workshop", LNAI 1836, Springer Verlag, pp. 123-138, 2000.
[Moe] W. Moe. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. In Journal of Consumer Psychology.
[Niel00] Nielsen, J. (2000). Designing Web Usability: The Practice of Simplicity. New Riders Publishing.
Further Readings on Part III (4)
[SF99] M. Spiliopoulou, L.C. Faulstich. WUM: A Tool for Web Utilization Analysis. In: Extended version of Proc. EDBT Workshop WebDB’98, LNCS 1590. Springer Verlag, Berlin Heidelberg New York, pp 184–203, 1999.
[Shne98] Shneiderman, B. (1998). Designing the User Interface: Strategies for Effective Human-Computer Interaction. 3rd edition. Reading, MA: Addison-Wesley.
[Ste03] Sterne, J. WebKDD in the Business World. Invited talk in the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. at SIGKDD-2003. August 27th, 2003, Washington DC, USA.
[Spi99] M. Spiliopoulou. The laborious way from data mining to Web mining. Int. Journal of Comp. Sys., Sci. & Eng., Special Issue on “Semantics of the Web”, 14, pp. 113–126, 1999.
[SP01] M. Spiliopoulou, C. Pohle. Data mining for measuring and improving the success of Web sites. In Journal of Data Mining and Knowledge Discovery, Special Issue on E-commerce, 5, pp. 85–114. Kluwer Academic Publishers. 2001
[Sul97] T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. Proc. of the Web Conference'97, 1997.
Agenda
Part I: Foundations and Principles of Web Mining
Part II: Web Mining as a Project
Part III: Evaluation Methods and Measures
Part IV: Case Study
Part V: Infrastructure for Web Mining Deployment
Part VI: Outlook
Objectives of the application
General objectives: “Standard e-tailer goals” – attract users/shoppers and convert them into customers
Specific objectives: assess the success of the Web site – in relation to other distribution channels
Questions of the evaluation:
• What business metrics can be calculated from Web usage data, transaction and demographic data for determining online success?
• Are there cross-channel effects between a company‘s e-shop and its physical stores?
[Figure: Background: Internet market shares [BCG 2002] of pure Internet companies vs. multi-channel businesses, 1999 to 2002 (2002 projected), in %]
Case study “Multi-channel e-tailer” [TB03,TBG03]
Description of the site and its services
The retailer operates an e-shop and more than 5000 retail shops in over 10 European countries
It sells a wide range of consumer electronics
Online customers can pay, choose pick-up or delivery, and return goods both online and offline
Web pages provide for all tasks in the customer buying process:
[Figure: stages of the customer buying process: Home (Acquisition), Product Impression, Product Click-Through, Offline info, Purchase, Transaction]
Outline of the KDD process
Business understanding: see previous 2 slides
Data:
Web server sessions, transaction info.
Data understanding – main step:
modelling the semantics of the site in terms of a hierarchy of service concepts
Data preparation:
Session IDs; usual data cleaning steps
Linking of sessions & transaction information (anonymized)
Modelling / pattern discovery:
Web metrics, cluster analysis, association rules, sequence mining + correlation analysis, questionnaire study, qualitative market analysis
Evaluation: Interesting patterns
Data and data preparation
Data sources and sample:
92,467 sessions from the company’s Web logs from 21 days in 2002
anonymized transaction information of 13,653 customers who bought online over a period of 8 months in 2001/02.
621 transaction records (21 days) were linked to Web-usage records
Data preparation:
Sessions were determined by session IDs
Robot visits eliminated, usual data cleaning steps
Each URL request mapped to a service concept from {c1,...,cn}
Session representation: s = [w1, ..., wn], where wi is the weight of concept ci, indicating either whether the concept was visited (1/0) or how often it was visited
Customer record: feature vector incl. session and transaction data
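The preparation steps above (grouping requests by session ID, mapping each URL to a service concept, building s = [w1, ..., wn]) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the URL prefixes and the first-match mapping rule are hypothetical, and the six concept names are borrowed from the abbreviations used in the cluster tables of this case study.

```python
from collections import Counter, defaultdict

# Hypothetical URL-prefix -> service-concept rules (first match wins).
CONCEPT_RULES = [
    ("/home", "Home"),
    ("/catalog", "Infocat"),
    ("/stores", "Offinfo"),
    ("/product", "Infprod"),
    ("/service", "Service"),
    ("/checkout", "Transact"),
]
CONCEPTS = [concept for _, concept in CONCEPT_RULES]

def sessions_from_log(records):
    """Group (session_id, url) records into per-session URL lists,
    keeping request order (assumes records are sorted by time)."""
    sessions = defaultdict(list)
    for session_id, url in records:
        sessions[session_id].append(url)
    return dict(sessions)

def map_to_concept(url):
    """Return the service concept for a URL, or None if no rule matches."""
    for prefix, concept in CONCEPT_RULES:
        if url.startswith(prefix):
            return concept
    return None

def session_vector(urls, binary=True):
    """Build s = [w1, ..., wn] over CONCEPTS: visited yes/no (1/0)
    or, with binary=False, the number of visits per concept."""
    counts = Counter(c for c in map(map_to_concept, urls) if c is not None)
    if binary:
        return [1 if counts[c] > 0 else 0 for c in CONCEPTS]
    return [counts[c] for c in CONCEPTS]
```

For a session ["/home", "/catalog/tv", "/product/1", "/product/2"], the binary vector is [1, 1, 0, 1, 0, 0] and the frequency vector is [1, 1, 0, 2, 0, 0]. Requests that match no rule are simply dropped here; a real mapping would need an explicit "Other" concept.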
Site semantics: A service concept hierarchy
760,535 page requests were mapped onto the concepts from this hierarchy:
[Figure: service concept hierarchy with root "Any"; concepts shown include Information, Transaction, Services, Acquisition, Other, Home, Information Catalog, Information Product, Company Infos, Store Locator, Game, Offline Service and Support, Fulfillment/Service, Customer Data, Shopping Cart, Payment, Registration, Offline Referrer, Advertiser; some concepts are marked as multi-channel concepts]
Types of patterns
Conversion rates (~ confidence of content-specified sequential association rules) for assessing business success
Cluster analysis for customer segmentation
Association rule and sequence analysis for understanding online/offline preferences and their temporal development
Correlation analysis for investigating the relationship between demographic indicators and online/offline preferences
In this case study, all patterns were discovered using straightforward algorithms. Algorithms were not compared or evaluated.
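To make the link between conversion rates and sequential rules concrete, here is a minimal sketch over sessions encoded as ordered lists of service concepts. Treating "Home, then Infprod, followed later by Transact" as the conversion event is an assumption for illustration; the slides do not say which antecedent and consequent define each rate.

```python
def contains_sequence(session, pattern):
    """True if the concepts in `pattern` occur in `session` in this order,
    not necessarily adjacent. `p in it` consumes the iterator up to the match."""
    it = iter(session)
    return all(p in it for p in pattern)

def sequence_confidence(sessions, antecedent, consequent):
    """Confidence of antecedent -> consequent: among sessions containing the
    antecedent sequence, the share that later also contain the consequent."""
    with_antecedent = [s for s in sessions if contains_sequence(s, antecedent)]
    if not with_antecedent:
        return 0.0
    hits = sum(contains_sequence(s, antecedent + consequent) for s in with_antecedent)
    return hits / len(with_antecedent)
```

With sessions [["Home", "Infprod", "Transact"], ["Home", "Infprod"], ["Infprod", "Home"], ["Home", "Transact"]], the rule Home, Infprod -> Transact has confidence 0.5: two sessions contain Home followed by Infprod, and one of them goes on to a transaction.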
Evaluation scheme for the site
Summative evaluation was designed to quantify the site’s success in converting users to customers, online users to physical-branch shoppers, etc.:
Life-cycle business metrics (Reach, Acquisition, Conversion, Retention) & multi-channel metrics (Offline Payment rate, Payment Migration rate, Deliveries to stores rate, Deliveries migration rate) [CS00,LPSH01,TB03]
Degree of user satisfaction with features of the site (questionnaire method)
Formative evaluation was designed to increase understanding of the market and to give feedback on how multi-channel options are used:
What subgroups (identified by behavioural focus) exist in the e-tailer’s market? Are there market segments with specific patterns of online/offline preferences?
How “consistent” and stable over time are customers’ online/offline preferences?
How do demographic factors affect online/offline preferences?
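The life-cycle and multi-channel metrics listed above are ratios of counts. The sketch below uses simplified definitions chosen for illustration (each life-cycle stage relative to the previous one; offline-payment and delivery-to-store shares over all transactions); the exact definitions in [CS00, TB03] may differ, and the field names and channel encoding are hypothetical. The two migration rates would additionally need per-customer transaction histories and are omitted.

```python
from dataclasses import dataclass

@dataclass
class FunnelCounts:
    reached: int    # visitors who reached the site
    acquired: int   # visitors who showed interest, e.g. viewed a product (assumption)
    converted: int  # visitors who bought at least once
    retained: int   # buyers who bought again

def safe_ratio(num, den):
    """Ratio that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0

def lifecycle_rates(c: FunnelCounts) -> dict:
    # Each stage is measured relative to the preceding stage.
    return {
        "acquisition": safe_ratio(c.acquired, c.reached),
        "conversion": safe_ratio(c.converted, c.acquired),
        "retention": safe_ratio(c.retained, c.converted),
    }

def multichannel_rates(transactions) -> dict:
    """transactions: list of (payment, delivery) pairs, with payment in
    {'online', 'offline'} and delivery in {'direct', 'store'} (hypothetical encoding)."""
    n = len(transactions)
    return {
        "offline_payment": safe_ratio(sum(p == "offline" for p, _ in transactions), n),
        "delivery_to_store": safe_ratio(sum(d == "store" for _, d in transactions), n),
    }
```

Measuring each stage against the previous one shows where in the funnel users are lost; dividing everything by the number reached would instead give cumulative rates.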
Evaluation scheme for patterns
The value of a business metric is interesting if ...
... it {deviates from | agrees with} industry averages, or with the same shop’s earlier values on these metrics
A cluster is interesting if ...
... its internal structure shows a clear behavioural focus (heavy use of specific pages)
An association rule is interesting if ...
... its support is particularly large or small (size of customer groups)
... its confidence is particularly large or small ((in)stability of preferences)
A correlation is interesting if ...
... it is statistically significant
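The four criteria above can be written as simple predicates. All thresholds (rel_tol, min_share, low, high, alpha) are hypothetical defaults; the slides only say "particularly large or small" and "clear behavioural focus", so in practice they would have to be set by the analyst.

```python
def benchmark_relation(value, benchmark, rel_tol=0.2):
    """'deviates' if value differs from the benchmark (industry average or the
    shop's earlier value) by more than rel_tol relative to it, else 'agrees'.
    Either outcome can be interesting."""
    if benchmark == 0:
        return "agrees" if value == 0 else "deviates"
    return "deviates" if abs(value - benchmark) / abs(benchmark) > rel_tol else "agrees"

def has_behavioural_focus(center, min_share=0.4):
    """A cluster centre shows a clear focus if one concept carries at least
    min_share of the centre's total weight."""
    total = sum(center)
    return total > 0 and max(center) / total >= min_share

def rule_is_interesting(support, confidence, low=0.05, high=0.8):
    """Flag a rule whose support or confidence is particularly small or large."""
    return any(x < low or x > high for x in (support, confidence))

def correlation_is_interesting(p_value, alpha=0.05):
    """Flag a correlation that is statistically significant at level alpha."""
    return p_value < alpha
```

For example, a centre with weights [2, 23, 2, 21, 4, 4] over (Home, Infocat, Offinfo, Infprod, Service, Transact) puts 23/56 ≈ 0.41 of its weight on one concept and passes the focus test at min_share=0.4, whereas [1, 2, 0, 3, 3, 2] does not.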
Evaluation scheme for the environment
The evaluation of the specific e-tailer focused on the described aspects of the online-offline channel mix.
The environment within the company was not evaluated.
The environment of the company, i.e. the Internet multi-channel market, was evaluated with a check-list method:
Online service mix at the world‘s 50 largest e-retailers in 2002 [GM03]. 43 of the 50 are multi-channel.
Results: Market segments
Tend to arrive with prior knowledge
Tend to be "true multi-channel users"
Tend to be "true online users"
Largest group visits all concepts except offline information

Cluster centers of the weighted purchase sessions with direct delivery preference:
Cluster           1    2    3    4    5
Home              2    1    2    2    2
Infocat           7    2    4   23   16
Offinfo           3    0    1    2    0
Infprod           6    3   12   21    5
Service          10    3    2    4    4
Transact          6    2    3    4    4
Number of cases  29  188   45   15   37

Cluster centers of the weighted purchase sessions with pick-up in store preference:
Cluster           1    2    3    4    5
Home              1    4   18    1    4
Infocat          22   30    6    1    6
Offinfo           1    7    1    5   19
Infprod           1   27    5   22    8
Service           5    4    0    0   12
Transact          3    7    3    3    4
Number of cases  25   15  147   40   55
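The slides do not name the clustering algorithm behind these centres. As an assumption, the sketch below uses plain k-means (Lloyd's algorithm) on weighted session vectors such as [Home, Infocat, Offinfo, Infprod, Service, Transact]; seeding with the first k points is a simplistic, deterministic choice made only to keep the example reproducible.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means on lists of numbers. Returns (centers, labels).
    The first k points serve as initial centres."""
    centers = [list(map(float, p)) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

Running it separately on the sessions with direct-delivery and with pick-up preference, with k = 5, would yield two tables of centres shaped like the ones above; the numbers depend on the weighting, the seeding and the choice of k.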
Results: “Internal consistency” of preferences – payment and delivery preferences
Online payment → Direct delivery (s=0.27, c=0.97): fewer than 1/3 are traditional online users!
Online payment → In-store pickup (s=0.02, c=0.03)
Cash on delivery → Direct delivery (s=0.02, c=0.03)
In-store payment → In-store pickup (s=0.69, c=0.94)
Site is primarily used to collect information.
s: support, c: confidence of the sequence
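Support and confidence of rules of the form payment option → delivery option can be computed directly from per-transaction choices. A minimal sketch, assuming each transaction is recorded as a (payment, delivery) pair; the string labels are hypothetical.

```python
from collections import Counter

def pair_rules(transactions):
    """For transactions given as (payment, delivery) pairs, return
    {(payment, delivery): (support, confidence)} for the rules payment -> delivery.
    support = share of all transactions with both choices;
    confidence = share of transactions with that payment that also had that delivery."""
    n = len(transactions)
    pair_counts = Counter(transactions)
    payment_counts = Counter(p for p, _ in transactions)
    return {
        (p, d): (count / n, count / payment_counts[p])
        for (p, d), count in pair_counts.items()
    }
```

With three (online, direct), one (online, store) and four (store, store) transactions, the rule online → direct gets s = 3/8 = 0.375 and c = 3/4 = 0.75. Note that confidences for the same antecedent sum to 1, as the payment/delivery pairs above do for online payment (0.97 + 0.03).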
Results: “Internal consistency” of preferences – return preferences
Return → In-store (s=0.06, c=0.87)
Return → Mail-in (s=0.04, c=0.13)
Customers may want personal assistance.
(a result supported by the service mix analysis of different multi-channel retailers and by questionnaire results)
s: support, c: confidence of the association rule
Results: Development of preferences over time
Direct delivery → In-store pickup in 1 following transaction (s=0.001, c=0.15)
Direct delivery → Direct delivery in all following transactions (s=0.003, c=0.85)
In-store pickup → Direct delivery in 1 following transaction (s=0.001, c=0.10) (*)
In-store pickup → In-store pickup in all following transactions (s=0.004, c=0.90)
Results for payment migration are similar.
90% of repeat customers did not change transaction preferences at all.
Rule (*) as an indicator of the development of trust?!
s: support, c: confidence of the sequence
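The "... in all following transactions" rules measure how stable a preference is for repeat customers. A minimal sketch, assuming each customer's delivery choices are available as a chronologically ordered list; conditioning on the first transaction is a simplification of the sequence rules above, and the labels are hypothetical.

```python
from collections import Counter

def preference_stability(histories):
    """histories: {customer: [choice_1, choice_2, ...]} in chronological order.
    Returns, per first choice, the share of repeat customers who kept that
    choice in all following transactions (confidence of 'X -> X in all following')."""
    kept, total = Counter(), Counter()
    for choices in histories.values():
        if len(choices) < 2:
            continue  # only repeat customers can show migration
        first = choices[0]
        total[first] += 1
        if all(c == first for c in choices[1:]):
            kept[first] += 1
    return {first: kept[first] / total[first] for first in total}
```

One minus each value is the share of repeat customers who switched at least once, i.e. a simple migration rate for that starting preference.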
Results: Shop and Customer Distribution
[Figure: maps of shops and of customers (red = pick-up, blue = direct delivery)]
Results: Impact of demographics and of the offline distribution channel?!
[Figure: histogram of customers (y-axis, 0 to 3000) by distance in km (x-axis, 0 to 70); Mean: 10.0, Std. Dev.: 9.32, N = 13,653]
Significant Pearson correlations exist between
• the number of customers per zip code area, normalised by the number of residents per zip code, and the distance to the nearest store (r = -0.3, p < 0.001);
• the number of residents per zip code and the distance to the nearest store (r = -0.01, p < 0.001).
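The correlation above is computed per zip code area, after normalising customer counts by residents. A minimal sketch of that computation; the variable names are hypothetical, and the p-value is left to a t table or a statistics library (for example, SciPy's `pearsonr` returns both r and p).

```python
import math

def customers_per_resident(customers, residents):
    """Normalise customer counts per zip code area by the number of residents."""
    return [c / r for c, r in zip(customers, residents)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom;
    the two-sided p-value follows from the t distribution."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With many zip code areas, n is large, so even a coefficient close to zero (such as r = -0.01) can produce a small p-value; significance here says little about the strength of the relationship.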
End of Part IV
Questions thus far?
Further Readings on Part IV
[CS00] Cutler, M., Sterne, J. (2000). E-Metrics - Business Metrics for the New Economy. Technical Report, Netgenesis Corp., http://www.netgencom/emetrics.
[GM03] Gallo, R., McAlister, J. (2003). The Top 50 Retailers. Technical Report, August 2003. http://www.retailforward.com
[LPSH01] Lee, J., Podlaseck, M., Schonberg, E., & Hoch, R. (2001). Visualization and Analysis of Clickstream Data of Online Stores for Understanding Web Merchandising. Data Mining and Knowledge Discovery, 5(1/2), 59-84.
[TB03] Teltzrow, M., & Berendt, B. (2003). Web-Usage-Based Success Metrics for Multi-Channel Businesses. In Proceedings of the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. at SIGKDD-2003. August 27th, 2003, Washington DC, USA (pp. 17-27).
[TBG03] Teltzrow, M., Berendt, B., & Günther, O. (2003). Consumer behaviour at multi-channel retailers. In Proceedings of the 4th IBM eBusiness Conference, School of Management, University of Surrey, 9th December 2003.
Agenda
Part I: Foundations and Principles of Web Mining
Part II: Web Mining as a Project
Part III: Evaluation Methods and Measures
Part IV: Case Study
Part V: Infrastructure for Web Mining Deployment
Part VI: Outlook
Part V: Infrastructure for Web Mining Deployment
An Agent-based Architecture
Basic Functionalities
Goals to achieve
Efficiency (for users and stakeholders):
– No loss of efficiency due to the Web mining solution
Adaptability:
– The solution has to be adaptable to:
• new data captured by the operational systems
• changes in the local or global environment
Integration with:
– the business logic
– the rest of the subsystems
– the rest of the communication channels
Flexible design, usable applications
Functionalities
Data acquisition and storage:
– Transactional
– Navigational
Business rules/logic acquisition and storage
Knowledge Discovery in data
Act according to knowledge and business goals
Monitoring
Results measurement
Improvement and refining actions
Part V: Infrastructure for Web Mining Deployment
Basic Functionalities
An agent-based Architecture
Agent capabilities
Features [JW98]:
– Autonomy
– Ability to perceive, reason and act in the surrounding environment
– Capability to cooperate with other agents to solve complex problems
Facilitate the incorporation of reasoning capabilities within the business application logic
Permit the inclusion of learning and self-improvement capabilities
Can participate in high-level dialogues using protocols and built-in organizational knowledge
Can help address serious technological challenges: security, privacy, searching, interoperability
Web Mining infrastructure
Architecture: Decision Layer
Agents in this layer make decisions depending on the information from the semantic layer:
User agents:
– Represent each user's navigation on the site.
– The interactions between the user and the interface agent, and between the interface agent and the user agent, together with the data already stored, make it possible to calculate the user model.
Planning agents (strategy agents):
– Determine the strategy to be followed in order to obtain a better relationship with the user while at the same time improving goal achievement.
– They collaborate with the interface agents and the CRM Services Provider Layer agents to work out the best action plan, depending on the problem to be solved and on the environment conditions.
Architecture: Semantic Layer
Contains agents related to the logic of the algorithms and methods used.
There are different agents, each of which specialises in applying one of the models needed for the decision-making process.
Models are stored in a repository, where Refining Agents update, delete or improve them when needed.
Architecture: CRM Services Provider Layer
It offers an interface that can be used by any agent requesting a service.
Several agents specialise in particular services.
The action plan selected for a particular session at a particular moment will involve several agents that act, collaborate and interact with one another in order to reach the proposed goals.
These e-business agents should be “intelligent”.
Intelligence: the amount of learned behaviour and possible reasoning capacity that the agent can possess [P01]
Agents collaboration scheme
[Figure: agents collaboration scheme with the components User (subjective information, objective information), User Modelling Agent, User Model, Planning Agent, Plans, Action Plan, Interface Agent (interaction elements, available services, communication channel), Operational Plan, and Domain Agents 1 to N offering Services 1 to N]
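The collaboration between the agents in this architecture can be sketched as a small pipeline. Everything below is hypothetical: the class names follow the labels of the scheme, but the behaviour of each agent (merging interests, a lookup table of plans, invoking available domain services) is a deliberately simple stand-in for the learning and reasoning the slides describe.

```python
from dataclasses import dataclass

@dataclass
class UserModel:
    user_id: str
    interests: list  # derived from subjective and objective user information

class UserModellingAgent:
    def build_model(self, user_id, subjective, objective):
        # Hypothetical: merge stated (subjective) and observed (objective) interests.
        return UserModel(user_id, sorted(set(subjective) | set(objective)))

class PlanningAgent:
    def __init__(self, plans):
        self.plans = plans  # hypothetical plan table: interest -> service name

    def action_plan(self, model):
        """Select the services that fit the user's interests."""
        return [self.plans[i] for i in model.interests if i in self.plans]

class DomainAgent:
    def __init__(self, service):
        self.service = service

    def provide(self, model):
        return f"{self.service} for {model.user_id}"

class InterfaceAgent:
    def __init__(self, domain_agents, channel):
        self.domain_agents = domain_agents  # service name -> DomainAgent
        self.channel = channel  # communication channel, e.g. "web"

    def operational_plan(self, action_plan, model):
        """Turn the action plan into service calls, keeping only services
        that a domain agent actually offers on this channel."""
        return [f"{self.channel}: {self.domain_agents[s].provide(model)}"
                for s in action_plan if s in self.domain_agents]
```

In use, the user modelling agent builds the model, the planning agent derives an action plan from it, and the interface agent turns that plan into calls to the available domain agents; a service in the action plan that no domain agent offers is silently skipped.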
A business agent example [P01]
Business Agent Typology [P01]
Conclusions
Agent-oriented technology can help enable the development of model-based solutions to e-business applications
Enhance enterprise modelling
Offer techniques to incorporate the knowledge extracted in a data mining project
Offer techniques to gather information to be used in the data mining project
Agents have to be organized depending on their functionality
Agent-based solutions are easy to scale
End of Part V
Questions thus far?
Further Readings on Part V
[JW98] Jennings and Wooldridge. Application of Intelligent Agents. Agent Technology: Foundations, Applications and Markets. Springer-Verlag, 1998.
[P00] Yang, J. and Papazoglou, M.P. Interoperation support for e-business. Commun. ACM 43, 6 (June 2000).
[P01] Papazoglou, M. Agent-Oriented Technology in Support of e-business. Commun. ACM 44, 4 (April 2001).
[PML01] S. Parent, B. Mobasher, and S. Lytinen. An adaptive agent for web exploration based on concept hierarchies. In Proceedings of the International Conference on Human Computer Interaction. New Orleans, LA, August 2001.
[PSS02] T. R. Payne, R. Singh, and K. Sycara. Browsing schedules - an agent-based approach to navigating the semantic web. ISWC 2002, LNCS 2342, pages 469–473, 2002.
Agenda
Part I: Foundations and Principles of Web Mining
Part II: Web Mining as a Project
Part III: Evaluation Methods and Measures
Part IV: Case Study
Part V: Infrastructure for Web Mining Deployment
Part VI: Outlook
Part VI: Outlook
Web mining for institutions with innovative organisational forms
How personal should and can web personalization be?
Web Mining Applications for new areas
Many Web mining applications focus on:
Marketing measures to maximise customer lifetime value
User modelling, usually in the context of
– e-learning and training
– site optimisation
– personalisation in any B2C and A2C applications
There are many more domains where Web mining can contribute:
1. Non-traditional business domains
2. Innovative applications in traditional domains
E-Business models
Timmers identified eleven business models [Tim99], putting emphasis on the B2B domain: e-shop, e-mall, e-auction, third-party marketplace, e-procurement, virtual community, collaboration platform, value-chain integrator, value-chain service provider, information service provider, trust service provider
Conventional business models, mostly B2C, with focus on sales
Which business models does Amazon use? Which business models do its partners have?
E-Business models
Timmers identified eleven business models [Tim99], putting emphasis on the B2B domain: e-shop, e-mall, e-auction, third-party marketplace, e-procurement, virtual community, collaboration platform, value-chain integrator, value-chain service provider, information service provider, trust service provider
Which business models does Amazon use? Which business models do its partners have?
Additional models: portals, application service providers
More innovative business models, mostly B2B with some focus on sales
Some Web Mining application areas for non-traditional business models (1)
1. Mining for Web communities and other forms of social networks:
e-auctions
third-party marketplaces and portals
virtual communities
information & trust service providers
raising questions like:
a. What is the impact of a community for its members, for the whole network and, ultimately, for the success of the business model?
b. Are there desirable and undesirable impacts? How can they be distinguished and influenced? How can impact be quantified?
Some Web Mining application areas for non-traditional business models (2)
2. Coordination and optimisation of inter-institutional processes:
third-party marketplaces
value-chain integrators
value-chain service providers and application service providers
raising questions like:
a. What is the impact of end-user requests and throughput upon the processes of the business partners? How can this impact be quantified and used for process optimisation?
b. The participants of a third-party marketplace and the partners of a value-chain service integrator often stand in competition or coopetition. Can undesirable participant strategies be identified and sanctioned?
Some Web Mining application areas for arbitrary business models
4. Threats from malicious communities:
a. Disclosure of private data / confidential information (cf. [Clif03])
b. Influence upon the outcome of a transaction through:
– integrity violation
– confidentiality breach
– repudiation
– other
where a community member may or may not be involved in the transaction.
How can threats be traced without compromising the anonymity of non-malicious participants?
How can their impact be quantified to assess the danger for the success of the business model?
How can malicious communities be dissolved?
Part VI: Outlook
Web mining for institutions with innovative organisational forms
How personal should and can web personalization be?
Internet users are worried about their privacy ...
(results from a meta-study of 30 questionnaire-based studies [TK03])
... but they are willing to exchange privacy for personalization benefits
Users would provide, in return for personalized content, information on their name (88%), education (88%), age (86%), hobbies (83%), salary (59%), or credit card number (13%).
27% of Internet users think tracking allows the site to provide information tailored to specific users.
73% of online users find it useful if site remembers basic information such as name and address.
People are willing to give information to receive a personalized online experience: 51% or 40%, depending on the study.
[TK03]
User-centric evaluation: An experimental investigation of the effect of explaining the personalization-privacy tradeoff
[KT04] compared the effects of traditional privacy statements with those of a contextualized explanation on users’ willingness to answer questions about themselves and their (product) preferences.
In the contextualized-explanation condition, participants
answered 8.3% more questions (gave at least one answer) (p<0.001),
gave 19.6% more answers (p<0.001),
purchased 33% more often (p<0.07),
stated that their data had helped the Web store to select better books (p<0.035) – even though the recommendations were static and identical for both groups.
(screenshot from [TK04])
“The more data, the better the quality of personalization“ ?
In current personalization systems, users only get an “all-or-nothing” deal:
The user can only choose whether to pay nothing (e.g., accept no cookies from the site) or all (accept all cookies), in return for an unspecified increase in recommendation quality (“better”).
P3P and its variants (such as the contextualized add-on described on the previous slide) are limited to a qualitative specification of the exchange relation: In return for data of type X, services of type Y can be offered.
But what users need to know is by how much the addition of a particular piece of data improves recommendation quality.
A part of the solution:
Evaluation of algorithms under different conditions (different availability of data on an individual user) can help to quantify the personalization benefits of data disclosure [overview: BT04]
Algorithm-centric evaluation: The impact of the availability of different levels of identity disclosure
The effect of cookies (“high” level of identity disclosure) vs. no cookies (“intermediate”) on the quality of data for personalization [SMBN03]
The effect of tracking over multiple sites (“user centric”, “maximal” level of disclosure) vs. tracking only on one site (“site centric”, “high” level) on page prediction as an indicator of personalization quality [PZK01]
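One way to quantify the benefit of additional data, in the spirit of these comparisons, is to train the same predictor on data from two disclosure levels and compare its accuracy on the same test sessions. The sketch below is an assumption for illustration and not the method of [SMBN03] or [PZK01]: it uses a first-order Markov next-page predictor and the hit rate over consecutive page pairs.

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """First-order Markov model: for each page, count the pages that follow it."""
    transitions = defaultdict(Counter)
    for s in sessions:
        for a, b in zip(s, s[1:]):
            transitions[a][b] += 1
    return transitions

def predict_next(model, page):
    """Most frequent successor of `page`, or None if the page was never seen."""
    if page not in model or not model[page]:
        return None
    return model[page].most_common(1)[0][0]

def hit_rate(model, test_sessions):
    """Share of (page, next page) pairs in the test sessions predicted correctly."""
    pairs = [(a, b) for s in test_sessions for a, b in zip(s, s[1:])]
    if not pairs:
        return 0.0
    return sum(predict_next(model, a) == b for a, b in pairs) / len(pairs)
```

Training one model on the less complete data (e.g. sessions reconstructed without cookies, or a single site's logs) and another on the more complete data, then taking the difference of their hit rates on the same test sessions, gives a number that could be communicated to users as the personalization benefit of disclosing that data.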
Vision 1: Evaluation, communication design, and trade – detailed explanations of the privacy-personalization tradeoff
[BT04]
Vision 2: Technology – client-side profiling, pseudonymity, identity management, …
Client-side profiles [SH01]:
– Users let privacy agents record all interactions with all Web sites.
– At the user’s discretion, parts of that profile can be made available to marketers or peer networks; this is managed via privacy metadata.
The privacy agent should also provide identity management [JM00]:
– Use new pseudonyms when entering sites, and/or re-use old ones
The user privacy agent should also:
– monitor third-party services to bring problems to the user’s attention (privacy metadata)
Issues to be resolved:
– Need for advanced interfaces to help users adopt a complex technology
– Requires a well-functioning system of market surveillance that is fed back to the user agents, and hence a large enough base of users and contributors
– Contributions of privacy-preserving data mining? (overview:[Clif03])
Cf. [SDGR03, BGS04, KS03]
Thank you for your attention!
Questions?
Further Readings on Part VI (1)
[BGS04] Berendt, B., Günther, O., & Spiekermann, S. (in press). Privacy in E-Commerce: Stated preferences vs. actual behavior. To appear in Communications of the ACM.
[BT04] Berendt, B. & Teltzrow, M. (in press). Addressing Users Privacy Concerns for Improving Personalization Quality: Towards an Integration of User Studies and Algorithm Evaluation. To appear in B. Mobasher & S.S. Anand (Eds.), Intelligent Techniques for Web Personalization. Berlin etc.: Springer. LNAI.
[Clif03] Clifton, C. (2003). Privacy Preserving Data Mining. Tutorial at the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24, 2003, Washington, D.C. http://www.cs.purdue.edu/homes/clifton/DistDM/Clifton_PPDM.pdf
[JM00] Jendricke, U. and Gerd tom Markotten, D. (2000). Usability meets security - The Identity Manager as your personal security assistant for the Internet. In Proceedings of the 16th Annual Computer Security Applications Conference (New Orleans, LA, Dec.).
Further Readings on Part VI (2)
[KS03] Kobsa, A. & J. Schreck (2003): Privacy through Pseudonymity in User-Adaptive Systems. ACM Transactions on Internet Technology 3 (2), 149-183.
[KT04] Kobsa, A. & Teltzrow, M. (2004): Contextualized Communication of Privacy Practices and Personalization Benefits: Impacts on Users’ Data Sharing Behavior. To appear in Proceedings of the 2004 Workshop on Privacy Enabling Technologies, Toronto, Canada, Springer. Draft: http://www.ics.uci.edu/~kobsa/papers/2004-PET-preconference-kobsa.pdf
[SH01] Shearin, S. & Lieberman, H. (2001). Intelligent profiling by example. In Proceedings of the ACM Conference on Intelligent User Interfaces (Santa Fe, NM, January).
[SDGR03] Spiekermann, S., Dickinson, I., Günther, O., & Reynolds, D. (2003). User agents in E-commerce environments: Industry vs. Consumer perspectives on data exchange. In Proc. CAiSE 2003 (pp. 696-710). Springer LNCS.
Further Readings on Part VI (3)
[SMBN03] Spiliopoulou, M., Mobasher, B., Berendt, B., & Nakagawa, M. (2003). A framework for the evaluation of session reconstruction heuristics in Web-usage analysis. INFORMS Journal on Computing, 15, 171-190.
[TK03] Teltzrow, M. and A. Kobsa (2003): Impacts of User Privacy Preferences on Personalized Systems - a Comparative Study. In Proceedings of the CHI-2003 Workshop "Designing Personalized User Experiences for eCommerce: Theory, Methods, and Research", Fort Lauderdale, FL.
[TK04] Teltzrow, M. & Kobsa, A. (2004). Communication of Privacy and Personalization in E-Business. In Proceedings of the Workshop “WHOLES: A Multiple View of Individual Privacy in a Networked World”, Stockholm, Sweden.
[Tim99] Paul Timmers. Electronic Commerce. Wiley, 1999.