Web Analytics Jim Jansen Associate Professor, The Pennsylvania State University

Web analytics presentation

DESCRIPTION

Web analytics presentation given to Penn State ITS office on 19 Oct 2011

Page 1: Web analytics presentation

Web Analytics

Jim Jansen, Associate Professor, The Pennsylvania State University

Page 2: Web analytics presentation

Who is Jim Jansen?
• Associate professor at the College of Information Sciences and Technology, The Pennsylvania State University, USA
• Senior Fellow at the Pew Research Center (Pew Internet and American Life Project) - http://www.pewinternet.org
• Active research and teaching efforts - http://ist.psu.edu/faculty_pages/jjansen/
• Several funded and non-funded research projects
• Teaches several courses, including keyword advertising
• 2011 book, Understanding Sponsored Search (Cambridge) … theory of keyword advertising
• Editor of the journal Internet Research (Emerald)
• Book, Understanding User-Web Interactions via Web Analytics (Morgan & Claypool) - basics of web analytics

Page 3: Web analytics presentation

• Let’s talk web analytics!
• We’ll discuss:
  – context
  – theory
  – application

• Begin by setting the stage … what are we facing?

Page 4: Web analytics presentation

Moving to ‘everything’ recorded and indexed

A lot global but much will remain local

Search (along with data summarization, trend detection, information and knowledge extraction and discovery) is foundational technology

Raises issues, including: infrastructure requirements. How is it paid for, and who pays?

Changes the nature of privacy and anonymity

As publishers or providers, how do we make sense of how people are using this data? --- Web analytics

Explosion of Information - the Zettabytes are coming

There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015, according to Cisco's fifth annual Visual Networking Index (VNI) Forecast.

Page 5: Web analytics presentation

How much is a Zettabyte?

Page 6: Web analytics presentation

1. The volume of data is exploding (information growth)

2. The complexity of data is growing (information architecture)

3. The users have less time (attention economy)

4. The user expects improved features (technological sophistication)


Page 7: Web analytics presentation

Web analytics can help us …
• Deal with the volume of data (information growth)
• Understand the growing complexity of data (information architecture)
• Address users’ limited time (attention economy)
• Lead to the improved features (technological sophistication) expected by users

How does web analytics do this?

Page 8: Web analytics presentation

• Thousand years ago: science was mainly naturalistic, describing natural phenomena
• Last few hundred years: theoretical branch using models, generalizations
• Last few decades: a computational branch simulating complex phenomena
• Today: data exploration (eScience), unifying theory, experiment, and simulation
  – Data captured by sensors, instruments, or generated by simulator
  – Processed by humans and software
  – Information / knowledge stored in computer
  – Analyzes database / collection content using data management and statistics
  – Network and Web Science

Data Information Knowledge

This is the realm of Web analytics!

Page 9: Web analytics presentation

What is web analytics?
• The Web Analytics Association (WAA) defines Web analytics as:
  – the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage (http://www.webanalyticsassociation.org/)
• Shares common theoretical and methodological characteristics with all forms of log analysis (e.g., Intranet logs, systems logs, OPAC logs, search logs, etc.)

Page 10: Web analytics presentation

Let’s break that definition down …
• Collection - accumulate and store over a period of time
• Internet data - internet facts and statistics collected together for reference or analysis
• Measurement - ascertain the size, amount, or degree of something by using an instrument or device
• Analysis - examine methodically the structure of information for purposes of explanation and interpretation
• Reporting - giving a spoken or written account of something that one has investigated
• Understanding - perceive the significance, explanation, or cause of something
• Optimizing - make the best or most effective use of a resource
• Web usage - employ or deploy something as a means of accomplishing a purpose or achieving a result

Data

Information

Knowledge

Page 11: Web analytics presentation

• How is the data collected?

Page 12: Web analytics presentation

W3C Extended Log Format

W3C Extended Log Format: a variety of fields for examining visitors to Web sites.

The other common format is NCSA Separate Log, which is composed of three logs:
• Common log – actions on the server
• Referral log – where visitors came from
• Agent log – information about the client computer

Server-side logging is not the only option. Other methods include page tagging, image cookies, Flash cookies, etc., but the data is still stored in a log.
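To make the log format concrete, here is a minimal sketch of reading W3C Extended Log Format text in Python. It relies on two features of the format: directive lines start with "#", and the "#Fields:" directive names the space-separated fields of the entries that follow. The sample lines, IP addresses, and URLs are made up for illustration, and the sketch assumes no field value contains a space (real logs may need more careful handling).

```python
# Made-up sample in W3C Extended Log Format (addresses and URLs are illustrative).
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(Referer)
2011-10-19 09:15:02 192.0.2.10 GET /index.html 200 -
2011-10-19 09:15:40 192.0.2.10 GET /courses.html 200 http://example.org/index.html
2011-10-19 09:16:05 192.0.2.77 GET /index.html 200 http://search.example.com/?q=web+analytics
"""


def parse_extended_log(text):
    """Return a list of dicts, one per log entry, keyed by the #Fields names."""
    fields = []
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; only #Fields changes how entries are read.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed entries rather than guess
        entries.append(dict(zip(fields, values)))
    return entries


entries = parse_extended_log(SAMPLE_LOG)
for entry in entries:
    print(entry["time"], entry["c-ip"], entry["cs-uri-stem"], entry["cs(Referer)"])
```

Each entry is one recorded behavior: who asked (c-ip), for what (cs-uri-stem), when (date, time), and where they came from (cs(Referer)).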

Page 13: Web analytics presentation

• Okay, that’s collection.
• What about analysis and reporting?

Page 14: Web analytics presentation

Variety of tools help make sense of this log data

Page 15: Web analytics presentation

• With that context, let’s look at the foundational aspects …

Page 16: Web analytics presentation

Theoretical Foundations
• Web analytics is based on the behaviorism paradigm
• Behaviorism – an approach focused on the outward behavioral aspects of thought that emphasizes observed behaviors
• Behaviorism – Pavlov, Watson, and Skinner

Burrhus Frederic Skinner, John B. Watson, Ivan Petrovich Pavlov

Page 17: Web analytics presentation

Behaviorism Characteristics
• Inductive, data-driven, and characterized by empirical observation of measurable behavior
• Grounded on somebody doing something in a situation (all the environmental and situational features are embedded behaviors)
• Critics of behaviorism as a psychological theory have issues with its rejection of mental processes.
• I agree - people are more than “mediators between behavior and the environment” (Skinner, 1993, p. 428) (cf. social learning theory) … however, don’t throw out the baby with the bath water

Page 18: Web analytics presentation

What is a Behavior?

… an observable activity of a person, animal, team, organization, or system.

One can classify behaviors into three general categories. Behaviors are
• something that one can detect and record
• actions or specific goal-driven events with some purpose other than the specific action that is observable
• reactive responses to environmental stimuli

Page 19: Web analytics presentation

What is a Behavior?
• Behavior is the essential construct of behaviorism and of web analytics
• Logs record behaviors of users and systems (a log records behavior but can’t tell affective, cognitive, or situational aspects … yet, but we’re working on it!)
• A behavior is the key variable (i.e., an entity representing a set of events where each event may have a different value)

Page 20: Web analytics presentation

• One can view the data collected in log files as trace data
• People conducting the activities of their daily lives many times create things, create marks, induce wear, or reduce some existing material
• Within the confines of research, these things, marks, and wear become data
• Classically, trace data are the physical remains of people’s interaction

Data Collection: Trace Data

Wear on a carpet

Trash heap

Surfing web

Page 21: Web analytics presentation

Trace Data
• In the past, trace data was often time consuming to gather and process, making such data costly.
• Logging software makes collecting trace data on the Internet easy and cheap
• Log data is controlled accretion data, where the researcher or some other entity alters the environment in order to create the accretion data
• With the use of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective

What is cool about trace data for researchers?

Page 22: Web analytics presentation

Data Collection
Log data/trace data has significant advantages as a data collection approach for the study and investigation of behaviors, including:
• Scale: not a limiting factor as in lab user studies
• Power: large sample size for inference testing; in fact, so large that one must account for the size effect
• Scope: naturalistic; researchers can investigate a range of interactions in a multi-variable context
• Location: can collect in distributed environments
• Duration: can collect log data over an extended period

Page 23: Web analytics presentation

Methodological Foundations

Customer Behavior (video)

Use of logs to collect trace data is an unobtrusive method (a.k.a. non-reactive or low-constraint). Unobtrusive methods …
• allow data collection without directly interfering with the context and,
• do not require a direct response from participants

Chemistry (surface marking)

Page 24: Web analytics presentation

Methodological Foundations
Three justifications for unobtrusive methods:
• Uncertainty principle: researchers interjected into an environment become part of the system
• Observer effect: difference that is made to an activity or a person’s behaviors by being observed
• Observer bias: observers overemphasize behavior they expect to find and fail to notice behavior they do not expect

Trace data helps in overcoming the uncertainty principle, observer effect, and observer bias in data collection. Note: this addresses observer bias in data collection but not in data analysis.

Example: ethnography studies (where the researcher “bird dogs” a study participant)
Example: no one searches for porn in a lab study of Web searching
Example: why medical trials are double blind rather than single blind

Page 25: Web analytics presentation

Methodological Foundations
The method of log data collection has inherent characteristics, and Web analytics has issues to address as a result:
• Abstraction – how does one relate low-level data to higher-level concepts?
• Selection – how does one separate the necessary from unnecessary data?
• Reduction – how does one reduce the complexity and size of the data set?
• Context – how does one interpret the significance of events?
• Evolution – how can one collect data without impacting application deployment or use?

Page 26: Web analytics presentation

• Okay, nice, but how do we apply it …

Page 27: Web analytics presentation

Web analytics process

• Every consulting firm has a web analytics process … (which is fine)

• However, the effective ones all boil down to four essential steps

Page 28: Web analytics presentation

Essential steps to any effective web analytics process

1. Collection of data: typically counts; basically, data collection. Examples: time stamp, referral URL, query term.

2. Processing of data into information: typically ratios; data becomes metrics. Examples: time on page, bounce rate, unique visitors.

3. Developing key performance indicators: counts and ratios infused with business strategy. Examples: conversion rate, average order value, task completion rate.

4. Formulating online strategy: online goals, objectives, or standards for the organization. Examples: save money, make money, market share.

The steps drive one another in both directions.

Page 29: Web analytics presentation

Three types (plus 1) of Web analytics metrics Implementation

• Count: the most basic unit of measure; a single number.
• Ratio: typically, a count divided by a count, although a ratio can use either a count or a ratio in the numerator or denominator.
• KPI (Key Performance Indicator): can be either a count or a ratio, though it is frequently a ratio. A KPI is infused with business strategy, and therefore the set of appropriate KPIs typically differs between site and process types.
• Dimension: data that can be used to define various types of segments and that represents a fundamental dimension of visitor behavior or site dynamics. Typically not associated with a number.

Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
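As a rough illustration of how the four metric types relate, the sketch below derives a count, a ratio, a KPI, and a dimension-based breakdown from the same records. The order records, the referrer types, and the target value are all hypothetical.

```python
# Hypothetical order records: (visitor_id, referrer_type, order_value_usd)
orders = [
    ("v1", "search", 40.0),
    ("v2", "search", 60.0),
    ("v3", "email", 20.0),
    ("v4", "direct", 80.0),
]

# Count: a single number.
order_count = len(orders)

# Ratio: here, total revenue divided by a count.
revenue = sum(value for _, _, value in orders)
average_order_value = revenue / order_count

# KPI: a count or ratio read against a business target (the target is assumed).
TARGET_AOV = 45.0
aov_meets_target = average_order_value >= TARGET_AOV

# Dimension: a non-numeric attribute used to define segments.
orders_by_referrer = {}
for _, referrer_type, _ in orders:
    orders_by_referrer[referrer_type] = orders_by_referrer.get(referrer_type, 0) + 1

print(order_count, average_order_value, aov_meets_target, orders_by_referrer)
```

With these made-up records there are 4 orders and an average order value of 50.0, which clears the assumed target. The same ratio becomes a KPI only once a target or business goal is attached to it.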

Page 30: Web analytics presentation

Can be applied to three levels of granularity

• Aggregate — Total site traffic for a defined period of time. (typically used for market comparisons)

• Segmented — A subset of the site traffic for a defined period of time, filtered in some way to gain greater analytical insight. (by developing personas and profiles in Google Analytics).

• Individual — Activity of a single Web visitor for a defined period of time. (excellent for persona developing and outlier analysis)

Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
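The three levels of granularity can be sketched as three views of one set of records. The page-view records and the choice of referrer type as the segmenting dimension are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical page-view records for one reporting period:
# (visitor_id, referrer_type, page)
page_views = [
    ("v1", "search", "/"),
    ("v1", "search", "/courses"),
    ("v2", "email", "/"),
    ("v3", "search", "/"),
    ("v3", "search", "/contact"),
    ("v3", "search", "/courses"),
]

# Aggregate: total site traffic for the period.
aggregate_views = len(page_views)

# Segmented: the same traffic filtered by a dimension (here, referrer type).
segment_views = Counter(referrer for _, referrer, _ in page_views)

# Individual: the activity of a single visitor.
v3_pages = [page for visitor, _, page in page_views if visitor == "v3"]

print(aggregate_views, dict(segment_views), v3_pages)
```

Here the aggregate is 6 page views, the search segment accounts for 5 of them, and visitor v3 alone viewed three pages.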

Page 31: Web analytics presentation

Classifications of Metrics
• Building Block – foundational metrics
• Visit Characterization – metrics aimed at understanding visits, either single or aggregate
• Content Characterization – metrics aimed at understanding content or its use
• Conversion – metrics aimed at linking visits and content

Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf

Page 32: Web analytics presentation

Building Block
• Page: A page is an analyst-definable unit of content.
• Page Views: The number of times a page was viewed.
• Visits/Sessions: A visit is an interaction by an individual with a website, consisting of one or more requests for a page.
• Unique Visitors: The number of inferred individual people, within a designated reporting timeframe, with activity consisting of one or more visits to a site.
  – New Visitor: The number of Unique Visitors with activity including a first-ever Visit to a site during a reporting period.
  – Repeat Visitor: The number of Unique Visitors with activity consisting of two or more Visits to a site during a reporting period.
  – Return Visitor: The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period and where the Unique Visitor also Visited the site prior to the reporting period.

Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
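The visitor definitions above can be sketched in a few lines. The visit log, visitor names, and reporting period below are hypothetical, and the sketch assumes the log reaches back far enough to tell whether a visit is truly a visitor's first.

```python
from datetime import date

# Hypothetical visit log: (visitor_id, visit_date). One tuple per visit.
visits = [
    ("ann", date(2011, 9, 28)),
    ("ann", date(2011, 10, 3)),
    ("bob", date(2011, 10, 4)),
    ("bob", date(2011, 10, 6)),
    ("cy", date(2011, 10, 5)),
    ("dee", date(2011, 9, 20)),
]

PERIOD_START = date(2011, 10, 1)
PERIOD_END = date(2011, 10, 7)


def classify_visitors(visits, start, end):
    """Return sets of unique, new, repeat, and return visitors for [start, end]."""
    before = {v for v, d in visits if d < start}
    in_period = [v for v, d in visits if start <= d <= end]
    unique = set(in_period)
    new = unique - before            # first-ever visit falls in the period
    repeat = {v for v in unique if in_period.count(v) >= 2}
    returning = unique & before      # also visited before the period
    return unique, new, repeat, returning


unique, new, repeat, returning = classify_visitors(visits, PERIOD_START, PERIOD_END)
print(len(unique), sorted(new), sorted(repeat), sorted(returning))
```

In this made-up week there are 3 unique visitors. bob is both a new and a repeat visitor, ann is a return visitor, and dee does not count at all because she had no visit in the period.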

Page 33: Web analytics presentation

Visit Characteristics
• Entry Page: The first page of a visit.
• Landing Page: A page intended to identify the beginning of the user experience.
• Exit Page: The last page on a site accessed during a visit, signifying the end of a visit/session.
• Visit Duration: The length of time in a session.
• Referrer: The referrer is the page URL that originally generated the request for the current page view or object.
• Click-through: Number of times a link was clicked by a visitor.
• Click-through Rate: The number of click-throughs for a specific link divided by the number of times that link was viewed.
• Page Views per Visit: The number of page views in a reporting period divided by number of visits in the same reporting period.

Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
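A minimal sketch of several of these visit metrics, computed from time-stamped page views. The records and the link counts are hypothetical, and visit duration is taken as the time between the first and last logged page view, which is one common approximation rather than the only possible one.

```python
from datetime import datetime

# Hypothetical page views for two visits: (visit_id, timestamp, page)
views = [
    ("s1", datetime(2011, 10, 19, 9, 0, 0), "/"),
    ("s1", datetime(2011, 10, 19, 9, 2, 30), "/courses"),
    ("s1", datetime(2011, 10, 19, 9, 5, 0), "/contact"),
    ("s2", datetime(2011, 10, 19, 10, 0, 0), "/courses"),
]


def summarize_visits(views):
    """Group page views by visit and derive entry page, exit page, and duration."""
    by_visit = {}
    for visit_id, ts, page in sorted(views, key=lambda v: (v[0], v[1])):
        by_visit.setdefault(visit_id, []).append((ts, page))
    summary = {}
    for visit_id, hits in by_visit.items():
        summary[visit_id] = {
            "entry_page": hits[0][1],
            "exit_page": hits[-1][1],
            # Time between first and last logged page view; time spent on the
            # exit page is not observable from these records.
            "duration_seconds": (hits[-1][0] - hits[0][0]).total_seconds(),
            "page_views": len(hits),
        }
    return summary


summary = summarize_visits(views)
page_views_per_visit = len(views) / len(summary)

# Click-through rate for one link: clicks divided by times the link was viewed.
link_impressions, link_clicks = 200, 14  # assumed counts
click_through_rate = link_clicks / link_impressions

print(summary["s1"], page_views_per_visit, click_through_rate)
```

Note that the single-page visit s2 gets a duration of zero under this approximation, which is one reason time-based metrics need careful interpretation.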

Page 34: Web analytics presentation

Content Characterization
• Page Exit Ratio: Number of exits from a page divided by total number of page views of that page.
• Single Page Visits: Visits that consist of one page regardless of the number of times the page was viewed.
• Single Page View Visits (Bounces): Visits that consist of one page view.
• Bounce Rate: Single page view visits divided by entry pages.

Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
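The content metrics above reduce to a few counters over visit paths. The paths below are hypothetical. The per-page bounce rate here divides bounces that started on a page by the number of visits that entered on that page, which is one reading of "entry pages" in the definition.

```python
from collections import Counter

# Hypothetical visits, each an ordered list of pages viewed.
visit_paths = [
    ["/", "/courses", "/contact"],
    ["/courses"],
    ["/", "/courses"],
    ["/"],
]

page_view_counts = Counter(page for path in visit_paths for page in path)
exit_counts = Counter(path[-1] for path in visit_paths)
entry_counts = Counter(path[0] for path in visit_paths)
bounce_counts = Counter(path[0] for path in visit_paths if len(path) == 1)


def page_exit_ratio(page):
    """Exits from the page divided by page views of the page."""
    return exit_counts[page] / page_view_counts[page]


def bounce_rate(page):
    """Single page view visits entering on the page divided by its entries.

    Raises ZeroDivisionError for a page that no visit entered on.
    """
    return bounce_counts[page] / entry_counts[page]


# Site-wide: every visit has exactly one entry page.
site_bounce_rate = sum(bounce_counts.values()) / len(visit_paths)

print(page_exit_ratio("/courses"), bounce_rate("/"), site_bounce_rate)
```

With these paths, two of the four visits are bounces, so the site-wide bounce rate is 0.5, while the home page on its own bounces one entry in three.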

Page 35: Web analytics presentation

Conversion Metrics

• Event: Any logged or recorded action that has a specific date and time assigned to it by either the browser or server

• Conversion: A visitor completing a target action

Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
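As a small sketch of how events and conversions connect, the code below counts conversions in a time-stamped event log and derives a conversion rate. The event names and the target action are hypothetical, and the rate is computed per visitor; dividing by visits instead is an equally common choice.

```python
from datetime import datetime

# Hypothetical event log: (visitor_id, timestamp, action)
events = [
    ("v1", datetime(2011, 10, 19, 9, 0), "view_page"),
    ("v1", datetime(2011, 10, 19, 9, 4), "submit_application"),
    ("v2", datetime(2011, 10, 19, 9, 10), "view_page"),
    ("v3", datetime(2011, 10, 19, 9, 30), "view_page"),
    ("v4", datetime(2011, 10, 19, 9, 45), "view_page"),
    ("v4", datetime(2011, 10, 19, 9, 50), "submit_application"),
    ("v4", datetime(2011, 10, 19, 9, 55), "submit_application"),
]

TARGET_ACTION = "submit_application"  # assumed target action

visitors = {visitor for visitor, _, _ in events}
converters = {visitor for visitor, _, action in events if action == TARGET_ACTION}

# Conversions counts every completed target action (v4 converted twice).
conversions = sum(1 for _, _, action in events if action == TARGET_ACTION)

# Assumption: the rate is converting visitors over all visitors.
conversion_rate = len(converters) / len(visitors)

print(conversions, len(converters), conversion_rate)
```

The gap between 3 conversions and 2 converting visitors shows why the unit being counted has to be stated alongside the number.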

Page 36: Web analytics presentation

Translating these metrics

• Translating these metrics into meaningful and accurate knowledge is not always easy.

• Real world example – the hotel problem (excellent illustration of the importance of proper period selection)

Page 37: Web analytics presentation

The hotel

Room           Day 1  Day 2  Day 3  Day 4  Day 5  Day 6  Day 7  Weekly count
1              Sam    Sam    Sam    Sam    Sam    Sam    Sam    1
2              Ted    Scott  Ara    Chi    Tom    Yen    Tim    7
3              Jane   Jane   Jane   Jane   Jane   Jane   Jane   1
Daily uniques  3      3      3      3      3      3      3

• Use Daily Uniques: Total Daily Uniques = 21

• Use Weekly Uniques: Total Weekly Uniques = 9

Page 38: Web analytics presentation

Bottom line: the time qualifier matters!

• So, can’t just add daily uniques to get weekly uniques
• Have to scrub the data
• This is just one example of the many issues that one can face when digging into the data in order to get meaningful web analytics data!
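The hotel example is easy to reproduce. The sketch below uses the guest names from the hotel slide (three guests per day over seven days); the pairing of the seven one-night guests with particular days is inferred from the slide layout and does not change the totals.

```python
# Guests per day for the week in the hotel example.
guests_by_day = {
    1: ["Sam", "Ted", "Jane"],
    2: ["Sam", "Scott", "Jane"],
    3: ["Sam", "Ara", "Jane"],
    4: ["Sam", "Chi", "Jane"],
    5: ["Sam", "Tom", "Jane"],
    6: ["Sam", "Yen", "Jane"],
    7: ["Sam", "Tim", "Jane"],
}

# Daily uniques: deduplicate within each day, then add the days up.
daily_uniques = {day: len(set(guests)) for day, guests in guests_by_day.items()}
total_daily_uniques = sum(daily_uniques.values())

# Weekly uniques: deduplicate across the whole week.
weekly_uniques = len(set().union(*guests_by_day.values()))

print(total_daily_uniques, weekly_uniques)  # 21 9
```

Summing the daily figures counts Sam and Jane seven times each, which is why the deduplication has to happen over the same period that the metric reports on.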

Page 39: Web analytics presentation

50 minutes = Can’t Cover Everything

• … some starting points for further reading

Page 40: Web analytics presentation

Research Work (mine)

• Book: Jansen, B. J., Spink, A., and Taksa, I. (2009) Handbook of Research on Web Log Analysis, Hershey, PA: Idea Group Publishing
  – First chapter on theory of log analysis is free!
• Lecture: Jansen, B. J. (2009) Understanding User – Web Interactions via Web Analytics. Morgan-Claypool Lecture Series. Gary Marchionini (Ed.). Morgan-Claypool: San Rafael, CA.
  – manuscript about Web Analytics, soup to nuts
  – companion website (free): http://faculty.ist.psu.edu/jjansen/webanalytics/understanding_web_analytics.html

Page 41: Web analytics presentation

Research Work (mine)

• Article: Jansen, B. J. 2006. Search log analysis: What is it; what's been done; how to do it. Library and Information Science Research, 28(3), 407-432.

• http://faculty.ist.psu.edu/jjansen/academic/pubs/jansen_search_log_analysis.pdf

Page 42: Web analytics presentation

Great ‘how-to’ books for web analytics

• Web Analytics: An Hour a Day by Avinash Kaushik (Jun 5, 2007)

• Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity by Avinash Kaushik (Oct 2009)

• Advanced Web Metrics with Google Analytics, 2nd Edition by Brian Clifton (Mar 15, 2010)

• Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business by Eric Peterson (Mar 2004)

Page 43: Web analytics presentation

Thanks! (welcome questions / discussion!)

Web Analytics

Jim Jansen, Associate Professor, The Pennsylvania State University

Page 44: Web analytics presentation

• Before we end …

Page 45: Web analytics presentation

Follow-on Discussion

• Happy to chat with anyone (get with me either today or contact me via email)

• Email [email protected]
• LinkedIn http://www.linkedin.com/in/jjansen
• Twitter jimjansen

Page 46: Web analytics presentation

Again, thanks!

Web Analytics

Jim Jansen, Associate Professor, The Pennsylvania State University