May 22, 2019

May 22, 2019 - Fulcrum Analytics · Divolte Collector tracks Pageviews, generates IDs at the Session level, Pageview level, and the User level. Divolte Collector tracks the highest



TABLE OF CONTENTS

INTRODUCTION

A BRIEF OVERVIEW OF EACH PLATFORM

DETAILED COMPARISON

Overview

Pageviews

Bounce Rate

Visits / Sessions

User / Visitor / Party

Definition Comparison Summary

Cloud Vs. Internal Database

Geo Tracking

WHAT IS THE BEST FREE PLATFORM?

NEXT STEPS


©2019 Fulcrum Analytics, Inc. All Rights Reserved. No material may be reproduced in any form without the written permission of Fulcrum Analytics.


INTRODUCTION

There are many valuable reasons for tracking website data, which include understanding the appeal and clarity of website content, funneling traffic to specific calls to action, tracking campaign response, measuring engagement among existing customers, identifying geographic reach, and much more. From the simplest website to more complex sites with application forms and transactional elements, website tracking and the associated tagging and reporting should be a tool used by every organization.

For companies that are not yet fully tracking website behavior with reporting metrics, or that are at the earliest stages of adoption, there can be many questions. Chief among these is often which software should be deployed. There are both free tools and paid tools. Some have basic built-in reporting, some have advanced built-in reporting, and some have no reporting whatsoever. While some store collected data in the cloud, which can be problematic if the data includes sensitive information, others allow collected data to reside on servers within an organization’s own environment. Additionally, the available options vary in features, assistance levels, and ease of implementation.

Fulcrum not only measures traffic on its own website, but also implements website tagging and tracking for clients as one of its services. As a proponent of using open source software when possible, both to optimize investment dollars and to stay on the cutting edge of available technology, Fulcrum has been exploring the differences between three free solutions for website tagging and tracking in order to make informed recommendations to its clients. In this whitepaper we share the similarities, differences, advantages, and disadvantages of the following free platforms: Google Analytics, Matomo, and Divolte Collector.

“From the simplest website to complex sites with application forms and transactional elements, website tracking and the associated tagging should be a tool used by every organization”


A BRIEF OVERVIEW OF EACH PLATFORM

Google Analytics is the most widely adopted web analytics platform for two main reasons: it is free, and it is easy to use. Google Analytics requires no technical expertise or deep knowledge to implement basic reporting. The basic, free version has intuitive and informative preconfigured reports (dashboards), and implementation is easy.

Matomo, formerly Piwik, is an open source website analytics platform that can be used effectively within the limits of its free capabilities. Matomo’s tag container is minimally invasive when editing the website HTML, as containers can be implemented with JavaScript loaded within the header of the page. Matomo is well developed for meeting a wide variety of tracking needs, but may have compatibility issues when tracking activity from outdated browsers and operating systems. Likely as a result of these suspected compatibility issues, Matomo recorded the lowest Pageview, Session, and User volume of the three free solutions tested. It is worth noting that Matomo, like Google Analytics, has a paid cloud version, which is feature-rich and includes support.

Divolte Collector was designed to maximize flexibility through a user-defined schema and to allow streaming directly into HDFS or Kafka. It includes a JavaScript resource that is loaded by the website to collect data about the session. Divolte Collector tracks Pageviews and generates IDs at the Session, Pageview, and User levels. Divolte Collector recorded the highest Pageview, Session, and User volume of the three solutions tested. Divolte Collector currently does not have its own tagging or reporting interfaces, but dashboarding interfaces can be built on top of the collected data using customized tools.

Each of these three tools, as implemented, relies on JavaScript being enabled in the client browser and would fail to track activity if visitors’ browsers had it disabled. However, one should be able to implement a “pixel” tracker or some other method to track non-JavaScript clients with varying accuracy, so this does not differentiate Google Analytics from Matomo or Divolte Collector in terms of capabilities. That said, if a visitor’s browser or plugins have settings that render the visitor untrackable, collected data may be incomplete.
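As a rough illustration of the “pixel” fallback, the sketch below parses the tracking parameters of a request for a 1x1 transparent GIF and serves that image back. It uses only Python’s standard library; the `/pixel.gif` path and its query parameters are hypothetical, and this is not how any of the three platforms implements its own non-JavaScript tracking.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

# Standard 43-byte transparent 1x1 GIF returned to the browser.
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;"
)

def parse_pixel_request(path):
    """Return the tracking parameters encoded in a pixel URL's query string."""
    query = urlparse(path).query
    # parse_qs returns a list per key; keep only the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}

class PixelHandler(BaseHTTPRequestHandler):
    """Logs each pixel hit and answers with the transparent GIF."""

    def do_GET(self):
        if urlparse(self.path).path != "/pixel.gif":  # hypothetical endpoint
            self.send_error(404)
            return
        event = parse_pixel_request(self.path)
        event["user_agent"] = self.headers.get("User-Agent", "")
        print(event)  # stand-in for writing to a real event store
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL_GIF)))
        self.send_header("Cache-Control", "no-store")  # so repeat views hit the server again
        self.end_headers()
        self.wfile.write(PIXEL_GIF)
```

To try it locally, one could run `http.server.HTTPServer(("localhost", 8000), PixelHandler).serve_forever()` and embed `<img src="http://localhost:8000/pixel.gif?page=/about">` in a test page. A production tracker would also need cookies, bot filtering, and durable storage.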

“... these three platforms do not track website activity identically and will often track differently for different browsers.”


Matomo and Divolte Collector require database integration, whereas Google Analytics is cloud-based and therefore requires no database configuration. On the plus side for Matomo and Divolte Collector, the collection and storage of data within internally configured databases allows for a higher degree of certainty about the security measures put into place to protect that data. In some cases, this is the only option for organizations if the collected data is regulated or subject to specific security requirements. On the other hand, there are many potential issues to troubleshoot with the internal database setup and configuration that are not a concern when using Google Analytics, which we discuss more fully below.

For our testing, all three platforms currently track activity on both the Fulcrum staging and production websites. For the purposes of this whitepaper, the traffic collection comparison only considers activity on our production website, fulcrumanalytics.com.

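Google Analytics: cloud-based storage; users identified by cookies; built-in UI; free
Matomo: internal MySQL database; visitors identified by session metadata; built-in UI; free (paid plugins available)
Divolte Collector: internal HDFS or Kafka; parties identified by cookies; no UI provided; free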


DETAILED COMPARISON

Overview

This paper draws comparisons from the deployment of three free tagging and tracking solutions, and includes data collected from 2/11/19 to 3/5/19 in the US/Eastern time zone. Google Analytics data was exported from the Google Analytics dashboard, Matomo data was collected in and retrieved from the Fulcrum Cloud MySQL database, and Divolte Collector data was collected in and retrieved from HDFS.

The metrics for this comparison are Bounce Rate; Pageviews; Matomo Visits, Google Analytics Sessions, and Divolte Collector Sessions; and Matomo Visitors, Google Analytics Users, and Divolte Collector Parties. Pageviews, Visits, and Visitors (and the nearest-equivalent metrics from the non-Matomo platforms) are viewed on a daily basis, while Bounce Rate is viewed in aggregate.
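Because Pageviews, Visits, and Visitors are compared per day in US/Eastern, raw event timestamps have to be bucketed into Eastern calendar days before counting; an event just before midnight UTC belongs to the previous Eastern day. A minimal sketch, assuming naive ISO-8601 timestamps in UTC (a hypothetical input format). The whole 2/11/19 to 3/5/19 window falls before daylight saving time began on March 10, 2019, so a fixed UTC-5 offset stands in for US/Eastern here; other periods would need `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9+).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# US/Eastern was on standard time (UTC-5) for the whole 2/11/19-3/5/19 window.
EASTERN_STANDARD = timezone(timedelta(hours=-5), "EST")

def daily_counts(utc_timestamps):
    """Count events per US/Eastern calendar day, given naive ISO-8601 UTC timestamps."""
    counts = Counter()
    for ts in utc_timestamps:
        moment = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
        counts[moment.astimezone(EASTERN_STANDARD).date().isoformat()] += 1
    return dict(sorted(counts.items()))

# Hypothetical raw Pageview timestamps, in UTC.
pageviews = ["2019-02-12T03:30:00", "2019-02-12T05:10:00", "2019-02-12T16:00:00"]
print(daily_counts(pageviews))  # {'2019-02-11': 1, '2019-02-12': 2}
```

The first event is stamped February 12 in UTC but is counted on February 11, because it occurred at 10:30 p.m. Eastern.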

It is worth noting that these platforms do not track website activity identically and will often track differently for different browsers. Furthermore, Matomo, Google Analytics, and Divolte Collector define analogous metrics slightly differently, and data collection methods may differ as well. Definitions and measurement nuances for each of the comparison metrics are described below.

Metric Terminology
Google Analytics: Bounce Rate, Pageviews, Sessions, Users
Matomo: Bounce Rate, Pageviews, Visits, Visitors
Divolte Collector: Bounce Rate, Pageviews, Sessions, Party


Pageviews

A Pageview is recorded when the tracking platform receives a request indicating that a page has been loaded. During the three weeks of platform comparison, the three platforms recorded differing Pageview totals. Theories as to why this happens include failure to parse user agents or to receive requests from archaic browsers or operating systems, and differences in the parsing strategies of the three platforms.

Many of the differences between Matomo and the other platforms are observed in traffic from select browsers, operating systems, and versions thereof, such as Chrome 72 on Windows 10, where Matomo has tracked less activity than either of the other platforms. Safari 12 on Macintosh 10.13 is an interesting case where Google Analytics has tracked far less activity than Matomo or Divolte Collector. Fulcrum is developing an experiment to measure and record the sources of Pageview tracking failures and misclassifications of activity.

Pageviews
Google Analytics: 1,648
Matomo: 1,464 (89% of Google Analytics)
Divolte Collector: 1,713 (104% of Google Analytics)
Definition (all three platforms): an instance of a page being loaded
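The percentages reported in this paper express each platform’s total relative to Google Analytics. The short sketch below reproduces them from the reported totals for Pageviews, Visits/Sessions, and Users/Visitors/Parties; it rounds to one decimal place, whereas the paper rounds most figures to whole percentages.

```python
# Totals for 2/11/19-3/5/19, as reported in this paper.
TOTALS = {
    "Pageviews": {"Google Analytics": 1648, "Matomo": 1464, "Divolte Collector": 1713},
    "Visits / Sessions": {"Google Analytics": 854, "Matomo": 693, "Divolte Collector": 857},
    "Users / Visitors / Parties": {"Google Analytics": 772, "Matomo": 615, "Divolte Collector": 787},
}

def relative_to_baseline(counts, baseline="Google Analytics"):
    """Express each platform's count as a percentage of the baseline platform's count."""
    base = counts[baseline]
    return {platform: round(100 * n / base, 1) for platform, n in counts.items()}

for metric, counts in TOTALS.items():
    print(metric, relative_to_baseline(counts))
```

For Pageviews this yields 88.8% for Matomo and 103.9% for Divolte Collector, matching the 89% and 104% shown above.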


Bounce Rate

Bounce Rate (%)
Google Analytics: 61%. Definition: the percentage of single-page Sessions in which there was no interaction with the page, out of all Sessions; a bounced Session has a duration of 0 seconds.
Matomo: 35%. Definition: the percentage of Visits that had only a single Pageview and no other on-page interaction.
Divolte Collector: 64.3%. Definition: Sessions with one Pageview divided by all Sessions.
Derived comparison (Matomo): 58.91%, when calculated as Visits with one Pageview divided by all Visits.

Bounce Rate is defined similarly by Matomo and Google Analytics, and is not defined by Divolte Collector. Matomo’s Bounce Rate is the percentage of Visits that had only a single Pageview, with no additional tracked on-page interaction, such as a click on an interactive element. Google Analytics’ Bounce Rate is defined as Sessions with only one Pageview and no interactions divided by all Sessions. While these definitions align, Fulcrum’s own Matomo implementation has been tagged to track requests that its Google Analytics implementation does not. In our case, therefore, Google Analytics does not receive requests for interactions on pages and records more bounces than our Matomo implementation. For a like-for-like comparison, Fulcrum calculated a Bounce Rate from the Matomo-collected data while excluding the on-page interactions that Google Analytics, which as implemented currently tracks only Pageviews, does not receive. Fulcrum also defined the Divolte Collector Bounce Rate as a Session-based query: Sessions with one Pageview divided by all Sessions.

On a per-Visit or per-Session basis, without considering on-page interactions, the Bounce Rates calculated from all three platforms are comparable. However, Matomo’s and Google Analytics’ dashboards will continue to display Bounce Rates that account for all tracked activity. In Fulcrum’s case, the resulting differences stem from differences in implementation, not from differences in the platforms’ measurements or methodologies.
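The effect of counting or ignoring on-page interactions can be sketched with a toy calculation. Each Visit is represented as a list of event types, a hypothetical simplification rather than either platform’s schema. Counting Pageviews only corresponds to the basis of Fulcrum’s derived Matomo figure and its Divolte Collector query; also treating tracked interactions as non-bounces resembles what Matomo’s dashboard reports for an implementation that tags clicks.

```python
def bounce_rate(visits, count_interactions):
    """Percentage of visits that bounced.

    Each visit is a list of event types, e.g. ["pageview", "click"].
    A visit bounces if it has exactly one pageview and, when
    count_interactions is True, no other tracked interaction.
    """
    if not visits:
        return 0.0
    bounced = 0
    for events in visits:
        pageviews = events.count("pageview")
        interactions = len(events) - pageviews
        if pageviews == 1 and (not count_interactions or interactions == 0):
            bounced += 1
    return 100 * bounced / len(visits)

# Hypothetical sample: four single-page visits (two with a click) and one two-page visit.
visits = [
    ["pageview"],
    ["pageview"],
    ["pageview", "click"],
    ["pageview", "click"],
    ["pageview", "pageview"],
]
print(bounce_rate(visits, count_interactions=True))   # 40.0
print(bounce_rate(visits, count_interactions=False))  # 80.0
```

The same Visits produce a much lower Bounce Rate when clicks count as engagement, which is the kind of gap seen between Matomo’s 35% dashboard figure and its 58.91% derived figure.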

Visits / Sessions

A Matomo Visit and a Google Analytics Session essentially represent the same concept: the collection of user activity on the website during one use-instance. These concepts are similarly defined by each platform; however, Google Analytics resets Sessions at midnight (in the time zone selected by the Google Analytics admin in the website’s View Settings) as well as after 30 minutes of inactivity. For global and high-traffic organizations that are likely to record Sessions near midnight in the designated time zone, we would therefore expect Google Analytics to split some single use-instances into two Sessions. For low-traffic or non-global websites without near-midnight activity, these differences in data collection will most likely be negligible.
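The two session rules can be compared with a minimal sketch: a new Session starts after 30 minutes of inactivity and, optionally, whenever the calendar date changes, as Google Analytics does at midnight. The timestamps are assumed to already be in the reporting time zone, and the function name and input format are hypothetical; the exact boundary handling of each platform may differ.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

def count_sessions(timestamps, reset_at_midnight):
    """Count sessions in one visitor's activity.

    timestamps: datetimes in the reporting time zone.
    A new session starts after more than 30 minutes of inactivity and, if
    reset_at_midnight is True, whenever the calendar date changes.
    """
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        new_session = (
            previous is None
            or ts - previous > TIMEOUT
            or (reset_at_midnight and ts.date() != previous.date())
        )
        if new_session:
            sessions += 1
        previous = ts
    return sessions

# One visitor browsing from 23:50 to 00:10 with no 30-minute gap.
hits = [datetime(2019, 2, 20, 23, 50), datetime(2019, 2, 21, 0, 5), datetime(2019, 2, 21, 0, 10)]
print(count_sessions(hits, reset_at_midnight=False))  # 1 (Matomo/Divolte-style rule)
print(count_sessions(hits, reset_at_midnight=True))   # 2 (Google Analytics-style rule)
```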

One theory for why Matomo tracks fewer Pageviews, Visits, and Visitors than Google Analytics or Divolte Collector is that some users still use relatively archaic browsers with user agents that Matomo does not recognize. For example, we observe 60 Pageviews from Chrome 27 in Divolte Collector and Google Analytics, but none in Matomo, and we observe 86 fewer Pageviews from Chrome 72 in Matomo than in Google Analytics. Google Analytics, however, has failed to track activity originating from Safari 12 on Macintosh 10.13. One hypothesized cause of these discrepancies is the differing strategies by which each platform parses user agent strings. Fulcrum is developing an experiment that sends test requests with rotating user agent strings to measure the impact of these parsing differences.
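Once test Pageviews with rotating user agent strings have been sent, the comparison reduces to a capture rate per platform and user agent. Because all three tools rely on JavaScript, the test loads would need to come from a browser that runs the tags (for example, a headless browser with an overridden user agent), not from plain HTTP requests. A minimal sketch of the analysis step; the data layout is hypothetical and the counts are placeholders, not measurements.

```python
def capture_rates(sent, recorded):
    """Percentage of sent test Pageviews that each platform recorded, per user agent.

    sent:     {user_agent_label: number of test Pageviews sent}
    recorded: {platform: {user_agent_label: number of Pageviews recorded}}
    """
    return {
        platform: {
            agent: round(100 * by_agent.get(agent, 0) / n_sent, 1)
            for agent, n_sent in sent.items()
            if n_sent > 0  # skip agents with nothing sent to avoid dividing by zero
        }
        for platform, by_agent in recorded.items()
    }

# Placeholder counts to show the shape of the output; not measurements.
sent = {"Chrome 27 / Windows": 50, "Safari 12 / macOS 10.13": 50}
recorded = {
    "Google Analytics": {"Chrome 27 / Windows": 50, "Safari 12 / macOS 10.13": 31},
    "Matomo": {"Safari 12 / macOS 10.13": 50},
    "Divolte Collector": {"Chrome 27 / Windows": 50, "Safari 12 / macOS 10.13": 50},
}
for platform, rates in capture_rates(sent, recorded).items():
    print(platform, rates)
```

A user agent missing from a platform’s results is treated as zero recorded Pageviews, which is exactly the failure mode the experiment is meant to surface.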


Visits / Sessions
Google Analytics: 854 Sessions
Matomo: 693 Visits (81% of Google Analytics)
Divolte Collector: 857 Sessions (100.4% of Google Analytics)
Definitions
Google Analytics Session: a User’s collection of activity, terminating after 30 minutes of inactivity or at midnight
Matomo Visit: a Visitor’s collection of activity, terminating after 30 minutes of inactivity
Divolte Collector Session: a Party’s collection of activity, terminating after 30 minutes of inactivity


User / Visitor / Party

Identifying a Visitor/User/Party as a returning user can be tricky. It is often managed with cookies in Google Analytics and Divolte Collector, or, in Matomo, by recognizing a unique combination of session metadata such as operating system and version, browser and version, plugins, language, and IP address. Each of these strategies has its own pros and cons.

Cookies function by storing a small amount of information in the user’s browser storage that uniquely identifies the user, allowing the platform to retrieve the information when the user returns to the website so that the user isn’t misidentified as new. Cookies also allow the website server to distinguish different users in one home or office who may be sharing an IP address on identically configured computers, recognizing them as unique rather than bundling an entire home or office into one ‘user’. However, cookies by nature fail to recognize when a user has returned from a new device or new browser, and similarly fail to track returning users who have disabled or cleared cookie storage in their browsers.
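The cookie approach can be sketched with Python’s standard `http.cookies` module: on a first visit, a random identifier is issued in a long-lived first-party cookie, and on later visits the same identifier is read back from the request. The cookie name `visitor_id` and the two-year lifetime are hypothetical choices, not the settings of Google Analytics or Divolte Collector.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"                # hypothetical cookie name
MAX_AGE_SECONDS = 2 * 365 * 24 * 60 * 60  # hypothetical two-year lifetime

def identify(cookie_header):
    """Return (visitor_id, set_cookie_value) for a request's Cookie header.

    set_cookie_value is None when the visitor already carries an ID.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None  # returning visitor: reuse the stored ID
    new_id = uuid.uuid4().hex  # first visit, or cookies were cleared: mint a new ID
    response = SimpleCookie()
    response[COOKIE_NAME] = new_id
    response[COOKIE_NAME]["max-age"] = MAX_AGE_SECONDS
    response[COOKIE_NAME]["path"] = "/"
    return new_id, response[COOKIE_NAME].OutputString()

first_id, set_cookie = identify("")                      # first visit: no cookie yet
returning_id, _ = identify(f"{COOKIE_NAME}={first_id}")  # browser sends the cookie back
print(first_id == returning_id)     # True: recognized as the same visitor
print(identify("")[0] == first_id)  # False: cleared cookies look like a new visitor
```

The last line shows the weakness described above: a visitor without the cookie (new browser, new device, or cleared storage) receives a fresh ID.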

Users / Visitors / Parties
Google Analytics: 772 Users
Matomo: 615 Visitors (80% of Google Analytics)
Divolte Collector: 787 Parties (102% of Google Analytics)
Definitions
Google Analytics User: recognized by a first-party cookie
Matomo Visitor: recognized by a combination of session data
Divolte Collector Party: recognized by a first-party cookie

Recognizing returning users by a collection of session metadata, the way Matomo does, is advantageous when a user disables cookies but otherwise consents to tracking. However, this method is similarly weak at tracking returning users from new devices or browsers. Furthermore, if multiple devices with the exact same configuration visit from a single location, Matomo may mistakenly consider them to be one Visitor.
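The metadata strategy can be contrasted with cookies in a short sketch that hashes a combination of attributes similar to those listed for Matomo (operating system, browser, plugins, language, IP address). This is a simplified illustration, not Matomo’s actual algorithm; the attribute values and IP addresses (from documentation-only ranges) are made up.

```python
import hashlib

def metadata_id(os_name, browser, plugins, language, ip_address):
    """Hash a combination of request metadata into a visitor ID (simplified illustration)."""
    parts = [os_name, browser, ",".join(sorted(plugins)), language, ip_address]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]

office_config = ("Windows 10", "Chrome 72", ["pdf"], "en-US")

# Two identically configured office laptops behind one public IP address.
laptop_a = metadata_id(*office_config, "203.0.113.7")
laptop_b = metadata_id(*office_config, "203.0.113.7")
# The first laptop later visits from home, behind a different IP address.
laptop_a_home = metadata_id(*office_config, "198.51.100.20")

print(laptop_a == laptop_b)       # True: two people counted as one Visitor
print(laptop_a == laptop_a_home)  # False: one person counted as two Visitors
```

A cookie-based ID would keep the two office laptops apart and would keep the first laptop’s ID across the IP change, but it would be lost if cookies were cleared.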

A unique user log-in feature with an identifying username or email would be the best option for tracking returning users across devices, regardless of the platform used to measure web traffic. Another user-tracking option would be to promote content with user-unique links that enable the connection of a Visitor/User/Party to an identified person. This way, an organization can understand an individual user’s website activity and recognize them as returning should they use the same link to visit the website from a new device.
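A user-unique link can be as simple as a campaign URL that carries a per-recipient token, which the site or its tracker reads on arrival and maps back to a known contact. A minimal sketch using Python’s standard library; the `rid` parameter name, the secret key, and the HMAC-based token scheme are hypothetical choices, and the key would have to stay on the server.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

SECRET_KEY = b"replace-with-a-server-side-secret"  # hypothetical; keep out of client code

def recipient_token(recipient_id):
    """Derive a hard-to-guess token from an internal recipient ID."""
    digest = hmac.new(SECRET_KEY, recipient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:20]

def unique_link(base_url, recipient_id):
    """Return base_url with a per-recipient token in a hypothetical 'rid' parameter."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    query["rid"] = [recipient_token(recipient_id)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

def recipient_from_link(url, known_recipients):
    """Map the token in an incoming URL back to a known recipient, or None."""
    token = parse_qs(urlparse(url).query).get("rid", [None])[0]
    if token is None:
        return None
    for recipient_id in known_recipients:
        if hmac.compare_digest(token, recipient_token(recipient_id)):
            return recipient_id
    return None

link = unique_link("https://www.example.com/whitepaper?utm_source=email", "contact-42")
print(link)
print(recipient_from_link(link, ["contact-7", "contact-42"]))  # contact-42
```

Because the token travels in the URL, the same person is recognized on any device where the link is opened, though anyone the link is forwarded to would be attributed to the original recipient.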

Similar to the Visit/Session analysis above, Matomo captures about 80% of the unique Visitors measured by Google Analytics and Divolte Collector. Again, based on a comparison of log files, we speculate that Matomo’s incompatibility with older browsers and versions, or differences in user agent parsing, prevent it from capturing the full spectrum of Visitors.

“Another user-tracking option would be to promote content with user-unique links that enable the connection of a Visitor/User/Party to an identified person.”


Definition Comparison Summary

Pageviews
Google Analytics: an instance of a page being loaded
Matomo: an instance of a page being loaded
Divolte Collector: an instance of a page being loaded

Bounce
Google Analytics: the percentage of single-page Sessions in which there was no interaction with the page
Matomo: the percentage of Visits with only one Pageview and no other on-page interaction
Divolte Collector: Sessions with one Pageview divided by all Sessions

Visit / Session
Google Analytics: termed Session; a User’s collection of activity, terminating after 30 minutes of inactivity or at midnight
Matomo: termed Visit; a Visitor’s collection of activity, terminating after 30 minutes of inactivity
Divolte Collector: termed Session; a Party’s collection of activity, terminating after 30 minutes of inactivity

Visitor / User / Party
Google Analytics: User, recognized by a first-party cookie
Matomo: Visitor, recognized by a combination of session data including operating system, browser, browser plugins, IP address, and browser language
Divolte Collector: Party, recognized by a first-party cookie

In summary, while the three platforms define Pageviews the same way, the other main benchmark metrics are measured differently across platforms, leading to inconsistent figures that are important to recognize when choosing a platform and interpreting the collected data.


Cloud vs. Internal Database

A perceived advantage of Matomo or Divolte Collector over Google Analytics and other cloud-analytics platforms is the on-site nature of the tools, as organizations are not required to send their data to a third-party cloud. Organizations can take control over their own data retention, owning their back-up protocol and not relying on a third party to provide security controls. This is an important difference for organizations that may be required to comply with security and privacy regulations (e.g., HIPAA, GDPR). Furthermore, having direct access to, and ownership of, the collected source data enables customized analytics.

The choice between Divolte Collector and Matomo as an on-site tool and database infrastructure should depend on the scale of data that the organization’s website is expected to generate, as well as on the frequency of analysis and the data explored. Divolte Collector’s HDFS data storage is excellent for storing large amounts of data that may cause performance issues when querying a MySQL database table, but data pulls and analytical queries from HDFS storage may take longer than desired to complete. For most organizations, either solution offers sufficient storage capacity. Matomo is better suited to frequently updated analytical results, while Divolte-Hadoop provides the security of distributed data storage and greater capacity when MySQL does not perform sufficiently.

On the other hand, on-premise solutions do incur server maintenance and energy costs, and such solutions inherently require configuration labor and unit testing, barriers to implementation that cloud services may have already addressed and streamlined. Organizations may need to scale a steep learning curve in implementing or troubleshooting the on-premise tools. With Google Analytics, due to its pre-packaging and cloud-based storage, implementation, maintenance, and analysis are relatively fast, easy, free, and do not require strong technical skills.

“Organizations may need to scale a steep learning curve in implementing or troubleshooting the on-premise tools.”


Geo Tracking

Geographic activity tracking is implemented by looking up the last known location of an IP address in a regularly refreshed database. Companies such as MaxMind and IP2Location provide these databases for purchase at various price points, levels of granularity, and accuracy. IP-Geolocation database providers are clear that the location mappings they provide are approximate, but these mappings are often suitable for business needs.

Google Analytics implements geographic tracking by deriving an approximate location from the user’s IP address. Google Analytics’ own documentation is not transparent about which IP-Geolocation database it leverages, but it does note that IP-location mappings can be inaccurate, and it gives Google Analytics users the option to upload a custom or third-party IP-Geolocation database file.

Regarding Matomo’s geographic tracking accuracy, Fulcrum uses the free IP2Location™ LITE IP-COUNTRY-REGION-CITY Database, which claims greater than 60% accuracy, as an alternative to the Matomo default GeoLite2-City database, which is also free but claims city-level accuracy of 50%. Paid IP-Geolocation databases would enhance geographic tracking accuracy to varying degrees, regardless of the database service used.

When geographic tracking is enabled in Divolte Collector, the platform leverages a subscribed-to MaxMind geolocation database, such as the free GeoLite2-City database. One must only ensure that the Divolte Collector configuration references the correct save location of the database on the Divolte Collector machine.

“IP-Geolocation database providers are clear that location mappings provided are approximate, but these mappings are often suitable for business needs.”


What is the Best Free Platform?

The best free website analytics platform for any given company depends on its security needs, available resources, and desire for detailed reporting. While Google Analytics stores the collected data in the cloud, Matomo and Divolte Collector allow organizations to maintain control over the data by keeping it in internal databases. Deciding between Matomo and Divolte Collector can come down to determining whether a MySQL or an HDFS configuration is more appropriate.

Google Analytics is therefore much easier to implement and maintain than the in-house configurations required for Matomo and Divolte Collector. However, while it is the easiest to implement, Google Analytics has limited dashboard reporting options. Matomo’s built-in dashboard reporting is more flexible than Google Analytics’ but is still not completely flexible. Divolte Collector has no built-in reporting, but when it is used in conjunction with a reporting tool such as Dash or Superset, a high degree of dashboard customization is possible.

“...depends on its security needs, available resources, and desire for detailed reporting.”

Platforms were compared on Data Ownership, Maintenance, Dashboard, and Implementation:
Google Analytics: good but limited dashboard; implementation not technical
Matomo: good and flexible dashboard; implementation moderately technical
Divolte Collector: potentially flexible dashboard; implementation technically intensive


Next Steps

While the open-source features of Matomo have proven to create a useful web-analytics platform for many website-reporting needs, some desired analyses are only enabled using paid plugins from the Matomo marketplace. Looking forward, Fulcrum will continue to explore the Matomo marketplace of paid and free plugins, which are designed to unlock the full potential of the Matomo platform, to assess which plugins would be useful and cost-effective for a variety of business cases. Examples of these plugins-for-purchase include mouse heat mapping and scroll tracking, as well as detailed media analytics.

Additionally, in the coming months, Fulcrum will design and execute a controlled experiment to home in on the differences between Matomo, Google Analytics, and Divolte Collector when tracking website activity across different browsers, operating systems, and versions thereof. Website analytics is never 100% accurate, but experimentation will be key to understanding these platforms’ relative strengths and weaknesses, and how to use each of them to maximize the intelligence they can provide to guide investment in website features, content, and campaigns.

Stay tuned for future extensions of our analysis! In the meantime, contact us with any questions you may have about how to improve your organization’s website tagging and tracking.

fulcrumanalytics.com
[email protected]
(212) 651-7000
facebook.com/fulcrumanalytics
linkedin.com/company/fulcrum-analytics