May 22, 2019
TABLE OF CONTENTS
INTRODUCTION
A BRIEF OVERVIEW OF EACH PLATFORM
DETAILED COMPARISON
Overview
Pageviews
Bounce Rate
Visits / Sessions
User / Visitor / Party
Definition Comparison Summary
Cloud Vs. Internal Database
Geo Tracking
WHAT IS THE BEST FREE PLATFORM?
NEXT STEPS
©2019 Fulcrum Analytics, Inc. All Rights Reserved. No material may be reproduced in any form without the written permission of Fulcrum Analytics.
INTRODUCTION
There are many valuable reasons for tracking website data, including understanding the appeal and clarity of website content, funneling traffic to specific calls to action, tracking campaign response, measuring engagement among existing customers, and identifying geographic reach. From the simplest website to more complex sites with application forms and transactional elements, website tracking and the associated tagging and reporting should be a tool used by every organization.
For companies that are not yet fully tracking website behavior with reporting metrics, or that are at the earliest stages of adoption, there can be many questions. Chief among these is often which software to deploy. There are both free tools and tools available for a fee. There are tools with built-in basic reporting, tools with built-in advanced reporting, and some with no reporting whatsoever. Some store collected data in a cloud, which can be problematic if the data includes sensitive information, while other tools allow collected data to reside on servers within an organization’s own environment. Additionally, the available options vary in features, assistance levels, and ease of implementation.
Fulcrum not only measures traffic on its own website, but also implements website tagging and tracking for clients as one of its services. As a proponent of open source software where possible, both to optimize investment dollars and to stay on the cutting edge of available technology, Fulcrum has been exploring the differences between three free solutions for website tagging and tracking in order to make informed recommendations to its clients. In this whitepaper we share the similarities, differences, advantages, and disadvantages of the following free platforms: Google Analytics, Matomo, and Divolte Collector.
A BRIEF OVERVIEW OF EACH PLATFORM
Google Analytics is the most widely adopted web analytics platform for two main reasons: it is free, and it is easy to use. Google Analytics requires no deep technical knowledge in order to implement basic reporting. The basic, free version has preconfigured reports (dashboards) that are intuitive and informative, and implementation is easy.
Matomo, formerly Piwik, is an open source website analytics platform that can be used effectively within the limits of its free capabilities. The Matomo tag container is minimally invasive when editing the website HTML, as containers can be implemented with JavaScript loaded within the header of the page. Matomo is well developed for meeting a wide variety of tracking needs, but may have compatibility issues when tracking activity from outdated browsers and operating systems. Likely as a result of these suspected compatibility issues, Matomo records the lowest Pageview, Session, and User volume of the three free solutions tested. It is worth noting that Matomo, like Google Analytics, has a paid cloud version, which is feature-rich and includes support.
Divolte Collector was designed to maximize flexibility through a user-defined schema and to allow streaming directly into HDFS or Kafka. It includes a JavaScript resource which is loaded by the website to collect data about the session. Divolte Collector tracks Pageviews and generates IDs at the Session level, the Pageview level, and the User level. Divolte Collector tracks the highest Pageview, Session, and User volume of the three solutions tested. Divolte Collector currently does not have its own tagging or reporting interfaces, but dashboarding interfaces can be built on top of the collected data using customized tools.
Each of these three tools, as implemented, relies on JavaScript being enabled in the client browser and would fail to track activity if visitors’ browsers had it disabled. However, one should be able to implement a “pixel” tracker or some other method to track non-JavaScript clients with varying accuracy, so this does not differentiate Google Analytics from Matomo or Divolte Collector in terms of capabilities. That said, if a visitor’s browser or plugins have settings that render the visitor untrackable, collected data may be incomplete.
Matomo and Divolte Collector require database
integration, whereas Google Analytics is cloud-
based and therefore no database configuration
is required. On the plus side for Matomo and
Divolte Collector, the collection and storage of
data within internally configured databases al-
lows for a higher degree of certainty about the
security measures put into place to protect that
data. In some cases, this is the only option for
organizations if the collected data is regulated
or subject to specific security requirements. On
the other hand, there are many potential issues
to troubleshoot with the internal database
setup and configuration that are not a concern
when using Google Analytics, which we discuss
more fully below.
For our testing, all three platforms currently
track activity on both the Fulcrum staging and
production websites. For the purposes of this
whitepaper, traffic collection comparison only
considers activity on our production website,
fulcrumanalytics.com.
Google Analytics: Cloud Based, Cookies, UI Built-in, FREE
Matomo: Internal MySQL, Session Metadata, UI Built-in, FREE (Paid Plugins)
Divolte Collector: Internal HDFS or Kafka, Cookies, No UI Provided, FREE
DETAILED COMPARISON
Overview
This paper draws comparisons from the deploy-
ment of three free tagging and tracking solu-
tions, and includes data collected from 2/11/19
to 3/5/19 in the US/Eastern time zone. Google
Analytics data was exported from the Google
Analytics dashboard, while Matomo data was
collected in and retrieved from the Fulcrum
Cloud MySQL database and Divolte Collector
data was collected and retrieved from HDFS.
The metrics for this comparison are Bounce Rate, Pageviews, Visits (Matomo) / Sessions (Google Analytics and Divolte Collector), and Visitors (Matomo) / Users (Google Analytics) / Parties (Divolte Collector). Pageviews, Visits, and Visitors (and the nearest-equivalent metrics from the non-Matomo platforms) are viewed on a daily basis, while Bounce Rate is viewed in aggregate.
It is worth noting that these platforms do not
track website activity identically and will often
track differently for different browsers. Further-
more, Matomo, Google Analytics, and Divolte
Collector define analogous metrics slightly
differently, and data collection methods may
differ as well. Definitions and measurement
nuances for each of the comparison metrics are
described below.
Metric Terminology
Google Analytics: Bounce Rate, Pageviews, Sessions, Users
Matomo: Bounce Rate, Pageviews, Visits, Visitors
Divolte Collector: Bounce Rate, Pageviews, Sessions, Party
Pageviews
A Pageview is recorded when the web server receives a request to load a page. During the three weeks of platform comparison, the three platforms recorded differing Pageview totals. Theories as to why this happens include failures to parse user agents or to receive requests from archaic browsers or operating systems, and differences in the parsing strategies of the three platforms.
Many of the differences between Matomo and
the other platforms are observed in traffic from
select browsers, operating systems, and ver-
sions thereof, such as Chrome 72 on Windows
10, where Matomo has tracked less activity
than either of the other platforms. Safari 12 on
Macintosh 10.13 is an interesting case where
Google Analytics has tracked far less activity
than Matomo or Divolte Collector. Fulcrum is de-
veloping an experiment to measure and record
the sources of Pageview tracking failures and
misclassifications of activity.
Pageviews
Google Analytics: 1648
Matomo: 1464 (89% of Google Analytics)
Divolte Collector: 1713 (104% of Google Analytics)
Definition (all three platforms): An instance of a page being loaded
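The percentages reported next to the Matomo and Divolte Collector totals appear to express each platform's volume relative to Google Analytics. A minimal sketch of that arithmetic, assuming Google Analytics is the baseline (the function name is illustrative and not taken from any of the platforms):

```python
# Pageview totals reported in this comparison (2/11/19 to 3/5/19).
pageviews = {
    "Google Analytics": 1648,
    "Matomo": 1464,
    "Divolte Collector": 1713,
}


def relative_to_baseline(totals, baseline="Google Analytics"):
    """Return each platform's total as a percentage of the baseline total."""
    base = totals[baseline]
    return {name: 100.0 * count / base for name, count in totals.items()}


for name, pct in relative_to_baseline(pageviews).items():
    print(f"{name}: {pct:.0f}%")
```

Rounded to whole percentages, this gives 89% for Matomo and 104% for Divolte Collector, matching the reported figures.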
Bounce Rate
Bounce Rate (%)
Google Analytics: 61%. Definition: The percentage of single-page Sessions in which there was no interaction with the page out of all Sessions. A bounced Session has a duration of 0 seconds.
Matomo: 35%. Definition: The percentage of Visits that only had a single Pageview and no other on-page interaction.
Divolte Collector: not defined by the platform. Derived as Sessions with one Pageview divided by all Sessions.
Derived Comparison (Matomo, when calculated as Visits with one Pageview divided by all Visits, and Divolte Collector): 64.3%, 58.91%
Bounce Rate is defined similarly between Ma-
tomo and Google Analytics, and is not defined
by Divolte Collector. Matomo’s Bounce Rate
calculates the percentage of Visits that only had
a single Pageview, with no additional tracked
on-page interaction, such as a click on an inter-
active element. Google Analytics’ Bounce Rate
is defined as Sessions with only one Pageview
and no interactions divided by all Sessions.
While these definitions align, Fulcrum’s own Matomo implementation has been tagged to track requests that its Google Analytics implementation does not. In our case, therefore, Google Analytics does not receive requests for on-page interactions and records more bounces than our Matomo implementation. For a like-for-like comparison, Fulcrum calculated a Bounce Rate from the Matomo-collected data while excluding the on-page interactions that Google Analytics, which as implemented currently tracks only Pageviews, does not receive. Fulcrum also defined the Divolte Collector Bounce Rate as a Session-based query.
On a per-Visit or per-Session basis, without considering on-page interactions, the Bounce Rates calculated from all three platforms are comparable. However, Matomo’s and Google Analytics’ dashboards will continue to display Bounce Rates that account for all tracked activity, and these differences, in Fulcrum’s case, are the result of differences in implementation, not differences in the platforms’ measurements or methodologies.
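The per-Session calculation described above (Sessions or Visits with one Pageview divided by all Sessions or Visits) can be sketched as follows. This is an illustration only, not Fulcrum's actual query: the input format, the function name, and the sample data are assumptions.

```python
from collections import Counter


def bounce_rate(pageview_session_ids):
    """Share of sessions with exactly one pageview, as a percentage.

    `pageview_session_ids` holds one session (or visit) ID per recorded
    pageview, so a session's pageview count is how often its ID appears.
    On-page interactions are deliberately ignored.
    """
    per_session = Counter(pageview_session_ids)
    if not per_session:
        return 0.0
    bounced = sum(1 for n in per_session.values() if n == 1)
    return 100.0 * bounced / len(per_session)


# Hypothetical data: sessions "a" and "c" bounce, "b" has three pageviews.
sample = ["a", "b", "b", "b", "c"]
print(f"{bounce_rate(sample):.1f}%")  # 66.7%
```

Because the calculation only needs a session identifier per Pageview, the same logic can be applied to data exported from any of the three platforms.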
Visits / Sessions
A Matomo Visit and a Google Analytics Session essentially represent the same concept: the collection of user activity on the website during one use instance. These concepts are similarly defined by each platform; however, Google Analytics resets Sessions at Midnight (in the Google Analytics admin’s selected time zone, defined in the website View Settings) as well as following 30 minutes of inactivity. For global and high-traffic organizations with Sessions likely to be recorded near Midnight in the designated time zone, we would expect some activity in Google Analytics to be falsely split into two use instances. For low-traffic or non-global websites without near-Midnight activity, these differences in data collection will most likely be negligible.
One theory for why Matomo tracks fewer Pageviews, Visits, and Visitors than Google Analytics or Divolte Collector relates to some users still using relatively archaic browsers, with user agents that Matomo does not recognize. For example, we observe 60 Pageviews coming from Chrome 27 in Divolte Collector and Google Analytics, but none in Matomo, and we observe 86 fewer Pageviews from Chrome 72 in Matomo than in Google Analytics. Google Analytics, however, has failed to track activity originating from Safari 12 on Macintosh 10.13. A hypothesized cause for these discrepancies is the differing strategies by which user agent strings are parsed in each platform. Fulcrum is developing an experiment that sends test requests with rotating user agent strings to measure the impact of these parsing differences.
Visits / Sessions
Google Analytics: 854. Session: A User’s collection of activity terminating with 30 minutes of inactivity or at Midnight
Matomo: 693 (81% of Google Analytics). Visit: A Visitor’s collection of activity terminating with 30 minutes of inactivity
Divolte Collector: 857 (100.4% of Google Analytics). Session: A Party’s collection of activity terminating with 30 minutes of inactivity
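The Session and Visit definitions above differ only in the Midnight reset. A minimal sketch of both rules, assuming timestamps are already expressed in the reporting time zone (the function name and the sample timestamps are hypothetical, and the exact boundary handling at 30 minutes is an assumption):

```python
from datetime import datetime, timedelta

INACTIVITY_TIMEOUT = timedelta(minutes=30)


def count_sessions(timestamps, reset_at_midnight=False):
    """Count sessions for one visitor from pageview timestamps.

    A new session starts after more than 30 minutes of inactivity and,
    optionally, whenever the calendar date changes (a Midnight reset).
    """
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        new_session = (
            previous is None
            or ts - previous > INACTIVITY_TIMEOUT
            or (reset_at_midnight and ts.date() != previous.date())
        )
        if new_session:
            sessions += 1
        previous = ts
    return sessions


# Hypothetical visitor browsing across Midnight with no long pause.
hits = [
    datetime(2019, 2, 11, 23, 50),
    datetime(2019, 2, 11, 23, 58),
    datetime(2019, 2, 12, 0, 5),
]
print(count_sessions(hits))                          # 1
print(count_sessions(hits, reset_at_midnight=True))  # 2
```

With the Midnight reset enabled, the same continuous activity is counted as two Sessions, which is the split described for Google Analytics.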
User / Visitor / Party
Identifying a Visitor/User/Party as a returning
user can be tricky, but is often managed with
cookies within Google Analytics and Divolte, or
by recognizing a unique combination of session
metadata such as OS and Version, Browser and
Version, Plugins, Language, and IP address in
Matomo. These strategies each present their
own pros and cons.
Cookies function by storing a small amount
of information in the user’s browser storage,
uniquely identifying the user, allowing the
platform to retrieve the information when a
user returns to the website so the user isn’t
misidentified as new. Cookies also allow the
website server to track different users from
one home or office who may be sharing an IP
address with computers all identically config-
ured, and can recognize them as being unique
rather than bundling an entire home or office
into one ‘user’. However, cookies by nature fail
to recognize when a user has returned from a
new device or new browser and similarly fail to
track returning users when they have disabled
or cleared cookie storage within their browsers.
User / Visitor / Parties
Google Analytics: 772. User: recognized by a first party cookie
Matomo: 615 (80% of Google Analytics). Visitor: recognized by a combination of session data
Divolte Collector: 787 (102% of Google Analytics). Party: recognized by a first party cookie
Recognizing returning users by a collection of session metadata the way Matomo does is advantageous in the case of a user disabling cookies but otherwise consenting to tracking.
However, this method is similarly weak in track-
ing return users from new devices or browsers.
Furthermore, if multiple devices with the same
exact configuration are visiting from a single lo-
cation, Matomo may mistakenly consider these
to be one Visitor.
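To illustrate why identically configured devices can collapse into one Visitor, the metadata-based approach can be sketched as hashing the session attributes into an identifier. This is not Matomo's actual algorithm; the function name, the choice of SHA-256, and the sample values are assumptions made for illustration.

```python
import hashlib


def fingerprint_visitor_id(os_name, browser, plugins, language, ip_address):
    """Derive a visitor ID from session metadata (no cookie needed)."""
    parts = [os_name, browser, ",".join(sorted(plugins)), language, ip_address]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


# Two hypothetical office machines, identically configured, sharing one IP.
desk_1 = fingerprint_visitor_id("Windows 10", "Chrome 72", ["pdf"], "en-US", "203.0.113.7")
desk_2 = fingerprint_visitor_id("Windows 10", "Chrome 72", ["pdf"], "en-US", "203.0.113.7")
print(desk_1 == desk_2)  # True: the two machines collapse into one visitor

# The same person on a different browser looks like a new visitor.
other = fingerprint_visitor_id("Windows 10", "Firefox 65", ["pdf"], "en-US", "203.0.113.7")
print(desk_1 == other)  # False
```

A cookie-based identifier would distinguish the two office machines but would still treat the second browser as a new visitor.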
A unique user log-in feature with an identifying
username or email would be the best option for
tracking returning users across devices regard-
less of the platform being used to measure web
traffic. Another user-tracking option would be
to promote content with user-unique links that
enable the connection of a Visitor/User/Party to
an identified person. This way, an organization
can understand an individual user’s website
activity and recognize them as returning should
they use the same link to visit the website from
a new device.
Similar to the Visit/Session analysis above, Ma-
tomo captures about 80% of the unique Visi-
tors compared to Google Analytics and Divolte
Collector’s measurements. Again, we speculate
based on a comparison of log files that Mato-
mo’s incompatibility with older browsers and
versions, or differences in user agent parsing,
prevent it from capturing the full spectrum of
Visitors.
Definition Comparison Summary
Google Analytics
Pageviews: An instance of a page being loaded
Bounce: Sessions with one pageview divided by all sessions
Visit / Session: Termed Session: A User’s collection of activity terminating with 30 minutes of inactivity or at Midnight
Visitor / User / Party: User: recognized by a first party cookie
Matomo
Pageviews: An instance of a page being loaded
Bounce: The percentage of visits with only one page view
Visit / Session: Termed Visit: A Visitor’s collection of activity terminating with 30 minutes of inactivity
Visitor / User / Party: Visitor: recognized by a combination of session data including operating system, browser, browser plugins, IP address and browser language
Divolte Collector
Pageviews: An instance of a page being loaded
Bounce: The percentage of single-page sessions in which there was no interaction with the page
Visit / Session: Termed Session: A Party’s collection of activity terminating with 30 minutes of inactivity
Visitor / User / Party: Party: recognized by a first party cookie
In summary, while the three platforms define Pageviews the same way, the other main benchmark metrics are defined and measured differently across platforms, leading to inconsistent figures that are important to recognize when choosing a platform and interpreting the data collected.
Cloud vs. Internal Database
A perceived advantage of Matomo or Divolte Collector over Google Analytics and other cloud analytics platforms is the on-site nature of the tools, as organizations are not required to send their data to a third-party cloud. Organizations can take control of their own data retention, owning the backup protocol and not relying on a third party to provide security controls. This is an important difference for organizations that may be required to comply with security and privacy regulations (e.g. HIPAA, GDPR). Furthermore, having direct access to, and ownership of, the collected source data enables customized analytics.
The choice between Divolte Collector and Matomo as an on-site tool and database infrastructure should depend on the scale of data that the organization’s website is expected to generate, as well as the frequency of analysis and the data explored. Divolte Collector’s HDFS data storage is excellent for storing large amounts of data that may cause performance issues when querying a MySQL database table, but data pulls and analytical queries from HDFS storage may take longer than desired to complete. For most organizations, either solution is sufficient for storage capacity. Matomo is more conducive to frequently updated analytical results, while Divolte with Hadoop provides the security of distributed data storage and greater capacity when MySQL does not perform sufficiently.
On the other hand, on-premises solutions do incur server maintenance and energy costs, and such solutions inherently require configuration labor and unit testing, barriers to implementation that cloud services may have already addressed and streamlined. Organizations may need to scale a steep learning curve in implementing or troubleshooting the on-premises tools. With Google Analytics, due to its pre-packaging and cloud-based storage, implementation, maintenance, and analysis are relatively fast, easy, and free, and do not require strong technical skills.
Geo Tracking
Geographic activity tracking is implemented by looking up the last known location of an IP address in a regularly refreshed database. Companies such as MaxMind and IP2Location provide these databases for purchase at various price points and levels of granularity and accuracy. IP geolocation database providers are clear that the location mappings provided are approximate, but these mappings are often suitable for business needs.
Google Analytics implements geographic tracking by deriving an approximate location from the user’s IP address. Google Analytics’ own documentation is not transparent about which IP geolocation database it leverages, but it does note that IP-location mappings can be inaccurate, and it provides Google Analytics users with the option to upload a custom or third-party IP geolocation database file.
Regarding Matomo’s geographic tracking accuracy, Fulcrum leverages the free IP2Location™ LITE IP-COUNTRY-REGION-CITY Database, which claims greater than 60% accuracy, as an alternative to the Matomo default GeoLite2-City database, which is also free but claims city-level accuracy of 50%. Paid IP geolocation databases would enhance geographic tracking accuracy to varying degrees, regardless of the database service used.
When geographic tracking is enabled in Divolte, the platform leverages a subscribed-to MaxMind geolocation database, such as the free GeoLite2-City database. One need only ensure that the Divolte Collector configuration references the correct save location of the database on the Divolte Collector machine.
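Conceptually, an IP geolocation lookup maps an address to the network range that contains it and returns the location stored for that range. The sketch below illustrates the idea with a tiny in-memory table; it does not reflect the file format or API of any MaxMind or IP2Location product, and the networks (reserved documentation ranges) and locations are made up.

```python
import ipaddress

# Hypothetical rows standing in for an IP geolocation database. The networks
# are reserved documentation ranges and the locations are made up.
GEO_ROWS = [
    (ipaddress.ip_network("192.0.2.0/24"), "US", "New York"),
    (ipaddress.ip_network("198.51.100.0/24"), "DE", "Berlin"),
]


def lookup_location(ip_string):
    """Return (country, city) for the first matching network, else None."""
    address = ipaddress.ip_address(ip_string)
    for network, country, city in GEO_ROWS:
        if address in network:
            return country, city
    return None


print(lookup_location("192.0.2.44"))   # ('US', 'New York')
print(lookup_location("203.0.113.9"))  # None
```

Real databases hold far more ranges and are searched with indexed structures rather than a linear scan, but an address outside every known range still yields no location, and a stale range yields an outdated one, which is why the mappings are treated as approximate.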
What is the Best Free Platform?
The best free website analytics platform for any given company depends on its security needs, available resources, and desire for detailed reporting. While Google Analytics stores the collected data in the cloud, Matomo and Divolte Collector allow organizations to maintain control over the data by keeping it in internal databases. Deciding between Matomo and Divolte Collector can come down to determining whether a MySQL or HDFS configuration is more appropriate.
Because it is cloud-based, Google Analytics is much easier to implement and maintain than the in-house configurations required for Matomo and Divolte Collector. However, Google Analytics, while the easiest to implement, has limited dashboard reporting options. Matomo’s built-in dashboard reporting has more flexibility than Google Analytics but is still not completely flexible. Divolte Collector does not have any built-in reporting but, when used in conjunction with a reporting tool such as Dash or Superset, a high degree of dashboard customization is possible.
[Figure: each platform rated +, +/-, or - on Data Ownership, Maintenance, Dashboard, and Implementation. Dashboard: Good but Limited (Google Analytics), Good and Flexible (Matomo), Potentially Flexible (Divolte Collector). Implementation: not Technical (Google Analytics), Moderately Technical (Matomo), Technically Intensive (Divolte Collector).]
Next Steps
While the open-source features of Matomo have proven to create a useful web analytics platform for many website-reporting needs, some desired analyses are only enabled using paid plugins from the Matomo marketplace. Looking forward, Fulcrum will continue to explore the Matomo marketplace of paid and free plugins, which are designed to unlock the full potential of the Matomo platform, to assess which plugins would be useful and cost-effective for a variety of business cases. Examples of these plugins for purchase include Mouse Heat Mapping and Scroll Tracking, as well as detailed Media Analytics.
Additionally, in the coming months, Fulcrum will design and execute a controlled experiment to home in on the differences between Matomo, Google Analytics, and Divolte Collector when tracking website activity across different browsers, operating systems, and versions thereof. Website analytics is never 100% accurate, but experimentation will be key to understanding these platforms’ relative strengths and weaknesses, and how to use each of them to maximize the intelligence they can provide to guide investment in website features, content, and campaigns.
Stay tuned for future extensions of our analysis! In the meantime, contact us with any questions you may have about how to improve your organization’s website tagging and tracking.
fulcrumanalytics.com
(212) 651-7000
facebook.com/fulcrumanalytics
linkedin.com/company/fulcrum-analytics