7/23/2019 Web Data Mart
1/12
A STUDY ON DESIGN AND ANALYSIS
OF WEB MART MINING AND ITS
RELEVANCE TODAY
Ravikumar G K*
*Dr. MGR University, Chennai, Tamilnadu, INDIA,
Manjunath T. N+
+Bharathiar University, Coimbatore, Tamil Nadu, INDIA,
Ravindra S. Hegadi#
#Karnatak University,Dharwad,Karnataka,INDIA,
Archana R A++
++SJB Institute of Technology,Bangalore,Karnataka,INDIA,
Abstract:
Data warehousing is one of the latest trends in computing environments and information technology
applications. A data warehouse is a system that extracts, cleans and delivers source data into a dimensional
data store, and then supports querying and analysis for the purpose of decision making.
From a data warehouse, data flows to various departments for their customized decision support systems.
These individual departmental components are called data marts. A data mart is a set of dimensional
tables supporting a business process. Data marts contain all the atomic detail needed to support drilling down
to the lowest level. Most companies and organizations today have a web site. Beneath each web site are
web logs that record every object either posted to or served from the web server. Web logs are important
because they reveal the user traffic on the web site. The activity of parsing web logs and storing the
results in a data mart to analyze customer activity is known as click stream data warehousing. The web
mart database schema is designed to make the underlying data structure more comprehensible to users
and to simplify the query process. The recommended approach for data warehouse data modeling is
dimensional modeling, specifically the star schema. We explore the design and analysis of the web mart
and its relevance today at a detailed level.
Keywords: Data warehousing, ETL, Web log, Data mart, Web mart.
1. Introduction - Star Schema of the Web Mart
The web mart database schema is designed to make the underlying data structure more comprehensible to
users and to simplify the query process. The recommended approach for data warehouse data modeling is
dimensional modeling, specifically the star schema. The star schema has a central fact table with
dimension tables at the points of the star. The single fact table's composite primary key requires a foreign key
field corresponding to the primary key field of each dimension table. The dimension tables are hierarchical and
thus highly denormalised [4]. A fact table is the primary table in the web mart and contains the business facts;
dimension tables are companion tables to the fact table that represent the business-critical dimensions and
contain their attributes. The central fact table lets users analyze business facts, and the dimension tables let
users analyze these facts along the various business-critical dimensions [10].
Figure-1 presents an overall view of the click stream fact and its associated dimensions.
Ravikumar G K et al. / International Journal of Engineering Science and Technology (IJEST)
ISSN : 0975-5462 Vol. 3 No. 4 Apr 2011 3141
Fig-1: Star Schema of the Web Mart Design - Click Stream Fact and its Associated Dimensions
2. Description - Web Mart Objects
This section explains the analytical capabilities of the model by listing the basic fact that the user will be
able to analyze and the corresponding dimensions, which give the user the ability to drill up and drill down
and to slice and dice on the base fact. Before designing specific click stream data marts, one needs to collect
as many dimensions as may have relevance in a click stream environment. The dimensions unique to click
stream data warehouses are page, visitor, session and referral. The page dimension describes the page context
for a web page event. It contains attributes such as page key, page source and page function. The visitor
dimension gives the details of the visitor. Its main attributes are UserId, CookieId, Operating System and
Browser. The session dimension provides one or more levels of diagnosis for the visitor's session as a whole.
For example, the local context of the session might be requesting product information, but the overall session
context might be ordering a product. The referral dimension describes how the customer arrived at the current
page [9] [11].
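As a rough illustration of where the visitor, page and referral attributes come from, the sketch below parses a single web log line into raw fields. It assumes the NCSA combined log format, which the paper does not specify; the sample line, the regular expression and the function name parse_log_line are illustrative assumptions only.

```python
import re

# Assumed input: NCSA "combined" log format (not specified in the paper).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Split one log line into raw attributes that feed the dimensions."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line; a real ETL job would log and skip it
    fields = match.groupdict()
    # In this log format "-" means that no bytes were sent.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    fields["status"] = int(fields["status"])
    return fields


# Hypothetical sample line, made up for this example.
SAMPLE = (
    '192.0.2.10 - alice [25/Jan/2000:10:15:30 +0000] '
    '"GET /products/info.html HTTP/1.0" 200 5120 '
    '"http://search.example.com/?q=warehousing" "Mozilla/4.0"'
)

record = parse_log_line(SAMPLE)
print(record["host"], record["url"], record["status"], record["referrer"])
```

Here host and user would feed the visitor dimension, url the page dimension, referrer the referral dimension and agent the browser and operating system attributes.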
2.1 Facts and Dimensions in the Web Mart
Table-1 lists the objects, i.e. the fact and the dimensions, available in the web mart for analysis.
Table name         Fact/Dimension  Levels
Click Stream Fact  Fact            -
Universal Date     Dimension       Year, Quarter, Month, Week, Day
Universal TOD      Dimension       Period of the day, Hour, Minute, Second
Date               Dimension       Year, Quarter, Month, Week, Day
TOD                Dimension       Period of the day, Hour, Minute, Second
Visitor            Dimension       IP Address (or) Visitor Id (or) Cookie Id
Page               Dimension       Object Type, File Type, Page Type, URL (a)
                                   Domain, Site, Directory, URL
Session            Dimension       Session Type
Referrer           Dimension       Referrer Type, URL (a)
                                   Domain, Site, URL
Status             Dimension       Type of the Status, Status description
Visit              Dimension       -
Content Page       Dimension       -
Table-1: Objects of Web Mart
[Fig-1 box labels: Click Stream Fact; Object, Universal Date, Universal TOD, Date, TOD, OS, Browser, Visitor, Page, Session, Referrer, Status and Geography dimensions]
2.2 Click Stream Fact Table
Table-2 presents the business measures that the user will be able to analyze using the click stream fact
table and the associated dimension tables.
Field Name        Description
UniversalDatekey  Foreign key for the Universal Date dimension
UTODkey           Foreign key for the Universal TOD dimension
Datekey           Foreign key for the Date dimension
TODkey            Foreign key for the TOD dimension
Visitorkey        Foreign key for the Visitor dimension
Pagekey           Foreign key for the Page dimension
Sessionkey        Foreign key for the Session dimension
Referrerkey       Foreign key for the Referrer dimension
Statuskey         Foreign key for the Status dimension
Visitkey          Foreign key for the Visit dimension
ContentPagekey    Foreign key for the Content Page dimension
TimeViewed        The time spent, in seconds, by the Visitor on a particular object like page, file
BytesTransferred  The bytes transferred to the client machine
Table-2: Click Stream Fact Table Description
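The fact table described in Table-2 could be declared roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module; the table name ClickStreamFact, the INTEGER column types and the sample row are assumptions, while the column names come from Table-2.

```python
import sqlite3

# Foreign-key columns of the click stream fact table, as listed in Table-2.
FOREIGN_KEYS = [
    "UniversalDatekey", "UTODkey", "Datekey", "TODkey", "Visitorkey",
    "Pagekey", "Sessionkey", "Referrerkey", "Statuskey", "Visitkey",
    "ContentPagekey",
]


def create_fact_table(conn):
    """Create the fact table; table name and SQL types are assumptions."""
    key_columns = ",\n        ".join(
        name + " INTEGER NOT NULL" for name in FOREIGN_KEYS
    )
    key_list = ", ".join(FOREIGN_KEYS)
    ddl = f"""
    CREATE TABLE ClickStreamFact (
        {key_columns},
        TimeViewed INTEGER,        -- seconds spent on the object
        BytesTransferred INTEGER,  -- bytes sent to the client
        PRIMARY KEY ({key_list})   -- composite key over all dimension keys
    )
    """
    conn.execute(ddl)


conn = sqlite3.connect(":memory:")
create_fact_table(conn)

# One made-up fact row: every key set to 1, 42 seconds viewed, 5120 bytes.
placeholders = ", ".join("?" * 13)
conn.execute(
    "INSERT INTO ClickStreamFact VALUES (" + placeholders + ")",
    [1] * 11 + [42, 5120],
)

columns = [row[1] for row in conn.execute("PRAGMA table_info(ClickStreamFact)")]
print(columns)
```

The composite primary key follows the description in the introduction; a production design might instead use a separate surrogate key for the fact row.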
2.3 Dimensions
The dimension tables give users the ability to analyze the business measures along different dimensions by
allowing them to drill up and drill down and to slice and dice using the attributes of the dimensions. Drilling
down adds detail rows to an existing request; it is nothing more than asking for more detail. Drilling up
removes row headers; it is nothing more than looking at the data in a more aggregated (consolidated) form.
Slicing constrains the displayed data on an attribute found in one dimension, and dicing constrains the
displayed data by attributes in multiple dimensions [4].
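The four operations can be sketched on a handful of rows. The example below is only an illustration: the rows, attribute names and the helper total_time are made up, and the fact rows are assumed to be already joined to their dimension attributes.

```python
from collections import defaultdict

# Made-up fact rows, already joined to a few dimension attributes.
ROWS = [
    {"year": 2000, "month": "January", "page_type": "News",
     "browser": "Netscape", "time_viewed": 30},
    {"year": 2000, "month": "January", "page_type": "Jobs",
     "browser": "Netscape", "time_viewed": 20},
    {"year": 2000, "month": "February", "page_type": "News",
     "browser": "Explorer", "time_viewed": 50},
    {"year": 2001, "month": "January", "page_type": "News",
     "browser": "Explorer", "time_viewed": 10},
]


def total_time(rows, row_headers, **constraints):
    """Sum time_viewed per combination of row_headers.

    Only rows whose attributes equal every given constraint are counted.
    """
    totals = defaultdict(int)
    for row in rows:
        if all(row[attr] == value for attr, value in constraints.items()):
            key = tuple(row[header] for header in row_headers)
            totals[key] += row["time_viewed"]
    return dict(totals)


by_year = total_time(ROWS, ["year"])                 # starting request
drill_down = total_time(ROWS, ["year", "month"])     # add a row header
drill_up = total_time(ROWS, [])                      # remove row headers
slice_news = total_time(ROWS, ["year"], page_type="News")  # one dimension
dice = total_time(ROWS, ["year"], page_type="News", browser="Explorer")

print(by_year, drill_down, drill_up, slice_news, dice, sep="\n")
```

Drilling changes only the list of row headers, while slicing and dicing change only the constraints, which is why the same helper serves all four operations.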
2.3.1 Universal Date Dimension
Figure-2 presents the universal date dimension with all its attributes. The universal date dimension
facilitates analysis along the calendar period with respect to Greenwich Mean Time. Hierarchical
attributes of the dimension are drawn with arrowhead connections and general attributes with
straight-line connections. Date is the lowest grain to which the user can drill down, and year is the
highest level to which the user can drill up. The drill-down path can be identified by following the
arrowheads.
The table-4 describes the structure of the Universal TOD dimension table.
Field Name               Description                                    Values/Example
UTODkey                  Primary key for the dimension (Surrogate key)  1,2
UniversalSecond          Second of a Minute                             1-60
UniversalMinute          Minutes of a hour                              1-60
UniversalHour            Hours of a day                                 10-11, 11-12
UniversalTODPeriodofDay  Collection of hours in a day                   Morning, Evening
Table-4: Column Description of Universal TOD Dimension
2.3.3 Date Dimension
Figure-4 presents the date dimension with all its attributes. Hierarchical attributes of the dimension are
drawn with arrowhead connections and general attributes with straight-line connections. Date is the lowest
grain to which the user can drill down, and year is the highest level to which the user can drill up. The
drill-down path can be identified by following the arrowheads [10] [1].
Figure-4: Date Dimension with the hierarchy
The table-5 describes the structure of the date dimension table.
Field Name       Description                                    Values/Example
Datekey          Primary key for the dimension (Surrogate key)  1,2
Date             Date                                           25/01/2000
DayOfWeek        Day of the week                                Sunday
DayOfWeekNumber  Day of the week number                         1-7, 1 being Sunday
WeekNumber       Week number in the month                       1-5
Week             Week number in the year                        1-52
MonthDay         Day number in the month                        1-31
MonthNumber      Month of the year in number                    1-12
Month            Month of the year                              January
Quarter          Quarter of the year                            1-4
Year             Year of the date                               2000
Table-5: Column Description of Date dimension
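One way to populate a row of this table from a calendar date is sketched below. The field names follow Table-5; the function name date_dimension_row and the week-numbering rules (seven-day blocks counted from the 1st of the month and from 1 January) are assumptions, since the paper does not define them. With this rule the last day or two of a year fall into a week 53.

```python
from datetime import date

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"]


def date_dimension_row(d, date_key):
    """Build one date dimension row (field names as in Table-5)."""
    # isoweekday(): Monday=1 ... Sunday=7; Table-5 wants Sunday=1.
    day_of_week_number = d.isoweekday() % 7 + 1
    day_of_year = d.timetuple().tm_yday
    return {
        "Datekey": date_key,
        "Date": d.strftime("%d/%m/%Y"),
        "DayOfWeek": DAY_NAMES[day_of_week_number - 1],
        "DayOfWeekNumber": day_of_week_number,
        "WeekNumber": (d.day - 1) // 7 + 1,    # assumed rule: blocks of 7 days
        "Week": (day_of_year - 1) // 7 + 1,    # assumed rule: blocks of 7 days
        "MonthDay": d.day,
        "MonthNumber": d.month,
        "Month": MONTH_NAMES[d.month - 1],
        "Quarter": (d.month - 1) // 3 + 1,
        "Year": d.year,
    }


row = date_dimension_row(date(2000, 1, 25), date_key=1)
print(row)
```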
[Figure-4 node labels: Date, Day Number in Month, Month Number, Month, Quarter, Year, Week Number in Year, Week of the Month, Day of the Week, Day Number in Week]
2.3.4 TOD Dimension
Figure-5 presents the TOD dimension with all its attributes. Hierarchical attributes of the dimension are
drawn with arrowhead connections and general attributes with straight-line connections. Second is the lowest
grain to which the user can drill down, and period of the day is the highest level to which the user can drill
up. The drill-down path can be identified by following the arrowheads [10] [1].
Figure-5: TOD dimension with its attributes
The table-6 describes the structure of the TOD dimension table.
Field Name   Description                                    Values/Example
TODkey       Primary key for the dimension (Surrogate key)  1,2
Second       Second of a Minute                             1-60
Minute       Minutes of a hour                              1-60
Hour         Hours of a day                                 10-11, 11-12
TimeOfDay    Time of the day                                12:00:55;19:15:25
PeriodofDay  Collection of hours in a day                   Morning, Evening
Table-6: Column description of TOD dimension
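A row of the TOD dimension can be derived from the number of seconds since midnight, as in the sketch below. The function name tod_dimension_row and the boundaries of the periods of the day are assumptions, and the sketch uses the conventional 0-59 range for seconds and minutes rather than the 1-60 range listed in the table.

```python
def period_of_day(hour):
    """Map an hour (0-23) to a period; the boundaries are assumed."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"


def tod_dimension_row(seconds_since_midnight, tod_key):
    """Build one TOD dimension row (field names as in Table-6)."""
    hour, remainder = divmod(seconds_since_midnight, 3600)
    minute, second = divmod(remainder, 60)
    return {
        "TODkey": tod_key,
        "Second": second,
        "Minute": minute,
        "Hour": f"{hour}-{hour + 1}",  # hour band, e.g. "10-11"
        "TimeOfDay": f"{hour:02d}:{minute:02d}:{second:02d}",
        "PeriodofDay": period_of_day(hour),
    }


# 19:15:25 is 19*3600 + 15*60 + 25 seconds after midnight.
tod_row = tod_dimension_row(19 * 3600 + 15 * 60 + 25, tod_key=1)
print(tod_row)
```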
2.3.5 Visitor Dimension
Figure-6 presents the visitor dimension with all its attributes. Hierarchical attributes of the dimension
are drawn with arrowhead connections and general attributes with straight-line connections. User id, cookie id
or domain name is the lowest grain to which the user can drill down, and country is the highest level to which
the user can drill up. The lowest granularity will be decided at the client site. The drill-down path can be
identified by following the arrowheads [10] [1].
Figure-6: Visitor Dimension with its attributes
* - Demographics are a collection of many fields. It is also possible to form a hierarchy in the demographic
information. Table-7 describes the structure of the visitor dimension table.
[Figure node labels. Figure-5 (TOD): Period of the Day, Hour. Figure-6 (Visitor): User Id, Operating System, Browser, Cookie Id, Demographics *, Country, Domain Name.]
The following table-8 describes the various fields of the page dimension.
Field Name  Description                                        Values/Example
Pagekey     Primary key for the dimension (Surrogate key)      1,2
URL         Full path of the page in the server                C:\..\index.html
PageName    Name of the web page                               Welcome page, Product info page
PageType    Classification of pages in the web site            News pages, Jobs & Career pages
FileType    The file type of the object                        Gif, au, ra, html
ObjectType  The type of the object                             Multimedia files, Application, Content pages
FileName    Name of the file being accessed by the user        Index.html, ProductInfo.html
Directory   The directory in the server, of the accessed file  C:\Inetpub\doc
Site        The site where the particular page is available
Domain      The domain of the site where the page resides
Table-8: Column description of Page Dimension
2.3.7. Session Dimension
Figure-8 presents the session dimension with all its attributes. General attributes are drawn with
straight-line connections; a connection with a circle at one end denotes that the specified item is a
collection of fields. Session type assigns a meaning to a visit [1].
Figure-8: Session Dimension with its attributes
* - Session parameters are a collection of fields that describe the conditions for characterizing a session
type.
The following table-9 describes the various fields of the session dimension.
Field Name          Description                                      Values/Example
Sessionkey          Primary key for the dimension (Surrogate key)    1,2
SessionType         The type of the user session. Session is         Quick hit and gone,
                    defined based on the business rules              Product Ordering
SessionDescription  The description of a particular session
SessionParameters   The parameters that characterize the particular  Ex: If Time Spent is in the range of 1-10 min
                    session. It can be split into multiple fields,   and the pages visited in general info or
                    based on the business rules provided by the      product info, then it is a Looking for Info
                    customer.                                        session.
Table-9: Column Description of Session dimension
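The business rules behind SessionType can be expressed as a small classifier. In the sketch below only the "Looking for Info" rule is taken from the example in Table-9; the conditions for the other session types, the page type names and the function name classify_session are hypothetical stand-ins for rules that the customer would supply.

```python
INFO_PAGES = {"General Info", "Product Info"}


def classify_session(minutes_spent, page_types):
    """Assign a session type from session parameters.

    minutes_spent: total time spent in the session, in minutes.
    page_types: page types visited during the session.
    """
    visited = set(page_types)
    # Hypothetical rule: any visit to an order page marks an ordering session.
    if "Order" in visited:
        return "Product Ordering"
    # Rule from Table-9: 1-10 minutes spent, only info pages visited.
    if 1 <= minutes_spent <= 10 and visited and visited <= INFO_PAGES:
        return "Looking for Info"
    # Hypothetical rule: under a minute and at most one page.
    if minutes_spent < 1 and len(page_types) <= 1:
        return "Quick hit and gone"
    return "Unclassified"


print(classify_session(5, ["General Info", "Product Info"]))
print(classify_session(0.2, ["General Info"]))
print(classify_session(12, ["Product Info", "Order"]))
```

Keeping the rules in one function mirrors the idea that SessionParameters can be split into several fields whose values together determine the session type.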
[Figure-8 node labels: Session Type, Session Description, Session Parameters *]
2.3.7 Referrer Dimension
Figure-9 presents the referrer dimension with all its attributes. Hierarchical attributes of the dimension
are drawn with arrowhead connections and general attributes with straight-line connections. URL is the lowest
grain to which the user can drill down, and the domain and referrer type are the highest levels to which the
user can drill up. The drill-down path can be identified by following the arrowheads [1].
Figure-9: Referrer Dimension with its attributes
The following table-10 describes the fields of the referrer dimension.
Field Name       Description                                       Values/Example
Referrerkey      Primary key for the dimension (Surrogate key)     1,2
ReferringURL     The URL of the referring page                     C:\..\index.html
ReferringSite    The Site of the referring page
ReferringDomain  The domain of the referring page
Keyword          The keyword given by the user as search criteria  Web mining, warehousing
                 to reach the page.
ReferrerType     The type of the referrer                          Ad banner, Search engine
Table-10: Column description of referrer dimension
2.3.8 Status Dimension
Figure-9 presents the status dimension with all its attributes. Hierarchical attributes of the dimension are
drawn with arrowhead connections and general attributes with straight-line connections. Status id is the
lowest grain to which the user can drill down, and status type is the highest level to which the user can drill
up. Status description provides a description for the status id. The drill-down path can be identified by
following the arrowheads [10] [1].
Figure-9: Status dimension with its attributes
[Figure node labels. Referrer dimension: URL, Key Word, Site, Domain, Referrer Type. Status dimension: Status Id, Status Description, Status Type.]
The following table-11 describes the fields of the status dimension.
Field Name         Description                                    Values/Example
Statuskey          Primary key for the dimension (Surrogate key)  1,2
StatusId           The Status Code                                101, 201
StatusDescription  Description of the Status                      Successful, File not found error
StatusType         Type of the Status                             File errors
Table-11: Column Description of Status dimension
2.3.9 Visit Dimension
This dimension has no hierarchy. It is used for identifying the start and end of a visit, as shown in
table-12.
Field Name   Description                                           Values/Example
Visitkey     Primary key for the dimension (Surrogate key)         1,2
Description  The value or description. For the start of the visit  Start, End
             it is 'Start'; for an end of the visit page, 'End'
Table-12: Column description of visit dimension
2.3.10 Content Page Dimension
This dimension has no hierarchy. It is used for identifying whether or not a page is a content page, as
shown in table-13.
Field Name      Description                                       Values/Example
ContentPagekey  Primary key for the dimension (Surrogate key)     1,2
Description     'Yes' to indicate a content page. 'No' to         'Yes', 'No'
                indicate other files
Table-13: Column description of content page dimension
3. The Data Modeling-Star Schema
The recommended approach for data warehouse data modeling is dimensional modeling, i.e. the star schema.
The star schema has a central fact table with dimension tables at the points of the star. The single fact
table's composite primary key requires a foreign key field corresponding to the primary key field of each
dimension table. The dimension tables are hierarchical and thus highly denormalised. A fact table is the
primary table in the web mart and contains the business facts; dimension tables are companion tables to the
fact table that represent the business-critical dimensions and contain their attributes [4]. The central fact
table lets users analyze business facts, and the dimension tables let users analyze these facts along the
various business-critical dimensions, as shown in figure-10.
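To make the link between surrogate keys and the fact table concrete, the sketch below loads a few clicks into a star schema reduced to two dimensions and then joins the fact back to a dimension. It uses Python's sqlite3 module; the table names PageDim and StatusDim, the helpers get_or_create_key and load_click, and the sample clicks are assumptions. The reduced fact table has no composite primary key, so repeated clicks on the same page are allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PageDim (Pagekey INTEGER PRIMARY KEY, URL TEXT UNIQUE);
    CREATE TABLE StatusDim (Statuskey INTEGER PRIMARY KEY,
                            StatusId INTEGER UNIQUE);
    CREATE TABLE ClickStreamFact (
        Pagekey INTEGER REFERENCES PageDim (Pagekey),
        Statuskey INTEGER REFERENCES StatusDim (Statuskey),
        BytesTransferred INTEGER
    );
""")


def get_or_create_key(conn, table, key_column, natural_column, value):
    """Return the surrogate key for a natural value, inserting if new.

    Table and column names are fixed identifiers from this script,
    never user input, so building the SQL text from them is safe here.
    """
    row = conn.execute(
        f"SELECT {key_column} FROM {table} WHERE {natural_column} = ?",
        (value,),
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        f"INSERT INTO {table} ({natural_column}) VALUES (?)", (value,)
    )
    return cursor.lastrowid  # SQLite assigns the next integer key


def load_click(conn, url, status_id, bytes_transferred):
    """Resolve dimension keys, then insert one fact row."""
    page_key = get_or_create_key(conn, "PageDim", "Pagekey", "URL", url)
    status_key = get_or_create_key(
        conn, "StatusDim", "Statuskey", "StatusId", status_id
    )
    conn.execute(
        "INSERT INTO ClickStreamFact VALUES (?, ?, ?)",
        (page_key, status_key, bytes_transferred),
    )


# Made-up clicks for illustration.
load_click(conn, "/index.html", 200, 1000)
load_click(conn, "/index.html", 200, 500)
load_click(conn, "/missing.html", 404, 0)

bytes_per_page = conn.execute("""
    SELECT p.URL, SUM(f.BytesTransferred)
    FROM ClickStreamFact AS f
    JOIN PageDim AS p ON p.Pagekey = f.Pagekey
    GROUP BY p.URL
    ORDER BY p.URL
""").fetchall()
print(bytes_per_page)
```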
Figure-10: Star Schema model design for Web mart
4. Results
Now we can proceed to the interesting part of our data warehouse: retrieving information.
4.1 The average number of minutes from login to order
4.2 The average number of days from first being invited to the site by email to the first order.
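Both measures are averages of the time between two events for the same visitor, so they can be sketched with one helper. The event names ("invited", "login", "order"), the sample events and the function name average_gap are assumptions made for this illustration; the paper does not give the underlying data.

```python
from datetime import datetime

# Made-up click stream events: (visitor id, event name, timestamp).
EVENTS = [
    ("v1", "invited", datetime(2000, 1, 20, 10, 12, 0)),
    ("v1", "login", datetime(2000, 1, 25, 10, 0, 0)),
    ("v1", "order", datetime(2000, 1, 25, 10, 12, 0)),
    ("v2", "invited", datetime(2000, 1, 22, 11, 8, 0)),
    ("v2", "login", datetime(2000, 1, 25, 11, 0, 0)),
    ("v2", "order", datetime(2000, 1, 25, 11, 8, 0)),
    ("v3", "login", datetime(2000, 1, 25, 12, 0, 0)),  # never ordered
]


def average_gap(events, start_event, end_event, seconds_per_unit):
    """Average time from a visitor's first start_event to the first
    end_event that follows it, in units of seconds_per_unit seconds.

    Visitors without both events are ignored; returns None if no
    visitor qualifies.
    """
    first_start, first_end = {}, {}
    for visitor, event, timestamp in sorted(events, key=lambda e: e[2]):
        if event == start_event:
            first_start.setdefault(visitor, timestamp)
        elif event == end_event and visitor in first_start:
            first_end.setdefault(visitor, timestamp)
    gaps = [
        (first_end[v] - first_start[v]).total_seconds() / seconds_per_unit
        for v in first_end
    ]
    return sum(gaps) / len(gaps) if gaps else None


minutes_login_to_order = average_gap(EVENTS, "login", "order", 60)
days_invite_to_order = average_gap(EVENTS, "invited", "order", 86400)
print(minutes_login_to_order, days_invite_to_order)
```

In the web mart the same figures would come from the click stream fact joined to the date, TOD and visitor dimensions rather than from an in-memory list.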
5. Conclusions
Understanding the behavior of users on your website is as valuable as following a customer around a store and
recording his or her every move. Imagine how much better organized your store could be, and how many more
opportunities you would have to sell merchandise, if you knew every move customers make while navigating your
store. The ETL process in click stream data warehousing is significantly different from that for any other
source you are likely to encounter.
References
[1] Ralph Kimball, The Data Warehouse ETL Toolkit, Wiley India Pvt Ltd., 2006.
[2] Dr. K.V.K.K Prasad, Data Warehouse Development Tools, Dreamtech Press, 2006.
[3] Vivek R Gupta, Senior Consultant, System Services Corporation, An Introduction to Data Warehousing, white paper.
[4] Manjunath T.N, Ravindra S Hegadi, Ravikumar G K, "Analysis of Data Quality Aspects in Data Warehouse Systems", International Journal of Computer Science and Information Technologies, Vol. 2 (1), 2010, 477-485.
[5], [6] Manjunath T.N, Ravindra S Hegadi, Ravikumar G K, "A Survey on Multimedia Data Mining and Its Relevance Today", International Journal of Computer Science and Network Security, Vol. 10 No. 11, pp. 165-170.
[7], [8] Sanjeevkumar R. Jadhav and Praveen Kumar Kumbargoudar, Multimedia Data Mining in Digital Libraries: Standards and Features, in Proc. READIT-2007, p. 54.
[9] Shu-Ching Chen, Mei-Ling Shyu, Chengcui Zhang, and Jeff Strickrott, "Multimedia Data Mining for Traffic Video Sequences," Proceedings of the Second International Workshop on Multimedia Data Mining (MDM/KDD'2001), in conjunction with the Seventh ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 78-85, August 26, 2001, San Francisco, CA, USA.
[10], [11] Valery A. Petrushin and Latifur Khan, Multimedia Data Mining and Knowledge Discovery, London: Springer-Verlag, 2007, pp. 3-17.
[12] S. Kotsiantis, D. Kanellopoulos, P. Pintelas, Multimedia Mining, SEAS Transactions on Systems, Issue 10, Volume 3, December 2004, pp. 3263-3268.
[13] Sanjiv Purba, Data Management Handbook, CRC Press, 1999.
[14] Bhavani M. Thuraisingham, Data Management Systems: Evolution and Interoperation, CRC Press, 1997.
[15] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
[16] Sanjeevkumar R. Jadhav and Praveen Kumar Kumbargoudar, Multimedia Data Mining in Digital Libraries: Standards and Features, ACVIT-07, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, MS, India.
[17] Mori Y, Takahashi H, Oka R. Image-to-word transformation based on dividing and vector quantizing images with words. In: MISRM'99 First International Workshop on Multimedia Intelligent Storage and Retrieval Management, 1999.
[18] Ordonez C, Omiecinski E. Discovering association rules based on image content. In: ADL '99: Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries. Washington, DC: IEEE Computer Society; 1999, p. 38.
[19] Chakrabarti, S. (2000): Data mining for hypertext: A tutorial survey. SIGKDD Explorations, 1(2), pp. 111.
[20] Ravikumar G K, Manjunath T. N, Ravindra S. Hegadi, Umesh I.M, Cross Industry Survey on Data Mining Applications, International Journal of Computer Science and Information Technologies, Vol. 2 (2), 2011, 624-628.
Authors Profile
Ravikumar G K received his Bachelor's degree from Siddaganga Institute of Technology, Tumkur (Bangalore
University) in 1996 and his M. Tech in Systems Analysis and Computer Application from Karnataka
Regional Engineering College, Surathkal (NITK) in 2000. He is currently working towards his PhD
degree in the area of data mining. He has published several papers in international and national level
conferences. He has around 14 years of professional experience, which includes software industry and
teaching experience. His areas of interest are Data Warehouse & Business Intelligence, multimedia and
Databases.
Manjunath T N received his Bachelor's degree in Computer Science and Engineering from Bangalore
University, Bangalore, Karnataka, India in 2001 and his M. Tech in Computer Science and Engineering
from VTU, Belgaum, Karnataka, India in 2004. He is currently pursuing a Ph.D degree at Bharathiar
University, Coimbatore. He has a total of 10 years of industry and teaching experience. His areas of interest
are Data Warehouse & Business Intelligence, multimedia and Databases. He has published and presented papers
in journals and in international and national level conferences.
Dr. Ravindra S Hegadi received his Master of Computer Applications (MCA), M.Phil and Doctorate of
Philosophy (Ph.D), the last in 2007, in computer science from Gulbarga University, Karnataka. He has 15
years of experience. He has visited various universities overseas as an SME. His areas of interest are Image
Mining, Image Processing, Databases and business intelligence. He has published and presented papers in
journals and in international and national level conferences.
Archana R A received her Bachelor's degree in Computer Science and Engineering from VTU, Belgaum,
Karnataka, India in 2007 and her Master of Technology in computer science from VTU, Belgaum,
Karnataka, India in 2010. She is working at SJB Institute of Technology, Bangalore, Karnataka, India and
has 3 years of experience. Her areas of interest are Image Mining, Image Processing, Databases and
business intelligence. She has published and presented papers in journals and in international and national
level conferences.