OLAP and Visualisation
University of Manchester
School of Computer Science
Third Year Project Report 2013-2014
Author: Milen Kindekov
Supervisor: Professor John Keane
Abstract
Online Analytical Processing (OLAP), a category within the broader field of Business Intelligence,
deals with the analysis of multidimensional data in an interactive manner from different perspectives.
It allows analysts to explore vast amounts of data, giving them the ability to ask questions and receive immediate
answers. This report discusses the major Business Intelligence and OLAP tools available
commercially and builds upon techniques and ideas appearing in recent academic research on the
topic. An application is developed, utilising an agile approach and using a variety of tools and
methodologies, implementing an exploration-driven browser for the hierarchical nature of OLAP
cubes. The data within them is displayed using 2D visualizations (an interactive
TreeMap/HeatMap with a categorical time-series analyser) and alternatively using an
experimental interactive 3D display. The design and implementation details of this application
are followed by an overview of the testing methodologies used in the process of
development. A final evaluation of the implemented visualizations is made, along with a
discussion of possible work that would add value to the developed application in future.
Acknowledgements
I would like to express my gratitude to my supervisor, Professor John Keane, for his support and
clear guidance throughout this project – providing me with a continuous stream of background
literature to review and for pushing me past my initial goals.
I would also like to thank my father for his unwavering support and ideas that helped me
overcome some of the challenges I faced during the project, as well as for the time he took to
help proofread this report.
Lastly, I want to add a thank you to all my family and friends for the encouragement, patience
and understanding they have shown me.
Table of Contents
1. Introduction
1.1 Business Intelligence and OLAP
1.2 Graphical Analysis in BI
1.3 Project Aim
1.4 Report Structure
2. Background
2.1 Business Intelligence Tools
2.1.1 BI Commercial Solutions
2.1.2 OLAP Specific Toolkits
2.1.3 Academic Solutions
2.2 OLAP vs. Relational Database Sourcing for Visualization
2.3 OLAP Servers
2.3.1 Overview
2.3.2 Server Comparison
2.4 Big Data
3. Approach
3.1 Overview
3.2 Project Environment and Technologies
3.2.1 Platform
3.2.2 Technologies
3.2.3 Sample Data
3.3 Agile Processes
3.3.1 Agile Practices Selection
3.3.2 Agile Practices Application
4. Design
4.1 Overview
4.2 Design Principles
4.3 Design Patterns
4.4 Project Set-Up Step Activities
4.5 Abstract High-Level System Design
4.6 Side Module Design
4.7 Main Module Design
5. Implementation
5.1 Overview
5.2 Base System Framework
5.3 UI Templatability
5.4 Component Interaction
5.5 Database and Cube Selector
5.6 OLAP Metadata Retrieval
5.7 Cube Browser
5.8 Main Screen
5.9 OLAP Data Retrieval, Query Mechanism
5.10 TreeMap Data Visualizer
5.11 Categorical Time-Series Analyser
5.12 3D Data Visualizer
6. Testing and Evaluation
6.1 Overview
6.2 Testing
6.2.1 Unit Tests
6.2.2 Test-driven Development (TDD)
6.3 Evaluation
6.3.1 OLAP Operations Comparison
6.3.2 Performance Profiling
7. Conclusion
7.1 Project Achievements
7.2 Future Development
7.2.1 Multiple OLAP Data Sources
7.2.2 Additional 2D Graphics
7.2.3 Algorithm Optimization
7.2.4 Library Inefficiencies
7.2.5 3D Visualisation Functionality Expansion
Appendix
Appendix A: Term Glossary
A.1 OLAP Terms
A.2 Prism Framework Terms
Appendix B: OLAP Operations
B.1 Pivot
B.2 Drill Down / Drill Up
B.3 Slice
B.4 Dice
B.5 Drill-Through
Appendix C: OLAPVisi Step-Through Example
Bibliography
Table of Figures
Figure 1 - Borders, Patterns, Trends and Deviations in Data Flow
Figure 2 – Tibco Spotfire Example #1 (10)
Figure 3 - Tibco Spotfire Example #2 (10)
Figure 4 - Tableau Software Example #1 (11)
Figure 5 - Tableau Software Example #2 (11)
Figure 6 - Tableau Software Bubble Graph (11)
Figure 7 - Tableau Software TreeMap Graph (11)
Figure 8 - Telerik RadControls (12)
Figure 9 - GapMinder categorical time-series main screen (21)
Figure 10 - Star Schema (24)
Figure 11 - Snowflake Schema (24)
Figure 12 - Project Environment
Figure 13 - Helix 3D Toolkit Examples (51)
Figure 14 - AgileBoard YouTrack (42)
Figure 15 – Project folder
Figure 16 – Learning Spike Week 7
Figure 17 – Learning Spike Week 8
Figure 18 - TeamCity Flow (41)
Figure 19 - Composite Application Design Patterns (57)
Figure 20 - MVVM Pattern Architecture (57)
Figure 21 - Prism Composite Application Creation Activities (57)
Figure 22 - Abstract high-level module view
Figure 23 - Side module design diagram
Figure 24 - Main module design diagram
Figure 25 - Connecting to the Prism Framework (57)
Figure 26 - Merging Generic Resources into Views
Figure 27 - Original and Templated Window Style
Figure 28 - Templated Button Results Example
Figure 29 - Default and Templated TreeView Control
Figure 30 - Publish/Subscribe Event Aggregator Service (57)
Figure 31 - Database Selector Pop-Up
Figure 32 - Cube Selector Pop-Up
Figure 33 - Cube Browser View Example #1
Figure 34 - Cube Browser View Example #2
Figure 35 - MainGraphView.xaml Data Template Selector. Selects an appropriate UI template
based on the view model type passed to the view.
Figure 36 - Cube Choice Selector with user choices over main screen.
Figure 37 - WPF Treemaps & SquarifiedTreeMaps Control Example - original library
capabilities (47)
Figure 38 - OlapVisi TreeMap visualisation - provides control over the element depth,
breadcrumb bar at top saving context, functionality to drill down/drill up and slice into the data.
Figure 39 - Categorical Time-Series data browser and filtering tool.
Figure 40 - 3D OLAP visualizer application example
Figure 41 - HSV Cylindrical-Colour Model - Hue, Saturation, Value (66)
Figure 42 - TDD Created Test Extract
Figure 43 - Main functions performance time profile
Figure 44 - Main 3D element memory usage profile
Figure 45 - Pivot OLAP Operation
Figure 46 - Drill Down/Drill Up OLAP Operation
Figure 47 - Slice OLAP Operation
Figure 48 - Dice OLAP Operation
Figure 49 - Initial application screen. The initial steps to take are to select a database and then
select a cube from that database to explore.
Figure 50 - Prompt for cube selection
Figure 51 - The user can explore the cube browser, selecting specific hierarchies for
visualisation.
Figure 52 - Selected choices populate a slide-out menu, indicating the types of OLAP data. In the
case of hierarchies: the number of levels contained within.
Figure 53 – By clicking the middle add button, a menu visualising the types of graphs available
is presented.
Figure 54 - An initial view of the TreeMap graph visualisation. The user populates the data
specifics and uses the Plot button to render the visualisation.
Figure 55 - All additional controls are retractable and can be hidden away. Elements can be
hovered over and selected. Drilling down displays data at a lower level of the hierarchy.
Figure 56 - Slicing into the data, the selected element is maximised. The context is saved and is
displayed to the user by means of a breadcrumb at the top of the graph.
Figure 57 - After a slice, moving up in levels is done via a triangle button in the top right corner.
Figure 58 - The categorical time-series analyser tool located at the bottom of the graph. A
specific date level is selected for exploration. The user can slide through the members of the
level, automatically filtering the data and re-rendering the graph above it.
Figure 59 - Initial 3D view of the data. The user has selected the hierarchies and measures they
want to visualise and has clicked on the Map button.
Figure 60 - Selecting a 3D element displays a legend with additional info about the member. The
member is localised on screen by greying out all other elements.
Figure 61 - Selecting a corner level sphere, an additional prompt with information about the axis
is displayed. It provides the OLAP functionality needed to explore the data further.
Figure 62 – Slicing on a specific member.
Figure 63 - Drilling down into a specific member (Accessories) from previous step.
Figure 64 - Displaying a large amount of elements for the lower levels of the data is not a
problem for the tool. This data can be looked at more thoroughly using the slice functionality
provided.
Table of Tables
Table 1 - OLAP operation visualisation capabilities
Table 2 - Performance profile specification. X, Y, Z dimension member amounts.
1. Introduction
1.1 Business Intelligence and OLAP
Business Intelligence (BI) software encompasses a set of support technologies with the main
purpose of providing knowledge for the enterprise. It enables the people responsible
for this knowledge (executives, managers and analysts) to make faster and better decisions (1).
With the decreasing cost of data acquisition and storage, businesses have focused on
increasing their competitive edge by procuring and storing as much data as possible for analysis. This
data grows not only in volume but also in complexity, thus creating the need for
multidimensional models, which more closely reflect the decision makers’ analytical view of it
(2).
Online analytical processing (OLAP) tools that support multi-dimensional analysis and decision
making within the context of Business Intelligence are now extremely common. These are a set
of analysis techniques allied to visualization approaches that allow information to be viewed
from many angles such as summarization, consolidation and aggregation (3). OLAP is flexible
and powerful with an ability to navigate easily between different views of the data. It is
interactive in the sense that analysts can constantly ask new questions and get immediate answers
back from the data. Singh and Bhasin (2011) state that OLAP cannot be achieved unless the data
analysis application used returns results of queries immediately (4), which means that
performance is a key factor for OLAP.
It follows that the vital components of any BI OLAP system are its underlying server
architecture, which must provide low-latency access to the data, and an analytics front-end
that is intuitive for the user and provides the functionality needed for the decision-making
process.
1.2 Graphical Analysis in BI
Many methods for the exploration and analysis of this data have been researched over the past
two decades with a particular focus on Business Analytics through graphical and visual
interfaces and tools. Humans possess great cognitive skills, both visual and spatial: edge and
discontinuity detection, pattern recognition, and the use of visual cues for information retrieval (5).
Analysing data visually therefore has great advantages over many other analytic
techniques. A number of visual interfaces for OLAP have appeared recently, ranging from
dashboards, charts, maps and scatterplots to interactive features such as brushing and filtering,
slicing and zooming.
Figure 1 - Borders, Patterns, Trends and Deviations in Data Flow
Presenting the data visually allows experts to quickly spot patterns and recognize trends or
deviations in the normal flow of the data (Figure 1). They can visually specify a part they are
interested in or have a hypothesis about and interact with it directly (6).
1.3 Project Aim
This project builds on the above with the goal of creating a modular solution
package for the .NET framework to be used for the analysis of OLAP data cubes with both
extensibility and scalability in mind. The primary focus was on various types of visualization
techniques that make analysis and pattern-recognition of complex data sets fast and easy. A
selection of ideas from various business intelligence software and toolkits was consolidated into
one application, which can be built upon and reused in the
future. Additional research into recent academic work was done in order to refine some of these
ideas, experiment with them and produce a viable OLAP visualization application.
The main techniques used are ones related to the hierarchical visualisation of data in data cubes
(6) (7) (8), with categorical time-series analysis for additional dimensionality, as well
as visualising the same data using a 3D OLAP interface (9). The main project aims were
specified as follows:
Create an end-to-end Business Intelligence solution.
Provide various OLAP visualization capabilities.
Provide a polished and user-friendly interface.
Create an extensible solution for future improvements.
1.4 Report Structure
A brief overview of the rest of the report structure follows:
Chapter 2: Background
This chapter provides a description covering the current commercial and academic trends in the
Business Intelligence domain, which includes a look into the leading BI solutions currently on
the market. A comparison is made between typical OLAP servers and Relational Databases, along with
the benefits of choosing the right server solution when developing such a visualization
application. Furthermore, a discussion of the current Big Data trend covers how it
affects, and will affect, the field in the future.
Chapter 3: Approach
This chapter covers the approach taken in the development of the project. This includes the
choices of platforms and technologies, as well as agile methodologies and practices used
throughout the course of this project.
Chapter 4: Design
This chapter gives details about the design methodologies and patterns used in the project and
the specifics of the main framework capabilities that affect its design. It finishes with
a high-level abstract view of the workings of the system.
Chapter 5: Implementation
This chapter details the implementation specifics of the main system components – how they
function and connect with the rest of the system.
Chapter 6: Testing and Evaluation
This chapter discusses the main testing practices and methodologies used for the project.
Furthermore, it includes an evaluation of the different visualisations provided in the application
in terms of their performance specifics and the functionality they provide in OLAP terms.
Chapter 7: Conclusion
This chapter describes the conclusions drawn at the end of the project in terms of achievements
both on a project and on a personal level. It also includes a description of possible future
extensions and developments that would increase the value of the developed application.
2. Background
2.1 Business Intelligence Tools
2.1.1 BI Commercial Solutions
Many commercial and open source general-purpose Numerical Data Visualization solutions
exist. These are products designed to be used across different industries for various visualization
and analytical tasks. They tend to include state-of-the-art visualization techniques that are tested
and guaranteed for the job of visual analysis of large amounts of data. Studying these solutions
provides answers and possible approaches when developing any type of visualization software.
Two of the most popular commercial solutions are Tibco Spotfire (10) and Tableau (11).
Tibco Spotfire - provides interactive analysis of multidimensional data, with a wide range of
possible visualizations available. From personal experience, this product has been
used in practice at Merrill Lynch, showcasing great capabilities with positive feedback coming
directly from management on the analysis done with it.
An example of data visualization for airline incident data in Spotfire (Figure 2) shows
several viable ideas for distributing UI elements across the screen. There is a
flexible filtering system and measure selection criteria. The filter module can be undocked and
dragged across the screen, providing increased UI flexibility. Interactive data visualization,
zooming and multiple selection are connected to a representation of the raw data, and
the chart images themselves can be exported.
Figure 2 – Tibco Spotfire Example #1 (10)
Another example of data visualization in Spotfire (Figure 3) includes a scatter plot. There
is easy-to-use filtering and interactive data selection. The legend and visualization change
dynamically based on the filters set. A status bar displays load and error information when changing
between different graphs. The demonstration page on the Spotfire website includes many other
examples of possible data visualizations.
Figure 3 - Tibco Spotfire Example #2 (10)
Tableau Business Intelligence and Analytics Software – provides a powerful business solution
for big data analysis, data discovery, easy creation of business dashboards, data visualization,
mobile BI etc. A highlight of Tableau is its ease of use, which means that more time can be spent
on analysing the data and answering questions rather than on learning how to use the tool itself.
Tableau can connect to a wide range of data sources providing both server and file connections.
It supports live connections to the DB or the ability to bring data into memory for faster or
offline work. An interesting feature is the ability to easily combine multiple data sources for a
single analysis and visualization task, automatically blending the data on common fields and
filtering across data sources in real time.
Tableau uses a tabular structure with different worksheets that have varying functions. There is a
worksheet view (Figure 4) that connects to different data sources. Dimensions and measures are
automatically recognized and additional measures can be created by the user. Visualizations are
created by dragging and dropping the different dimensions and measures onto the specific fields.
Filtering, colouring schemes and levels of detail can be specified easily.
Each created chart from the data source can then be added to a dashboard view (Figure 5) and
positioned according to the users’ needs. If common fields exist between these visualizations,
filtering and highlighting data on the dashboard will automatically filter and highlight the data
between the different charts.
Figure 4 - Tableau Software Example #1 (11)
Figure 5 - Tableau Software Example #2 (11)
Tableau continuously improves, releasing support for a selection of visual graph options for the
representation of the data (Figure 6). A useful functionality very recently added into the software
is the visualization selection menu (Figure 7). This menu guides users by showing them the
available charting options for the type of data they have selected, as well as providing them with an
easy visual way to change charts quickly and hassle-free.
Figure 6 - Tableau Software Bubble Graph (11)
Figure 7 - Tableau Software TreeMap Graph (11)
Telerik RadControls - another commercial product, which provides a customizable framework
for WPF (Windows Presentation Foundation) applications, with great visualization capabilities
(12). This is not a full-blown BI software package, but rather a flexible plug-in for any
application that requires charting functionality. This example (Figure 8) shows the possibilities
when creating visualizations for the .NET Framework.
Figure 8 - Telerik RadControls (12)
The greatest drawback of this type of commercial software is the high overhead cost involved
in using it in practice. Licenses are extremely expensive and additional resources are
needed for maintenance and training, including the need for a software expert on-site. This means
that these solutions are viable only for very large firms where the benefits of
comprehensive and professionally supported technologies outweigh the high costs (5).
2.1.2 OLAP Specific Toolkits
Framework libraries exist that provide OLAP functionality to be plugged into existing projects.
These solutions have a similar generic look and feel to one another. Examples include the Ranet
OLAP component library (13), Sharpshooter OLAP (14), RadarCube WinForms OLAP (15) and
ComponentOne WinForms OLAP (16).
A common trait of the OLAP visualization toolkits listed here is that they are all based
on Excel’s PivotTable component. They typically provide the functionality seen in Excel where
the users have to navigate through data situated in tables, along with some simple charting
controls and exporting capabilities.
A major aim for this project is to try and move away from the technique used in the existing
OLAP functionality components on the market by employing different types of methods and
visualizations that ease the user experience when analysing OLAP data.
2.1.3 Academic Solutions
Various approaches have been proposed for the appropriate arrangement and
visualisation of the data elements of OLAP cubes, as well as their relationships. These vary in the
techniques applied, and some are used as the basis for the application developed for this project.
Viewing data cubes with a multi-scale visualization system (17).
High-dimensional data cubes in a hyperbolic space (18).
Built-in graphic elements within varying data levels of data cubes (19).
Hierarchical visualisation of data cubes using Enhanced Decomposition Trees (6) (7) (8).
Data cubes displayed using Hierarchical Difference Scatterplots (20).
Immersive 3D interface for data cubes (9).
According to Mansmann et al. (6) (7) (8), the basis of explorative analysis is related to insights
acquired in the course of interaction. Visualizing data hierarchically preserves
the entire interaction, allowing the user to comprehend the data being explored more
naturally. This approach is used in this project to explore the structure
of OLAP cubes, both in a textual browser (Section 5.7) and as a TreeMap visualisation (Section
5.10).
Figure 9 - GapMinder categorical time-series main screen (21)
The visualisation of hierarchical data using TreeMaps allows for only one dimension and up to
two measures to be displayed at a time. In order to improve upon this, the following two
applications were researched additionally:
GapMinder – (Figure 9) a state-of-the-art application providing interactive categorical time-
series analysis (21). This solution provides customisable scatter charts, which are used to fluidly
explore a set of data over a specific time-span. A slider moves across a set of dates,
automatically refreshing the data on the display. The ideas seen in this application are used to
increase the explored dimensionality of the OLAP cube (Section 5.11), allowing for the
exploration of multiple date levels rather than only one.
VR4OLAP – (9) a solution used to visualise three dimensions of an OLAP data cube and up to
two measures at the same time in an immersive 3D environment. It allows for the use of a 3D
stereoscopic screen with a 3D mouse and provides all the basic functionality needed for OLAP
analysis (Appendix B). Based on evaluations of the system, Lafon et al. conclude that the 3D
representation was highly regarded amongst users, but that the 2D performance results were
equal to or better than those obtained in 3D. The 3D visualisation component in this project
(Section 5.12) is based on the design of VR4OLAP, with the added aim of improving the
performance of the interactive visualisation.
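The slider idea borrowed from GapMinder can be illustrated with a minimal sketch. It is not the project's implementation: the record layout, the sample values and the function names are assumptions made purely for illustration. Each slider position selects one date member, and only the records for that member are passed on for rendering.

```python
# Hypothetical records: (category, date member, measure value).
RECORDS = [
    ("Bikes", "2012", 80.0),
    ("Accessories", "2012", 5.0),
    ("Bikes", "2013", 100.0),
    ("Accessories", "2013", 12.0),
]


def date_members(records):
    """Ordered, de-duplicated date members that the slider can step through."""
    return sorted({date for _, date, _ in records})


def frame_for(records, slider_position):
    """Return the per-category values to render for one slider position."""
    member = date_members(records)[slider_position]
    return {category: value for category, date, value in records if date == member}
```

Calling frame_for(RECORDS, 1) would return only the 2013 values, which a chart could then re-render; moving the slider simply changes the index.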
2.2 OLAP vs. Relational Database Sourcing for Visualization
An important question that arises when developing a BI Data Visualization application such as
the one in this project is where and how the data will be sourced for analysis. Should
the application focus on connecting to a mid-tier, dedicated OLAP server, or provide direct
access to the relational data in the warehouse itself?
In the context of the Tableau commercial software, Taylor (2013) offers a useful comparison
between using data sourced from OLAP and from a relational database (RDB) when visualizing data in a BI Data
Visualization Dashboard (22). The paper lists typical scenarios a user may face, how each is
handled by an OLAP or an RDB system, and the typical risks each
creates.
For the visual analysis of data, this comparison favours OLAP sources over RDB ones. Taking
all this into account, the most viable approach for this project is to focus on
data already organized into OLAP cubes. The option of catering to more experienced
users by connecting to RDB sources is left for future development, since providing such
functionality is a small project in itself. Having made this decision, the logical next step was to
look into the available OLAP server solutions on the market and compare them to understand
which could provide a good core architecture for the project to build upon.
2.3 OLAP Servers
2.3.1 Overview
OLAP Servers are a mid-tier architecture and normally sit between the data warehouse and the
front-end analysis system. Before comparing the different
solutions on the market, it is important first to understand how they differ from one another
in the most important respect – the way the data is stored on the server itself.
Based on the storage modes, three main types of OLAP Server engines exist – multidimensional
storage (MOLAP), relational DBMS (ROLAP) or a hybrid combination of the two (HOLAP).
With recent technology trends, in-memory BI engines have started being used as well, due to
their faster response times and better interaction capabilities. These engines have become possible
thanks to decreases in disk access time and the steadily falling cost of memory, leading to
cheap and affordable servers utilizing large amounts of it (23).
ROLAP - Relations and SQL queries are used to map the multidimensional data model and its
operations. This creates the need for query optimization algorithms for the efficient loading of
data (23). These systems use the star schema (Figure 10) or the related snowflake schema
(Figure 11). The star schema has a fact table at the centre, with each of its dimension tables
forming a point of the star. The snowflake schema again has a single
fact table at the centre, but divides each dimension into a hierarchical structure of related
tables, normalizing them.
Figure 10 - Star Schema (24)
Figure 11 - Snowflake Schema (24)
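As a rough illustration of how a ROLAP engine maps a multidimensional query onto relations, the sketch below builds a tiny star schema in an in-memory SQLite database and aggregates the fact table along one dimension attribute. The table names, column names and sample rows are hypothetical and are not taken from the project or from any particular OLAP server.

```python
import sqlite3

# Hypothetical star schema: one central fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Helmet', 'Accessories'), (2, 'Road Bike', 'Bikes');
    INSERT INTO dim_date VALUES (1, 2013, 'Q1'), (2, 2013, 'Q2');
    INSERT INTO fact_sales VALUES (1, 1, 50.0), (1, 2, 70.0), (2, 1, 900.0), (2, 2, 1100.0);
""")

# Whitelist of dimension attributes, so no user text is spliced into the SQL.
LEVELS = {"category": "p.category", "year": "d.year", "quarter": "d.quarter"}


def sales_by(level):
    """Aggregate the fact table along one dimension attribute (a roll-up)."""
    column = LEVELS[level]
    query = f"""
        SELECT {column}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY {column}
        ORDER BY {column}
    """
    return conn.execute(query).fetchall()


by_category = sales_by("category")
by_quarter = sales_by("quarter")
```

Every aggregate here is computed at query time through joins and a GROUP BY, which is why query optimization matters so much for ROLAP. In a snowflake variant, dim_product would itself be split further, for example into separate product and category tables.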
MOLAP - An alternative to ROLAP that stores data directly in the multidimensional data model and
typically pre-computes large data cubes for speedier query processing. This allows for excellent
indexing properties and fast query response times, but provides poor storage utilization on sparse
data sets (23). MOLAP cubes are optimal for slicing and dicing operations, and pre-computation of
the cube allows for fast performance on complex calculations.
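The idea of pre-computation can be illustrated with a small sketch. It is not how any particular MOLAP server stores its data; the fact rows, dimension names and dictionary-based storage are assumptions made purely for the example. Every combination of dimensions is aggregated once up front, so that a later slice is a lookup rather than a scan of the facts.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: coordinates along three dimensions plus one measure.
DIMENSIONS = ("product", "region", "year")
FACTS = [
    ({"product": "Bikes", "region": "EU", "year": 2012}, 100.0),
    ({"product": "Bikes", "region": "US", "year": 2013}, 200.0),
    ({"product": "Clothing", "region": "EU", "year": 2013}, 30.0),
]


def precompute_cube(facts, dimensions):
    """Aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for coords, value in facts:
                # A cell is keyed by the chosen dimensions and their member values.
                key = tuple((d, coords[d]) for d in dims)
                cube[key] += value
    return dict(cube)


cube = precompute_cube(FACTS, DIMENSIONS)

# Slicing on a single member is now a dictionary lookup.
total_2013 = cube[(("year", 2013),)]
grand_total = cube[()]
print(total_2013, grand_total, len(cube))
```

The example also hints at the sparsity problem: two products, two regions and two years give eight possible base cells, yet only three hold data, so a dense array would be mostly empty.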
HOLAP – Combines ROLAP and MOLAP, splitting the storage of data between the two. These
servers perform density analysis to identify which regions of the multidimensional space are
sparse and which are dense, and use the result to decide how to store the data (23).
2.3.2 Server Comparison
A wide range of OLAP Servers exist that can satisfy the requirements of an enterprise.
Choosing a particular one involves comparing the capabilities and limitations of these servers,
such as licensing options, data storage modes, supported APIs and query languages, distinctive
features and security capabilities.
With regard to this project, the choice of an OLAP server for the initial stages of application
support was based on the following factors, in order from most to least important.
License availability – how easy will it be to attain and what will it cost.
APIs and query languages – the project is written with the .NET framework, looking for
the ability to connect and work with the OLAP data hassle-free.
Data storage modes – how will the data be stored and does the solution provide any
offline support, which would increase the flexibility of the project by allowing data to be
analysed without having to make a connection to a server.
The servers' distinctive features, limitations and security capabilities were not a factor in the
choice, since these fell outside the scope of the project. The OLAP server chosen was used purely
to provide data for analysis and visualization. The application would not take part in the
creation of the data cubes in question – cubes are created by data experts following a set
of principles that govern the careful design of all aspects of the multi-dimensional data model.
The following information is based on the charts seen in the “Comparison of OLAP Servers”
Wikipedia entry (25). The OLAP Servers that were taken into account were: Essbase, icCube,
Microsoft Analysis Services, MicroStrategy Intelligence Server, Mondrian OLAP Server, Oracle
Database OLAP Option, Palo, SAS OLAP Server, SAP NetWeaver BW and Cognos TM1.
A general pattern seen across these OLAP Servers is that almost all of them have proprietary
licensing, with the exception of Mondrian OLAP Server and Palo. icCube comes with a
community license, which includes an MDX (MultiDimensional eXpressions) engine, a web IDE
and an XMLA (XML for Analysis) interface for free, but with the limitation that it cannot
be included as part of a product or solution (26). This project only makes a connection
to the server for querying purposes, which allows icCube to remain a viable option. Microsoft
Analysis Services, although under a proprietary license, remains a viable solution for this
project as well: as it is a Microsoft product, a student license can be obtained and used
thanks to the vendor's connection with the University.
On the second factor, all four remaining solutions provide similar interfaces: XMLA; OLE DB for
OLAP, the standard API for exchanging data between an OLAP server and a Windows platform client;
and MDX, the uniform query language for OLAP databases. This suggests that all four of these
servers could be supported in future.
On the third factor, data storage modes, icCube is a MOLAP server with support for
Offline Cubes. Microsoft Analysis Services supports MOLAP, ROLAP and HOLAP, as well as Local
cubes and PowerPivot for Excel. The remaining two, Mondrian OLAP Server (ROLAP) and
Palo (MOLAP), provide no offline support.
Microsoft Analysis Services was chosen due to the fact that the project revolves around the
Microsoft .NET Framework and based on its advantages from the comparison provided above.
2.4 Big Data
OLAP cubes are used to represent, model, process, query and mine various forms of data by
creating a structure for it. Most of this is done by finding associations and relationships in
structured or, with some limitations, in semi-structured data (27). This differs from the
unstructured format of the recently emerging Big Data trend.
A look into Big Data shows that most organizations and scientific communities today generate
an unprecedented amount of continuously streamed unstructured data – “Big Data” (28): a total
of around 2.5 Exabytes of data daily, doubling every 40 months (29). This volume and variety
have outstripped the possibility of manual analysis and exceeded conventional database
capacities, while computing performance has increased (30). According to a study done by Intel,
33% of all companies surveyed work with very large amounts of data (500 TB or more); 84% of
managers are analysing unstructured data, with the rest expecting to start analysing it over the
next year. There is an expectation that 63% of all analytics will be done in real time by 2015
(31). The ability to connect this data, creating structure for easier and more efficient analysis, is a
research topic that is much discussed.
Creating OLAP data cubes over Big Data is a challenge for state-of-the-art solutions due to
intrinsic factors – the enormous size of such data sets and their high complexity when
multidimensional models are created (28). The unstructured nature of the data can lead to an
explosion in the number of dimensions and to computational issues when working with
it. As mentioned previously (Section 1.1), OLAP is based on fast user interaction with the data,
which means that these factors create a bottleneck for its normal operation.
Cuzzocrea et al. (28) propose a variety of research topics that will affect the generation of OLAP
cubes over Big Data in the future and overcome the mentioned challenges:
technology advancements in Cloud Computing (32)
innovative hardware solutions – GPU-based data processing for computation of cubes
(33)
query language and optimization (end-user performance)
data quality aspects
visualization and interactive exploration issues – real-time visualization, mobile
device visualization (34) (35)
analytical processes and methodology design
development tools support
With the visualisation and exploration issue point in mind, this project attempts to look at the
effect of an increasing amount of data simultaneously displayed on screen using non-specialized
tools examined in the following chapters. Results of this can be seen in a later chapter of this
report (Section 6.3).
3. Approach
3.1 Overview
This chapter provides an overview and details of the approach taken in the development of the
application. It describes the environment, platforms, technologies and agile practices chosen
for use.
3.2 Project Environment and Technologies
The information in this section is split between the core architecture environment used that
includes base support tools for the development process and the specific technologies used to
build the application itself.
3.2.1 Platform
The base platform for the project is a VMware Workstation supporting two distinct virtual
machines, one taking the role of a data server and the other of a development and testing
environment.
Virtualization was chosen for the platform and environments because of the many advantages that
virtual machine technology provides. These include control of system resources and easy back-up
and physical storage – the ability to clone and archive machines at the click of a button, as well
as run them from any location that can run a free-to-use VM player. This gives great flexibility,
as the environment can be carried anywhere.
VMware (36) is well suited to this. It provides the ability to run
applications on multiple operating systems, consolidate multiple computers running web servers
and database servers onto the same computer, build reference architectures for evaluation before
deploying into production, connect remotely to virtual machines from any device, and create
snapshots that preserve the state of a virtual machine so that it can be reverted to at any time.
The figure below (Figure 12) shows the relationship between the development, server and
deployment machines of the project.
Figure 12 - Project Environment
VM Server Environment - Running Windows Server 2012 R2 Operating System which
includes Microsoft SQL Server 2012 with installed Microsoft Analysis Services (SSAS 2012).
This is the core data server, which is the source for the data of the application. It simulates a
possible real implementation in an enterprise environment. The AdventureWorks sample DB and
sample OLAP cubes are used for reference data.
VM Development Environment – Running on a typical Windows 7 Operating System, along
with a version of Microsoft SQL Server 2012 and a Visual Studio 2012 IDE.
Source Control: Git (37) – free and open source distributed version control system.
Repository: Atlassian Bitbucket (38) – used for hosting and managing code online, includes
built-in issue trackers, wikis, code comments, and pull requests.
Source Control Visual Studio GUI: Git Source Control Extension (39) – integrates Git
functions into the Visual Studio Solution Explorer.
Source Control Windows GUI: TortoiseGit (40) – provides a windows icon overlay, along with
a context menu for easy source control commands over files and directories on the file system,
integrated Commit Dialog, Merge Support GUI and a historical view of changes and commits.
Continuous Integration System: TeamCity (41) – powerful tool for automatic builds and
commit monitoring. Integrates directly with Bitbucket, pulls the latest committed version of the
application and runs a pre-set series of build steps, testing and ensuring that any changes
introduced work as expected.
Agile Board: YouTrack (42) – tool for the creation of fully-customisable electronic task boards
that allow multiple project iterations to be monitored, planned and executed easily.
3.2.2 Technologies
This section describes the core technologies used for coding the application. These were chosen
with flexibility and future extensibility in mind, and to provide a usable application at the end.
Windows Presentation Foundation (WPF) (43) for rich user interfaces based on the .NET
framework – uses XAML for its front-end and C# for the code-behind. With proper use, one
code base could be created for easy conversion into a web application using Microsoft Silverlight
in the future.
Important factors for the choice of WPF were its ease of use, robustness, expert support and most
importantly its complete extensibility and customization. The extensibility allowed for the
creation of elements from scratch that best fit the needs of the project. The customization added
the necessary visual polish that created the professional look and feel of the completed
application. Templates could be easily created, used and re-used to style both visually and
functionally each and every single element of the application.
Microsoft PRISM Framework for WPF – used to create multi-screened applications, typically
by WPF developers looking for better design and development methodology options. These
typically encompass rich user interaction and data visualization functionality (44). PRISM can
build loosely-coupled components that can evolve independently but can easily be integrated into
the overall application. It incorporates various patterns for development, the main one being the
Model View ViewModel (MVVM) architecture pattern.
When choosing a visualization package with charting capabilities, most usable ones were
commercial and would normally have a high up-front cost. From the open source alternatives, two
were shortlisted for possible implementation: one based on charting capabilities – OxyPlot (45) –
and another based on fluid and eye-catching graphics – Modern UI (Metro) Charts for Windows
8, WPF, Silverlight (46). Neither, however, included a graph with
the functionality needed for the main 2D graph visualization – a TreeMap/HeatMap combination
with interactive capabilities. The closest library available was WPF Treemaps &
SquarifiedTreeMaps Control (47).
WPF Treemaps & SquarifiedTreeMaps Control – provided an implementation of the
treemapping algorithm developed by Shneiderman and Johnson (48) (49) and extended to squarified
treemaps by Bruls et al. (50). The control allowed for multiple levels of recursive depth and was
easy to template and extend. The interactive functions needed by this project were not part of the
control, so additional work had to be put into modifying it, as discussed in Section 5.10 of
this report.
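To make the treemapping idea concrete, the following is a minimal sketch of the simpler slice-and-dice layout rather than the squarified variant implemented by the control. The Node class, the layout function and the sample sizes are all invented for illustration. Each rectangle is divided among its children in proportion to their sizes, and the split direction alternates at every level of the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


@dataclass
class Node:
    name: str
    size: float = 0.0  # only meaningful for leaves
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        """Size of a leaf, or the summed size of all descendants."""
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def layout(node: Node, rect: Rect, horizontal: bool = True,
           out: Dict[str, Rect] = None) -> Dict[str, Rect]:
    """Assign a rectangle to every node, splitting in proportion to size."""
    if out is None:
        out = {}
    x, y, w, h = rect
    out[node.name] = rect
    total = node.total()
    offset = 0.0  # fraction of the parent already handed out
    for child in node.children:
        share = child.total() / total if total else 0.0
        if horizontal:
            child_rect = (x + offset * w, y, share * w, h)
        else:
            child_rect = (x, y + offset * h, w, share * h)
        # Alternate the split direction on each level of the hierarchy.
        layout(child, child_rect, not horizontal, out)
        offset += share
    return out


tree = Node("root", children=[
    Node("A", size=2),
    Node("B", children=[Node("B1", size=1), Node("B2", size=1)]),
])
rects = layout(tree, (0.0, 0.0, 100.0, 50.0))
for name, r in rects.items():
    print(name, r)
```

Slice-and-dice tends to produce long thin rectangles for uneven data, which is the weakness that squarified layouts address by keeping aspect ratios close to one.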
Helix 3D Toolkit (51) – a collection of custom controls and helper classes for WPF and the 3D
capabilities of WPF in particular. This library provided a wide set of easy to use controls and a
variety of examples, which served as the basis of the 3D visualizer discussed in Section 5.12 of
this report. A set of these examples can be seen below (Figure 13) showcasing this wide
selection and capabilities of the library in question.
Figure 13 - Helix 3D Toolkit Examples (51)
3.2.3 Sample Data
Multi-dimensional complex data was needed to serve as a base model for all development efforts
– this data along with its OLAP models needed to be ready for use, as their creation was out of
scope for this project. The sample data chosen was the Adventure Works 2012 – a continuously
updated dataset provided by Microsoft to showcase their different SQL Server products (52).
Two components were needed for the base functionality of this project. A mid-tiered data
warehouse database – Adventure Works 2012 DW and a SQL Server Analysis Services
OLAP server, to sit on top of the data warehouse providing the appropriate complex models
(53).
3.3 Agile Processes
The method for development and project management chosen was one using agile software
development principles. The main objective was to be able to produce working code on a short
time-scale and at the same time be open to changing needs and requirements, which over the
course of the project happened on numerous occasions.
3.3.1 Agile Practices Selection
The agile practices selected for the project complemented each other. The most
important process used was Iterative Development with time-boxed iterations. Each iteration
included its own requirements analysis, design, implementation, and testing activities that
concluded with a stabilized working system by the scheduled date. Small steps, rapid feedback,
and adaptation are central ideas in iterative development, so iterations varied in length. Each
iteration ended with a decision on the length of the following one, based on the
feedback provided by the supervisor and on which tasks and requirements had been
fulfilled and which had been skipped (54).
Learning Spikes are a vital agile practice when tackling new technologies. These are normally
stories or tasks for answering a question or gathering information about a specific design question
or a particular technology. Shore and Warden (2008) suggest conducting frequent experiments
rather than speculating about the answer when faced with questions. Spikes, according to them,
are small experiments used to research the answer to a problem. The main idea is to create a
small program or test demonstrating the functionality needed in the simplest way possible. This
should be run from the command line or a test framework, using hardcoded
values and ignoring user input. The resulting solution is not supposed to be reused; when the
experiment finishes, it is thrown away (55).
For testing purposes, Continuous Integration (CI) via test and build automation was used. This
method requires that every time a change is committed to the repository, the entire
application is re-built and a comprehensive set of automated tests is run against it. The main goal
is that the software is in a working state all the time (56). Normally software is viewed as broken
until proven otherwise, usually at a testing or integration stage, but with CI each commit is
shown to work by the automated build and tests. This allows bugs to be detected earlier in the
development process. Although mostly used by teams of developers, this agile method has benefits
for solo developers as well, creating a safety net around the stable version of the application.
3.3.2 Agile Practices Application
In order to apply Iterative Development, an easy way to monitor iteration progress was needed to
allow for reflection on the tasks set at the beginning and those that still needed to be completed
by the end of the iteration. The initial idea was to use a document-based approach – simply writing
up requirements documents and customer feedback for each iteration. The problem with this was
that it would generate too much paperwork to manage.
To deal with this, an application for tracking iteration states was used – YouTrack (42), which
allowed for the creation of a fully customisable, easily navigable and accessible agile board that
could be viewed and modified from any browser (with some additional set-up). This helped to
organize and monitor work efficiently thanks to its wide array of features and
capabilities. An example screen capture (Figure 14) taken from the middle of the first iteration
shows the structure of the agile board as customized for the project. Four columns were
created to group tasks by the stage they currently occupied – On Hold, Open, In
Progress and Completed. Colour-coding was set up to reflect the state each task was in.
Tasks were grouped into separate categories depending on their project context.
Additional information could be added to each task, such as descriptive text about its purpose,
a criticality flag, and file uploads and attachments.
Figure 14 - AgileBoard YouTrack (42)
The second agile practice applied was Learning Spikes. This practice was simple to apply –
a folder was created for each week of the semester in which development took place, grouping
that week's learning spikes together and recording when each spike occurred. Figures 15, 16 and
17 show an example of the structure of the Learning Spikes. Most
technologies and techniques were introduced by following tutorials found online that created
small projects, which were included in the appropriate weekly Learning Spikes folders.
Figure 15 – Project folder
Figure 16 – Learning Spike Week 7
Figure 17 – Learning Spike Week 8
In order to apply Continuous Integration, the TeamCity CI environment was used, installed on the
VM Server virtual machine. The diagram in Figure 18 illustrates the tool's process flow. Testers
and developers commit their changes to source control. The TeamCity server polls the source
control system (connected to Atlassian Bitbucket for this project) and, whenever it sees a change,
triggers a build agent that lives on the server. The build agent runs a pre-specified set of
steps (which can be extremely complex), ranging from testing and building to deployment scripts.
TeamCity supports various testing frameworks spread across multiple languages. Finally, if any
kind of error occurs, a notification is sent and, based on the given specifications, the whole
process stops until the error is fixed.
Figure 18 - TeamCity Flow (41)
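The flow described above can be reduced to a short sketch. This is not TeamCity's API or configuration format; the function names, the commit identifiers and the step list are hypothetical and only model the poll, build, stop-on-failure and notify behaviour.

```python
def run_build(steps):
    """Run build steps in order and stop at the first one that fails."""
    log = []
    for name, action in steps:
        succeeded = action()
        log.append((name, succeeded))
        if not succeeded:
            return False, log
    return True, log


def poll_and_build(last_built_commit, head_commit, steps, notify):
    """Trigger a build only when the repository head has changed.

    Returns the commit that is now considered the last good build.
    """
    if head_commit == last_built_commit:
        return last_built_commit  # nothing new to build
    ok, log = run_build(steps)
    if not ok:
        notify(log)               # report the broken build
        return last_built_commit  # keep the previous good build
    return head_commit


notifications = []
steps = [
    ("compile", lambda: True),
    ("unit tests", lambda: False),  # simulated failing test run
    ("deploy", lambda: True),       # never reached in this run
]
last_built = poll_and_build("a1", "b2", steps, notifications.append)
print(last_built, notifications)
```

Because the deploy step is never reached after a failing test run, a broken change cannot replace the last stable version, which is the safety net referred to in Section 3.3.1.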
4. Design
4.1 Overview
This chapter describes the design of the system. The framework chosen for
development – Microsoft Prism (Section 3.2.2) – provides the core design principles and
patterns on which this project builds. These are discussed in the following sections,
along with an overview of the high-level diagrams of the main project entities.
4.2 Design Principles
Design principles help software development by providing guidelines as a recommendation to be
utilized when building any project. The key principles that impact the design and architecture of
this project are outlined below.
Modularity – The base principle when building any composite application. It ensures
cohesive objects and components that are loosely coupled, which allows them to evolve
independently from one another but be easily integrated later into a common overall
application – the composite application. The Microsoft Prism framework used for this project
adheres strongly to this principle and uses it as a foundation on which developers can easily
build (57).
GRASP – General Responsibility Assignment Software Principles are used for assigning
responsibilities to classes, forming a language to help developers communicate, as well as
answering common software problems (54). The principles are: Controller, Creator, Indirection,
Information Expert, High Cohesion, Low Coupling, Polymorphism, Protected Variations, and
Pure Fabrication. Using these as a guideline in the implementation of the project allowed for the
creation of a better design and increased its value. These aforementioned principles serve as the
“building blocks” for the design patterns described in the next section.
4.3 Design Patterns
The Microsoft Prism framework provides and incorporates a set of design patterns, which this
project makes use of to support its composite architecture (Figure 19). A variety of additional
patterns, part of the software development domain (58), are used throughout the project. Patterns
are vital in object-oriented software development as they provide tried and tested solutions to
common software design problems; a language for experienced developers to communicate with
efficiency; learning aids for inexperienced developers; a framework for discussing design trade-
offs (54).
Figure 19 - Composite Application Design Patterns (57)
A brief overview of the patterns in question follows, along with how they apply to this project.
Prism framework component definitions are found in Appendix A.
Adapter – A Gang of Four (GOF) pattern (54) (59) that decouples code within the application
from external formats. This is used extensively in the data layer of this project (Section 5.2)
when pulling Metadata from the OLAP server. In terms of the system architecture itself, it is
used to adapt the interface of a class to match the interface expected from another class.
Application regions (Appendix A) are adapted to constructs understood by the WPF framework.
Application Controller Pattern – A pattern that centralizes responsibility for the creation and
display of views in a single point, as described by Martin Fowler (60). Used for view
injection and view switching on the main application screen to display either 2D or 3D
visualisations.
Command Pattern – A GOF pattern (59) in which objects are used to represent actions. This is
used frequently in the implementation for button click logic binding between the UI and
presentation logic of the application.
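A minimal sketch of the idea follows. In WPF this role is played by command objects bound to buttons in XAML; the Python classes below are simplified stand-ins, and every member name is invented for the example. The button knows only about the command object, while the decision of what to do and whether it is currently allowed stays in the presentation logic.

```python
class Command:
    """An action packaged as an object, with an optional enabled check."""

    def __init__(self, execute, can_execute=None):
        self._execute = execute
        self._can_execute = can_execute or (lambda: True)

    def can_execute(self):
        return self._can_execute()

    def execute(self):
        if self.can_execute():
            self._execute()


class Button:
    """Stand-in for a UI button: it only knows the command it is bound to."""

    def __init__(self, command):
        self.command = command

    def click(self):
        self.command.execute()


class SelectCubeViewModel:
    """Presentation logic; members here are hypothetical."""

    def __init__(self):
        self.selected_cube = None
        self.loaded = []
        self.load_command = Command(
            self._load, lambda: self.selected_cube is not None)

    def _load(self):
        self.loaded.append(self.selected_cube)


view_model = SelectCubeViewModel()
button = Button(view_model.load_command)
button.click()  # ignored: no cube has been selected yet
view_model.selected_cube = "Adventure Works"
button.click()  # runs the command
print(view_model.loaded)
```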
Composite – A GOF partitioning pattern (59) where a group of objects are treated as a single
instance. This pattern is a predominant part of the Cube Browser View Model hierarchy
structuring (Section 5.7), which allows for easy hierarchy refactoring.
Composite View – A Composite (see previous) of Views. This allows for the combination of
individual views to create a more complex entity (Section 5.8).
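The Composite pattern can be sketched for a browsable hierarchy as follows. The class names and the sample hierarchy are assumptions made for illustration and do not reproduce the project's view model classes. A group and a single item expose the same operation, so calling code can treat a whole subtree exactly like one node.

```python
class BrowserItem:
    """Leaf: a single entry in the hierarchy."""

    def __init__(self, name):
        self.name = name

    def lines(self, depth=0):
        return ["  " * depth + self.name]


class BrowserGroup(BrowserItem):
    """Composite: an entry that contains other entries."""

    def __init__(self, name, children=()):
        super().__init__(name)
        self.children = list(children)

    def lines(self, depth=0):
        # Same operation as a leaf, applied recursively to every child.
        result = super().lines(depth)
        for child in self.children:
            result.extend(child.lines(depth + 1))
        return result


date_dimension = BrowserGroup("Date", [
    BrowserGroup("Calendar", [BrowserItem("Year"), BrowserItem("Month")]),
    BrowserItem("Fiscal Year"),
])
outline = date_dimension.lines()
print("\n".join(outline))
```

Moving a subtree to another parent only means moving one object between children lists, which is why the pattern makes hierarchy refactoring straightforward.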
Dependency Injection – A pattern which implements inversion of control, where multiple
dependencies are injected or passed into a dependent object (client) and are made part of that
client’s state (61). When objects are constructed, the dependency injection container would
resolve any external dependencies, allowing for concrete implementations to be easily changed
as the system evolves (57). The project uses the Unity container for its dependency injection
purposes.
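The mechanism can be sketched with a very small container. This is not the Unity container's API; the Container class, the registration keys and the two example classes are hypothetical. The dependent object receives its dependency through its constructor, and the container decides which concrete implementation to supply.

```python
class Container:
    """Tiny dependency injection container keyed by name."""

    def __init__(self):
        self._factories = {}

    def register(self, key, factory):
        self._factories[key] = factory

    def resolve(self, key):
        # Each factory receives the container so it can resolve its own needs.
        return self._factories[key](self)


class InMemoryMetadataSource:
    """Stand-in for a component that would query the OLAP server."""

    def cube_names(self):
        return ["Adventure Works"]


class CubeSelector:
    """Client: depends on a metadata source but never constructs one."""

    def __init__(self, metadata_source):
        self._metadata_source = metadata_source

    def available_cubes(self):
        return self._metadata_source.cube_names()


container = Container()
container.register("metadata", lambda c: InMemoryMetadataSource())
container.register("selector", lambda c: CubeSelector(c.resolve("metadata")))

selector = container.resolve("selector")
print(selector.available_cubes())
```

Swapping the in-memory source for one that talks to a real server would require changing only the registration line, not the CubeSelector class.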
Event Aggregator – based around the observer pattern, it is a simple element of indirection that
aggregates events from multiple objects through itself. Objects can subscribe to, publish or locate
events. The project frequently uses this pattern for communication between loosely-coupled
components (Section 5.4).
Factory – a GOF pattern (54) (59) which is a pure fabrication object that handles creation and
initialization of other objects, without having the instantiation logic exposed to the client.
Façade – a GOF pattern (59) that simplifies complex interfaces to either ease how they are used
or with the purpose of isolating access to them. The Prism library is isolated from changes in the
container and logging services thanks to the use of the Façade pattern.
Model-View-ViewModel (MVVM) – a pattern that helps to partition and separate the business
and presentation logic from an application’s UI into three specific types of classes – ‘View’,
‘View Model’ and ‘Model’ (Figure 20).
Figure 20 - MVVM Pattern Architecture (57)
The Model describes the encapsulated application logic, data retrieval and validity. It also
represents the client facing domain model for the application. It serves as the base, which the
view model encapsulates for use by the view.
The View Model concerns itself with the presentation logic of the application. It is separate
from the View, having no direct knowledge of its implementation. It exposes commands and
properties to the view through data binding. The view binds itself to properties in the view
model and, when those properties change, the view model raises notification events so that the
bound view can update itself.
The View defines the structure and visual appearance of the interface shown to the user, handling
visual behaviour and rendering triggered from state changes in the view model or after
interactions with the UI.
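The division of responsibilities can be sketched without any UI framework. In WPF the binding and change notification are provided by the framework; here a small ObservableObject class stands in for that mechanism, and all class and member names are assumptions made for the example.

```python
class ObservableObject:
    """Minimal stand-in for property change notification."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def notify(self, property_name):
        for handler in self._handlers:
            handler(property_name)


class CubeModel:
    """Model: holds the data and knows nothing about presentation."""

    def __init__(self, name):
        self.name = name


class CubeViewModel(ObservableObject):
    """View model: exposes model data as properties a view can bind to."""

    def __init__(self, model):
        super().__init__()
        self._model = model

    @property
    def title(self):
        return f"Cube: {self._model.name}"

    def select_cube(self, name):
        self._model.name = name
        self.notify("title")  # tell any bound view that 'title' changed


class CubeView:
    """View: renders the view model and refreshes on change notifications."""

    def __init__(self, view_model):
        self._view_model = view_model
        self.rendered = view_model.title
        view_model.subscribe(self._on_property_changed)

    def _on_property_changed(self, property_name):
        if property_name == "title":
            self.rendered = self._view_model.title


view_model = CubeViewModel(CubeModel("Adventure Works"))
view = CubeView(view_model)
view_model.select_cube("Sales Summary")
print(view.rendered)
```

The view model never references the view class, so it can be exercised in automated tests without creating any user interface.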
Observer – the Prism framework uses a variation of the Observer pattern that serves to separate
any interaction requests with the user from the actual interactions chosen to be displayed. This
allows views to decide on the way they provide feedback to the user.
Registry – the pattern provides an object, which can be approached to locate common objects
and services (62). This pattern is used in the project to associate views with their appropriate
regions.
Separated Interface and Plug-In – the separated interface allows for reduction in coupling,
separating the interface definition from the implementation. With the plug-in pattern, concrete
class implementations are determined at run time, avoiding the need for recompilation when
changes to the concrete class used are made.
Service Locator – allows classes to locate specific services without having any knowledge of
who these services were implemented by. This pattern’s specific use in the project is for locating
and retrieving instances of the Event Aggregator, for the purpose of publishing and subscribing
to events (Section 5.3).
Singleton – a GOF (59) pattern which restricts a class to only one instantiation. This pattern is
used in this project for Cache creation and in conjunction with the Factory pattern, previously
discussed.
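A singleton cache of the kind mentioned above might look like the following sketch. The CubeCache name, the get_or_load method and the loader function are hypothetical and are not taken from the project's source. Every construction returns the same instance, and a value is loaded only the first time its key is requested.

```python
class CubeCache:
    """Singleton cache: every construction returns the same instance."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._entries = {}
        return cls._instance

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader only on the first request."""
        if key not in self._entries:
            self._entries[key] = loader()
        return self._entries[key]


load_calls = []


def load_metadata():
    # Stand-in for an expensive round trip to the OLAP server.
    load_calls.append("Adventure Works")
    return {"dimensions": ["Date", "Product"]}


first = CubeCache().get_or_load("Adventure Works", load_metadata)
second = CubeCache().get_or_load("Adventure Works", load_metadata)
print(first is second, len(load_calls))
```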
4.4 Project Set-Up Step Activities
The set-up activities for an application using the Prism framework are illustrated below (Figure
21). These consist of creating an initial shell project with region outlines, creating a
bootstrapper component used for initialization of various Prism components and services, creating
an Infrastructure project module to define all shared events and commands, and building the
functional modules to be plugged into (Section 4.3) the previously defined region outlines of the
main shell project.
Figure 21 - Prism Composite Application Creation Activities (57)
4.5 Abstract High-Level System Design
An overview of the high-level system design of the application can be seen in the diagram
below. It shows the dependencies between the different modules – how they connect to one
another and their structure in the Visual Studio Solution (Figure 22).
Figure 22 - Abstract high-level module view
4.6 Side Module Design
The Side Module is the first important module to be designed for the application. It includes a
composite view (Section 4.3) containing the database selector and the cube browser. The
structure of the cube browser and its presentation logic hierarchy design can be seen in the
diagram that follows (Figure 23).
Figure 23 - Side module design diagram
4.7 Main Module Design
The Main Module is the component that holds the visualization logic and data population
models of the graphs, along with the views and view models of the main screen. A conceptual design
diagram of this module can be seen below (Figure 24). All data models (ending in Info) used are
part of the DataLayer module.
Figure 24 - Main module design diagram
5. Implementation
5.1 Overview
This chapter delves into the implementation details of the project, looking at each of the specific
components and how they build up to create the final end-to-end application.
5.2 Base System Framework
The initial implementation steps were to set up and connect to the Prism Library by following
the steps discussed in Section 4.4. The first step that gets executed at the start of the application
can be seen below (Figure 25).
Figure 25 - Connecting to the Prism Framework (57)
An additional configuration file, app.config, was added with logging options and server connection
strings. Utility modules were set up to support the application – DataLayer, Infrastructure,
Resources.
DataLayer – Provides all the connection and data specific logic for OLAP Metadata and Data
retrieval adapters, MDX query building and execution.
Infrastructure – Provides the common services, interfaces, events objects and utilities used by
the main application modules.
Resources – A shared resource module that contains images and common UI controls and
functions.
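As an illustration of the kind of MDX query building attributed to the DataLayer module, the sketch below assembles a simple two-axis query from its parts. It is written in Python rather than the project's C#, the function names are invented, and the cube, measure and hierarchy names are example values only.

```python
def quote(name):
    """Wrap an identifier in MDX square brackets, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def member_path(*parts):
    """Join identifiers into a dotted MDX path such as [Date].[Calendar Year]."""
    return ".".join(quote(part) for part in parts)


def build_mdx(cube, measure, hierarchy):
    """Build a query with one measure on columns and one hierarchy on rows."""
    columns = "{" + member_path("Measures", measure) + "}"
    rows = "{" + member_path(*hierarchy) + ".MEMBERS}"
    return f"SELECT {columns} ON COLUMNS, {rows} ON ROWS FROM {quote(cube)}"


query = build_mdx("Adventure Works", "Internet Sales Amount",
                  ("Date", "Calendar Year"))
print(query)
```

Keeping identifier quoting in one helper means that names containing spaces or brackets are handled consistently wherever a query is built.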
5.3 UI Templatability
WPF provides an easy way for developers to customize their UI, thanks to the highly templatable
XAML language. In order to create the professional and polished appearance of the application,
extensive work on custom component templates and styles was done – changing the look and
feel of most default controls provided by WPF (63). Missing UI component functionality is
easily added by extending and overriding the original framework presentation logic found in the
code-behind. Styles found in templates similarly override the existing default component styles.
Depending on their scope, templates are normally placed in a generic
template file shared across the application.
generic.xaml – the main template file located in the Resources module. All generic templates
shared between the different views in the different modules are located in this resource
dictionary. The dictionary file is merged at the beginning of all views using these templates by a
specification of the location as seen below (Figure 26).
Figure 26 - Merging Generic Resources into Views
Examples of the most prominent of these template changes can be seen below. Some of these
changes include a reworked Window Style (Figure 27), Button Style (Figure 28) and changes to
the hierarchical TreeView component (Figure 29) used in the Cube Browser (Section 5.7). All
other UI controls are changed in a similar manner using the same principles already discussed.
Window Style – The figure below (Figure 27) shows that even Operating System specific
controls, such as program windows, can be templated and styled.
Figure 27 - Original and Templated Window Style
Button Style – The figure below (Figure 28) shows the original button appearance and the one
after a template has been applied to alter its normal and hovering states. The templates are
applied via the Style property in the code, whereas the style template itself is located in the
previously discussed generic.xaml template dictionary.
Figure 28 - Templated Button Results Example
TreeView Style – The figure below (Figure 29) is an example of the customization capabilities
of WPF – the ability to change the structure of a control in any way necessary to obtain the
desired results.
Figure 29 - Default and Templated TreeView Control
5.4 Component Interaction
In order to communicate and share data between independent components, the Prism framework
provides an Event Aggregator service, based on the pattern with the same name (Section 4.3)
visualized below (Figure 30). It allows the same events to be raised by different publishers and
multiple subscribers to listen to the same event. Events can also carry data across from the
publisher to the listener. All unique application events can be found in the Infrastructure module.
Figure 30 - Publish/Subscribe Event Aggregator Service (57)
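The publish/subscribe mechanism described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the C# used by the project, and it is not the Prism API: the class EventAggregator, its methods and the event name DatabaseChanged are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventAggregator:
    """Hypothetical, minimal publish/subscribe hub (not the Prism API)."""

    def __init__(self) -> None:
        # Maps an event name to the handlers subscribed to it.
        self._handlers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable[[Any], None]) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: Any = None) -> None:
        # Iterate over a copy so handlers may subscribe others while running.
        for handler in list(self._handlers[event_name]):
            handler(payload)


aggregator = EventAggregator()
received: List[str] = []

# Two independent subscribers listen to the same event.
aggregator.subscribe("DatabaseChanged", lambda db: received.append(f"browser saw {db}"))
aggregator.subscribe("DatabaseChanged", lambda db: received.append(f"main screen saw {db}"))

# Any publisher can raise the event and attach data to it.
aggregator.publish("DatabaseChanged", "SalesDB")
```

Neither subscriber knows about the other or about the publisher; the only thing they share is the event name, which is what keeps the modules independent.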
5.5 Database and Cube Selector
The database (Figure 31) and cube (Figure 32) selectors, both part of the Side Module, take the
form of pop-up screens that allow the user to select the appropriate database and cube for use
and data browsing. Both have their own separate Views and View Models:
SelectCubeView.xaml and SelectCubeViewModel.cs; SelectDatabaseView.xaml and
SelectDatabaseViewModel.cs.
Figure 31 - Database Selector Pop-Up
Figure 32 - Cube Selector Pop-Up
Database Selector – uses the initial server connection string created at boot-up, taken from the
configuration file discussed in Section 5.2. A query is automatically run and the available
databases on the server are listed. On selection of a database, the connection string is updated,
which allows for subsequent cube selection. Changing databases fires off events (Section 5.4),
which update views across all modules.
Cube Selector – uses the updated connection string to list all OLAP cubes found in the selected
database. Changing cubes caches the currently selected cube, so that subsequent loads of the
same cube are faster.
5.6 OLAP Metadata Retrieval
OLAP Metadata provides information about the organization and types of data in the OLAP
Cube (Appendix A). This project adapts the metadata model hierarchy used in the Microsoft
Analysis Services solution. Instead of creating an adapter and model to be used in the application
from scratch, the one implemented in the Ranet OLAP open-source library (13) is modified for
use and located in the Metadata section of the DataLayer module.
The project calls upon the provider by feeding it a connection string and extracting the needed
metadata model from it, which is populated by querying the OLAP database. Because the model is
hierarchical, only a single object needs to be handled: the CubeDefInfo object at the top of the
tree. This is processed and used when creating the presentation logic for the
Cube Browser discussed in the next section.
5.7 Cube Browser
The cube browser was designed and implemented to be used for easy explorative analysis of the
data in the cube, based on techniques mentioned by Mansmann et al. (6) (7) (8). This component
is structured around the MVVM pattern (Section 4.3).
The OLAP Metadata model is encapsulated in a hierarchically structured set of view models,
containing the presentation logic to be consumed by the view and displayed on the screen. Every
item seen in the browser is represented with its own view model (Figure 33, Figure 34) and own
display style using the XAML HierarchicalDataTemplates (64) of the TreeView UI control. The
hierarchical structure is defined once for the UI display and once for the data organization of the
view model hierarchy.
Each metadata view model extends a base TreeViewItemViewModel class serving as an adapter
between the raw data object and an item of the TreeView. It provides a set of properties,
specifying the information displayed on screen, along with a lazy-loaded child collection that
builds the next level of the hierarchy. The children of the next level are loaded only when an
item in the TreeView is expanded.
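The lazy-loading behaviour can be sketched as follows. This is an illustrative Python analogue, not the project's C# TreeViewItemViewModel: the class TreeItemViewModel, its members and the loader callback are assumptions made for the example.

```python
from typing import Callable, List, Optional


class TreeItemViewModel:
    """Hypothetical adapter between a metadata object and a tree item."""

    def __init__(self, name: str,
                 load_children: Callable[[], List["TreeItemViewModel"]]) -> None:
        self.name = name
        self._load_children = load_children
        self._children: Optional[List["TreeItemViewModel"]] = None
        self._is_expanded = False

    @property
    def children_loaded(self) -> bool:
        return self._children is not None

    @property
    def children(self) -> List["TreeItemViewModel"]:
        # Build the next level of the hierarchy on first access only.
        if self._children is None:
            self._children = self._load_children()
        return self._children

    @property
    def is_expanded(self) -> bool:
        return self._is_expanded

    @is_expanded.setter
    def is_expanded(self, value: bool) -> None:
        self._is_expanded = value
        if value:
            _ = self.children  # expanding an item triggers the load


load_calls: List[str] = []


def load_levels() -> List[TreeItemViewModel]:
    load_calls.append("levels")  # stands in for a metadata query
    return [TreeItemViewModel("Year", lambda: []),
            TreeItemViewModel("Quarter", lambda: [])]


hierarchy = TreeItemViewModel("Calendar", load_levels)
loaded_before = hierarchy.children_loaded  # nothing has been queried yet
hierarchy.is_expanded = True               # first expansion loads the children
```

Collapsing and re-expanding the item reuses the already built collection, so the loader runs at most once per item.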
Items can be right-clicked, with prompts allowing the user to specify where they should be used
for visualization purposes: the 2D or the 3D main screen, discussed next. The selected items are
added to a cache object in the Infrastructure module and an event (Section 5.4) is published to
announce the addition.
Figure 33 - Cube Browser View Example #1
Figure 34 - Cube Browser View Example #2
5.8 Main Screen
The main screen provides a view for the 2D and 3D visualisations of the application. The 2D
functionality is discussed in this section and the 3D functionality in Section 5.12.
The main application screen provides a placeholder for the visualisations that can be displayed to
the user. The placeholder is in the form of a tab pane, where each tab can hold a different user
selected visualization. Visualisations have their own data templates, which serve to provide a
style to display on the UI, and a view model with their presentation logic. Every new tab is
initialised with a GenericGraphView data template and GenericGraphViewModel, which can
then be replaced, based on user interaction, with another template and view model. The new view
model is created by a view model factory. Multiple instances of the same template and view model
can exist; their context and bindings are held by the tab pane template. The UI template shown
for a given view model is chosen by a template selector (Figure 35).
Figure 35 - MainGraphView.xaml Data Template Selector. Selects an appropriate UI template based on the
view model type passed to the view.
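The idea of selecting a template from the type of the view model can be sketched as below. This is a Python analogue of the XAML mechanism, given only as an illustration: the template name TreeMapGraphView, the factory registry and the Tab class are assumptions, while GenericGraphView, GenericGraphViewModel and TreeMapGraphViewModel are the names used in this report.

```python
class GenericGraphViewModel:
    """Placeholder view model shown in every new tab."""


class TreeMapGraphViewModel:
    """View model of the TreeMap visualisation."""


# Template names keyed by view model type ("TreeMapGraphView" is an assumed name).
TEMPLATES = {
    GenericGraphViewModel: "GenericGraphView",
    TreeMapGraphViewModel: "TreeMapGraphView",
}

# Hypothetical factory registry: user choice -> view model constructor.
FACTORY = {
    "generic": GenericGraphViewModel,
    "treemap": TreeMapGraphViewModel,
}


def create_view_model(kind: str):
    """Create a fresh view model instance for the requested visualisation."""
    return FACTORY[kind]()


def select_template(view_model) -> str:
    """Pick the UI template that matches the type of the view model."""
    return TEMPLATES[type(view_model)]


class Tab:
    def __init__(self) -> None:
        # Every new tab starts with the generic placeholder.
        self.view_model = create_view_model("generic")

    @property
    def template(self) -> str:
        return select_template(self.view_model)


first_tab, second_tab = Tab(), Tab()
second_tab.view_model = create_view_model("treemap")  # user picks a visualisation
```

Because the template is derived from the view model rather than stored next to it, replacing the view model of a tab is enough to change what the tab displays.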
The 2D main screen also hosts the link between the Cube Browser and the 2D OLAP
visualisations. The user-selected metadata gets pulled from the cache (Section 5.7), ready to be
used in any of the visualisations. The cube choice selector (Figure 36) sits on top of all views and
is globally shared between them.
Figure 36 - Cube Choice Selector with user choices over main screen.
5.9 OLAP Data Retrieval and Query Mechanism
This subsection explains the basis of the OLAP data retrieval and MDX querying mechanism, which
is needed for all subsequent visualization steps. Every visualisation's data population algorithm
retrieves its data in the same way, as a datatable object created in the DefaultQueryExecutor
class, and then consumes it according to its own needs. The main objects used for the MDX query
mechanism are the SelectMDXObject and the FilterMDXObject.
FilterMDXObject – takes the OLAP hierarchy members to filter by and, based on their count and
the filter type, generates a filter query string to be used by a SelectMDXObject.
Note: the filter types are Equals, Not Equals, Between, Not Between, In and Not In.
SelectMDXObject – provides a wide range of functions for MDX Select query string
generation. Concrete examples can be seen in the SelectMDXGeneratorTests class of the
DataLayerTests module.
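How a filter object and a select object could cooperate to produce an MDX string is sketched below. This is not the project's SelectMDXObject or FilterMDXObject: the functions build_filter and build_select, the mapping of each filter type to an MDX set expression, and the cube, hierarchy and measure names are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def build_filter(filter_type: str, members: Sequence[str], level: str) -> str:
    """Return an MDX set expression for one of the six filter types."""
    negate = filter_type.startswith("Not ")
    base_type = filter_type[4:] if negate else filter_type

    if base_type == "Equals":
        if len(members) != 1:
            raise ValueError("Equals expects exactly one member")
        selected = "{" + members[0] + "}"
    elif base_type == "Between":
        if len(members) != 2:
            raise ValueError("Between expects exactly two members")
        # The MDX range operator selects every member between the two ends.
        selected = "{" + members[0] + " : " + members[1] + "}"
    elif base_type == "In":
        if not members:
            raise ValueError("In expects at least one member")
        selected = "{" + ", ".join(members) + "}"
    else:
        raise ValueError(f"Unknown filter type: {filter_type}")

    if negate:
        # Assumed mapping: a negated filter keeps everything except the set.
        return f"Except({level}.Members, {selected})"
    return selected


def build_select(measures: Sequence[str], row_level: str, cube: str,
                 where: Optional[str] = None) -> str:
    """Assemble a SELECT statement with measures on columns and members on rows."""
    measure_set = "{" + ", ".join(measures) + "}"
    query = (f"SELECT {measure_set} ON COLUMNS, "
             f"{row_level}.Members ON ROWS FROM [{cube}]")
    if where:
        query += f" WHERE {where}"
    return query


year_filter = build_filter("Equals", ["[Date].[Year].&[2013]"], "[Date].[Year]")
query = build_select(["[Measures].[Sales]", "[Measures].[Profit]"],
                     "[Product].[Category]", "Sales", year_filter)
```

Keeping the filter generation separate from the select generation means a visualisation can reuse the same select object with or without a filter.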
5.10 TreeMap Data Visualizer
The TreeMap data visualizer is based on the WPF Treemaps & SquarifiedTreeMaps Control
(Section 3.2.2) library. Initial work was done to modify the library itself, as it did not provide
the functionality needed by this project. An example retrieved from the library project
itself (47) shows the original functionality provided (Figure 37): a recursive view into the
hierarchical structure of the data on a selected hard disk drive. The SquarifiedTreeMap and the
control over the depth of the hierarchy are shown.
Figure 37 - WPF Treemaps & SquarifiedTreeMaps Control Example - original library capabilities (47)
What this project needed was the ability to control the depth of specific elements in the
TreeMap, control separate element colours, provide an indicator of the level depth and an ability
to slice (Appendix B) into specific data levels, maximizing them on screen, as well as providing
the function of drilling up/down into higher and lower levels of the hierarchy. This needed to be
done without losing sense of the context, as is normal in exploratory analysis (6) (7) (8). The
result of these changes can be seen below (Figure 38). Drilling up and down is done via a
circular button in the top right corner of every element or via a right-click context menu. Slicing
(Maximising) is done via a square button in a similar way to the drilling function.
Figure 38 - OlapVisi TreeMap visualisation - provides control over the element depth, breadcrumb bar at top
saving context, functionality to drill down/drill up and slice into the data.
The visualization automatically recalculates the sizes and positions of its elements when the
window is resized and when additional view elements, such as the settings panel or the
categorical time-series analyser (Section 5.11), are expanded or retracted. Customisation options
for the hover, text and element colours were added to improve the user experience. Once the user
has selected the hierarchies and measures to display, clicking the Plot button calls the main
data population algorithm for the visualization, PlotGraphAction, located in the
TreeMapGraphViewModel class.
PlotGraphAction – creates a SelectMDXObject that uses the data currently plotted in the
TreeMap (Measures, Dimensions, Cube). It then retrieves the MDX Select query and executes it,
obtaining a datatable object. The column indexes of the colour and size measures are read from
the first row of the datatable. The rows and columns of the table are then looped over, creating
the TreeMapGraphDataViewModel objects that form the main elements seen in the TreeMap and
linking them into a hierarchical structure that allows for easy traversal. The minimum and
maximum values of the colour measure are then calculated so that each value can be converted into
an appropriate colour when displayed. Finally, the context breadcrumb at the top of the graph is
updated.
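The core of such a data population step, turning a flat result table into a linked hierarchy while tracking the colour range, can be sketched as follows. The table layout (level columns first, then the measure columns, with a header in the first row), the TreeMapNode class and the sample data are assumptions for illustration and do not reproduce the project's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class TreeMapNode:
    """Hypothetical stand-in for a TreeMap data element."""
    name: str
    size: float = 0.0
    colour: float = 0.0
    children: Dict[str, "TreeMapNode"] = field(default_factory=dict)


def build_tree(table: Sequence[Sequence], size_measure: str,
               colour_measure: str) -> Tuple[TreeMapNode, float, float]:
    header, rows = table[0], table[1:]
    # Column indexes of the two measures are read from the first row.
    size_idx = header.index(size_measure)
    colour_idx = header.index(colour_measure)
    # Assumption: every column before the first measure is a hierarchy level.
    level_count = min(size_idx, colour_idx)

    root = TreeMapNode("All")
    colours: List[float] = []
    for row in rows:
        node = root
        for level_name in row[:level_count]:
            # Reuse an existing child so that shared parents are linked once.
            if level_name not in node.children:
                node.children[level_name] = TreeMapNode(level_name)
            node = node.children[level_name]
        node.size = float(row[size_idx])
        node.colour = float(row[colour_idx])
        colours.append(node.colour)

    # The colour range is needed later to map values onto a colour scale.
    return root, min(colours, default=0.0), max(colours, default=0.0)


table = [
    ["Category", "Subcategory", "Sales", "Profit"],
    ["Bikes", "Road", 120.0, 30.0],
    ["Bikes", "Mountain", 80.0, 10.0],
    ["Clothing", "Caps", 15.0, 5.0],
]
root, colour_min, colour_max = build_tree(table, "Sales", "Profit")
```

Here Sales drives the element size and Profit the element colour; both "Bikes" rows end up under a single parent node.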
5.11 Categorical Time-Series Analyser
The TreeMap data visualizer discussed in the previous subsection visualizes one dimension and
two measures. To increase the number of dimensions shown to the user, a categorical time-series
tool was added to the TreeMap graph (Figure 39), building upon the ideas seen in the
GapMinder research application (Section 2.1.3).
Figure 39 - Categorical Time-Series data browser and filtering tool.
The functionality of the time-series analyser is split into two main operations, which can be used
only after an initial Plot of the data has been completed. The first consists of selecting an
appropriate date level from the previously selected date hierarchy (hierarchies contain multiple
levels). The member data of the selected level is loaded and populated into a collection displayed
under the slide bar. This collection represents a list of possible filters to be applied to the data in
the TreeMap, hence increasing the dimensionality control available to the user by one dimension.
Moving the slider thumb changes the SelectedDate property of the view model, which triggers
the second operation, the FilterGraphAction, part of the TreeMapGraphViewModel class. This
action is similar to the PlotGraphAction (Section 5.10), but it generates a SelectMDXObject
(Section 5.9) that applies the SelectedDate as a filter to the data.
FilterGraphAction – creates a FilterMDXObject (Section 5.9) using the CurrentDate selected
and feeds it into a SelectMDXObject that uses the data currently plotted in the TreeMap
(Measures, Dimensions, Cube). It then retrieves the MDX Select query and executes it,
obtaining a datatable object. A helper dictionary collection is created to keep track of the context
of the data being processed, so that no duplicates are created. The hierarchical structure of the
previous elements is unlinked, without losing the references to the elements themselves. The rows
and columns of the datatable object are then looped over, retrieving the previously
created data objects (via the PlotGraphAction, Section 5.10) based on a name comparison and
linking them into a brand new hierarchical structure based on the filtered data. The final two
columns are retrieved to be used as the values for the colour and size of the TreeMap visual
objects.
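The unlink-and-relink step can be sketched as follows. This is a simplified Python illustration rather than the project's implementation: the Node class, the relink function, the use of a name path as the lookup key and the order of the two measure columns (size first, then colour) are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


class Node:
    """Hypothetical stand-in for a TreeMap data element."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []
        self.size = 0.0
        self.colour = 0.0


def relink(root: Node, existing: Dict[str, Node], rows: Sequence[Tuple]) -> None:
    """Rebuild the hierarchy under root from filtered rows, reusing nodes."""
    # Unlink the old structure; the node objects stay alive in `existing`.
    root.children = []
    for node in existing.values():
        node.children = []

    linked: Dict[str, Node] = {}  # helper dictionary: prevents duplicate links
    for *level_names, size, colour in rows:
        parent, path = root, ""
        for level_name in level_names:
            path = f"{path}/{level_name}"
            node = linked.get(path)
            if node is None:
                node = existing[path]  # look the element up by its name path
                linked[path] = node
                parent.children.append(node)
            parent = node
        # Assumed column order: size first, then colour.
        parent.size, parent.colour = float(size), float(colour)


root = Node("All")
existing = {path: Node(path.rsplit("/", 1)[-1])
            for path in ("/Bikes", "/Bikes/Road", "/Bikes/Mountain", "/Caps")}

# Structure as it was after the initial plot.
root.children = [existing["/Bikes"], existing["/Caps"]]
existing["/Bikes"].children = [existing["/Bikes/Road"], existing["/Bikes/Mountain"]]

# The filtered query result only contains road bikes.
relink(root, existing, [("Bikes", "Road", 50.0, 12.0)])
```

Elements missing from the filtered result are no longer reachable from the root, yet remain available in the lookup for the next filter change.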
5.12 3D Data Visualizer
The experimental 3D OLAP visualizer (Figure 40) uses ideas from the VR4OLAP application
(Section 2.1.3), in an attempt to increase the number of dimensions the user can interact with
simultaneously, as well as to provide additional OLAP operations (Appendix B) on top of the
ones seen in the 2D TreeMap graph. Apart from drill-up/drill-down and slice, the 3D OLAP tool
provides dicing and pivoting. The number of dimensions is brought up to three, of any type, and
either one or two measures can be displayed at the same time.
Figure 40 - 3D OLAP visualizer application example
The Helix Toolkit (Section 3.2.2) provides an easy-to-use scene set-up, along with various visual
objects for use. Only minimal changes were made to the base library, to add the selectable
element functionality that was needed; these are found in the 3DControls folder of the
MainVisualisation module.
The interface was implemented incrementally. Hierarchy loading and measure set-up were
implemented first, followed by the X, Y and Z grids that serve as placeholders for the elements.
Labels for the members of each level were then positioned and orientated along the outside of the
grid. The data elements collection was created next, along with the full data population
algorithm, MapAction. The models of the 3D objects are instances of the UIBoxElement3D
class, and their view models are instances of the DataVisual class. Colour coding and sizing based
on the values of the data elements were then added. Additional legends and data information
followed, along with all the OLAP functionality seen in Appendix C.
Colour coding – implemented with the help of the ColorHelper WPF utility class. After a study of
the manipulation of colours in .NET (65), the method for choosing colours was built on the HSV
cylindrical representation of the RGB colour model (Figure 41). The saturation and value are both
fixed at one; the only parameter that varies is the hue. The minimum and maximum values of the
data are calculated and mapped to the start and end points of a hue range running from yellow
to red. The two end colours are then used to create a gradient for use in the legend.
Figure 41 - HSV Cylindrical-Colour Model - Hue, Saturation, Value (66)
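The hue mapping can be expressed compactly. The sketch below uses Python's standard colorsys module instead of the ColorHelper class used in the project; the function name value_to_rgb, the linear interpolation and the choice to map the minimum to yellow and the maximum to red are assumptions made for illustration.

```python
import colorsys
from typing import Tuple

YELLOW_HUE = 60 / 360  # yellow sits at 60 degrees on the hue circle
RED_HUE = 0.0          # red sits at 0 degrees


def value_to_rgb(value: float, minimum: float, maximum: float) -> Tuple[int, int, int]:
    """Map a measure value onto a yellow-to-red colour with full saturation."""
    if maximum == minimum:
        fraction = 0.0  # a flat data range maps everything to the start colour
    else:
        fraction = (value - minimum) / (maximum - minimum)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range values

    # Only the hue varies; saturation and value stay fixed at one.
    hue = YELLOW_HUE + fraction * (RED_HUE - YELLOW_HUE)
    red, green, blue = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return round(red * 255), round(green * 255), round(blue * 255)


lowest = value_to_rgb(0.0, 0.0, 100.0)     # yellow end of the range
highest = value_to_rgb(100.0, 0.0, 100.0)  # red end of the range
middle = value_to_rgb(50.0, 0.0, 100.0)    # an orange in between
```

The two end colours returned for the minimum and maximum are the ones a legend gradient would be drawn between.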
Element positioning – implemented using the previously calculated member label positions,
situated at the grid edges. Each element is placed at the 3D coordinate where its three members
intersect.
MapAction – loads all member data of the levels selected using a MemberLoaderCache utility
class. Maps the placeholder grid locations, populates the member labels, looping through each
level collection, calculating the appropriate position of each member, rendering them on screen
and adding clickable filter elements, used for the OLAP operations. Populates the level labels at
the three corners of the grid. A SelectMDXObject is created, using the currently selected OLAP
information to run an MDX query, retrieving a datatable object. This object is parsed, calculating
the minimum and maximum values of the measures selected and the DataVisual element objects
are created. Each DataVisual takes in a Point3D object, created using the X, Y, Z coordinates of
the member label intersections. This serves as a centre point for the 3D element. The final
parameters passed in are the measure values of the element. A notification is sent to the UI to
render the elements on screen; the previously discussed colour ranges are created and appropriate
flags are set to end the method.
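The positioning part of this algorithm, placing every data element at the intersection of its three member labels, can be sketched as follows. The functions axis_positions and place_elements, the even spacing of labels along each axis and the sample data are assumptions for illustration; a plain tuple stands in for the Point3D object.

```python
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def axis_positions(members: Sequence[str], spacing: float = 1.0) -> Dict[str, float]:
    """Assign each member label an evenly spaced coordinate along one axis."""
    return {member: index * spacing for index, member in enumerate(members)}


def place_elements(rows: Sequence[Tuple[str, str, str, float]],
                   x_members: Sequence[str], y_members: Sequence[str],
                   z_members: Sequence[str], spacing: float = 1.0
                   ) -> Tuple[List[Tuple[Point3D, float]], float, float]:
    xs = axis_positions(x_members, spacing)
    ys = axis_positions(y_members, spacing)
    zs = axis_positions(z_members, spacing)

    elements: List[Tuple[Point3D, float]] = []
    for x_name, y_name, z_name, measure in rows:
        # The centre of an element is the intersection of its three members.
        centre = (xs[x_name], ys[y_name], zs[z_name])
        elements.append((centre, measure))

    values = [measure for _, measure in elements]
    # The value range is kept for colour and size scaling.
    return elements, min(values, default=0.0), max(values, default=0.0)


rows = [("Bikes", "2014", "North", 30.0), ("Caps", "2013", "South", 4.0)]
elements, lowest, highest = place_elements(
    rows, ["Bikes", "Caps"], ["2013", "2014"], ["North", "South"], spacing=2.0)
```

Only member combinations that occur in the query result produce an element, so sparse data leaves the remaining grid intersections empty.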
6. Testing and Evaluation
6.1 Overview
The purpose of this chapter is to look into and discuss the testing methodologies used in the
project, as well as to evaluate the OLAP visualisations in terms of query, algorithm and render
performance times. Additional comparisons with functionalities of existing BI solutions are
made along with an evaluation on the types of OLAP operations, which can be performed by the
provided visualisations.
6.2 Testing
In order to make sure that all components of the application work as intended, and that the
system functions properly as a whole, various methods of testing were put to use. This
section highlights these methods and discusses their advantages. As previously discussed, the
MVVM design architecture of the project allows the View Models to be easily separated from their
corresponding Views for the purpose of testing. This testing is done via Unit Tests, which
were run as a step in the Continuous Integration process (Section 3.3). Some functionality was
implemented using a Red-Green-Green Cycle approach, which is an important part of the Test
Driven Development methodology.
6.2.1 Unit Tests
Unit tests were created for all the View Models that contain the presentation logic of the
application, as well as for the Models that contain the business and data logic of the application.
Unit testing the OLAP Metadata retrieval models, which were modified and used from the Ranet
OLAP Open Source library, was not necessary, as it was assumed that this logic was tested by
the original library developers. The NUnit test framework was used for the purpose of creating
these unit tests due to its ease of use and integration capabilities with the TeamCity Continuous
Integration system (Section 3.3).
The two main NUnit functions used in testing were the SetUp and Assertion methods. SetUp is
used to provide a set of functions that are performed before each test method is called and are
common for all test methods. Assertions are the central unit testing mechanism for providing
answers to whether a piece of production code works properly. Multiple types of assertion
methods exist – for checking equality, identity, condition, comparison etc. (67)
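The roles of a set-up method and of assertions can be illustrated with a small test class. The sketch uses Python's built-in unittest module as an analogue of NUnit, not NUnit itself, and the SelectionCache class under test is a hypothetical example rather than a class from the project.

```python
import unittest
from typing import List


class SelectionCache:
    """Hypothetical class under test: remembers user-selected items once."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def add(self, item: str) -> None:
        if item not in self._items:
            self._items.append(item)

    @property
    def items(self) -> List[str]:
        return list(self._items)


class SelectionCacheTests(unittest.TestCase):
    def setUp(self) -> None:
        # Runs before every test method, so each test gets a fresh cache.
        self.cache = SelectionCache()

    def test_starts_empty(self) -> None:
        self.assertEqual(self.cache.items, [])          # equality assertion

    def test_added_item_is_stored(self) -> None:
        self.cache.add("Sales Amount")
        self.assertIn("Sales Amount", self.cache.items)  # condition assertion

    def test_duplicate_is_ignored(self) -> None:
        self.cache.add("Sales Amount")
        self.cache.add("Sales Amount")
        self.assertEqual(len(self.cache.items), 1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SelectionCacheTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the set-up method rebuilds the object for every test, the tests do not depend on the order in which they are run.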
6.2.2 Test-driven Development (TDD)
This methodology was chosen for use in the creation of the OLAP data retrieval and querying
mechanism (Section 5.9). The main idea behind this methodology, described initially by Kent
Beck (68), is to simplify code writing by growing the code and its associated test suite in small
increments that are meaningful on their own. Empirical work reports that this approach
outperforms the Test-Last method, producing lower inter-object coupling and allowing for
a better modular design, code reuse and ease of testing (69).
Test code is written in conjunction with production code following a simple three-step cycle:
test-code-refactor or red-green-green (the colours indicate the test results that are seen at the end
of each step):
1. Write failing test (Red) – simplest test that can be thought of that will fail. No production
code can be written in TDD unless there is a failing test that the developer can work
towards.
2. Write production code to make test pass (Green) - without breaking previously working
tests, an implementation of the simplest manner that makes the test pass is written.
3. Refactor code (Green) – design refactoring of both test code and production code that
does not break any existing test cases.
The production code and test suite gradually grow as development continues, while the test suite
acts as documentation that does not go out of date. It also acts as a regression safety net when
bugs are found, as the developer needs to fix them without breaking any
previously written tests. Another benefit of test-driven development is a better design that is
loosely coupled and easily maintainable, ultimately creating high-quality, polished components as
part of the application (70). A set of tests created via the test-driven development cycle during
the creation of the OLAP MDX Query Builder (Section 5.9) can be seen in Figure 42.
Figure 42 - TDD Created Test Extract
6.3 Evaluation
Evaluating the visualisations of the project is done by comparing their functionality in terms of
OLAP operations (Appendix B) supported, as well as dimensions and measures supported. The
second method of evaluation is based on the performance of the visualisations in terms of
algorithms, queries and rendering done.
6.3.1 OLAP Operations Comparison
The comparison seen below (Table 1) shows that the visualisations provide the base functionality
needed for the manipulation of OLAP data. Additional operations exist, as described by Mansmann
et al. (8), but this project does not take them into account because it was important to establish
only a base that would provide a capability for future improvements, based on additional needs of
the user. The 3D Cube Visualizer provides functionality similar to that described for the
VR4OLAP application (Section 2.1.3), as well as the same number of dimensions and measures.
Visualisation | Dimensions | Measures | Drill-Down | Drill-Up | Drill-Through | Slice | Dice | Pivot
2D TreeMap | 1 | 2 | | | | | |
2D TreeMap w/ Categorical Time-Series Tool | 1 + 1 * | 2 | | | | | |
3D Cube Visualizer | 3 | 1 or 2 | | | | | |
* The additional dimension can only be of a time dimension type
Table 1 - OLAP operation visualisation capabilities
6.3.2 Performance Profiling
Profiling the performance of the visualisations is done using the ANTS Performance Profiler
from Red Gate Software (71), which provides the necessary set of tools for pinpointing
performance issues, such as execution time and hit count, directly in the source code. The 3D
Cube Visualizer was chosen for profiling because it is able to display large amounts of data at
the same time, which makes both the execution time of its methods and its memory footprint of
interest.
3D Cube Visualizer – the main performance hotspot of this component was narrowed down to
the MapAction discussed previously (Section 5.12), which makes calls to methods that retrieve,
populate and render the data on the display, hence taking the most CPU processing
time. Comparisons are made based on an increasing number of members shown per level for
each axis (Table 2), thus displaying the visualizer capabilities for larger amounts of data. The
actual number of members that were populated from the data was much smaller than the
maximum number possible. For the purpose of this profiling exercise, dimensions with multiple
levels were picked.
Table 2 - Performance profile specification. X, Y, Z dimension member amounts.
CPU Timing Profile
A timing profile of the main methods being executed is provided below (Figure 43).
Figure 43 - Main functions performance time profile
Note: MapAction contains PopulateMemberLabels, PopulateData; PopulateData contains
QueryExecution, Create3DDataObjects; Create3DDataObjects contains 3DObjectUINotification call.
The profile covers the MapAction function, along with the functions within it that provide the
main algorithm functionality. The decrease between the Profile 1 and Profile 2 execution times is
due to the fact that new objects are cached after they are retrieved from the database. The
operation that raises the UI notification (within the Create3DDataObjects method) takes the most
time to execute. With every subsequent increase in data, the percentage of time that
this notification takes increases: in Profile 6 it is 60% and in Profile 7 it is 74%. Detailed
looks into the other methods show that similar UI notification events for the other operations,
e.g. member labels, take up a similar percentage of the total time.
Memory Usage Profile
A memory usage profile of the three main visual objects displayed on screen is seen below
(Figure 44).
Figure 44 - Main 3D element memory usage profile
The two main 3D elements displayed on screen that increase in memory consumption are the
UIBoxElement3D – used to display the data members in the grid (Appendix C), and
UISphereElement3D – used to display all filter spheres along the grid edges. Based on the
results, a mere 7.3 MB is used to display 33811 data objects, roughly 220 bytes per object, which
suggests that memory use should remain modest as the amount of data grows.
7. Conclusion
7.1 Project Achievements
This project has built upon ideas and techniques found in leading commercial Business
Intelligence products and on some of the latest achievements in academic research. A viable,
polished, end-to-end application was created that could serve Business Analysts who
need to visualize complex multi-dimensional data in an easy and efficient manner. The
project has been a success: the initial project aims were met, and additional work (the 3D
visualization, Section 5.12) was done both to experiment with new research ideas on visualization
techniques (9) and, after some redesign of the system, to add the capability of handling large
amounts of data (Section 6.3).
In terms of personal achievements, a great amount of knowledge was gained with potential for
practical future use. Coming from a year in industry, with work done related to the Business
Intelligence field, this project provided the perfect opportunity for delving deeper into the typical
challenges that are faced in the field and for working on solving some of them.
Technically, methodologies and technologies (Sections 3, 4, 5) in which experience was lacking
were researched and used successfully for the implementation of the project, including modular
development, visualization techniques and technologies, XAML UI creation and control
templating.
7.2 Future Development
The end-to-end aspect and modular nature of the developed application provide a good basis
for improving on the design and functionality in various ways that would take it to a level
comparable to some of the commercial products on the market.
7.2.1 Multiple OLAP Data Sources
The current system works on top of the Microsoft Analysis Services OLAP Server, but in order
for it to become a more robust solution, additional OLAP data sources need to be included.
Based on the research in Section 2.3, a variety of mid-tier servers exist. An additional question
that needs to be answered is whether offline sources should be supported in the form of offline
OLAP cubes, as well as if connections to a normal relational database should be included. A
discussion regarding this can be seen in Section 2.4.
In terms of technical implementation, the standard query language MDX is supported
universally across all server solutions, so the query mechanism should remain the same if other
OLAP sources are included. Additional data provider adapters will need to be created to handle
the OLAP Metadata object creation, which serves as the basis for the Cube Browser (Section
5.7).
7.2.2 Additional 2D Graphics
The project provides a harness for the inclusion of additional types of graphs. This improvement
will depend on the graphing libraries available for .NET, or on reengineering the system to be
able to use cross-language toolkits that provide more powerful graphing capabilities. A possible
.NET solution, which was researched for use in the project (Section 3.2.2), is the Modern UI
(Metro) Charts for Windows 8, WPF, Silverlight (46) library, which provides fluid visualisations
and robust templatability and functionality.
7.2.3 Algorithm Optimization
The data population algorithms, for the 2D TreeMap Graph in particular, depend on pulling all
the data of an OLAP Hierarchy at once and populating a data structure to be visualized by the
graph (Section 5.10). This creates a problem with the increase in data that is shown on-screen. A
more segmented just-in-time approach is needed – querying the server and generating objects
when needed and including a more complex caching mechanism for these objects, or even
creating them smartly in multiple background threads.
7.2.4 Library Inefficiencies
Based on the observations made in Section 6.3, future work needs to be done to remedy the issue
seen with the raising of the UI notification event. This event is part of the WPF framework itself,
so additional research into possible workarounds will need to be done in order for larger amounts
of data to be displayed.
The library used to incorporate the 2D TreeMap graph visualization, WPF Treemaps &
SquarifiedTreeMaps Control (47), has inefficiencies in its design and certain bugs in
the implementation of its components. These were noticed during the implementation of the
project, but were not serious enough to justify spending time on refactoring and fixing
them. The recursive square map algorithm and the rendering performance of the library could be
improved drastically. Errors in the rendering also appear, e.g. rendering an element with too
small a size throws propagating exceptions – StackOverflowException (72). Another issue relating
to the recursiveness of the algorithm is the visualization of a large number of members as part
of a single OLAP level, which again exhausts the memory available for the calculations needed.
7.2.5 3D Visualisation Functionality Expansion
The 3D experimental visualization (Section 5.12) provides a great starting point for the addition
of functionality related to OLAP operations not discussed in Appendix B. The efficiency of the
3D library used allows for work to be done on increasing the number of dimensions and
measures shown beyond the base 3 dimensions/2 measures model implemented in the project, moving
into the unknown (no such solutions for OLAP visualization exist at the time of writing).
Appendix
Appendix A: Term Glossary
A.1 OLAP Terms
OLAP - OnLine Analytical Processing evolved as a powerful spreadsheet tool, with flexibility in
navigation of complex data. Data always involves multiple dimensions, with multiple levels (4).
Interactive OLAP - browsing is done by human analysts, with the ability to ask questions and
receive immediate answers. Data summaries are pre-calculated ahead of time to increase
browsing speed.
Cube – primary OLAP structure for data viewing; it is similar to a table in a relational database.
Although three dimensions are implied, a cube can have any number of dimensions.
Dimension – identifies and categorizes the data. Provides a perspective used for looking at the
data, e.g. Product, Time, Customer Age, Employee.
Hierarchy – dimensions contain hierarchies, while hierarchies contain levels. Hierarchies are a
way to organize data according to levels. A dimension may contain multiple hierarchies, where
the same data is organized in different ways, e.g. Calendar Year and Fiscal Year.
Level – contained in a hierarchy, structures the data into logical summarized steps, e.g. Year is
the top level, followed by Quarter as the second level.
Member – part of a level, the most detailed element of the metadata structure, e.g. a specific
date such as “1 January, 2014”.
Named Set - collection or group of members organized by a certain condition.
Measures – also known as Facts; the numbers and values seen in an OLAP spreadsheet.
Measure Group – combination of related measures in a group. These measures are also
normally related to a set of specific dimensions for which data exists.
MDX – querying language for OLAP. Similar to SQL in appearance, but has a different purpose:
to query and browse through data. Unlike in SQL, no data modification is possible.
A.2 Prism Framework Terms
Modules – independent packages that represent business-related functionality, encapsulating all
components needed for it. These can be developed, tested and deployed independent of one
another thanks to the Modularity principle (Section 4.2).
Module Catalog – specifies the modules to be loaded, when they should be loaded, where they
are located and in what order the load should take place.
Shell – top-level window which hosts content contributed by modules. Defines the overall layout
of the application, but is unaware of the exact modules that it hosts.
Commands – encapsulation of application functionality, separating it from the UI and allowing
for segregated testing.
Regions – UI placeholders for views. Allow for flexible UI updating, without need of application
logic change.
Navigation – the resulting change to the UI display of a user’s interaction with the application.
Dependency injection container – injects services and other dependencies modules require,
based on the Inversion of Control principle (Section 4.3).
Services – allow for encapsulation of non-UI functionality, to be used as cross-cutting concerns
throughout the application. Typical examples of services are logging, exception management and
data access.
Bootstrapper – performs initialization, displays the shell, creates the module catalog, and loads
the modules.
Multi-targeting – targeting multiple technologies at once, so that code can be reused with ease,
e.g. WPF and Silverlight can share a similar code-base.
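How the bootstrapper, module catalog, dependency injection container and services fit together can be shown with a small language-neutral sketch. It is written in Python rather than C#, and none of the names below (Container, ListLogger, CubeBrowserModule, ChartModule, bootstrap) belong to the Prism API; they are hypothetical stand-ins for the roles described above.

```python
class Container:
    """Tiny dependency injection container: maps a service name to a factory."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def resolve(self, name):
        # Create each service once and reuse it afterwards.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class ListLogger:
    """Hypothetical logging service that keeps messages in memory."""

    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append(message)


class CubeBrowserModule:
    """Hypothetical module; it receives its dependencies from the container."""

    def __init__(self, container):
        self._logger = container.resolve("logger")

    def initialize(self):
        self._logger.log("CubeBrowserModule loaded")


class ChartModule:
    """Second hypothetical module, unaware of the first one."""

    def __init__(self, container):
        self._logger = container.resolve("logger")

    def initialize(self):
        self._logger.log("ChartModule loaded")


def bootstrap(module_catalog):
    """Set up the container, register services and load modules in catalog order."""
    container = Container()
    container.register("logger", ListLogger)
    for module_type in module_catalog:
        module_type(container).initialize()
    return container


container = bootstrap([CubeBrowserModule, ChartModule])
print(container.resolve("logger").messages)
```

The modules never construct the logger themselves, which is the point of the Inversion of Control principle: a module can be tested in isolation by handing it a container with a different logger registered.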
Appendix B: OLAP Operations
When working with and analysing OLAP data, the business intelligence tool needs to implement
a certain set of OLAP operations: Pivot, Drill Down / Drill Up, Slice and Dice. These operations
are normally used to evaluate the capabilities of a specific visualization.
B.1 Pivot
With the Pivot operation, an analyst can rotate the cube to see its various faces. There are no
changes in the displayed data, only in how it is displayed (Figure 45).
Figure 45 - Pivot OLAP Operation
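A pivot can be sketched on a small in-memory fact table. The rows and the cross_tab helper below are assumptions made for illustration only; the same totals are shown twice, with the row and column axes swapped.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, product, sales).
facts = [
    ("2013", "Bikes", 100),
    ("2013", "Accessories", 20),
    ("2014", "Bikes", 150),
    ("2014", "Accessories", 30),
]


def cross_tab(rows, row_axis, col_axis):
    """Sum the measure (last field) into a {row: {column: total}} table."""
    table = defaultdict(dict)
    for row in rows:
        r, c, value = row[row_axis], row[col_axis], row[-1]
        table[r][c] = table[r].get(c, 0) + value
    return dict(table)


by_year = cross_tab(facts, row_axis=0, col_axis=1)     # years on rows
by_product = cross_tab(facts, row_axis=1, col_axis=0)  # pivoted: products on rows

print(by_year)
print(by_product)
```

Every cell value is unchanged by the pivot; only its position in the table differs.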
B.2 Drill Down / Drill Up
With these operations, an analyst can navigate between different levels of the hierarchical data.
Figure 46 shows a drill down operation, moving from left to right, from a higher to a lower level
of the data. The opposite is a drill up, moving from a lower to a higher level of the data.
Figure 46 - Drill Down/Drill Up OLAP Operation
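Moving between levels amounts to aggregating the same facts at a different depth of the hierarchy. The following sketch assumes a hypothetical two-level Year > Quarter hierarchy and an illustrative aggregate helper.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, sales).
facts = [
    ("2013", "Q1", 40),
    ("2013", "Q2", 60),
    ("2014", "Q1", 70),
    ("2014", "Q2", 80),
]


def aggregate(rows, depth):
    """Sum sales grouped by the first `depth` levels of the Year > Quarter hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[-1]
    return dict(totals)


year_level = aggregate(facts, depth=1)     # drilled up: one total per year
quarter_level = aggregate(facts, depth=2)  # drilled down: one total per quarter

print(year_level)
print(quarter_level)
```

Drilling up from the quarter level to the year level gives the same totals as summing the quarters of each year.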
B.3 Slice
With the slice operation, an analyst is able to select a subset of the data by fixing one dimension
to a single member, thus creating a new cube with fewer dimensions (Figure 47).
Figure 47 - Slice OLAP Operation
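As a minimal sketch, a slice can be written as a filter on one dimension followed by removing that dimension from the result. The fact rows, the DIMENSIONS tuple and the slice_cube helper are hypothetical names introduced for this example.

```python
# Hypothetical fact rows: (year, product, region, sales).
facts = [
    ("2013", "Bikes", "Europe", 100),
    ("2013", "Bikes", "Pacific", 50),
    ("2014", "Bikes", "Europe", 150),
    ("2014", "Accessories", "Europe", 30),
]
DIMENSIONS = ("year", "product", "region")


def slice_cube(rows, dimension, member):
    """Fix one dimension to a single member and drop it from the result rows."""
    index = DIMENSIONS.index(dimension)
    return [row[:index] + row[index + 1:] for row in rows if row[index] == member]


sliced = slice_cube(facts, "year", "2014")
print(sliced)
```

The resulting rows carry only the product and region dimensions, since the year is now fixed for all of them.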
B.4 Dice
With the Dice operation, which is similar to the Slice, an analyst can select subsets on multiple
dimensions at once, creating a smaller sub-cube that keeps the same dimensions (Figure 48).
Figure 48 - Dice OLAP Operation
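A dice can be sketched as a filter that restricts several dimensions at the same time. The data and the dice_cube helper below are illustrative assumptions; in this sketch every dimension is still present in the result rows, and only the set of members is narrowed.

```python
# Hypothetical fact rows: (year, product, region, sales).
facts = [
    ("2013", "Bikes", "Europe", 100),
    ("2013", "Bikes", "Pacific", 50),
    ("2014", "Bikes", "Europe", 150),
    ("2014", "Accessories", "Europe", 30),
]
DIMENSIONS = ("year", "product", "region")


def dice_cube(rows, selection):
    """Keep rows whose member is in the allowed set for every selected dimension."""
    return [
        row for row in rows
        if all(row[DIMENSIONS.index(dim)] in members
               for dim, members in selection.items())
    ]


# Select the product Bikes in the region Europe, across all years.
diced = dice_cube(facts, {"product": {"Bikes"}, "region": {"Europe"}})
print(diced)
```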
B.5 Drill-Through
With the Drill-Through operation, an analyst can retrieve the actual fact data behind the aggregates.
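The relationship between an aggregate cell and its underlying facts can be sketched as follows. The order identifiers, fact rows and helper functions are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical fact rows: (order_id, year, product, sales).
facts = [
    ("SO-1", "2014", "Bikes", 90),
    ("SO-2", "2014", "Bikes", 60),
    ("SO-3", "2014", "Accessories", 30),
    ("SO-4", "2013", "Bikes", 100),
]


def cell_total(rows, year, product):
    """Aggregate value shown in the cube cell for (year, product)."""
    return sum(r[3] for r in rows if r[1] == year and r[2] == product)


def drill_through(rows, year, product):
    """Return the individual fact rows that make up the same cell."""
    return [r for r in rows if r[1] == year and r[2] == product]


total = cell_total(facts, "2014", "Bikes")
details = drill_through(facts, "2014", "Bikes")
print(total, details)
```

The sales values of the returned rows always sum to the aggregate of the cell they were retrieved from.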
Appendix C: OLAPVisi Step-Through Example
Figure 49 - Initial application screen. The initial steps to take are to select a database and then select a cube
from that database to explore.
Figure 50 - Prompt for cube selection.
Figure 51 - The user can explore the cube browser, selecting specific hierarchies for visualisation.
Figure 52 - Selected choices populate a slide-out menu, indicating the types of OLAP data. In the case of
hierarchies, the number of levels contained within is also shown.
Figure 53 - By clicking the middle add button, a menu visualising the types of graphs available is presented.
Figure 54 - An initial view of the TreeMap graph visualisation. The user populates the data specifics and uses
the Plot button to render the visualisation.
Figure 55 - All additional controls are retractable and can be hidden away. Elements can be hovered over and
selected. Drilling down displays data at a lower level of the hierarchy.
Figure 56 - Slicing into the data, the selected element is maximised. The context is saved and is displayed to
the user by means of a breadcrumb at the top of the graph.
Figure 57 - After a slice, moving up in levels is done via a triangle button in the top right corner.
Figure 58 - The categorical time-series analyser tool located at the bottom of the graph. A specific date level is
selected for exploration. The user can slide through the members of the level, automatically filtering the data
and re-rendering the graph above it.
Figure 59 - Initial 3D view of the data. The user has selected the hierarchies and measures they want to
visualise and has clicked on the Map button.
Figure 60 - Selecting a 3D element displays a legend with additional info about the member. The member is
localised on screen by greying out all other elements.
Figure 61 - Selecting a corner level sphere, an additional prompt with information about the axis is displayed.
It provides the OLAP functionality needed to explore the data further.
Figure 62 - Slicing on a specific member.
Figure 63 - Drilling down into a specific member (Accessories) from previous step.
Figure 64 - Displaying a large number of elements for the lower levels of the data is not a problem for the tool.
This data can be looked at more thoroughly using the slice functionality provided.
Bibliography
1. Varshney, K. R. Introduction to Business Analytics.
http://informationashvins.files.wordpress.com/2012/04/varshney_icassp2012.pdf. [Online] 2012.
2. Boukraâ, D., Boussaïd, O. and Bentayeb, F. OLAP Operators for Complex Object Data
Cubes. Advances in Databases and Information Systems, Lecture Notes in Computer Science Vol
6295. 2011, pp. 103-116.
3. Han, J., Kamber, M. and Pei, J. Data Mining Concepts and Techniques. s.l. : Morgan
Kaufmann, 2012.
4. Singh, K. and Bhasin, Dr. S. Constructing the OLAP Cube from Relational Databases/Flat
Files. International Journal of Computer Trends and Technology. 2011, Vol. 2, pp. 167-177.
5. Dang, Luan Quang. A functional framework for evaluating visualization applications, with a
focus on financial analysis problems. 2012. Masters Thesis.
6. Mansmann, S., Mansmann, F., Scholl, M., Keim, D. Hierarchy-driven Visual Exploration of
Multidimensional Data Cubes. 2007. Paper presented at the meeting of the BTW.
7. Mansmann, S., Scholl, M. Exploring OLAP Aggregates with Hierarchical Visualization
Techniques. 2007 : s.n. ACM SAC 2007: Proc. of 22nd Annual ACM Symposium on Applied
Computing.
8. —. Visual OLAP: A new paradigm for exploring multidimensional aggregates. 2008 : s.n. in
Proc. of IADIS Int'l Conf. on Computer Graphics and Visualization.
9. Lafon, S., Bouali, F., Guinot, C., Venturini, G. "3D and Immersive Interfaces for Business
Intelligence: The Case of OLAP," Information Visualisation (IV), 2013 17th International
Conference. 16-18 July 2013.
10. Tibco Spotfire. [Online] http://spotfire.tibco.com/.
11. Tableau Software. [Online] http://www.tableausoftware.com/.
12. Telerik RadControls. [Online] http://www.telerik.com/products/wpf/overview.aspx.
13. Ranet OLAP. [Online] Galantis. http://www.galantis.com/ranet/.
14. SharpShooter OLAP. [Online] Perpetuum Software.
http://www.perpetuumsoft.com/Product.aspx?lang=en&pid=32&tid=features.
15. RadarCube Windows Forms. [Online] RadarSoft.
16. ComponentOne OLAP for WinForms. [Online] Component One.
https://www.componentone.com/SuperProducts/OLAPWinForms/.
17. Stolte, C., Tang, D. and Hanrahan, P. Multiscale Visualization Using Data Cubes. s.l. :
IEEE Trans. Visualization and Computer Graphics, vol. 9, no. 2, 2003, pp. 176-187.
18. Walter, J. et al. Interactive Visualization and Navigation in Large Data Collections Using
the Hyperbolic Space. s.l. : Proc 3rd IEEE International Conference Data Mining, IEEE CS,
2003, pp. 355-365.
19. Techapichetvanich, K. and Datta, A. Interactive Visualization for OLAP. s.l. : Proc.
International Conference Computational Science and its Applications, LNCS 3842, Springer,
2005, pp. 293-304.
20. Piringer, H. and Buchetics, M. Hierarchical Difference Scatterplots: Interactive Visual
Analysis of Data Cubes. s.l. : ACM SIGKDD Explorations Newsletter, vol. 11, no. 2, 2009,
pp. 49-58.
21. GapMinder. [Online] http://www.gapminder.org/.
22. Taylor, J., StatSlice Systems. Tableau Dashboards Sourced from OLAP vs. RDB: An
Analysis. 2013.
23. Chaudhuri, S, Dayal, U and Narasayya, V. An overview of business intelligence
technology. Comms. ACM. 2011, pp. 88-98.
24. Urbanek, Stefan. Cubes - OLAP Framework Documentation. [Online] 2013. [Cited: 11 10
2013.] http://databrewery.org/cubes/doc/backends/sql.html.
25. Comparison of OLAP Servers. Wikipedia. [Online] [Cited: 11 10 2013.]
http://en.wikipedia.org/wiki/Comparison_of_OLAP_Servers.
26. icCube Licensing Comparison. icCube. [Online] [Cited: 11 10 2013.]
http://www.iccube.com/purchase/edition-comparison.
27. Mansmann, S., Rehman, N., Weiler, A., Scholl, M. Discovering OLAP dimensions in
semi-structured data. s.l. : In Proceedings of the fifteenth international workshop on Data
warehousing and OLAP (DOLAP '12), 2012.
28. Cuzzocrea, A., Saccà, D., Ullman, J. Big Data: A Research Agenda. s.l. : In
Proceedings of the 17th International Database Engineering & Applications Symposium (IDEAS
'13), 2013.
29. McAfee, A., Brynjolfsson, E. Big Data: The Management Revolution. Harvard Business
Review. [Online] http://hbr.org/2012/10/big-data-the-management-revolution/ar.
30. Provost, F., Fawcett, T. Data science and its relationship to big data & data-driven decision
making. s.l. : O'Reilly Media, 2013.
31. Intel. Big Data Visualization: Turning Big Data Into Big Insights. 2013.
32. Agrawal, D., Das, S., Abbadi, A. E. Big data and cloud computing: current state and future
opportunities. 2011.
33. Sitaridi, E., Ross, A. Ameliorating memory contention of olap operators on gpu processors.
2012.
34. Cuzzocrea, A., Sacca, D., Serafino, P. A hierarchy-driven compression technique for
advanced olap visualization of multidimensional data cubes. 2006.
35. —. Semantics-aware advanced olap visualization of multidimensional data cubes. 2007.
36. VMware. [Online] http://www.vmware.com/uk/products/workstation/.
37. Git. [Online] http://git-scm.com/.
38. Atlassian BitBucket. [Online] https://www.atlassian.com/software/bitbucket/overview.
39. Git Source Control Provider. [Online]
http://visualstudiogallery.msdn.microsoft.com/63a7e40d-4d71-4fbb-a23b-d262124b8f4c.
40. TortoiseGit. [Online] https://code.google.com/p/tortoisegit/.
41. TeamCity. [Online] http://www.jetbrains.com/teamcity/.
42. YouTrack. [Online] http://www.jetbrains.com/youtrack/.
43. Windows Presentation Foundation. [Online]
http://msdn.microsoft.com/en-us/library/ms754130(v=vs.110).aspx.
44. Microsoft PRISM. [Online] http://compositewpf.codeplex.com/.
45. OxyPlot. [Online] http://oxyplot.codeplex.com/.
46. Modern UI (Metro) Charts for Windows 8, WPF, Silverlight. [Online]
http://modernuicharts.codeplex.com/.
47. WPF Treemaps & SquarifiedTreeMaps Control. [Online] http://treemaps.codeplex.com/.
48. Johnson, B. and Shneiderman, B. Treemaps: a space-filling approach to the visualization of
hierarchical information structures. October 1991, pp. 284-291. In Proc. of the 2nd International
IEEE Visualization Conference.
49. Shneiderman, B. Tree visualization with tree-maps: a 2d space-filling approach. ACM
Transactions on Graphics. September 1992.
50. Bruls, Mark, Huizing, Kees and van Wijk, Jarke J. Squarified treemaps. Data
Visualization 2000: Proc. Joint Eurographics and IEEE TCVG Symp. on Visualization. 2000.
51. Helix 3D Toolkit. [Online] http://helixtoolkit.codeplex.com/.
52. Adventure Works Data Samples. [Online] http://msftdbprodsamples.codeplex.com/.
53. Adventure Works SQL Server 2012 Samples. [Online]
http://msftdbprodsamples.codeplex.com/releases/view/55330.
54. Larman, Craig. Applying UML And Patterns Second Edition.
55. Shore, James and Warden, Shane. The Art of Agile Development. s.l. : O'Reilly, 2008.
56. Humble, Jez and Farley, David. Continuous Delivery. s.l. : Addison-Wesley, 2011.
57. Brumfield, B. et al. Developer's Guide to Microsoft Prism 4. [Online] 2011.
http://msdn.microsoft.com/en-us/library/gg406140.aspx.
58. Shalloway, A. and Trott, J. Design Patterns Explained: A New Perspective on
Object-Oriented Design. s.l. : Addison-Wesley Professional, 2004, Chap. 1.
59. Gamma, E., Helm, R., Johnson, R., Vlissides, J. Design patterns: elements of reusable
object-oriented software. s.l. : Addison Wesley, 1995.
60. Fowler, Martin. Application Controller. [Online]
http://martinfowler.com/eaaCatalog/applicationController.html.
61. —. Inversion of Control Containers and the Dependency Injection pattern. [Online]
http://www.martinfowler.com/articles/injection.html.
62. —. Registry. [Online] http://martinfowler.com/eaaCatalog/registry.html.
63. Microsoft. Windows Forms Controls and Equivalent WPF Controls. [Online]
http://msdn.microsoft.com/en-us/library/ms750559(v=vs.110).aspx.
64. Microsoft. HierarchicalDataTemplate Class. [Online]
http://msdn.microsoft.com/en-us/library/system.windows.hierarchicaldatatemplate.aspx.
65. Leparmentier, Guillaume. Manipulating colors in .NET. Code Project. [Online] 2007.
http://www.codeproject.com/Articles/19045/Manipulating-colors-in-NET-Part.
66. Adaptive Coloring for Syntax Highlighting. Qt Quarterly. [Online]
http://doc.qt.digia.com/qq/qq26-adaptivecoloring.html.
67. NUnit. [Online] http://www.nunit.org/index.php?p=home.
68. Beck, Kent. Test-Driven Development by Example. s.l. : Addison Wesley, 2003.
69. Madeyski, Lech. Test-Driven Development - An Empirical Evaluation of Agile Practice.
s.l. : Springer, 2010.
70. Palermo, Jeffrey. Guidelines for Test-Driven Development. [Online] Microsoft.
http://msdn.microsoft.com/en-us/library/aa730844(v=vs.80).aspx.
71. ANTS Performance Profiler. [Online] Redgate Software.
http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/.
72. WPF Treemaps & SquarifiedTreeMaps Control Issues. [Online]
http://treemaps.codeplex.com/workitem/5307.