OLAP and Visualisation

University of Manchester

School of Computer Science

Third Year Project Report 2013-2014

Author: Milen Kindekov

Supervisor: Professor John Keane

Abstract

Online Analytical Processing (OLAP), part of the broader field of Business Intelligence, deals with the interactive analysis of multidimensional data from different perspectives. It allows analysts to explore vast amounts of data, giving them the ability to ask questions and receive immediate answers. This report discusses the major Business Intelligence and OLAP tools available commercially and builds upon techniques and ideas from recent academic research on the topic. An application is developed, following an agile approach and using a variety of tools and methodologies, that implements an exploration-driven browser for the hierarchical structure of OLAP cubes. The data within the cubes is displayed using 2D visualizations, namely an interactive TreeMap/HeatMap with a categorical time-series analyser, or alternatively using an experimental interactive 3D display. The design and implementation details of this application are followed by an overview of the testing methodologies used during development. A final evaluation of the implemented visualizations is made, followed by a discussion of possible future work that would add value to the application.

Acknowledgements

I would like to express my gratitude to my supervisor, Professor John Keane, for his support and clear guidance throughout this project, providing me with a continuous stream of background literature to review, and for pushing me past my initial goals.

I would also like to thank my father for his unwavering support and ideas that helped me overcome some of the challenges I faced during the project, as well as for the time he took to help proofread this report.

Lastly, I want to add a thank you to all my family and friends for the encouragement, patience and understanding they have shown me.

Table of Contents

1. Introduction

1.1 Business Intelligence and OLAP

1.2 Graphical Analysis in BI

1.3 Project Aim

1.4 Report Structure

2. Background

2.1 Business Intelligence Tools

2.1.1 BI Commercial Solutions

2.1.2 OLAP Specific Toolkits

2.1.3 Academic Solutions

2.2 OLAP vs. Relational Database Sourcing for Visualization

2.3 OLAP Servers

2.3.1 Overview

2.3.2 Server Comparison

2.4 Big Data

3. Approach

3.1 Overview

3.2 Project Environment and Technologies

3.2.1 Platform

3.2.2 Technologies

3.2.3 Sample Data

3.3 Agile Processes

3.3.1 Agile Practices Selection

3.3.2 Agile Practices Application

4. Design

4.1 Overview

4.2 Design Principles

4.3 Design Patterns

4.4 Project Set-Up Step Activities

4.5 Abstract High-Level System Design

4.6 Side Module Design

4.7 Main Module Design

5. Implementation

5.1 Overview

5.2 Base System Framework

5.3 UI Templatability

5.4 Component Interaction

5.5 Database and Cube Selector

5.6 OLAP Metadata Retrieval

5.7 Cube Browser

5.8 Main Screen

5.9 OLAP Data Retrieval, Query Mechanism.

5.10 TreeMap Data Visualizer

5.11 Categorical Time-Series Analyser

5.12 3D Data Visualizer

6. Testing and Evaluation

6.1 Overview

6.2 Testing

6.2.1 Unit Tests

6.2.2 Test-driven Development (TDD)

6.3 Evaluation

6.3.1 OLAP Operations Comparison

6.3.2 Performance Profiling

7. Conclusion

7.1 Project Achievements

7.2 Future Development

7.2.1 Multiple OLAP Data Sources

7.2.2 Additional 2D Graphics

7.2.3 Algorithm Optimization

7.2.4 Library Inefficiencies

7.2.5 3D Visualisation Functionality Expansion

Appendix

Appendix A: Term Glossary

A.1 OLAP Terms

A.2 Prism Framework Terms

Appendix B: OLAP Operations

B.1 Pivot

B.2 Drill Down / Drill Up

B.3 Slice

B.4 Dice

B.5 Drill-Through

Appendix C: OLAPVisi Step-Through Example

Bibliography

Table of Figures

Figure 1 - Borders, Patterns, Trends and Deviations in Data Flow

Figure 2 - Tibco Spotfire Example #1 (10)

Figure 3 - Tibco Spotfire Example #2 (10)

Figure 4 - Tableau Software Example #1 (11)

Figure 5 - Tableau Software Example #2 (11)

Figure 6 - Tableau Software Bubble Graph (11)

Figure 7 - Tableau Software TreeMap Graph (11)

Figure 8 - Telerik RadControls (12)

Figure 9 - GapMinder categorical time-series main screen (21)

Figure 10 - Star Schema (24)

Figure 11 - Snowflake Schema (24)

Figure 12 - Project Environment

Figure 13 - Helix 3D Toolkit Examples (51)

Figure 14 - AgileBoard YouTrack (42)

Figure 15 - Project folder

Figure 16 - Learning Spike Week 7

Figure 17 - Learning Spike Week 8

Figure 18 - TeamCity Flow (41)

Figure 19 - Composite Application Design Patterns (57)

Figure 20 - MVVM Pattern Architecture (57)

Figure 21 - Prism Composite Application Creation Activities (57)

Figure 22 - Abstract high-level module view

Figure 23 - Side module design diagram

Figure 24 - Main module design diagram

Figure 25 - Connecting to the Prism Framework (57)

Figure 26 - Merging Generic Resources into Views

Figure 27 - Original and Templated Window Style

Figure 28 - Templated Button Results Example

Figure 29 - Default and Templated TreeView Control

Figure 30 - Publish/Subscribe Event Aggregator Service (57)

Figure 31 - Database Selector Pop-Up

Figure 32 - Cube Selector Pop-Up

Figure 33 - Cube Browser View Example #1

Figure 34 - Cube Browser View Example #2

Figure 35 - MainGraphView.xaml Data Template Selector. Selects an appropriate UI template based on the view model type passed to the view.

Figure 36 - Cube Choice Selector with user choices over main screen.

Figure 37 - WPF Treemaps & SquarifiedTreeMaps Control Example - original library capabilities (47)

Figure 38 - OlapVisi TreeMap visualisation - provides control over the element depth, breadcrumb bar at top saving context, functionality to drill down/drill up and slice into the data.

Figure 39 - Categorical Time-Series data browser and filtering tool.

Figure 40 - 3D OLAP visualizer application example

Figure 41 - HSV Cylindrical-Colour Model - Hue, Saturation, Value (66)

Figure 42 - TDD Created Test Extract

Figure 43 - Main functions performance time profile

Figure 44 - Main 3D element memory usage profile

Figure 45 - Pivot OLAP Operation

Figure 46 - Drill Down/Drill Up OLAP Operation

Figure 47 - Slice OLAP Operation

Figure 48 - Dice OLAP Operation

Figure 49 - Initial application screen. The initial steps to take are to select a database and then select a cube from that database to explore.

Figure 50 - Prompt for cube selection

Figure 51 - The user can explore the cube browser, selecting specific hierarchies for visualisation.

Figure 52 - Selected choices populate a slide-out menu, indicating the types of OLAP data. In the case of hierarchies: the number of levels contained within.

Figure 53 - By clicking the middle add button, a menu visualising the types of graphs available is presented.

Figure 54 - An initial view of the TreeMap graph visualisation. The user populates the data specifics and uses the Plot button to render the visualisation.

Figure 55 - All additional controls are retractable and can be hidden away. Elements can be hovered over and selected. Drilling down displays data at a lower level of the hierarchy.

Figure 56 - Slicing into the data, the selected element is maximised. The context is saved and is displayed to the user by means of a breadcrumb at the top of the graph.

Figure 57 - After a slice, moving up in levels is done via a triangle button in the top right corner.

Figure 58 - The categorical time-series analyser tool located at the bottom of the graph. A specific date level is selected for exploration. The user can slide through the members of the level, automatically filtering the data and re-rendering the graph above it.

Figure 59 - Initial 3D view of the data. The user has selected the hierarchies and measures they want to visualise and has clicked on the Map button.

Figure 60 - Selecting a 3D element displays a legend with additional info about the member. The member is localised on screen by greying out all other elements.

Figure 61 - Selecting a corner level sphere, an additional prompt with information about the axis is displayed. It provides the OLAP functionality needed to explore the data further.

Figure 62 - Slicing on a specific member.

Figure 63 - Drilling down into a specific member (Accessories) from previous step.

Figure 64 - Displaying a large amount of elements for the lower levels of the data is not a problem for the tool. This data can be looked at more thoroughly using the slice functionality provided.

Table of Tables

Table 1 - OLAP operation visualisation capabilities

Table 2 - Performance profile specification. X, Y, Z dimension member amounts.

1. Introduction

1.1 Business Intelligence and OLAP

Business Intelligence (BI) software encompasses a set of support technologies whose main purpose is to provide knowledge for the enterprise. It enables the people responsible for this knowledge, namely executives, managers and analysts, to make faster and better decisions (1). As the cost of acquiring and storing data has decreased, businesses have sought to increase their competitive edge by procuring and storing as much data as possible for analysis. This data grows not only in volume but also in complexity, creating the need for multidimensional models that more closely reflect the decision makers’ analytical view of it (2).

Online analytical processing (OLAP) tools, which support multidimensional analysis and decision making within the context of Business Intelligence, are now extremely common. They comprise a set of analysis techniques allied to visualization approaches that allow information to be viewed from many angles, using operations such as summarization, consolidation and aggregation (3). OLAP is flexible and powerful, with the ability to navigate easily between different views of the data. It is interactive in the sense that analysts can continually ask new questions and get immediate answers from the data. Singh and Bhasin (2011) state that OLAP cannot be achieved unless the data analysis application returns the results of queries immediately (4), which makes performance a key factor for OLAP.

It follows that two vital components of any BI OLAP system are its underlying server architecture, which must provide low-latency access to the data, and an analytics front-end that is intuitive for the user and provides the functionality needed for the decision-making process.
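As a small illustration of viewing the same information from different angles, the Python sketch below summarises a handful of hypothetical fact records into a cross-tabulation and then pivots it by swapping the row and column dimensions. The records, the dimension names (`region`, `product`, `year`) and the `sales` measure are invented for the example; an OLAP server would answer such requests from its pre-built cube rather than by scanning raw rows.

```python
from collections import defaultdict

# Hypothetical fact records: three dimensions and one measure ("sales").
FACTS = [
    {"region": "Europe", "product": "Bikes", "year": 2012, "sales": 120.0},
    {"region": "Europe", "product": "Accessories", "year": 2013, "sales": 30.0},
    {"region": "America", "product": "Bikes", "year": 2013, "sales": 200.0},
    {"region": "America", "product": "Accessories", "year": 2012, "sales": 50.0},
]


def cross_tab(facts, row_dim, col_dim, measure="sales"):
    """Sum a measure into a grid: row member -> column member -> total."""
    grid = defaultdict(lambda: defaultdict(float))
    for record in facts:
        grid[record[row_dim]][record[col_dim]] += record[measure]
    # Convert the nested defaultdicts to plain dicts for display and comparison.
    return {row: dict(cols) for row, cols in grid.items()}


region_by_year = cross_tab(FACTS, "region", "year")
# Pivoting: the same totals viewed from another angle by swapping the axes.
year_by_region = cross_tab(FACTS, "year", "region")
# A different pair of dimensions gives yet another view of the same facts.
product_by_year = cross_tab(FACTS, "product", "year")
```

Every view is derived from the same facts; only the choice of dimension on each axis changes, which is what lets an analyst move freely between them.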

1.2 Graphical Analysis in BI

Many methods for exploring and analysing this data have been researched over the past two decades, with a particular focus on Business Analytics through graphical and visual interfaces and tools. Humans possess strong visual and spatial cognitive skills: edge and discontinuity detection, pattern recognition, and the use of visual cues for information retrieval (5). Analysing data visually therefore has considerable advantages over many other analytic techniques. A number of visual interfaces for OLAP have appeared recently, ranging from dashboards, charts, maps and scatterplots to interactive features such as brushing and filtering, slicing and zooming.

Figure 1 - Borders, Patterns, Trends and Deviations in Data Flow

Presenting the data visually allows experts to quickly reveal patterns and recognize trends or deviations in the normal flow of the data (Figure 1). They can visually specify a part of the data they are interested in, or have a hypothesis about, and interact with it directly (6).

1.3 Project Aim

With the above in mind, the goal of this project was to create a modular solution package for the .NET Framework for the analysis of OLAP data cubes, built with both extensibility and scalability in mind. The primary focus was on visualization techniques that make the analysis of, and pattern recognition in, complex data sets fast and easy. A selection of ideas from various business intelligence software and toolkits was consolidated into one application that can be built upon and reused in the future. Additional research into recent academic work was done in order to refine some of these ideas, experiment with them and produce a viable OLAP visualization application.

The main techniques used relate to the hierarchical visualisation of data in data cubes (6) (7) (8), complemented by categorical time-series analysis to add a further dimension, and to visualising the same data through a 3D OLAP interface (9). The main project aims were specified as follows:

Create an end-to-end Business Intelligence solution.

Provide various OLAP visualization capabilities.

Provide a polished and user-friendly interface.

Create an extensible solution for future improvements.

1.4 Report Structure

A brief overview of the rest of the report structure follows:

Chapter 2: Background

This chapter describes current commercial and academic trends in the Business Intelligence domain, including a look at the leading BI solutions on the market. OLAP servers and relational databases are compared as data sources, together with the benefits of choosing the right server solution when developing such a visualization application. Finally, it discusses the current Big Data trend and how it affects, and is likely to affect, the field in the future.

Chapter 3: Approach

This chapter covers the approach taken in the development of the project, including the choice of platforms and technologies, as well as the agile methodologies and practices used throughout its course.

Chapter 4: Design

This chapter gives details of the design methodologies and patterns used in the project, and of the main framework capabilities that affect its design. It finishes with a look at the high-level abstract view of how the system works.

Chapter 5: Implementation

This chapter details the implementation specifics of the main system components: how they function and connect with the rest of the system.

Chapter 6: Testing and Evaluation

This chapter discusses the main testing practices and methodologies used for the project. Furthermore, it includes an evaluation of the different visualisations provided in the application in terms of their performance and the functionality they provide in OLAP terms.

Chapter 7: Conclusion

This chapter describes the conclusions drawn at the end of the project in terms of achievements, both at the project level and on a personal level. It also describes possible future extensions and developments that would increase the value of the developed application.

2. Background

2.1 Business Intelligence Tools

2.1.1 BI Commercial Solutions

Many commercial and open-source general-purpose numerical data visualization solutions exist. These products are designed to be used across different industries for various visualization and analytical tasks. They tend to include state-of-the-art visualization techniques that are well tested and proven for the visual analysis of large amounts of data. Studying these solutions therefore suggests possible approaches for developing any type of visualization software. Two of the most popular commercial solutions are Tibco Spotfire (10) and Tableau (11).

Tibco Spotfire - provides interactive analysis of multidimensional data, with a wide range of visualizations available. From personal experience, this product has been used in practice at Merrill Lynch, where it demonstrated strong capabilities and received positive feedback directly from management on the analysis done with it.

An example of airline incident data visualized in Spotfire (Figure 2) shows several useful ideas for distributing UI elements across the screen. There is a flexible filtering system and measure selection. The filter module can be undocked and dragged across the screen, providing increased UI flexibility. The example also offers interactive data visualization, zooming and multiple selection linked to a representation of the raw data, along with the ability to export the chart images themselves.

Figure 2 - Tibco Spotfire Example #1 (10)

Another example of data visualization in Spotfire (Figure 3) is a scatter plot graph. It offers easy-to-use filtering and interactive data selection, and the legend and visualization change dynamically based on the filters set. A status bar displays load and error information when switching between graphs. The demonstration page on the Spotfire website includes many other examples of possible data visualizations.

Figure 3 - Tibco Spotfire Example #2 (10)

Tableau Business Intelligence and Analytics Software - provides a powerful business solution for big data analysis, data discovery, easy creation of business dashboards, data visualization and mobile BI. A highlight of Tableau is its ease of use, which means that more time can be spent analysing the data and answering questions rather than learning how to use the tool itself.

Tableau can connect to a wide range of data sources, through both server and file connections. It supports live connections to the database as well as bringing data into memory for faster or offline work. An interesting feature is the ability to easily combine multiple data sources for a single analysis and visualization task, automatically blending the data on common fields and filtering across data sources in real time.

Tableau uses a tabular structure with different worksheets that have varying functions. There is a worksheet view (Figure 4) that connects to different data sources. Dimensions and measures are automatically recognized and additional measures can be created by the user. Visualizations are created by dragging and dropping the different dimensions and measures onto specific fields. Filtering, colouring schemes and levels of detail can be specified easily.

Each chart created from the data source can then be added to a dashboard view (Figure 5) and positioned according to the user's needs. If common fields exist between these visualizations, filtering and highlighting data on the dashboard automatically filters and highlights the data across the different charts.

Figure 4 - Tableau Software Example #1 (11)

Figure 5 - Tableau Software Example #2 (11)
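The cross-chart behaviour described above amounts to applying one filter on a shared field to every data set that contains that field. The following Python sketch shows the idea on hypothetical chart data; it is not how Tableau implements it, the function and chart names are invented, and it assumes all rows in a chart share the same fields, so only the first row is inspected.

```python
def apply_dashboard_filter(charts, field, allowed):
    """Filter each chart's rows on a shared field; charts without it are left as-is.

    charts: mapping of chart name -> list of row dicts (rows assumed homogeneous).
    """
    result = {}
    for name, rows in charts.items():
        if rows and field in rows[0]:
            result[name] = [row for row in rows if row[field] in allowed]
        else:
            result[name] = rows  # no common field, so this chart is not linked
    return result


# Hypothetical dashboard: two charts share "region", one does not.
charts = {
    "sales_by_region": [{"region": "Europe", "sales": 150.0},
                        {"region": "America", "sales": 250.0}],
    "top_products": [{"region": "Europe", "product": "Bikes"},
                     {"region": "America", "product": "Helmets"}],
    "kpis": [{"metric": "margin", "value": 0.2}],
}
filtered = apply_dashboard_filter(charts, "region", {"Europe"})
```

Selecting "Europe" in one chart narrows both linked charts, while the unrelated `kpis` chart is unaffected.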

Tableau is continuously improving, adding support for further graph types for representing the data (Figure 6). A useful feature very recently added to the software is the visualization selection menu (Figure 7). This menu guides users by showing them the charting options available for the type of data they have selected, and gives them an easy, visual way to change charts quickly.

Figure 6 - Tableau Software Bubble Graph (11)

Figure 7 - Tableau Software TreeMap Graph (11)

Telerik RadControls - another commercial product, which provides a customizable framework for WPF (Windows Presentation Foundation) applications with strong visualization capabilities (12). It is not a full BI software package, but rather a flexible plug-in for any application that requires charting functionality. The example in Figure 8 shows what is possible when creating visualizations for the .NET Framework.

Figure 8 - Telerik RadControls (12)

The greatest drawback of this type of commercial software is the high overhead cost of using it in practice. Licenses are very expensive, and additional resources are needed for maintenance and training, including a software expert on site. This means that these solutions are viable only for very large firms, where the benefits of comprehensive, professionally supported technologies outweigh the high costs (5).

2.1.2 OLAP Specific Toolkits

Framework libraries exist that provide OLAP functionality which can be plugged into existing projects. These solutions have a similar, generic look and feel. Examples include the Ranet OLAP component library (13), Sharpshooter OLAP (14), RadarCube WinForms OLAP (15) and ComponentOne WinForms OLAP (16).

A common trait of the OLAP visualization toolkits listed here is that they are all based on Excel's PivotTable component. They typically provide the functionality seen in Excel, where users navigate through data laid out in tables, along with some simple charting controls and export capabilities.

A major aim of this project is to move away from the table-based technique used by the existing OLAP components on the market, employing different methods and visualizations that make analysing OLAP data easier for the user.

2.1.3 Academic Solutions

Various approaches have been proposed for arranging and visualising the data elements of OLAP cubes and their relationships. They vary in the techniques applied, and some are used as the basis for the application developed for this project:

Viewing data cubes with a multi-scale visualization system (17).

High-dimensional data cubes in a hyperbolic space (18).

Built-in graphic elements within varying data levels of data cubes (19).

Hierarchical visualisation of data cubes using Enhanced Decomposition Trees (6) (7) (8).

Data cubes displayed using Hierarchical Difference Scatterplots (20).

Immersive 3D interface for data cubes (9).

According to Mansmann et al. (6) (7) (8), explorative analysis is built on insights acquired in the course of interaction. Visualising data hierarchically preserves the entire interaction path, allowing the user to comprehend the data being explored more naturally. This approach is used in this project to explore the structure of OLAP cubes, both in a textual browser (Section 5.7) and as a TreeMap visualisation (Section 5.10).
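The idea of a hierarchy that preserves the interaction path can be sketched as a tree of aggregated nodes, where drilling down follows a list of member names and that list doubles as the breadcrumb. The hierarchy levels (`category`, `subcategory`), the member names and the `Node`, `build_tree` and `drill_down` names below are hypothetical; this illustrates the concept and is not the project's cube browser code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hierarchy member with its aggregated measure and child members."""
    name: str
    value: float = 0.0
    children: dict = field(default_factory=dict)


def build_tree(rows, levels, measure):
    """Build a decomposition tree; each node holds the sum over its descendants."""
    root = Node("All")
    for row in rows:
        amount = row[measure]
        root.value += amount
        node = root
        for level in levels:
            member = row[level]
            if member not in node.children:
                node.children[member] = Node(member)
            node = node.children[member]
            node.value += amount
    return root


def drill_down(root, path):
    """Follow member names from the root; the path itself serves as the breadcrumb."""
    node = root
    for member in path:
        node = node.children[member]
    return node, " > ".join([root.name, *path])


rows = [
    {"category": "Bikes", "subcategory": "Road Bikes", "sales": 70.0},
    {"category": "Bikes", "subcategory": "Mountain Bikes", "sales": 50.0},
    {"category": "Accessories", "subcategory": "Helmets", "sales": 20.0},
]
tree = build_tree(rows, ["category", "subcategory"], "sales")
bikes, breadcrumb = drill_down(tree, ["Bikes"])
```

Drilling up is then simply dropping the last member from the path, which is why keeping the path gives the user context at every step.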

Figure 9 - GapMinder categorical time-series main screen (21)

Visualising hierarchical data with TreeMaps allows only one dimension and up to two measures to be displayed at a time. To improve on this, the following two applications were also researched:

GapMinder - (Figure 9) a state-of-the-art application providing interactive categorical time-series analysis (21). It provides customisable scatter charts, which are used to explore a set of data fluidly over a specific time span. A slider moves across a set of dates, automatically refreshing the data on the display. The ideas seen in this application are used to increase the explored dimensionality of the OLAP cube (Section 5.11), allowing multiple date levels to be explored rather than only one.

VR4OLAP - (9) a solution that visualises three dimensions of an OLAP data cube and up to two measures at the same time in an immersive 3D environment. It supports a 3D stereoscopic screen with a 3D mouse and provides all the basic functionality needed for OLAP analysis (Appendix B). Based on evaluations of the system, Lafon et al. conclude that the 3D representation was highly regarded by users, but that the performance results in 2D were equal to or better than those in 3D. The 3D visualisation component in this project (Section 5.12) is based on the design of VR4OLAP, with an added focus on improving the performance of the interactive visualisation.
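The mapping behind such a 3D view can be sketched as follows: each cell of a three-dimensional cube is placed at integer positions given by its dimension members, one measure is scaled to a sphere radius and a second is mapped to a hue in the HSV colour model. The function name `layout_3d`, the linear scaling and the blue-to-red hue range are illustrative assumptions rather than details of VR4OLAP or of the project; Python's standard `colorsys` module does the HSV-to-RGB conversion.

```python
import colorsys


def layout_3d(cells, x_members, y_members, z_members,
              min_radius=0.2, max_radius=0.5):
    """Map cube cells to 3D glyphs.

    cells: list of (x_member, y_member, z_member, size_measure, colour_measure).
    Each member gets an integer position on its axis; the first measure is scaled
    linearly to a sphere radius and the second to a hue from blue (low) to red (high).
    """
    axes = [{m: i for i, m in enumerate(ms)} for ms in (x_members, y_members, z_members)]
    size_values = [c[3] for c in cells]
    colour_values = [c[4] for c in cells]

    def normalise(value, values):
        lo, hi = min(values), max(values)
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    glyphs = []
    for x, y, z, size_m, colour_m in cells:
        radius = min_radius + normalise(size_m, size_values) * (max_radius - min_radius)
        hue = (1.0 - normalise(colour_m, colour_values)) * 2.0 / 3.0  # 2/3 = blue, 0 = red
        glyphs.append({
            "position": (axes[0][x], axes[1][y], axes[2][z]),
            "radius": radius,
            "hue": hue,
            "rgb": colorsys.hsv_to_rgb(hue, 1.0, 1.0),
        })
    return glyphs


# Hypothetical cells: (year, region, product, sales, margin).
glyphs = layout_3d(
    [("2012", "Europe", "Bikes", 100.0, 0.1),
     ("2013", "America", "Accessories", 300.0, 0.5)],
    x_members=["2012", "2013"],
    y_members=["Europe", "America"],
    z_members=["Bikes", "Accessories"],
)
```

Integer positions keep members evenly spaced along each axis, which makes it straightforward to grey out or isolate a single member when slicing.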

2.2 OLAP vs. Relational Database Sourcing for Visualization

An important question when developing a BI data visualization application such as the one in this project is where and how the data will be sourced for analysis. Should the application focus on connecting to a mid-tier, dedicated OLAP server, or provide direct access to the relational data in the warehouse itself?

In the context of the Tableau commercial software, Taylor (2013) offers a useful comparison between using data sourced from OLAP and from a relational database (RDB) when visualizing data in a BI data visualization dashboard (22). The paper lists the typical scenarios a user may face, how each is handled with an OLAP or an RDB system, and the typical risks each might create.

Based on this comparison, OLAP sources outperform RDB ones for the visual analysis of data. Taking all this into account, the most viable approach for this project was to focus on data already organized into OLAP cubes. Catering to the needs of more experienced users by connecting to RDB sources is left for future development, since providing such functionality is a small project in itself. Having made this decision, the logical next step was to review the OLAP server solutions available on the market and compare them, to understand which could provide a good core architecture for the project to build upon.

2.3 OLAP Servers

2.3.1 Overview

OLAP servers form a mid-tier architecture, normally sitting between the data warehouse and the front-end analysis system. Before comparing the different solutions on the market, it is important to understand how they differ in the most important respect: the way data is stored on the server itself.

Based on their storage modes, three main types of OLAP server engine exist: multidimensional storage (MOLAP), relational DBMS storage (ROLAP) and a hybrid combination of the two (HOLAP). Following recent technology trends, in-memory BI engines have also started to be used, owing to their faster response times and better interaction capabilities. These engines have become possible thanks to decreases in disk access time and the steadily falling cost of memory, which has led to cheap, affordable servers with large amounts of it (23).

ROLAP - Relations and SQL queries are used to map the multidimensional data model and its operations. This creates the need for query optimization algorithms to load data efficiently (23). These systems use the star schema (Figure 10) or its variant, the snowflake schema (Figure 11). The star schema has a fact table at the centre, with each dimension stored in a single table forming a point of the star. The snowflake schema likewise has a single fact table at the centre, but normalizes each dimension into a hierarchical structure of related tables.

Figure 10 - Star Schema (24)

Figure 11 - Snowflake Schema (24)
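To make the ROLAP mapping concrete, the following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in relational engine. The table names (fact_sales, dim_product, dim_date) and all rows are hypothetical and are not taken from Adventure Works. The sketch shows a roll-up over two dimensions of a star schema, expressed as an ordinary SQL join followed by GROUP BY.

```python
import sqlite3
from typing import List, Tuple


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table and two dimension tables."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            amount REAL
        );
        """
    )
    con.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Road Bike", "Bikes"), (2, "Helmet", "Accessories"), (3, "Mountain Bike", "Bikes")],
    )
    con.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(20130101, 2013, 1), (20140101, 2014, 1)])
    con.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 20130101, 1000.0), (2, 20130101, 50.0), (3, 20140101, 1200.0), (2, 20140101, 40.0)],
    )
    return con


def sales_by_category_and_year(con: sqlite3.Connection) -> List[Tuple[str, int, float]]:
    """A multidimensional roll-up expressed relationally: join facts to dimensions, then group."""
    return con.execute(
        """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
        """
    ).fetchall()
```

A snowflake version would split dim_product further, for example into separate product and category tables, adding one more join to the same query.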

MOLAP - An alternative to ROLAP that uses the multidimensional data model directly. It typically pre-computes large data cubes to speed up query processing, which gives excellent indexing properties and fast query response times, but poor storage utilization on sparse data sets (23). MOLAP cubes are well suited to slicing and dicing operations, and pre-computing the cube also makes complex calculations fast.
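The pre-computation idea can be sketched as follows, assuming a small in-memory list of facts rather than a real MOLAP engine. The function aggregates the measure for every subset of the dimensions (2^n group-bys for n dimensions), using a hypothetical ALL marker for rolled-up coordinates. Once the cube is built, any slice or roll-up becomes a single dictionary lookup instead of a scan over the facts.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

ALL = "ALL"  # marker for a dimension that has been rolled up


def precompute_cube(facts: List[Tuple], dims: Sequence[str]) -> Dict[Tuple[Hashable, ...], float]:
    """Aggregate the last field of each fact for every subset of the dimensions."""
    n = len(dims)
    cube: Dict[Tuple[Hashable, ...], float] = defaultdict(float)
    for size in range(n + 1):
        for grouped in combinations(range(n), size):
            for *coords, measure in facts:
                # Keep the coordinates of grouped dimensions, roll the others up to ALL
                key = tuple(coords[i] if i in grouped else ALL for i in range(n))
                cube[key] += measure
    return dict(cube)
```

The price is storage: the number of stored cells grows with every dimension added, whether or not the underlying data fills them, which is why sparse data sets are handled poorly.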

HOLAP – Combines ROLAP and MOLAP, splitting the storage of data between the two. These servers perform density analysis to identify which regions of the multidimensional space are sparse and which are dense, and use this to decide how each region is stored (23).
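A toy version of such a density analysis is sketched below. It is an illustration under simplifying assumptions, not how any particular HOLAP server works: facts are partitioned on one chosen dimension, the density of each partition is the fraction of its possible cells that actually hold data, and a hypothetical threshold decides whether the partition would be kept as a dense array (MOLAP) or as relational rows (ROLAP).

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def choose_storage(
    facts: List[Tuple], partition_index: int, threshold: float = 0.5
) -> Dict[Hashable, Tuple[str, float]]:
    """Return {partition member: (storage mode, density)} for each partition of the facts."""
    n_dims = len(facts[0]) - 1  # the last field of each fact is the measure
    members = [{fact[i] for fact in facts} for i in range(n_dims)]
    # Number of possible cells inside one partition (all dimensions except the partitioning one)
    cells_per_partition = 1
    for i, values in enumerate(members):
        if i != partition_index:
            cells_per_partition *= len(values)
    filled: Dict[Hashable, set] = defaultdict(set)
    for fact in facts:
        filled[fact[partition_index]].add(fact[:n_dims])
    plan = {}
    for member, cells in filled.items():
        density = len(cells) / cells_per_partition
        # Dense regions suit array (MOLAP) storage; sparse ones are cheaper as relational rows
        plan[member] = ("MOLAP" if density >= threshold else "ROLAP", density)
    return plan
```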

2.3.2 Server Comparison

A wide range of OLAP Servers exists, capable of satisfying the requirements of almost any enterprise. Choosing one involves weighing up the capabilities and limitations of each server, such as licensing options, data storage modes, supported APIs and query languages, distinctive features and security capabilities. For this project, the OLAP server used in the initial stages of development was chosen based on the following factors, listed from most to least important.

License availability – how easy it is to obtain and what it costs.

APIs and query languages – the project is written using the .NET Framework, so the server must allow OLAP data to be connected to and worked with hassle-free.

Data storage modes – how the data is stored and whether the solution provides offline support, which would increase the flexibility of the project by allowing data to be analysed without connecting to a server.

The servers' distinctive features, limitations and security capabilities were not factors in the choice, since they fell outside the scope of the project. The OLAP server was needed only to provide data for analysis and visualization. The application would take no part in creating the data cubes in question: cubes are created by data experts following a set of principles that carefully design all aspects of the multidimensional data model.

The following information is based on the comparison tables in the “Comparison of OLAP Servers” Wikipedia entry (25). The OLAP Servers considered were: Essbase, icCube, Microsoft Analysis Services, MicroStrategy Intelligence Server, Mondrian OLAP Server, Oracle Database OLAP Option, Palo, SAS OLAP Server, SAP NetWeaver BW and Cognos TM1.

A general pattern among these OLAP Servers is that almost all of them have proprietary licensing, the exceptions being Mondrian OLAP Server and Palo. icCube comes with a community license, which includes an MDX (MultiDimensional eXpressions) engine, a web IDE and an XMLA (XML for Analysis) interface free of charge, but with the limitation that it cannot be included as part of a product or solution (26). Since this project only connects to the server for querying purposes, icCube remains a viable option. Microsoft Analysis Services, despite its proprietary license, is also a viable solution for this project: as a Microsoft product, a student license can be obtained and used thanks to the vendor’s connection with the University.

Regarding the second factor, all four remaining solutions provide similar interfaces: XMLA; OLE DB for OLAP, a standard API for exchanging data between an OLAP server and a client on the Windows platform; and MDX, the uniform query language for OLAP databases. This suggests that all four servers could be supported in the future.
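As an illustration of how these standard interfaces fit together, the sketch below builds an XMLA Execute request that wraps an MDX statement in a SOAP envelope, using Python's standard xml.etree.ElementTree. The element names (Execute, Command, Statement, Properties, PropertyList, Catalog, Format) follow the XMLA specification; the catalog name and query used with it are placeholders, and the HTTP call that would send the request to a server is deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def _q(ns: str, tag: str) -> str:
    """Qualified tag name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def build_execute_request(mdx: str, catalog: str) -> str:
    """Wrap an MDX statement in an XMLA Execute request (SOAP envelope)."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("xmla", XMLA_NS)
    envelope = ET.Element(_q(SOAP_NS, "Envelope"))
    body = ET.SubElement(envelope, _q(SOAP_NS, "Body"))
    execute = ET.SubElement(body, _q(XMLA_NS, "Execute"))
    command = ET.SubElement(execute, _q(XMLA_NS, "Command"))
    ET.SubElement(command, _q(XMLA_NS, "Statement")).text = mdx  # special characters are escaped
    properties = ET.SubElement(execute, _q(XMLA_NS, "Properties"))
    property_list = ET.SubElement(properties, _q(XMLA_NS, "PropertyList"))
    ET.SubElement(property_list, _q(XMLA_NS, "Catalog")).text = catalog
    # Ask for a multidimensional (cellset) result rather than a flat rowset
    ET.SubElement(property_list, _q(XMLA_NS, "Format")).text = "Multidimensional"
    return ET.tostring(envelope, encoding="unicode")
```

In practice a client library would normally hide this envelope. The sketch only shows that an MDX statement and XMLA together are enough to talk to any server exposing these interfaces.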

Regarding the third factor, data storage modes, icCube is a MOLAP server with support for Offline Cubes, while Microsoft Analysis Services supports MOLAP, ROLAP and HOLAP, as well as Local Cubes and PowerPivot for Excel. The remaining two, Mondrian OLAP Server (ROLAP) and Palo (MOLAP), provide no offline support.

Microsoft Analysis Services was chosen because the project revolves around the Microsoft .NET Framework and because of the advantages identified in the comparison above.

2.4 Big Data

OLAP cubes are used to represent, model, process, query and mine various forms of data by giving it structure. Most of this is done by finding associations and relationships in structured data or, with some limitations, in semi-structured data (27). This differs from the largely unstructured data of the recently emerging Big Data trend.

A look into Big Data shows that most organizations and scientific communities today generate an unprecedented amount of continuously streamed unstructured data, known as “Big Data” (28): around 2.5 Exabytes of data daily in total, doubling every 40 months (29). This volume and variety have outstripped the possibility of manual analysis and exceeded conventional database capacities, even though computing performance has increased (30). According to a study by Intel, 33% of the companies surveyed work with very large amounts of data (500 TB or more), and 84% of managers are analysing unstructured data, with the rest expecting to start doing so within the next year. It is expected that 63% of all analytics will be done in real time by 2015 (31). Connecting this data and giving it structure for easier and more efficient analysis is a widely discussed research topic.

Creating OLAP data cubes over Big Data is a challenge for state-of-the-art solutions due to two intrinsic factors: the enormous size of such data sets and their high complexity once multidimensional models are built over them (28). The unstructured nature of the data can cause an explosion in the number of dimensions, as well as computational problems when working with it. As mentioned previously (Section 1.1), OLAP relies on fast user interaction with the data, so these factors create a bottleneck for its normal operation.
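The scale of this explosion can be quantified with a standard result from the data cube literature (not taken from (28)). If dimension i has a concept hierarchy of L_i levels, a full cube over n dimensions contains T cuboids (group-bys), where the extra level per dimension is the rolled-up "all" level. With a single level per dimension this reduces to 2^n. The worked figure uses hypothetical dimension counts purely for illustration.

```latex
T = \prod_{i=1}^{n} \left( L_i + 1 \right)
\qquad \text{e.g. } n = 10,\; L_i = 4 \;\Rightarrow\; T = 5^{10} = 9\,765\,625 \text{ cuboids}
```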

Cuzzocrea et al. (28) propose a variety of research topics that are likely to shape the generation of OLAP cubes over Big Data in the future and help overcome these challenges:

technology advancements in Cloud Computing (32)

innovative hardware solutions – GPU-based data processing for computation of cubes (33)

query language and optimization (end-user performance)

data quality aspects

visualization and interactive exploration issues – real-time visualization, mobile device visualization (34) (35)

analytical processes and methodology design

development tools support

With the visualisation and interactive exploration issues in mind, this project examines the effect of increasing the amount of data displayed on screen simultaneously, using the non-specialized tools examined in the following chapters. The results are discussed later in this report (Section 6.3).

3. Approach

3.1 Overview

This chapter describes the approach taken in developing the application, detailing the environment, platforms, technologies and agile practices chosen.

3.2 Project Environment and Technologies

The information in this section is split between the core environment, which includes the base tools supporting the development process, and the specific technologies used to build the application itself.

3.2.1 Platform

The base platform for the project is VMware Workstation, hosting two distinct virtual machines: one acting as a data server and the other as a development and testing environment.

Virtualization of the platform and environments was chosen because of the many advantages that virtual machine technology provides. These include control over system resources, easy back-up and storage (machines can be cloned and archived at the click of a button), and the ability to run them from any location with a free-to-use VM player. Being able to carry the whole environment anywhere gives great flexibility.

VMware (36) is well suited to this. It makes it possible to run applications on multiple operating systems; consolidate several machines, such as web servers and database servers, onto the same computer; build reference architectures for evaluation before deploying into production; connect remotely to virtual machines from any device; and create snapshots that preserve the state of a virtual machine so that it can be quickly reverted at any time. Figure 12 shows the relationship between the development, server and deployment machines of the project.

Figure 12 - Project Environment

VM Server Environment – Runs the Windows Server 2012 R2 operating system with Microsoft SQL Server 2012 and Microsoft Analysis Services (SSAS 2012) installed. This is the core data server and the source of the application's data, simulating a realistic implementation in an enterprise environment. The AdventureWorks sample database and sample OLAP cubes provide the reference data.

VM Development Environment – Runs a typical Windows 7 operating system, along with a version of Microsoft SQL Server 2012 and the Visual Studio 2012 IDE.

Source Control: Git (37) – free and open source distributed version control system.

Repository: Atlassian Bitbucket (38) – used for hosting and managing code online; it includes built-in issue trackers, wikis, code comments and pull requests.

Source Control Visual Studio GUI: Git Source Control Extension (39) – integrates Git

functions into the Visual Studio Solution Explorer.

Source Control Windows GUI: TortoiseGit (40) – provides Windows icon overlays and a context menu for running source control commands on files and directories, along with an integrated Commit Dialog, a merge support GUI and a history view of changes and commits.

Continuous Integration System: TeamCity (41) – a powerful tool for automated builds and commit monitoring. It integrates directly with Bitbucket, pulls the latest committed version of the application and runs a preset series of build and test steps, ensuring that any changes introduced work as expected.

Agile Board: YouTrack (42) – a tool for creating fully customisable electronic task boards, which allow multiple project iterations to be monitored and make them easy to plan and execute.

3.2.2 Technologies

This section describes the core technologies used to code the application. They were chosen with flexibility and future extensibility in mind, and to produce a usable application at the end.

Windows Presentation Foundation (WPF) (43) – a framework for rich user interfaces based on the .NET Framework, using XAML for the front-end and C# for the code-behind. With careful use, a single code base could be created that is easy to convert into a web application using Microsoft Silverlight in the future.

Important factors in the choice of WPF were its ease of use, robustness, expert support and, most importantly, its extensibility and customizability. Extensibility allowed elements that best fit the needs of the project to be created from scratch, while customization added the visual polish behind the professional look and feel of the completed application. Templates could easily be created and reused to style every element of the application, both visually and functionally.

Microsoft Prism Framework for WPF – used by WPF developers to create multi-screen applications with better design and development practices; such applications typically involve rich user interaction and data visualization functionality (44). Prism supports building loosely-coupled components that can evolve independently but can easily be integrated into the overall application. It incorporates various development patterns, the main one being the Model-View-ViewModel (MVVM) architectural pattern.

When choosing a visualization package with charting capabilities, most usable options were commercial and typically had a high up-front cost. From the open source alternatives, two were shortlisted for possible implementation: OxyPlot (45), for its charting capabilities, and Modern UI (Metro) Charts for Windows 8, WPF, Silverlight (46), for its fluid and eye-catching graphics. However, neither included a graph with the functionality needed for the main 2D visualization: an interactive TreeMap/HeatMap combination. The closest library available was the WPF Treemaps & SquarifiedTreeMaps Control (47).

WPF Treemaps & SquarifiedTreeMaps Control – provided an implementation of the treemapping algorithm developed by Shneiderman and Johnson (48) (49) and of its squarified extension by Bruls et al. (50). The control supported multiple levels of recursive nesting and was easy to template and extend. The interactive functions needed by this project were not part of the control, so additional work was needed to modify it, as discussed in Section 5.10 of this report.
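The core of the squarified layout of Bruls et al. can be sketched in a few dozen lines. The version below is an independent Python illustration, not the code of the WPF control. Values are sorted in descending order and scaled to the rectangle's area, then added one at a time to the current row along the shorter side, for as long as doing so does not worsen the row's worst aspect ratio. Otherwise the row is fixed and a new one starts in the remaining space. Dropping non-positive values is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def worst_ratio(row: Sequence[float], side: float) -> float:
    """Worst aspect ratio of a row of areas laid along a side of the given length."""
    total = sum(row)
    return max(max(side * side * r / (total * total), (total * total) / (side * side * r)) for r in row)


def layout_row(row: Sequence[float], x: float, y: float, width: float, height: float, rects: List[Rect]):
    """Place a finished row along the shorter side and return the remaining free rectangle."""
    total = sum(row)
    if width >= height:
        # Shorter side is vertical: the row becomes a column on the left
        column_width = total / height
        offset = y
        for area in row:
            rects.append((x, offset, column_width, area / column_width))
            offset += area / column_width
        return x + column_width, y, width - column_width, height
    # Shorter side is horizontal: the row becomes a strip along the top
    row_height = total / width
    offset = x
    for area in row:
        rects.append((offset, y, area / row_height, row_height))
        offset += area / row_height
    return x, y + row_height, width, height - row_height


def treemap(values: Sequence[float], width: float, height: float) -> List[Rect]:
    """Squarified treemap; rectangles are returned for the values sorted largest first."""
    ordered = sorted((v for v in values if v > 0), reverse=True)
    if not ordered:
        return []
    scale = width * height / sum(ordered)
    remaining = [v * scale for v in ordered]  # areas now sum to width * height
    rects: List[Rect] = []
    row: List[float] = []
    x, y = 0.0, 0.0
    while remaining:
        side = min(width, height)
        candidate = row + [remaining[0]]
        if not row or worst_ratio(candidate, side) <= worst_ratio(row, side):
            row = candidate  # adding the value keeps the row at least as square
            remaining.pop(0)
        else:
            x, y, width, height = layout_row(row, x, y, width, height, rects)
            row = []
    if row:
        layout_row(row, x, y, width, height, rects)
    return rects
```

With the values 6, 6, 4, 3, 2, 2, 1 on a 6 by 4 rectangle, the worked example used by Bruls et al., the two largest values form a 3 by 2 pair stacked in a column on the left. The interactive behaviour the project needed (selection, drill-down, heat colouring) would sit on top of such a layout.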

Helix 3D Toolkit (51) – a collection of custom controls and helper classes for WPF, particularly for its 3D capabilities. The library provided a wide set of easy-to-use controls and a variety of examples, which served as the basis of the 3D visualizer discussed in Section 5.12 of this report. A selection of these examples is shown in Figure 13, illustrating the breadth of the library's capabilities.

Figure 13 - Helix 3D Toolkit Examples (51)

3.2.3 Sample Data

Complex multidimensional data was needed as a base model for all development efforts. This data, along with its OLAP models, had to be ready for use, as creating them was out of scope for this project. The sample data chosen was Adventure Works 2012, a continuously updated dataset provided by Microsoft to showcase its various SQL Server products (52).

Two components were needed for the base functionality of this project: a data warehouse database, Adventure Works 2012 DW, and a SQL Server Analysis Services OLAP server sitting on top of the data warehouse and providing the appropriate complex models (53).

3.3 Agile Processes

The development and project management method chosen was based on agile software development principles. The main objective was to produce working code on a short timescale while remaining open to changing needs and requirements, which changed on numerous occasions over the course of the project.

3.3.1 Agile Practices Selection

The agile practices selected for the project complemented each other. The most important was Iterative Development with time-boxed iterations. Each iteration included its own requirements analysis, design, implementation and testing activities, and concluded with a stable working system by the scheduled date. Small steps, rapid feedback and adaptation are central ideas in iterative development, so iterations varied in length: at the end of each iteration, the length of the next one was decided based on the supervisor's feedback and on which tasks and requirements had been fulfilled or skipped (54).

Learning Spikes are a vital agile practice when tackling new technologies. They are normally stories or tasks for answering a question or gathering information about a specific design issue or a particular technology. Shore and Warden (2008) suggest conducting frequent experiments rather than speculating about the answer when faced with questions. According to them, spikes are small experiments used to research the answer to a problem. The main idea is to create a small program or test that demonstrates the needed functionality in the simplest way possible. It should be run from the command line or a test framework, using hardcoded values and ignoring user input. The resulting code is not meant to be reused: when the experiment finishes, it is thrown away (55).

For testing purposes, Continuous Integration (CI) through test and build automation was used. This method requires that every time a change is committed to the repository, the entire application is rebuilt and a comprehensive set of automated tests is run against it. The main goal is for the software to be in a working state at all times (56). Normally, software is viewed as broken until proven otherwise, usually in a testing or integration stage; with CI, it is proven to work with every commit. As a result, bugs are detected earlier in the development process. Although CI is mostly used by teams of developers, it also benefits solo developers by creating a safety net around the stable version of the application.

3.3.2 Agile Practices Application

To apply Iterative Development, an easy way to monitor iteration progress was needed, allowing reflection on the tasks set at the beginning of the iteration and those that still had to be completed by its end. The initial idea was a document-based approach: simply writing up requirements documents and customer feedback for each iteration. The problem with this was that it would generate too much paperwork.

To deal with this, YouTrack (42) was used to track iteration states. It allowed the creation of a fully customisable, easily navigable and accessible agile board that could be viewed and modified from any browser (with some additional set-up). Its wide array of features greatly helped to organize and monitor work efficiently. An example screen capture taken in the middle of the first iteration (Figure 14) shows the structure of the agile board as customized for the project. Four columns were created to group tasks by their current stage: On Hold, Open, In Progress and Completed. A colour-coding suited to these states was set up, and tasks were grouped into separate categories depending on their project context. Additional information could be added to each task, such as a description of its purpose, a criticality flag, and file uploads and attachments.

Figure 14 - AgileBoard YouTrack (42)

The second agile practice applied was Learning Spikes. This practice was simple to apply: for each week of the semester in which development took place, a folder was created grouping together all the learning spikes from that week, making it clear when each spike occurred. Figure 15, Figure 16 and Figure 17 show an example of the structure of the Learning Spikes. Most technologies and techniques were introduced by following online tutorials that built small projects, which were placed in the appropriate weekly Learning Spikes folders.

Figure 15 – Project folder

Figure 16 – Learning Spike Week 7

Figure 17 – Learning Spike Week 8

To apply Continuous Integration, the TeamCity CI environment was installed on the VM Server virtual machine. Figure 18 illustrates the process flow of the tool. Testers and developers commit their changes to source control. The TeamCity server polls the source control system (Atlassian Bitbucket for this project) and, whenever it detects a change, triggers a build on a build agent hosted on the server. The build runs a pre-specified set of steps (which can be extremely complex), such as testing, building and deployment scripts. TeamCity supports various testing frameworks across multiple languages. Finally, if any kind of error occurs, a notification is sent and, depending on the configuration, the whole process stops until the error is fixed.

Figure 18 - TeamCity Flow (41)
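The flow just described can be summarised by the following sketch of a polling CI loop. It is a hypothetical illustration, not TeamCity's API or configuration. get_head stands in for asking source control for its latest revision, each step is a named callable returning success or failure, and notify stands in for the notification sent when a step fails. A failed build leaves the pipeline marked as broken until a later revision builds cleanly.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[str], bool]]  # (step name, action returning True on success)


class PollingPipeline:
    """Illustrative CI loop: poll for a new revision, run every step, stop on the first failure."""

    def __init__(self, get_head: Callable[[], str], steps: List[Step], notify: Callable[[str], None]) -> None:
        self.get_head = get_head
        self.steps = steps
        self.notify = notify
        self.last_built: Optional[str] = None
        self.broken = False

    def poll(self) -> Optional[bool]:
        """Check source control once; return None if there was nothing new to build."""
        head = self.get_head()
        if head == self.last_built:
            return None
        self.last_built = head
        for name, action in self.steps:
            if not action(head):
                self.notify(f"Build of {head} failed at step '{name}'")
                self.broken = True  # stays broken until a later revision passes
                return False
        self.broken = False
        return True
```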

4. Design

4.1 Overview

This chapter describes the design of the system. The framework chosen for development, Microsoft Prism (Section 3.2.2), provides the core design principles and patterns on which this project builds. These are discussed in the following sections, along with an overview of the high-level diagrams of the main project entities.

4.2 Design Principles

Design principles support software development by providing recommended guidelines to follow when building a project. The key principles that shape the design and architecture of this project are outlined below.

Modularity – The fundamental principle of any composite application. It ensures that objects and components are cohesive and loosely coupled, which allows them to evolve independently of one another while remaining easy to integrate later into a common overall application, the composite application. The Microsoft Prism framework used for this project adheres strongly to this principle and uses it as a foundation on which developers can easily build (57).

GRASP – General Responsibility Assignment Software Principles are used to assign responsibilities to classes; they form a language that helps developers communicate and provide answers to common software problems (54). The principles are: Controller, Creator, Indirection, Information Expert, High Cohesion, Low Coupling, Polymorphism, Protected Variations and Pure Fabrication. Using them as guidelines during implementation led to a better design. These principles also serve as the “building blocks” for the design patterns described in the next section.

4.3 Design Patterns

The Microsoft Prism framework incorporates a set of design patterns, which this project uses to support its composite architecture (Figure 19). A variety of additional patterns from the wider software development domain (58) are used throughout the project. Patterns are vital in object-oriented software development: they provide tried and tested solutions to common software design problems, a language for experienced developers to communicate efficiently, learning aids for inexperienced developers, and a framework for discussing design trade-offs (54).

Figure 19 - Composite Application Design Patterns (57)

A brief overview of the patterns in question follows, along with how they apply to this project.

Prism framework component definitions are found in Appendix A.

Adapter – A Gang of Four (GoF) pattern (54) (59) that decouples code within the application from external formats. It is used extensively in the data layer of this project (Section 5.2) when pulling metadata from the OLAP server. At the level of the system architecture, it is used to adapt the interface of a class to the interface expected by another class; for example, application regions (Appendix A) are adapted to constructs understood by the WPF framework.
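A minimal sketch of the Adapter idea as it applies to metadata retrieval is shown below. It assumes raw rows shaped like records of the MDSCHEMA_CUBES schema rowset (CUBE_NAME, CUBE_TYPE, DESCRIPTION). The CubeInfo fields and the adapter class are hypothetical and not taken from the project's DataLayer; the point is that only the adapter knows the server's format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping


@dataclass(frozen=True)
class CubeInfo:
    """Application-side cube model (hypothetical fields)."""
    name: str
    description: str


class CubeMetadataAdapter:
    """Converts raw server metadata rows into CubeInfo objects, hiding the server's format."""

    def adapt(self, rows: Iterable[Mapping[str, object]]) -> List[CubeInfo]:
        cubes = []
        for row in rows:
            # Skip entries that describe stand-alone dimensions rather than cubes
            if row.get("CUBE_TYPE") != "CUBE":
                continue
            cubes.append(CubeInfo(name=str(row["CUBE_NAME"]), description=str(row.get("DESCRIPTION") or "")))
        return cubes
```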

Application Controller Pattern – A pattern, described by Martin Fowler (60), that centralizes the responsibility for creating and displaying views. It is used for view injection and view switching on the main application screen, to display either the 2D or the 3D visualisations.

Command Pattern – A GoF pattern (59) in which objects are used to represent actions. It is used frequently in the implementation to bind button click logic in the UI to the presentation logic of the application.
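The Command pattern as used for button bindings can be sketched as below. The class mirrors the general shape of WPF's ICommand (an execute action, a can-execute check and a can-execute-changed notification), but ActionCommand and its method names are hypothetical Python stand-ins, not the Prism or WPF implementation.

```python
from typing import Any, Callable, List, Optional


class ActionCommand:
    """An action wrapped in an object, with an enablement check and change notification."""

    def __init__(self, execute: Callable[[Any], None], can_execute: Optional[Callable[[Any], bool]] = None):
        self._execute = execute
        self._can_execute = can_execute or (lambda _parameter: True)
        self._listeners: List[Callable[[], None]] = []

    def can_execute(self, parameter: Any = None) -> bool:
        return self._can_execute(parameter)

    def execute(self, parameter: Any = None) -> None:
        # A bound button would normally be disabled already, but guard here as well
        if self.can_execute(parameter):
            self._execute(parameter)

    def add_can_execute_changed(self, listener: Callable[[], None]) -> None:
        self._listeners.append(listener)

    def raise_can_execute_changed(self) -> None:
        # Tells bound controls to re-query can_execute, e.g. to enable or disable a button
        for listener in self._listeners:
            listener()
```

For example, an "open cube" command could be constructed with a check that a cube is selected, so the bound button stays disabled until the selection changes and raise_can_execute_changed is called.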

Composite – A GoF partitioning pattern (59) in which a group of objects is treated in the same way as a single instance. This pattern is central to the structure of the Cube Browser View Model hierarchy (Section 5.7), making the hierarchy easy to refactor.
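A simplified Composite, in which leaves and containers share one node type, is sketched below. BrowserNode and the captions used with it are hypothetical stand-ins for the cube browser hierarchy. Clients call the same methods whether a node is a single item or a whole sub-tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BrowserNode:
    """One node of a browser tree; a node with no children acts as a leaf."""

    caption: str
    children: List["BrowserNode"] = field(default_factory=list)

    def add(self, child: "BrowserNode") -> "BrowserNode":
        self.children.append(child)
        return child

    def leaf_count(self) -> int:
        # Same call for leaves and sub-trees: a composite simply delegates to its children
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)

    def render(self, depth: int = 0) -> List[str]:
        lines = ["  " * depth + self.caption]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines
```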

Composite View – A Composite (see previous) of Views. This allows for the combination of

individual views to create a more complex entity (Section 5.8).

Dependency Injection – A pattern that implements inversion of control: dependencies are injected, or passed, into a dependent object (the client) and become part of that client’s state (61). When objects are constructed, the dependency injection container resolves any external dependencies, allowing concrete implementations to be changed easily as the system evolves (57). The project uses the Unity container for dependency injection.
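The essence of constructor injection can be sketched with a tiny container. This is a hypothetical Python illustration, not the Unity container's API: register_type maps an abstraction to a concrete class, register_instance supplies a shared object, and resolve builds an object by recursively resolving the annotated parameters of its constructor. The service and view model names are invented for the example.

```python
import inspect
from typing import Dict, Type, TypeVar

T = TypeVar("T")


class Container:
    """Minimal constructor-injection container (illustrative only)."""

    def __init__(self) -> None:
        self._types: Dict[type, type] = {}
        self._instances: Dict[type, object] = {}

    def register_type(self, abstraction: type, concrete: type) -> None:
        self._types[abstraction] = concrete

    def register_instance(self, abstraction: type, instance: object) -> None:
        self._instances[abstraction] = instance

    def resolve(self, abstraction: Type[T]) -> T:
        if abstraction in self._instances:
            return self._instances[abstraction]  # shared instance
        concrete = self._types.get(abstraction, abstraction)
        parameters = inspect.signature(concrete.__init__).parameters.values()
        # Resolve every annotated constructor parameter, recursively
        dependencies = {
            p.name: self.resolve(p.annotation)
            for p in parameters
            if p.name != "self" and p.annotation is not inspect.Parameter.empty
        }
        return concrete(**dependencies)


class IMetadataService:
    """Abstraction the view model depends on (hypothetical)."""


class OlapMetadataService(IMetadataService):
    """Concrete implementation that can be swapped without touching the view model."""


class CubeBrowserViewModel:
    def __init__(self, metadata: IMetadataService) -> None:
        self.metadata = metadata
```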

Event Aggregator – based on the Observer pattern, it is a simple element of indirection that channels events from multiple objects through itself. Objects can use it to subscribe to, publish or locate events. The project frequently uses this pattern for communication between loosely-coupled components (Section 5.4).

Factory – a GoF pattern (54) (59); a Pure Fabrication object that handles the creation and initialization of other objects without exposing the instantiation logic to the client.

Façade – a GoF pattern (59) that simplifies complex interfaces, either to make them easier to use or to isolate access to them. Thanks to the Façade pattern, the Prism library is isolated from changes in the container and logging services.

Model-View-ViewModel (MVVM) – a pattern that helps to partition and separate the business

and presentation logic from an application’s UI into three specific types of classes – ‘View’,

‘View Model’ and ‘Model’ (Figure 20).

Figure 20 - MVVM Pattern Architecture (57)

The Model encapsulates the application logic, data retrieval and validation. It also represents the client-facing domain model of the application, and serves as the base that the View Model encapsulates for use by the View.

The View Model handles the presentation logic of the application. It is separated from the View and has no direct knowledge of its implementation. It exposes commands and communicates with the View through data binding: the View binds to properties of the View Model, and when these properties change, the View Model raises change notification events that the View handles to update itself.

The View defines the structure and visual appearance of the interface shown to the user, handling visual behaviour and rendering triggered by state changes in the View Model or by interactions with the UI.
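The notification flow between View and View Model can be sketched as follows. ObservableViewModel plays the role that INotifyPropertyChanged plays in WPF, and TitleView stands in for a data-bound view. All class names are hypothetical, and in WPF the binding engine performs the refresh automatically.

```python
from typing import Callable, Dict, List, Optional


class ObservableViewModel:
    """Base view model that notifies listeners when a property value changes."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._handlers: List[Callable[[str], None]] = []

    def on_property_changed(self, handler: Callable[[str], None]) -> None:
        self._handlers.append(handler)

    def _get(self, name: str) -> object:
        return self._values.get(name)

    def _set(self, name: str, value: object) -> None:
        if self._values.get(name) == value:
            return  # no change, no notification
        self._values[name] = value
        for handler in self._handlers:
            handler(name)


class CubeSummaryViewModel(ObservableViewModel):
    """Presentation logic only: it knows nothing about the view that displays it."""

    @property
    def selected_cube(self) -> Optional[str]:
        return self._get("selected_cube")

    @selected_cube.setter
    def selected_cube(self, value: Optional[str]) -> None:
        self._set("selected_cube", value)

    @property
    def title(self) -> str:
        return f"Browsing: {self.selected_cube}" if self.selected_cube else "No cube selected"


class TitleView:
    """Stand-in for a data-bound view: re-reads the bound value whenever notified."""

    def __init__(self, view_model: CubeSummaryViewModel) -> None:
        self.view_model = view_model
        self.text = view_model.title
        view_model.on_property_changed(self._refresh)

    def _refresh(self, _property_name: str) -> None:
        self.text = self.view_model.title
```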

Observer – the Prism framework uses a variation of the Observer pattern that separates requests for interaction with the user from the way the interaction is actually presented. This allows views to decide how they provide feedback to the user.

Registry – provides a well-known object that other objects can use to locate common objects and services (62). In this project it is used to associate views with their appropriate regions.
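A region registry of this kind can be sketched as a mapping from region names to view factories. The class, method and region names below are hypothetical. The sketch is only loosely modelled on Prism's view discovery and does not reproduce its API.

```python
from typing import Callable, Dict, List


class RegionViewRegistry:
    """Associates region names with factories for the views that should appear in them."""

    def __init__(self) -> None:
        self._factories: Dict[str, List[Callable[[], object]]] = {}

    def register(self, region: str, view_factory: Callable[[], object]) -> None:
        self._factories.setdefault(region, []).append(view_factory)

    def create_views(self, region: str) -> List[object]:
        # Called when a region is shown: build every view registered for it
        return [factory() for factory in self._factories.get(region, [])]
```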

Separated Interface and Plug-In – Separated Interface reduces coupling by separating the interface definition from its implementation. With the Plug-In pattern, concrete class implementations are determined at run time, avoiding recompilation when the concrete class in use changes.

Service Locator – allows classes to locate specific services without knowing which classes implement them. In this project it is used specifically to locate and retrieve instances of the Event Aggregator for publishing and subscribing to events (Section 5.4).

Singleton – a GoF (59) pattern that restricts a class to a single instance. In this project it is used for cache creation, in conjunction with the Factory pattern discussed above.
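A sketch of a cache exposed as a Singleton and filled through factory callables is shown below, assuming double-checked locking for thread-safe creation. MetadataCache and get_or_create are hypothetical names. Python cannot stop callers from invoking the constructor directly, so the single instance is a convention enforced through instance().

```python
import threading
from typing import Callable, Dict, Hashable, Optional


class MetadataCache:
    """Process-wide cache; obtain it through instance() rather than the constructor."""

    _instance: Optional["MetadataCache"] = None
    _lock = threading.Lock()

    def __init__(self) -> None:
        self._entries: Dict[Hashable, object] = {}

    @classmethod
    def instance(cls) -> "MetadataCache":
        # Double-checked locking: cheap check first, lock only for the first creation
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def get_or_create(self, key: Hashable, factory: Callable[[], object]) -> object:
        # The factory encapsulates how the object is built; the cache only decides when
        if key not in self._entries:
            self._entries[key] = factory()
        return self._entries[key]
```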

4.4 Project Set-Up Step Activities

The set-up activities for an application using the Prism framework are illustrated in Figure 21. They consist of creating an initial shell project with region outlines; a bootstrapper component that initializes the various Prism components and services; an Infrastructure module that defines all shared events and commands; and, finally, functional modules to be plugged into (Section 4.3) the region outlines previously defined in the main shell project.

Figure 21 - Prism Composite Application Creation Activities (57)

4.5 Abstract High-Level System Design

An overview of the high-level system design of the application can be seen in the diagram

below. It shows the dependencies between the different modules – how they connect to one

another and their structure in the Visual Studio Solution (Figure 22).

Figure 22 - Abstract high-level module view

4.6 Side Module Design

The Side Module was the first major module designed for the application. It includes a composite view (Section 4.3) containing the database selector and the cube browser. The structure of the cube browser and the design of its presentation logic hierarchy are shown in Figure 23.

Figure 23 - Side module design diagram

4.7 Main Module Design

The Main Module is the component that holds the visualization logic, the data population models of the graphs, and the views and view models of the main screen. A conceptual design diagram of this module is shown in Figure 24. All data models used (those whose names end in Info) are part of the DataLayer module.

Figure 24 - Main module design diagram

5. Implementation

5.1 Overview

This chapter delves into the implementation details of the project, looking at each of the specific

components and how they build up to create the final end-to-end application.

5.2 Base System Framework

The first implementation step was to set up and connect to the Prism Library, following the steps discussed in Section 4.4. The first step executed when the application starts is shown in Figure 25.

Figure 25 - Connecting to the Prism Framework (57)

An additional configuration file, app.config, was added, containing logging options and server connection strings. Three utility modules were set up to support the application: DataLayer, Infrastructure and Resources.

DataLayer – Provides all the connection and data specific logic for OLAP Metadata and Data

retrieval adapters, MDX query building and execution.

Infrastructure – Provides the common services, interfaces, event objects and utilities used by the main application modules.

Resources – A shared resource module that contains images and common UI controls and

functions.
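The query-building responsibility of the DataLayer can be illustrated with a small MDX builder. This is a hypothetical Python sketch, not the project's code. It places sets on the COLUMNS and ROWS axes, optionally applies NON EMPTY to the rows, and adds a WHERE slicer. The member names used with it follow the Adventure Works sample cube but should be treated as examples only.

```python
from dataclasses import dataclass, field
from typing import List


def bracket(name: str) -> str:
    """Quote an MDX identifier, doubling any closing bracket inside it."""
    return "[" + name.replace("]", "]]") + "]"


@dataclass
class MdxQuery:
    cube: str
    columns: List[str] = field(default_factory=list)
    rows: List[str] = field(default_factory=list)
    slicers: List[str] = field(default_factory=list)
    non_empty_rows: bool = True

    def build(self) -> str:
        if not self.columns:
            raise ValueError("an MDX SELECT needs at least one set on COLUMNS")
        axes = ["SELECT { " + ", ".join(self.columns) + " } ON COLUMNS"]
        if self.rows:
            # NON EMPTY drops row tuples whose cells are all empty
            prefix = "NON EMPTY " if self.non_empty_rows else ""
            axes.append(prefix + "{ " + ", ".join(self.rows) + " } ON ROWS")
        query = ",\n".join(axes) + "\nFROM " + bracket(self.cube)
        if self.slicers:
            query += "\nWHERE ( " + ", ".join(self.slicers) + " )"
        return query
```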

5.3 UI Templatability

WPF provides an easy way for developers to customize their UI, thanks to the highly templatable XAML language. To create the professional, polished appearance of the application, extensive work was done on custom component templates and styles, changing the look and feel of most default controls provided by WPF (63). Missing UI component functionality can easily be added by extending and overriding the original framework presentation logic in the code-behind. Similarly, styles defined in templates override the existing default component styles. Depending on their scope, templates would normally be placed in a generic template file shared across the application.

generic.xaml – the main template file, located in the Resources module. All generic templates shared between the views of the different modules are located in this resource dictionary. The dictionary file is merged at the beginning of every view that uses these templates by specifying its location, as shown in Figure 26.

Figure 26 - Merging Generic Resources into Views

Examples of the most prominent template changes are shown below: a reworked Window Style (Figure 27), Button Style (Figure 28) and changes to the hierarchical TreeView component (Figure 29) used in the Cube Browser (Section 5.7). All other UI controls were changed in a similar manner, using the same principles.

Window Style – the figure below (Figure 27) shows that even operating-system-specific controls, such as program windows, can be templated and styled.

Figure 27 - Original and Templated Window Style

Button Style – the figure below (Figure 28) shows the original button appearance and the appearance after a template has been applied to alter its normal and hover states. The template is applied through the control's Style property, while the style template itself is located in the generic.xaml resource dictionary discussed above.

Figure 28 - Templated Button Results Example

TreeView Style – the figure below (Figure 29) illustrates how far WPF customisation can go: the structure of a control can be changed in any way necessary to obtain the desired result.

Figure 29 - Default and Templated TreeView Control

5.4 Component Interaction

To communicate and share data between independent components, the Prism framework provides an Event Aggregator service, based on the pattern of the same name (Section 4.3) and visualised below (Figure 30). It allows the same event to be raised by different publishers and listened to by multiple subscribers. Events can also carry data from the publisher to the subscribers. All application-specific events are defined in the Infrastructure module.

Figure 30 - Publish/Subscribe Event Aggregator Service (57)
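The publish/subscribe mechanism can be illustrated with a minimal Python sketch. It loosely mirrors the shape of Prism's `GetEvent<TEvent>()`, `Subscribe(handler)` and `Publish(payload)` calls. The sketch omits the threading options and weak references that Prism adds, and the event class and payload are hypothetical.

```python
from typing import Any, Callable, Dict, List


class PubSubEvent:
    """A single event type shared by any number of publishers and subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, handler: Callable[[Any], None]) -> None:
        self._subscribers.append(handler)

    def unsubscribe(self, handler: Callable[[Any], None]) -> None:
        self._subscribers.remove(handler)

    def publish(self, payload: Any = None) -> None:
        # Iterate over a copy so a handler can unsubscribe while being notified.
        for handler in list(self._subscribers):
            handler(payload)


class EventAggregator:
    """Returns one shared instance per event class, so publishers and
    subscribers never need references to each other."""

    def __init__(self) -> None:
        self._events: Dict[type, PubSubEvent] = {}

    def get_event(self, event_type: type) -> PubSubEvent:
        if event_type not in self._events:
            self._events[event_type] = event_type()
        return self._events[event_type]


class DatabaseChangedEvent(PubSubEvent):
    """Hypothetical event whose payload is the newly selected database name."""


# Usage: two independent components communicating through the aggregator.
aggregator = EventAggregator()
received: List[str] = []
aggregator.get_event(DatabaseChangedEvent).subscribe(received.append)
aggregator.get_event(DatabaseChangedEvent).publish("Adventure Works DW")
print(received)  # ['Adventure Works DW']
```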

5.5 Database and Cube Selector

The database (Figure 31) and cube (Figure 32) selectors, both part of the Side Module, are pop-up screens that let the user select the database and cube to use for data browsing. Each has its own View and View Model: SelectCubeView.xaml and SelectCubeViewModel.cs; SelectDatabaseView.xaml and SelectDatabaseViewModel.cs.

Figure 31 - Database Selector Pop-Up

Figure 32 - Cube Selector Pop-Up

Database Selector – uses the initial server connection string created at start-up from the configuration file discussed in Section 5.2. A query is run automatically and the databases available on the server are listed. When a database is selected, the connection string is updated, which enables the subsequent cube selection. Changing the database fires events (Section 5.4) that update the views across all modules.

Cube Selector – uses the updated connection string to list all OLAP cubes in the selected database. When the cube is changed, the currently selected cube is cached so that later loads of it are faster.
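The two behaviours described above, updating the connection string when a database is chosen and caching a cube once it has been loaded, can be sketched as follows. The report does not show the exact connection string format. This sketch assumes simple `Key=Value;` syntax in which the database is stored under the `Catalog` keyword. The cache class and loader function are hypothetical.

```python
from typing import Callable, Dict, Tuple


def with_catalog(connection_string: str, database: str) -> str:
    """Return the connection string with its Catalog entry set to `database`.

    Assumes simple 'Key=Value;Key=Value' syntax without quoted values.
    """
    pairs = []
    for part in connection_string.split(";"):
        if not part.strip():
            continue  # skip empty segments, e.g. from a trailing ';'
        key, value = part.split("=", 1)
        if key.strip().lower() != "catalog":
            pairs.append(f"{key.strip()}={value.strip()}")
    pairs.append(f"Catalog={database}")
    return ";".join(pairs)


class CubeMetadataCache:
    """Caches loaded cube metadata per (connection string, cube) pair."""

    def __init__(self, loader: Callable[[str, str], object]) -> None:
        self._loader = loader
        self._cache: Dict[Tuple[str, str], object] = {}

    def get(self, connection_string: str, cube: str) -> object:
        key = (connection_string, cube)
        if key not in self._cache:
            # Only the first request for a cube pays the cost of querying the server.
            self._cache[key] = self._loader(connection_string, cube)
        return self._cache[key]
```

Keying the cache on the connection string as well as the cube name prevents metadata from one database being returned for a same-named cube in another database.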

5.6 OLAP Metadata Retrieval

OLAP metadata describes the organisation and types of data in an OLAP cube (Appendix A). This project adapts the metadata model hierarchy used in Microsoft Analysis Services. Rather than creating an adapter and model for the application from scratch, the implementation in the open-source Ranet OLAP library (13) was modified and placed in the Metadata section of the DataLayer module.

The project calls the provider with a connection string and obtains the metadata model, which is populated by querying the OLAP database. Because the model is hierarchical, only a single object needs to be handled: the CubeDefInfo object at the top of the tree. This object is processed and used to create the presentation logic for the Cube Browser discussed in the next section.
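The advantage of a single root object is that everything below it can be reached by walking the tree. The Python sketch below uses hypothetical dataclasses in place of the Ranet metadata classes, whose real members are not shown in the report. It flattens a cube into MDX-style level names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LevelInfo:
    name: str


@dataclass
class HierarchyInfo:
    name: str
    levels: List[LevelInfo] = field(default_factory=list)


@dataclass
class DimensionInfo:
    name: str
    hierarchies: List[HierarchyInfo] = field(default_factory=list)


@dataclass
class CubeInfo:
    """Hypothetical stand-in for the top-level CubeDefInfo object."""
    name: str
    dimensions: List[DimensionInfo] = field(default_factory=list)


def level_names(cube: CubeInfo) -> List[str]:
    """Walk the whole tree from the single root object, returning
    MDX-style level names such as [Date].[Calendar].[Year]."""
    return [
        f"[{d.name}].[{h.name}].[{lvl.name}]"
        for d in cube.dimensions
        for h in d.hierarchies
        for lvl in h.levels
    ]
```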

5.7 Cube Browser

The cube browser was designed and implemented for easy exploratory analysis of the data in the cube, based on techniques described by Mansmann et al. (6) (7) (8). The component is structured around the MVVM pattern (Section 4.3).

The OLAP metadata model is wrapped in a hierarchically structured set of view models containing the presentation logic that the view consumes and displays. Every item in the browser is represented by its own view model (Figure 33, Figure 34) and its own display style, defined with the HierarchicalDataTemplates (64) of the TreeView UI control. The hierarchy is therefore defined twice: once for the UI display and once for the data organisation of the view model hierarchy.

Each metadata view model extends a base TreeViewItemViewModel class, which serves as an adapter between the raw data object and a TreeView item. It provides a set of properties specifying the information displayed on screen, along with a lazily loaded child collection that builds the next level of the hierarchy. The children are loaded only when the item is expanded in the TreeView.
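The lazy-loading behaviour of the base class can be sketched in Python. Only the class name TreeViewItemViewModel comes from the report. The constructor arguments and property names are assumptions. In WPF the `is_expanded` setter would be bound to the TreeViewItem's IsExpanded property.

```python
from typing import Callable, List, Optional


class TreeViewItemViewModel:
    """Adapter between one metadata object and a TreeView item.

    Children are built by `load_children` the first time the item is expanded.
    """

    def __init__(self, name: str,
                 load_children: Optional[Callable[[], List["TreeViewItemViewModel"]]] = None) -> None:
        self.name = name
        self._load_children = load_children
        self._children: Optional[List["TreeViewItemViewModel"]] = None
        self._is_expanded = False

    @property
    def has_children(self) -> bool:
        # Lets the view show an expander arrow before anything is loaded.
        return self._load_children is not None

    @property
    def children(self) -> List["TreeViewItemViewModel"]:
        return self._children if self._children is not None else []

    @property
    def is_expanded(self) -> bool:
        return self._is_expanded

    @is_expanded.setter
    def is_expanded(self, value: bool) -> None:
        self._is_expanded = value
        # Load the next level of the hierarchy on first expansion only.
        if value and self._children is None and self._load_children is not None:
            self._children = self._load_children()
```

Because loading happens once per item, a large cube only costs metadata queries for the branches the user actually opens.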

Items can be right-clicked, with prompts that let the user choose where they should be used for visualisation: the 2D or 3D main screen, discussed next. Chosen items are added to a cache object in the Infrastructure module, and an event (Section 5.4) is published to announce the addition.

Figure 33 - Cube Browser View Example #1

Figure 34 - Cube Browser View Example #2

5.8 Main Screen

The main screen hosts the 2D and 3D visualisations of the application. The 2D aspects and functionality are discussed in this section, and the 3D visualisation in Section 5.12.

The main application screen provides a placeholder for the visualisations displayed to the user, in the form of a tab pane where each tab can hold a different user-selected visualisation. Each visualisation has its own data template, which defines how it is displayed in the UI, and a view model containing its presentation logic. Every new tab is initialised with a GenericGraphView data template and a GenericGraphViewModel. Based on user interaction, these can then be replaced with another template and view model, the new view model being created by a view model factory. Multiple instances of the same template and view model can exist at once, and their context and bindings are held by the tab pane template. The template used to display a view model is chosen by a template selector (Figure 35).

Figure 35 - MainGraphView.xaml Data Template Selector. Selects an appropriate UI template based on the view model type passed to the view.
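In WPF this decision is made by overriding `DataTemplateSelector.SelectTemplate(item, container)`. The Python sketch below models the same type-based choice together with a small view model factory. GenericGraphViewModel and TreeMapGraphViewModel are named in the report. The template names, the factory keys and the base-class fallback are assumptions.

```python
class GenericGraphViewModel:
    """Default view model every new tab starts with."""


class TreeMapGraphViewModel(GenericGraphViewModel):
    """View model of the TreeMap visualisation."""


# Hypothetical mapping from view model type to the name of its data template.
TEMPLATES = {
    GenericGraphViewModel: "GenericGraphView",
    TreeMapGraphViewModel: "TreeMapGraphView",
}


def select_template(view_model: object) -> str:
    """Pick a template by the view model's type, falling back to base classes."""
    for cls in type(view_model).__mro__:
        if cls in TEMPLATES:
            return TEMPLATES[cls]
    raise LookupError(f"no template for {type(view_model).__name__}")


def create_view_model(kind: str) -> GenericGraphViewModel:
    """Tiny view model factory: each call returns a new, independent instance."""
    factories = {"generic": GenericGraphViewModel, "treemap": TreeMapGraphViewModel}
    return factories[kind]()
```

Because the factory returns a fresh instance on every call, two tabs showing the same kind of visualisation keep independent state, matching the report's note that multiple instances can coexist.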

The 2D main screen also links the Cube Browser to the 2D OLAP visualisations. The user-selected metadata is pulled from the cache (Section 5.7), ready to be used by any of the visualisations. The cube choice selector (Figure 36) sits on top of all views and is shared globally between them.

Figure 36 - Cube Choice Selector with user choices over main screen.

5.9 OLAP Data Retrieval and Query Mechanism

This subsection explains the basis of the OLAP data retrieval and MDX querying mechanism on which all the visualisations rely. Every visualisation's data population algorithm retrieves its data in the same way, as a datatable object created in the DefaultQueryExecutor class, and then consumes it according to its own needs. The main objects used by the MDX query mechanism are the SelectMDXObject and the FilterMDXObject.

FilterMDXObject – takes the OLAP hierarchy members to filter by and, based on their number and the filter type, generates a filter query string to be used by a SelectMDXObject.

Note: the supported filter types are Equals, Not Equals, Between, Not Between, In and Not In.
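One way a filter object could turn a filter type and a member count into MDX is sketched below in Python. The function name, the filter-type spellings and the exact expressions generated are assumptions, not the project's FilterMDXObject. The building blocks themselves, the `:` range operator, the `EXCEPT` function and `.MEMBERS`, are standard MDX.

```python
from typing import List


def build_filter(level: str, members: List[str], filter_type: str) -> str:
    """Build an MDX set expression for a slicer (WHERE) clause.

    `level` is a level unique name, e.g. [Date].[Calendar Year].[Calendar Year];
    `members` are member unique names on that level.
    """
    if filter_type in ("Equals", "NotEquals"):
        if len(members) != 1:
            raise ValueError(f"{filter_type} takes exactly one member")
        selected = "{" + members[0] + "}"
    elif filter_type in ("In", "NotIn"):
        if not members:
            raise ValueError(f"{filter_type} takes at least one member")
        selected = "{" + ", ".join(members) + "}"
    elif filter_type in ("Between", "NotBetween"):
        if len(members) != 2:
            raise ValueError(f"{filter_type} takes exactly two members")
        # The MDX range operator ':' covers every member between the two endpoints.
        selected = "{" + members[0] + ":" + members[1] + "}"
    else:
        raise ValueError(f"unknown filter type: {filter_type}")
    if filter_type.startswith("Not"):
        # Negated filters keep every member of the level except the selected ones.
        return f"EXCEPT({level}.MEMBERS, {selected})"
    return selected
```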

SelectMDXObject – provides a wide range of functions for generating MDX SELECT query strings. Concrete examples can be found in the SelectMDXGeneratorTests class of the DataLayerTests module.
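The kind of query string a select builder produces can be illustrated with the Python sketch below. It places measures on columns and the members of one or more levels on rows, optionally restricted by a slicer set. The function and its parameters are hypothetical, and the example names refer to Microsoft's Adventure Works sample cube rather than to the project's data.

```python
from typing import List, Optional


def build_select(cube: str, measures: List[str], row_levels: List[str],
                 where: Optional[str] = None) -> str:
    """Assemble an MDX SELECT with measures on columns and level members on rows."""
    columns = "{" + ", ".join(measures) + "}"
    # Several levels on one axis are cross-joined with the '*' operator.
    rows = " * ".join(f"{level}.MEMBERS" for level in row_levels)
    query = (f"SELECT NON EMPTY {columns} ON COLUMNS, "
             f"NON EMPTY {rows} ON ROWS FROM [{cube}]")
    if where:
        query += f" WHERE ({where})"
    return query


# Example:
# build_select("Adventure Works", ["[Measures].[Sales Amount]"], ["[Product].[Category].[Category]"])
# -> SELECT NON EMPTY {[Measures].[Sales Amount]} ON COLUMNS, NON EMPTY [Product].[Category].[Category].MEMBERS ON ROWS FROM [Adventure Works]
```

`NON EMPTY` drops rows and columns with no data, which keeps the returned datatable small when most member combinations are empty.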

5.10 TreeMap Data Visualizer

The TreeMap data visualiser is based on the WPF Treemaps & SquarifiedTreeMaps Control library (Section 3.2.2). The library itself first had to be modified, as it did not provide the functionality this project needed. An example taken from the library project (47) shows its original functionality (Figure 37): a recursive view into the hierarchical data structure of a selected hard drive, showing the SquarifiedTreeMap and control over the depth of the hierarchy.

Figure 37 - WPF Treemaps & SquarifiedTreeMaps Control Example - original library capabilities (47)

This project needed the ability to control the depth of individual elements in the TreeMap, to control element colours separately, to indicate the level depth, and to slice (Appendix B) into specific data levels, maximising them on screen. It also needed to drill up and down into higher and lower levels of the hierarchy. All of this had to be done without losing the sense of context, as is expected in exploratory analysis (6) (7) (8). The result of these changes can be seen below (Figure 38). Drilling up and down is done via a circular button in the top right corner of every element or via a right-click context menu. Slicing (maximising) is done via a square button, in a similar way to drilling.

Figure 38 - OlapVisi TreeMap visualisation - provides control over the element depth, a breadcrumb bar at the top saving context, and functionality to drill down/drill up and slice into the data.
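Keeping context while slicing and drilling amounts to remembering the path from the root to the element currently on screen, and that path is also what a breadcrumb bar displays. The Python sketch below is one possible model of this, not the project's code. The class and method names are hypothetical, and it treats drilling up as returning to the parent of the maximised element.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


class TreeMapNavigator:
    """Keeps the path from the root to the element currently maximised (sliced)."""

    def __init__(self, root: Node) -> None:
        self._path = [root]

    @property
    def current(self) -> Node:
        return self._path[-1]

    def slice_into(self, child: Node) -> None:
        # Drill down: maximise a child of the current element, keeping the path as context.
        if child not in self.current.children:
            raise ValueError(f"{child.name} is not a child of {self.current.name}")
        self._path.append(child)

    def drill_up(self) -> None:
        # Return to the parent element; the root is never removed.
        if len(self._path) > 1:
            self._path.pop()

    def breadcrumb(self) -> str:
        return " > ".join(node.name for node in self._path)
```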

The visualisation automatically recalculates the sizes and positions of its elements when the window is resized, or when additional view elements such as the settings panel or the categorical time-series analyser (Section 5.11) are expanded or retracted. Customisation options for hover, text and element colours were added to improve the user experience. Once the user has selected the hierarchies and measures to display, clicking the Plot button calls the main data population algorithm for the visualisation, PlotGraphAction, located in the TreeMapGraphViewModel class.

PlotGraphAction – creates a SelectMDXObject from the data currently plotted in the TreeMap (measures, dimensions, cube). It then retrieves the MDX SELECT query and executes it, obtaining a datatable object. The column indexes of the colour and size measures are read from the first row of the datatable. The method then loops through the rows and columns of the table, creating the TreeMapGraphDataViewModel objects that form the main elements of the TreeMap and linking them into a hierarchical structure that is easy to traverse. Next, the minimum and maximum values of the colour measure are calculated so that each value can be converted into an appropriate colour when displayed. Finally, the context breadcrumb at the top of the graph is updated.
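The core of these steps, turning flat query rows into a linked hierarchy and finding the colour range, can be sketched in Python. The sketch makes several assumptions. The query result is modelled as a header list plus rows. Measure columns are located by name rather than read from the first row. Parent sizes are taken to be the sum of their descendants. TreeMapItem is a hypothetical stand-in for TreeMapGraphDataViewModel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class TreeMapItem:
    """Hypothetical counterpart of a TreeMapGraphDataViewModel."""
    name: str
    size: float = 0.0
    colour_value: float = 0.0
    children: Dict[str, "TreeMapItem"] = field(default_factory=dict)


def build_treemap(header: List[str], rows: List[Sequence], level_columns: List[str],
                  size_measure: str, colour_measure: str) -> Tuple[TreeMapItem, float, float]:
    """Link query rows into a tree and return it with the colour-measure range."""
    size_idx = header.index(size_measure)
    colour_idx = header.index(colour_measure)
    level_idx = [header.index(column) for column in level_columns]
    root = TreeMapItem("All")
    for row in rows:
        size = float(row[size_idx])
        node = root
        node.size += size
        for i in level_idx:
            key = row[i]
            if key not in node.children:
                node.children[key] = TreeMapItem(key)
            node = node.children[key]
            node.size += size  # each ancestor's area is the sum of its descendants
        node.colour_value = float(row[colour_idx])
    colours = [float(row[colour_idx]) for row in rows]
    return root, min(colours, default=0.0), max(colours, default=0.0)
```

Storing children in a dictionary keyed by member name makes each lookup during the loop constant-time and keeps the resulting tree easy to traverse.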

5.11 Categorical Time-Series Analyser

The TreeMap data visualiser discussed in the previous subsection visualises one dimension and two measures. To increase the number of dimensions available to the user, a categorical time-series tool was added to the TreeMap graph (Figure 39), building on ideas from the GapMinder research application (Section 2.1.3).

Figure 39 - Categorical Time-Series data browser and filtering tool.

The time-series analyser has two main operations, which can be used only after the data has been plotted once. The first is selecting an appropriate date level from the previously selected date hierarchy (hierarchies contain multiple levels). The member data of the selected level is loaded into a collection displayed under the slide bar. This collection represents the possible filters that can be applied to the data in the TreeMap, giving the user control over one additional dimension.

Moving the slider thumb changes the SelectedDate property of the view model, which triggers the second operation: the FilterGraphAction, also part of the TreeMapGraphViewModel class. This action is similar to the PlotGraphAction (Section 5.10), but the SelectMDXObject (Section 5.9) it generates applies the SelectedDate as a filter to the data.

FilterGraphAction - creates a FilterMDXObject (Section 5.9) for the currently selected date and feeds it into a SelectMDXObject built from the data currently plotted in the TreeMap (measures, dimensions, cube). It then retrieves the MDX SELECT query and executes it, obtaining a datatable object. A helper dictionary keeps track of the context of the data being processed, so that no duplicates are created. The hierarchical links between the previous elements are removed without losing the references to the elements themselves. The method then loops through the rows and columns of the datatable, retrieving the data objects previously created by the PlotGraphAction (Section 5.10) by name comparison and linking them into a new hierarchical structure based on the filtered data. The final two columns provide the values for the colour and size of the TreeMap visual objects.
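The relinking step, which reuses the existing element objects instead of creating new ones, can be sketched in Python. The sketch assumes a two-level hierarchy, item names that are unique across levels (as the name comparison in the report implies), and rows of the form (parent, child, colour, size). All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(eq=False)
class TreeMapItem:
    name: str
    size: float = 0.0
    colour_value: float = 0.0
    children: List["TreeMapItem"] = field(default_factory=list)


def relink(root: TreeMapItem, existing: Dict[str, TreeMapItem],
           filtered_rows: Iterable[Tuple[str, str, float, float]]) -> TreeMapItem:
    """Rebuild parent/child links for the filtered data, reusing existing items.

    existing: item name -> item created by the earlier full plot.
    filtered_rows: (parent name, child name, colour value, size) tuples.
    """
    # Unlink the old structure but keep the item objects themselves.
    root.children.clear()
    for item in existing.values():
        item.children.clear()
    linked: Set[str] = set()  # helper collection that prevents duplicate links
    for parent_name, child_name, colour, size in filtered_rows:
        parent, child = existing[parent_name], existing[child_name]
        if parent_name not in linked:
            root.children.append(parent)
            linked.add(parent_name)
        if child_name not in linked:
            parent.children.append(child)
            linked.add(child_name)
        # The last two columns hold the colour and size measures.
        child.colour_value = colour
        child.size = size
    return root
```

Reusing the objects means any view bindings to them stay valid while the slider moves, and only the links and measure values change.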

5.12 3D Data Visualizer

The experimental 3D OLAP visualiser (Figure 40) uses ideas from the VR4OLAP application (Section 2.1.3) to increase the number of dimensions the user can interact with simultaneously, and to provide OLAP operations (Appendix B) beyond those of the 2D TreeMap graph. In addition to drill-up, drill-down and slice, the 3D OLAP tool supports dicing and pivoting. It displays up to three dimensions of any type, and the user can choose to display one or two measures at the same time.

Figure 40 - 3D OLAP visualizer application example

The Helix Toolkit (Section 3.2.2) provides an easy-to-use scene set-up along with various visual objects. Only minimal changes were made to the base library, to add the selectable element functionality that was needed. These changes are in the 3DControls folder of the MainVisualisation module.

The interface was implemented incrementally. Hierarchy loading and measure set-up were implemented first, followed by the X, Y and Z grids that serve as placeholders for the elements. Labels for the members of each level were then positioned and oriented along the outsides of the grid. Next came the data element collection, together with the full data population algorithm, MapAction. The models of the 3D objects are instances of the UIBoxElement3D class, and their view models are instances of the DataVisual class. Colour coding and sizing based on the values of the data elements were then added, followed by additional legends and data information, along with all the OLAP functionality shown in Appendix C.

Colour coding – implemented with the help of the ColorHelper WPF utility class. Following a study of colour manipulation in .NET (65), colours are chosen using the HSV cylindrical representation of the RGB colour model (Figure 41). Saturation and value are both fixed at one, so the hue is the only parameter that varies. The minimum and maximum values of the data are calculated and mapped to the start and end points of a hue range running from yellow to red. These two colours are then used to create the gradient shown in the legend.

Figure 41 - HSV Cylindrical-Colour Model - Hue, Saturation, Value (66)
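The mapping from data value to colour can be sketched with Python's standard `colorsys` module. The project's ColorHelper class is not shown, so this is an independent illustration. It assumes yellow sits at a hue of 60° and red at 0°, and that values are interpolated linearly between the two.

```python
import colorsys
from typing import Tuple

YELLOW_HUE = 60.0  # degrees on the hue circle
RED_HUE = 0.0


def value_to_rgb(value: float, lo: float, hi: float) -> Tuple[int, int, int]:
    """Map `value` in [lo, hi] to a hue between yellow (lo) and red (hi).

    Saturation and value are both fixed at 1, so only the hue varies.
    """
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values that fall outside the range
    hue = YELLOW_HUE + t * (RED_HUE - YELLOW_HUE)
    r, g, b = colorsys.hsv_to_rgb(hue / 360.0, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


print(value_to_rgb(10.0, 10.0, 50.0))  # (255, 255, 0): minimum -> yellow
print(value_to_rgb(50.0, 10.0, 50.0))  # (255, 0, 0): maximum -> red
```

Fixing saturation and value keeps every colour fully bright, so differences between elements are carried by hue alone.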

Element positioning – uses the previously calculated positions of the member labels at the grid edges. Each element is placed at the 3D coordinate where three members, one per axis, intersect.

MapAction – loads all member data of the selected levels using a MemberLoaderCache utility class. It maps the placeholder grid locations and populates the member labels, looping through each level's collection, calculating the position of each member, rendering it on screen and adding the clickable filter elements used for the OLAP operations. It then populates the level labels at the three corners of the grid. Next, a SelectMDXObject is created from the currently selected OLAP information and used to run an MDX query, returning a datatable object. The datatable is parsed, the minimum and maximum values of the selected measures are calculated, and the DataVisual element objects are created. Each DataVisual receives a Point3D object, built from the X, Y and Z coordinates of the intersecting member labels, which serves as the centre point of the 3D element. The element's measure values are passed in as the final parameters. Finally, a notification is sent to the UI to render the elements on screen, the colour ranges discussed above are created, and the appropriate flags are set before the method ends.
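The positioning and element-creation part of MapAction can be sketched in Python. The sketch makes three assumptions. Labels sit at equal intervals along each grid edge. Each query row is (x member, y member, z member, measure 1, optional measure 2). Element size is scaled linearly from the first measure. DataVisualSketch is a hypothetical stand-in for the DataVisual class, and a plain tuple replaces Point3D.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class DataVisualSketch:
    """Hypothetical stand-in for the project's DataVisual view model."""
    centre: Tuple[float, float, float]
    values: Tuple[float, ...]
    scale: float  # 0..1 relative size derived from the first measure


def axis_positions(members: List[str], spacing: float = 1.0) -> Dict[str, float]:
    """Place member labels at equal intervals along one grid edge."""
    return {member: i * spacing for i, member in enumerate(members)}


def create_elements(rows: List[Sequence], x_members: List[str], y_members: List[str],
                    z_members: List[str], spacing: float = 1.0):
    """rows: (x member, y member, z member, measure 1[, measure 2]) from the query."""
    xs = axis_positions(x_members, spacing)
    ys = axis_positions(y_members, spacing)
    zs = axis_positions(z_members, spacing)
    first = [float(row[3]) for row in rows]
    lo, hi = min(first, default=0.0), max(first, default=0.0)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    elements = []
    for x, y, z, *measures in rows:
        values = tuple(float(v) for v in measures)
        # Each element is centred where its three member labels intersect.
        centre = (xs[x], ys[y], zs[z])
        elements.append(DataVisualSketch(centre, values, (values[0] - lo) / span))
    return elements, lo, hi
```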

6. Testing and Evaluation

6.1 Overview

This chapter discusses the testing methodologies used in the project and evaluates the OLAP visualisations in terms of query, algorithm and rendering performance. The visualisations are also compared with the functionality of existing BI solutions and evaluated on the types of OLAP operations they support.

6.2 Testing

Various testing methods were used to make sure that each component of the application works as intended and that the system functions properly as a whole. This section highlights these methods and discusses their advantages. As previously discussed, the MVVM architecture of the project makes it easy to test the View Models separately from their corresponding Views. This is done with unit tests, which were run as a step in the Continuous Integration process (Section 3.3). Some functionality was implemented using a Red-Green-Green cycle, an important part of the Test-Driven Development methodology.

6.2.1 Unit Tests

Unit tests were created for all the View Models, which contain the presentation logic of the application, and for the Models, which contain its business and data logic. The OLAP metadata retrieval models adapted from the Ranet OLAP open-source library were not unit tested, on the assumption that this logic had already been tested by the library's original developers. The NUnit test framework was used to create the unit tests because of its ease of use and its integration with the TeamCity Continuous Integration system (Section 3.3).

The two main NUnit features used in testing were SetUp methods and assertions. A SetUp method contains steps common to all test methods, which are performed before each test method is called. Assertions are the central unit testing mechanism for determining whether a piece of production code works properly. Multiple kinds of assertion exist, for checking equality, identity, conditions, comparisons and so on (67).
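The same two features can be illustrated with Python's `unittest`, whose `setUp` method plays the role of NUnit's `[SetUp]` attribute and whose `assert*` methods correspond to NUnit's `Assert` class. The view model under test is hypothetical.

```python
import unittest
from typing import List


class SelectedMeasuresViewModel:
    """Hypothetical view model: keeps the measures chosen for plotting."""

    def __init__(self) -> None:
        self.measures: List[str] = []

    def add_measure(self, name: str) -> None:
        if name not in self.measures:
            self.measures.append(name)


class SelectedMeasuresViewModelTests(unittest.TestCase):
    def setUp(self) -> None:
        # Runs before every test method, so each test starts from a fresh object.
        self.vm = SelectedMeasuresViewModel()

    def test_starts_empty(self) -> None:
        self.assertFalse(self.vm.measures)

    def test_add_measure_appends(self) -> None:
        self.vm.add_measure("[Measures].[Sales Amount]")
        self.assertEqual(self.vm.measures, ["[Measures].[Sales Amount]"])

    def test_duplicate_measure_is_ignored(self) -> None:
        self.vm.add_measure("[Measures].[Sales Amount]")
        self.vm.add_measure("[Measures].[Sales Amount]")
        self.assertEqual(len(self.vm.measures), 1)
```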

6.2.2 Test-driven Development (TDD)

This methodology was chosen for creating the OLAP data retrieval and querying mechanism (Section 5.9). The main idea behind it, first described by Kent Beck (68), is to simplify code writing by growing the code and its associated test suite in small increments that are meaningful on their own. Empirical evidence suggests that, compared with the Test-Last approach, it leads to lower coupling between objects and allows for better modular design, code reuse and ease of testing (69).

Test code is written together with production code, following a simple three-step cycle: test-code-refactor, or red-green-green (the colours indicate the test results seen at the end of each step):

1. Write a failing test (Red) – the simplest test that can be thought of that will fail. In TDD, no production code may be written unless there is a failing test for the developer to work towards.

2. Write production code to make the test pass (Green) – the simplest implementation that makes the test pass is written, without breaking previously passing tests.

3. Refactor code (Green) – the design of both the test code and the production code is refactored without breaking any existing test cases.
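The first two steps of the cycle can be shown on a tiny, hypothetical query-builder helper. This is a Python illustration of the technique, not the project's NUnit tests. The tests were written first and failed because `measure_set` did not exist (Red). The function body is the simplest code that makes them pass (Green). Any later refactoring must keep both tests passing.

```python
import unittest
from typing import List


# Step 1 (Red): these tests were written before measure_set existed, so they failed.
class MeasureSetTests(unittest.TestCase):
    def test_single_measure_is_wrapped_in_braces(self) -> None:
        self.assertEqual(measure_set(["[Measures].[Sales]"]), "{[Measures].[Sales]}")

    def test_measures_are_comma_separated(self) -> None:
        self.assertEqual(measure_set(["[Measures].[A]", "[Measures].[B]"]),
                         "{[Measures].[A], [Measures].[B]}")


# Step 2 (Green): the simplest implementation that makes both tests pass.
# Step 3 (Refactor): later clean-ups must keep these tests green.
def measure_set(measures: List[str]) -> str:
    return "{" + ", ".join(measures) + "}"
```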

The production code and test suite grow gradually as development continues, and the test suite acts as documentation that does not go out of date. It also acts as a regression safety net when bugs are found, since the developer must fix each bug without breaking any previously written tests. A further benefit of test-driven development is better design: loosely coupled, easily maintainable code that ultimately yields high-quality, polished components (70). A set of tests created by applying the test-driven development cycle to the OLAP MDX Query Builder (Section 5.9) can be seen in Figure 42.

Figure 42 - TDD Created Test Extract

6.3 Evaluation

The visualisations of the project are evaluated in two ways. The first compares their functionality in terms of the OLAP operations (Appendix B), dimensions and measures they support. The second assesses their performance in terms of the algorithms, queries and rendering involved.

6.3.1 OLAP Operations Comparison

The comparison below (Table 1) shows that the visualisations provide the base functionality needed to manipulate OLAP data. Additional operations exist, as described by Mansmann et al. (8), but this project does not implement them, since the priority was to establish a base that could be extended in future according to users' additional needs.

The 3D Cube Visualizer provides functionality similar to that described for the VR4OLAP application (Section 2.1.3), and supports the same number of dimensions and measures.

| Visualisation | Dimensions | Measures | Drill-Down | Drill-Up | Drill-Through | Slice | Dice | Pivot |
|---|---|---|---|---|---|---|---|---|
| 2D TreeMap | 1 | 2 | | | | | | |
| 2D TreeMap w/ Categorical Time-Series Tool | 1 + 1* | 2 | | | | | | |
| 3D Cube Visualizer | 3 | 1 or 2 | | | | | | |

* The additional dimension can only be of a time dimension type.

Table 1 - OLAP operation visualisation capabilities

6.3.2 Performance Profiling

The performance of the visualisations was profiled using the ANTS Performance Profiler from Red Gate Software (71), which provides the tools needed to pinpoint performance issues by reporting execution time and hit count directly against the source code. A performance profile was created for the 3D Cube Visualizer, since the tool had been shown to display large amounts of data at once. This made it interesting to observe how long some methods took to execute and how large its memory footprint was.
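The two quantities the profiler reports, hit count and execution time, and the way a share such as "60% of the total time" is derived from them can be illustrated with a minimal Python decorator. This sketch is not ANTS and measures only wall-clock time. The class and method names are hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, DefaultDict


class SimpleProfiler:
    """Records hit count and cumulative wall-clock time per decorated function."""

    def __init__(self) -> None:
        self.hits: DefaultDict[str, int] = defaultdict(int)
        self.seconds: DefaultDict[str, float] = defaultdict(float)

    def track(self, func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.hits[func.__name__] += 1
                self.seconds[func.__name__] += time.perf_counter() - start
        return wrapper

    def share_of(self, inner: str, outer: str) -> float:
        """Fraction of `outer`'s total time spent inside `inner`."""
        total = self.seconds[outer]
        return self.seconds[inner] / total if total else 0.0
```

Decorating both an outer method (such as a MapAction-like routine) and an inner notification call would give the ratio `share_of("notify", "map_action")`, the same kind of percentage reported for Profiles 6 and 7.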

3D Cube Visualizer – the main performance-critical method of this component was narrowed down to MapAction (Section 5.12). MapAction calls the methods that retrieve, populate and render the data on the display, and therefore takes the most CPU time. Profiles are compared for an increasing number of members shown per level on each axis (Table 2), to demonstrate the visualiser's capabilities with larger amounts of data. The number of members actually populated from the data was much smaller than the maximum possible. For the purpose of this profiling exercise, dimensions with multiple levels were chosen.

Table 2 - Performance profile specification. X, Y, Z dimension member amounts.

CPU Timing Profile

A timing profile of the main methods being executed is provided below (Figure 43).

Figure 43 - Main functions performance time profile

Note: MapAction contains PopulateMemberLabels and PopulateData; PopulateData contains QueryExecution and Create3DDataObjects; Create3DDataObjects contains the 3DObjectUINotification call.

The figure shows an analysis of the MapAction function and of the functions within it that implement the main algorithm. The decrease in execution time between Profile 1 and Profile 2 is due to objects being cached after they are first retrieved from the database. The operation that raises the UI notification (within the Create3DDataObjects method) takes the most time to execute, and its share of the total time grows with each increase in data: 60% in Profile 6 and 74% in Profile 7. A closer look at the other methods shows that similar UI notification events for other operations, e.g. the member labels, take up a similar percentage of the total time.

Memory Usage Profile

A memory usage profile of the three main visual objects displayed on screen is shown below (Figure 44).

Figure 44 - Main 3D element memory usage profile

The two main 3D elements whose memory consumption grows with the data are the UIBoxElement3D, used to display the data members in the grid (Appendix C), and the UISphereElement3D, used to display all filter spheres along the grid edges. The results show that only 7.3 MB are used to display 33,811 data objects, which suggests that the visualiser should scale well to considerably larger data sets.
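To put the memory figure in perspective, it can be divided by the number of objects to give an average cost per object, assuming memory grows roughly linearly with the object count. The result depends on whether the profiler's "MB" means 10^6 or 2^20 bytes, so both readings are shown.

```latex
\frac{7.3 \times 10^{6}\ \text{bytes}}{33\,811} \approx 216\ \text{bytes per object}
\qquad\text{or}\qquad
\frac{7.3 \times 2^{20}\ \text{bytes}}{33\,811} \approx 226\ \text{bytes per object}
```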

7. Conclusion

7.1 Project Achievements

This project has built on ideas and techniques found in leading commercial Business Intelligence products and on some of the latest academic research. A viable, polished, end-to-end application was created that could serve Business Analysts' need to visualise complex multidimensional data easily and efficiently. The project has been a success: the initial aims were met, and additional work was done (the 3D visualisation, Section 5.12). This work both experimented with new research ideas on visualisation techniques (9) and, after some redesign of the system, added the capability of handling large amounts of data (Section 6.3).

In terms of personal achievements, a great deal of knowledge was gained, with potential for practical use in future. Coming from a year in industry working in the Business Intelligence field, this project provided the perfect opportunity to delve deeper into the typical challenges faced in the field and to work on solving some of them. Technically, methodologies and technologies in which experience was lacking (Sections 3, 4, 5) were researched and used successfully to implement the project: modular development, visualisation techniques and technologies, XAML UI creation, control templatability and so on.

7.2 Future Development

The end-to-end design and modular nature of the application provide a good starting point for improving its design and functionality in various ways, which could bring it to a level comparable with some of the commercial products on the market.

7.2.1 Multiple OLAP Data Sources

The current system works on top of the Microsoft Analysis Services OLAP server, but to become a more robust solution it needs to support additional OLAP data sources. Based on the research in Section 2.3, a variety of mid-tier servers exist. A further question is whether offline sources, in the form of offline OLAP cubes, and connections to ordinary relational databases should also be supported; this is discussed in Section 2.4.

In terms of technical implementation, the MDX query language is widely supported across OLAP server solutions, so the query mechanism should remain the same if other OLAP sources are added. Additional data provider adapters will, however, need to be created to handle the creation of the OLAP metadata objects on which the Cube Browser (Section 5.7) is based.

7.2.2 Additional 2D Graphics

The project provides a harness for adding further types of graph. This improvement will depend on the graphing libraries available for .NET, or on re-engineering the system to use cross-language toolkits with more powerful graphing capabilities. One possible .NET solution, researched for use in the project (Section 3.2.2), is the Modern UI (Metro) Charts for Windows 8, WPF, Silverlight library (46), which provides fluid visualisations along with robust templatability and functionality.

7.2.3 Algorithm Optimization

The data population algorithms, particularly for the 2D TreeMap graph, pull all the data of an OLAP hierarchy at once and populate a data structure for the graph to visualise (Section 5.10). This becomes a problem as the amount of data shown on screen increases. A more segmented, just-in-time approach is needed: querying the server and generating objects only when they are needed, with a more sophisticated caching mechanism for these objects, or even creating them in multiple background threads.

7.2.4 Library Inefficiencies

Based on the observations made in Section 6.3, future work is needed to reduce the cost of raising the UI notification event. This event is part of the WPF framework itself, so further research into possible workarounds is needed before larger amounts of data can be displayed.

The library used for the 2D TreeMap graph visualisation, WPF Treemaps & SquarifiedTreeMaps Control (47), has inefficiencies in its design and some bugs in the implementation of its components. These were not overlooked during the project, but they were not serious enough to justify spending time refactoring and fixing them. Both the recursive squarified map algorithm and the rendering performance of the library could be improved considerably. Rendering errors also occur: for example, rendering an element whose size is too small throws a propagating StackOverflowException (72). Another issue related to the recursive nature of the algorithm arises when visualising a large number of members in a single OLAP level, where the calculations again run out of memory.

7.2.5 3D Visualisation Functionality Expansion

The experimental 3D visualisation (Section 5.12) provides a good starting point for adding functionality related to OLAP operations not covered in Appendix B. The efficiency of the 3D library used also makes it possible to increase the number of dimensions and measures shown beyond the base model of 3 dimensions and 2 measures implemented in the project. This would move into unexplored territory: to the author's knowledge, no such OLAP visualisation solutions existed at the time of writing.

Appendix

Appendix A: Term Glossary

A.1 OLAP Terms

OLAP - OnLine Analytical Processing evolved as a powerful spreadsheet tool offering flexible navigation of complex data. The data always involves multiple dimensions with multiple levels (4).

Interactive OLAP - browsing is done by human analysts, who can ask questions and receive immediate answers. Data summaries are pre-calculated ahead of time to increase browsing speed.

Cube – the primary OLAP structure for viewing data, similar to a table in a relational database. Although the name implies three dimensions, a cube can have any number n of dimensions.

Dimension – identifies and categorises the data, providing a perspective from which to look at it, e.g. Product, Time, Customer Age, Employee.

Hierarchy – dimensions contain hierarchies, and hierarchies contain levels. Hierarchies organise data according to levels. A dimension may contain multiple hierarchies in which the same data is organised in different ways, e.g. Calendar Year and Fiscal Year.

Level – contained in a hierarchy; structures the data into logical summarised steps, e.g. Year is the top level, followed by Quarter as the second level.

Member – part of a level and the most detailed element of the metadata structure, e.g. a specific date such as “1 January, 2014”.

Named Set - collection or group of members organized by a certain condition.

Measures – also known as facts; the numbers and values seen in an OLAP spreadsheet.

Measure Group – a combination of related measures. These measures are also normally related to a set of specific dimensions for which data exists.

MDX – Multidimensional Expressions, the query language for OLAP. It is similar to SQL in appearance but serves a different purpose: querying and browsing through data. Unlike SQL, it is designed for querying rather than for modifying data.
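To make the relationships between these terms concrete, the sketch below models them as plain Python data classes: a cube holds dimensions and measures, a dimension holds one or more hierarchies, a hierarchy holds ordered levels, and a level holds members. This is an illustrative assumption, not the data model used by OLAPVisi or by any OLAP server; all class names, members and figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    name: str
    members: list[str]  # e.g. "2014" at Year level, "Q1 2014" at Quarter level


@dataclass
class Hierarchy:
    name: str
    levels: list[Level]  # ordered from the top (most summarised) level down


@dataclass
class Dimension:
    name: str
    hierarchies: list[Hierarchy]  # the same data may be organised in several ways


@dataclass
class Cube:
    dimensions: list[Dimension]
    measures: list[str]
    # Each cell is addressed by one member per dimension and holds one value per measure.
    cells: dict[tuple[str, ...], dict[str, float]] = field(default_factory=dict)

    def value(self, coordinates, measure):
        """Return the measure value at the given cell, or None if the cell is empty."""
        return self.cells.get(tuple(coordinates), {}).get(measure)


# Hypothetical example data, loosely following the glossary entries above.
time = Dimension("Time", [
    Hierarchy("Calendar Year", [
        Level("Year", ["2014"]),
        Level("Quarter", ["Q1 2014", "Q2 2014"]),
    ]),
    Hierarchy("Fiscal Year", [Level("Fiscal Year", ["FY 2014"])]),
])
product = Dimension("Product", [
    Hierarchy("Category", [Level("Category", ["Bikes", "Accessories"])]),
])

cube = Cube(
    dimensions=[time, product],
    measures=["Sales Amount", "Order Quantity"],
    cells={
        ("Q1 2014", "Bikes"): {"Sales Amount": 1200.0, "Order Quantity": 3},
        ("Q1 2014", "Accessories"): {"Sales Amount": 80.0, "Order Quantity": 10},
    },
)
```

The two hierarchies of the Time dimension organise the same dates differently, mirroring the Calendar Year/Fiscal Year example in the Hierarchy entry.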

A.2 Prism Framework Terms

Modules – independent packages that represent business-related functionality, encapsulating all components needed for it. These can be developed, tested and deployed independently of one another thanks to the Modularity principle (Section 4.2).

Module Catalog – specifies the modules to be loaded, when they should be loaded, where they are located and in what order they should be loaded.

Shell – the top-level window, which hosts content contributed by modules. It defines the overall layout of the application but is unaware of the exact modules it hosts.

Commands – encapsulate application functionality, separating it from the UI and allowing it to be tested in isolation.

Regions – UI placeholders for views. They allow the UI to be updated flexibly, without changes to application logic.

Navigation – the change in the UI display that results from a user’s interaction with the application.

Dependency injection container – injects the services and other dependencies that modules require, based on the Inversion of Control principle (Section 4.3).

Services – encapsulate non-UI functionality for use as cross-cutting concerns throughout the application. Typical examples of services are logging, exception management and data access.

Bootstrapper – performs initialization, displays the shell, creates the module catalog and loads the modules.

Multi-targeting – targeting multiple technologies at once, so that code can be reused with ease; e.g. WPF and Silverlight can share a similar code base.
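The following Python sketch illustrates how several of these Prism terms fit together: a bootstrapper creates a dependency injection container and a module catalog, then initialises each module in dependency order, with the container supplying a shared logging service. It is a hypothetical illustration of the concepts only; Prism itself is a .NET library with its own API, and none of the class or method names below (including the module names) are taken from it or from the project.

```python
from graphlib import TopologicalSorter


class Container:
    """Minimal dependency injection container: each service is created once, on first use."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, key, factory):
        self._factories[key] = factory

    def resolve(self, key):
        if key not in self._instances:
            # The factory receives the container so it can resolve its own dependencies.
            self._instances[key] = self._factories[key](self)
        return self._instances[key]


class Logger:
    """A cross-cutting service shared by all modules."""

    def __init__(self):
        self.lines = []

    def log(self, message):
        self.lines.append(message)


class Module:
    """Base class for hypothetical modules; `depends_on` names modules to load first."""

    name = ""
    depends_on = ()

    def initialize(self, container):
        # The logger is injected by the container rather than constructed here.
        container.resolve("logger").log(f"{self.name} initialised")


class CubeBrowserModule(Module):
    name = "CubeBrowser"


class TreeMapModule(Module):
    name = "TreeMap"
    depends_on = ("CubeBrowser",)


class ModuleCatalog:
    """Records which modules exist and derives a load order from their dependencies."""

    def __init__(self, modules):
        self.modules = {module.name: module for module in modules}

    def load_order(self):
        graph = {name: set(m.depends_on) for name, m in self.modules.items()}
        return list(TopologicalSorter(graph).static_order())


def bootstrap():
    """Create the container, register services, build the catalog and load modules."""
    container = Container()
    container.register("logger", lambda c: Logger())
    catalog = ModuleCatalog([TreeMapModule(), CubeBrowserModule()])
    for name in catalog.load_order():
        catalog.modules[name].initialize(container)
    return container.resolve("logger").lines
```

Although TreeMapModule is listed first, the catalog loads CubeBrowser before it because of the declared dependency.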

Appendix B: OLAP Operations

When working with and analysing OLAP data, a business intelligence tool needs to implement a certain set of OLAP operations. These operations are normally used to evaluate the capabilities of a specific visualization: Pivot, Drill Down / Drill Up, Slice and Dice.

B.1 Pivot

With the Pivot operation, an analyst can rotate the cube to see its various faces. The displayed data does not change, only the way in which it is displayed (Figure 45).

Figure 45 - Pivot OLAP Operation
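As a minimal sketch of the idea, assuming a small hypothetical set of sales facts held as Python dictionaries, a pivot can be expressed as re-aggregating the same facts with the row and column dimensions swapped. Every value stays the same; only its position in the table moves.

```python
from collections import defaultdict

# Hypothetical facts: one dictionary per cube cell, keyed by dimension name.
FACTS = [
    {"Product": "Bikes", "Year": "2013", "Sales": 100.0},
    {"Product": "Bikes", "Year": "2014", "Sales": 150.0},
    {"Product": "Accessories", "Year": "2013", "Sales": 20.0},
    {"Product": "Accessories", "Year": "2014", "Sales": 35.0},
]


def cross_tab(facts, row_dim, col_dim, measure):
    """Build a 2-D table {row member: {column member: summed measure}}."""
    table = defaultdict(lambda: defaultdict(float))
    for fact in facts:
        table[fact[row_dim]][fact[col_dim]] += fact[measure]
    return {row: dict(cols) for row, cols in table.items()}


by_product = cross_tab(FACTS, "Product", "Year", "Sales")
by_year = cross_tab(FACTS, "Year", "Product", "Sales")  # the pivot: axes swapped
```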

B.2 Drill Down / Drill Up

These operations allow an analyst to navigate between the levels of hierarchical data. Figure 46 shows a drill down operation, moving from left to right, i.e. from a higher, more summarised level to a lower, more detailed one. The opposite operation is a drill up, moving from a lower to a higher level of the data.

Figure 46 - Drill Down/Drill Up OLAP Operation
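The sketch below shows one way to express drill down and drill up, assuming a hypothetical Calendar hierarchy with Year, Quarter and Month levels and made-up sales figures. The current position is kept as a path of selected members: drilling down appends a member to the path, drilling up removes the last one, and the view always aggregates the facts at the level directly below the path.

```python
from collections import defaultdict

# Hypothetical Calendar hierarchy, ordered from the top level down.
LEVELS = ["Year", "Quarter", "Month"]

# Hypothetical month-level facts.
FACTS = [
    {"Year": "2014", "Quarter": "Q1", "Month": "Jan", "Sales": 10.0},
    {"Year": "2014", "Quarter": "Q1", "Month": "Feb", "Sales": 20.0},
    {"Year": "2014", "Quarter": "Q2", "Month": "Apr", "Sales": 5.0},
    {"Year": "2013", "Quarter": "Q4", "Month": "Dec", "Sales": 7.0},
]


def view(facts, path, measure="Sales"):
    """Sum `measure` per member of the level directly below `path`.

    `path` holds the selected members from the top level down, e.g. ("2014",)
    shows the quarters of 2014 and () shows all years.
    """
    level = LEVELS[len(path)]
    totals = defaultdict(float)
    for fact in facts:
        if all(fact[LEVELS[i]] == member for i, member in enumerate(path)):
            totals[fact[level]] += fact[measure]
    return dict(totals)


def drill_down(path, member):
    """Move one level lower by selecting `member` at the currently shown level."""
    if len(path) + 1 >= len(LEVELS):
        raise ValueError("already showing the lowest level")
    return path + (member,)


def drill_up(path):
    """Move one level higher by forgetting the most recently selected member."""
    return path[:-1]


top = view(FACTS, ())                           # totals per Year
quarters = view(FACTS, drill_down((), "2014"))  # totals per Quarter of 2014
```

Because every level is computed from the same facts, the quarter totals for 2014 add up to the 2014 total shown one level higher.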

B.3 Slice

With the Slice operation, an analyst selects a subset of the data by fixing one dimension to a single member, thus creating a new cube with fewer dimensions (Figure 47).

Figure 47 - Slice OLAP Operation
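A minimal sketch of slicing, assuming facts stored as hypothetical Python dictionaries: fixing the Year dimension to a single member keeps only the matching facts and drops Year from the result, leaving a cube with one dimension fewer.

```python
# Hypothetical facts: one dictionary per cube cell, keyed by dimension name.
FACTS = [
    {"Product": "Bikes", "Region": "Europe", "Year": "2014", "Sales": 100.0},
    {"Product": "Bikes", "Region": "Pacific", "Year": "2014", "Sales": 80.0},
    {"Product": "Accessories", "Region": "Europe", "Year": "2013", "Sales": 15.0},
]


def slice_cube(facts, dimension, member):
    """Keep the facts whose `dimension` equals `member`, then drop that dimension."""
    return [
        {key: value for key, value in fact.items() if key != dimension}
        for fact in facts
        if fact[dimension] == member
    ]


sales_2014 = slice_cube(FACTS, "Year", "2014")
```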

B.4 Dice

With the Dice operation, which is similar to Slice, an analyst selects a set of members on each of two or more dimensions, creating a new, smaller sub-cube (Figure 48).

Figure 48 - Dice OLAP Operation
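A sketch of dicing, again assuming hypothetical facts held as Python dictionaries and following the common textbook formulation in which a set of allowed members is given for each of two or more dimensions. In this formulation every dimension is kept in the result and only its members are narrowed.

```python
# Hypothetical facts: one dictionary per cube cell, keyed by dimension name.
FACTS = [
    {"Product": "Bikes", "Region": "Europe", "Year": "2014", "Sales": 100.0},
    {"Product": "Bikes", "Region": "North America", "Year": "2014", "Sales": 90.0},
    {"Product": "Accessories", "Region": "Pacific", "Year": "2013", "Sales": 15.0},
    {"Product": "Clothing", "Region": "Europe", "Year": "2012", "Sales": 30.0},
]


def dice(facts, selection):
    """Keep facts whose member on every selected dimension is in its allowed set.

    `selection` maps a dimension name to the set of members to keep; dimensions
    not mentioned are left unrestricted, and no dimension is removed.
    """
    return [
        fact
        for fact in facts
        if all(fact[dim] in allowed for dim, allowed in selection.items())
    ]


subcube = dice(FACTS, {"Region": {"Europe", "Pacific"}, "Year": {"2013", "2014"}})
```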

B.5 Drill-Through

Operation used to retrieve the actual fact data behind the aggregates.
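The difference between an aggregate and the facts behind it can be sketched as follows, assuming hypothetical order-level fact rows: the aggregate sums Sales per (Product, Year) cell, and drill-through returns the individual rows that contributed to one chosen cell.

```python
from collections import defaultdict

# Hypothetical order-level fact rows, the most detailed data held for the cube.
FACT_ROWS = [
    {"OrderId": 1, "Product": "Bikes", "Year": "2014", "Sales": 100.0},
    {"OrderId": 2, "Product": "Bikes", "Year": "2014", "Sales": 50.0},
    {"OrderId": 3, "Product": "Accessories", "Year": "2014", "Sales": 12.0},
]
DIMS = ("Product", "Year")


def aggregate(rows, dims, measure):
    """Sum `measure` per cell, where a cell is one member per dimension in `dims`."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def drill_through(rows, dims, cell):
    """Return the fact rows that were summed into the aggregate for `cell`."""
    return [row for row in rows if tuple(row[d] for d in dims) == cell]


cube = aggregate(FACT_ROWS, DIMS, "Sales")
detail = drill_through(FACT_ROWS, DIMS, ("Bikes", "2014"))
```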

Appendix C: OLAPVisi Step-Through Example

Figure 49 - Initial application screen. The first steps are to select a database and then a cube from that database to explore.

Figure 50 - Prompt for cube selection.

Figure 51 - The user can explore the cube browser, selecting specific hierarchies for visualisation.

Figure 52 - Selected choices populate a slide-out menu, indicating the type of each OLAP element and, for hierarchies, the number of levels they contain.

Figure 53 - Clicking the middle Add button presents a menu showing the available graph types.

Figure 54 - An initial view of the TreeMap graph visualisation. The user fills in the data specifics and uses the Plot button to render the visualisation.

Figure 55 - All additional controls are retractable and can be hidden away. Elements can be hovered over and selected. Drilling down displays data at a lower level of the hierarchy.

Figure 56 - Slicing into the data maximises the selected element. The context is saved and displayed to the user as a breadcrumb at the top of the graph.

Figure 57 - After a slice, moving up in levels is done via a triangle button in the top right corner.

Figure 58 - The categorical time-series analyser tool, located at the bottom of the graph. A specific date level is selected for exploration. The user can slide through the members of the level, which automatically filters the data and re-renders the graph above it.

Figure 59 - Initial 3D view of the data. The user has selected the hierarchies and measures they want to visualise and has clicked on the Map button.

Figure 60 - Selecting a 3D element displays a legend with additional information about the member. The member is highlighted on screen by greying out all other elements.

Figure 61 - Selecting a corner-level sphere displays an additional prompt with information about the axis. It provides the OLAP functionality needed to explore the data further.

Figure 62 – Slicing on a specific member.

Figure 63 - Drilling down into a specific member (Accessories) from the previous step.

Figure 64 - Displaying a large number of elements for the lower levels of the data is not a problem for the tool. This data can be examined more thoroughly using the slice functionality provided.

Bibliography

1. Varshney, K. R. Introduction to Business Analytics. [Online] 2012. http://informationashvins.files.wordpress.com/2012/04/varshney_icassp2012.pdf.

2. Boukraâ, D., Boussaïd, O. and Bentayeb, F. OLAP Operators for Complex Object Data Cubes. Advances in Databases and Information Systems, Lecture Notes in Computer Science Vol. 6295. 2011, pp. 103-116.

3. Han, J., Kamber, M. and Pei, J. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2012.

4. Singh, K. and Bhasin, S. Constructing the OLAP Cube from Relational Databases/Flat Files. International Journal of Computer Trends and Technology. 2011, Vol. 2, pp. 167-177.

5. Dang, Luan Quang. A functional framework for evaluating visualization applications, with a focus on financial analysis problems. Master's thesis, 2012.

6. Mansmann, S., Mansmann, F., Scholl, M. and Keim, D. Hierarchy-driven Visual Exploration of Multidimensional Data Cubes. Paper presented at the meeting of the BTW, 2007.

7. Mansmann, S. and Scholl, M. Exploring OLAP Aggregates with Hierarchical Visualization Techniques. ACM SAC 2007: Proc. of the 22nd Annual ACM Symposium on Applied Computing, 2007.

8. —. Visual OLAP: A new paradigm for exploring multidimensional aggregates. In Proc. of IADIS Int'l Conf. on Computer Graphics and Visualization, 2008.

9. Lafon, S., Bouali, F., Guinot, C. and Venturini, G. 3D and Immersive Interfaces for Business Intelligence: The Case of OLAP. 17th International Conference on Information Visualisation (IV), 16-18 July 2013.

10. Tibco Spotfire. [Online] http://spotfire.tibco.com/.

11. Tableau Software. [Online] http://www.tableausoftware.com/.

12. Telerik RadControls. [Online] http://www.telerik.com/products/wpf/overview.aspx.

13. Ranet OLAP. [Online] Galantis. http://www.galantis.com/ranet/.

14. SharpShooter OLAP. [Online] Perpetuum Software. http://www.perpetuumsoft.com/Product.aspx?lang=en&pid=32&tid=features.

15. RadarCube Windows Forms. [Online] RadarSoft.

16. ComponentOne OLAP for WinForms. [Online] Component One. https://www.componentone.com/SuperProducts/OLAPWinForms/.

17. Stolte, C., Tang, D. and Hanrahan, P. Multiscale Visualization Using Data Cubes. IEEE Trans. Visualization and Computer Graphics, vol. 9, no. 2, 2003, pp. 176-187.

18. Walter, J. et al. Interactive Visualization and Navigation in Large Data Collections Using the Hyperbolic Space. Proc. 3rd IEEE International Conference on Data Mining, IEEE CS, 2003, pp. 355-365.

19. Techapichetvanich, K. and Datta, A. Interactive Visualization for OLAP. Proc. International Conference on Computational Science and its Applications, LNCS 3842, Springer, 2005, pp. 293-304.

20. Piringer, H. and Buchetics, M. Hierarchical Difference Scatterplots: Interactive Visual Analysis of Data Cubes. ACM SIGKDD Explorations Newsletter, vol. 11, no. 2, 2009, pp. 49-58.

21. GapMinder. [Online] http://www.gapminder.org/.

22. Taylor, J., StatSlice Systems. Tableau Dashboards Sourced from OLAP vs. RDB: An Analysis. 2013.

23. Chaudhuri, S., Dayal, U. and Narasayya, V. An overview of business intelligence technology. Comms. ACM. 2011, pp. 88-98.

24. Urbanek, Stefan. Cubes - OLAP Framework Documentation. [Online] 2013. [Cited: 11 10 2013.] http://databrewery.org/cubes/doc/backends/sql.html.

25. Comparison of OLAP Servers. Wikipedia. [Online] [Cited: 11 10 2013.] http://en.wikipedia.org/wiki/Comparison_of_OLAP_Servers.

26. icCube Licensing Comparison. icCube. [Online] [Cited: 11 10 2013.] http://www.iccube.com/purchase/edition-comparison.

27. Mansmann, S., Rehman, N., Weiler, A. and Scholl, M. Discovering OLAP dimensions in semi-structured data. In Proceedings of the fifteenth international workshop on Data warehousing and OLAP (DOLAP '12), 2012.

28. Cuzzocrea, A., Saccà, D. and Ullman, J. Big Data: A Research Agenda. In Proceedings of the 17th International Database Engineering & Applications Symposium (IDEAS '13), 2013.

29. McAfee, A. and Brynjolfsson, E. Big Data: The Management Revolution. Harvard Business Review. [Online] http://hbr.org/2012/10/big-data-the-management-revolution/ar.

30. Provost, F. and Fawcett, T. Data science and its relationship to big data & data-driven decision making. O'Reilly Media, 2013.

31. Intel. Big Data Visualization: Turning Big Data Into Big Insights. 2013.

32. Agrawal, D., Das, S. and Abbadi, A. E. Big data and cloud computing: current state and future opportunities. 2011.

33. Sitaridi, E. and Ross, A. Ameliorating memory contention of OLAP operators on GPU processors. 2012.

34. Cuzzocrea, A., Saccà, D. and Serafino, P. A hierarchy-driven compression technique for advanced OLAP visualization of multidimensional data cubes. 2006.

35. —. Semantics-aware advanced OLAP visualization of multidimensional data cubes. 2007.

36. VMware. [Online] http://www.vmware.com/uk/products/workstation/.

37. Git. [Online] http://git-scm.com/.

38. Atlassian BitBucket. [Online] https://www.atlassian.com/software/bitbucket/overview.

39. Git Source Control Provider. [Online] http://visualstudiogallery.msdn.microsoft.com/63a7e40d-4d71-4fbb-a23b-d262124b8f4c.

40. TortoiseGit. [Online] https://code.google.com/p/tortoisegit/.

41. TeamCity. [Online] http://www.jetbrains.com/teamcity/.

42. YouTrack. [Online] http://www.jetbrains.com/youtrack/.

43. Windows Presentation Foundation. [Online] http://msdn.microsoft.com/en-us/library/ms754130(v=vs.110).aspx.

44. Microsoft PRISM. [Online] http://compositewpf.codeplex.com/.

45. OxyPlot. [Online] http://oxyplot.codeplex.com/.

46. Modern UI (Metro) Charts for Windows 8, WPF, Silverlight. [Online] http://modernuicharts.codeplex.com/.

47. WPF Treemaps & SquarifiedTreeMaps Control. [Online] http://treemaps.codeplex.com/.

48. Johnson, B. and Shneiderman, B. Treemaps: a space-filling approach to the visualization of hierarchical information structures. In Proc. of the 2nd International IEEE Visualization Conference, October 1991, pp. 284-291.

49. Shneiderman, B. Tree visualization with tree-maps: a 2d space-filling approach. ACM Transactions on Graphics. September 1992.

50. Bruls, Mark, Huizing, Kees and van Wijk, Jarke J. Squarified treemaps. Data Visualization 2000: Proc. Joint Eurographics and IEEE TCVG Symp. on Visualization. 2000.

51. Helix 3D Toolkit. [Online] http://helixtoolkit.codeplex.com/.

52. Adventure Works Data Samples. [Online] http://msftdbprodsamples.codeplex.com/.

53. Adventure Works SQL Server 2012 Samples. [Online] http://msftdbprodsamples.codeplex.com/releases/view/55330.

54. Larman, Craig. Applying UML and Patterns, Second Edition.

55. Shore, James and Warden, Shane. The Art of Agile Development. O'Reilly, 2008.

56. Humble, Jez and Farley, David. Continuous Delivery. Addison-Wesley, 2011.

57. Brumfield, B. et al. Developer's Guide to Microsoft Prism 4. [Online] 2011. http://msdn.microsoft.com/en-us/library/gg406140.aspx.

58. Shalloway, A. and Trott, J. Design Patterns Explained: A New Perspective on Object-Oriented Design. Addison-Wesley Professional, 2004, Chap. 1.

59. Gamma, E., Helm, R., Johnson, R. and Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

60. Fowler, Martin. Application Controller. [Online] http://martinfowler.com/eaaCatalog/applicationController.html.

61. —. Inversion of Control Containers and the Dependency Injection pattern. [Online] http://www.martinfowler.com/articles/injection.html.

62. —. Registry. [Online] http://martinfowler.com/eaaCatalog/registry.html.

63. Microsoft. Windows Forms Controls and Equivalent WPF Controls. [Online] http://msdn.microsoft.com/en-us/library/ms750559(v=vs.110).aspx.

64. —. HierarchicalDataTemplate Class. [Online] http://msdn.microsoft.com/en-us/library/system.windows.hierarchicaldatatemplate.aspx.

65. Leparmentier, Guillaume. Manipulating colors in .NET. Code Project. [Online] 2007. http://www.codeproject.com/Articles/19045/Manipulating-colors-in-NET-Part.

66. Adaptive Coloring for Syntax Highlighting. Qt Quarterly. [Online] http://doc.qt.digia.com/qq/qq26-adaptivecoloring.html.

67. NUnit. [Online] http://www.nunit.org/index.php?p=home.

68. Beck, Kent. Test-Driven Development: By Example. Addison Wesley - Vaseem, 2003.

69. Madeyski, Lech. Test-Driven Development - An Empirical Evaluation of Agile Practice. Springer, 2010.

70. Palermo, Jeffrey. Guidelines for Test-Driven Development. [Online] Microsoft. http://msdn.microsoft.com/en-us/library/aa730844(v=vs.80).aspx.

71. ANTS Performance Profiler. [Online] Redgate Software. http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/.

72. WPF Treemaps & SquarifiedTreeMaps Control Issues. [Online] http://treemaps.codeplex.com/workitem/5307.
