Cubes - Lightweight Python OLAP Framework

Cubes DocumentationRelease 0.10

Stefan Urbanek

December 07, 2012

CONTENTS

1 Introduction 31.1 Why cubes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2 Cube, Dimensions, Facts and Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Installation 72.1 Basic Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.2 Quick Start or Hello World! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.3 Customized Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Logical Model and Metadata 93.1 Logical Model description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93.2 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4 Logical to Physical Model Mapping 194.1 Implicit Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204.2 Dimension tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204.3 Database Schemas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214.4 Explicit Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214.5 Date Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224.6 Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234.7 Customization of the Implicit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.8 Mapping Process Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.9 Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.10 Aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5 Aggregation Browsing and Aggregations 295.1 ROLAP with SQL backend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295.2 Cell Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315.3 Hierarchies, levels and drilling-down . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325.4 Multiple Hierarchies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6 Creating Cubes 396.1 Relational Database (SQL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

7 Localization 417.1 Metadata Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427.2 Data Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437.3 Localized Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

8 OLAP Server 458.1 HTTP API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458.2 Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498.3 Running and Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

i

9 slicer - Command Line Tool 579.1 serve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579.2 model validate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589.3 model json . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599.4 model extract_locale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599.5 model translate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599.6 ddl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599.7 denormalize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

10 Reference 6110.1 Logical Model Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6110.2 Aggregation Browser Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6910.3 backends — Aggregation Browsing Backends . . . . . . . . . . . . . . . . . . . . . . . . . . 7810.4 HTTP WSGI OLAP Server Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8310.5 Utility functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

11 Developing Cubes 8511.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8511.2 New or changed feature checklist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

12 Development Notes 8712.1 Fact Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

13 Contact and Getting Help 89

14 License 91

15 Indices and tables 93

Python Module Index 95

Index 97

ii

Cubes Documentation, Release 0.10

Cubes is a light-weight Python framework and set of tools for Online Analytical Processing (OLAP), multidimen-sional analysis and browsing of aggregated data. It is part of Data Brewery.

Contents:

CONTENTS 1

http://databrewery.org/


2 CONTENTS

CHAPTER

ONE

INTRODUCTION

1.1 Why cubes?

Focus on data analysis, in human way

Purpose is to provide a framework for giving analyst or any application end-user understandable and natural wayof presenting the multidimensional data. One of the main features is the logical model, which serves as abstractionover physical data to provide end-user layer.

It is meant to be used by application builders that want to provide analytical functionality.

Features:

• logical view of analysed data - how analysts look at data, how they think of data, not not how the data arephysically implemented in the data stores

• hierarchical dimensions (attributes that have hierarchical dependencies, such as category-subcategory orcountry-region)

• localizable metadata and data (see Localization)

• OLAP and aggregated browsing (default backend is for relational databse - ROLAP)

• multidimensional analysis

1.2 Cube, Dimensions, Facts and Measures

The framework thiks of the data as a cube with multiple dimensions:

The most detailed unit of the data is a fact. Fact can be a contract, invoice, spending, task, etc. Each fact mighthave a measure – an attribute that can be measured, such as: price, amount, revenue, duration, tax, discount, etc.

The dimension provides context for facts. Is used to:

• filter queries or reporst

• controls scope of aggregation of facts

• used for ordering or sorting

• defines master-detail relationship

Dimension can have multiple hierarchies, for example the date dimension might have year, month and day levelsin a hierarchy.

1.3 Architecture

The framework is composed of four modules and one command-line tool:

3


Figure 1.1: a data cube

• model - Description of data (metadata): dimensions, hierarchies, attributes, labels, localizations.

• browser - Aggregation browsing, slicing-and-dicing, drill-down.

• backends - Actual aggregation implementation and utility functions.

• server - WSGI HTTP server for Cubes

• slicer - Command Line Tool - command-line tool

Figure 1.2: framework modules

1.3.1 Model

Logical model describes the data from user’s or analyst’s perspective: data how they are being measured, aggre-gated and reported. Model is independent of physical implementation of data. This physical independence makesit easier to focus on data instead on ways of how to get the data in understandable form.

More information about logical model can be found in the chapter Logical Model and Metadata. See also pro-gramming reference of the model module.

4 Chapter 1. Introduction


1.3.2 Browser

Core of the Cubes analytics functionality is the aggregation browser. The browser module contains utility classesand functions for the browser to work.

More information about browser can be found in the chapter Aggregation Browsing and Aggregations. See alsoprogramming reference of the browser module.

1.3.3 Backends

Backends provide the actual data aggregation and browsing functionality. Cubes comes with built-in ROLAPbackend which uses SQL database through SQLAlchemy.

Framework has modular nature and supports multiple database backends, therefore different ways of cube com-putation and ways of browsing aggregated data.

See also programming reference of the backends module.

1.3.4 Server

Cubes comes with built-in WSGI HTTP OLAP server called slicer - Command Line Tool and provides json APIfor most of the cubes framework functionality. The server is based on the Werkzeug WSGI framework.

More information about the Slicer server requests can be found in the chapter OLAP Server. See also programmingreference of the server module.

1.3. Architecture 5

http://en.wikipedia.org/wiki/ROLAP

http://www.sqlalchemy.org/download.html


6 Chapter 1. Introduction

CHAPTER

TWO

INSTALLATION

There are two options how to install cubes: basic common installation - recommended mostly for users startingwith Cubes. Then there is customized installation with requirements explained.

2.1 Basic Installation

The cubes has optional requirements:

• SQLAlchemy for SQL database aggregation browsing backend

• Werkzeug for Slicer WSGI server

Note: If you never used Python before, you might have to get the pip installer first, if you do not have it already.

Note: The command-line tool Slicer does not require knowledge of Python. You do not need to know thelanguage if you just want to serve OLAP data.

For quick satisfaction of requirements install the packages:

pip install sqlalchemy werkzeug

Then install the Cubes:

pip install cubes

2.2 Quick Start or Hello World!

Download the sources from the Cubes Github repository. Go to the examples/hello_world folder:

git clone git://github.com/Stiivi/cubes.gitcd cubescd examples/hello_world

Prepare data and run the OLAP server:

python prepare_data.pyslicer serve slicer.ini

And try to do some queries:

curl "http://localhost:5000/aggregate"curl "http://localhost:5000/aggregate?drilldown=year"curl "http://localhost:5000/aggregate?drilldown=item"curl "http://localhost:5000/aggregate?drilldown=item&cut=item:e"

7


http://werkzeug.pocoo.org/

http://www.pip-installer.org/

https://github.com/Stiivi/cubes


2.3 Customized Installation

The project sources are stored in the Github repository.

Download from Github:

git clone git://github.com/Stiivi/cubes.git

The requirements for SQLAlchemy and Werkzeug are optional and you do not need them if you are going to useanother kind of backend.

Install:

cd cubespip install -r requirements-optional.txtpython setup.py install

8 Chapter 2. Installation

https://github.com/Stiivi/cubes



CHAPTER

THREE

LOGICAL MODEL AND METADATA

Logical model describes the data from user’s or analyst’s perspective: data how they are being measured, aggre-gated and reported. Model is independent of physical implementation of data. This physical independence makesit easier to focus on data instead on ways of how to get the data in understandable form. Objects and functionalityfor working with metadata are provided by the model package.

In short, logical model enables users to:

• refer to dimension attributes by name regardless of storage (which table)

• specify hierarchical dependencies of attributes, such as:

– product category > product > subcategory > product

– country > region > county > town.

• specify attribute labels to be displayed in end-user application

• for all localizations use the same attribute name, therefore write only one query for all report translations

Analysts or report writers do not have to know where name of an organisation or category is stored, nor he does nothave to care whether customer data is stored in single table or spread across multiple tables (customer, customertypes, ...). They just ask for customer.name or category.code.

In addition to abstraction over physical model, localization abstraction is included. When working in multi-lingualenvironment, only one version of report/query has to be written, locales can be switched as desired. If requesting“contract type name”, analyst just writes constract_type.name and Cubes framework takes care about appropriatelocalisation of the value.

Example: Analysts wants to report contract amounts by geography which has two levels: country level and regionlevel. In original physical database, the geography information is normalised and stored in two separate tables,one for countries and another for regions. Analyst does not have to know where the data are stored, he just queriesfor geography.country and/or geography.region and will get the proper data. How it is done is depicted on thefollowing image:

The logical model describes dimensions geography in which default hierarchy has two levels: country and region.Each level can have more attributes, such as code, name, population... In our example report we are interestedonly in geographical names, that is: country.name and region.name.

3.1 Logical Model description

The logical model can be either constructed programmatically or provided as JSON. The model entities and theirstructure are depicted on the following figure:

Load a model:

model = cubes.load_model(path)

The path might be:

• JSON file with a dictionary describing model

9


Figure 3.1: Mapping from logical model to physical data.

10 Chapter 3. Logical Model and Metadata


Figure 3.2: The logical model entities and relationships.

3.1. Logical Model description 11


• URL with a JSON dictionary

• a directory with logical model description files (model, cubes, dimensions) - note that this is the old way ofspecifying model and is being depreciated

Model can be represented also as a single json file containing all model objects.

The directory contains:

File Descriptionmodel.json Core model informationcube_*cube_name*.json Cube description, one file per cubedim_*dimension_name*.json Dimension description, one file per dimension

3.1.1 Model

The model dictionary contains main model description. The structure is:

{"name": "public_procurements","label": "Public Procurements of Slovakia","description": "Contracts of public procurement winners in Slovakia""cubes": [...]"dimensions": [...]

}

Key Descriptioncubes list of cube descriptionsdimensions list of dimension descriptionsname model name (optional)label human readable name - can be used in an application (optional)description longer human-readable description of the model (optional)

3.1.2 Cubes

Cube descriptions are stored as a dictionary for key cubes in the model description dictionary or in json fileswith prefix cube_ like cube_contracts, or

Key Descriptionname cube namemea-sures

list of cube measures (recommended, but might be empty for measure-less, record count onlycubes)

dimen-sions

list of cube dimension names (recommended, but might be empty for dimension-less cubes)

label human readable name - can be used in an applicationdetails list of fact details (as Attributes) - attributes that are not relevant to aggregation, but are

nice-to-have when displaying facts (might be separately stored)joins specification of physical table joins (required for star/snowflake schema)map-pings

mapping of logical attributes to physical attributes

options backend/workspace optionsinfo custom info, such as formatting. Not used by cubes framework.

Example:

{"name": "date","label": "Dátum","dimensions": [ "date", ... ]

"measures": [...],



"details": [...],

"fact": "fact_table_name","mappings": { ... },"joins": [ ... ]

}

For more information about mappings see Logical to Physical Model Mapping

3.1.3 Dimensions

Dimension descriptions are stored in model dictionary under the key dimensions.

Figure 3.3: Dimension description - attributes.

The dimension description contains keys:

3.1. Logical Model description 13


Key Descriptionname dimension name, used as identifierlabel human readable name - can be used in an applicationlevels list of level descriptionshierarchies list of dimension hierarchieshierarchy if dimension has only one hierarchy, you can specify it under this keydefault_hierarchy_name name of a hierarchy that will be used as defaultinfo custom info, such as formatting. Not used by cubes framework.

Example:

{"name": "date","label": "Dátum","levels": [ ... ]"attributes": [ ... ]"hierarchies": [ ... ]

}

Use either hierarchies or hierarchy, using both results in an error.

Hierarchy levels are described as:

Key Descriptionname level name, used as identifierlabel human readable name - can be used in an applicationat-tributes

list of other additional attributes that are related to the level. The attributes are not being used foraggregations, they provide additional useful information.

key key field of the level (customer number for customer level, region code for region level,year-month for month level). key will be used as a grouping field for aggregations. Key should beunique within level.

la-bel_attribute

name of attribute containing label to be displayed (customer name for customer level, region namefor region level, month name for month level)

info custom info, such as formatting. Not used by cubes framework.

Example of month level of date dimension:

{"month","label": "Mesiac","key": "month","label_attribute": "month_name","attributes": ["month", "month_name", "month_sname"]

},

Example of supplier level of supplier dimension:

{"name": "supplier","label": "Dodávatel’","key": "ico","label_attribute": "name","attributes": ["ico", "name", "address", "date_start", "date_end",

"legal_form", "ownership"]}

Hierarchies are described as:

Key Descriptionname hierarchy name, used as identifierla-bel

human readable name - can be used in an application

lev-els

ordered list of level names from top to bottom - from least detailed to most detailed (for example:from year to day, from country to city)



Example:

"hierarchies": [{

"name": "default","levels": ["year", "month"]

},{

"name": "ymd","levels": ["year", "month", "day"]

},{

"name": "yqmd","levels": ["year", "quarter", "month", "day"]

}]

3.1.4 Attributes

Measures and dimension level attributes can be specified either as rich metadata or just simply as strings. If onlystring is specified, then all attribute metadata will have default values, label will be equal to the attribute name.

Key Descriptionname attribute name (should be unique within a dimension)label human readable name - can be used in an application, localizableorder natural order of the attribute (optional), can be asc or desclocales list of locales in which the attribute values are available in (optional)aggregations list of aggregations to be performed if the attribute is a measureinfo custom info, such as formatting. Not used by cubes framework.

The optional order is used in aggregation browsing and reporting. If specified, then all queries will have resultssorted by this field in specified direction. Level hierarchy is used to order ordered attributes. Only one orderedattribute should be specified per dimension level, otherwise the behavior is unpredictable. This natural (or default)order can be later overridden in reports by explicitly specified another ordering direction or attribute. Explicitorder takes precedence before natural order.

For example, you might want to specify that all dates should be ordered by default:

"attributes" = [{"name" = "year", "order": "asc"}

]

Locales is a list of locale names. Say we have a CPV dimension (common procurement vocabulary - EU procure-ment subject hierarchy) and we are reporting in Slovak, English and Hungarian. The attributes will be thereforespecified as:

"attributes" = [{"name" = "group_code"},{"name" = "group_name", "order": "asc", "locales" = ["sk", "en", "hu"]}

]

group name is localized, but group code is not. Also you can see that the result will always be sorted by groupname alphabetical in ascending order. See PhysicalAttributeMappings for more information about how logicalattributes are mapped to the physical sources.

In reports you do not specify locale for each localized attribute, you specify locale for whole report or browsingsession. Report queries remain the same for all languages.

3.2 Model validation

To validate a model do:

3.2. Model validation 15


results = model.validate()

This will return a list of tuples (result, message) where result might be ‘warning’ or ‘error’. If validation containserrors, the model can not be used without resulting in failure. If there are warnings, some functionalities might ormight not fail or might not work as expected.

You can validate model from command line:

slicer model validate model.json

See also the slicer tool documentation for more information.

3.2.1 Errors

When any of the following validation errors occurs, then it is very probable that use of the model will result infailure.



Error Ob-ject

Resolution

Duplicate measure ‘measure‘ in cube‘cube‘

cube Two or more measures have the same name. Make sure thatall measure names are unique within the cube, includingdetail attributes.

Duplicate detail ‘detail‘ in cube ‘cube‘ cube Two or more detail attributes have the same name. Make surethat all detail attribute names are unique within the cube,including measures.

Duplicate detail ‘detail‘ in cube‘cube‘ - specified also as measure

cube A detail attribute has same name as one of the measures.Make sure that all detail attribute names are unique withinthe cube, including measures.

No hierarchies in dimension‘dimension‘, more than one levelsexist (count)”

di-men-sion

There is more than one level specified in the dimension, butno hierarchy is defined. Specify a hierarchy with expectedorder of the levels.

No defaut hierarchy specified, there ismore than one hierarchy in dimension‘dimension’

di-men-sion

Dimension has more than one hierarchy, but none of them isspecified as default. Set the default_hierarchy_name todesired default hierarchy.

Default hierarchy ‘hierarchy‘ does notexist in dimension ‘dimension‘

di-men-sion

There is no hierarchy in the dimension with name specifiedas default_hierarchy_name. Make sure that the defaulthierarchy name refers to existing hierarchy within thedimension.

Level ‘level‘ in dimension‘dimension‘ has no attributes

di-men-sion

There are no attributes specified for level. Set attributesduring Level obejct creation. This error should not appearwhen creating model from file.

Key ‘key‘ in level ‘level‘ in dimension‘dimension‘ is not in level’s attributelist

di-men-sion

Key should be one of the attributes specified for the level.Either add the key to the attribute list (preferrably at thebeginning) or choose another attribute as the level key.

Duplicate attribute ‘attribute‘ indimension ‘dimension‘ level ‘level‘(also defined in level ‘another_level‘)

di-men-sion

attribute is defined in two or more levels in the samedimension. Make sure that attribute names are all uniquewithin one dimension. Example of most common duplicatesare: id or name. Recommended fix is to use level prefix:country_id and country_name.

Dimension (dim1) of attribute ‘attr‘does not match with owningdimension dim2

di-men-sion

This might happen when creating model programatically.Make sure that attribute added to the dimension level hasproperely set dimension attribute to the dimension it is goingto be part of (dim2).

Dimension ‘dimension‘ is not instanceof Attribute

model When creating dimension programatically, make sure that allattributes added to the dimension level are instances ofcubes.Attribute. You should not see this error whenloading a model from a file.

Dimension ‘dimension‘ is not asubclass of Dimension class

model When creating model programatically, make sure that alldimensions you add to model are subclasses of Dimension.You should not see this error when loading a model from afile.

Measure ‘measure‘ in cube ‘cube‘ isnot instance of Attribute

cube When creating cube programatically, make sure that allmeasures you add to the cube are subclasses ofcubes.Attribute. You should not see this error whenloading a model from a file.

Detail ‘detail‘ in cube ‘cube‘ is notinstance of Attribute

cube When creating cube programatically, make sure that all detailattributes you add to the cube are subclasses ofcubes.Attribute. You should not see this error whenloading a model from a file.

The following list contains warning messages from validation process. It is not recommended to use the model,some issues might emerge.

Warning Object ResolutionNo cubes defined model Model should contain at least one cube

3.2. Model validation 17


The model construction uses some implicit defaults to satisfy needs for a working model. Validator identifieswhere the defaults are going to be applied and adds information about them to the validation results. Considerthem to be informative only. The model can be used, just make sure that defaults reflect expected reality.

Warning Ob-ject

Resolution

No hierarchies in dimension‘dimension‘, flat level ‘level‘ will beused.

di-men-sion

There are no hierarchies specified in the dimension andthere is only one level. Default hierarchy will be createdwith the only one level.

Level ‘level‘ in dimension ‘dim‘ has nokey attribute specified, first attribute willbe used: ‘attr‘

di-men-sion

Each level should have a key attribute specified. If it isnot, then the first attribute from attribute list will be usedas key.


CHAPTER

FOUR

LOGICAL TO PHYSICAL MODELMAPPING

One of the most important parts of proper OLAP on top of the relational database is the mapping of logicalattributes to their physical counterparts. In SQL database the physical attribute is stored in a column, whichbelongs to a table, which might be part of a database schema.

Note: StarBrowser (SQL) is used as an example.

Note: Despite this chapter describes examples mostly in the relational database backend, the principles are thesame, or very similar, in other backends as well.

For example, take a reference to an attribute name in a dimension product. What is the column of what table inwhich schema that contains the value of this dimension attribute?

19


For data browsing, the Cubes framework has to know where those logical (reported) attributes are physicallystored. It needs to know which tables are related to the cube and how they are joined together so we get wholeview of a fact.

The process is done in two steps:

1. joining relevant star/snowflake tables

2. mapping logical attribute to table + column

There are two ways how the mapping is being done: implicit and explicit. The simplest, straightforward and mostcustomizable is the explicit way, where the actual column reference is provided in a mapping dictionary of thecube description.

4.1 Implicit Mapping

With implicit mapping one can match a database schema with logical model and does not have to specify additionalmapping metadata. Expected structure is star schema with one table per (denormalized) dimension.

Fact table: Cubes looks for fact table with the same name as cube name. You might specify prefix for every facttable with fact_table_prefix. Example:

• Cube is named contracts, framework looks for a table named contracts.

• Cubes are named contracts, invoices and fact table prefix is fact_ then framework looks for tables namedfact_contracts and fact_invoices respectively.

4.2 Dimension tables

By default, dimension tables are expected to have same name as dimensions and dimension table columns areexpected to have same name as dimension attributes:

It is quite common practice that dimension tables have a prefix such as dim_ or dm_. Such prefix can be specifiedwith dimension_prefix option.

20 Chapter 4. Logical to Physical Model Mapping


Basic rules:

• fact table should have same name as represented cube

• dimension table should have same name as the represented dimension, for example: product (singular)

• column name should have same name as dimension attribute: name, code, description

• references without dimension name in them are expected to be in the fact table, for example: amount,discount (see note below for simple flat dimensions)

• if attribute is localized, then there should be one column per localization and should have locale suffix:description_en, description_sk, description_fr (see below for more information)

Flat dimension without details: What about dimensions that have only one attribute, like one would not havea full date but just a year? In this case it is kept in the fact table without need of separate dimension table.The attribute is treated in by the same rule as measure and is referenced by simple year. This is applied to alldimensions that have only one attribute (representing key as well). This dimension is referred to as flat andwithout details.

Note for advanced users: this behavior can be disabled by setting simplify_dimension_references toFalse in the mapper. In that case you will have to have separate table for the dimension attribute and you willhave to reference the attribute by full name. This might be useful when you know that your dimension will bemore detailed.

Note: In other than SQL backends, the implicit mapping might be implemented differently. Refer to the respectivebackend documentation to learn how the mapping is done.

4.3 Database Schemas

For databases that support schemas, such as PostgreSQL, option schema can be used to specify default databaseschema where all tables are going to be looked for.

In case you have dimensions stored in separate schema than fact table, you can specify that indimension_schema. All dimension tables are going to be searched in that schema.

4.4 Explicit Mapping

If the schema does not match expectations of cubes, it is possible to explicitly specify how logical attributes aregoing to be mapped to their physical tables and columns. Mapping dictionary is a dictionary of logical attributesas keys and physical attributes (columns, fields) as values. The logical attributes references look like:

• dimensions_name.attribute_name, for example: geography.country_name or category.code

• fact_attribute_name, for example: amount or discount

For example:

4.3. Database Schemas 21


"mappings": {"product.name": "dm_products.product_name"

}

If it is in different schema or any part of the reference contains a dot:

"mappings": {"product.name": {

"schema": "sales","table": "dm_products","column": "product_name"

}}

Both, explicit and implicit mappings have ability to specify default database schema (if you are using Oracle,PostgreSQL or any other DB which supports schemas).

The mapping process process is like this:

Note: In other than SQL backends, the value in the mapping dictionary can be interpreted differently. The(schema, table, column) tuple is used as an example from SQL browser.

4.5 Date Data Type

Date datatype column can be turned into a date dimension by extracting date parts in the mapping. To do so, foreach date attribute specify a column name and part to be extracted with value for extract key.

"mappings": {"date.year": {"column":"date", "extract":"year"},"date.month": {"column":"date", "extract":"month"},



"date.day": {"column":"date", "extract":"day"}}

According to SQLAlchemy, you can extract in most of the databases: month, day, year, second, hour,doy (day of the year), minute, quarter, dow (day of the week), week, epoch, milliseconds,microseconds, timezone_hour, timezone_minute. Please refer to your database engine documen-tation for more information.

Note: It is still recommended to have a date dimension table.

4.6 Localization

Despite localization taking place first in the mapping process, we talk about it at the end, as it might be not socommonly used feature. From physical point of view, the data localization is very trivial and requires languagedenormalization - that means that each language has to have its own column for each attribute.

Localizable attributes are those attributes that have locales specified in their definition. To map logical at-tributes which are localizable, use locale suffix for each locale. For example attribute name in dimension categoryhas two locales: Slovak (sk) and English (en). Or for example product category can be in English, Slovak orGerman. It is specified in the model like this:

attributes = [{

"name" = "category","locales" = ["en", "sk", "de"]

}]

During the mapping process, localized logical reference is created first:

In short: if attribute is localizable and locale is requested, then locale suffix is added. If no such localization existsthen default locale is used. Nothing happens to non-localizable attributes.

For such attribute, three columns should exist in the physical model. There are two ways how the columns shouldbe named. They should have attribute name with locale suffix such as category_sk and category_en(_underscore_ because it is more common in table column names), if implicit mapping is used. You can name thecolumns as you like, but you have to provide explicit mapping in the mapping dictionary. The key for the localizedlogical attribute should have .locale suffix, such as product.category.sk for Slovak version of categoryattribute of dimension product. Here the _dot_ is used because dots separate logical reference parts.

4.6. Localization 23


Note: Current implementation of Cubes framework requires a star or snowflake schema that can be joined intofully denormalized normalized form just by simple one-key based joins. Therefore all localized attributes have tobe stored in their own columns. In other words, you have to denormalize the localized data before using them inCubes.

Read more about Localization.

4.7 Customization of the Implicit

The implicit mapping process has a little bit of customization as well:

• dimension_table_prefix: you can specify what prefix will be used for all dimension tables. For example ifthe prefix is dim_ and attribute is product.name then the table is going to be dim_product.

• fact_table_prefix: used for constructing fact table name from cube name. Example: having prefix ft_ allfact attributes of cube sales are going to be looked up in table ft_sales

• fact_table_name: one can explicitly specify fact table name for each cube separately

See also: cubes.backends.sql.mapper.SnowflakeMapper

4.8 Mapping Process Summary

Following diagram describes how the mapping of logical to physical attributes is done in the star SQL browser(see cubes.backends.sql.StarBrowser):

The “red path” shows the most common scenario where defaults are used.

4.8.1 Joins

Star browser supports a star:

and snowflake database schema:

If you are using either of the two schemas (star or snowflake) in relational database, Cubes requires informationon how to join the tables. Tables are joined by matching single-column – surrogate keys. The framework needsthe join information to be able to transform following snowflake:

to appear as this (denormalized table) with all cube attributes:

4.9 Join

The single join description consists of reference to the master table and a table with details. Fact table is exampleof master table, dimension is example of a detail table (in star schema).

Note: As mentioned before, only single column – surrogate keys are supported for joins.

The join specification is very simple, you define column reference for both: master and detail. The table referenceis in the form table.‘column‘:

"joins" = [{

"master": "fact_sales.product_key","detail": "dim_product.key"



Figure 4.1: logical to physical attribute mapping

4.9. Join 25




}]

As in mappings, if you have specific needs for explicitly mentioning database schema or any other reason wheretable.column reference is not enough, you might write:

"joins" = [{

"master": "fact_sales.product_id","detail": {

"schema": "sales","table": "dim_products","column": "id"

}]

4.10 Aliases

What if you need to join same table twice or more times? For example, you have list of organizations and youwant to use it as both: supplier and service consumer.

It can be done by specifying alias in the joins:

"joins" = [{

"master": "contracts.supplier_id","detail": "organisations.id","alias": "suppliers"

},{

"master": "contracts.consumer_id","detail": "organisations.id","alias": "consumers"

}]

4.10. Aliases 27


Note that with aliases, in the mappings you refer to the table by alias specified in the joins, not by real table name.So after aliasing tables with previous join specification, the mapping should look like:

"mappings": {"supplier.name": "suppliers.org_name","consumer.name": "consumers.org_name"

}

For example, we have a fact table named fact_contracts and dimension table with categories nameddm_categories. To join them we define following join specification:

"joins" = [{

"master": "fact_contracts.category_id","detail": "dm_categories.id"

}]


CHAPTER

FIVE

AGGREGATION BROWSING ANDAGGREGATIONS

The main purpose of the Cubes framework is aggregation and browsing through the aggregates. In this chapter weare going to demonstrate how basic aggregation is done.

Note: See the backend documentation for reference.

5.1 ROLAP with SQL backend

Following examples will show:

• how to build a model programmatically

• how to create a model with flat dimensions

• how to aggregate whole cube

• how to drill-down and aggregate through a dimension

The example data used are IBRD Balance Sheet taken from The World Bank. Backend used for the examples issql.browser.

Create a tutorial directory and download this example file.

Start with imports:

>>> from sqlalchemy import create_engine>>> from cubes.tutorial.sql import create_table_from_csv

Note: Cubes comes with tutorial helper methods in cubes.tutorial. It is advised not to use them inproduction, they are provided just to simplify learner’s life.

Prepare the data using the tutorial helpers. This will create a table and populate it with contents of the CSV file:

>>> engine = create_engine(’sqlite:///data.sqlite’)... create_table_from_csv(engine,... "data.csv",... table_name="irbd_balance",... fields=[... ("category", "string"),... ("category_label", "string"),... ("subcategory", "string"),... ("subcategory_label", "string"),... ("line_item", "string"),... ("year", "integer"),

29

https://raw.github.com/Stiivi/cubes/master/tutorial/data/IBRD_Balance_Sheet__FY2010.csv

https://finances.worldbank.org/Accounting-and-Control/IBRD-Balance-Sheet-FY2010/e8yz-96c6)


... ("amount", "integer")],

... create_id=True

... )

Download the example model and save it. Load the model:

>>> import cubes>>> model = cubes.load_model("model.json")

Check whether the model is valid with model.is_valid() - should return True.

>>> model.is_valid()True

Create a workspace and get a browser instance (in this example it is SQL backend):

>>> workspace = cubes.create_workspace("sql.star", model, engine=engine)

Get a cube and browser for the cube:

>>> cube = model.cube("irbd_balance")>>> browser = workspace.browser(cube)

cell defines context of interest - part of the cube we are looking at. We start with whole cube:

>>> cell = cubes.Cell(cube)

Compute the aggregate. Measure fields of aggregation result have aggregation suffix. Also a total recordcount within the cell is included as record_count.

>>> result = browser.aggregate(cell)>>> result.summary["record_count"]62>>> result.summary["amount_sum"]1116860

Now try some drill-down by year dimension:

>>> result = browser.aggregate(cell, drilldown=["year"])>>> for record in result.drilldown:... print record{u’record_count’: 31, u’amount_sum’: 550840, u’year’: 2009}{u’record_count’: 31, u’amount_sum’: 566020, u’year’: 2010}

Drill-dow by item category:

>>> result = browser.aggregate(cell, drilldown=["item"])>>> for record in result.drilldown:... print record{u’item.category’: u’a’, u’item.category_label’: u’Assets’, u’record_count’: 32, u’amount_sum’: 558430}{u’item.category’: u’e’, u’item.category_label’: u’Equity’, u’record_count’: 8, u’amount_sum’: 77592}{u’item.category’: u’l’, u’item.category_label’: u’Liabilities’, u’record_count’: 22, u’amount_sum’: 480838}

5.1.1 Aggregations and Aggregation Result

Aggregate types depend on the backend, however there is at least one that should be supported by all backends:sum. The SQL StarBrowser supports also min, max and avg. In some databases, such as PostgreSQL, stddev andvariance can be used.

Relevant aggregations for a measure can be specified in the model description:

"measures": [{

"name": "amount",

30 Chapter 5. Aggregation Browsing and Aggregations


"aggregations": ["sum", "min", "max"]}

]

The resulting aggregated attribute name will be constructed from the measure name and aggregation suffix, for ex-ample the mentioned amount will have three aggregates in the result: amount_sum, amount_min and amount_maxin the case described above.

Result of aggregation is a structure containing: summary - summary for the aggregated cell, drilldown - drill downcells, if was desired, and total_cell_count - total cells in the drill down, regardless of pagination.

5.2 Cell Details

When we are browsing a cube, the cell provides current browsing context. For aggregations and selections tohappen, only keys and some other internal attributes are necessary. Those can not be presented to the user though.For example we have geography path (country, region) as [’sk’, ’ba’], however we want to display to theuser Slovakia for the country and Bratislava for the region. We need to fetch those values from the data store. Celldetails is basically a human readable description of the current cell.

For applications where it is possible to store state between aggregation calls, we can use values from previousaggregations or value listings. Problem is with web applications - sometimes it is not desirable or possible to storewhole browsing context with all details. This is exact the situation where fetching cell details explicitly mightcome handy.

Note: The Original browser added cut information in the summary, which was ok when only point cuts wereused. In other situations the result was undefined and mostly erroneous.

The cell details are now provided separately by method cubes.AggregationBrowser.cell_details()which has Slicer HTTP equivalent /cell or {"query":"detail", ...} in /report request (see theserver documentation for more information).

For point cuts, the detail is a list of dictionaries for each level. For example our previously mentioned path[’sk’, ’ba’] would have details described as:

[{

"geography.country_code": "sk","geography.country_name": "Slovakia","geography.something_more": "...""_key": "sk","_label": "Slovakia"

},{

"geography.region_code": "ba","geography.region_name": "Bratislava","geography.something_even_more": "...","_key": "ba","_label": "Bratislava"

}]

You might have noticed the two redundant keys: _key and _label - those contain values of a level key attributeand level label attribute respectively. It is there to simplify the use of the details in presentation layer, such astemplates. Take for example doing only one-dimensional browsing and compare presentation of “breadcrumbs”:

labels = [detail["_label"] for detail in cut_details]

Which is equivalent to:

5.2. Cell Details 31


levels = dimension.hierarchy().levels()labels = []for i, detail in enumerate(cut_details):

labels.append(detail[level[i].label_attribute.ref()])

Note that this might change a bit: either full detail will be returned or just key and label, depending on an optionargument (not yet decided).

5.3 Hierarchies, levels and drilling-down

• how to create a hierarchical dimension

• how to do drill-down through a hierarchy

• detailed level description

We are going to use very similar data as in the previous examples. Difference is in two added columns: categorycode and sub-category code. They are simple letter codes for the categories and subcategories. Download thisexample file.

5.3.1 Hierarchy

Some dimensions can have multiple levels forming a hierarchy. For example dates have year, month,day; geography has country, region, city; product might have category, subcategory and the product.

In our example we have the item dimension with three levels of hierarchy: category, subcategory and line item:

Figure 5.1: Item dimension hierarchy.

The levels are defined in the model:

"levels": [{

"name":"category","label":"Category","attributes": ["category"]

},{

"name":"subcategory","label":"Sub-category","attributes": ["subcategory"]

},{

"name":"line_item","label":"Line Item",



"attributes": ["line_item"]}

]

You can see a slight difference between this model description and the previous one: we didn’t just specify levelnames and didn’t let cubes to fill-in the defaults. Here we used explicit description of each level. name is levelidentifier, label is human-readable label of the level that can be used in end-user applications and attributes is listof attributes that belong to the level. The first attribute, if not specified otherwise, is the key attribute of the level.

Other level description attributes are key and label_attribute‘. The key specifies attribute name which contains keyfor the level. Key is an id number, code or anything that uniquely identifies the dimension level. label_attribute isname of an attribute that contains human-readable value that can be displayed in user-interface elements such astables or charts.

5.3.2 Preparation

Again, in short we need:

• data in a database

• logical model (see model file) prepared with appropriate mappings

• denormalized view for aggregated browsing (optional)

5.3.3 Drill-down

Drill-down is an action that will provide more details about data. Drilling down through a dimension hierarchywill expand next level of the dimension. It can be compared to browsing through your directory structure.

We create a function that will recursively traverse a dimension hierarchy and will print-out aggregations (count ofrecords in this example) at the actual browsed location.

Attributes

• cell - cube cell to drill-down

• dimension - dimension to be traversed through all levels

• path - current path of the dimension

Path is list of dimension points (keys) at each level. It is like file-system path.

def drill_down(cell, dimension, path=[]):

Get dimension’s default hierarchy. Cubes supports multiple hierarchies, for example for date you might haveyear-month-day or year-quarter-month-day. Most dimensions will have one hierarchy, thought.

hierarchy = dimension.hierarchy()

Base path is path to the most detailed element, to the leaf of a tree, to the fact. Can we go deeper in the hierarchy?

if hierarchy.path_is_base(path):return

Get the next level in the hierarchy. levels_for_path returns list of levels according to provided path. Whendrilldown is set to True then one more level is returned.

levels = hierarchy.levels_for_path(path,drilldown=True)current_level = levels[-1]

We need to know name of the level key attribute which contains a path component. If the model does not explicitlyspecify key attribute for the level, then first attribute will be used:

5.3. Hierarchies, levels and drilling-down 33


level_key = dimension.attribute_reference(current_level.key)

For prettier display, we get name of attribute which contains label to be displayed for the current level. If there isno label attribute, then key attribute is used.

level_label = dimension.attribute_reference(current_level.label_attribute)

We do the aggregation of the cell...

Note: Shell analogy: Think of ls $CELL command in commandline, where $CELL is a directory name. In thisfunction we can think of $CELL to be same as current working directory (pwd)

result = browser.aggregate(cell, drilldown=[dimension])

for record in result.drilldown:print "%s%s: %d" % (indent, record[level_label], record["record_count"])...

And now the drill-down magic. First, construct new path by key attribute value appended to the current path:

drill_path = path[:] + [record[level_key]]

Then get a new cell slice for current path:

drill_down_cell = cell.slice(dimension, drill_path)

And do recursive drill-down:

drill_down(drill_down_cell, dimension, drill_path)

The whole recursive drill down function looks like this:

Whole working example can be found in the tutorial sources.

Get the full cube (or any part of the cube you like):

cell = browser.full_cube()

And do the drill-down through the item dimension:

drill_down(cell, cube.dimension("item"))

The output should look like this:

a: 32da: 8

Borrowings: 2Client operations: 2Investments: 2Other: 2

dfb: 4Currencies subject to restriction: 2Unrestricted currencies: 2

i: 2Trading: 2

lo: 2Net loans outstanding: 2

nn: 2Nonnegotiable, nonintrest-bearing demand obligations on account of subscribed capital: 2

oa: 6Assets under retirement benefit plans: 2Miscellaneous: 2Premises and equipment (net): 2



Figure 5.2: Recursive drill-down explained

5.3. Hierarchies, levels and drilling-down 35


Note that because we have changed our source data, we see level codes instead of level names. We will fix thatlater. Now focus on the drill-down.

See that nice hierarchy tree?

Now if you slice the cell through year 2010 and do the exact same drill-down:

cell = cell.slice("year", [2010])drill_down(cell, cube.dimension("item"))

you will get similar tree, but only for year 2010 (obviously).

5.3.4 Level Labels and Details

Codes and ids are good for machines and programmers, they are short, might follow some scheme, easy to handlein scripts. Report users have no much use of them, as they look cryptic and have no meaning for the first sight.

Our source data contains two columns for category and for subcategory: column with code and column with labelfor user interfaces. Both columns belong to the same dimension and to the same level. The key column is used bythe analytical system to refer to the dimension point and the label is just decoration.

Levels can have any number of detail attributes. The detail attributes have no analytical meaning and are justignored during aggregations. If you want to do analysis based on an attribute, make it a separate dimensioninstead.

So now we fix our model by specifying detail attributes for the levels:

Figure 5.3: Attribute details.

The model description is:

"levels": [{

"name":"category","label":"Category","label_attribute": "category_label","attributes": ["category", "category_label"]

},{

"name":"subcategory","label":"Sub-category","label_attribute": "subcategory_label","attributes": ["subcategory", "subcategory_label"]

},{

"name":"line_item","label":"Line Item","attributes": ["line_item"]

}]

}



Note the label_attribute keys. They specify which attribute contains label to be displayed. Key attribute is by-default the first attribute in the list. If one wants to use some other attribute it can be specified in key_attribute.

Because we added two new attributes, we have to add mappings for them:

"mappings": { "item.line_item": "line_item","item.subcategory": "subcategory","item.subcategory_label": "subcategory_label","item.category": "category","item.category_label": "category_label"}

Now the result will be with labels instead of codes:

Assets: 32Derivative Assets: 8

Borrowings: 2Client operations: 2Investments: 2Other: 2

Due from Banks: 4Currencies subject to restriction: 2Unrestricted currencies: 2

Investments: 2Trading: 2

Loans Outstanding: 2Net loans outstanding: 2

Nonnegotiable: 2Nonnegotiable, nonintrest-bearing demand obligations on account of subscribed capital: 2

Other Assets: 6Assets under retirement benefit plans: 2Miscellaneous: 2Premises and equipment (net): 2

5.3.5 Implicit hierarchy

Try to remove the last level line_item from the model file and see what happens. Code still works, but displaysonly two levels. What does that mean? If metadata - logical model - is used properly in an application, thenapplication can handle most of the model changes without any application modifications. That is, if you add newlevel or remove a level, there is no need to change your reporting application.

5.3.6 Summary

• hierarchies can have multiple levels

• a hierarchy level is identifier by a key attribute

• a hierarchy level can have multiple detail attributes and there is one special detail attribute: label attributeused for display in user interfaces

5.4 Multiple Hierarchies

Dimension can have multiple hierarchies defined. To use specific hierarchy for drilling down:

result = browser.aggregate(cell, drilldown = [("date", "dmy", None)])

The drilldown argument takes list of three element tuples in form: (dimension, hierarchy, level). The hierarchyand level are optional. If level is None, as in our example, then next level is used. If hierarchy is None thendefault hierarchy is used.

5.4. Multiple Hierarchies 37


To sepcify hierarchy in cell cuts just pass hierarchy argument during cut construction. For example to specify cutthrough week 15 in year 2010:

cut = cubes.PointCut("date", [2010, 15], hierarchy="ywd")

Note: If drilling down a hierarchy and asking cubes for next implicit level the cuts should be using same hierarchyas drilldown. Otherwise exception is raised. For example: if cutting through year-month-day and asking for nextlevel after year in year-week-day hierarchy, exception is raised.


CHAPTER

SIX

CREATING CUBES

The Cubes framework provides funcitonality for denormalisation and for cube pre-computation. Currently SQLbackend supports denormalisation only and mongo backend supports cube precomputation.

6.1 Relational Database (SQL)

Following code will create a denormalized view (implemented as table) from a model and star/snowflake relationalschema:

import sqlalchemyimport cubes

model = cubes.model_from_path("/path/to/model")

engine = sqlalchemy.create_engine(common.staging_dburl)connection = engine.connect()cube = model.cube("contracts")

builder = cubes.backends.SQLDenormalizer(cube, connection)builder.create_materialized_view("mft_contracts")

connection.close()

See Also:

Module backends. More information about cube builders in different database environments.

Module model. Logical model description - required for preaggregated cube computation.

39


40 Chapter 6. Creating Cubes

CHAPTER

SEVEN

LOCALIZATION

Having origin in multi-lingual Europe one of the main features of the Cubes framework is ability to providelocalizable results. There are three levels of localization in each analytical application:

1. Application level - such as buttons or menus

2. Metadata level - such as table header labels

3. Data level - table contents, such as names of categories or procurement types

Figure 7.1: Localization levels.

The application level is out of scope of this framework and is covered in internationalization (i18n) libraries, suchas gettext. What is covered in Cubes is metadata and data level.

Localization in cubes is very simple:

1. Create master model definition and specify locale the model is in

2. Specify attributes that are localized (see PhysicalAttributeMappings)

3. Create model translations for each required language

4. Make cubes function or a tool create translated versions the master model

To create localized report, just specify locale to the browser and create reports as if the model was not localized.See Localized Reporting.

41


7.1 Metadata Localization

The metadata are used to display report labels or provide attribute descriptions. Localizable metadata are mostlylabel and description metadata attributes, such as dimension label or attribute description.

Say we have three locales: Slovak, English, Hungarian with Slovak being the main language. The master modelis described using Slovak language and we have to provide two model translation specifications: one for Englishand another for Hungarian.

The model translation file has the same structure as model definition file, but everything except localizable meta-data attributes is ignored. That is, only label and description keys are considered in most cases. Youcan not change structure of mode in translation file. If structure does not match you will get warning or error,depending on structure change severity.

There is one major difference between master model file and model translations: all attribute lists, such as cubemeasures, cube details or dimension level attributes are dictionaries, not arrays. Keys are attribute names, valuesare metadata translations. Therefore in master model file you will have:

attributes = [{ "name": "name", "label": "Name" },{ "name": "cat", "label": "Category" }

]

in translation file you will have:

attributes = {"name": {"label": "Meno"},"cat": {"label": "Kategoria"}

}

If a translation of a metadata attribute is missing, then the one in master model description is used.

In our case we have following files:

procurements.jsonprocurements_en.jsonprocurements_hu.json

Figure 7.2: Localization master model and translation files.

To load a model:

import cubesmodel_sk = cubes.load_model("procurements.json", translations = {

"en": "procurements_en.json",

42 Chapter 7. Localization


"hu": "procurements_hu.json",})

To get translated version of a model:

model_en = model.translate("en")model_hu = model.translate("hu")

Or you can get translated version of the model by directly passing translation dictionary:

handle = open("procurements_en.json")trans = json.load(handle)handle.close()

model_en = model.translate("en", trans)

7.2 Data Localization

If you have attributes that needs to be localized, specify the locales (languages) in the attribute definition inPhysicalAttributeMappings.

Note: Data localization is implemented only for Relational/SQL backend.

7.3 Localized Reporting

Main point of localized reporting is: Create query once, reuse for any language. Provide translated model anddesired locale to the aggregation browser and you are set. The browser takes care of appropriate value selection.

Aggregating, drilling, getting list of facts - all methods return localized data based on locale provided to thebrowser. If you want to get multiple languages at the same time, you have to create one browser for each languageyou are reporting.

7.2. Data Localization 43


44 Chapter 7. Localization

CHAPTER

EIGHT

OLAP SERVER

Cubes framework provides easy to install web service WSGI server with API that covers most of the Cubes logicalmodel metadata and aggregation browsing functionality.

Note: Server requires the Werkzeug framework.

For more information about how to run the server programmatically, please refer to the server module.

8.1 HTTP API

8.1.1 Model

GET /model Get model metadata as JSON. In addition to standard model attributes a locales key is addedwith list of available model locales.

GET /model/cubes Get list of cubes.

GET /model/cube/<name>/dimensions Get list of cube’s dimensions.

GET /model/dimension/<name> Get dimension metadata as JSON

GET /locales Get list of model locales

8.1.2 Browsing and Aggregation

Cube API calls have format: /cube/<cube_name>/<browser_action> where the browser action mightbe aggregate, facts, fact, dimension and report.

If the model contains only one cube or default cube name is specified in the configuration, then the/cube/<cube> part might be omitted and you can write only requests like /aggregate.

GET /cube/<cube>/aggregate Return aggregation result as JSON. The result will contain keys: summaryand drilldown. The summary contains one row and represents aggregation of whole cell specified in the cut.The drilldown contains rows for each value of drilled-down dimension.

If no arguments are given, then whole cube is aggregated.

Parameters:

• cut - specification of cell, for example: cut=date:2004,1|category:2|entity:12345

• drilldown - dimension to be drilled down. For example drilldown=date will give rowsfor each value of next level of dimension date. You can explicitly specify level to drilldown in form: dimension:level, such as: drilldown=date:month. To specify ahierarchy use dimension@hierarchy as in drilldown=date@ywd for implicit level ordrilldown=date@ywd:week to explicitly specify level.

45



• page - page number for paginated results

• pagesize - size of a page for paginated results

• order - list of attributes to be ordered by

• limit - limit number of results in form limit‘[,‘measure‘[,‘order_direction]]:limit=5:received_amount_sum:asc (this might not be implemented in all backends)

Reply:

A dictionary with keys:

• summary - dictionary of fields/values for summary aggregation

• drilldown - list of drilled-down cells

• total_cell_count - number of total cells in drilldown (after limir, before pagination)

• cell - dictionary representation of the query cell

Example:

{"summary": {

"record_count": 32,"amount_sum": 558430

}"drilldown": [

{"record_count": 16,"amount_sum": 275420,"year": 2009

},{

"record_count": 16,"amount_sum": 283010,"year": 2010

}],"total_cell_count": 2,"cell": [

{"path": [

"a"],"type": "point","dimension": "item","level_depth": 1

}],

}

If pagination is used, then drilldown will not contain more than pagesize cells.

Note that not all backengs might implement total_cell_count or providing this information can beconfigurable therefore might be disabled (for example for performance reasons).

GET /cube/<cube>/facts Return all facts within a cell.

Parameters:

• cut - see /aggregate

• page, pagesize - paginate results

• order - order results

• format - result format: json (default; see note below), csv

46 Chapter 8. OLAP Server


• fields - comma separated list of fact fields, by default all fields are returned

Note: Number of facts in JSON is limited to configuration value of json_record_limit, which is1000 by default. To get more records, either use pages with size less than record limit or use alternate resultformat, such as csv.

GET /cube/<cube>/fact/<id> Get single fact with specified id. For example: /fact/1024

GET /cube/<cube>/dimension/<dimension> Get values for attributes of a dimension.

Parameters:


• depth - specify depth (number of levels) to retrieve. If not specified, then all levels are returned

• page, pagesize - paginate results

• order - order results

Response: dictionary with keys dimension – dimension name, depth – level depth and data – list ofrecords.

Example for /dimension/item?depth=1:

{"dimension": "item""depth": 1,"data": [

{"item.category": "a","item.category_label": "Assets"

},{

"item.category": "e","item.category_label": "Equity"

},{

"item.category": "l","item.category_label": "Liabilities"

}],

}

GET /cube/<cube>/cell Get details for a cell.

Parameters:


Response: a dictionary representation of a cell (see cubes.Cell.as_dict()) with keys cube andcuts. cube is cube name and cuts is a list of dictionary representations of cuts.

Each cut is represented as:

{// Cut type is one of: "point", "range" or "set""type": cut_type,

"dimension": cut_dimension_name,"level_depth": maximal_depth_of_the_cut,

// Cut type specific keys.

// Point cut:"path": [ ... ],"details": [ ... ]

8.1. HTTP API 47


// Range cut:"from": [ ... ],"to": [ ... ],"details": { "from": [...], "to": [...] }

// Set cut:"paths": [ [...], [...], ... ],"details": [ [...], [...], ... ]

}

Each element of the details path contains dimension attributes for the corresponding level. In addition incontains two more keys: _key and _label which (redundantly) contain values of key attribute and labelattribute respectively.

Example for /cell?cut=item:a in the hello_world example:

{"cube": "irbd_balance","cuts": [

{"type": "point","dimension": "item","level_depth": 1"path": ["a"],"details": [

{"item.category": "a","item.category_label": "Assets","_key": "a","_label": "Assets"

}],

}]

}

GET /cube/<cube>/report Process multiple request within one API call. The data should be a JSONcontaining report specification where keys are names of queries and values are dictionaries describing thequeries.

report expects Content-type header to be set to application/json.

See Reports for more information.

GET /cube/<cube>/search/dimension/<dimension>/<query> Search values of dimensions forquery. If dimension is _all then all dimensions are searched. Returns search results as list of dictionarieswith attributes:

Search result

• dimension - dimension name

• level - level name

• depth - level depth

• level_key - value of key attribute for level

• attribute - dimension attribute name where searched value was found

• value - value of dimension attribute that matches search query

• path - dimension hierarchy path to the found value

• level_label - label for dimension level (value of label_attribute for level)



Warning: Not yet fully implemented, just proposal.

Note: Requires a search backend to be installed.

Parameters that can be used in any request:

• prettyprint - if set to true, space indentation is added to the JSON output

8.1.3 Cuts in URLs

The cell - part of the cube we are aggregating or we are interested in - is specified by cuts. The cut in URL aregiven as single parameter cut which has following format:

Examples:

date:2004date:2004,1date:2004,1|class:5date:2004,1,1|category:5,10,12|class:5

To specify a range where keys are sortable:

date:2004-2005date:2004,1-2005,5

Open range:

date:2004,1,1-date:-2005,5,10

Dimension name is followed by colon :, each dimension cut is separated by |, and path for dimension levels isseparated by a comma ,. Or in more formal way, here is the BNF for the cut:

<list> ::= <cut> | <cut> ’|’ <list><cut> ::= <dimension> ’:’ <path><dimension> ::= <identifier><path> ::= <value> | <value> ’,’ <path>

Note: Why dimension names are not URL parameters? This prevents conflict from other possible frequent URLparameters that might modify page content/API result, such as type, form, source.

To specify other than default hierarchy use format dimension@hierarchy, the path then should contain values forspecified hierarchy levels:

date@ywd:2004,25

Following image contains examples of cuts in URLs and how they change by browsing cube aggregates:

8.2 Reports

Report queries are done either by specifying a report name in the request URL or using HTTP POST request whereposted data are JSON with report specification.

Keys:

• queries - dictionary of named queries

8.2. Reports 49


Figure 8.1: Example of how cuts in URL work and how they should be used in application view templates.



Query specification should contain at least one key: query - which is query type: aggregate, cell_details,values (for dimension values), facts or fact (for multiple or single fact respectively). The rest of keys arequery dependent. For more information see AggregationBrowser documentation.

Note: Note that you have to set the content type to application/json.

Result is a dictionary where keys are the query names specified in report specification and values are result valuesfrom each query call.

Example report JSON file with two queries:

{"queries": {

"summary": {"query": "aggregate"

},"by_year": {

"query": "aggregate","drilldown": ["date"],"rollup": "date"

}}

}

Request:

curl -H "Content-Type: application/json" --data-binary "@report.json" \"http://localhost:5000/cube/contracts/report?prettyprint=true&cut=date:2004"

Reply:

{"by_year": {

"total_cell_count": 6,"drilldown": [

{"record_count": 4390,"requested_amount_sum": 2394804837.56,"received_amount_sum": 399136450.0,"date.year": "2004"

},...{

"record_count": 265,"requested_amount_sum": 17963333.75,"received_amount_sum": 6901530.0,"date.year": "2010"

}],"summary": {

"record_count": 33038,"requested_amount_sum": 2412768171.31,"received_amount_sum": 2166280591.0

}},"summary": {

"total_cell_count": null,"drilldown": {},"summary": {

"date.year": "2004","requested_amount_sum": 2394804837.56,"received_amount_sum": 399136450.0,"record_count": 4390

8.2. Reports 51


}}

}

Explicit specification of a cell (the cuts in the URL parameters are going to be ignored):

{"cell": [

{"dimension": "date","type": "range","from": [2010,9],"to": [2011,9]

}],"queries": {

"report": {"query": "aggregate","drilldown": {"date":"year"}

}}

}

8.2.1 Roll-up

Report queries might contain rollup specification which will result in “rolling-up” one or more dimensions todesired level. This functionality is provided for cases when you would like to report at higher level of aggregationthan the cell you provided is in. It works in similar way as drill down in serveraggregate but in the oppositedirection (it is like cd .. in a UNIX shell).

Example: You are reporting for year 2010, but you want to have a bar chart with all years. You specify rollup:..."rollup": "date",...

Roll-up can be:

• a string - single dimension to be rolled up one level

• an array - list of dimension names to be rolled-up one level

• a dictionary where keys are dimension names and values are levels to be rolled up-to

8.3 Running and Deployment

8.3.1 Local Server

To run your local server, prepare server configuration grants_config.ini:

[server]host: localhostport: 5000reload: yeslog_level: info

[workspace]url: postgres://localhost/mydata"

[model]path: grants_model.json



Run the server using the Slicer tool (see slicer - Command Line Tool):

slicer serve grants_config.ini

8.3.2 Apache mod_wsgi deployment

Deploying Cubes OLAP Web service server (for analytical API) can be done in four very simple steps:

1. Create server configuration json file

2. Create WSGI script

3. Prepare apache site configuration

4. Reload apache configuration

Create server configuration file procurements.ini:

[server]backend: sql.browser

[model]path: /path/to/model.json

[workspace]view_prefix: mft_schema: datamartsurl: postgres://localhost/transparency

[translations]en: /path/to/model-en.jsonhu: /path/to/model-hu.json

Note: the path in [model] has to be full path to the model, not relative to the configuration file.

Place the file in the same directory as the following WSGI script (for convenience).

Create a WSGI script /var/www/wsgi/olap/procurements.wsgi:

import os.pathimport cubes

CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))

# Set the configuration file name (and possibly whole path) hereCONFIG_PATH = os.path.join(CURRENT_DIR, "slicer.ini")

application = cubes.server.create_server(CONFIG_PATH)

Apache site configuration (for example in /etc/apache2/sites-enabled/):

<VirtualHost *:80>ServerName olap.democracyfarm.org

WSGIScriptAlias /vvo /var/www/wsgi/olap/procurements.wsgi

<Directory /var/www/wsgi/olap>WSGIProcessGroup olapWSGIApplicationGroup %{GLOBAL}Order deny,allowAllow from all

</Directory>

8.3. Running and Deployment 53


ErrorLog /var/log/apache2/olap.democracyfarm.org.error.logCustomLog /var/log/apache2/olap.democracyfarm.org.log combined

</VirtualHost>

Reload apache configuration:

sudo /etc/init.d/apache2 reload

And you are done.

8.3.3 Server requests

Example server request to get aggregate for whole cube:

$ curl http://localhost:5000/cube/procurements/aggregate?cut=date:2004

Reply:

{"drilldown": {},"summary": {

"received_amount_sum": 399136450.0,"requested_amount_sum": 2394804837.56,"record_count": 4390

}}

8.3.4 Configuration

Server configuration is stored in .ini files with sections:

• [server] - server related configuration, such as host, port

– backend - backend name, use sql for relational database backend

– log - path to a log file

– log_level - level of log details, from least to most: error, warn, info, debug

– json_record_limit - number of rows to limit when generating JSON output with iterableobjects, such as facts. Default is 1000. It is recommended to use alternate response format, suchas CSV, to get more records.

– modules - space separated list of modules to be loaded (only used if run by the slicer com-mand)

– prettyprint - default value of prettyprint parameter. Set to true for demonstrationpurposes.

– host - host where the server runs, defaults to localhost

– port - port on which the server listens, defaults to 5000

• [model] - model and cube configuration

– path - path to model .json file

– locales - comma separated list of locales the model is provided in. Currently this variable isoptional and it is used only by experimental sphinx search backend.

• [translations] - model translation files, option keys in this section are locale names and values arepaths to model translation files. See Localization for more information.



Backend workspace configuration should be in the [workspace]. See backends — Aggregation BrowsingBackends for more information.

Workspace with SQL backend (backend=sql in [server]) options:

• url (required) – database URL in form: adapter://user:password@host:port/database

• schema (optional) – schema containing denormalized views for relational DB cubes

• dimension_prefix (optional) – used by snowflake mapper to find dimension tables when no explicitmapping is specified

• dimension_schema – use this option when dimension tables are stored in different schema than the facttables

• fact_prefix (optional) – used by the snowflake mapper to find fact table for a cube, when no explicitfact table name is specified

• use_denormalization (optional) – browser will use dernormalized view instead of snowflake

• denormalized_view_prefix (optional, advanced) – if denormalization is used, then this prefix isadded for cube name to find corresponding cube view

• denormalized_view_schema (optional, advanced) – schema wehere denormalized views are located(use this if the views are in different schema than fact tables, otherwise default schema is going to be used)

Example configuration file:

[server]reload: yeslog: /var/log/cubes.loglog_level: infobackend: sql

[workspace]url: postgresql://localhost/dataschema: cubes

[model]path: ~/models/contracts_model.jsonlocales: en,sk

[translations]sk: ~/models/contracts_model-sk.json

8.3. Running and Deployment 55



CHAPTER

NINE

SLICER - COMMAND LINE TOOL

Cubes comes with a command line tool that can:

• run OLAP server

• build and compute cubes

• validate and translate models

Usage:

slicer command [command_options]

or:

slicer command sub_command [sub_command_options]

Commands are:

Command Descriptionserve Start OLAP servermodel validate Validates logical model for OLAP cubesmodel json Create JSON representation of a model (can be used) when model is a directory.extract_locale Get localizable part of the modeltranslate Translate model with translation filetest Test the model against backend database (experimental)ddl Generate DDL for SQL backend (experimental)

9.1 serve

Run Cubes OLAP HTTP server.

Example server configuration file slicer.ini:

[server]host: localhostport: 5000reload: yeslog_level: info

[workspace]url: sqlite:///tutorial.sqliteview_prefix: vft_

[model]path: models/model_04.json

To run local server:

57


slicer serve slicer.ini

In the [server] section, space separated list of modules to be imported can be specified under option modules:

[server]modules=cutom_backend...

For more information about OLAP HTTP server see OLAP Server

9.2 model validate

Usage:

slicer model validate /path/to/model/directoryslicer model validate model.jsonslicer model validate http://somesite.com/model.json

Optional arguments:

-d, --defaults show defaults-w, --no-warnings disable warnings-t TRANSLATION, --translation TRANSLATION

model translation file

For more information see Model Validation in Logical Model and Metadata

Example output:

loading model wdmmg_model.json-------------------------cubes: 1

wdmmgdimensions: 5

datepogregioncofogfrom

-------------------------found 3 issuesvalidation results:warning: No hierarchies in dimension ’date’, flat level ’year’ will be usedwarning: Level ’year’ in dimension ’date’ has no key attribute specifiedwarning: Level ’from’ in dimension ’from’ has no key attribute specified0 errors, 3 warnings

The tool output contains recommendation whether the model can be used:

• model can be used - if there are no errors, no warnings and no defaults used, mostly when the model isexplicitly described in every detail

• model can be used, make sure that defaults reflect reality - there are no errors, no warnings, but the modelmight be not complete and default assumptions are applied

• not recommended to use the model, some issues might emerge - there are just warnings, no validation errors.Some queries or any other operations might produce invalid or unexpected output

• model can not be used - model contain errors and it is unusable

58 Chapter 9. slicer - Command Line Tool


9.3 model json

For any given input model produce reusable JSON model.

9.4 model extract_locale

Extract localizable parts of the model. Use this before you start translating the model to get translation template.

9.5 model translate

Translate model using translation file:

slicer model translate model.json translation.json

9.6 ddl

Note: This is experimental command.

Generates DDL schema of a model for SQL backend

Usage:

slicer ddl [-h] [--dimension-prefix DIMENSION_PREFIX][--fact-prefix FACT_PREFIX] [--backend BACKEND]url model

positional arguments:

url SQL database connection URLmodel model reference - can be a local file path or URL

optional arguments:

--dimension-prefix DIMENSION_PREFIXprefix for dimension tables

--fact-prefix FACT_PREFIXprefix for fact tables

--backend BACKEND backend name (currently limited only to SQL backends)

9.7 denormalize

Usage:

slicer denormalize [-h] [-p PREFIX] [-f] [-m] [-i] [-s SCHEMA][-c CUBE] config

positional arguments:

config slicer confuguration .ini file

optional arguments:

9.3. model json 59


-h, --help show this help message and exit-p PREFIX, --prefix PREFIX

prefix for denormalized views (overrides config value)-f, --force replace existing views-m, --materialize create materialized view (table)-i, --index create index for key attributes-s SCHEMA, --schema SCHEMA

target view schema (overrides config value)-c CUBE, --cube CUBE cube(s) to be denormalized, if not specified then all

in the model

9.7.1 Examples

If you plan to use denormalized views, you have to specify it in the configuration in the [workspace] section:

[workspace]denormalized_view_prefix = mft_denormalized_view_schema = denorm_views

# This switch is used by the browser:use_denormalization = yes

The denormalization will create tables like denorm_views.mft_contracts for a cube namedcontracts. The browser will use the view if option use_denormalization is set to a true value.

Denormalize all cubes in the model:

slicer denormalize slicer.ini

Denormalize only one cube:

slicer denormalize -c contracts slicer.ini

Create materialized denormalized view with indexes:

slicer denormalize --materialize --index slicer.ini

Replace existing denormalized view of a cube:

slicer denormalize --force -c contracts slicer.ini

9.7.2 Schema

Schema where denormalized view is created is schema specified in the configuration file. Schema is shared withfact tables and views. If you want to have views in separate schema, specify denormalized_view_schemaoption in the configuration.

If for any specific reason you would like to denormalize into a completely different schema than specified in theconfiguration, you can specify it with the --schema option.

9.7.3 View name

By default, a view name is the same as corresponding cube name. If there is denormalized_view_prefixoption in the configuration, then the prefix is prepended to the cube name. Or it is possible to override the optionwith command line argument --prefix.

Note: The tool will not allow to create view if it’s name is the same as fact table name and is in the same schema.It is not even possible to --force it. A view prefix or different schema has to be specified.

60 Chapter 9. slicer - Command Line Tool

CHAPTER

TEN

REFERENCE

Contents:

10.1 Logical Model Reference

Model - Cubes meta-data objects and functionality for working with them. Logical Model and Metadata

Figure 10.1: The logical model classes.

10.1.1 Loading or creating a model

cubes.load_model(resource, translations=None)Load logical model from object reference. resource can be an URL, local file path or file-like object.

The path might be:

•JSON file with a dictionary describing model

•URL with a JSON dictionary

Raises ModelError when model description file does not contain a dictionary.

61


cubes.create_model(model, cubes=None, dimensions=None)Create a model from a model description dictionary in model. This is designated way of creating the modelfrom a dictionary.

cubes or dimensions are list of their respective dictionary definitions. If definition of a cube in cubes ordimension in dimensions already exists in the model, then ModelError is raised.

The model dictionary contains main model description. The structure is:

{"name": "public_procurements","label": "Procurements","description": "Procurement Contracts of an Organisation""cubes": [...]"dimensions": [...]

}

Note: Current implementation is the same as passing description to the Model class initialization. In thefuture all default constructions will be moved here and the initialization methods will require logical modelclass instances.

cubes.create_cube(desc, dimensions)Creates a Cube instance from dictionary description desc with dimension dictionary in dimensions

In file based model representation, the cube descriptions are stored in json files with prefix cube_ likecube_contracts, or as a dictionary for key cubes in the model description dictionary.

JSON example:

{"name": "contracts","measures": ["amount"],"dimensions": [ "date", "contractor", "type"]"details": ["contract_name"],

}

cubes.create_dimension(obj, dimensions=None)Creates a Dimension instance from obj which can be a Dimension instance or a string or a dictionary. If itis a string, then it represents dimension name, the only level name and the only attribute.

Keys of a dictionary representation:

•name: dimension name

•levels: list of dimension levels (see: cubes.Level)

•hierarchies or hierarchy: list of dimension hierarchies or list of level names of a single hierarchy.Only one of the two should be specified, otherwise an exception is raised.

•default_hierarchy_name: name of a hierarchy that will be used when no hierarchy is explicitly speci-fied

•label: dimension name that will be displayed (human readable)

•description: human readable dimension description

•info - custom information dictionary, might be used to store application/front-end specific information(icon, color, ...)

•template – name of a dimension to be used as template. The dimension is taken from dimensionsargument which should be a dictionary of already created dimensions.

Defaults

•If no levels are specified during initialization, then dimension name is considered flat, with singleattribute.

62 Chapter 10. Reference


•If no hierarchy is specified and levels are specified, then default hierarchy will be created from orderof levels

•If no levels are specified, then one level is created, with name default and dimension will be consideredflat

String representation of a dimension str(dimension) is equal to dimension name.

Class is not meant to be mutable.

Raises ModelInconsistencyError when both hierarchy and hierarchies is specified.

cubes.create_level(obj)Creates a level from obj which can be a Level instance, string or a dictionary. If it is a string, then the stringrepresents level name and the only one attribute of the level. If obj is a dictionary, then the keys are:

•name – level name

•attributes – list of level attributes

•key – name of key attribute

•label_attribute – name of label attribute

Defaults:

•if no attributes are specified, then one is created with the same name as the level name.

Simple Models

There might be cases where one would like to analyse simple (denormalised) table. It can be either a table in adatabase or data from a single CSV file. For convenience, there is an utility function called simple_model that willcreate a model with just one cube, simple dimensions and couple of specified measures.

cubes.simple_model(cube_name, dimensions, measures)Create a simple model with only one cube with name cube_name‘and flat dimensions. ‘dimensions is a listof dimension names as strings and measures is a list of measure names, also as strings. This is conveniencemethod mostly for quick model creation for denormalized views or tables with data from a single CSV file.

Example:

model = simple_model("contracts",dimensions=["year", "supplier", "subject"],measures=["amount"])

cube = model.cube("contracts")browser = workspace.create_browser(cube)

10.1.2 Model components

Model

class cubes.Model(name=None, cubes=None, dimensions=None, locale=None, label=None, descrip-tion=None, info=None, **kwargs)

Logical Model represents analysts point of view on data.

Attributes:

•name - model name

•cubes - list of Cube instances

•dimensions - list of Dimension instances

•locale - locale code of the model

•label - human readable name - can be used in an application

10.1. Logical Model Reference 63


•description - longer human-readable description of the model

•info - custom information dictionary

add_cube(cube)Adds cube to the model and also assigns the model to the cube. If cube has a model assigned and it isnot this model, then error is raised.

Raises ModelInconsistencyError when trying to assing a cube that is already assigned to a differentmodel or if trying to add a dimension with existing name but different specification.

add_dimension(dimension)Add dimension to model. Replace dimension with same name

cube(cube)Get a cube with name name or coalesce object to a cube.

dimension(dim)Get dimension by name or by object. Raises NoSuchDimensionError when there is no dimension dim.

is_valid(strict=False)Check whether model is valid. Model is considered valid if there are no validation errors. If you wantto be sure that there are no warnings as well, set strict to True. If strict is False only errors areconsidered fatal, if True also warnings will make model invalid.

Returns True when model is valid, otherwise returns False.

localizable_dictionary()Get model locale dictionary - localizable parts of the model

localize(translation)Return localized version of model

remove_cube(cube)Removes cube from the model

remove_dimension(dimension)Remove a dimension from receiver

to_dict(**options)Return dictionary representation of the model. All object references within the dictionary are namebased

•expand_dimensions - if set to True then fully expand dimension information in cubes

•full_attribute_names - if set to True then attribute names will be written asdimension_name.attribute_name

validate()Validate the model, check for model consistency. Validation result is array of tuples in form: (valida-tion_result, message) where validation_result can be ‘warning’ or ‘error’.

Returs: array of tuples

Cube

class cubes.Cube(name, dimensions=None, measures=None, model=None, label=None, de-tails=None, mappings=None, joins=None, fact=None, key=None, descrip-tion=None, options=None, info=None, **kwargs)

Create a new Cube model.

Attributes:

•name: cube name

•measures: list of measure attributes

•dimensions: list of dimensions (should be Dimension instances)



•model: model the cube belongs to

•label: human readable cube label

•details: list of detail attributes

•description - human readable description of the cube

•key: fact key field (if not specified, then backend default key will be used, mostly id for SLQ or _idfor document based databases)

•info - custom information dictionary, might be used to store application/front-end specific information

Attributes used by backends:

•mappings - backend-specific logical to physical mapping dictionary

•joins - backend-specific join specification (used in SQL backend)

•fact - fact dataset (table) name (physical reference)

•options - dictionary of other options used by the backend - refer to the backend documentation to seewhat options are used (for example SQL browser might look here for denormalized_view in caseof denormalized browsing)

add_dimension(dimension)Add dimension to cube. Replace dimension with same name. Raises ModelInconsistencyError whendimension with same name already exists in the receiver.

dimension(obj)Get dimension object. If obj is a string, then dimension with given name is returned, otherwise dimen-sion object is returned if it belongs to the cube.

Raises NoSuchDimensionError when there is no such dimension.

measure(obj)Get measure object. If obj is a string, then measure with given name is returned, otherwise measureobject is returned if it belongs to the cube. Returned object is of Attribute type.

Raises NoSuchAttributeError when there is no such measure or when there are multiple measures withthe same name (which also means that the model is not valid).

remove_dimension(dimension)Remove a dimension from receiver. dimension can be either dimension name or dimension object.

to_dict(expand_dimensions=False, with_mappings=True, **options)Convert to a dictionary. If expand_dimensions is True (default is False) then fully expand dimen-sion information If with_mappings is True (which is default) then joins, mappings, fact and optionsare included. Should be set to False when returning a dictionary that will be provided in an userinterface or through server API.

validate()Validate cube. See Model.validate() for more information.

Dimension, Hierarchy, Level

class cubes.Dimension(name, levels, hierarchies=None, default_hierarchy_name=None, la-bel=None, description=None, info=None, **desc)

Create a new dimension

Attributes:

•name: dimension name

•levels: list of dimension levels (see: cubes.Level)

•hierarchies: list of dimension hierarchies. If no hierarchies are specified, then default one is createdfrom ordered list of levels.



•default_hierarchy_name: name of a hierarchy that will be used when no hierarchy is explicitly speci-fied

•label: dimension name that will be displayed (human readable)

•description: human readable dimension description

•info - custom information dictionary, might be used to store application/front-end specific information(icon, color, ...)

Dimension class is not meant to be mutable. All level attributes will have new dimension assigned.

Note that the dimension will claim ownership of levels and their attributes. You should make sure that youpass a copy of levels if you are cloning another dimension.

all_attributes()Return all dimension attributes regardless of hierarchy. Order is not guaranteed, usecubes.Hierarchy.all_attributes() to get known order. Order of attributes within level ispreserved.

attribute(reference)Get dimension attribute from reference.

default_hierarchyGet default hierarchy specified by default_hierarchy_name, if the variable is not set then geta hierarchy with name default

Warning: Depreciated. Use Dimension.hierarchy() instead.

has_detailsReturns True when each level has only one attribute, usually key.

hierarchy(obj=None)Get hierarchy object either by name or as Hierarchy. If obj is None then default hierarchy is returned.

is_flatIs true if dimension has only one level

key_attributes()Return all dimension key attributes, regardless of hierarchy. Order is not guaranteed, use a hierarchyto have known order.

level(obj)Get level by name or as Level object. This method is used for coalescing value

level_namesGet list of level names. Order is not guaranteed, use a hierarchy to have known order.

levelsGet list of all dimension levels. Order is not guaranteed, use a hierarchy to have known order.

to_dict(**options)Return dictionary representation of the dimension

validate()Validate dimension. See Model.validate() for more information.

class cubes.Hierarchy(name, levels, dimension=None, label=None, info=None)Dimension hierarchy - specifies order of dimension levels.

Attributes:

•name: hierarchy name

•dimension: dimension the hierarchy belongs to

•label: human readable name

•levels: ordered list of levels or level names from dimension




Some collection operations might be used, such as level in hierarchy or hierarchy[index].String value str(hierarchy) gives the hierarchy name.

all_attributes()Return all dimension attributes as a single list.

is_last(level)Returns True if level is last level of the hierarchy.

key_attributes()Return all dimension key attributes as a single list.

level_index(level)Get order index of level. Can be used for ordering and comparing levels within hierarchy.

levels_for_depth(depth, drilldown=False)Returns levels for given depth. If path is longer than hierarchy levels, cubes.ArgumentError exceptionis raised

levels_for_path(path, drilldown=False)Returns levels for given path. If path is longer than hierarchy levels, cubes.ArgumentError exceptionis raised

next_level(level)Returns next level in hierarchy after level. If level is last level, returns None. If level is None, thenthe first level is returned.

path_is_base(path)Returns True if path is base path for the hierarchy. Base path is a path where there are no more levelsto be added - no drill down possible.

previous_level(level)Returns previous level in hierarchy after level. If level is first level or None, returns None

rollup(path, level=None)Rolls-up the path to the level. If level is None then path is rolled-up only one level.

If level is deeper than last level of path the cubes.HierarchyError exception is raised. If level is thesame as path level, nothing happens.

to_dict(**options)Convert to dictionary. Keys:

•name: hierarchy name

•label: human readable label (localizable)

•levels: level names

class cubes.Level(name, attributes, dimension=None, key=None, sort_key=None, la-bel_attribute=None, label=None, info=None)

Object representing a hierarchy level. Holds all level attributes.

This object is immutable, except localization. You have to set up all attributes in the initialisation process.

Attributes:

•name: level name

•dimension: dimnesion the level is associated with

•attributes: list of all level attributes. Raises ModelError when attribute list is empty.

•key: name of level key attribute (for example: customer_number for customer level,region_code for region level, month for month level). key will be used as a grouping field foraggregations. Key should be unique within level. If not specified, then the first attribute is used as key.

•sort_key: name of attribute that is going to be used for sorting



•label_attribute: name of attribute containing label to be displayed (for example: customer_namefor customer level, region_name for region level, month_name for month level)

•label: human readable label of the level

•info: custom information dictionary, might be used to store application/front-end specific information

attribute(name)Get attribute by name

has_detailsIs True when level has more than one attribute, for all levels with only one attribute it is False.

to_dict(full_attribute_names=False, **options)Convert to dictionary

class cubes.Attribute(name, label=None, locales=None, order=None, description=None, dimen-sion=None, aggregations=None, info=None, format=None, **kwargs)

Cube attribute - represents any fact field/column

Attributes:

•name - attribute name, used as identifier

•label - attribute label displayed to a user

•locales = list of locales that the attribute is localized to

•order - default order of this attribute. If not specified, then order is unexpected. Possible values are:’asc’ or ’desc’. It is recommended and safe to use Attribute.ASC and Attribute.DESC

•aggregations - list of default aggregations to be performed on this attribute if it is a measure. It isbackend-specific, but most common might be: ’sum’, ’min’, ’max’, ...


•format - application-specific display format information, useful for formatting numeric values of mea-sure attributes

String representation of the Attribute returns its name (without dimension prefix).

cubes.ArgumentError is raised when unknown ordering type is specified.

full_name(dimension=None, locale=None, simplify=True)Return full name of an attribute as if it was part of dimension. Append locale if it is one of of attribute’slocales, otherwise raise cubes.ArgumentError.

ref(simplify=True, locale=None)Return full attribute reference. Append locale if it is one of of attribute’s locales, otherwise raisecubes.ArgumentError. If simplify is True, then reference to an attribute of flat dimension withoutdetails will be just the dimension name.

Warning: This method might be renamed.

Helper function to coalesce list of attributes, which can be provided as strings or as Attribute objects:

cubes.attribute_list(attributes, dimension=None, attribute_class=None)Create a list of attributes from a list of strings or dictionaries. see cubes.coalesce_attribute()for more information.

exception ModelErrorException raised when there is an error with model and its structure, mostly during model construction.

exception ModelIncosistencyErrorRaised when there is incosistency in model structure, mostly when model was created programatically in awrong way by mismatching classes or misonfiguration.

exception NoSuchDimensionErrorRaised when a dimension is requested that does not exist in the model.



exception NoSuchAttributeErrorRaised when an unknown attribute, measure or detail requested.

10.2 Aggregation Browser Reference

Abstraction for aggregated browsing (concrete implementation is provided by one of the backends in packagebackend or a custom backend).

Figure 10.2: Browser package classes.

10.2.1 Aggregate browsing

class cubes.AggregationBrowser(cube)Class for browsing data cube aggregations

Attributes

• cube - cube for browsing

aggregate(cell=None, measures=None, drilldown=None, **options)Return aggregate of a cell.

Subclasses of aggregation browser should implement this method.

Attributes

•drilldown - dimensions and levels through which to drill-down, default None

•measures - list of measures to be aggregated. By default all measures are aggregated.

Drill down can be specified in two ways: as a list of dimensions or as a dictionary. If it is specified aslist of dimensions, then cell is going to be drilled down on the next level of specified dimension. Sayyou have a cell for year 2010 and you want to drill down by months, then you specify drilldown= ["date"].

If drilldown is a dictionary, then key is dimension or dimension name and value is last level to bedrilled-down by. If the cell is at year level and drill down is: { "date": "day" } then bothmonth and day levels are added.

10.2. Aggregation Browser Reference 69


If there are no more levels to be drilled down, an exception is raised. Say your model has three levelsof the date dimension: year, month, day and you try to drill down by date at the next level thenValueError will be raised.

Retruns a :class:AggregationResult object.

cell_details(cell=None, dimension=None)Returns details for the cell. Returned object is a list with one element for each cell cut. If dimensionis specified, then details only for cuts that use the dimension are returned.

Default implemenatation calls AggregationBrowser.cut_details() for each cut. Backends might cus-tomize this method to make it more efficient.

cut_details(cut)Returns details for a cut which should be a Cut instance.

•PointCut - all attributes for each level in the path

•SetCut - list of PointCut results, one per path in the set

•RangeCut - PointCut-like results for lower range (from) and upper range (to)

Default implemenatation uses AggregationBrowser.values() for each path. Backends might customizethis method to make it more efficient.

dimension_object(dimension)Helper function to return proper dimension object as a subclass of Dimension.

Warning: Depreciated. Use cubes.Cube.dimension()

Arguments:

•dimension - a dimension object or a string, if it is a string, then dimension object is retrieved fromcube

fact(key)Returns a single fact from cube specified by fact key key

facts(cell=None, **options)Return an iterable object with of all facts within cell

features = []List of browser features as strings.

report(cell, queries)Bundle multiple requests from queries into a single one.

Keys of queries are custom names of queries which caller can later use to retrieve respective queryresult. Values are dictionaries specifying arguments of the particular query. Each query should containat least one required value query which contains name of the query function: aggregate, facts,fact, values and cell cell (for cell details). Rest of values are function specific, please refer tothe respective function documentation for more information.

Example:

queries = {"product_summary" = { "query": "aggregate",

"drilldown": "product" }"year_list" = { "query": "values",

"dimension": "date","depth": 1 }

}

Result is a dictionary where keys wil lbe the query names specified in report specification and valueswill be result values from each query call.:



result = browser.report(cell, queries)product_summary = result["product_summary"]year_list = result["year_list"]

This method provides convenient way to perform multiple common queries at once, for example youmight want to have always on a page: total transaction count, total transaction amount, drill-down byyear and drill-down by transaction type.

Raises cubes.ArgumentError when there are no queries specified or if a query is of unknown type.

Roll-up

Report queries might contain rollup specification which will result in “rolling-up” one or moredimensions to desired level. This functionality is provided for cases when you would like to report athigher level of aggregation than the cell you provided is in. It works in similar way as drill down inAggregationBrowser.aggregate() but in the opposite direction (it is like cd .. in a UNIXshell).

Example: You are reporting for year 2010, but you want to have a bar chart with all years. You specifyrollup:..."rollup": "date",...

Roll-up can be:

•a string - single dimension to be rolled up one level

•an array - list of dimension names to be rolled-up one level

•a dictionary where keys are dimension names and values are levels to be rolled up-to

Future

In the future there might be optimisations added to this method, therefore it will become faster thansubsequent separate requests. Also when used with Slicer OLAP service server number of HTTP calloverhead is reduced.

values(cell, dimension, depth=None, paths=None, hierarchy=None, **options)Return values for dimension with level depth depth. If depth is None, all levels are returned.

Note: Some backends might support only default hierarchy.

Result

The result of aggregated browsing is returned as object:

class cubes.AggregationResultResult of aggregation or drill down.

Attributes:

•cell – cell that this result is aggregate of

•summary - dictionary of summary row fields

•cells - list of cells that were drilled-down

•total_cell_count - number of total cells in drill-down (after limit, before pagination)

•measures – measures that were selected in aggregation

•remainder - summary of remaining cells (not yet implemented)

•levels – aggregation levels for dimensions that were used to drilldown



Note: Implementors of aggregation browsers should populate cell, measures and levels from the aggregatequery.

cached()Return shallow copy of the receiver with cached cells. If cells are an iterator, they are all fetched in alist.

cross_table(onrows, oncolumns, measures=None)Creates a cross table from result’s cells. onrows contains list of attribute names to be placed at rowsand oncolumns contains list of attribute names to be placet at columns. measures is a list of measuresto be put into cells. If measures are not specified, then only record_count is used.

Returns a named tuble with attributes:

•columns - labels of columns. The tuples correspond to values of attributes in oncolumns.

•rows - labels of rows as list of tuples. The tuples correspond to values of attributes in onrows.

•data - list of measure data per row. Each row is a list of measure tuples as specified in measures.

Warning: Experimental implementation. Interface might change - either arguments or resultobject.

table_rows(dimension, depth=None)Returns iterator of drilled-down rows which yields a named tuple with named attributes: (key, label,path, record). depth is last level of interest. If not specified (set to None) then deepest level fordimension is used.

•key: value of key dimension attribute at level of interest

•label: value of label dimension attribute at level of interest

•path: full path for the drilled-down cell

•is_base: True when dimension element is base (can not drill down more)

•record: all drill-down attributes of the cell

Example use:

for row in result.table_rows(dimension):print "%s: %s" % (row.label, row.record["record_count"])

dimension has to be cubes.Dimension object. Raises TypeError when cut for dimension is notPointCut.

to_dict()Return dictionary representation of the aggregation result. Can be used for JSON serialisation.

10.2.2 Slicing and Dicing

class cubes.Cell(cube=None, cuts=[])Part of a cube determined by slicing dimensions. Immutable object.

cut_for_dimension(dimension)Return first found cut for given dimension

drilldown(dimension, value)Create another cell by drilling down dimension next level on current level’s key value.

Example:



cell = cubes.Cell(cube)cell = cell.drilldown("date", 2010)cell = cell.drilldown("date", 1)

is equivalent to:

cut = cubes.PointCut(“date”, [2010, 1]) cell = cubes.Cell(cube, [cut])

Reverse operation is cubes.rollup("date")

Works only if the cut for dimension is PointCut. Otherwise the behaviour is undefined.

Returns: new derived cell object.

is_base(dimension, hierarchy=None)Returns True when cell is base cell for dimension. Cell is base if there is a point cut with pathreferring to the most detailed level of the dimension hierarchy.

level_depths()Returns a dictionary of dimension names as keys and level depths (index of deepest level).

multi_slice(cuts)Create another cell by slicing through multiple slices. cuts is a list of Cut object instances. See alsoCell.slice().

point_cut_for_dimension(dimension)Return first point cut for given dimension

point_slice(dimension, path)Create another cell by slicing receiving cell through dimension at path. Receiving object is not modi-fied. If cut with dimension exists it is replaced with new one. If path is empty list or is none, then cutfor given dimension is removed.

Example:

full_cube = Cell(cube)contracts_2010 = full_cube.point_slice("date", [2010])

Returns: new derived cell object.

Warning: Depreiated. Use cell.slice() instead with argument PointCut(dimension, path)

rollup(rollup)Rolls-up cell - goes one or more levels up through dimension hierarchy. It works in similar way asdrill down in AggregationBrowser.aggregate() but in the opposite direction (it is like cd.. in a UNIX shell).

Roll-up can be:

•a string - single dimension to be rolled up one level

•an array - list of dimension names to be rolled-up one level

•a dictionary where keys are dimension names and values are levels to be rolled up-to

Note: Only default hierarchy is currently supported.

rollup_dim(dimension, level=None, hierarchy=None)Rolls-up cell - goes one or more levels up through dimension hierarchy. If there is no level to go up(we are at the top level), then the cut is removed.

Returns new cell object.

Note: Only default hierarchy is currently supported.



slice(cut, dummy=None)Returns new cell by slicing receiving cell with cut. Cut with same dimension as cut will be replaced,if there is no cut with the same dimension, then the cut will be appended.

to_dict()Returns a dictionary representation of the cell

to_str()Return string representation of the cell by using standard cuts-to-string conversion.

Cuts

class cubes.PointCut(dimension, path, hierarchy=None)Object describing way of slicing a cube (cell) through point in a dimension

level_depth()Returns index of deepest level.

to_dict()Returns dictionary representation of the receiver. The keys are: dimension, type‘=‘‘point‘ and path.

class cubes.RangeCut(dimension, from_path, to_path, hierarchy=None)Object describing way of slicing a cube (cell) between two points of a dimension that has ordered points.For dimensions with unordered points behaviour is unknown.

level_depth()Returns index of deepest level which is equivalent to the longest path.

to_dict()Returns dictionary representation of the receiver. The keys are: dimension, type‘=‘‘range‘, from andto paths.

class cubes.SetCut(dimension, paths, hierarchy=None)Object describing way of slicing a cube (cell) between two points of a dimension that has ordered points.For dimensions with unordered points behaviour is unknown.

level_depth()Returns index of deepest level which is equivalent to the longest path.

to_dict()Returns dictionary representation of the receiver. The keys are: dimension, type‘=‘‘range‘ and set asa list of paths.

String conversions

In applications where slicing and dicing can be specified in form of a string, such as arguments of HTTP requestsof an web application, there are couple helper methods that do the string-to-object conversion:

cubes.cuts_from_string(string)Return list of cuts specified in string. You can use this function to parse cuts encoded in a URL.

Examples:

date:2004date:2004,1date:2004,1|class=5date:2004,1,1|category:5,10,12|class:5

Ranges are in form from-to with possibility of open range:

date:2004-2010date:2004,5-2010,3date:2004,5-2010



date:2004,5-date:-2010

Sets are in form path1;path2;path3 (none of the paths should be empty):

date:2004;2010date:2004;2005,1;2010,10

Grammar:

<list> ::= <cut> | <cut> ’|’ <list><cut> ::= <dimension> ’:’ <path><dimension> ::= <identifier><path> ::= <value> | <value> ’,’ <path>

The characters ‘|’, ‘:’ and ‘,’ are configured in CUT_STRING_SEPARATOR, DIMEN-SION_STRING_SEPARATOR, PATH_STRING_SEPARATOR respectively.

cubes.string_from_cuts(cuts)Returns a string represeting cuts. String can be used in URLs

cubes.string_from_path(path)Returns a string representing dimension path. If path is None or empty, then returns empty string. The ptahelements are comma , spearated.

Raises ValueError when path elements contain characters that are not allowed in path element (alphanumericand underscore _).

cubes.path_from_string(string)Returns a dimension point path from string. The path elements are separated by comma , character.

Returns an empty list when string is empty or None.

cubes.levels_from_drilldown(cell, drilldown)Converts drilldown into a list of levels to be used to drill down. drilldown can be:

•list of dimensions

•list of dimension level specifier strings (dimension@hierarchy:level)

•list of tuples in form (dimension, hierarchy, level).

If drilldown is a list of dimensions or if the level is not specified, then next level in the cell is considered.The implicit next level is determined from a ‘PointCut for dimension in the cell.

For other types of cuts, such as range or set, “next” level is the first level of hierarachy.

Returns a list of tuples: (dimension, levels) where levels is a list of levels to be drilled down.

Note: For backward compatibility the function accepts a dictionary where keys are dimension names andvalues are level names to drill up to. This is argument format is depreciated.

Mapper

class cubes.Mapper(cube, locale=None, schema=None, fact_name=None, **options)Abstract class for mappers which maps logical references to physical references (tables and columns).

Attributes:

•cube - mapped cube

•simplify_dimension_references – references for flat dimensions (with one level and no details) willbe just dimension names, no attribute name. Might be useful when using single-table schema, forexample, with couple of one-column dimensions.

•fact_name – fact name, if not specified then cube.name is used



•schema – default database schema

all_attributes(expand_locales=False)Return a list of all attributes of a cube. If expand_locales is True, then localized logical reference isreturned for each attribute’s locale.

attribute(name)Returns an attribute with logical reference name.

logical(attribute, locale=None)Returns logical reference as string for attribute in dimension. If dimension is Null then fact table isassumed. The logical reference might have following forms:

•dimension.attribute - dimension attribute

•attribute - fact measure or detail

If simplify_dimension_references is True then references for flat dimensios without details is dimen-sion.

If locale is specified, then locale is added to the reference. This is used by backends and other mappers,it has no real use in end-user browsing.

map_attributes(attributes, expand_locales=False)Convert attributes to physical attributes. If expand_locales is True then physical reference for everyattribute locale is returned.

physical(attribute, locale=None)Returns physical reference as tuple for attribute, which should be an instance ofcubes.model.Attribute. If there is no dimension specified in attribute, then fact tableis assumed. The returned tuple has structure: (schema, table, column).

This method should be implemented by Mapper subclasses.

relevant_joins(attributes)Get relevant joins to the attributes - list of joins that are required to be able to acces specified attributes.attributes is a list of three element tuples: (schema, table, attribute).

Subclasses sohuld implement this method.

split_logical(reference)Returns tuple (dimension, attribute) from logical_reference string. Syntax of the string is:dimensions.attribute.

class cubes.SnowflakeMapper(cube, mappings=None, locale=None, schema=None,fact_name=None, dimension_prefix=None, joins=None, di-mension_schema=None, **options)

A snowflake schema mapper for a cube. The mapper creates required joins, resolves table names and mapslogical references to tables and respective columns.

Attributes:

•cube - mapped cube

•mappings – dictionary containing mappings

•simplify_dimension_references – references for flat dimensions (with one level and no details) willbe just dimension names, no attribute name. Might be useful when using single-table schema, forexample, with couple of one-column dimensions.

•dimension_prefix – default prefix of dimension tables, if default table name is used in physical refer-ence construction


•schema – default database schema

•dimension_prefix – prefix for dimension tables

•dimension_schema – schema whre dimension tables are stored (if different than fact table schema)



mappings is a dictionary where keys are logical attribute references and values are table column references.The keys are mostly in the form:

•attribute for measures and fact details

•attribute.locale for localized fact details

•dimension.attribute for dimension attributes

•dimension.attribute.locale for localized dimension attributes

The values might be specified as strings in the form table.column (covering most of the cases) or as adictionary with keys schema, table and column for more customized references.

physical(attribute, locale=None)Returns physical reference as tuple for attribute, which should be an instance ofcubes.model.Attribute. If there is no dimension specified in attribute, then fact tableis assumed. The returned tuple has structure: (schema, table, column).

The algorithm to find physicl reference is as follows:

IF localization is requested:IF is attribute is localizable:

IF requested locale is one of attribute localesUSE requested locale

ELSEUSE default attribute locale

ELSEdo not localize

IF mappings exist:GET string for logical referenceIF locale:

append ’.’ and locale to the logical reference

IF mapping value exists for localized logical referenceUSE value as reference

IF no mappings OR no mapping was found:column name is attribute name

IF locale:append ’_’ and locale to the column name

IF dimension specified:# Example: ’date.year’ -> ’date.year’table name is dimension name

IF there is dimension table prefixuse the prefix for table name

ELSE (if no dimension is specified):# Example: ’date’ -> ’fact.date’table name is fact table name

relevant_joins(attributes)Get relevant joins to the attributes - list of joins that are required to be able to acces specified attributes.attributes is a list of three element tuples: (schema, table, attribute).

table_map()Return list of references to all tables. Keys are aliased tables: (schema, aliased_table_name) andvalues are real tables: (schema, table_name). Included is the fact table and all tables mentioned injoins.

To get list of all physical tables where aliased tablesare included only once:



finder = JoinFinder(cube, joins, fact_name)tables = set(finder.table_map().keys())

class cubes.DenormalizedMapper(cube, locale=None, schema=None, fact_name=None,denormalized_view_prefix=None, denormal-ized_view_schema=None, **options)

Creates a mapper for a cube that has data stored in a denormalized view/table.

Attributes:

•denormalized_view_prefix – default prefix used for constructing view name from cube name


•schema – schema where the denormalized view is stored

•fact_schema – database schema for the original fact table

physical(attribute, locale=None)Returns same name as localized logical reference.

relevant_joins(attributes)Returns an empty list. No joins are necessary for denormalized view.

10.3 backends — Aggregation Browsing Backends

10.3.1 SQL - Star

Workspace

cubes.backends.sql.workspace.create_workspace(model, **options)Create workspace for model with configuration in dictionary options. This method is used by the slicerserver.

The options are:

Required (one of the two, engine takes precedence):

•url - database URL in form of: backend://user:password@host:port/database

•engine - SQLAlchemy engine - either this or URL should be provided

Optional:

•schema - default schema, where all tables are located (if not explicitly stated otherwise)

•fact_prefix - used by the snowflake mapper to find fact table for a cube, when no explicit fact tablename is specified

•dimension_prefix - used by snowflake mapper to find dimension tables when no explicit mapping isspecified

•dimension_schema – schema where dimension tables are stored, if different than common schema.

Options for denormalized views:

•use_denormalization - browser will use dernormalized view instead of snowflake

•denormalized_view_prefix - if denormalization is used, then this prefix is added for cube name to findcorresponding cube view

•denormalized_view_schema - schema wehere denormalized views are located (use this if the viewsare in different schema than fact tables, otherwise default schema is going to be used)

class cubes.backends.sql.workspace.SQLStarWorkspace(model, engine, **options)Create a workspace. For description of options see create_workspace()



browser(cube, locale=None)Returns a browser for a cube.

browser_for_cube(cube, locale=None)Creates, configures and returns a browser for a cube.

Note: Use workspace.browser() instead.

create_conformed_rollup(cube, dimension, level=None, hierarchy=None, schema=None,dimension_prefix=None, replace=False)

Extracts dimension values at certain level into a separate table. The new table name will be com-posed of dimension_prefix, dimension name and suffixed by dimension level. For example a productdimension at category level with prefix dim_ will be called dim_product_category

Attributes:

•dimension – dimension to be extracted

•level – grain level

•hierarchy – hierarchy to be used

•schema – target schema

•dimension_prefix – prefix used for the dimension table

•replace – if True then existing table will be replaced, otherwise an exception is raised if tablealready exists.

create_conformed_rollups(cube, dimensions, grain=None, schema=None, dimen-sion_prefix=None, replace=False)

Extract multiple dimensions from a snowflake. See extract_dimension() for more information. grainis a dictionary where keys are dimension names and values are levels, if level is None then all levelsare considered.

create_cube_aggregate(cube, table_name=None, dimensions=None, re-quired_dimensions=None, schema=None, replace=False)

Creates an aggregate table. If dimensions is None then all cube’s dimensions are considered.

Arguments:

•dimensions: list of dimensions to use in the aggregated cuboid, if None then all cube dimensionsare used

•required_dimensions: list of dimensions that are required for each aggregation (for example adate dimension in most of the cases). The list should be a subsed of dimensions.

•aggregates_prefix: aggregated table prefix

•aggregates_schema: schema where aggregates are stored

create_denormalized_view(cube, view_name=None, materialize=False, replace=False,create_index=False, keys_only=False, schema=None)

Creates a denormalized view named view_name of a cube. If view_name is None then view name isconstructed by pre-pending value of denormalized_view_prefix from workspace options to the cubename. If no prefix is specified in the options, then view name will be equal to the cube name.

Options:

•materialize - whether the view is materialized (a table) or regular view

•replace - if True then existing table/view will be replaced, otherwise an exception is raised whentrying to create view/table with already existing name

•create_index - if True then index is created for each key attribute. Can be used only on material-ized view, otherwise raises an exception

•keys_only - if True then only key attributes are used in the view, all other detail attributes areignored

10.3. backends — Aggregation Browsing Backends 79


•schema - target schema of the denormalized view, if not specified, then denormal-ized_view_schema from options is used if specified, otherwise default workspace schema is used(same schema as fact table schema).

validate_model()Validate physical representation of model. Returns a list of dictionaries with keys: type, issue,object.

Types might be: join or attribute.

The join issues are:

•no_table - there is no table for join

•duplicity - either table or alias is specified more than once

The attribute issues are:

•no_table - there is no table for attribute

•no_column - there is no column for attribute

•duplicity - attribute is found more than once

Browser

class cubes.backends.sql.star.SnowflakeBrowser(cube, connectable=None, locale=None,metadata=None, debug=False, **op-tions)

SnowflakeBrowser is a SQL-based AggregationBrowser implementation that can aggregate star andsnowflake schemas without need of having explicit view or physical denormalized table.

Attributes:

•cube - browsed cube

•connectable - SQLAlchemy connectable object (engine or connection)

•locale - locale used for browsing

•metadata - SQLAlchemy MetaData object

•debug - output SQL to the logger at INFO level

•options - passed to the mapper and context (see their respective documentation)

Limitations:

•only one locale can be used for browsing at a time

•locale is implemented as denormalized: one column for each language

aggregate(cell=None, measures=None, drilldown=None, attributes=None, page=None,page_size=None, order=None, **options)

Return aggregated result.

Number of database queries:

•without drill-down: 1 (summary)

•with drill-down: 3 (summary, drilldown, total drill-down record count)

fact(key_value)Get a single fact with key key_value from cube.

facts(cell, order=None, page=None, page_size=None)Return all facts from cell, might be ordered and paginated.



validate()Validate physical representation of model. Returns a list of dictionaries with keys: type, issue,object.

Types might be: join or attribute.

The join issues are:

•no_table - there is no table for join

•duplicity - either table or alias is specified more than once

The attribute issues are:

•no_table - there is no table for attribute

•no_column - there is no column for attribute

•duplicity - attribute is found more than once

values(cell, dimension, depth=None, paths=None, hierarchy=None, page=None,page_size=None, order=None, **options)

Return values for dimension with level depth depth. If depth is None, all levels are returned.

Number of database queries: 1.

class cubes.backends.sql.star.QueryContext(cube, mapper, metadata, **options)Object providing context for constructing queries. Puts together the mapper and physical structure. map-per - which is used for mapping logical to physical attributes and performing joins. metadata is asqlalchemy.MetaData instance for getting physical table representations.

Object attributes:

•fact_table – the physical fact table - sqlalchemy.Table instance

•tables – a dictionary where keys are table references (schema, table) or (shchema, alias) to real tables- sqlalchemy.Table instances

Note: To get results as a dictionary, you should zip() the returned rows after statement execution with:

labels = [column.name for column in statement.columns] ... record = dict(zip(labels, row))

This is little overhead for a workaround for SQLAlchemy behaviour in SQLite database. SQLite enginedoes not respect dots in column names which results in “duplicate column name” error.

aggregation_statement(cell, measures=None, attributes=None, drilldown=None)Return a statement for summarized aggregation. whereclause is same as SQLAlchemy whereclausefor sqlalchemy.sql.expression.select(). attributes is list of logical references to attributes to be selected.If it is None then all attributes are used. drilldown has to be a dictionary. Use levels_from_drilldown()to prepare correct drill-down statement.

aggregations_for_measure(measure)Returns list of aggregation functions (sqlalchemy) on measure columns. The result columns are la-beled as measure + _ = aggregation, for example: amount_sum or discount_min.

measure has to be Attribute instance.

If measure has no explicit aggregations associated, then sum is assumed.

boundary_condition(dim, hierarchy, path, bound, first=None)Return a Condition tuple for a boundary condition. If bound is 1 then path is considered to be upperbound (operators < and <= are used), otherwise path is considered as lower bound (operators > and >=are used )

column(attribute, locale=None)Return a column object for attribute. locale is explicit locale to be used. If not specified, then thecurrent browsing/mapping locale is used for localizable attributes.

10.3. backends — Aggregation Browsing Backends 81


columns(attributes, expand_locales=False)Returns list of columns.If expand_locales is True, then one column per attribute locale is added.

condition_for_cell(cell)Constructs conditions for all cuts in the cell. Returns a named tuple with keys:

•condition - SQL conditions

•attributes - attributes that are involved in the conditions. This should be used for join con-struction.

•group_by - attributes used for GROUP BY expression

condition_for_point(dim, path, hierarchy=None)Returns a Condition tuple (attributes, conditions, group_by) dimension dim point at path. It is acompound condition - one equality condition for each path element in form: level[i].key =path[i]

denormalized_statement(attributes=None, expand_locales=False, include_fact_key=True)Return a statement (see class description for more information) for denormalized view. whereclause issame as SQLAlchemy whereclause for sqlalchemy.sql.expression.select(). attributes is list of logicalreferences to attributes to be selected. If it is None then all attributes are used.

Set expand_locales to True to expand all localized attributes.

fact_statement(key_value)Return a statement for selecting a single fact based on key_value

join_expression(joins)Create partial expression on a fact table with joins that can be used as core for a SELECT statement.join is a list of joins returned from mapper (most probably by Mapper.relevant_joins())

join_expression_for_attributes(attributes, expand_locales=False)Returns a join expression for attributes

range_condition(dim, hierarchy, from_path, to_path)Return a condition for a hierarchical range (from_path, to_path). Return value is a Condition tuple.

table(schema, table_name)Return a SQLAlchemy Table instance. If table was already accessed, then existing table is returned.Otherwise new instance is created.

If schema is None then browser’s default schema is used.

Helper functions

cubes.backends.sql.star.paginated_statement(statement, page, page_size)Returns paginated statement if page is provided, otherwise returns the same statement.

cubes.backends.sql.star.ordered_statement(statement, order, context)Returns a SQL statement which is ordered according to the order. If the statement contains attributes thathave natural order specified, then the natural order is used, if not overriden in the order.

cubes.backends.sql.star.order_column(column, order)Orders a column according to order specified as string.

10.3.2 Slicer

This backend is just for backend development demonstration purposes.

class cubes.backends.slicer.SlicerBrowser(cube, url, locale=None)Demo backend browser. This backend is serves just as example of a backend. Uses another Slicer serverinstance for doing all the work. You might use it as a template for your own browser.

Attributes:



•cube – obligatory, but currently unused here

•url - base url of Cubes Slicer OLAP server

10.3.3 Implementing Custom Backend

Custom backend is just a subclass of cubes.AggregationBrowser class.

Slicer and Server Integration

If the backend is intended to be used by the Slicer server, then backend should be placed in its own module.The module should contain a method create_workspace(model, config) which returns a workspace object. configis a configuration dictionary taken from the config file (see below). The workspace object should implementbrowser_for_cube(cube, locale) which returns an AggregationBrowser subclass.

The create_workspace() can be compared to create_engine() in a database abstraction framework and thebrowser_for_cube() can be compared to engine.connect() to get a connection from a pool.

The configuration for create_workspace() comes from slicer .ini configuration file in section [workspace]and is provided as dict object.

10.4 HTTP WSGI OLAP Server Reference

Light-weight HTTP WSGI server based on the Werkzeug framework. For more information about using the serversee OLAP Server.

cubes.server.run_server(config)Run OLAP server with configuration specified in config

class cubes.server.Slicer(config=None)Create a WSGI server for providing OLAP web service. You might provide config as ConfigParserobject.

localized_model(locale=None)Tries to translate the model. Looks for language in configuration file under [translations], if notranslation is provided, then model remains untouched.

10.5 Utility functions

Utility functions for computing combinations of dimensions and hierarchy levels

cubes.common.get_logger()Get brewery default logger

cubes.common.create_logger()Create a default logger

class cubes.common.IgnoringDictionarySimple dictionary extension that will ignore any keys of which values are empty (None/False)

setnoempty(key, value)Set value in a dictionary if value is not null

class cubes.common.MissingPackage(package, feature=None, source=None, comment=None)Bogus class to handle missing optional packages - packages that are not necessarily required for Cubes, butare needed for certain features.

cubes.common.localize_common(obj, trans)Localize common attributes: label and description

10.4. HTTP WSGI OLAP Server Reference 83



cubes.common.localize_attributes(attribs, translations)Localize list of attributes. translations should be a dictionary with keys as attribute names, values aredictionaries with localizable attribute metadata, such as label or description.

cubes.common.get_localizable_attributes(obj)Returns a dictionary with localizable attributes of obj.

cubes.common.collect_subclasses(parent, suffix=None)Collect all subclasses of parent and return a dictionary where keys are decamelized class names transformedto identifiers and with suffix removed.


CHAPTER

ELEVEN

DEVELOPING CUBES

This chapter describes some guidelines how to contribute to the Cubes.

11.1 General

• If you are puzzled why is something implemented certain way, ask before complaining. There might be areason that is not explained and that should be described in documentation. Or there might not even be areason for current implementation at all, and you suggestion might help.

• Until 1.0 the interface is not 100% decided and might change

• Focus is on usability, simplicity and understandability. There might be places where this might not becompletely achieved and this feature of code should be considered as bug. For example: overcomplicatedinterface, too long call sequence which can be simplified, required over-configuration,...

• Magic is not bad, if used with caution and the mechanic is well documented. Also there should be a wayhow to do it manually for every magical feature.

11.2 New or changed feature checklist

• add/change method

• add docstring documentation

• reflect documentation

– are any examples affected?

• commit

85


86 Chapter 11. Developing Cubes

CHAPTER

TWELVE

DEVELOPMENT NOTES

This chapter contains notes related to Cubes development, such as:

• unresolved design decisions

• suggestions

• proposals for changes

• explaination for certain design decisions

I’ve included this document as part of documentation to get more feedback or to help understanding why certainthings are done in certain way at the time being.

12.1 Fact Table

Currently all models are required to specify fact table. This can be somehow discovered from model and modelmapping. Or from database schema itself.

87


88 Chapter 12. Development Notes

CHAPTER

THIRTEEN

CONTACT AND GETTING HELP

If you have questions, problems or suggestions, you can send a message to Google group or write to the author(Stefan Urbanek).

Report bugs in github issues tracking

There is an IRC channel #databrewery on server irc.freenode.net.

89

http://groups.google.com/group/cubes-discuss

mailto:[email protected]

https://github.com/Stiivi/cubes/issues


90 Chapter 13. Contact and Getting Help

CHAPTER

FOURTEEN

LICENSE

Cubes is licensed under MIT license with small addition:

Copyright (c) 2011-2012 Stefan Urbanek, see AUTHORS for more details

Permission is hereby granted, free of charge, to any person obtaining acopy of this software and associated documentation files (the "Software"),to deal in the Software without restriction, including without limitationthe rights to use, copy, modify, merge, publish, distribute, sublicense,and/or sell copies of the Software, and to permit persons to whom theSoftware is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included inall copies or substantial portions of the Software.

If your version of the Software supports interaction with it remotelythrough a computer network, the above copyright notice and this permissionnotice shall be accessible to all users.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS ORIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THEAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHERLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISINGFROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHERDEALINGS IN THE SOFTWARE.

Simply said, that if you use it as part of software as a service (SaaS) you have to provide the copyright notice inan about, legal info, credits or some similar kind of page or info box.

91


92 Chapter 14. License

CHAPTER

FIFTEEN

INDICES AND TABLES

• genindex

• modindex

• search

93


94 Chapter 15. Indices and tables

PYTHON MODULE INDEX

bbackends, 78

ccubes.common, 83

sserver, 83

95


96 Python Module Index

INDEX

Aadd_cube() (cubes.Model method), 64add_dimension() (cubes.Cube method), 65add_dimension() (cubes.Model method), 64aggregate() (cubes.AggregationBrowser method), 69aggregate() (cubes.backends.sql.star.SnowflakeBrowser

method), 80aggregation_statement()

(cubes.backends.sql.star.QueryContextmethod), 81

AggregationBrowser (class in cubes), 69AggregationResult (class in cubes), 71aggregations_for_measure()


all_attributes() (cubes.Dimension method), 66all_attributes() (cubes.Hierarchy method), 67all_attributes() (cubes.Mapper method), 76Attribute (class in cubes), 68attribute() (cubes.Dimension method), 66attribute() (cubes.Level method), 68attribute() (cubes.Mapper method), 76attribute_list() (in module cubes), 68

Bbackends (module), 78boundary_condition() (cubes.backends.sql.star.QueryContext

method), 81browser() (cubes.backends.sql.workspace.SQLStarWorkspace

method), 78browser_for_cube() (cubes.backends.sql.workspace.SQLStarWorkspace

method), 79

Ccached() (cubes.AggregationResult method), 72Cell (class in cubes), 72cell_details() (cubes.AggregationBrowser method), 70collect_subclasses() (in module cubes.common), 84column() (cubes.backends.sql.star.QueryContext

method), 81columns() (cubes.backends.sql.star.QueryContext

method), 81condition_for_cell() (cubes.backends.sql.star.QueryContext

method), 82condition_for_point() (cubes.backends.sql.star.QueryContext

method), 82

create_conformed_rollup()(cubes.backends.sql.workspace.SQLStarWorkspacemethod), 79

create_conformed_rollups()(cubes.backends.sql.workspace.SQLStarWorkspacemethod), 79

create_cube() (in module cubes), 62create_cube_aggregate()

(cubes.backends.sql.workspace.SQLStarWorkspacemethod), 79

create_denormalized_view()(cubes.backends.sql.workspace.SQLStarWorkspacemethod), 79

create_dimension() (in module cubes), 62create_level() (in module cubes), 63create_logger() (in module cubes.common), 83create_model() (in module cubes), 61create_workspace() (in module

cubes.backends.sql.workspace), 78cross_table() (cubes.AggregationResult method), 72Cube (class in cubes), 64cube() (cubes.Model method), 64cubes.common (module), 83cut_details() (cubes.AggregationBrowser method), 70cut_for_dimension() (cubes.Cell method), 72cuts_from_string() (in module cubes), 74

Ddefault_hierarchy (cubes.Dimension attribute), 66denormalized_statement()


DenormalizedMapper (class in cubes), 78Dimension (class in cubes), 65dimension() (cubes.Cube method), 65dimension() (cubes.Model method), 64dimension_object() (cubes.AggregationBrowser

method), 70drilldown() (cubes.Cell method), 72

Ffact() (cubes.AggregationBrowser method), 70fact() (cubes.backends.sql.star.SnowflakeBrowser

method), 80fact_statement() (cubes.backends.sql.star.QueryContext

method), 82

97


facts() (cubes.AggregationBrowser method), 70facts() (cubes.backends.sql.star.SnowflakeBrowser

method), 80features (cubes.AggregationBrowser attribute), 70full_name() (cubes.Attribute method), 68

Gget_localizable_attributes() (in module

cubes.common), 84get_logger() (in module cubes.common), 83

Hhas_details (cubes.Dimension attribute), 66has_details (cubes.Level attribute), 68Hierarchy (class in cubes), 66hierarchy() (cubes.Dimension method), 66

IIgnoringDictionary (class in cubes.common), 83is_base() (cubes.Cell method), 73is_flat (cubes.Dimension attribute), 66is_last() (cubes.Hierarchy method), 67is_valid() (cubes.Model method), 64

Jjoin_expression() (cubes.backends.sql.star.QueryContext

method), 82join_expression_for_attributes()


Kkey_attributes() (cubes.Dimension method), 66key_attributes() (cubes.Hierarchy method), 67

LLevel (class in cubes), 67level() (cubes.Dimension method), 66level_depth() (cubes.PointCut method), 74level_depth() (cubes.RangeCut method), 74level_depth() (cubes.SetCut method), 74level_depths() (cubes.Cell method), 73level_index() (cubes.Hierarchy method), 67level_names (cubes.Dimension attribute), 66levels (cubes.Dimension attribute), 66levels_for_depth() (cubes.Hierarchy method), 67levels_for_path() (cubes.Hierarchy method), 67levels_from_drilldown() (in module cubes), 75load_model() (in module cubes), 61localizable_dictionary() (cubes.Model method), 64localize() (cubes.Model method), 64localize_attributes() (in module cubes.common), 83localize_common() (in module cubes.common), 83localized_model() (cubes.server.Slicer method), 83logical() (cubes.Mapper method), 76

Mmap_attributes() (cubes.Mapper method), 76

Mapper (class in cubes), 75measure() (cubes.Cube method), 65MissingPackage (class in cubes.common), 83Model (class in cubes), 63ModelError, 68ModelIncosistencyError, 68multi_slice() (cubes.Cell method), 73

Nnext_level() (cubes.Hierarchy method), 67NoSuchAttributeError, 68NoSuchDimensionError, 68

Oorder_column() (in module cubes.backends.sql.star), 82ordered_statement() (in module

cubes.backends.sql.star), 82

Ppaginated_statement() (in module

cubes.backends.sql.star), 82path_from_string() (in module cubes), 75path_is_base() (cubes.Hierarchy method), 67physical() (cubes.DenormalizedMapper method), 78physical() (cubes.Mapper method), 76physical() (cubes.SnowflakeMapper method), 77point_cut_for_dimension() (cubes.Cell method), 73point_slice() (cubes.Cell method), 73PointCut (class in cubes), 74previous_level() (cubes.Hierarchy method), 67

QQueryContext (class in cubes.backends.sql.star), 81

Rrange_condition() (cubes.backends.sql.star.QueryContext

method), 82RangeCut (class in cubes), 74ref() (cubes.Attribute method), 68relevant_joins() (cubes.DenormalizedMapper method),

78relevant_joins() (cubes.Mapper method), 76relevant_joins() (cubes.SnowflakeMapper method), 77remove_cube() (cubes.Model method), 64remove_dimension() (cubes.Cube method), 65remove_dimension() (cubes.Model method), 64report() (cubes.AggregationBrowser method), 70rollup() (cubes.Cell method), 73rollup() (cubes.Hierarchy method), 67rollup_dim() (cubes.Cell method), 73run_server() (in module cubes.server), 83

Sserver (module), 83SetCut (class in cubes), 74setnoempty() (cubes.common.IgnoringDictionary

method), 83

98 Index


simple_model() (in module cubes), 63slice() (cubes.Cell method), 73Slicer (class in cubes.server), 83SlicerBrowser (class in cubes.backends.slicer), 82SnowflakeBrowser (class in cubes.backends.sql.star),

80SnowflakeMapper (class in cubes), 76split_logical() (cubes.Mapper method), 76SQLStarWorkspace (class in

cubes.backends.sql.workspace), 78string_from_cuts() (in module cubes), 75string_from_path() (in module cubes), 75

Ttable() (cubes.backends.sql.star.QueryContext method),

82table_map() (cubes.SnowflakeMapper method), 77table_rows() (cubes.AggregationResult method), 72to_dict() (cubes.AggregationResult method), 72to_dict() (cubes.Cell method), 74to_dict() (cubes.Cube method), 65to_dict() (cubes.Dimension method), 66to_dict() (cubes.Hierarchy method), 67to_dict() (cubes.Level method), 68to_dict() (cubes.Model method), 64to_dict() (cubes.PointCut method), 74to_dict() (cubes.RangeCut method), 74to_dict() (cubes.SetCut method), 74to_str() (cubes.Cell method), 74

Vvalidate() (cubes.backends.sql.star.SnowflakeBrowser

method), 80validate() (cubes.Cube method), 65validate() (cubes.Dimension method), 66validate() (cubes.Model method), 64validate_model() (cubes.backends.sql.workspace.SQLStarWorkspace

method), 80values() (cubes.AggregationBrowser method), 71values() (cubes.backends.sql.star.SnowflakeBrowser

method), 81

Index 99

Documents

Cubes - Lightweight Python OLAP Framework