Cubes – pluggable model explained

Description of the new pluggable model in Cubes, a lightweight Python OLAP framework.

Pluggable Model

Cubes Analytical Workspace Redesign

data brewery

February 2014

Stefan Urbanek – @Stiivi

Original Cubes

Cubes before 1.0

Model

■ single JSON or a model bundle

■ contains all model objects

■ full description required

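[Figure: modules of the original Cubes: backends, model, browser, server, http, workspace, formatters. The model is one file or one directory bundle; there is one backend per serving.]

Backend configuration, one per serving:

    [workspace]
    backend=sql
    url=postgresql://localhost/database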

Browser

[Figure: the Aggregation Browser and its implementations: SQL Snowflake Browser, SQL Denormalized Browser, MongoDB Browser, Some HTTP Data Service Browser, ?]

multiple backends available

Backend

■ implemented as a Python module with an entry point create_workspace()

■ provides Workspace and Browser: the workspace represents data storage

■ only one Workspace per serving: only one kind of storage per serving

Requirements

Model

■ composed of multiple parts

■ external model definition: provided from an external source, such as an analytical service

■ shared dimension descriptions: only one dimension description is necessary per composed model

Backend

■ heterogeneous storage: multiple data stores, different types of data stores

■ different schemas in the same store

■ multiple environments: dev, test, production, ...

Redesign

Backend

■ a “backend” is now multiple objects: Browser, Store, Provider

■ better plug-in system instead of a Python module

■ more flexible composition

Backend Objects

■ Browser – performs aggregated browsing

■ Store – maintains database connection

■ Model Provider – provides model

Note: not every kind has to be implemented
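
The split into three kinds of objects can be pictured with a minimal sketch. Every name below (`register`, `create`, `DummyStore`, `DummyBrowser`) is hypothetical and is not the Cubes API; the sketch only illustrates how a name-based plug-in registry lets a backend supply some kinds of objects and leave out others.

```python
# Hypothetical sketch, not the Cubes API: a tiny name-based plug-in registry.
_registry = {"browser": {}, "store": {}, "provider": {}}


def register(kind, name):
    """Class decorator that files a class under (kind, name)."""
    def decorator(cls):
        _registry[kind][name] = cls
        return cls
    return decorator


def create(kind, name, **options):
    """Instantiate a registered object, or fail with a readable error."""
    try:
        cls = _registry[kind][name]
    except KeyError:
        raise LookupError("no %s named %r is registered" % (kind, name))
    return cls(**options)


@register("store", "dummy")
class DummyStore:
    def __init__(self, **options):
        self.options = options


@register("browser", "dummy")
class DummyBrowser:
    def __init__(self, store=None):
        self.store = store


# This imaginary backend ships a store and a browser but no model provider.
store = create("store", "dummy", url="memory://")
browser = create("browser", "dummy", store=store)
```

Asking for `create("provider", "dummy")` raises `LookupError` here, which mirrors the note above: a backend does not have to implement every kind.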

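[Figure: backend objects. The Browser aggregates (∑), the Store connects to the physical data store (database or API), and the Provider creates the model (cubes, dimensions). The diagram separates the logical side (model) from the physical side (data store).]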

Browser

■ depends on the logical model

■ implements aggregation: aggregate(), values(), …

■ gets data from associated store

[Figure: browser. The Browser uses the logical model and aggregates (∑) data obtained through the Store from the physical data store (database or API).]

Browser Methods

■ features()

■ aggregate()

■ members()

■ facts()

■ fact()
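
To make the method list concrete, here is a minimal sketch of a browser over an in-memory list of facts. It is an illustration under simplifying assumptions, not the Cubes implementation: the class name `ListBrowser`, the method signatures, the result keys, and the representation of a cell as a plain `{dimension: value}` dict are all made up for this example.

```python
# Hypothetical sketch: a browser over an in-memory list of facts.
# A "cell" is simplified here to a dict of {dimension: required value}.
class ListBrowser:
    def __init__(self, facts, measure):
        self._facts = facts      # list of dicts, each with an "id" key
        self._measure = measure  # name of the numeric field to sum

    def features(self):
        # Advertise which operations this browser supports.
        return {"actions": ["aggregate", "members", "facts", "fact"]}

    def _in_cell(self, fact, cell):
        return all(fact.get(dim) == value for dim, value in cell.items())

    def facts(self, cell):
        # All facts that fall inside the cell.
        return [f for f in self._facts if self._in_cell(f, cell)]

    def fact(self, key):
        # A single fact by its key, or None if there is no such fact.
        for f in self._facts:
            if f["id"] == key:
                return f
        return None

    def members(self, cell, dimension):
        # Distinct values of one dimension within the cell, sorted.
        return sorted({f[dimension] for f in self.facts(cell)})

    def aggregate(self, cell):
        rows = self.facts(cell)
        return {
            "record_count": len(rows),
            self._measure + "_sum": sum(f[self._measure] for f in rows),
        }


data = [
    {"id": 1, "year": 2013, "country": "sk", "amount": 10},
    {"id": 2, "year": 2013, "country": "uk", "amount": 20},
    {"id": 3, "year": 2014, "country": "sk", "amount": 5},
]
browser = ListBrowser(data, measure="amount")
result = browser.aggregate({"year": 2013})
```

A real browser would push the aggregation down to its store (for example as a SQL GROUP BY) instead of filtering in Python.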

Store

■ provides database or API connection

■ might provide a model

■ slicer tool actions: physical mapping validation, model from schema generation, schema from model generation, schema conversions and optimization, ...

* Store is the former backend’s “Workspace” object

[Figure: store. The Store connects the Browser to the physical data store (database or API).]

Store Methods

Store is not required to implement any methods at this time. Future:

■ validate(cube) – does logical map to physical?

■ create(object) – create physical structure
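
As a rough idea of what "does logical map to physical?" could mean, the sketch below checks a list of logical attribute names against the columns of a SQLite table. The class `SQLiteStore` and its `create(table, columns)` and `validate(table, columns)` signatures are assumptions for this example; the slides only name `validate(cube)` and `create(object)` without defining them further.

```python
import sqlite3


class SQLiteStore:
    """Hypothetical store: keeps a connection and checks physical mappings."""

    def __init__(self, path=":memory:"):
        self.connection = sqlite3.connect(path)

    def create(self, table, columns):
        # Create the physical structure. Identifiers are assumed trusted,
        # since SQL placeholders cannot be used for table or column names.
        column_sql = ", ".join('"%s"' % c for c in columns)
        self.connection.execute('CREATE TABLE "%s" (%s)' % (table, column_sql))

    def validate(self, table, columns):
        # Return the logical attributes that have no physical column.
        rows = self.connection.execute(
            'PRAGMA table_info("%s")' % table
        ).fetchall()
        physical = {row[1] for row in rows}  # row[1] is the column name
        return [c for c in columns if c not in physical]


store = SQLiteStore()
store.create("sales", ["year", "country", "amount"])
missing = store.validate("sales", ["year", "amount", "discount"])
```

Returning the list of missing columns, rather than a bare boolean, is a design choice of this sketch: it gives a command-line tool something specific to report.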

Model Provider

■ creates model from external source

■ might suggest store to be used

[Figure: model provider. On the logical side, the Provider creates the model (cubes, dimensions).]

Provider Methods

■ dimension_metadata(name, templates, locale)

■ cube_metadata(name, locale)

or

■ dimension(name, templates, locale)

■ cube(name, locale)
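
The metadata variant of these methods can be sketched as a provider that serves descriptions parsed from a JSON document, standing in for a response from an external service. The class `JSONModelProvider`, the payload layout, and the simplified signatures (no templates argument, locale ignored) are assumptions made for this illustration, not the Cubes implementation.

```python
import json

# Stand-in for a response from an external service (made up for this sketch).
PAYLOAD = json.dumps({
    "cubes": {"events": {"label": "Events", "dimensions": ["date", "type"]}},
    "dimensions": {
        "date": {"levels": ["year", "month"]},
        "type": {"levels": ["type"]},
    },
})


class JSONModelProvider:
    """Hypothetical provider: serves metadata parsed from a JSON document."""

    def __init__(self, payload):
        self._model = json.loads(payload)

    def _lookup(self, kind, name):
        try:
            metadata = dict(self._model[kind][name])
        except KeyError:
            raise LookupError("unknown %s %r" % (kind, name))
        metadata["name"] = name
        return metadata

    def cube_metadata(self, name, locale=None):
        # locale is accepted but ignored in this sketch
        return self._lookup("cubes", name)

    def dimension_metadata(self, name, locale=None):
        # locale is accepted but ignored in this sketch
        return self._lookup("dimensions", name)


provider = JSONModelProvider(PAYLOAD)
cube = provider.cube_metadata("events")
date = provider.dimension_metadata("date")
```

Because dimensions are looked up separately from cubes, one dimension description can be shared by several cubes, which matches the shared-dimension requirement stated earlier.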

example backends

■ SQL Backend: Snowflake Browser, SQL Store

■ Mongo Backend: Mongo Browser, Mongo Store

■ Google Analytics Backend: GA Browser, GA Store, GA Model Provider

    from cubes import AggregationBrowser, Store


    class SQLStore(Store):
        default_browser_name = "sql_snowflake"

        def __init__(self, **options):
            # initialize the store here; the options come from slicer.ini
            pass

        def validate_cube(self, cube):
            return True  # if valid


    class SQLSnowflakeBrowser(AggregationBrowser):
        def __init__(self, model, locale):
            # initialize the browser
            pass

        def features(self):
            # return list of browser features
            pass

        def aggregate(self, cell, *args, **kwargs):
            # return aggregation of the cell
            # (the remaining parameters are elided on the slide)
            pass

New Workspace

■ global object at library level

■ provides appropriate browser

■ contains run-time configuration

■ might have state persistence

* the former backend Workspace is now Store
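
A minimal sketch of the "provides appropriate browser" responsibility follows. It assumes the workspace keeps three lookup tables (cube to store, store name to store object, store type to browser factory). The class layout and the method names `register_store`, `register_browser`, `add_cube`, and `browser` are hypothetical and chosen only for this illustration.

```python
class Workspace:
    """Hypothetical workspace: knows cubes, stores and browser factories."""

    def __init__(self):
        self._stores = {}      # store name -> (store type, store object)
        self._browsers = {}    # store type -> browser factory
        self._cube_store = {}  # cube name -> store name

    def register_store(self, name, store_type, store):
        self._stores[name] = (store_type, store)

    def register_browser(self, store_type, factory):
        self._browsers[store_type] = factory

    def add_cube(self, cube, store_name):
        self._cube_store[cube] = store_name

    def browser(self, cube):
        # Pick the browser that matches the type of the cube's store.
        store_type, store = self._stores[self._cube_store[cube]]
        return self._browsers[store_type](cube, store)


sql_store, mongo_store = object(), object()

workspace = Workspace()
workspace.register_store("bidata", "sql", sql_store)
workspace.register_store("bidata2", "mongo", mongo_store)
# Tuples stand in for real browser objects to keep the sketch short.
workspace.register_browser("sql", lambda cube, store: ("sql", cube, store))
workspace.register_browser("mongo", lambda cube, store: ("mongo", cube, store))
workspace.add_cube("sales", "bidata")
workspace.add_cube("events", "bidata2")

sales_browser = workspace.browser("sales")
events_browser = workspace.browser("events")
```

The caller only names a cube; which store and which kind of browser serve it is run-time configuration held by the workspace.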

Future Workspace

■ caching

■ cube composition

■ … ?

Workspace Example

heterogeneous environment

[Figure: Workspace]

■ Cubes: sales, churn, events, activations

■ Model Providers: Static Model Provider, API Model Provider

■ Stores: BI Data (Postgres), BI Data 2 (Mongo), Events (API)

[Figure: Workspace]

■ Cubes: sales, churn, events, activations

■ Model Providers: Static Model Provider

■ Stores: BI Data (Postgres), BI Data 2 (Mongo)

■ Models: crm, sales, events

    [workspace]
    models_path: /var/lib/cubes/models

    [models]
    crm: crm.cubesmodel
    sales: sales.cubesmodel
    events: events.cubesmodel

    [datastore_bidata]
    type: sql
    url: postgresql://localhost/crm

    [datastore_bidata2]
    type: mongo
    host: localhost
    collection: events
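
As an illustration of how such a configuration could be read, the sketch below parses the same text with Python's standard `configparser` and collects every section whose name starts with `datastore_`. Treating that prefix as the store name is an assumption drawn from the section names above, not a statement about how Cubes itself loads the file.

```python
from configparser import ConfigParser

SLICER_INI = """
[workspace]
models_path: /var/lib/cubes/models

[models]
crm: crm.cubesmodel
sales: sales.cubesmodel
events: events.cubesmodel

[datastore_bidata]
type: sql
url: postgresql://localhost/crm

[datastore_bidata2]
type: mongo
host: localhost
collection: events
"""

# Interpolation is disabled so that values are taken literally.
parser = ConfigParser(interpolation=None)
parser.read_string(SLICER_INI)

# Collect the options of every [datastore_*] section, keyed by store name.
prefix = "datastore_"
stores = {
    section[len(prefix):]: dict(parser[section])
    for section in parser.sections()
    if section.startswith(prefix)
}
models = dict(parser["models"])
models_path = parser["workspace"]["models_path"]
```

Each entry in `stores` carries a `type`, which is what a workspace would need to choose the matching store and browser implementation.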

Conclusion

■ heterogeneous pluggable environment

■ externally provided models

■ easier backend implementation

Cubes Home: cubes.databrewery.org

GitHub: github.com/Stiivi/cubes

Development Documentation: cubes.databrewery.org/dev/doc/ (for GitHub master HEAD)
