Towards semantic assessment of summarizability in self-service BI

BigNovelTI, September 24 2017 (collocated with ADBIS 2017, Cyprus)

Luis-Daniel Ibañez, Elena Simperl, University of Southampton (UK)

Jose Norberto Mazón, Universidad de Alicante (Spain). Twitter: @jnmazon, email: [email protected]


Data, data everywhere

• The promise of (big) data

• More data, more insights, better decisions

Business Intelligence & OLAP

• Business Intelligence (BI) is used for decision support in organizations

• OLAP (On-Line Analytical Processing)

• Multidimensional modeling

• Fact

• Events of interest for analysis (e.g. sales, treatments of patients...)

• Measures

• Dimension

• Specifies the different ways the data can be viewed, aggregated and sorted (e.g. time, store, customer...)

• Hierarchies

Business Intelligence & OLAP

• The multidimensional model is implemented in cubes

• Example: a cube of sales data

• A product is sold in a supermarket on a specific date

[Figure: sales cube. Fact: Product sales, with measure Quantity. Dimensions and their hierarchies: Time (day, month, year, ...), Supermarket (city, province, country), Product (product, type). Sample members: Madrid, Barcelona, Alicante; Drink, Food, ...; January, May.]

Business Intelligence & OLAP

• OLAP algebra for data analysis

• Roll-up and drill-down to navigate through hierarchies

• Aggregation functions applied to measures

• avg, sum, min, etc.

• Drill-through from one fact to another

Business Intelligence & OLAP

• MD query structure [Rafanelli et al 1996]

• Phenomenon of interest, i.e. the measure to be analyzed

• Category attributes, i.e. the context for analyzing the phenomenon of interest (dimensions)

• Aggregation sets, i.e. subsets of the phenomenon of interest according to several category attributes

• Aggregation functions, i.e. operators applied to the aggregation sets to summarize their factual data
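The parts of this MD query structure can be made concrete with a minimal Python sketch. The class name MDQuery, its field names and the sample facts are illustrative assumptions, not taken from Rafanelli et al. The sketch only shows how aggregation sets arise from grouping the measure by the category attributes before an aggregation function is applied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class MDQuery:
    """Hypothetical container for the parts of an MD query."""
    measure: str                                    # phenomenon of interest
    category_attributes: List[str]                  # context (dimension levels)
    aggregate: Callable[[Sequence[float]], float]   # aggregation function

    def run(self, facts: List[dict]) -> Dict[Tuple, float]:
        # Build the aggregation sets: one list of measure values per
        # combination of category attribute values.
        aggregation_sets: Dict[Tuple, List[float]] = {}
        for fact in facts:
            key = tuple(fact[attr] for attr in self.category_attributes)
            aggregation_sets.setdefault(key, []).append(fact[self.measure])
        # Summarize each aggregation set with the aggregation function.
        return {key: self.aggregate(values)
                for key, values in aggregation_sets.items()}


# Made-up sample facts, for illustration only.
FACTS = [
    {"city": "Madrid", "type": "Drink", "month": "January", "quantity": 9},
    {"city": "Madrid", "type": "Food", "month": "January", "quantity": 8},
    {"city": "Alicante", "type": "Drink", "month": "May", "quantity": 4},
    {"city": "Madrid", "type": "Drink", "month": "May", "quantity": 5},
]

totals = MDQuery("quantity", ["city", "type"], sum).run(FACTS)
```

Here the aggregation sets are derived rather than stored, which keeps the query description down to a measure, its context and one function.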

Business Intelligence & OLAP

• OLAP cubes are easy to use

• Multidimensional queries

• Fast and simple data aggregation

• Data analysis from different contexts

Sales by city and product subtype, for supermarket.region = “Valencia Region”:

                      food              drink
Province  City        Frozen  Fresh     Spirits  Alcohol
Alicante  Albatera       100    200         300      400
Alicante  Elche          500    600         700      800
Valencia  Burjasot       900   1000        1100     1200
Valencia  Cullera       1300   1400        1500     1600

Sales’, rolled up to province and product.type:

Province   Food   Drink
Alicante   1400   2200
Valencia   4600   5400
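The roll-up in the Sales example can be reproduced with a short Python sketch. It assumes the sixteen cell values of the figure are laid out as city by product subtype (Albatera: 100 to 400, Elche: 500 to 800, Burjasot: 900 to 1200, Cullera: 1300 to 1600). The function and variable names are illustrative only.

```python
from collections import defaultdict

# Cell values as read from the figure (layout is an assumption).
CITIES = ["Albatera", "Elche", "Burjasot", "Cullera"]
SUBTYPES = ["Frozen", "Fresh", "Spirits", "Alcohol"]
SALES = {
    (city, subtype): 100 * (4 * row + col + 1)
    for row, city in enumerate(CITIES)
    for col, subtype in enumerate(SUBTYPES)
}

# Hierarchy steps: city -> province and product subtype -> product type.
CITY_TO_PROVINCE = {
    "Albatera": "Alicante", "Elche": "Alicante",
    "Burjasot": "Valencia", "Cullera": "Valencia",
}
SUBTYPE_TO_TYPE = {
    "Frozen": "Food", "Fresh": "Food",
    "Spirits": "Drink", "Alcohol": "Drink",
}


def roll_up(cells, row_parent, col_parent):
    """Sum each cell into the cell of its parent members."""
    totals = defaultdict(int)
    for (row, col), value in cells.items():
        totals[(row_parent[row], col_parent[col])] += value
    return dict(totals)


rolled = roll_up(SALES, CITY_TO_PROVINCE, SUBTYPE_TO_TYPE)
```

Under that layout the result matches the four totals shown for Sales’ (1400, 2200, 4600, 5400).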


Business Intelligence & OLAP

• Data within “traditional” BI

• Known (internal) data sources

• MD design for specific data sources

• Data integration at design time

• Data sources owned by decision maker

• Only domain-expert users have access to the data

Self-service Business Intelligence

• New type of BI

• Unknown (external) data sources

• MD design for unseen data sources

• Data integration at runtime

• Open data as source

• Everybody can access data (including non-expert users)

• Situational BI, Exploratory BI, Live BI, Self-service BI, Open BI, etc.

Self-service Business Intelligence

• Self-service BI [Abelló et al 2013]

• Enabling non-expert users to make well-informed decisions by enriching the decision process with new data not owned and controlled by the decision maker

• Search, extraction, integration, and storage for reuse or sharing should be accomplished by non-expert decision makers without any intervention by designers or programmers

Self-service Business Intelligence

[Figure from Abelló et al 2013]

Some Self-service BI Challenges

• Multidimensional divide

• Open data unknown at design time

• Incorrect multidimensional elements

• Data divide

• Non-expert users (not enough skills for data analysis)

• Meaningless queries

Avoid summarizability problems in non-expert queries

Self-service BI Challenges

• Summarizability

• Multidimensional models must ensure that measures are aggregated accurately along dimensions [Lenz and Shoshani, 1997]

Self-service BI Challenges

• Summarizability

• At the syntactic level

• Many-to-one relationship between dimension hierarchy levels [Mazón et al 2009]

[Figure: Sales for supermarket.region = “Valencia Region”, rolled up from city (Albatera, Elche, Burjasot, Cullera) to province (Alicante, Valencia) and from product subtype (Frozen, Fresh, Spirits, Alcohol) to product.type (food, drink)]
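The syntactic condition above, a many-to-one relationship between hierarchy levels, can be checked mechanically. The Python sketch below is a simplified illustration and not the method of Mazón et al: it only reports child members that have more than one parent, which would cause double counting on roll-up. The function name and the broken hierarchy are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def many_to_one_violations(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return child members that roll up to more than one parent."""
    parents: Dict[str, set] = {}
    for child, parent in pairs:
        parents.setdefault(child, set()).add(parent)
    return {child: sorted(found)
            for child, found in parents.items() if len(found) > 1}


# City -> province pairs from the Sales example.
CITY_PROVINCE = [
    ("Albatera", "Alicante"), ("Elche", "Alicante"),
    ("Burjasot", "Valencia"), ("Cullera", "Valencia"),
]

# Hypothetical broken hierarchy: Elche assigned to two provinces.
BROKEN = CITY_PROVINCE + [("Elche", "Valencia")]

ok_report = many_to_one_violations(CITY_PROVINCE)
broken_report = many_to_one_violations(BROKEN)
```

With the broken hierarchy, sales in Elche would be added to both provinces, so the province totals would no longer sum to the regional total.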

Self-service BI Challenges

• Summarizability

• At the semantic level [Niemi et al, 2014]

• Type compatibility

• Aggregation function

• Measure

• Dimension

[Figure: Stock level by day (Day1 and Day30 of June, Day1 and Day31 of July) and city (Albatera, Elche, Burjasot, Cullera) for supermarket.region = “Valencia Region”, rolled up to month and province as Stock level’]

Self-service BI Challenges

• Type compatibility [Lenz and Shoshani, 1997]

• Flow: measure recorded at the end of a period

• Monthly number of births, annual income, etc.

• Stock: measure recorded at a particular point in time

• Inventory of cars, number of citizens, etc.

• Value per unit: same as stock but unit is not a ratio

• Item price, cost per unit manufactured, exchange rate, etc.

[Tables: type compatibility through the Time dimension; type compatibility through non-Time dimensions]
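A type-compatibility check over flow, stock and value-per-unit measures could look like the Python sketch below. The compatibility rules in it are an assumption: they follow the table commonly attributed to Lenz and Shoshani (min, max and avg always allowed; sum allowed for flows, allowed for stocks only along non-Time dimensions, never allowed for value-per-unit measures) and should be checked against the original paper. All names are illustrative.

```python
from enum import Enum


class MeasureType(Enum):
    FLOW = "flow"
    STOCK = "stock"
    VALUE_PER_UNIT = "value per unit"


def is_type_compatible(measure_type: MeasureType, agg_function: str,
                       temporal: bool) -> bool:
    """Assumed compatibility rules; verify against Lenz and Shoshani (1997)."""
    agg = agg_function.lower()
    if agg in ("min", "max", "avg"):
        return True
    if agg == "sum":
        if measure_type is MeasureType.FLOW:
            return True
        if measure_type is MeasureType.STOCK:
            # Adding stock levels across points in time is not meaningful.
            return not temporal
        return False  # value per unit, e.g. prices, cannot be summed
    raise ValueError(f"unsupported aggregation function: {agg_function}")


# Summing stock levels over the days of a month versus over cities.
stock_sum_over_time = is_type_compatible(MeasureType.STOCK, "sum", temporal=True)
stock_sum_over_cities = is_type_compatible(MeasureType.STOCK, "sum", temporal=False)
```

This is the kind of rule that would reject a monthly sum of daily stock levels while still accepting their average.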

Self-service BI Challenges

• Statistical linked open data as a source for Self-service BI

• RDF Data Cube

• https://www.w3.org/TR/vocab-data-cube

• Vocabulary for publishing multidimensional data, such as statistics, on the Web

• Building upon Statistical Data and Metadata Exchange (SDMX)

• Collect, exchange, process, and disseminate aggregate statistics

• http://sdmx.org/docs/2_0/SDMX_2_0%20SECTION_02_InformationModel.pdf

Self-service BI Challenges

• Summarizability is not considered in current statistical open data tools

• World DataBank [http://databank.worldbank.org/]

Research goal

• Some works propose mechanisms to support users in using statistical open data

• RDF Data Cube extension QB4OLAP [Etcheverry et al 2014]

• Survey on exploratory BI & Semantic Web [Abelló et al 2015]

• OpenGovIntelligence - http://www.opengovintelligence.eu/

• OpenCube - http://opencube-project.eu/ & http://opencube-toolkit.eu/

Research goal

• Unfortunately, current research does not ensure summarizability

• There is always a manual step for ensuring type compatibility

• Inconsistent with a self-service BI scenario that reuses open data

Summarizability-aware querying of statistical open data considering type compatibility

Summarizability-aware querying

[Figure: summarizability-aware querying]

*NO means that there is not enough information to support the user, but there may be information on similar situations (HINT) or new knowledge from the user to enrich the KB

Summarizability KB

• Extending RDF Data Cube and QB4OLAP to include type compatibility

• https://www.w3.org/TR/vocab-data-cube/

• Steps to automate the enrichment of QB data sets with specific QB4OLAP semantics [Varga et al 2016]

• Also, provenance information is included


Summarizability KB

• A measure can be aggregated by using an aggregation function along a dimension

Summarizability KB

• Created from USEWOD

• http://usewod.org/

• Usage Analysis and the Web of Data

• DBpedia logs

• Reference data set for research on query logs of Linked Data endpoints

• [Luczak-Roesch et al 2016]

Summarizability KB

prefix dbpprop: <http://dbpedia.org/property/>
SELECT ?movies max(?runtime) min(?runtime) avg(?runtime)
WHERE
{
  ?movies <http://dbpedia.org/ontology/runtime> ?runtime .
  ?movies <http://dbpedia.org/ontology/starring> <http://dbpedia.org/resource/Clint_Eastwood> .
}
group by ?movies

Max, min and average of movie runtime can be calculated:

< runtime , MAX , movie >
< runtime , MIN , movie >
< runtime , AVG , movie >

Proposed approach (1)

• STEP 0 Parse the SPARQL query (with ARQ)

• STEP 1 Look for aggregation functions in the SELECT clause

• avg, sum, max, min (future work may consider UDFs)

• For each aggregation function

• STEP 2 Get the variable within the aggregation function

• STEP 3 Extract the basic graph pattern (triple pattern with variables) containing the measure

• STEP 4 From the basic graph pattern, assume that the measure is the object and extract the predicate (this predicate is the measure)

• STEP 5 Determine the dimension

Proposed approach (2)

• STEP 5 Determine the dimension

• If there is a “group by” then

• Get the variable in the group by

• Look for the variable among the subjects in the where clause

• Get the predicates and look for the most specific domain (this is the dimension)

• Else

• Get the variable in the subject of the basic graph pattern containing the measure

• Look for "a" or "rdf:type" to find the type. If not there, look for the domain of the predicate (this is the dimension)
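Steps 0 to 5 can be sketched in Python for queries shaped like the logged example. This is a rough illustration under stated assumptions: regular expressions stand in for the ARQ parser named in the slides, the PREDICATE_DOMAIN table is a hypothetical stand-in for an rdfs:domain lookup, and choosing the most specific domain is reduced to taking the first known one because no class hierarchy is modeled.

```python
import re

AGG_RE = re.compile(r"\b(avg|sum|max|min)\s*\(\s*(\?\w+)\s*\)", re.IGNORECASE)
TRIPLE_RE = re.compile(
    r"(\?\w+|<[^>]+>)\s+(a|rdf:type|<[^>]+>)\s+(\?\w+|<[^>]+>)\s*\.")
GROUP_RE = re.compile(r"group\s+by\s+(\?\w+)", re.IGNORECASE)

# Hypothetical stand-in for querying rdfs:domain in the ontology.
PREDICATE_DOMAIN = {
    "http://dbpedia.org/ontology/runtime": "http://dbpedia.org/ontology/Work",
}


def iri(term):
    """Strip the angle brackets from an IRI term."""
    return term[1:-1] if term.startswith("<") else term


def assess(query):
    """Return (measure, aggregation function, dimension) triples."""
    # STEP 0 (simplified): split the query text instead of parsing it.
    select_clause = re.split(r"\bwhere\b", query, maxsplit=1,
                             flags=re.IGNORECASE)[0]
    body = re.search(r"\{(.*)\}", query, re.DOTALL).group(1)
    triples = TRIPLE_RE.findall(body)
    group = GROUP_RE.search(query)
    found = []
    # STEPS 1 and 2: aggregation functions and their variables.
    for func, var in AGG_RE.findall(select_clause):
        # STEP 3: triple pattern whose object is the aggregated variable.
        pattern = next((t for t in triples if t[2] == var), None)
        if pattern is None:
            continue
        subject, predicate, _ = pattern
        measure = iri(predicate)  # STEP 4: the predicate is the measure
        # STEP 5: determine the dimension.
        if group:
            domains = [PREDICATE_DOMAIN[iri(p)] for s, p, _o in triples
                       if s == group.group(1) and iri(p) in PREDICATE_DOMAIN]
            dimension = domains[0] if domains else None
        else:
            types = [iri(o) for s, p, o in triples
                     if s == subject and p in ("a", "rdf:type")]
            dimension = types[0] if types else PREDICATE_DOMAIN.get(measure)
        found.append((measure, func.upper(), dimension))
    return found


QUERY = """prefix dbpprop: <http://dbpedia.org/property/>
SELECT ?movies max(?runtime) min(?runtime) avg(?runtime)
WHERE {
  ?movies <http://dbpedia.org/ontology/runtime> ?runtime .
  ?movies <http://dbpedia.org/ontology/starring>
          <http://dbpedia.org/resource/Clint_Eastwood>.
}
group by ?movies"""

entries = assess(QUERY)
```

A real implementation would rely on a proper SPARQL parser, since regular expressions cannot handle prefixed names, nested groups or property paths.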

Structure of Type Compatibility KB

<http://dbpedia.org/ontology/runtime> TypeCompatibility :blank1
:blank1 dimension <http://dbpedia.org/ontology/Work>
:blank1 aggFunction qb4olap:avg
:blank1 query "...."
:blank1 provenance "dbpedia"

Source query:

prefix dbpprop: <http://dbpedia.org/property/>
SELECT ?movies max(?runtime) min(?runtime) avg(?runtime)
WHERE
{
  ?movies <http://dbpedia.org/ontology/runtime> ?runtime .
  ?movies <http://dbpedia.org/ontology/starring> <http://dbpedia.org/resource/Clint_Eastwood> .
}
group by ?movies

Dimension obtained from: rdfs:domain <http://dbpedia.org/ontology/Work>
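The KB structure above can be mirrored in memory with a small Python sketch. The class and method names are hypothetical, and the similar lookup is only a loose illustration of hinting from related entries; the slides do not specify how hints are actually derived.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TypeCompatibility:
    """One KB entry, following the properties shown in the slide."""
    measure: str
    dimension: str
    agg_function: str
    query: str
    provenance: str


class SummarizabilityKB:
    """Hypothetical in-memory stand-in for the type compatibility KB."""

    def __init__(self) -> None:
        self._entries: List[TypeCompatibility] = []

    def add(self, entry: TypeCompatibility) -> None:
        self._entries.append(entry)

    def supports(self, measure: str, agg_function: str, dimension: str) -> bool:
        # True when this exact combination was already seen in the logs.
        return any(e.measure == measure
                   and e.agg_function == agg_function.lower()
                   and e.dimension == dimension
                   for e in self._entries)

    def similar(self, measure: str) -> List[TypeCompatibility]:
        # Entries about the same measure, usable as hints.
        return [e for e in self._entries if e.measure == measure]


RUNTIME = "http://dbpedia.org/ontology/runtime"
WORK = "http://dbpedia.org/ontology/Work"

kb = SummarizabilityKB()
kb.add(TypeCompatibility(RUNTIME, WORK, "avg", "....", "dbpedia"))

avg_known = kb.supports(RUNTIME, "AVG", WORK)
sum_known = kb.supports(RUNTIME, "sum", WORK)
hints = kb.similar(RUNTIME)
```

In this sketch an unknown combination such as sum over runtime is not rejected outright: the existing avg entry is returned as a hint about the same measure.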

Conclusions

• Self-service BI uses external (open) data to support decision making

• Decision makers are likely to make meaningless queries that lead to summarizability problems

• Framework for semantic assessment of summarizability in Self-service BI

• Based on DBpedia logs of the USEWOD dataset

• Queries using aggregation functions (8946 out of 35M)

• Future work

• Use other sources of logs with a higher percentage of queries using aggregation functions

Image source: https://c1.staticflickr.com/7/6156/6164861516_05d435e7b4_b.jpg

Work in progress…

Let’s unleash statistical open data potential!
