Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

CONCEPTUALIZING ANALYTICS
Conceptual Modeling and Data Analytics – A Tutorial

Christoph G. Schuetz, Michael Schrefl

Thanks: Ilko Kovacic, Median Hilal, and Georg Grossmann (UniSA) for material that served as the basis for parts of this tutorial.

Table of Contents

Introduction and Background

Acquisition and Recording

Extraction, Cleaning, and Annotation

Integration, Aggregation, and Representation

Analysis and Modeling

Interpretation and Action

Open Issues

INTRODUCTION AND BACKGROUND

Scope of this Tutorial

How may conceptual modeling facilitate data analytics?

What is Data Analytics?

Definition from industry [32, p. 37]: “data-based applications of quantitative analysis methods”

Another definition from industry [29, p. 16]: “examination of information to uncover insights that give a business person the knowledge to make informed decisions”

“Analytics tools enable people to query and analyze information”

Definition from academia [31, p. 329]: “discovery and communication of meaningful patterns in data”

“Organizations apply analytics to their data in order to describe, predict, and improve organizational performance.”

Data Analytics

• Descriptive: What happened? Multidimensional analysis (OLAP), statistical analysis of the past. Dashboards, scorecards, key performance indicators.

• Predictive: Use of statistical methods in an attempt to predict what will happen in the future.

• Prescriptive: What actions should be taken? Alerts and actions triggered by analysis results. Active data warehouse.

Data analytics must be viewed in the broader context of business intelligence.

(Slide margin label: “Advanced”.)
What is Business Intelligence?

• Data Integration: integrate & cleanse data from multiple sources
• Data Warehousing: store integrated data
• Business Intelligence: present & analyze information

Figure: The relationship between data integration, data warehousing, and business intelligence [29, p. 15]

What about big data analytics?

Compared to traditional business intelligence (BI), analysis of big data is really not so different from a conceptual point of view.

Acquisition/Recording → Extraction/Cleaning/Annotation → Integration/Aggregation/Representation → Analysis/Modeling → Interpretation/Action

Figure: The (big) data analysis pipeline (adapted from [3])

One may argue that business intelligence has always been about the analysis of what constituted “big” data at the time [32].

The specific technologies, however, may differ.
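The five pipeline steps can be read as a chain of functions, each consuming the output of the previous one. The sketch below is an illustration only: the raw sensor lines, the record layout, and every function name are hypothetical, and real implementations of each step would of course be far more involved.

```python
from functools import reduce
from statistics import mean


def acquire(_):
    """Acquisition/Recording: hypothetical raw lines 'animal;activity'."""
    return ["cow1;12", "cow1;bad", "cow2;30", "cow1;14"]


def extract_clean_annotate(lines):
    """Extraction/Cleaning/Annotation: parse, drop bad values, add a unit."""
    records = []
    for line in lines:
        animal, raw = line.split(";")
        if raw.isdigit():
            records.append(
                {"animal": animal, "activity": int(raw), "unit": "steps/min"}
            )
    return records


def integrate_aggregate_represent(records):
    """Integration/Aggregation/Representation: average activity per animal."""
    by_animal = {}
    for rec in records:
        by_animal.setdefault(rec["animal"], []).append(rec["activity"])
    return {animal: mean(vals) for animal, vals in by_animal.items()}


def analyze_model(averages):
    """Analysis/Modeling: deviation of each animal from the herd mean."""
    herd_mean = mean(list(averages.values()))
    return {animal: avg - herd_mean for animal, avg in averages.items()}


def interpret_act(deviations):
    """Interpretation/Action: list animals with above-average activity."""
    return sorted(a for a, dev in deviations.items() if dev > 0)


PIPELINE = [
    acquire,
    extract_clean_annotate,
    integrate_aggregate_represent,
    analyze_model,
    interpret_act,
]


def run(pipeline, seed=None):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), pipeline, seed)


result = run(PIPELINE)
```

Keeping the stages as separate functions mirrors the point of the slide: the conceptual steps stay the same, while the technology behind any single stage can be swapped.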

Scope of this Tutorial

How may conceptual modeling facilitate data analytics?

This tutorial follows the steps of the (big) data analysis pipeline and illustrates selected examples of conceptual modeling supporting each step.

Running Example: Precision Dairy Farming

From this ...

... to that!

The AgriProKnow Project

Joint research effort between various companies and research institutions on data analytics in dairy farming

• Smartbow develops smart animal eartags to track activity.
• Wasserbauer develops automated feeding machines.
• The University of Veterinary Medicine Vienna provides the domain knowledge.
• Johannes Kepler University (JKU) Linz has statistical and business intelligence (BI) knowledge for data analysis.

Project goal: Building an active semantic data warehouse for precision dairy farming [28]

Further Reading: C. G. Schuetz, S. Schausberger, M. Schrefl. Building an active semantic data warehouse for precision dairy farming. Journal of Organizational Computing and Electronic Commerce, 28(2), 122-144, 2018.

ACQUISITION AND RECORDING

Acquisition and Recording

Interesting data originate from various sources such as operational databases, sensors, or the web.

Possible storage forms for (big) data with support for data analysis are:

• Data Warehouse: A clean and integrated database providing data of interest in a format fit for analysis.

• Data Lake: Store the raw data as-is, possibly with additional metadata to help retrieve datasets. Data are transformed when needed for the analysis.
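The difference between the two storage forms can be sketched in a few lines. This is a minimal illustration under stated assumptions: the classes `DataLake` and `DataWarehouse`, the metadata keys, the column names, and the sample payloads are all hypothetical and do not describe any particular product or the AgriProKnow implementation.

```python
import json


class DataLake:
    """Stores raw payloads as-is, with metadata to help retrieve datasets."""

    def __init__(self):
        self._objects = []

    def put(self, raw, **metadata):
        self._objects.append((metadata, raw))

    def find(self, **criteria):
        return [
            raw
            for meta, raw in self._objects
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


class DataWarehouse:
    """Accepts only clean rows in a fixed, analysis-ready format."""

    COLUMNS = ("animal", "day", "litres")

    def __init__(self):
        self.rows = []

    def load(self, row):
        if set(row) != set(self.COLUMNS) or row["litres"] is None:
            raise ValueError(f"row does not fit warehouse format: {row}")
        self.rows.append(row)


def to_row(raw_json):
    """Transform a raw lake object only when it is needed for analysis."""
    rec = json.loads(raw_json)
    return {"animal": rec["cow"], "day": rec["d"], "litres": rec["ml"] / 1000}


# Made-up raw data from two hypothetical sources, stored untouched.
lake = DataLake()
lake.put('{"cow": "AT-1", "d": "2018-05-01", "ml": 27500}',
         source="milking-robot", format="json")
lake.put("AT-1,2018-05-01,ruminating", source="eartag", format="csv")

# The warehouse only ever sees cleaned, conformed rows.
warehouse = DataWarehouse()
for raw in lake.find(source="milking-robot"):
    warehouse.load(to_row(raw))
```

The lake accepts anything and defers the transformation; the warehouse rejects whatever does not match its format, so the cleaning work has to happen before loading.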

The Data Warehouse is Dead!

“Barely a company uses a data warehouse anymore because it’s too cumbersome to build it. (But the multidimensional model remains relevant.)”

– An actual colleague from a consulting firm

“You’re using a data warehouse to analyze sensor data? Really? But everyone uses data stream processing for that.”

– An actual attendee of EDOC 2016

Is The Data Warehouse Dead?

“What is a data warehouse? Our view is that the data is the warehouse, and our data just happens to be managed with a relational database today. Our data could be managed on a non-relational platform, and it would still be a warehouse. (...) The idea that Hadoop would replace a warehouse is misguided because the data and its platform are two non-equivalent layers of the data warehouse architecture. It’s more to the point to conjecture that Hadoop might replace an equivalent data platform, such as a relational database management system.” [26, p. 15]

⇒ Be flexible regarding implementation technology!
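One way to picture the separation of data and platform is an analysis function written against an interface rather than against a specific store. The sketch below is an assumption-laden illustration: the `Platform` protocol, both platform classes, the `milk` table, and the fact layout are hypothetical, and a plain dictionary merely stands in for a non-relational platform.

```python
import sqlite3
from typing import Iterable, Protocol

Fact = tuple[str, float]  # (animal, litres); hypothetical fact layout


class Platform(Protocol):
    """What the analysis needs from any data platform."""

    def insert(self, fact: Fact) -> None: ...

    def facts(self) -> Iterable[Fact]: ...


class RelationalPlatform:
    """Warehouse data managed by a relational database (in-memory SQLite)."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE milk (animal TEXT, litres REAL)")

    def insert(self, fact):
        self._db.execute("INSERT INTO milk VALUES (?, ?)", fact)

    def facts(self):
        return self._db.execute("SELECT animal, litres FROM milk").fetchall()


class KeyValuePlatform:
    """Stand-in for a non-relational platform: a plain key-value store."""

    def __init__(self):
        self._store = {}

    def insert(self, fact):
        self._store[len(self._store)] = fact

    def facts(self):
        return list(self._store.values())


def total_litres(platform: Platform) -> float:
    """Analysis code that does not care which platform holds the data."""
    return sum(litres for _, litres in platform.facts())


totals = []
for platform in (RelationalPlatform(), KeyValuePlatform()):
    for fact in [("AT-1", 27.5), ("AT-2", 12.5)]:
        platform.insert(fact)
    totals.append(total_litres(platform))
```

Both platforms hold the same warehouse content and yield the same result, which is the sense in which the data, not the technology underneath, is the warehouse.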

Is The Data Warehouse Dead?

“There have been numerous times when vendors proclaim that data warehousing is no longer needed. (...) There is no “silver bullet” that helps an enterprise avoid the hard work of data integration. Information that is clean, comprehensive, consistent, conformed, and current is not a happenstance; it requires thought and work.” [29, p. 12]

The data warehouse remains a relevant concept also in the age of big data, storing clean data in a format and granularity suitable for analysis.

⇒ Conceptual modeling to the rescue!

Data Lake

A data lake serves to store raw data for later analysis once the analysts have figured out what to do with the data.

A data lake may complement a traditional data warehouse, especially in the presence of high-velocity data streams.


Sensor Data Warehousing (Dobson et al. [5])

Figure: Sensor data warehousing architecture with the components Stream Processing, Event Processing, Real-Time Analysis, Data Lake, Data Warehouse, and Business Intelligence


Conceptual Model: Sensor Measurements

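Measurement: receptionTimestamp, sensingTimestamp, value, accuracy

MeasurementType: name, unit

Transformation: name

Location: latitude, longitude

Agent

Each Measurement belongs to exactly one Agent (1 : *) and exactly one MeasurementType (1 : *), and to at most one Transformation (0..1 : *) and at most one Location (0..1 : *).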

Figure: A domain model for sensor measurements [5]
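The domain model above could be written down as plain data classes, for example. This is only a minimal sketch: the class and attribute names follow the figure, but the Python types, the `name` attribute on `Agent`, and the optional accuracy are assumptions made for illustration, and the reception timestamp in the usage test is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class MeasurementType:
    name: str
    unit: str


@dataclass(frozen=True)
class Transformation:
    name: str


@dataclass(frozen=True)
class Location:
    latitude: float
    longitude: float


@dataclass(frozen=True)
class Agent:
    # Assumption: the figure shows Agent without attributes; a name is
    # added here only so that agents can be told apart.
    name: str


@dataclass(frozen=True)
class Measurement:
    sensing_timestamp: datetime
    reception_timestamp: datetime
    value: float
    agent: Agent                                      # exactly one
    measurement_type: MeasurementType                 # exactly one
    accuracy: Optional[float] = None                  # assumed optional
    transformation: Optional[Transformation] = None   # 0..1
    location: Optional[Location] = None               # 0..1
```

Keeping `transformation` and `location` optional mirrors the 0..1 multiplicities, while `agent` and `measurement_type` are mandatory constructor arguments.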


Conceptual Model: Agent Types

Agent

Person: name, age, position

Process: id

ProcessType: name (ProcessType 1 : * Process)

AssignedDevice: assignmentTimestamp

PhysicalDevice: id, nominalAccuracy

LogicalDevice: id

Stationary, Mobile

Location

Further association multiplicities shown in the diagram: 1 : *, 1 : *, 1 : *, 0..1 : *

Figure: A domain model for sensor agents [5]


Dimensional Fact Model: Measurements

Figure: Multidimensional model for measurements [5]


Example Sensor Readings

S. Time           Meas. Type  Agent  Transform.  Acc.  Value
2018/10/02 14:00  3           1      AVG10       0.1   22.2
2018/10/02 14:10  3           1      AVG10       0.1   22.4
2018/10/02 14:05  2           2      AVG5        0.1   61.3
2018/10/02 14:15  2           2      AVG5        0.2   60.9
2018/10/03 10:20  2           3                        62

Meas. Type ID  Meas. Type           Unit
1              Milk yield           kg
2              Rumination activity  Chews/Cud
3              Temperature          °C

Agent ID  Agent  Agent Type
1         THE01  Device
2         EAR23  Device
3         VET01  Person

Phys. Dev.  Log. Dev.           Loc.          Dev. Type
THE01232    Temp. Feed Area #1  Feed Area #1  Thermo.
EAR03143                                      Earmark


The Need for Shared Conceptualization

In order to allow for comparison of results, a shared conceptualization is vital.

Example: Activity Tracking in AgriProKnow

Sensors track movement activity of animals within a farm. To compare movement activity data between farms, it is more important to capture the function area (e.g., feeding area, resting area, milking area) than the precise location. But first, common function areas across farms must be identified and then captured during data acquisition and recording.


The Need for Shared Conceptualization

Furthermore, a shared conceptualization may help to reduce the load of data early on during data acquisition and recording. Later computations might be expensive or even infeasible.

Example: Activity Tracking in AgriProKnow

Sensors may track position and activity every two seconds.

30 × 60 × 24 = 43,200 readings per day and animal

Large farm: 1000 animals

⇒ 15,768,000,000 readings per year for one farm

The ultimate vision of AgriProKnow is to collect data from thousands of farms for inter-farm data analysis.

But: All those readings are often not needed. A more abstract level is more interesting.
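The volume estimate can be retraced step by step. The sketch below only restates the slide's arithmetic; the variable names are illustrative, and a year of 365 days is assumed (which is what reproduces the stated total).

```python
SECONDS_BETWEEN_READINGS = 2
ANIMALS_ON_LARGE_FARM = 1000
DAYS_PER_YEAR = 365  # assumption: non-leap year

# One reading every two seconds gives 30 readings per minute.
readings_per_minute = 60 // SECONDS_BETWEEN_READINGS
readings_per_animal_per_day = readings_per_minute * 60 * 24
readings_per_farm_per_year = (
    readings_per_animal_per_day * ANIMALS_ON_LARGE_FARM * DAYS_PER_YEAR
)

print(f"{readings_per_animal_per_day:,} readings per day and animal")
print(f"{readings_per_farm_per_year:,} readings per year for one farm")
```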


The Need for Shared Conceptualization

Example: Activity Tracking in AgriProKnow

Rather than storing thousands of location points and movement patterns for each animal, a higher level of abstraction is more useful. For each animal, the walking distance and duration as well as the lying and standing duration within each hour of the day are more important.

⇒ shared conceptualization of activity types, which should be known upon recording to be able to reduce data early on
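The early reduction described here might look as follows. This is a sketch under stated assumptions: each raw reading already carries an activity label from the shared set of activity types, readings arrive every two seconds, and the function and variable names are hypothetical rather than part of AgriProKnow.

```python
from collections import defaultdict
from datetime import datetime

SAMPLING_INTERVAL_S = 2  # assumption: one reading every two seconds


def hourly_activity_durations(readings):
    """Collapse (animal_id, timestamp, activity) readings into seconds
    spent per animal, hour of the day, and activity type."""
    totals = defaultdict(int)
    for animal_id, timestamp, activity in readings:
        # Truncate the timestamp to the start of its hour.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[(animal_id, hour_start, activity)] += SAMPLING_INTERVAL_S
    return dict(totals)
```

With three activity types, at most 72 aggregated values per animal and day remain instead of 43,200 raw readings, provided the activity types are agreed upon before recording starts.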


Dimensional Fact Model: Movement Activity

For example, animal AT23464 on the Kremesberg farm site may have spent 0 minutes lying, 10 minutes standing, and 5 minutes walking in a feeding area on 10 October 2018 in the 13th hour of the day.


Data Lake

A data lake serves to store raw data for later analysis once the analysts have figured out what to do with the data.

Of course, the data sets need to be organized such that the analysts can find them.

⇒ Structured data lake approach


Semantic Data Containers [28]

The semantic container approach allows data sets to be organized along spatial, temporal, and other semantic dimensions (or facets). The dimensions/facets consist of concepts, which are hierarchically organized.

Example: ATM Information Cubes

Modern air traffic management (ATM) heavily relies on timely exchange of accurate information. ATM stakeholders require information at various granularities and levels of detail. ATM information cubes are structured repositories of ATM messages, where each cell is a semantic container.
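One way to picture a semantic container is as a data set tagged with one concept per facet, where each facet's concepts form a hierarchy. The sketch below is not the implementation from [28]; the class, the function names, and the concept hierarchies are hypothetical and serve only to illustrate finding containers through hierarchically organized facets.

```python
from dataclasses import dataclass, field

# Hypothetical concept hierarchies, one per facet, stored as child -> parent.
HIERARCHIES = {
    "space": {"Vienna": "Austria", "Linz": "Austria", "Austria": "Europe"},
    "time": {"2018-10": "2018", "2018-11": "2018"},
}


@dataclass
class SemanticContainer:
    facets: dict                      # facet name -> concept
    data: list = field(default_factory=list)


def is_covered_by(concept, ancestor, parents):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = parents.get(concept)
    return False


def find_containers(containers, query, hierarchies=HIERARCHIES):
    """Return containers whose facet concepts fall under the queried concepts."""
    return [
        c for c in containers
        if all(
            is_covered_by(c.facets.get(facet), concept, hierarchies[facet])
            for facet, concept in query.items()
        )
    ]
```

A query for a general concept such as "Austria" then also finds containers tagged with the more specific concepts below it.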


ATM Information Cube: Operations

Figure: ATM information cube with the labels Operational Restriction, Flight Critical, EDUU-01, EDUU-02, Essential Briefing Package, and EDUU

• Merge: Change granularity of the cube by merging the contents of the cells.

• Abstract: Replace entities inside a cell with more abstract entities.
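The two operations can be sketched on a cube stored as a dictionary from cell coordinates to message lists. This is an illustrative sketch, not the operators' formal definition: the roll-up mappings are an assumed reading of the figure labels, and the function names and message identifiers are hypothetical.

```python
from collections import defaultdict

# Assumed roll-up mappings (finer concept -> coarser concept), one per dimension.
IMPORTANCE_ROLLUP = {
    "Operational Restriction": "Essential Briefing Package",
    "Flight Critical": "Essential Briefing Package",
}
SEGMENT_ROLLUP = {"EDUU-01": "EDUU", "EDUU-02": "EDUU"}


def merge(cube, rollups):
    """Coarsen the cube: cells that roll up to the same coordinates
    have their contents concatenated into one cell."""
    merged = defaultdict(list)
    for coordinates, messages in cube.items():
        coarse = tuple(
            rollup[concept] for rollup, concept in zip(rollups, coordinates)
        )
        merged[coarse].extend(messages)
    return dict(merged)


def abstract(entities, abstraction):
    """Replace entities in a cell by more abstract ones, dropping duplicates
    while keeping the original order."""
    return list(dict.fromkeys(abstraction.get(e, e) for e in entities))
```

Merge changes the cube's granularity while keeping every message, whereas abstract works inside a single cell and may reduce the number of entities.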


ATM Information Cube: Example

Figure: ATM information cube with the labels Operational Restriction, Flight Critical, Potential Hazard, and Additional Information, and the segments TS-LOWW-01, TS-LOWW-02, TS-LZIB-01, and TS-LZIB-02


ATM Information Cube: Merge

Figure: Merged ATM information cube with the labels Essential Briefing Package and Supplementary Briefing Package, and the regions LOVV and LZBB


EXTRACTION, CLEANING, AND ANNOTATION


Process Modeling and ETL

Extract, transform, and load (ETL) processes feed the data from the sources into the data warehouse.

Traditionally, the implementation of ETL processes involves a lot of low-level programming.

Process modeling approaches with support for code generation may facilitate the implementation of ETL processes and also serve as documentation.

Besides proprietary modeling languages, the Business Process Model and Notation (BPMN) or UML activity diagrams may serve for ETL process modeling.


BPMN Models of ETL Processes (El Akkaoui et al. [7, 6])

Two perspectives on ETL processes:

• Control process (process orchestration): Handle branching and synchronizing of the data flow

• Data process: Specify precisely how the input data transforms into output data


Control Process: Example

Before animal movement data can be loaded into the AgriProKnow data warehouse, the animal dimension and the function areas at specific farms must be loaded.

Figure: BPMN control process for the AgriProKnow DWH with the tasks FarmFunctionArea Load, AnimalDim Load, and AnimalMovement Load
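The ordering constraint of this control process can be expressed as a small dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) rather than a BPMN engine; the task names come from the example, while the function names and the idea of passing loader callables are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# Each load task maps to the set of tasks that must finish before it starts.
LOAD_DEPENDENCIES = {
    "FarmFunctionArea Load": set(),
    "AnimalDim Load": set(),
    "AnimalMovement Load": {"FarmFunctionArea Load", "AnimalDim Load"},
}


def load_order(dependencies):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_loads(dependencies, loaders):
    """Run the loader callables in dependency order and report the order."""
    executed = []
    for task in load_order(dependencies):
        loaders[task]()
        executed.append(task)
    return executed
```

The two dimension loads have no mutual dependency, so their relative order is not fixed; only the movement load is guaranteed to come last.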


Data Process: Animal Movement

Input Data (File: EAR34-Movement.csv, Type: CSV)

⇒ Convert Column (Column: Timestamp: Date)

⇒ Add Column (Column: Coordinates, Expression: SDO_POINT_TYPE(Lat, Long, NULL))

⇒ Lookup (Retrieve: FunctionAreaType, Database: AgriProKnowDWH, Table: FarmFunctionArea, Where: Area Contains: Coordinates)

Found ⇒ Aggregate (Group By: EXTRACT(Hour FROM Date), EXTRACT(Year FROM Date), EXTRACT(Month FROM Date), EXTRACT(Day FROM Date), NationalID, FunctionAreaType; Columns: Duration = COUNT(*)) ⇒ Insert Data (Database: AgriProKnowDWH, Table: Movement)

NotFound ⇒ Insert Data (File: BadCoordinates.txt, Type: Text)
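The data process can be traced in plain Python to see what each task contributes. This is a sketch under several assumptions, not the code a BPMN tool would generate: it works in memory instead of against the AgriProKnowDWH database, treats timestamps as Unix milliseconds in UTC, models each function area as a rectangle given by two corner points, and uses hypothetical function names. The sample values are those of the tutorial's example.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

RAW_CSV = """NationalID,Lat,Long,Timestamp
AT-12,5,10,1537348997000
AT-12,6,10,1537348998000
AT-23,7,15,1537348997000
AT-23,7,15,1537348998000
"""

# Assumed rectangle encoding: (area type, lower corner, upper corner).
FUNCTION_AREAS = [
    ("Feeding", (0, 0), (6, 12)),
    ("Resting", (5, 14), (10, 20)),
]


def lookup_area_type(lat, long):
    """Lookup task: return the type of the first area containing the point."""
    for area_type, (lat0, long0), (lat1, long1) in FUNCTION_AREAS:
        if lat0 <= lat <= lat1 and long0 <= long <= long1:
            return area_type
    return None


def run_data_process(raw_csv):
    durations = Counter()   # Aggregate task: Duration = COUNT(*)
    bad_coordinates = []    # rows for the NotFound branch
    for row in csv.DictReader(io.StringIO(raw_csv)):      # Input Data
        date = datetime.fromtimestamp(                    # Convert Column
            int(row["Timestamp"]) / 1000, tz=timezone.utc
        )
        lat, long = float(row["Lat"]), float(row["Long"])  # Add Column
        area_type = lookup_area_type(lat, long)            # Lookup
        if area_type is None:
            bad_coordinates.append(row)
            continue
        key = (date.year, date.month, date.day, date.hour,
               row["NationalID"], area_type)
        durations[key] += 1
    return durations, bad_coordinates
```

Because the duration is a count of readings, it only equals a time span if the sampling interval is known and constant, which is one reason to fix such conventions in the shared conceptualization.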

Input file EAR34-Movement.csv:

NationalID,Lat,Long,Timestamp
AT-12,5,10,1537348997000
AT-12,6,10,1537348998000
AT-23,7,15,1537348997000
AT-23,7,15,1537348998000

Rows read by Input Data:

NationalID  Lat  Long  Timestamp
AT-12       5    10    1537348997000
AT-12       6    10    1537348998000
AT-23       7    15    1537348997000
AT-23       7    15    1537348998000

After Convert Column (Timestamp as Date):

NationalID  Lat  Long  Timestamp
AT-12       5    10    Sep 19, 2018 09:23:17
AT-12       6    10    Sep 19, 2018 09:23:18
AT-23       7    15    Sep 19, 2018 09:23:17
AT-23       7    15    Sep 19, 2018 09:23:18


Data Process: Animal Movement

Data after Add Column:

NationalID  Coordinates  Timestamp
AT-12       (5, 10)      Sep 19, 2018 09:23:17
AT-12       (6, 10)      Sep 19, 2018 09:23:18
AT-23       (7, 15)      Sep 19, 2018 09:23:17
AT-23       (7, 15)      Sep 19, 2018 09:23:18


Data Process: Animal Movement

Input to Lookup:

NationalID  Coordinates  Timestamp
AT-12       (5, 10)      Sep 19, 2018 09:23:17
AT-12       (6, 10)      Sep 19, 2018 09:23:18
AT-23       (7, 15)      Sep 19, 2018 09:23:17
AT-23       (7, 15)      Sep 19, 2018 09:23:18

Lookup table FarmFunctionArea:

FunctionArea    Area              FunctionAreaType
1stFarmFeeding  [(0,0);(6,12)]    Feeding
1stFarmResting  [(5,14);(10,20)]  Resting


Data Process: Animal Movement

Data after Lookup (outcome Found):

NationalID  FAType   ...  Timestamp
AT-12       Feeding       Sep 19, 2018 09:23:17
AT-12       Feeding       Sep 19, 2018 09:23:18
AT-23       Resting       Sep 19, 2018 09:23:17
AT-23       Resting       Sep 19, 2018 09:23:18


Data Process: Animal Movement

Data after Aggregate, to be inserted into table Movement:

NatlID  FAType   Hour  Year  Month  Day  Dur
AT-12   Feeding  10    2018  9      19   2
AT-23   Resting  10    2018  9      19   2
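The Animal Movement process can be traced in a few lines of Python. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, areas are treated as axis-aligned rectangles given by two corner points, and timestamps are interpreted as epoch milliseconds in UTC. With UTC, the extracted hour for the sample rows is 9; the hour shown on the slide (10) may follow a different time zone or hour convention.

```python
from collections import Counter
from datetime import datetime, timezone

# Lookup table FarmFunctionArea from the slides. Treating each area as a
# rectangle spanned by two corner points is an assumption of this sketch.
FARM_FUNCTION_AREAS = [
    ("1stFarmFeeding", (0, 0), (6, 12), "Feeding"),
    ("1stFarmResting", (5, 14), (10, 20), "Resting"),
]

# Rows of EAR34-Movement.csv as shown on the slides.
MOVEMENT_ROWS = [
    ("AT-12", 5, 10, 1537348997000),
    ("AT-12", 6, 10, 1537348998000),
    ("AT-23", 7, 15, 1537348997000),
    ("AT-23", 7, 15, 1537348998000),
]


def lookup_function_area_type(coordinates):
    """Lookup: return the FunctionAreaType whose area contains the point, else None."""
    x, y = coordinates
    for _area, (x1, y1), (x2, y2), area_type in FARM_FUNCTION_AREAS:
        if x1 <= x <= x2 and y1 <= y <= y2:
            return area_type
    return None


def run_animal_movement(rows):
    """Return (aggregated durations, rows whose coordinates were not found)."""
    durations = Counter()
    bad_coordinates = []
    for national_id, lat, long_, timestamp_ms in rows:
        # Convert Column: epoch milliseconds to a date (UTC is assumed).
        date = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        # Add Column: combine Lat and Long into one Coordinates value.
        coordinates = (lat, long_)
        # Lookup: Found continues to Aggregate, NotFound goes to the error output.
        area_type = lookup_function_area_type(coordinates)
        if area_type is None:
            bad_coordinates.append((national_id, coordinates))
            continue
        # Aggregate: Duration = COUNT(*) per hour, date, animal, and area type.
        key = (date.hour, date.year, date.month, date.day, national_id, area_type)
        durations[key] += 1
    return dict(durations), bad_coordinates
```

Calling `run_animal_movement(MOVEMENT_ROWS)` yields one aggregated row per animal with a duration of 2, which corresponds to the two one-second readings per animal in the sample file.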


ETL Patterns (Oliveira et al. [23, 24])

Identify conceptual models for a set of standard ETL processes such as change data capture, slowly changing dimensions, and surrogate key pipelining [23].

The goal is to foster code reusability.

Oliveira et al. [24] also extend the BPMN metamodel with concepts specific to ETL processes.


Mining ETL Patterns (Theodorou et al. [30])

ETL process modeling also facilitates a comprehensive analysis of ETL processes based on mining for ETL patterns, using the Workflow Patterns Initiative as a guide.

What are the use cases for such mined ETL patterns?

• Identify recurring patterns in existing ETL processes to subsequently redesign the ETL processes.

• Apply quality metrics on ETL process models at a higher level of abstraction.

• Show a higher-level summary of the ETL process to foster understanding.


ETL Processes for Big Data (Bala et al. [4])

Massive distribution and parallelization are key to handling big data processing.

Employ distribution and parallelization for ETL processes with big data!

• Describe the ETL process in terms of core functionalities.

• Distribute the processing of these core functionalities to multiple nodes.

⇒ Conceptual modeling is key to effective optimization of ETL processes in the age of big data.
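The two steps above can be sketched in Python. This is an illustration only, not the approach of Bala et al.: the lookup table, the function names, and the use of a local thread pool as a stand-in for multiple nodes are all assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup table for the sketch.
AREA_TYPES = {"A1": "Feeding", "A2": "Resting"}


def lookup(partition):
    """Core functionality 'LookUp': split one partition into outputs and errors."""
    outputs, errors = [], []
    for animal, area in partition:
        if area in AREA_TYPES:
            outputs.append((animal, AREA_TYPES[area]))
        else:
            errors.append((animal, area))
    return outputs, errors


def partition_rows(rows, n):
    """Split the rows into n partitions of roughly equal size."""
    return [rows[i::n] for i in range(n)]


def run_distributed(rows, functionality, workers=2):
    """Apply one core functionality to all partitions in parallel and merge the results."""
    outputs, errors = [], []
    # Threads stand in for nodes here; a real deployment would ship the
    # partitions to separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for out, err in pool.map(functionality, partition_rows(rows, workers)):
            outputs.extend(out)
            errors.extend(err)
    return outputs, errors
```

Because the functionality is described independently of where it runs, the same `lookup` function can be applied to one partition or to many, which is the property that makes distribution straightforward.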


ETL Processes for Big Data (Bala et al. [4])

An ETL library contains a list of ETL functionalities, which can be used to design ETL processes.

Figure: Example description of ETL functionality in the ETL library [4]. The figure describes the LookUp functionality with the elements Source, Target, Lookup Table, Errors, Outputs, "Which data will be stored?", and Cache mode.


Modeling Transformations for Data Mining (Ordonez et al. [25])

Data mining algorithms require the source data in a very specific format.

The source data, however, are often scattered across multiple datasets/relations (even in a data warehouse).

Transformations include denormalizations and aggregations, where denormalization is a rather broad term that also includes applying complex expressions to attributes.

Modeling transformations as separate entities, each along with an SQL query, makes it possible to track the lineage of data.


Example: Source Schema

Animal: NationalID (PK), Name, Sex, Breed

MilkYield: AnimalNationalID (FK, PK), Date (PK), Time (PK), MilkYield

Feeding: AnimalNationalID (FK, PK), Date (PK), Time (PK), Quantity, FeedMixID (FK)

FeedMix: FeedMixID (PK), PercentRoughage, FeedMixType, PercentSilage


Example: Transformation for Data Mining

Can the feed intake serve as a predictor for milk yield on the next day?

A data mining algorithm may answer that question.

But first, we need to obtain a data set that contains each animal's milk yield on a particular date along with the feed intake from the day before.


Transformation Entity: Denormalization

Denormalization: T1 (defined by an SQL query over Animal and MilkYield)

Animal: NationalID (PK), Name, Sex, Breed

MilkYield: AnimalNationalID (FK, PK), Date (PK), Time (PK), MilkYield

T1: AnimalNationalID (FK, PK), Date (FK, PK), Time (FK, PK), MilkYield, AnimalBreed


Transformation Entity: Denormalization

Denormalization: T2 (defined by an SQL query over Feeding and FeedMix)

Feeding: AnimalNationalID (FK, PK), Date (PK), Time (PK), Quantity, FeedMixID (FK)

FeedMix: FeedMixID (PK), PercentRoughage, FeedMixType, PercentSilage

T2: AnimalNationalID (FK, PK), Date (FK, PK), Time (FK, PK), FeedMixID (FK), FeedMixType, QuantityRoughage, QuantitySilage


Transformation Entity: Aggregation

Aggregation: T3 and Aggregation: T4 (each defined by an SQL query)

Attributes shown across the two entities: AnimalNationalID (FK, PK) and Date (PK) in both, plus MilkYield, AnimalBreed (PK), FeedMixID (FK, PK), FeedMixType (PK), QuantityRoughage, and QuantitySilage.


Transformation Entity: Target

Denormalization: T5: AnimalNationalID (FK, PK), MilkingDate (PK), FeedingDate (PK), MilkYield, FeedMixID (FK, PK), AnimalBreed (PK), QuantitySilage, QuantityRoughage
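The chain of transformation entities can be sketched with Python's built-in sqlite3 module. The slides show only an "SQL" placeholder for each entity, so every query below is an assumption, as are the assignment of T3 to the milk data and T4 to the feed data, the derivation of the roughage and silage quantities from the feed mix percentages, and the sample rows.

```python
import sqlite3

# Transformation entity -> (source entities, SQL query). All SQL is assumed.
TRANSFORMATIONS = {
    "T1": (["Animal", "MilkYield"], """
        SELECT m.AnimalNationalID, m.Date, m.Time, m.MilkYield, a.Breed AS AnimalBreed
        FROM MilkYield m JOIN Animal a ON a.NationalID = m.AnimalNationalID"""),
    "T2": (["Feeding", "FeedMix"], """
        SELECT f.AnimalNationalID, f.Date, f.Time, f.FeedMixID, x.FeedMixType,
               f.Quantity * x.PercentRoughage / 100.0 AS QuantityRoughage,
               f.Quantity * x.PercentSilage / 100.0 AS QuantitySilage
        FROM Feeding f JOIN FeedMix x ON x.FeedMixID = f.FeedMixID"""),
    "T3": (["T1"], """
        SELECT AnimalNationalID, Date, AnimalBreed, SUM(MilkYield) AS MilkYield
        FROM T1 GROUP BY AnimalNationalID, Date, AnimalBreed"""),
    "T4": (["T2"], """
        SELECT AnimalNationalID, Date, FeedMixID, FeedMixType,
               SUM(QuantityRoughage) AS QuantityRoughage,
               SUM(QuantitySilage) AS QuantitySilage
        FROM T2 GROUP BY AnimalNationalID, Date, FeedMixID, FeedMixType"""),
    "T5": (["T3", "T4"], """
        SELECT t3.AnimalNationalID, t3.Date AS MilkingDate, t4.Date AS FeedingDate,
               t3.MilkYield, t4.FeedMixID, t3.AnimalBreed,
               t4.QuantitySilage, t4.QuantityRoughage
        FROM T3 t3 JOIN T4 t4
          ON t4.AnimalNationalID = t3.AnimalNationalID
         AND t4.Date = date(t3.Date, '-1 day')"""),
}


def lineage(name):
    """Return the base tables that an entity ultimately depends on."""
    if name not in TRANSFORMATIONS:
        return {name}
    sources = set()
    for source in TRANSFORMATIONS[name][0]:
        sources |= lineage(source)
    return sources


def build(conn):
    """Create the source schema and one view per transformation entity."""
    conn.executescript("""
        CREATE TABLE Animal (NationalID TEXT PRIMARY KEY, Name TEXT, Sex TEXT, Breed TEXT);
        CREATE TABLE MilkYield (AnimalNationalID TEXT, Date TEXT, Time TEXT, MilkYield REAL,
                                PRIMARY KEY (AnimalNationalID, Date, Time));
        CREATE TABLE FeedMix (FeedMixID INTEGER PRIMARY KEY, PercentRoughage REAL,
                              FeedMixType TEXT, PercentSilage REAL);
        CREATE TABLE Feeding (AnimalNationalID TEXT, Date TEXT, Time TEXT, Quantity REAL,
                              FeedMixID INTEGER, PRIMARY KEY (AnimalNationalID, Date, Time));
    """)
    for name, (_sources, sql) in TRANSFORMATIONS.items():
        conn.execute(f"CREATE VIEW {name} AS {sql}")


def demo():
    """Load hypothetical sample rows and return the content of target entity T5."""
    conn = sqlite3.connect(":memory:")
    build(conn)
    conn.execute("INSERT INTO Animal VALUES ('AT-12', 'Berta', 'F', 'Holstein')")
    conn.execute("INSERT INTO FeedMix VALUES (1, 60.0, 'Standard', 40.0)")
    conn.executemany("INSERT INTO Feeding VALUES (?, ?, ?, ?, ?)", [
        ("AT-12", "2018-09-18", "06:00", 10.0, 1),
        ("AT-12", "2018-09-18", "18:00", 5.0, 1),
    ])
    conn.executemany("INSERT INTO MilkYield VALUES (?, ?, ?, ?)", [
        ("AT-12", "2018-09-19", "05:30", 12.5),
        ("AT-12", "2018-09-19", "17:30", 11.0),
    ])
    rows = conn.execute("SELECT * FROM T5").fetchall()
    conn.close()
    return rows
```

Keeping the source entities next to each query is what makes lineage tracking cheap: `lineage("T5")` walks the dictionary and reports the four source tables without parsing any SQL.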


Superimposed Multidimensional Schemas

In some cases, it may be impractical to extract the data from the source systems.

⇒ Volume/Velocity/Volatility

Rather, a multidimensional schema with mapping rules may be superimposed over the sources.

Further Reading
M. Hilal, C. G. Schuetz, M. Schrefl. Using superimposed multidimensional schemas and OLAP patterns for RDF data analysis. Open Computer Science, 8(1), 18-37, 2018.


Example: Superimposed Multidimensional Schemas for Linked Data Analysis [12]

Repositories of linked data such as Wikidata can be an important resource for data analysis.

• RDF data do not follow a structure suitable for OLAP-style data analysis.

• These data are not under the analyst's control.

• Exploiting these data by casual analysis is not an easy task and requires knowledge of SPARQL.

⇒ Superimposition of multidimensional schemas renders these data accessible for OLAP.
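To illustrate how mapping rules can hide SPARQL from the analyst, the following Python sketch generates an aggregate query from a small schema description. It is not the mechanism of [12]: the dictionary layout and function name are hypothetical, and the Wikidata identifiers (Q11424 for film, P495 for country of origin, P136 for genre) are included as illustrative mappings that should be checked against Wikidata before use.

```python
# Hypothetical superimposed schema: each dimension maps to one Wikidata property.
FILM_CUBE = {
    "fact_class": "wd:Q11424",      # assumed: film
    "dimensions": {
        "country": "wdt:P495",      # assumed: country of origin
        "genre": "wdt:P136",        # assumed: genre
    },
}


def count_query(cube, group_by):
    """Generate a SPARQL query that counts facts per member of the chosen dimensions."""
    unknown = [d for d in group_by if d not in cube["dimensions"]]
    if unknown:
        raise KeyError(f"unknown dimensions: {unknown}")
    variables = " ".join(f"?{d}" for d in group_by)
    patterns = "\n".join(
        f"  ?fact {cube['dimensions'][d]} ?{d} ." for d in group_by
    )
    return (
        f"SELECT {variables} (COUNT(DISTINCT ?fact) AS ?count)\n"
        "WHERE {\n"
        f"  ?fact wdt:P31 {cube['fact_class']} .\n"
        f"{patterns}\n"
        "}\n"
        f"GROUP BY {variables}"
    )
```

An analyst would then ask for `count_query(FILM_CUBE, ["country"])` in terms of dimensions, while the generated SPARQL text stays behind the schema.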


Analytical SPARQL Query over Wikidata


Film Cube over Wikidata


INTEGRATION, AGGREGATION, AND REPRESENTATION


Data Integration

Most ETL processes integrate data from multiple sources.

The presented techniques for conceptual ETL process modeling account for that fact.

With the emergence of the (semantic) web and social media, the data generated on web platforms have become a valuable resource for analysis.


Fusion Cubes (Abelló et al. [2])

The vision:

• Complement existing data cubes with fusion cubes that include external data from RDF and linked data sources.

• Provide a drill-beyond operator to allow the user to define how and where an existing cube should be extended with external data.

• Business intelligence should become truly self-service.

⇒ A uniform representation format might help.


QB4OLAP: BI Vocabulary for Linked Data (Etcheverry et al. [8])

QB4OLAP extends the W3C’s Data Cube (QB) Vocabulary with concepts required for OLAP, e.g., hierarchies.

⇒ Representation of statistical linked data.

In AgriProKnow, QB4OLAP serves for the semantic description of the data warehouse schema, where elements can be linked to domain ontologies and websites.

QB4OLAP may also serve as the vocabulary for superimposed multidimensional schemas [12].


Social Business Intelligence

Combines data from companies (e.g., sales) with data generated by users on social media.

Often, social business intelligence involves sentiment analysis of user content based on natural language processing.

The results of such analysis may be stored in cubes for further analysis [9].

Example query: What is the average sentiment towards smartphones?
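Once sentiment scores are stored as cube facts, the example query reduces to a simple aggregation. The following Python sketch assumes a hypothetical fact layout (product category, platform, sentiment score between -1 and 1) and invented sample values; it only shows the shape of the computation.

```python
from statistics import mean

# Hypothetical cube facts: (product category, platform, sentiment score in [-1, 1]).
SENTIMENT_FACTS = [
    ("smartphone", "forum", 0.6),
    ("smartphone", "microblog", -0.2),
    ("smartphone", "microblog", 0.4),
    ("laptop", "forum", 0.1),
]


def average_sentiment(facts, category):
    """Slice the facts on one product category and average the sentiment measure."""
    scores = [score for cat, _platform, score in facts if cat == category]
    return mean(scores) if scores else None
```

A roll-up by platform or by time period would follow the same pattern with a different grouping key.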


ANALYSIS AND MODELING


Business Intelligence Model (BIM) (Horkoff et al. [13])

Representation of business strategies:

• Goals, which are selected from the Balanced Scorecard dimensions (financial, customer, processes, learning), at the strategic, tactical, or operational level.

• Situations represent internal and external factors that influence goals positively or negatively.

• Processes aim to achieve the goals.

• Key Performance Indicators (→ later in this tutorial).


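Example: BIM for AgriProKnow

Figure: BIM for AgriProKnow, with the following elements:

• Actor: Farmer, who desires the goal Maximize milk yield.

• Goals: Maximize milk yield, Prevent animal illness, Optimize feed intake (connected by AND and ++ links).

• Process: Automatic Feeding.

• Situations: Well-fed animals (Strength), Antibiotics resistance (Threat).

• Indicators: Milk yield (evaluates Maximize milk yield), Body Condition Score, # of known resistant germs.

• Markers: P, L, F.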


Business-Driven Data Analytics (Nalchigar and Yu [20])

Requirements analysis and design of data analytics systems involve multiple, complementary views.

• Business view: Starting from the business goals, the data analytics goals are defined.

• Data analytics design view: Explore different methods to achieve the data analytics goals by comparing their strengths and weaknesses.

• Data preparation view: Define what data sets and data preparation steps are required to perform the chosen analytics.

→ similar to ETL models but at a higher level


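Business View: Animal Illness

Figure: Business view for animal illness, with the following elements:

• Actor: Farmer, who desires the goal Maximize milk yield; the indicator Milk yield evaluates that goal.

• Goals: Maximize milk yield, Prevent animal illness, Optimize feed intake (connected by AND and ++ links).

• Decisions (D): Decision on feed change, Decision on calling veterinarian.

• Questions (Q): Which animals are at risk? Which feed mix is best for animals?

• Insight: Animals at risk predictive model, which answers the question about animals at risk, with the attributes
    + type = Predictive model
    + input = movement and health data
    + output = Alert
    + usageFrequency = daily
    + updateFrequency = quarterly
    + learningPeriod = 12 months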


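Data Analytics Design View: Animal Illness

Figure: Data analytics design view for animal illness, with the following elements:

• Analytics goals: Predict animal illness, Classification of animals.

• Indicators: Recall and Precision, each connected by an evaluates link.

• Generated insight: Animals at risk predictive model, with the attributes
    + type = Predictive model
    + input = movement and health data
    + output = Alert
    + usageFrequency = daily
    + updateFrequency = quarterly
    + learningPeriod = 12 months

• Algorithms: Logistic Regression, Deep Learning. The achieves links between algorithms and indicators carry the values 0.85, 0.55, 0.75, and 0.65.

• Softgoal: Tolerance to missing values, with contribution links labeled ++ and -.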
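The design view compares candidate algorithms by precision and recall. As a reminder of how the two indicators are computed for a binary "at risk" classifier, here is a minimal Python sketch; the function name and the sample labels are hypothetical and unrelated to the values shown in the figure.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for binary labels (1 = animal at risk)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    # Precision: share of raised alerts that were correct.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: share of animals actually at risk that were detected.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

For an alerting system, low recall means sick animals are missed, while low precision means the farmer receives false alarms, so the design view has to weigh both indicators when choosing an algorithm.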


Reference Modeling: BIRD Approach [27]

The idea stems from a small industry project we had a couple of years ago.

Lightweight reference models for OLAP cubes, calculated measures, and business terms should be customizable for different small and medium-sized companies within an industry or large companies with multiple divisions.

Calculated measures and business terms are represented using snippets of SQL code.

Further Reading
C. G. Schuetz, B. Neumayr, M. Schrefl, T. Neuböck. Reference Modeling for Data Analysis: The BIRD Approach. International Journal of Cooperative Information Systems, 25(2), 1-46, 2016.
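The idea of calculated measures as SQL snippets that a company can customize may be sketched as follows. This is not the BIRD tooling: the snippet definitions, the function names, the added measure costDeviation, and the sample rows are all assumptions made for illustration; only the measure and cube names plannedCostsPerUnit, actualCostsPerUnit, and MaterialUsedForProduct come from the tutorial's example.

```python
import sqlite3

# Reference model: calculated measure -> SQL snippet (the snippets are assumed).
REFERENCE_MEASURES = {
    "plannedCostsPerUnit": "SUM(plannedCosts) / SUM(plannedQuantity)",
    "actualCostsPerUnit": "SUM(actualCosts) / SUM(actualQuantity)",
}


def customize(reference, additions):
    """Derive a company-specific model without modifying the reference model."""
    customized = dict(reference)
    customized.update(additions)
    return customized


def measure_query(measures, names, table="MaterialUsedForProduct", group_by="material"):
    """Assemble the SQL snippets of the requested measures into one query."""
    columns = ", ".join(f"{measures[name]} AS {name}" for name in names)
    return f"SELECT {group_by}, {columns} FROM {table} GROUP BY {group_by} ORDER BY {group_by}"


def evaluate(measures, names, rows):
    """Run the assembled query against an in-memory cube table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MaterialUsedForProduct (material TEXT, plannedQuantity REAL, "
        "actualQuantity REAL, plannedCosts REAL, actualCosts REAL)"
    )
    conn.executemany("INSERT INTO MaterialUsedForProduct VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(measure_query(measures, names)).fetchall()
    conn.close()
    return result
```

A customization then only adds or overrides snippets, for example `customize(REFERENCE_MEASURES, {"costDeviation": "SUM(actualCosts) - SUM(plannedCosts)"})`, while the reference definitions remain shared across companies.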


Example: Reference Model

Figure: Reference model with the following elements (names with a leading / are calculated measures):

• Cube MaterialUsedForProduct: plannedQuantity, actualQuantity «mandatory», plannedCosts, actualCosts «mandatory», /plannedCostsPerUnit, /actualCostsPerUnit «mandatory», /actualCostsYTD, /actualCostsToPreviousDay

• Cube MaterialInSupplyOrder: costs «mandatory», shippingCosts, /totalCosts, /totalCostsYTD, /totalCostsToPrevWeek

• Dimension Product: productOrder, productCategory; minLifeTime, minTemperature, maxTemperature; ColdResistantProduct, HeatResistantProduct, DurableProduct

• Dimension Time: day, week, month, quarter, year

• Dimension Material: material, materialCategory; minTemperature, maxTemperature; ColdResistantMaterial, HeatResistantMaterial

• Dimension Factory: building, site, country

• Dimension Customer: customer, customerRegion, industry, consumerGroup

• Further «mandatory» markers appear on several dimension elements.

Page 157: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

Example: Reference Model Customization

Figure (diagram): customized reference model. Elements added by the customization are prefixed with "+".

Fact class MaterialUsedForProduct: plannedQuantity, actualQuantity, plannedCosts, actualCosts, /plannedCostsPerUnit, /actualCostsPerUnit, /actualCostsYTD, /actualCostsToPreviousDay

Fact class MaterialInSupplyOrder: + orderedQuantity, + deliveredQuantity, costs, shippingCosts, /totalCosts, + /totalCostsPerUnit, /totalCostsYTD, /totalCostsToPrevWeek

Dimension Time: day, week, month, quarter, year

Dimension Product: productOrder, productCategory; minLifeTime, minTemperature, maxTemperature; ColdResistantProduct, HeatResistantProduct, DurableProduct

Dimension Material: material, materialCategory; maxTemperature, minTemperature; ColdResistantMaterial, HeatResistantMaterial

+ Dimension Supplier: + supplier, + supplierRegion

Dimension Customer: customer, customerRegion, industry, consumerGroup

Dimension Factory: building, site, country
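The customization step can be sketched in code as follows. This is an illustrative sketch only: the dictionary layout and the function `customize` are hypothetical, and the rule that elements marked «mandatory» may not be removed is an assumption about what the marker means, not something the slides state.

```python
from copy import deepcopy

# Excerpt of the reference model; the layout is hypothetical.
REFERENCE_MODEL = {
    "MaterialInSupplyOrder": {
        "measures": [
            "costs",
            "shippingCosts",
            "/totalCosts",
            "/totalCostsYTD",
            "/totalCostsToPrevWeek",
        ],
        "mandatory": ["costs"],
    },
}


def customize(reference, fact_class, add=(), remove=()):
    """Return a customized copy; the reference model itself stays untouched."""
    model = deepcopy(reference)
    fact = model[fact_class]
    # Assumption: mandatory elements must survive every customization.
    blocked = [m for m in remove if m in fact["mandatory"]]
    if blocked:
        raise ValueError(f"cannot remove mandatory elements: {blocked}")
    fact["measures"] = [m for m in fact["measures"] if m not in remove]
    fact["measures"] += [m for m in add if m not in fact["measures"]]
    return model


# The additions shown with "+" for MaterialInSupplyOrder in the example.
customized = customize(
    REFERENCE_MODEL,
    "MaterialInSupplyOrder",
    add=["orderedQuantity", "deliveredQuantity", "/totalCostsPerUnit"],
)
```

Working on a copy keeps one shared reference model from which several company-specific or division-specific models can be derived.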


Tool Support [27]

The implementation uses Indyco Builder as the modeling tool, XML for the specification of business terms and calculated measures as well as customizations/redefinitions, and XQuery to apply the transformations to the reference model.

Applicability of BIRD to AgriProKnow

AgriProKnow aims to integrate data from multiple farms in order to generate new process knowledge, e.g., early indicators of animal illness as well as factors that influence it.

The knowledge thus generated should be applied to rule-based process monitoring and control, e.g., by calling a doctor when danger of illness is detected.

Once operational, the AgriProKnow data warehouse could therefore consist of two parts:

- Inter-farm data warehouse: Integrates the data from various sources in order to generate new knowledge.
- Farm-specific data warehouses: A data warehouse for each farm, built through customization of a reference model; the analysis rules are executed over the farm-specific data warehouses.


AgriProKnow Reference Model [27]

AgriProKnow Reference Model Customization [27]

Fact Table [27]

A fact table is generated by Indyco Builder based on a customized reference model.

Query [27]

Queries are formulated using analysis situations (or OLAP patterns) and then automatically translated into SQL.

XQuery functions take the customized reference model and the SQL DDL statements to generate SQL queries for analysis situations.

OLAP Patterns: Basic Idea

OLAP Patterns: Definition

OLAP Patterns: Examples

Enhanced Dimensional Fact Model (eDFM)

eDFM in QB/QB4OLAP + Extension

OLAP Patterns: Description Form

OLAP Patterns: Description

OLAP Patterns: Framework

OLAP Patterns: RDF Definition

:HomogeneousIndependentSetComparison a pl:Pattern;
  pl:name "Homogeneous independent-set comparison"@en;
  pl:situation "Compare SI and SC with the same ..."@en;
  pl:solution "The fact class, dimensions, grouping ..."@en;
  pl:structure "SI: 1 fact class, 1..* selection ..."@en;
  pl:example "Calculate the delta (comparative ..."@en;
  pl:hasElement :base, :baseSlice, :measure, :dimensionLevel,
                :dimension, :measureNotNull, :siSlice, :scSlice,
                :compMeasure, :compHaving, :SetOfInterest,
                :SetOfComparison;
  pl:result :compMeasure, :dimensionLevel,
            [pl:element :measure; pl:elementPrefix "SI_"],
            [pl:element :measure; pl:elementPrefix "SC_"].


OLAP Patterns: Instantiation

:DeltaMilkYield a pl:QbPatternInstance;
  pl:instanceOf :HomogeneousIndependentSetComparison;
  :base agri:Milk;
  :baseSlice :DateIn2017;
  :measure :SumOfMilkYield;
  :dimension agri:Animal, agri:FarmSite;
  :dimensionLevel agri:Animal, agri:FarmSite;
  :siSlice :today;
  :scSlice :prior5days;
  :compMeasure :DeltaMilkYield;
  :compHaving :positiveDeltaMilkYield.
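A guided instantiation tool needs to know which pattern elements an instance has already bound. The sketch below compares the element list of the pattern definition with the bindings of the instance; the Python data structures and the helper `unbound_elements` are hypothetical, and the slides do not say whether the remaining elements are optional or derived.

```python
# Elements listed under pl:hasElement in the pattern definition.
PATTERN_ELEMENTS = [
    "base", "baseSlice", "measure", "dimensionLevel", "dimension",
    "measureNotNull", "siSlice", "scSlice", "compMeasure", "compHaving",
    "SetOfInterest", "SetOfComparison",
]

# Bindings taken from the :DeltaMilkYield instance.
INSTANCE_BINDINGS = {
    "base": "agri:Milk",
    "baseSlice": ":DateIn2017",
    "measure": ":SumOfMilkYield",
    "dimension": ["agri:Animal", "agri:FarmSite"],
    "dimensionLevel": ["agri:Animal", "agri:FarmSite"],
    "siSlice": ":today",
    "scSlice": ":prior5days",
    "compMeasure": ":DeltaMilkYield",
    "compHaving": ":positiveDeltaMilkYield",
}


def unbound_elements(elements, bindings):
    """Return the pattern elements that the instance does not bind."""
    return sorted(set(elements) - set(bindings))


missing = unbound_elements(PATTERN_ELEMENTS, INSTANCE_BINDINGS)
```

A wizard could use such a list to decide which inputs it still has to request from the user.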


OLAP Patterns: Measures and Predicates


Pattern Expression

For each pattern, a generic query template in a target language is defined: the pattern expression.

That target language can be SQL but also another language such as SPARQL [12].

Upon pattern instantiation, predicate and measure expressions are inserted into the placeholders in the pattern expression.
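The placeholder mechanism can be sketched with Python's `string.Template`. The template below is deliberately simplified and is not the actual pattern expression of any pattern in the tutorial; the table name `milk`, the column names, and the placeholder `measureExpr` are assumptions made for this example.

```python
from string import Template

# Simplified, hypothetical pattern expression with SQL as target language.
PATTERN_EXPRESSION = Template(
    "SELECT $dimensionLevel, $measureExpr AS $measure "
    "FROM $base WHERE $baseSlice GROUP BY $dimensionLevel"
)


def instantiate(bindings):
    """Insert predicate and measure expressions into the placeholders.

    Template.substitute raises KeyError if a placeholder has no binding.
    """
    return PATTERN_EXPRESSION.substitute(bindings)


query = instantiate({
    "base": "milk",
    "baseSlice": "date BETWEEN '2017-01-01' AND '2017-12-31'",
    "measure": "SumOfMilkYield",
    "measureExpr": "SUM(milk_yield)",
    "dimensionLevel": "animal",
})
```

Switching the target language to SPARQL would mean supplying a different template and different predicate and measure expressions, while the bindings of the pattern instance stay the same.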


OLAP Patterns: Guided Instantiation

The RDF representation of pattern and multidimensional model elements as well as the relationships among those elements may serve to build a “wizard” for guided query instantiation.

A demonstration video can be found here: https://www.youtube.com/watch?v=BLt6heO7WKY

Analysis Graphs (Neuböck et al. [21, 27])

Analysis graphs explicitly represent knowledge about analysis processes.

Potential applications:

- Documentation of analysis processes
- Tool support for exploratory OLAP
- Potential automation of complex analysis processes
- Representation format for analysis process mining

Example: Analysis Graph (Bird’s-Eye View)

Figure (diagram): nodes of the analysis graph:

- Material Order Canceled
- Quantity and Expected Delivery Time of Undelivered Material
- Orders from other Suppliers that Contain Undelivered Material
- Products that Contain Undelivered Material
- Ordered Material with Properties Similar to Undelivered Material
- List of Customer Orders Affected by Material Order Canceling

Figure: An unrefined analysis graph for analysis in the event of order cancellation [27]

Example: Analysis Graph

MonthlyCostsOfMaterialUse : AnalysisSituation
  factClass = MaterialUsedForProduct
  measure = {SUM(actualCosts)}
  MaterialParameters: diceLevel = material, diceNode = ?mat
  TimeParameters: diceLevel = ?tmLevel, diceNode = ?tm, granularity = Time.month

Navigation step FocusOnProperty:
  addSliceCondition(Material, ?prop)
  moveToNode(Product, Product.productCategory, ?prodCat)

MonthlyCostsOfMaterialSupplyOrderWithProperty : AnalysisSituation
  factClass = MaterialUsedForProduct
  measure = {SUM(actualCosts)}
  MaterialParameters: diceLevel = material, diceNode = ?mat, sliceCondition = ?prop
  TimeParameters: diceLevel = ?tmLevel, diceNode = ?tm, granularity = Time.month
  ProductParameters: diceLevel = Product.productCategory, diceNode = ?prodCat

Figure: Example navigation step between analysis situations [27]

OLAP Endpoints for Linked Data [11]

Linked data repositories could provide an endpoint for OLAP analysis based on superimposed multidimensional schemas and analysis graphs in order to facilitate exploration and analysis of the data repository.

OLAP Endpoints for Linked Data [11]: Video

A demonstration video of a preliminary version can be found here: https://youtu.be/ymhkqla8J1I

We have since improved the appearance and are currently preparing a user study.

INTERPRETATION AND ACTION

Summarizability

What is summarizability about? Correct interpretation of aggregated results.

Conceptual modeling may help to ensure summarizability or at least make issues with summarizability explicit:

- Concepts in the modeling language [22, 17, 15]
- Constraint-based approaches [16, 14, 1]


Conditions for Summarizability

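Figure (diagram): Product dimension with the levels Product, Category, and All.

- Product level: DaVinciCode (Profit = 10), HonoluluSkirt (Profit = 20)
- Category level: Book (Profit = 10), Entertainment (Profit = 10), Clothing (Profit = 20)
- All level: All (Profit = 30)

Sum over products: 10 + 20 = 30. Sum over categories: 10 + 10 + 20 = 40.

Figure: Condition 1: Disjointness (Strict Hierarchies)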
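The double counting in the figure can be reproduced with a small sketch. It assumes, as the numbers suggest, that DaVinciCode rolls up to both Book and Entertainment; the function names `roll_up` and `is_strict` are hypothetical.

```python
product_profit = {"DaVinciCode": 10, "HonoluluSkirt": 20}

# Assumed non-strict roll-up: one product belongs to two categories.
product_categories = {
    "DaVinciCode": ["Book", "Entertainment"],
    "HonoluluSkirt": ["Clothing"],
}


def roll_up(profit, parents):
    """Aggregate child values to every parent of each child."""
    totals = {}
    for member, value in profit.items():
        for parent in parents[member]:
            totals[parent] = totals.get(parent, 0) + value
    return totals


def is_strict(parents):
    """A hierarchy step is strict if every child has exactly one parent."""
    return all(len(p) == 1 for p in parents.values())


category_profit = roll_up(product_profit, product_categories)
total_from_products = sum(product_profit.values())
total_from_categories = sum(category_profit.values())
```

Summing the category values yields 40 instead of the correct 30, because the profit of DaVinciCode is counted once per category. A check such as `is_strict` makes the violation explicit before anyone aggregates further.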


Conditions for Summarizability

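Figure (diagram): geographic dimension with the levels city, canton, country, and all.

- all level: All (profit = 70)
- country level: Austria, Switzerland
- canton level: Vaud (profit = 15)
- city level: Lausanne (profit = 10), Montreux (profit = 5), Salzburg (profit = 15), Vienna (profit = 40)

Figure: Condition 2: Completeness (Balanced Hierarchies)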
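The effect of the missing canton level can be sketched as follows. The assignment of the profit values to the cities follows the order in the figure, the country mapping is ordinary geography, and the helper `roll_up` is a hypothetical name.

```python
city_profit = {"Lausanne": 10, "Montreux": 5, "Salzburg": 15, "Vienna": 40}

# Only the Swiss cities have a canton; the Austrian cities skip that level.
city_canton = {"Lausanne": "Vaud", "Montreux": "Vaud"}
city_country = {
    "Lausanne": "Switzerland",
    "Montreux": "Switzerland",
    "Salzburg": "Austria",
    "Vienna": "Austria",
}


def roll_up(profit, parent_of):
    """Aggregate to the parent level, skipping members without a parent."""
    totals = {}
    for member, value in profit.items():
        parent = parent_of.get(member)
        if parent is not None:
            totals[parent] = totals.get(parent, 0) + value
    return totals


canton_profit = roll_up(city_profit, city_canton)
country_profit = roll_up(city_profit, city_country)
grand_total = sum(city_profit.values())
```

The canton level accounts for only 15 of the total profit of 70, so a total computed from canton values would be wrong, whereas the country level is complete and sums to 70.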


Attribute Groups in DFM


Generalized/Hetero-Homogeneous Hierarchies

Figure (diagram): a hierarchy of objects connected by "concretization of" relationships.

- Objects: Sensor : <T>, Person : <agentType>, Process : <agentType>, Device : <agentType>, MilkingParlor : <deviceType>
- Labels in angle brackets: <agentType>, <agent>, <position>, <age>, <processType>, <logicalDevice>, <deviceType>, <milkingParlorType>
- Added attributes: + nominalAccuracy, + measuredIngredients

Key Performance Indicators (KPIs)

The base measures are typically combined into more comprehensive indicators of economic success.

Definition [31, p. 362]
“KPIs are complex measurements used to estimate the effectiveness of an organization in carrying out their activities and to monitor the performance of their processes and business strategies. KPIs are traditionally defined with respect to a business strategy and business objectives”


Goal Modeling and KPIs (Maté et al. [18])

The Business Intelligence Model (BIM) can be employed for the systematic derivation of KPIs that are in line with business strategy.

Example: BIM for AgriProKnow

Figure (diagram): BIM for AgriProKnow. Elements:

- Actor: Farmer (linked by "desires")
- Goals: Maximize milk yield; Prevent animal illness; Optimize feed intake (AND refinement)
- Automatic Feeding
- Situations: Well-fed animals (Strength); Antibiotics resistance (Threat)
- Indicators: Milk yield (linked by "evaluates"); Body Condition Score; # of known resistant germs
- Influence label: ++

Goal Modeling and KPIs (Maté et al. [18])

The Business Intelligence Model (BIM) can be employed for the systematic derivation of KPIs that are in line with the overall business strategy.

Using the Semantics of Business Vocabulary and Rules (SBVR), the KPIs are subsequently precisely specified in Structured English.

The KPIs defined in SBVR then translate into executable MDX queries over a multidimensional schema.

Goal-Based Selection of Visualizations (Golfarelli et al. [10])

Idea: The user specifies their analysis goals and other parameters, which are subsequently used to recommend (or recommend against) certain types of visualizations.

Users may hence declare:

- Goal: composition, order, cluster, distribution, etc.
- Interaction: overview, zoom, filter, details-on-demand
- Experience: lay or tech person
- Dimensionality: n-dimensional, tree, graph
- Cardinality: low, high
- Type of measure: nominal, ordinal, interval, ratio

Given a single criterion, a visualization may be fit, acceptable, neutral, discouraged, or unfit.

For example, a pie chart is fit for composition whereas a heat map is unfit. A pie chart is fit for giving an overview whereas a bubble chart is acceptable.

Given selections for multiple criteria, the optimal visualization types can be calculated based on the qualitative suitability of each visualization for the different criteria.
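One way to combine the qualitative ratings is sketched below. The numeric weights, the rule that a single "unfit" rating excludes a visualization, and the default of "neutral" for unlisted combinations are assumptions of this sketch and not necessarily the calculation proposed by Golfarelli et al. [10]. Only the four ratings mentioned above are encoded.

```python
# Assumed weights for the qualitative ratings; "unfit" is handled separately.
SCORES = {"fit": 2, "acceptable": 1, "neutral": 0, "discouraged": -1}

# (visualization, criterion, selected value) -> rating, from the text above.
SUITABILITY = {
    ("pie chart", "goal", "composition"): "fit",
    ("heat map", "goal", "composition"): "unfit",
    ("pie chart", "interaction", "overview"): "fit",
    ("bubble chart", "interaction", "overview"): "acceptable",
}


def rank(visualizations, selections):
    """Rank visualizations by summed score, dropping any that are unfit."""
    ranking = []
    for viz in visualizations:
        total = 0
        for criterion, value in selections.items():
            # Assumption: combinations not listed are treated as neutral.
            rating = SUITABILITY.get((viz, criterion, value), "neutral")
            if rating == "unfit":
                break  # excluded from the ranking
            total += SCORES[rating]
        else:
            ranking.append((viz, total))
    return sorted(ranking, key=lambda pair: -pair[1])


result = rank(
    ["pie chart", "heat map", "bubble chart"],
    {"goal": "composition", "interaction": "overview"},
)
```

With these assumptions the pie chart ranks first, the bubble chart second, and the heat map is excluded because it is unfit for composition.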


VizDSL (Morgan et al. [19])

Goals:

- Platform-independent and extensible modeling language
- Non-IT experts are able to quickly and easily describe, model, and create interactive visualizations

Figure (panel labels): Structured Source Code, VizDSL Model, Interactive Visualization

VizDSL extends the Interaction Flow Modeling Language (IFML) with modeling elements for interactive visualization of data.

Figure: The visual notation for VizDSL


Analysis Rules

In AgriProKnow, we have implemented analysis rules based on the notion of OLAP patterns.

An action, e.g., calling a vet, can be triggered by noteworthy results of analyses that are periodically carried out.

The analyses are specified using OLAP patterns.
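The structure of such a rule can be sketched as an analysis, a condition that decides what counts as noteworthy, and an action. This is a hypothetical illustration and not the AgriProKnow implementation: the class `AnalysisRule`, the threshold of -5.0, and the sample deltas are made up, and a real analysis would run a query generated from an OLAP pattern instance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AnalysisRule:
    name: str
    analysis: Callable[[], Dict[str, float]]  # e.g. result of a pattern query
    is_noteworthy: Callable[[float], bool]    # condition on each result value
    action: Callable[[str], str]              # triggered per noteworthy result


def run_rule(rule: AnalysisRule) -> List[str]:
    """Evaluate the analysis once and trigger the action where warranted."""
    return [
        rule.action(key)
        for key, value in rule.analysis().items()
        if rule.is_noteworthy(value)
    ]


def milk_yield_delta() -> Dict[str, float]:
    # Made-up result: change in milk yield per animal versus prior days.
    return {"cow-1": -0.5, "cow-2": -6.0}


rule = AnalysisRule(
    name="DropInMilkYield",
    analysis=milk_yield_delta,
    is_noteworthy=lambda delta: delta < -5.0,  # assumed threshold
    action=lambda animal: f"call vet for {animal}",
)

triggered = run_rule(rule)
```

A scheduler would call `run_rule` periodically, which matches the description of analyses that are carried out at regular intervals.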


OPEN ISSUES


Open Issues

- Integration of conceptual models (or knowledge graphs) with machine/deep learning
  → overlap between ER and Semantic Web communities
- ... any thoughts?

References I

[1] Combining objects with rules to represent aggregation knowledge in data warehouse and OLAP systems.

[2] A. Abelló, J. Darmont, L. Etcheverry, M. Golfarelli, J.-N. Mazón, F. Naumann, T. B. Pedersen, S. Rizzi, J. Trujillo, P. Vassiliadis, and G. Vossen. Fusion cubes: towards self-service business intelligence. International Journal of Data Warehousing and Mining.

References II

[3] D. Agrawal, P. Bernstein, E. Bertino, S. Davidson, U. Dayal, M. Franklin, and others. Challenges and opportunities with big data – a community white paper developed by leading researchers across the United States. Technical report, 2012. https://cra.org/ccc/resources/ccc-led-whitepapers/ (last access: 2 October 2018).

References III

[4] M. Bala, O. Boussaid, and Z. Alimazighi. A fine-grained distribution approach for ETL processes in big data environments. Data & Knowledge Engineering, 111:114–136, 2017.

[5] S. Dobson, M. Golfarelli, S. Graziani, and S. Rizzi. A reference architecture and model for sensor data warehousing. IEEE Sensors Journal, 18(18):7659–7670, 2018.

References IV

[6] Z. El Akkaoui, J. Mazón, A. A. Vaisman, and E. Zimányi. BPMN-based conceptual modeling of ETL processes. In A. Cuzzocrea and U. Dayal, editors, DaWaK 2012, volume 7448 of Lecture Notes in Computer Science, pages 1–14. Springer, 2012.

[7] Z. El Akkaoui and E. Zimányi. Defining ETL worfklows using BPMN and BPEL. In Proceedings of the ACM 12th International Workshop on Data Warehousing and OLAP, pages 41–48, 2009.

References V

[8] L. Etcheverry, A. Vaisman, and E. Zimányi. Modeling and querying data warehouses on the semantic web using QB4OLAP. In L. Bellatreche and M. K. Mohania, editors, DaWaK 2014, volume 8646 of LNCS, pages 45–56. Springer, 2014.

[9] M. Golfarelli. Design issues in social business intelligence projects. In E. Zimányi and A. Abelló, editors, eBISS 2015, volume 253 of LNBIP, pages 62–86. Springer, 2016.

References VI

[10] M. Golfarelli, T. Pirini, and S. Rizzi. Goal-based selection of visual representations for big data analytics. In S. de Cesare and U. Frank, editors, ER 2017 Workshops, volume 10651 of LNCS, pages 47–57. Springer, 2017.

[11] M. Hilal, C. G. Schuetz, and M. Schrefl. An OLAP endpoint for RDF data analysis using analysis graphs. In Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks, 2017.

122/131

Page 229: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References VII

[12] M. Hilal, C. G. Schuetz, and M. Schrefl.Using superimposed multidimensional schemas andOLAP patterns for RDF data analysis.Open Computer Science, 8(1):18–37, 2018.

[13] J. Horkoff, D. Barone, L. Jiang, E. Yu, D. Amyot,A. Borgida, and J. Mylopoulos.Strategic business modeling: representation andreasoning.Software & Systems Modeling, 13(3):1015–1041, 2014.

123/131

Page 230: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References VIII

[14] C. Hurtado, C. Gutierrez, and A. Mendelzon.Capturing summarizability with integrity constraints inOLAP.ACM Transactions on Database Systems, 30:854–886,2005.

[15] Indyco.Attribute groups, 2015.http://indyco.freshdesk.com/support/solutions/

articles/1000212913-attribute-groups [Online;accessed 7-October-2018].

124/131

Page 231: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References IX

[16] J. Lechtenbörger and G. Vossen.Multidimensional normal forms for data warehouse design.

Information Systems, 28:415–434, 2003.

[17] E. Malinowski and E. Zimányi.A conceptual model for temporal data warehouses and itstransformation to the ER and the object-relational models.Data & Knowledge Engineering, 64(1):101–133, 2008.

[18] A. Maté, J. Trujillo, and J. Mylopoulos.Frequent patterns in ETL workflows: An empiricalapproach.Data & Knowledge Engineering, 108:30–49, 2017.

125/131

Page 232: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References X

[19] R. Morgan, G. Grossmann, M. Schrefl, M. Stumptner, andT. Payne.Vizdsl: A visual DSL for interactive informationvisualization.In J. Krogstie and H. A. Reijers, editors, CAiSE 2018,volume 10816 of LNCS, pages 440–455. Springer, 2018.

[20] S. Nalchigar and E. Yu.Business-driven data analytics: A conceptual modelingframework.Data & Knowledge Engineering, 117:359–372, 2018.

126/131

Page 233: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References XI

[21] T. Neuböck, B. Neumayr, M. Schrefl, and C. G. Schütz.Ontology-driven business intelligence for comparative dataanalysis.In E. Zimányi, editor, eBISS 2013, volume 172 of LNBIP,pages 77–120. Springer, 2014.

[22] B. Neumayr, M. Schrefl, and B. Thalheim.Hetero-homogeneous hierarchies in data warehouses.In S. Link and A. Ghose, editors, APCCM 2010, volume110 of CRPIT, pages 61–70. Australian Computer Society,2010.

127/131

Page 234: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References XII

[23] B. Oliveira and O. Belo.BPMN patterns for ETL conceptual modelling andvalidation, 2012.

[24] B. Oliveira, V. Santos, and O. Belo.Pattern-based ETL conceptual modelling, 2013.

[25] C. Ordonez, S. Maabout, D. S. Matusevich, andW. Cabrera.Extending er models to capture database transformationsto build data sets for data mining.Data & Knowledge Engineering, 89:38–54, 2014.

128/131

Page 235: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References XIII

[26] P. Russom.Hadoop for the enterprise.Technical report, TDWI, 2015.https://www.cloudera.com/content/dam/cloudera/

Resources/PDF/Reports/TDWI-Best-Practices-Report_

Hadoop-for-the-Enterprise.pdf (last access: 28 June2016).

[27] C. G. Schuetz, B. Neumayr, M. Schrefl, and T. Neuböck.Reference modeling for data analysis: The BIRDapproach.International Journal of Cooperative Information Systems,25(2):1–46, 2016.

129/131

Page 236: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References XIV

[28] C. G. Schuetz, S. Schausberger, and M. Schrefl.Building an active semantic data warehouse for precisiondairy farming.Journal of Organizational Computing and ElectronicCommerce, 28(2):122–141, 2018.

[29] R. Sherman.Business Intelligence Guidebook.Morgan Kaufmann, 2015.

[30] V. Theodoroua, A. Abelló, M. Thieleb, and W. Lehner.Frequent patterns in ETL workflows: An empiricalapproach.Data & Knowledge Engineering, 112:1–16, 2017.

130/131

Page 237: Conceptualizing Analytics - Conceptual Modeling and Data ... · CONCEPTUALIZING ANALYTICS Conceptual Modeling and Data Analytics – A Tutorial Christoph G. Schuetz Michael Schrefl

References XV

[31] A. Vaisman and E. Zimányi.Data Warehouse Systems – Design and Implementation.Springer, 2014.

[32] S. Williams.Business Intelligence Strategy and Big Data Analytics.Morgan Kaufmann, 2016.

131/131