Data Analytics process in Learning and Academic Analytics projects Day 1: Data selection and capture Alex Rayón Jerez [email protected] DeustoTech Learning – Deusto Institute of Technology – University of Deusto Avda. Universidades 24, 48007 Bilbao, Spain www.deusto.es

Data Analytics.01. Data selection and capture


Page 2: Data Analytics.01. Data selection and capture

Objectives

How to tackle an AA/LA project

1. Objectives: what do I want to improve?
2. Data: automated processes for data discovery and later processing
3. Integration, not substitution
4. Technology
5. KPIs: define and test

Page 3: Data Analytics.01. Data selection and capture

Table of contents

● ETL approach
● Data analytics cycle
● Architecture principles
● Requirements
● Components
● Data process
  ○ Questions
  ○ Data model
  ○ Data sources
  ○ Use cases


Page 5: Data Analytics.01. Data selection and capture

ETL approach: Definition and characteristics

● An ETL tool is a tool that
  ○ Extracts data from various data sources (usually legacy data)
  ○ Transforms data
    ■ from being optimized for transactions
    ■ to being optimized for reporting and analysis
    ■ synchronizes the data coming from different databases
    ■ cleanses the data to remove errors
  ○ Loads data into a data warehouse
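The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV content, the `dw_grades` table and the function names are invented for the example, and an in-memory SQLite database stands in for a real data warehouse. It is not how Kettle or any other ETL tool works internally.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export: stray whitespace, inconsistent casing, one bad row.
LEGACY_CSV = """student_id,name,grade
 101 ,ana LOPEZ,7.5
102,Jon Etxea,not available
103,  mikel ruiz ,9.0
"""


def extract(text):
    """Extract: read raw rows from the (hypothetical) legacy source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse values and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            grade = float(row["grade"])
        except ValueError:
            continue  # data cleansing: skip rows with an unparseable grade
        clean.append((int(row["student_id"].strip()),
                      row["name"].strip().title(),
                      grade))
    return clean


def load(rows, conn):
    """Load: write the cleansed rows into a reporting table."""
    conn.execute("CREATE TABLE IF NOT EXISTS dw_grades "
                 "(student_id INTEGER PRIMARY KEY, name TEXT, grade REAL)")
    conn.executemany("INSERT INTO dw_grades VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three stages against an in-memory stand-in for a warehouse."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(LEGACY_CSV)), conn)
    return conn.execute(
        "SELECT student_id, name, grade FROM dw_grades ORDER BY student_id"
    ).fetchall()
```

Calling `run_etl()` returns the two rows that survive cleansing; the row with the unparseable grade is dropped during the transform stage.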

Page 6: Data Analytics.01. Data selection and capture

ETL approach: Why do I need it?

● ETL tools save time and money when developing a data warehouse by removing the need for hand-coding

● It is very difficult for database administrators to connect databases from different vendors without using an external tool

● When databases are altered or new databases need to be integrated, much of the hand-coded work has to be redone from scratch

Page 7: Data Analytics.01. Data selection and capture

ETL approach: Kettle

Project Kettle

Powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach

Page 8: Data Analytics.01. Data selection and capture

ETL approach: Kettle (II)

● It uses an innovative metadata-driven approach
● It has a very easy-to-use GUI
● Strong community of 13,500 registered users
● It uses a stand-alone Java engine that processes the tasks for moving data between many different databases and files

Page 9: Data Analytics.01. Data selection and capture

ETL approach: Kettle (III)

Page 10: Data Analytics.01. Data selection and capture

ETL approach: Kettle (IV)

Source: http://download.101com.com/tdwi/research_report/2003ETLReport.pdf

Page 11: Data Analytics.01. Data selection and capture

ETL approach: Kettle (V)

Source: Pentaho Corporation

Page 12: Data Analytics.01. Data selection and capture

ETL approach: Kettle (VI)

● Datawarehouse and datamart loads
● Data integration
● Data cleansing
● Data migration
● Data export
● etc.

Page 13: Data Analytics.01. Data selection and capture

ETL approach: Transformations

● String and Date Manipulation
● Data Validation / Business Rules
● Lookup / Join
● Calculation, Statistics
● Cryptography
● Decisions, Flow control
● Scripting
● etc.
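A few of the transformation types listed above can be illustrated on a single record. The sketch below is plain Python, not Kettle steps; the record fields, the `DEPARTMENTS` lookup table and the 0 to 10 marking scale are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical lookup table (department code -> name), standing in for a Lookup / Join step.
DEPARTMENTS = {"ENG": "Engineering", "LAW": "Law"}


def transform_record(record):
    """Apply string/date manipulation, a validation rule, a lookup and a calculation."""
    # String manipulation: trim whitespace and normalise casing.
    name = record["name"].strip().title()
    # Date manipulation: parse day/month/year and emit ISO 8601.
    enrolled = datetime.strptime(record["enrolled"], "%d/%m/%Y").date()
    marks = [float(m) for m in record["marks"]]
    # Validation / business rule: marks must lie on the assumed 0-10 scale.
    if not all(0 <= m <= 10 for m in marks):
        raise ValueError(f"mark out of range for {name}")
    return {
        "name": name,
        "enrolled": enrolled.isoformat(),
        # Lookup: resolve the department code, with a fallback for unknown codes.
        "department": DEPARTMENTS.get(record["dept"], "Unknown"),
        # Calculation: mean mark rounded to two decimals.
        "mean_mark": round(sum(marks) / len(marks), 2),
    }
```

A record that violates the business rule raises a `ValueError`, which a real pipeline would typically route to an error stream rather than abort on.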

Page 14: Data Analytics.01. Data selection and capture

ETL approach: What is it good for?

● Mirroring data from master to slave
● Syncing two data sources
● Processing data retrieved from multiple sources and pushed to multiple destinations
● Loading data to RDBMS
● Datamart / Datawarehouse
  ○ Dimension lookup/update step
● Graphical manipulation of data


Page 16: Data Analytics.01. Data selection and capture

Data Analytics cycle: Challenges

● Data is everywhere
● Data is inconsistent
  ○ Records are different in each system
● Performance issues
  ○ Queries that summarize data over a long stipulated period tie up the operating system
  ○ They push the OS to maximum load
● Data is never all in the Data Warehouse
  ○ Excel sheet, acquisition, new application

Page 17: Data Analytics.01. Data selection and capture

Data Analytics cycle: Challenges (II)

● Data is incomplete
● Certain types of usage data are not logged
● Data are not aggregated following a didactical perspective
● Users are afraid that they could draw unsound inferences from some of the data

[Mazza2012]

Page 18: Data Analytics.01. Data selection and capture

Data Analytics cycle: Academic Analytics Model

1) Capture
2) Report
3) Predict
4) Act
5) Refine

Academic Analytics [CampbellOblinger2007]

Page 19: Data Analytics.01. Data selection and capture

Data Analytics cycle: Learning Analytics Model

1) Select
2) Capture
3) Aggregate
4) Process
5) Visualize

On the design of collective applications [DronAnderson2009]

Page 20: Data Analytics.01. Data selection and capture

Data Analytics cycle: Learning Analytics Model (II)

On the design of collective applications [DronAnderson2009]

1) Select
2) Capture
3) Aggregate
4) Process
5) Visualize

Course days shown alongside the steps: Day 1, Day 2, Day 3, Day 4

Page 21: Data Analytics.01. Data selection and capture

Data Analytics cycle: Learning Analytics Model (III)

● As [Clow2012] states, it is necessary to close the feedback loop through appropriate interventions

● The paper also draws on the wider educational literature, seeking to place learning analytics on an established theoretical base, and develops a number of insights for learning analytics practice

Page 22: Data Analytics.01. Data selection and capture

Data Analytics cycle: Learning Analytics Model (IV)

Page 23: Data Analytics.01. Data selection and capture

Data Analytics cycle: Learning Analytics Model (III)


Page 25: Data Analytics.01. Data selection and capture

Architecture principles: A model for adoption, use and improvement of analytics

A framework of characteristics for Analytics
Adam Cooper, 2012 [Cooper2012]

Page 26: Data Analytics.01. Data selection and capture

Architecture principles: Development of common language for data exchange

The IEEE defines interoperability to be:

“The ability of two or more systems or components to exchange information and to use the information that has been exchanged”

Page 27: Data Analytics.01. Data selection and capture

Architecture principles: Development of common language for data exchange (II)

Page 28: Data Analytics.01. Data selection and capture

Architecture principles: Development of common language for data exchange (III)

● The most difficult challenges with achieving interoperability are typically found in establishing common meanings to the data
● Sometimes this is a matter of technical precision
  ○ But culture – regional, sector-specific, and institutional – and habitual practices also affect meaning

Page 29: Data Analytics.01. Data selection and capture

Architecture principlesDevelopment of common language for data exchange (IV)

● Potential benefits
  ○ Efficiency and timeliness
    ■ No need for a person to intervene to re-enter, re-format or transform data
  ○ Independence
    ■ Resilience
  ○ Adaptability
    ■ Faster, cheaper and less disruptive to change
  ○ Innovation and market growth
    ■ Interoperability combined with modularity makes it easier to build IT systems that are better matched to local culture without needing to create and maintain numerous whole systems

Page 30: Data Analytics.01. Data selection and capture

Architecture principles: Development of common language for data exchange (V)

● Potential benefits
  ○ Durability of data
    ■ Structures and formats change over time
    ■ The changes are rarely properly documented
  ○ Aggregation
    ■ Data joining might be supported by a common set of definitions around course structure, combined with a unified identification scheme
  ○ Sharing
    ■ Especially when there are multiple parties involved
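The aggregation benefit can be made concrete with a small sketch: when two systems share a student identifier and a course code, their records can be joined without manual re-keying. The record layouts, the field names `student_id` and `course_code`, and the sample values are all hypothetical.

```python
# Hypothetical extracts from two systems. "student_id" plays the role of the
# unified identification scheme and "course_code" the shared course definition.
SIS_RECORDS = [
    {"student_id": "s1", "course_code": "DA101", "final_grade": 8.0},
    {"student_id": "s2", "course_code": "DA101", "final_grade": 5.5},
]
LMS_RECORDS = [
    {"student_id": "s1", "course_code": "DA101", "logins": 42},
    {"student_id": "s3", "course_code": "DA101", "logins": 7},
]


def join_on_common_keys(left, right, keys=("student_id", "course_code")):
    """Inner-join two record lists on the shared identifiers."""
    # Index the right-hand records by their key tuple for constant-time lookup.
    index = {tuple(r[k] for k in keys): r for r in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**row, **match})
    return joined
```

Only student `s1` appears in both extracts, so the join yields a single combined record. Without agreed identifiers, the same join would need a hand-maintained mapping table between the two systems.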

Page 31: Data Analytics.01. Data selection and capture

Architecture principles: Development of common language for data exchange (VI)

[LACE2013]

Page 32: Data Analytics.01. Data selection and capture

Architecture principles: Development of common language for data exchange (VII)

[LACE2013]

In our case?

Page 33: Data Analytics.01. Data selection and capture

Architecture principles: Development of common language for data exchange (VIII)

[LACE2013]

In our case?


Page 35: Data Analytics.01. Data selection and capture

Requirements

● Usability: prepare an understandable user interface (UI), appropriate methods for data visualization, and guide the user through the analytics process.

● Usefulness: provide relevant, meaningful indicators that help teachers to gain insight into the learning behavior of their students and support them in reflecting on their teaching.

● Interoperability: ensure compatibility for any kind of VLE by allowing for integration of different data sources.

[Dyckhoff2010]

Page 36: Data Analytics.01. Data selection and capture

Requirements (II)

● Extensibility: allow for incremental extension of analytics functionality after the system has been deployed without rewriting code.

● Reusability: aim for a building-block approach, so that more complex functions can be implemented by re-using simpler ones.

● Real-time operation: make sure that the toolkit can return answers within microseconds to allow for an exploratory user experience

● Data Privacy: preserve confidential user information and protect the identities of the users at all times

[Dyckhoff2010]
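The building-block idea behind the reusability requirement can be sketched as follows. This is an illustration only, not code from the toolkit described in [Dyckhoff2010]; the event layout (`student`, `day`) and the indicator names are assumptions.

```python
def count_events(events, student):
    """Basic block: number of logged events for one student."""
    return sum(1 for e in events if e["student"] == student)


def active_days(events, student):
    """Basic block: number of distinct days on which the student was active."""
    return len({e["day"] for e in events if e["student"] == student})


def events_per_active_day(events, student):
    """Composite indicator built only from the two simpler blocks."""
    days = active_days(events, student)
    return count_events(events, student) / days if days else 0.0
```

New indicators can then be added after deployment by composing existing blocks, which also speaks to the extensibility requirement.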


Page 38: Data Analytics.01. Data selection and capture

Components

● Process
  ○ A systematic process of educational data analysis
● Model
  ○ The definition of a suitable model to represent the knowledge domain
● Tool/platform
  ○ The design and implementation of a monitoring and presentation tool based on the Process and Model

[Mazza2012]


Page 40: Data Analytics.01. Data selection and capture

Data process: Introduction

“Measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs”

First international conference on Learning Analytics and Knowledge, Alberta, 2011 [LAK2011]

Page 41: Data Analytics.01. Data selection and capture

Data process: Introduction (II)

However, the challenge is to determine which data are of interest

We are now in an era where gaining access to data is not the problem; the challenge lies in determining which data are significant and why

Page 42: Data Analytics.01. Data selection and capture

Data processIntroduction (III)

“The basic question is not what can we measure? The basic question is what does a good education look like? Big questions”

Page 43: Data Analytics.01. Data selection and capture

Data process: Introduction (IV)

“More data does not mean more knowledge” [Jenkins2013]

Searching for the evidence in a mass of data requires knowing what kind of evidence is needed

Knowledge of the domain and understanding and interpretation of the patterns we see

Page 45: Data Analytics.01. Data selection and capture

Data process: Introduction (VI)

Source: http://www.learningfrontiers.eu/?q=story/will-analytics-transform-education

Page 46: Data Analytics.01. Data selection and capture

Data process: Introduction (VII)

A brief comparison of the two fields (George Siemens and Ryan Baker [SiemensBaker2012])

Page 47: Data Analytics.01. Data selection and capture

Data process: Introduction (VIII)

First of all, education is a highly collaborative space and it represents a social good. Keeping a valuable secret that might help students succeed is antithetical to the nature of education. Second, education is a complex ecosystem of people, processes, policies, content, etc. I would have strong doubts about anyone who claimed to have a formula that worked for a wide variety of institutions.

Mike Sharkey, 2014

Page 48: Data Analytics.01. Data selection and capture

Data process: Questions

Source: http://www.slideshare.net/sbs/learning-analytics-uts-2013

Page 49: Data Analytics.01. Data selection and capture

Data process: Questions (II)

The question depends on who is asking it ;)

Page 50: Data Analytics.01. Data selection and capture

Data process: Questions (III)

Horizon Report 2012 [HR2012]

Page 51: Data Analytics.01. Data selection and capture

Data process: Questions (IV)

1) Adaptive testing, tracking and reporting

● Progress summary, daily activity report, class goals report, progress report, student activity report, student focus report, etc. [Khan2012]

● By using various analytics tools, students can review their learning progress, and teachers are supported in personalising learning for students who need more help in specific areas

Page 52: Data Analytics.01. Data selection and capture

Data process: Questions (V)

2) Analytics tools for early alert, intervention and collaboration

● Integrating data collected from a variety of information management systems
  ○ Allowing educators to assess the risk, initiate early interventions and support collaborative learning

Page 53: Data Analytics.01. Data selection and capture

Data process: Questions (VI)

2) Analytics tools for early alert, intervention and collaboration

● For example, the Signals project at Purdue University utilizes the data collected from student information systems, learning management systems, and the grade book for a specific course to track students’ performances and identify at-risk students in real time

Page 54: Data Analytics.01. Data selection and capture

Data process: Questions (VII)

2) Analytics tools for early alert, intervention and collaboration

● LOCO-Analyst provides teachers with charts, graphs, and other data representations that show how their students are performing and how students interact with one another in web-based learning environments, helping teachers determine how to engage their students online

Page 55: Data Analytics.01. Data selection and capture

Data process: Questions (VIII)

2) Analytics tools for early alert, intervention and collaboration

● Social Networks Adapting Pedagogical Practice (SNAPP), a network visualization tool developed by researchers at the University of Wollongong, can analyse students’ interactions in a forum and display them as a diagram that helps teachers identify the key connections and the disconnected students, and support collaborative learning in a web-based learning environment
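The kind of analysis described above (finding key connections and disconnected students in a forum) can be approximated with a simple reply graph. The sketch below is not SNAPP's implementation; the post layout with `author` and `reply_to` keys is a hypothetical simplification.

```python
from collections import defaultdict


def reply_network(posts):
    """Build an undirected who-replied-to-whom graph from forum posts.

    Each post is a dict with hypothetical keys "author" and "reply_to"
    (the author being replied to, or None for a thread starter).
    """
    graph = defaultdict(set)
    for post in posts:
        graph[post["author"]]  # make sure every author appears as a node
        target = post["reply_to"]
        if target is not None and target != post["author"]:
            graph[post["author"]].add(target)
            graph[target].add(post["author"])
    return dict(graph)


def disconnected_students(graph):
    """Students who neither replied to anyone nor received a reply."""
    return sorted(s for s, peers in graph.items() if not peers)


def most_connected(graph):
    """Student with the most distinct conversation partners (ties broken alphabetically)."""
    return max(sorted(graph), key=lambda s: len(graph[s]))
```

A teacher-facing tool would draw this graph; here the two helper functions simply report which students are isolated and who sits at the centre of the discussion.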

Page 56: Data Analytics.01. Data selection and capture

Data process: Questions (IX)

3) Analytics projects for institutional efficiency and effectiveness

● There are a number of institutional analytics initiatives which enable institutions to improve the effectiveness of operations, including admission management and drop-out prevention, resource management, financial planning, etc.
  ○ Student Experience Traffic Lighting (SETL)
  ○ The Enhancing Student Centred Administration for Placement Experience (ESCAPES)

Page 57: Data Analytics.01. Data selection and capture

Data process: Questions (X)

Learning Analytics are not neutral

Page 58: Data Analytics.01. Data selection and capture

Data process: Questions (XI)

“Accounting tools… do not simply aid the measurement of economic activity, they shape the reality they measure”

[GayPryke2002]

Page 59: Data Analytics.01. Data selection and capture

Data process: Questions (XII)

Source: http://mfeldstein.com/harvard-mit-learn-university-phoenix-analytics/

Page 60: Data Analytics.01. Data selection and capture

Data process: Questions (XIII)

● The Harvard and MIT data ignore student goals or any information giving a clue as to whether students wanted to complete the course, get a good grade, get a certificate, or just sample some material
● Without this information, the actual aggregate behavior is missing context
  ○ We don’t know if a certain student intended to just audit a course, sample it, or attempt to complete it
  ○ We don’t know if students started the course intending to complete it but became frustrated

Page 61: Data Analytics.01. Data selection and capture

Data process: Questions (XIV)

● The value of learner behavior patterns, which can only be learned by viewing data patterns over time
● If you want to “share best practices to improve teaching and learning”, then you need data organized around the learner
  ○ With transactions captured over time – not just in aggregate
  ○ What we have now is an honest start, but a very limited data set
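The difference between aggregate data and data organized around the learner can be shown with a toy event log. The log format (learner, week, action) and the values are invented for illustration and do not come from the Harvard and MIT data set.

```python
from collections import defaultdict

# Hypothetical event log: (learner, week, action).
EVENTS = [
    ("s1", 1, "video"), ("s1", 2, "quiz"), ("s1", 3, "quiz"),
    ("s2", 1, "video"), ("s2", 1, "quiz"), ("s2", 1, "forum"),
]


def aggregate_view(events):
    """Aggregate only: total number of events per learner."""
    totals = defaultdict(int)
    for learner, _week, _action in events:
        totals[learner] += 1
    return dict(totals)


def learner_timelines(events):
    """Learner-centred view: each learner's actions ordered by week."""
    timelines = defaultdict(list)
    # sorted() is stable, so actions within the same week keep their logged order.
    for learner, week, action in sorted(events, key=lambda e: (e[0], e[1])):
        timelines[learner].append((week, action))
    return dict(timelines)
```

Both learners have the same aggregate count of three events, yet the timelines differ: one stays active across three weeks, the other is active only in the first week. That distinction is invisible in the aggregate view.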

Page 62: Data Analytics.01. Data selection and capture

Data process: Data model

So, what data model do we need to answer our questions?

Page 63: Data Analytics.01. Data selection and capture

Data process: Data model (II)

The data model, or the concept map, describes the concepts and their relationships used by the organization in its daily work, expressed in its own language

It enables the whole organization to participate in the maintenance of it

Page 64: Data Analytics.01. Data selection and capture

Data process: Data model (III)

Source: http://www.economist.com/news/finance-and-economics/21578041-containers-have-been-more-important-globalisation-freer-trade-humble

Source: http://www.economist.com/blogs/economist-explains/2013/05/economist-explains-14

Page 65: Data Analytics.01. Data selection and capture

Data process: Data model (IV)

The best approach that we have found for this task is the theory of eLearning functions

Reinmann [Reinmann2006]

Page 66: Data Analytics.01. Data selection and capture

Data process: Data model (V)

[Reinmann2006]

Page 67: Data Analytics.01. Data selection and capture

Data process: Data model (VI)

Example

[Mazza2012]

Page 68: Data Analytics.01. Data selection and capture

Data process: Data model (VII)

Example

This model answers the monitoring questions:

● Which ways of eLearning make it possible to reach the given objectives?

● By which means (functions, tools) does the LMS enable these ways of learning?

● How is the use of these means traced in the log files (activity log codes)?

[Mazza2012]
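One way to picture such a model is as a nested mapping from ways of learning to LMS tools to log codes. The sketch below is only a guess at the shape of that mapping: every key and log code in `MODEL` is an illustrative placeholder and none of them is taken from [Mazza2012] or [Reinmann2006].

```python
# All names below are illustrative placeholders, not taken from the source model.
MODEL = {
    "learning by discussion": {                    # way of eLearning
        "forum": ["forum_view", "forum_post"],     # LMS tool -> activity log codes
    },
    "learning by assessment": {
        "quiz": ["quiz_attempt", "quiz_review"],
    },
}


def log_codes_for(way):
    """Log codes through which a given way of learning is traced."""
    tools = MODEL.get(way, {})
    return sorted(code for codes in tools.values() for code in codes)


def ways_traced_by(code):
    """Ways of learning for which a given log code is evidence."""
    return sorted(
        way for way, tools in MODEL.items()
        if any(code in codes for codes in tools.values())
    )
```

Read top-down, the mapping answers the second and third monitoring questions; read bottom-up with `ways_traced_by`, it tells the analyst what a raw log entry says about learning.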

Page 69: Data Analytics.01. Data selection and capture

Data process: Data model (VIII)

[Mazza2012]

Page 70: Data Analytics.01. Data selection and capture

Data process: Data sources

Today we have a great deal of data that come in unstructured or semi-structured form and may nonetheless be of value in understanding more about our learners

Page 71: Data Analytics.01. Data selection and capture

Data process: Data sources (II)

“Learning is a complex social activity” [Siemens2012]

Lots of data
Lots of tools
Humans to make sense

Page 72: Data Analytics.01. Data selection and capture

Data process: Data sources (III)

Traditional data sources:
● Student data: demographics, qualification aim, modules taken, results, etc.
● Student feedback data: end of module survey and others
● Student activity data: delivery data, completion, pass rates, etc.

Page 73: Data Analytics.01. Data selection and capture

Data process: Data sources (IV)

● The world of technology has changed [Eaton2012]
  ○ 80% of the world’s information is unstructured
  ○ Unstructured data are growing at 15 times the rate of structured information
  ○ Raw computational power is growing at such an enormous rate that we almost have a supercomputer in our hands
  ○ Access to information is available to all

Page 74: Data Analytics.01. Data selection and capture

Data process: Data sources (V)

Source: http://www.bigdata-startups.com/BigData-startup/understanding-sources-big-data-infographic/

Page 75: Data Analytics.01. Data selection and capture

Data process: Data sources (VI)

● RDBMS (SQL Server, DB2, Oracle, MySQL, PostgreSQL, Sybase IQ, etc.)
● NoSQL Data: HBase, Cassandra, MongoDB
● OLAP (Mondrian, Palo, XML/A)
● Web (REST, SOAP, XML, JSON)
● Files (CSV, Fixed, Excel, etc.)
● ERP (SAP, Salesforce, OpenERP)
● Hadoop Data: HDFS, Hive
● Web Data: Twitter, Facebook, Log Files, Web Logs
● Others: LDAP/Active Directory, Google Analytics, etc.
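Capturing from heterogeneous sources usually means one reader per format feeding a common record shape. The sketch below covers only two of the formats listed (CSV and JSON) using the Python standard library; the `capture` function, the `source` tag and the sample payloads are assumptions for illustration.

```python
import csv
import io
import json


def read_csv_source(text):
    """Read CSV text into a list of dicts (all values arrive as strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json_source(text):
    """Read a JSON array of objects into a list of dicts."""
    return json.loads(text)


def capture(sources):
    """Route each (format, text) pair to its reader and tag records with their origin."""
    readers = {"csv": read_csv_source, "json": read_json_source}
    records = []
    for fmt, text in sources:
        for record in readers[fmt](text):
            records.append({**record, "source": fmt})
    return records
```

Note that the CSV reader yields strings while JSON preserves numeric types, so a later transformation step would still need to harmonise the values.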

Page 76: Data Analytics.01. Data selection and capture

Data process: Use cases

1) Student data: XML
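As a rough illustration of this use case, student data delivered as XML can be read with Python's standard `xml.etree.ElementTree` module. The element and attribute names in `STUDENTS_XML` are invented; the actual export shown on the slides is not reproduced in this transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical student export; element and attribute names are invented.
STUDENTS_XML = """<students>
  <student id="101"><name>Ana Lopez</name><programme>Engineering</programme></student>
  <student id="102"><name>Jon Etxea</name><programme>Law</programme></student>
</students>"""


def parse_students(xml_text):
    """Turn each <student> element into a flat record."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": student.get("id"),
            "name": student.findtext("name"),
            "programme": student.findtext("programme"),
        }
        for student in root.findall("student")
    ]
```

The resulting flat records can then be loaded into a table alongside data from the other sources.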


Page 79: Data Analytics.01. Data selection and capture

Data process: Use cases (II)

2) Moodle: MySQL database

mdl_forum
- id
- course
- name

mdl_user
- id
- username
- firstname
- lastname

mdl_forum_discussions
- id
- name
- userid
- timemodified
- usermodified

mdl_forum_posts
- id
- userid
- discussion
- message
- modified
- created
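A typical capture query over these tables counts forum posts per user and discussion. The sketch below runs it against an in-memory SQLite database as a stand-in for the Moodle MySQL database, using only the columns listed above. The join conditions (`mdl_forum_posts.userid` to `mdl_user.id`, `mdl_forum_posts.discussion` to `mdl_forum_discussions.id`) are inferred from the column names, and the sample rows are invented.

```python
import sqlite3

SCHEMA = """
CREATE TABLE mdl_user (id INTEGER PRIMARY KEY, username TEXT, firstname TEXT, lastname TEXT);
CREATE TABLE mdl_forum_discussions (id INTEGER PRIMARY KEY, name TEXT, userid INTEGER,
                                    timemodified INTEGER, usermodified INTEGER);
CREATE TABLE mdl_forum_posts (id INTEGER PRIMARY KEY, userid INTEGER, discussion INTEGER,
                              message TEXT, modified INTEGER, created INTEGER);
"""

# Posts per user and discussion; join keys are inferred from the column names.
POSTS_PER_USER = """
SELECT u.username, d.name AS discussion, COUNT(p.id) AS posts
FROM mdl_forum_posts p
JOIN mdl_user u ON u.id = p.userid
JOIN mdl_forum_discussions d ON d.id = p.discussion
GROUP BY u.username, d.name
ORDER BY u.username, d.name
"""


def demo():
    """Load invented sample rows and run the query."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for the MySQL database
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO mdl_user VALUES (?, ?, ?, ?)",
                     [(1, "ana", "Ana", "Lopez"), (2, "jon", "Jon", "Etxea")])
    conn.execute("INSERT INTO mdl_forum_discussions VALUES (10, 'Week 1', 1, 0, 1)")
    conn.executemany("INSERT INTO mdl_forum_posts VALUES (?, ?, ?, ?, ?, ?)",
                     [(100, 1, 10, "Hello", 0, 0),
                      (101, 2, 10, "Hi", 0, 0),
                      (102, 2, 10, "Question", 0, 0)])
    return conn.execute(POSTS_PER_USER).fetchall()
```

The same `SELECT` should run unchanged on MySQL, since it uses only standard joins and aggregation.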

Page 80: Data Analytics.01. Data selection and capture

Data process: Use cases (III)

3) MediaWiki: MySQL database

user
- user_real_name
- user_editcount

recentchanges
- rc_old_len
- rc_new_len

revision
- rev_timestamp

page
- page_counter
- page_len

Joins:
- rev_user = user_id
- rev_page = page_id
- user_id = rc_user
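The joins listed above can be exercised in the same way. The sketch below again uses in-memory SQLite as a stand-in for MySQL, with the column names exactly as they appear on the slide; newer MediaWiki releases may use a different schema. The sample rows and the two indicator queries are invented for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE "user" (user_id INTEGER PRIMARY KEY, user_real_name TEXT, user_editcount INTEGER);
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_counter INTEGER, page_len INTEGER);
CREATE TABLE revision (rev_user INTEGER, rev_page INTEGER, rev_timestamp TEXT);
CREATE TABLE recentchanges (rc_user INTEGER, rc_old_len INTEGER, rc_new_len INTEGER);
"""

# Revisions and distinct pages edited per user (rev_user = user_id, rev_page = page_id).
REVISIONS_PER_USER = """
SELECT u.user_real_name, COUNT(*) AS revisions, COUNT(DISTINCT p.page_id) AS pages
FROM revision r
JOIN "user" u ON r.rev_user = u.user_id
JOIN page p ON r.rev_page = p.page_id
GROUP BY u.user_id, u.user_real_name
ORDER BY u.user_real_name
"""

# Net characters added per user (user_id = rc_user).
NET_CHANGE_PER_USER = """
SELECT u.user_real_name, SUM(rc.rc_new_len - rc.rc_old_len) AS net_change
FROM recentchanges rc
JOIN "user" u ON u.user_id = rc.rc_user
GROUP BY u.user_id, u.user_real_name
ORDER BY u.user_real_name
"""


def demo():
    """Load invented sample rows and run both queries."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for the MySQL database
    conn.executescript(SCHEMA)
    conn.executemany('INSERT INTO "user" VALUES (?, ?, ?)', [(1, "Ana", 3), (2, "Jon", 1)])
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [(1, 0, 100), (2, 0, 50)])
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?)",
                     [(1, 1, "t1"), (1, 2, "t2"), (1, 1, "t3"), (2, 1, "t4")])
    conn.executemany("INSERT INTO recentchanges VALUES (?, ?, ?)",
                     [(1, 100, 150), (1, 50, 40), (2, 150, 155)])
    return (conn.execute(REVISIONS_PER_USER).fetchall(),
            conn.execute(NET_CHANGE_PER_USER).fetchall())
```

The two queries are kept separate on purpose: joining `revision` and `recentchanges` in a single statement would multiply rows per user and inflate both counts.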

Page 81: Data Analytics.01. Data selection and capture

Data process: Use cases (IV)

4) Google Doc: Google API


Page 91: Data Analytics.01. Data selection and capture

Data process: Use cases (VI)

5) Twitter: API

Page 92: Data Analytics.01. Data selection and capture

References

[CampbellOblinger2007] Campbell, John P., Peter B. DeBlois, and Diana G. Oblinger. "Academic analytics: A new tool for a new era." Educause Review 42.4 (2007): 40.

[Clow2012] Clow, Doug. "The learning analytics cycle: closing the loop effectively." Proceedings of the 2nd International Conference on Learning Analytics and Knowledge. ACM, 2012.

[Cooper2012] Cooper, Adam. "What is analytics? Definition and essential characteristics." CETIS Analytics Series 1.5 (2012): 1-10.

[DronAnderson2009] Dron, J., & Anderson, T. (2009). On the design of collective applications. In Proceedings of the 2009 International Conference on Computational Science and Engineering, 4, 368–374.

[Dyckhoff2010] Dyckhoff, Anna Lea, et al. "Design and Implementation of a Learning Analytics Toolkit for Teachers." Educational Technology & Society 15.3 (2012): 58-76.

[Eaton2012] Chris Eaton, Dirk Deroos, Tom Deutsch, George Lapis & Paul Zikopoulos, “Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”, p.XV. McGraw-Hill, 2012.

[GayPryke2002] Cultural Economy: Cultural Analysis and Commercial Life (Culture, Representation and Identity series) Paul du Gay (Editor), Michael Pryke. 2002.

[HR2012] NMC Horizon Report 2012 http://www.nmc.org/publications/horizon-report-2012-higher-ed-edition

[Jenkins2013] BBC Radio 4, Start the Week, Big Data and Analytics, first broadcast 11 February 2013 http://www.bbc.co.uk/programmes/b01qhqfv

[Khan2012] http://www.emergingedtech.com/2012/04/exploring-the-khan-academys-use-of-learning-data-and-learning-analytics/

[LACE2013] Learning Analytics Community Exchange http://www.laceproject.eu/

[LAK2011] 1st International Conference on Learning Analytics and Knowledge, 27 February - 1 March 2011, Banff, Alberta, Canada https://tekri.athabascau.ca/analytics/

[Mazza2012] Mazza, Riccardo, et al. "MOCLog–Monitoring Online Courses with log data." Proceedings of the 1st Moodle Research Conference. 2012.

[Reinmann2006] Reinmann, G. (2006). Understanding e-learning: an opportunity for Europe? European Journal of Vocational Training, 38, 27-42.

[SiemensBaker2012] Siemens & Baker (2012). Learning Analytics and Educational Data Mining: Towards Communication and Collaboration. Learning Analytics and Knowledge 2012. Available in .pdf format at http://users.wpi.edu/~rsbaker/LAKs%20reformatting%20v2.pdf
