Data Science and Big Data Analytics Chap 9: Advanced Analytical Theory and Methods: Text Analysis Charles Tappert Seidenberg School of CSIS, Pace University


Data Science and Big Data Analytics
Chap 9: Advanced Analytical Theory and Methods: Text Analysis

Charles Tappert
Seidenberg School of CSIS, Pace University


Data Analytics Lifecycle

Data Analytics Lifecycle Overview
Phase 1: Discovery
Phase 2: Data Preparation
Phase 3: Model Planning
Phase 4: Model Building
Phase 5: Communicate Results
Phase 6: Operationalize
Case Study: GINA


2.1 Data Analytics Lifecycle Overview

Huge volume of data
  Not just thousands/millions, but billions of items
Complexity of data types and structures
  Variety of sources, formats, structures
Speed of new data creation and growth
  High velocity, rapid ingestion, fast analysis


2.2 Phase 1: Discovery

Mobile sensors
Social media – 700 Facebook updates/sec in 2012
Video surveillance
Video rendering
Smart grids
Geophysical exploration
Medical imaging
Gene sequencing – more prevalent, less expensive


2.3 Phase 2: Data Preparation


2.4 Phase 3: Model Planning


2.6 Phase 5: Communicate Results

Structured – defined data type, format, structure
  Transactional data, OLAP cubes, RDBMS, CSV files, spreadsheets
Semi-structured
  Text data with discernible patterns – e.g., XML data
Quasi-structured
  Text data with erratic data formats – e.g., clickstream data
Unstructured
  Data with no inherent structure – text docs, PDFs, images, video
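The four structure types above can be illustrated with a small sketch. All sample values here (the CSV rows, the XML order, the clickstream line, the free-text note) are made up for illustration and do not come from the slides; the point is only how much work is needed to pull fields out of each kind of data.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: a CSV file has a fixed schema, so fields are read by name.
# (Hypothetical sample rows.)
csv_text = "id,amount\n1,9.50\n2,12.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: XML is self-describing, so a parser can walk the tags.
xml_text = "<order id='1'><item>pen</item><item>ink</item></order>"
items = [element.text for element in ET.fromstring(xml_text).findall("item")]

# Quasi-structured: a clickstream line has an erratic format, so a
# pattern is needed to dig out the field of interest.
click = "2014-03-01 10:02:11 GET /products?id=42&ref=home"
match = re.search(r"[?&]id=(\d+)", click)
product_id = int(match.group(1)) if match else None

# Unstructured: free text has no inherent fields; the simplest first
# step is to split it into lowercase word tokens.
note = "Customer called about a late delivery."
tokens = re.findall(r"[a-z]+", note.lower())

print(total, items, product_id, tokens)
```

The further down the list, the more of the structure has to be supplied by the analyst rather than by the data format itself.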


2.7 Phase 6: Operationalize


2.8 Case Study: Global Innovation Network and Analysis (GINA)


1.2 State of the Practice in Analytics

Business Intelligence (BI) versus Data Science
Current Analytical Architecture
Drivers of Big Data
Emerging Big Data Ecosystem and a New Approach to Analytics


Business Intelligence (BI) versus Data Science


Business Intelligence (BI) versus Data Science


Current Analytical Architecture


Current Analytical Architecture


Drivers of Big Data


Emerging Big Data Ecosystem and a New Approach to Analytics

Four main groups of players:
Data devices
  Games, smartphones, computers, etc.
Data collectors
  Phone and TV companies, Internet, Gov’t, etc.
Data aggregators – make sense of data
  Websites, credit bureaus, media archives, etc.
Data users and buyers
  Banks, law enforcement, marketers, employers, etc.


Emerging Big Data Ecosystem and a New Approach to Analytics


1.3 Key Roles for the New Big Data Ecosystem


Three Key Roles of the New Big Data Ecosystem

1. Deep analytical talent
   Advanced training in quantitative disciplines – e.g., math, statistics, machine learning

2. Data savvy professionals
   Savvy but less technical than group 1

3. Technology and data enablers
   Support people – e.g., DB admins, programmers, etc.


Three Recurring Data Scientist Activities

1. Reframe business challenges as analytics challenges

2. Design, implement, and deploy statistical models and data mining techniques on Big Data

3. Develop insights that lead to actionable recommendations


Profile of a Data Scientist: Five Main Sets of Skills


Profile of a Data Scientist: Five Main Sets of Skills

Quantitative skill – e.g., math, statistics

Technical aptitude – e.g., software engineering, programming

Skeptical mindset and critical thinking – ability to examine work critically

Curious and creative – passionate about data and finding creative solutions

Communicative and collaborative – can articulate ideas, can work with others


1.4 Examples of Big Data Analytics

Retailer Target
  Uses life events: marriage, divorce, pregnancy
Apache Hadoop
  Open source Big Data infrastructure innovation
  MapReduce paradigm, ideal for many projects
Social Media Company LinkedIn
  Social network for working professionals
  Can graph a user’s professional network
  250 million users in 2014
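The MapReduce paradigm named in the Hadoop example can be sketched in a few lines. This is a single-process illustration of the map, shuffle, and reduce steps applied to word counting; it is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: each string stands in for one document.
docs = ["big data big insight", "data science"]
counts = word_count(docs)
print(counts)
```

In a real cluster the map calls run in parallel on different machines and the shuffle moves data across the network; the sketch keeps only the logical data flow.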


Focus of Course

Focus on quantitative disciplines – e.g., math, statistics, machine learning

Provide overview of Big Data analytics

In-depth study of several key algorithms