Training in Analytics, R and Social Media Analytics


Basics of Analysis, Analytics and R

Ajay Ohri

Why analysis
● Humans can count only so much
● We understand summarized information
● We understand graphs faster
● We need to take decisions
● Wrong decisions lead to huge costs

Central Tendency
● What is the difference between mean and median?
● When to use what?
● What is expected value?
● When can mean be misleading?
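A minimal sketch of the questions above, using made-up income figures (the numbers are assumptions for illustration only). One extreme value pulls the mean far from the typical case while the median barely moves, and the expected value is a probability-weighted mean.

```r
# Hypothetical incomes (in thousands); the last value is an outlier
incomes <- c(30, 32, 35, 38, 40, 1000)

mean_income   <- mean(incomes)    # pulled upward by the outlier
median_income <- median(incomes)  # middle of the sorted values

# Expected value of a fair six-sided die: sum of value * probability
faces <- 1:6
probs <- rep(1 / 6, 6)
expected_roll <- sum(faces * probs)

print(c(mean = mean_income, median = median_income, expected = expected_roll))
```

Here the mean is several times the median, which is one situation where the mean can mislead.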

Exercise- What is the average height of this class?

Grouped Means

Exercise-
What is the height of the class?
What is the height of the class by gender?
What is the height of the class by team?
What is the height of the class by dark-light colored clothing?
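The exercise could be sketched in R as follows. The data frame `class_data` and its values are hypothetical stand-ins for the heights collected in class.

```r
# Hypothetical class data: height in feet, plus two grouping columns
class_data <- data.frame(
  height = c(5.2, 5.8, 5.5, 6.1, 5.0, 5.6),
  gender = c("F", "M", "F", "M", "F", "M"),
  team   = c("A", "A", "B", "B", "A", "B")
)

# Overall mean height
overall <- mean(class_data$height)

# Mean height within each gender (returns a named vector)
by_gender <- tapply(class_data$height, class_data$gender, mean)

# Mean height within each team (returns a data frame)
by_team <- aggregate(height ~ team, data = class_data, FUN = mean)

print(overall)
print(by_gender)
print(by_team)
```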

CROSS TABS- exercise of mtcars
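One possible way to do this exercise, assuming the cross tab of interest is cylinders against gears (the choice of columns is an assumption; any two categorical columns of mtcars would work):

```r
# mtcars ships with R in the datasets package
data(mtcars)

# Count cars for every combination of cylinders and gears
cross_tab <- table(cylinders = mtcars$cyl, gears = mtcars$gear)
print(cross_tab)

# The same counts using the formula interface
print(xtabs(~ cyl + gear, data = mtcars))
```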

Variance
What is the range (max - min)?
What is a quartile (4 quarters)?
What is a decile (10 tenths)?
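A short sketch of these three measures, using the mpg column of the built-in mtcars dataset as an assumed example variable:

```r
mpg <- mtcars$mpg

# Range as a single number: max - min
mpg_range <- max(mpg) - min(mpg)

# Quartile cut points split the data into 4 quarters
quartiles <- quantile(mpg, probs = c(0.25, 0.50, 0.75))

# Decile cut points split the data into 10 tenths
deciles <- quantile(mpg, probs = seq(0.1, 0.9, by = 0.1))

print(mpg_range)
print(quartiles)
print(deciles)
```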

No one really uses standard deviation in the business world

Frequency Analysis
contingency tables

Height range         Number of students   Cumulative number
less than 5.0 feet   25                   25
5.0–5.5 feet         35                   60
5.5–6.0 feet         20                   80
6.0–6.5 feet         20                   100
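The cumulative column in the table above is a running total of the counts. A minimal sketch (the vector name `counts` is just an illustrative choice):

```r
# Number of students in each height range, taken from the table
counts <- c(
  "less than 5.0 feet" = 25,
  "5.0-5.5 feet"       = 35,
  "5.5-6.0 feet"       = 20,
  "6.0-6.5 feet"       = 20
)

# Running total of the counts gives the cumulative number
cumulative <- cumsum(counts)

print(data.frame(students = counts, cumulative = cumulative))
```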

        Dance   Sports   TV   Total
Men     2       10       8    20
Women   16      6        8    30
Total   18      16       16   50
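As a sketch, the same contingency table can be rebuilt in R from the six inner counts, letting `addmargins()` compute the totals (it labels them "Sum" rather than "Total"):

```r
# Inner cells of the table, filled row by row
activity <- matrix(
  c( 2, 10, 8,
    16,  6, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Men", "Women"), c("Dance", "Sports", "TV"))
)

# Add row and column totals
with_totals <- addmargins(as.table(activity))
print(with_totals)

# Share of each activity within each row
print(prop.table(activity, margin = 1))
```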

Histogram

What is a distribution

EDA

Exploratory Data Analysis

Box Plot
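A possible starting point for exploring a distribution with base R graphics. The choice of mtcars and the mpg column is an assumption for illustration.

```r
# Histogram: how the values of one variable are distributed
hist(mtcars$mpg,
     main = "Distribution of miles per gallon",
     xlab = "Miles per gallon")

# Box plot: compare the distribution across groups
boxplot(mpg ~ cyl, data = mtcars,
        main = "Miles per gallon by number of cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(mtcars$mpg)
print(five_num)
```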

Analytics
• What is analytics?
– Study of data for helping with decision making using software
• Where is it used?
– Industries (like Pharma, BFSI, Telecom, Retail)
• How is it used?
– Use statistics and software
• What are some good practices?
– Learn one new thing extra from your competition every day. This is a fast moving field.
– Etc.

What is Data Science

Other Analytics Software
• SAS (Base) et al
• JMP
• SPSS
• Python
• Octave
• Clojure
• Julia(?)

R

Social Media Analytics

Some examples http://decisionstats.com/2013/12/04/top-fourteen-interfaces-in-social-media-and-web-analytics-on-the-internet/

Some use cases
http://decisionstats.com/2014/05/10/analyzing-facebook-networks-using-rstats/
http://decisionstats.com/2013/09/11/using-twitter-data-with-r/

What is R?
http://www.r-project.org/

• Language
– Object oriented
– Open Source
– Free
– Widely used

Object oriented: built on the concept of "objects" that have data fields (attributes that describe the object) and associated procedures known as methods. Objects, which are usually instances of classes, interact with one another to build applications and computer programs.

Pre Requisites
• Installation of R
http://cran.rstudio.com/bin/windows/base/
– RTools
• R Studio
http://www.rstudio.com/products/rstudio/download/
• R Packages

install.packages(), update.packages(), library()
Packages are installed once, updated periodically, but loaded every time
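A minimal sketch of that install-once, load-every-time pattern. The package MASS is only an assumed example (it normally ships with R), and the update call is left commented out because it needs a network connection.

```r
pkg <- "MASS"  # example package, chosen only for illustration

# Install once: skip the download if the package is already present
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Update periodically (uncomment when you want to refresh installed packages)
# update.packages(ask = FALSE)

# Load in every new R session before using the package
library(MASS)
```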

Interfaces to R
• Console
– Default
– Customization

• IDE

• GUI

Demo- Basic Objects on R Console

• +
• -
• log
• exp
• *
• /
• ()

Hint- Up arrow gives you the last typed command

Functions-
ls() – what objects are here
rm("foo") removes object named foo

Assignment
Using = or <- assigns values to object names
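The console demo might look something like this sketch. The object names `foo` and `bar` are placeholders.

```r
# Arithmetic operators and brackets
print((2 + 3) * 4 / 2 - 1)

# log() and exp() are inverses of each other
print(log(exp(1)))

# Assignment: both forms create an object
foo <- 10
bar = foo * 2

# List the objects in the workspace, then remove one
print(ls())
rm("foo")
print(ls())
```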

Functions and Loops
• Loops
for (number in 1:5) { print(number) }
• Function
functionajay = function(a) (a^2 + 2*a + 1)

Hint: Always match brackets

Each ( deserves a )

Each { deserves a }
Each [ deserves a ]
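The loop and the function from the slides, laid out over several lines so the matching brackets are easy to see. The call with the argument 3 is an added usage example.

```r
# Loop: print each number from 1 to 5
for (number in 1:5) {
  print(number)
}

# Function: returns a^2 + 2*a + 1, which equals (a + 1)^2
functionajay <- function(a) {
  a^2 + 2 * a + 1
}

result <- functionajay(3)
print(result)
```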

Functions-
class() gives class
dim() gives dimensions
nrow() gives rows
ncol() gives columns
length() gives length
str() gives structure
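A quick sketch of these functions applied to the built-in mtcars data frame (the choice of dataset is an assumption):

```r
obj_class <- class(mtcars)   # type of object
obj_dim   <- dim(mtcars)     # rows and columns together
n_rows    <- nrow(mtcars)    # number of rows
n_cols    <- ncol(mtcars)    # number of columns
obj_len   <- length(mtcars)  # for a data frame, the number of columns

print(obj_class)
print(obj_dim)

# Compact overview of every column and its type
str(mtcars)
```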

Demo- Datasets on R Console

Hint- use data() to list the datasets available in loaded packages
library(FOO) loads package "FOO"

Packages in R
• CRAN
• CRAN Views
• R Documentation

Documentation in R
• Help: ? and ??
• CRAN Views
• Package Help
• Tips for Googling
– Stack Overflow
– Email Lists
– Twitter
– R Bloggers

Graphical Interfaces to R
• R Commander

• Rattle

• Deducer

Overview of R Commander

Demo R Commander – 3D Graphs

Overview of Rattle

Demo Rattle

Overview of Deducer (with JGR)

Demo Deducer
• data()
• data(mtcars)

read.table()
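A minimal sketch of `read.table()`. To keep it self-contained, the data is supplied through the `text` argument; in practice the first argument would usually be a file path. The student records are invented for illustration.

```r
# Whitespace-separated data with a header row (made-up values)
raw_text <- "name height gender
Asha 5.2 F
Ravi 5.8 M
Meera 5.5 F"

# header = TRUE treats the first line as column names
students <- read.table(text = raw_text, header = TRUE,
                       stringsAsFactors = FALSE)

print(students)
str(students)
```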

From Databases
The RODBC package provides access to databases through an ODBC interface.
The primary functions are
• odbcConnect(dsn, uid="", pwd="") Open a connection to an ODBC database
• sqlFetch(channel, sqltable) Read a table from an ODBC database into a data frame

Hint- a good site to learn R http://www.statmethods.net
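The two RODBC functions could be combined roughly as below. This is a sketch only: `fetch_table` is a hypothetical helper, and the data source name and table name in the commented call are placeholders. It needs the RODBC package and a configured ODBC data source to actually run.

```r
# Hypothetical helper: open a connection, read one table, close the connection
fetch_table <- function(dsn, table_name, uid = "", pwd = "") {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("The RODBC package is not installed")
  }
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  if (!inherits(channel, "RODBC")) {
    stop("Could not connect to data source: ", dsn)
  }
  # Always close the connection, even if reading the table fails
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlFetch(channel, table_name)
}

# Example call (placeholder names, not run here):
# sales <- fetch_table("my_dsn", "sales_table")
```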

A Detour to SQL

From Web (aka Web Scraping)
• readLines

Hint : R is case sensitive
readlines is not the same as readLines

Hint : Use head() and tail() to inspect objects

Other packages are XML and Curl
Case Study- http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
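A small sketch of `readLines()` together with `head()` and `tail()`. To avoid depending on a live website, it reads from an in-memory connection holding a made-up page; passing a URL string to `readLines()` instead should read a real page the same way.

```r
# Made-up page content standing in for a downloaded web page
page_source <- c("<html>",
                 "<head><title>Scores</title></head>",
                 "<body>",
                 "<p>Runs: 120</p>",
                 "</body>",
                 "</html>")

con <- textConnection(page_source)
lines_read <- readLines(con)   # note the capital L
close(con)

# Inspect the first and last lines
print(head(lines_read, 2))
print(tail(lines_read, 2))

# Keep only the lines that mention runs
runs_line <- grep("Runs", lines_read, value = TRUE)
print(runs_line)
```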

Inspecting Data Quality: Demo

Data Selection: Demo

Questions-
How do I use multiple conditions (AND, OR)?
Can I do away with the subset function?
How do I select a random sample?
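One way to answer these three questions, sketched on the built-in mtcars dataset. The conditions and the sample size of 5 are arbitrary choices for illustration.

```r
# Multiple conditions: & means AND, | means OR
both   <- subset(mtcars, cyl == 4 & mpg > 30)
either <- subset(mtcars, cyl == 4 | mpg > 30)

# The same selection without subset(), using bracket indexing
no_subset <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, ]

# Random sample of 5 rows; set.seed() makes the draw repeatable
set.seed(42)
random_rows <- mtcars[sample(nrow(mtcars), 5), ]

print(both)
print(random_rows)
```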

Useful Link- http://decisionstats.com/2013/11/24/50-functions-to-clear-a-basic-interview-for-business-analytics-rstats/

Data Exploration
• Missing values are represented by NA in R
• Demo
– is.na
– na.omit
– na.rm
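A minimal sketch of the three tools in the demo, using a made-up vector of heights with two missing values:

```r
heights <- c(5.2, NA, 5.8, 6.1, NA)

# is.na() flags the missing values; sum() counts them
missing_count <- sum(is.na(heights))

# na.omit() drops the missing values
complete_heights <- na.omit(heights)

# Without na.rm the result is NA; with na.rm = TRUE the NAs are skipped
mean_with_na    <- mean(heights)
mean_without_na <- mean(heights, na.rm = TRUE)

print(missing_count)
print(mean_with_na)
print(mean_without_na)
```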

Data Visualization

Notes-
Explaining Basic Types of Graphs
Customizing Graphs
Graph Output
Advanced Graphs
Facets, Grammar of Graphics
Data Visualization Rules
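A sketch of customizing a basic graph and sending it to a file with base R graphics. The output file name, colors and labels are assumptions. Facets and the grammar of graphics are usually done with the ggplot2 package, which is not shown here.

```r
# Graph output: write the plot to a PDF in a temporary folder
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")
pdf(out_file)

# Customizing graphs: title, axis labels, point color and point shape
plot(mtcars$wt, mtcars$mpg,
     main = "Mileage against weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     col  = "blue",
     pch  = 19)

# Close the device so the file is written
dev.off()

print(out_file)
```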

Data Manipulation Demo

Notes-
1. gsub
2. gsub with escape
3. as operator
4. is operator
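The four notes could be illustrated as follows, assuming "as operator" and "is operator" refer to the `as.*` and `is.*` families of functions. The price strings are made up.

```r
prices <- c("$1,200", "$350", "$2,000")

# 1. gsub: replace every comma with nothing
no_commas <- gsub(",", "", prices)

# 2. gsub with escape: $ is a special regex character, so escape it
no_dollar <- gsub("\\$", "", no_commas)

# 3. as.*: convert the cleaned text to numbers
amounts <- as.numeric(no_dollar)

# 4. is.*: check the type before and after conversion
print(is.character(prices))
print(is.numeric(amounts))
print(amounts)
```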