5th Tableau Meetup: Tableau & R

UC

BM

ar 27th 2015

Tableau desktop Trial: http://goo.gl/iHPnz7

5th Tableau Meetup: R-integration

Overview

+ What is R?

+ Tableau & R Connection

+ Ex. 1: Sentiment Analysis

+ Ex. 2: Clustering, Outliers

+ Discussing more advanced examples

What is R?

+ Language and environment for statistical computing

+ Open source ⇒ Free

+ A lot of packages available covering a very wide range of modern statistics

+ Widely used in academia

Tableau & R

Tableau had two goals:

1. Offer a rich collection of statistical analyses for deeper insights

2. Connect Tableau’s fluid data exploration to R users.

⇒ New functions in the calculated field list: SCRIPT_BOOL, SCRIPT_INT, SCRIPT_REAL and SCRIPT_STR

R Download

1. Download R 2.11.1 (or higher) via https://cran.rstudio.com/

2. Download RStudio via https://www.rstudio.com/products/rstudio/download/. It makes working with R more convenient and makes installing packages easy.

Setting up the connection

1. Open RStudio

2. Install the package Rserve by going to the tab Packages.

3. Start Rserve by entering in the R console: library(Rserve); Rserve()

4. Open Tableau (download a free trial via http://goo.gl/iHPnz7)

5. Go to Help ⇒ Settings & Performance ⇒ Manage R Connection

6. Set the Server and Port to localhost and 6311 respectively
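Steps 2 and 3 can also be done from the R console instead of the Packages tab. A minimal sketch, assuming Rserve can be installed from your CRAN mirror and that nothing else is listening on Rserve's default port 6311:

```r
# Install Rserve once; skipped when the package is already present.
if (!requireNamespace("Rserve", quietly = TRUE)) {
  install.packages("Rserve")
}

# Load the package and start the server.
# Rserve listens on port 6311 by default, which matches step 6.
library(Rserve)
Rserve()
```

Keep the R session open while Tableau is connected.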

Downloading folder

Download the zip file on http://bit.ly/1Pqp6hT

ATTR()

Tableau uses the following formula:

IF MIN([Field]) = MAX([Field]) THEN MIN([Field]) ELSE "*" END

In other words:

“Does every row have the same value?” If yes, return that value; if not, return *
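The same rule can be sketched in R to make the behaviour concrete. The function name attr_like is hypothetical, and this sketch does not handle missing values:

```r
# Hypothetical R equivalent of ATTR(): return the value when all rows
# agree, otherwise return "*". Missing values are not handled here.
attr_like <- function(x) {
  x <- as.character(x)
  if (length(unique(x)) == 1) x[1] else "*"
}

same  <- attr_like(c("Brussels", "Brussels", "Brussels"))  # all rows agree
mixed <- attr_like(c("Brussels", "Ghent"))                 # rows differ
print(same)
print(mixed)
```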

Ex. 1: Sentiment analysis on Tweets

Install the following R packages: tm, Rstem, sentiment.

Create the following calculated fields in Tableau:

1. Polarity:

SCRIPT_STR("

library(sentiment)

classify_polarity(.arg1)[,4]"

,ATTR([Tweet]))

2. Emotion:

SCRIPT_STR("

library(sentiment)

classify_emotion(.arg1)[,7]"

,ATTR([Tweet]))
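Both fields follow the same contract: Tableau passes the tweets to R as the vector .arg1, and R must return one string per tweet. A minimal base-R sketch of that contract, assuming a made-up word list; toy_polarity is a hypothetical stand-in, not the classifier from the sentiment package:

```r
# Hypothetical stand-in for classify_polarity(); word lists are illustrative only.
toy_polarity <- function(texts) {
  pos_words <- c("good", "great", "love", "happy")
  neg_words <- c("bad", "awful", "hate", "sad")
  vapply(texts, function(txt) {
    # Lower-case the tweet and split it on anything that is not a letter.
    words <- strsplit(tolower(txt), "[^a-z]+")[[1]]
    score <- sum(words %in% pos_words) - sum(words %in% neg_words)
    if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
  }, character(1), USE.NAMES = FALSE)
}

# In Tableau, .arg1 holds one element per mark in the view.
.arg1 <- c("I love Tableau", "This is awful", "Meetup at 7pm")
labels <- toy_polarity(.arg1)
print(labels)
```

The result has the same length as .arg1, which is what SCRIPT_STR expects.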

Ex. 2: Clustering

We will use 4 features for clustering our customers:

1. Sales

2. Frequency

3. How many months ago did they buy something?

4. For how many months have they been customers?

Ex. 2: Clustering

Create the following calculated fields in Tableau.

1. Sales

2. Frequency: SUM([Number of Records])

3. How many months ago did they buy something?

How many months ago?: DATEDIFF('month', [Order Date], {EXCLUDE [Customer Name]: MAX([Order Date])})

4. For how many months have they been customers?

First occur as customer: {FIXED [Customer Name]: MIN([Order Date])}

Number of months customers: DATEDIFF('month', [First occur as customer], {EXCLUDE [Customer Name]: MAX([Order Date])})
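To check what features 3 and 4 measure, the same quantities can be sketched in base R. The orders below are made up, and months_between is a hypothetical helper that counts calendar-month boundaries, which is how DATEDIFF('month', ...) counts:

```r
# Hypothetical helper: number of calendar-month boundaries between two dates.
months_between <- function(from, to) {
  from <- as.POSIXlt(from)
  to <- as.POSIXlt(to)
  (to$year - from$year) * 12 + (to$mon - from$mon)
}

# Made-up orders for two customers.
orders <- data.frame(
  customer = c("A", "A", "B", "B", "B"),
  order_date = as.Date(c("2014-01-15", "2014-11-30",
                         "2013-06-01", "2014-03-10", "2015-01-05"))
)

# Most recent order date in the whole data set.
last_overall <- max(orders$order_date)
by_customer <- split(orders$order_date, orders$customer)

# Feature 3: months between a customer's last order and the overall last order.
recency <- sapply(by_customer, function(d) months_between(max(d), last_overall))
# Feature 4: months between a customer's first order and the overall last order.
tenure <- sapply(by_customer, function(d) months_between(min(d), last_overall))

print(recency)
print(tenure)
```

In Tableau the third field is computed per order row; taking its minimum per customer gives the same recency as above.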

Ex. 2: Clustering

Create 2 parameters:

1. Nr of clusters: Allows you to choose the number of clusters

2. Seed: Provides reproducible clusters

Create a calculated field called Clusters:

SCRIPT_INT("
## Set the seed so the clusters are reproducible
set.seed( .arg6[1] )

## Standardize the variables
sales <- ( .arg1 - mean(.arg1) ) / sd(.arg1)
freq <- ( .arg2 - mean(.arg2) ) / sd(.arg2)
recency <- ( .arg3 - mean(.arg3) ) / sd(.arg3)
loyal <- ( .arg4 - mean(.arg4) ) / sd(.arg4)

## Cluster on the four standardized features
dat <- cbind(sales, freq, recency, loyal)
num <- .arg5[1]
kmeans(dat, num)$cluster",
SUM([Sales]), [Frequency], MIN([How many months ago?]), ATTR([Number of months customers]), [Nr of clusters], [Seed])

Note: Leave the partitioning of Clusters empty, so that all customers are sent to R in a single call
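The R part of the Clusters field can be tried outside Tableau first. A minimal sketch on made-up customer data; cluster_customers is a hypothetical wrapper, and scale() is used as a shorthand for the per-column (x - mean) / sd standardization:

```r
# Hypothetical wrapper around the script Tableau sends to R.
cluster_customers <- function(sales, freq, recency, tenure, k, seed) {
  set.seed(seed)                                      # reproducible starting centers
  dat <- scale(cbind(sales, freq, recency, tenure))   # standardize each column
  kmeans(dat, centers = k)$cluster                    # one cluster id per customer
}

# Made-up values for eight customers.
sales   <- c(120, 90, 1500, 1400, 300, 250, 1600, 80)
freq    <- c(2, 1, 12, 10, 4, 3, 14, 1)
recency <- c(10, 12, 1, 2, 6, 7, 1, 11)
tenure  <- c(12, 14, 36, 30, 20, 18, 40, 13)

clusters <- cluster_customers(sales, freq, recency, tenure, k = 2, seed = 42)
print(clusters)
```

Running it twice with the same seed returns the same assignment, which is the purpose of the Seed parameter.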

Ex. 2: Outliers

Create the following calculated field called Outlier:

IF SCRIPT_REAL("
library(mvoutlier)
sign2(cbind(.arg1))$wfinal01", SUM([Profit])) = 0
THEN "Outlier"
ELSE "Normal"
END
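The calculated field relies on R returning a 0/1 vector in which 0 marks an outlier. A base-R sketch of that convention, using a median/MAD rule as a simple stand-in for sign2 (it is not the method mvoutlier implements); flag_outliers, the cutoff of 3 and the profit values are assumptions for illustration:

```r
# Hypothetical stand-in for sign2(...)$wfinal01: 1 = normal, 0 = outlier.
# Uses the distance from the median in units of MAD. Assumes mad(x) > 0.
flag_outliers <- function(x, cutoff = 3) {
  robust_z <- abs(x - median(x)) / mad(x)
  as.integer(robust_z <= cutoff)
}

# Made-up profit values with one obvious outlier at the end.
profit <- c(10, 12, 11, 9, 13, 10, 500)
weights <- flag_outliers(profit)
print(weights)
```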

Datatonic is a team of data experts that enables businesses to perform better through the power of analytics. We advise on building a better data architecture, bring data to life through advanced visual reporting, and craft state-of-the-art analytic tools.

Datatonic is trusted by global players in the retail, finance, telecom, manufacturing and not-for-profit sectors.

1. Data viz

2. Big data

3. Data science

4. All of the above as a service

Datatonic

Roadshow

Datatonic is going on the road in Belgium with Tableau Software.

Find out for yourself how Tableau can be (better) used in your organisation!

roadshow.datatonic.com

+ Hands-on Training for Beginners

+ Hands-on Training Advanced (Table calculations, LoD or server scaling)

+ Doctor Session (solving specific issues or optimizing for performance)

+ Enterprise Deployment: How to & best practices
