Quirrel & R for Dummies

Introduction to Quirrel & R

OSCON, July 25

John A. De Goes, @jdegoes

overview

Quirrel is an open standard language designed for the analysis of large-scale, heterogeneous data sets.

R is an open source programming language and interactive environment for statistical computing and graphics.

quirrel versus r

Quirrel

CONS
● Young language, still evolving
● Nascent community
● Intentionally limited

PROS
● Simple, consistent core
● Fully parallel
● Purely functional
● Programmatic or interactive

R

PROS
● Mature language, "feature-complete"
● Robust community
● Turing-complete

CONS
● Complex core
● Mostly parallel
● Imperative
● Interactive

what's the right tool for the job?

[Decision flowchart: the questions "Small amount of data?" and "Simple analytics?", each with YES / NO branches, lead to one of four tools: Quirrel, Hive / Pig, SQL, or R.]

sneak peek

Quirrel

pageViews := //pageViews
avg := mean(pageViews.duration)
bound := 1.5 * stdDev(pageViews.duration)
pageViews.userId where pageViews.duration > avg + bound

R

pageViews <- read.csv("pageViews.csv")
avg <- mean(pageViews$duration)
bound <- 1.5 * sd(pageViews$duration)
userIds <- subset(pageViews, duration > avg + bound, select=userId)
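To try the R side of the sneak peek without a pageViews.csv file, here is a minimal sketch that builds a small data frame in memory. The sample user IDs and durations are made up for illustration; only the column names userId and duration come from the slide.

```r
# Hypothetical sample data standing in for pageViews.csv
pageViews <- data.frame(
  userId   = c("u1", "u2", "u3", "u4", "u5", "u6"),
  duration = c(10, 12, 11, 9, 13, 100)
)

# Threshold: mean plus 1.5 standard deviations, as on the slide
avg   <- mean(pageViews$duration)
bound <- 1.5 * sd(pageViews$duration)

# Keep only the users whose view duration exceeds the threshold
userIds <- subset(pageViews, duration > avg + bound, select = userId)
print(userIds)
```

With this sample, only the one unusually long view (user u6) passes the filter.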

data models

Quirrel: Everything is a random variable.

true, false
1, 3.1415
null, undefined
"Mary Jane"
[1, 2, 3]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
{"name": "John"}
1 || 2 || 3 || 4 || 5 || 6
[1, "foo", [1, false]]

R: Everything is an ordered sequence of values.*

TRUE, FALSE
1, 3.1415
NA, NaN, Inf
"Mary Jane"
c(1, 2, 3)
array(c(1,4,7,2,5,8,3,6,9), dim=c(3,3))
data.frame(name=c("John"))
c(1, 2, 3, 4, 5, 6)
list(1, "foo", list(1, FALSE))

*Except when it's not.

comments

Quirrel

-- ignore me
(- ignore me too! -)

R

# ignore me
# ignore # me # too!

basic expressions

Quirrel

2 * 4
(1 + 2) * 3 / 9 > 23
3 > 2 & (1 != 2)
2 + 2 = 4
false & true | !false
undefined = undefined

R

2 * 4
(1 + 2) * 3 / 9 > 23
3 > 2 & (1 != 2)
2 + 2 == 4
FALSE & TRUE | !FALSE
NA == NA
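One detail worth knowing about the last R expression: comparing NA with == does not return TRUE or FALSE, it returns NA. The short sketch below shows this, along with is.na(), the usual way to test for a missing value. The variable name x is just for illustration.

```r
x <- NA

# Comparing a missing value with anything, even itself, is itself missing
print(x == x)

# is.na() is the idiomatic test for missingness
print(is.na(x))

# identical() compares the objects themselves rather than element-wise
print(identical(x, NA))
```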

named expressions

Quirrel

x := 2
square := x * x

R

x <- 2
square <- x * x

loading data

Quirrel

//pageViews
load("/pageViews")
//daily_snapshots/*

R

read.csv("pageViews")
read.csv("pageViews")
lapply(Sys.glob("daily_snapshots/*"), read.csv)

drilldown

Quirrel

pageViews := //pageViews
pageViews.userId
pageViews.keywords[2]

R

pageViews <- read.csv("pageViews")
pageViews$userId
vector[2]
list[[1]]

reductions

Quirrel

count(purchases)
sum(purchases.total)
mean(purchases.total)
stdDev(purchases.total)

R

nrow(purchases)
sum(purchases$total)
mean(purchases$total)
sd(purchases$total)
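A note on counting in R, sketched with made-up data: when purchases is a data frame, nrow() gives the number of records, while length() gives the number of columns. The customer column and the values below are hypothetical; only the total column appears on the slide.

```r
# Hypothetical purchases data frame with four records
purchases <- data.frame(
  customer = c("a", "b", "c", "d"),
  total    = c(10, 20, 30, 40)
)

n_records <- nrow(purchases)    # number of rows (records)
n_columns <- length(purchases)  # number of columns, not records

total_sum  <- sum(purchases$total)
total_mean <- mean(purchases$total)
total_sd   <- sd(purchases$total)

print(c(n_records, n_columns, total_sum, total_mean))
print(total_sd)
```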

filtering

Quirrel

views.userId where views.duration > 1000

R

subset(views, duration > 1000, select=userId)

augmentation

Quirrel

clicks with {dow: dayOfWeek(clicks.ts)}

R

clicks$dow <- weekdays(clicks$ts)

libraries

Quirrel

import std::stats::rank
pageViews := //pageViews
rank(pageViews.duration)

R

library(data.table)
pageViews <- read.csv("views.csv")
rank(pageViews$duration)

user-defined functions

Quirrel

ctr(day) :=
  count(clicks where clicks.day = day) /
  count(impressions where impressions.day = day)

ctr("Monday")

R

ctr <- function(d) {
  c1 <- subset(clicks, clicks$day == d)
  c2 <- subset(impressions, impressions$day == d)
  length(c1$day) / length(c2$day)
}

ctr("Monday")

grouping - implicit constraints

Quirrel

solve 'day
  {day: 'day,
   ctr: count(clicks where clicks.day = 'day) /
        count(impressions where impressions.day = 'day)}

R

clicks$count1 <- 0
c1 <- aggregate(count1 ~ day, data = clicks, FUN=length)
impressions$count2 <- 0
c2 <- aggregate(count2 ~ day, data = impressions, FUN=length)
r <- merge(c1, c2)
ctr <- data.frame(day = r$day, ctr = r$count1 / r$count2)
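To see the R grouping steps run end to end, here is a small sketch with invented clicks and impressions tables. The day values are hypothetical sample data; the aggregate / merge steps follow the slide.

```r
# Hypothetical sample data: one row per click or impression
clicks      <- data.frame(day = c("Mon", "Mon", "Tue"))
impressions <- data.frame(day = c("Mon", "Mon", "Mon", "Mon", "Tue", "Tue"))

# Count rows per day by aggregating a placeholder column with length()
clicks$count1 <- 0
c1 <- aggregate(count1 ~ day, data = clicks, FUN = length)

impressions$count2 <- 0
c2 <- aggregate(count2 ~ day, data = impressions, FUN = length)

# Join the two per-day counts on their shared "day" column
r <- merge(c1, c2)

# Click-through rate per day
ctr <- data.frame(day = r$day, ctr = r$count1 / r$count2)
print(ctr)
```

In this sample, both days come out with a click-through rate of 0.5 (2 of 4 on Mon, 1 of 2 on Tue).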

grouping - explicit constraints

Quirrel

solve 'date = purchases.date
  {date: 'date,
   cummTotal: sum(purchases.total where purchases.date < 'date)}

R

purchases2 <- purchases[order(purchases$date), ]
data.frame(
  date = purchases2$date,
  cummTotal = cumsum(purchases2$total))
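A runnable sketch of the R side, using made-up purchases. One thing to watch: cumsum() includes the current row, whereas the Quirrel expression as written sums only purchases strictly before each date. The extra column named before (a hypothetical name) shows the strict version, and it assumes every date appears only once.

```r
# Hypothetical sample data, deliberately out of date order
purchases <- data.frame(
  date  = as.Date(c("2012-11-03", "2012-11-01", "2012-11-02")),
  total = c(30, 10, 20)
)

# Sort rows by date (note the trailing comma: rows, all columns)
purchases2 <- purchases[order(purchases$date), ]

# Running total including each row's own purchase
inclusive <- cumsum(purchases2$total)

# Running total of strictly earlier purchases (assumes unique dates)
exclusive <- inclusive - purchases2$total

result <- data.frame(
  date      = purchases2$date,
  cummTotal = inclusive,
  before    = exclusive
)
print(result)
```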

Questions?

Nov - Dec 2012

Quirrel / R Challenge Problems

Download dataset at http://labcoat.precog.com

challenge problem #1

■ Using the /london_medals/summer_games data, find the youngest athlete to win a medal

challenge problem #2

■ Using the /london_medals/summer_games data, find the oldest athlete to win a medal

challenge problem #3

■ Using the /london_medals/summer_games data, find the average age at which athletes win medals

challenge problem #4

■ Using the /london_medals/summer_games data, find the most common age to win a medal



Thank you!

Follow me on Twitter: @jdegoes

Learn more about R: r-project.org

Download R: r-project.org/mirrors.html

Sign up for a free Precog account: precog.com

Learn more about Quirrel: quirrel-lang.org
