Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
R and lsquoFaster DatarsquoThe Case for Rcpp
Dirk EddelbuettelISM HPCCON 2015 amp ISM HPC on R WorkshopThe Institute of Statistical Mathematics Tokyo JapanOctober 9 - 12 2015Released under a Creative Commons Attribution-ShareAlike 40 International License
173
Introduction
273
A Very Kind Tweet
373
And Another Tweet
473
And Yet Another Tweet
573
And Why Not Another Tweet
673
And Last But Not Least
773
Extending R
873
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Introduction
273
A Very Kind Tweet
373
And Another Tweet
473
And Yet Another Tweet
573
And Why Not Another Tweet
673
And Last But Not Least
773
Extending R
873
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
A Very Kind Tweet
373
And Another Tweet
473
And Yet Another Tweet
573
And Why Not Another Tweet
673
And Last But Not Least
773
Extending R
873
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
And Another Tweet
473
And Yet Another Tweet
573
And Why Not Another Tweet
673
And Last But Not Least
773
Extending R
873
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
And Yet Another Tweet
573
And Why Not Another Tweet
673
And Last But Not Least
773
Extending R
873
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
And Why Not Another Tweet
673
And Last But Not Least
773
Extending R
873
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
And Last But Not Least
773
Extending R
873
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Extending R
873
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Why R Programming with Data
ChambersComputationalMethods for DataAnalysis Wiley1977
Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988
Chambers andHastie StatisticalModels in SChapman amp Hall1992
ChambersProgramming withData Springer1998
ChambersSoftware for DataAnalysisProgramming withR Springer 2008
Thanks to John Chambers for sending me high-resolution scans of the covers of his books
973
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
A Simple Example
xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)
1073
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
A Simple Example
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1173
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
A Simple Example - Refined
xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000
x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y
)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))
col=grey border=F)lines(fit1)
1273
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
A Simple Example - Refined
1 2 3 4 5 6
00
01
02
03
04
05
densitydefault(x = xx)
N = 272 Bandwidth = 03348
Den
sity
1373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
So Why R
R enables us to
middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip
making it the key language for statistical computing and apreferred environment for many data analysts
1473
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
So Why R
R has always been extensible via
middot C via a bare-bones interface described in Writing RExtensions
middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C
So while in theory this always worked ndash it was tedious inpractice
1573
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1673
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Why Extend R
Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran
Since the core of R is in fact a program written inthe C language itrsquos not surprising that the mostdirect interface to non-R software is for code writtenin C or directly callable from C All the sameincluding additional C code is a serious step withsome added dangers and often a substantial amountof programming and debugging required You shouldhave a good reason
1773
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Why Extend R
Chambers proceeds with this rough map of the road ahead
middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software
middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references
1873
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Why Extend R
The Why boils down to
middot speed Often a good enough reason for us hellip and a focusfor us in this workshop
middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R
middot references Chambers quote from 2008 foreshadowed thework on Reference Classes now in R and built upon viaRcpp Modules Rcpp Classes (and also RcppR6)
1973
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
And Why C++
middot Asking Google leads to about ~ 50 million hitsmiddot Wikipedia C++ is a statically typed free-form
multi-paradigm compiled general-purpose powerfulprogramming language
middot C++ is industrial-strength vendor-independentwidely-used and still evolving
middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connect to it probably has a CC++ API
middot As a widely used language it also has good tool support(debuggers profilers code analysis)
2073
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Why C++
Scott Meyers View C++ as a federation of languages
middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C
middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)
middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages
middot The Standard Template Library (STL) is a specifictemplate library which is powerful but has its ownconventions
middot C++11 (and C++14 and beyond) add enough to becalled a fifth language
NB Meyers original list of four languages appeared years before C++11 2173
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Why C++
middot Mature yet currentmiddot Strong performance focus
middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the
machine level and C++
middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly
shielded
2273
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Interface Vision
2373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Bell Labs May 1976
2473
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Interface Vision
R offers us the best of both worlds
middot Compiled code withmiddot Access to proven libraries and algorithms in
CC++Fortranmiddot Extremely high performance (in both serial and parallel
modes)middot Interpreted code with
middot An accessible high-level language made for Programmingwith Data
middot An interactive workflow for data analysismiddot Support for rapid prototyping research and
experimentation2573
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Why Rcpp
middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples
middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure
middot Expressive as it allows for vectorised C++ using RcppSugar
middot Seamless access to all R objects vector matrix listS3S4RefClass Environment Function hellip
middot Speed gains for a variety of tasks Rcpp excels preciselywhere R struggles loops function calls hellip
middot Extensions greatly facilitates access to external librariesusing eg Rcpp modules
2673
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Getting Started
2773
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
sourceCpp Jumping Right In
RStudio makes starting very easy
2873
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
A First Example Contrsquoed
The following file gets createdinclude ltRcpphgtusing namespace Rcpp
This is a simple example of exporting a C++ function to R You can source this function into an R session using the RcppsourceCpp function (or via the Source button on the editor toolbar)
[[Rcppexport]]NumericVector timesTwo(NumericVector x)
return x 2
You can include R code blocks in C++ files processed with sourceCpp (useful for testing and development) The R code will be automatically run after the compilation
RtimesTwo(42)
2973
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
A First Example Contrsquoed
So what just happened
middot We defined a simple C++ functionmiddot It operates on a numeric vector argumentmiddot We asked Rcpp to lsquosource itrsquo for usmiddot Behind the scenes Rcpp creates a wrappermiddot Rcpp then compiles links and loads the wrappermiddot The function is available in R under its C++ name
3073
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
A First Example Contrsquoed
Try it
middot Save the file as say timesTwocppmiddot You could a temporary directory or a projects directory
or your desktop (keep it simple)middot Either press the Source button or callsourceCpp(thefilecpp) to compile it
middot Then at the R prompt
simpletimesTwo(21) more interestingtimesTwo(c(12344101))
3173
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
cppFunction
cppFunction() creates compiles and links a C++ file andcreates an R function to access it
cppFunction(int times2(int x) return 2x )times2(21) same identifier as C++ function
3273
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
evalCpp
evalCpp() evaluates a single C++ expression Includes anddependencies can be declared
This allows us to quickly check C++ constructs
library(Rcpp)evalCpp(2 + 2) simple test
[1] 4
evalCpp(stdnumeric_limitsltdoublegtmax())
[1] 1797693e+308
3373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Speed
3473
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Speed Example (due to StackOverflow)
Consider a function defined as
f(n) such that n when n lt 2
f(n minus 1) + f(n minus 2) when n ge 2
3573
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Speed Example in R
R implementation and use
f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))
Using it on first 11 argumentssapply(010 f)
[1] 0 1 1 2 3 5 8 13 21 34 55
3673
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Speed Example Timed
Timing
library(rbenchmark)benchmark(f(10) f(15) f(20))[14]
test replications elapsed relative 1 f(10) 100 0023 1000 2 f(15) 100 0542 23565 3 f(20) 100 6172 268348
3773
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Speed Example in C C++
A C or C++ solution can be equally simple
int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))
But how do we call it from R
3873
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Mattrsquos Example from useR 2015include ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
SEXP fibonacci_c(SEXP n) SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
need to compile link load fibonacci lt- function(n) Call(fibonacci_c n)sapply(010 fibonacci) 3973
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
One Minor Modification to Mattrsquos Exampleinclude ltRhgtinclude ltRinternalshgt
int fibonacci_c_impl(int n) if (n lt 2) return nreturn fibonacci_c_impl(n - 1) + fibonacci_c_impl(n - 2)
[[Rcppexport]]SEXP fibonacci_c(SEXP n)
SEXP result = PROTECT(allocVector(INTSXP 1))INTEGER(result)[0] = fibonacci_c_impl(asInteger(n))UNPROTECT(1)return result
Rsapply(010 fibonacci_c)
4073
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Speed Example in C C++
But Rcpp makes this much easier
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
sapply(010 g)
[1] 0 1 1 2 3 5 8 13 21 34 55
4173
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Speed Example Comparing R and C++
Timing
RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )
library(rbenchmark)benchmark(f(25) g(25) order=relative)[14]
test replications elapsed relative 2 g(25) 100 020 10 1 f(25) 100 6622 3311
A nice gain of a few orders of magnitude4273
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Another Angle on Speed
Run-time performance is just one example
Time to code is another metric
We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development
A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building
4373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Speed Example Footnote Also Due to Matt
include ltRcpphgt
[[Rcppplugins(cpp11)]]
constexpr int fibonacci_recursive_constexpr(const int n) return n lt 2 n (fibonacci_recursive_constexpr(n - 1) +
fibonacci_recursive_constexpr(n - 2))
[[Rcppexport]]int constexprFib()
const int N = 42constexpr int result = fibonacci_recursive_constexpr(N)return result
4473
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Popularity
4573
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Used by 483 CRAN Packages as of this week
4673
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Page Rank One (According to Andrie de Vries)
4773
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Case Study
4873
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Previous Status
middot We have a lot of data circulating at workmiddot Market prices positions risk estimates profitloss hellipmiddot The used to be displayed in a one-off lsquodisplay gridrsquomiddot But no history and no plots
4973
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Easy R Fix
middot Use Redis to cache datamiddot Redis is simple well-established widely usedmiddot Excellent R package rredis by Bryan Lewismiddot Use Shiny to access Redis and create lsquodashboardsrsquomiddot We need to be fast enough to keep users engagedmiddot Goal is ~ 250 msec (in-line with web UI research)
5073
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
What does Redis do
middot Essentially a very fast and lightweight keyvalue storemiddot After SET key valuemiddot Do GET key to retrieve value
middot APIs for multiple languages CC++ Python Java hellipmiddot Can also store lists sets hellipmiddot Can be coaxed to provide simple columnar data storemiddot Basic access store strings retrieve strings
5173
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
What is wrong with that
middot String conversion lsquoexpensiversquo when done repeatedly for afew thousand points
middot Do string conversion in compiled code ndash RcppRedismiddot A step better R serialization and deserialization using
RApiSerialize
5273
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Getting Data
library(Quandl)Quandlapi_key(yourAPIkeyhere) register obtain key anon possible toosp lt- Quandl(CHRISCME_SP1 type=xts)saveRDS(sp file=dataquandl-sp1rds) longer serieses lt- Quandl(CHRISCME_ES1 type=xts)saveRDS(sp file=dataquandl-es1rds) more activehead(sp 3)
5373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard Monthly Plot
Apr1982
Jan1984
Jan1986
Jan1988
Jan1990
Jan1992
Jan1994
Jan1996
Jan1998
Jan2000
Jan2002
Jan2004
Jan2006
Jan2008
Jan2010
Jan2012
Jan2014
Oct2015
SP Apr 1982 Oct 2015
200
400
600
800
1000
1200
1400
1600
1800
2000
200
400
600
800
1000
1200
1400
1600
1800
2000
5473
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Setter Version 1 via rredis
insertXtsR lt- function(x key) xm lt- coredata(x)xi lt- asinteger(index(x))for (i in seq_len(nrow(xm)))
dat lt- unname(c(xi[i] xm[i drop=TRUE]))redisRPush(key dat)
invisible(NULL)
5573
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Getter Base Version via rredis
getXtsR lt- function(key) n lt- asinteger(redisLLen(key))vals lt- redisLRange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5673
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Getter Rcpp Version 1
getXtsRcpp1 lt- function(key) n lt- asinteger(redis$llen(key))vals lt- redis$lrange(key 0 n)m lt- length(vals)mat lt- matrix(NA n 8)dat lt- rep(NA n)for (i in 1n)
z lt- vals[[i]]dat[i] lt- z[1]mat[i ] lt- z[-1]
x lt- xts(mat orderby=asDate(dat origin=1970-01-01))colnames(x) lt- colnamsx
5773
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Getter Rcpp Version 2
getXtsRcpp2 lt- function(key) mat lt- redis$listToMatrix(redis$lrange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
5873
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Timings
key lt- quandlcmesp1res lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)order=relative replications=25)[14]
print(res)
test replications elapsed relative 3 getXtsRcpp2(key) 25 0608 1000 2 getXtsRcpp1(key) 25 1768 2908 1 getXtsR(key) 25 29063 47801
5973
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Can we do better
middot Yes Redis also offers a binary typemiddot We grab each data row as a vectormiddot Pointer plus length a common form of expression
6073
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
New Rcpp Function R Side
insertXtsRcpp lt- function(x key) xm lt- coredata(x)xi lt- asnumeric(index(x))dat lt- unname(cbind(xi xm))for (i in seq_len(nrow(xm)))
redis$listRPush(key dat[i])invisible(NULL)
6173
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
New Rcpp Function Setter
redis append to list -- without R serializationstdstring listRPush(stdstring key RcppNumericVector x)
uses binary protocol see hiredis docsredisReply reply =
static_castltredisReplygt(redisCommand(prc_ RPUSH s bkeyc_str()xbegin() xsize()szdb))
stdstring res = freeReplyObject(reply)return(res)
6273
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
New Rcpp Function Getter
redis get from list from start to end -- without R serializationRcppList listRange(stdstring key int start int end)
redisReply reply =static_castltredisReplygt(redisCommand(prc_ LRANGE s d d
keyc_str() start end))checkReplyType(reply replyArray_t) ensure we got arrayunsigned int len = reply-gtelementsRcppList x(len)for (unsigned int i = 0 i lt len i++)
checkReplyType(reply-gtelement[i] replyString_t) ensure binaryint nc = reply-gtelement[i]-gtlenRcppNumericVector v(ncszdb)memcpy(vbegin() reply-gtelement[i]-gtstr nc)x[i] = v
freeReplyObject(reply)return(x)
6373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Use This Way
getXtsRcpp3 lt- function(key) mat lt- redis$listToMatrix(redis$listRange(key 0 -1))x lt- xts(mat[-1] orderby=asDate(mat[1] origin=1970-01-01))colnames(x) lt- colnamsx
6473
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Timings
key2 lt- quandlcmesp1rcppres2 lt- benchmark(getXtsR(key)
getXtsRcpp1(key)getXtsRcpp2(key)getXtsRcpp3(key2)order=relative replications=25)[14]
print(res2)
test replications elapsed relative 4 getXtsRcpp3(key2) 25 0364 1000 3 getXtsRcpp2(key) 25 0582 1599 2 getXtsRcpp1(key) 25 1747 4799 1 getXtsR(key) 25 29481 80992
6573
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Status
middot Not so bad ~ 80-fold increase for RcppRedis over rredismiddot Inner retrieval (outside of xts creation) about 100 times
fastermiddot 25 retrieval in 364 msec is clearly lsquogood enoughrsquomiddot Limitation Storing small binary vectors not elegantmiddot Possible fix MessagePackmiddot Alternative to lsquobinary JSONrsquo and alternativemiddot Easy to use API
6673
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Simple MessagePack buffer creation then sendingMessagePack buffer as binary load
typedef msgpacktypetupleltdouble int int intgt msg_t
msgpacksbuffer buffermsg_t m(v[0] (int)v[1] (int)v[2] (int)v[3]) fill the message typemsgpackpack(buffer m) and pack it
replynew =static_castltredisReplygt(redisCommand(d RPUSH s b
keyc_str()bufferdata() buffersize()))
freeReplyObject(replynew)
6773
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Time Series Dashboard
Conclusion
middot Simple things remain simplemiddot Memory allocation loops conversions hellip faster in C++middot Yet easily accessible from Rmiddot Leverage R strength (eg shiny) by overcoming bottlenecksmiddot Leads to Seamless Integration of R and C++ for
accelerated modeling
6873
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
The End
6973
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Documentation
middot The Rcpp package comes with nine pdf vignettes andnumerous help pages
middot The introductory vignettes are now published (for Rcppand RcppEigen in J Stat Software for RcppArmadillo inComp Stat amp Data Anlys)
middot The rcpp-devel list is the recommended resourcegenerally very helpful and fairly low volume
middot StackOverflow has over 900 posts too and Andmiddot A number of blog posts introducediscuss features
7073
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Rcpp Gallery
7173
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
The Rcpp book
7273
Very Last Words
Thank Youdirkeddelbuettelcom
7373
Very Last Words
Thank Youdirkeddelbuettelcom
7373