Realtime r

  • 1. Streaming Data, Concurrency and R. Rory Winston, rory@theresearchkitchen.com

2. About Me

  • Independent Software Consultant
  • M.Sc. Applied Computing, 2000
  • M.Sc. Finance, 2008
  • Apache Committer
  • Working in the financial sector for the last 7 years or so
  • Interested in practical applications of functional languages and machine learning
  • Relatively recent convert to R (2 years)

3. R - Pros and Cons

  Pro:
  • Designed by statisticians
  • Can be extremely elegant
  • Comprehensive extension library
  • Open-source
  • Huge parallelization effort
  • Fantastic reporting capabilities
  • Incredibly popular

  Con:
  • Designed by statisticians
  • Can be clunky (S4)
  • Bewildering array of overlapping extensions
  • Inherently single-threaded
  • Incredibly popular

18. Parallelization vs. Concurrency

  • R interpreter is single-threaded
  • Some historical context for this (BLAS implementations)
  • Not necessarily a limitation in the general context
  • Multithreading can be complex and problematic
  • Instead, a focus on parallelization:
    • Distributed computation: gridR, nws, snow
    • Multicore/multi-CPU scaling: Rmpi, Romp, pnmath/pnmath0
    • Interfaces to Pthreads/PBLAS/OpenMP/MPI/Globus/etc.
  • Parallelization suits CPU-bound, large data processing applications

19. Other Scalability and Performance Work

  • JIT/bytecode compilation (Ra)
  • Implicit vectorization a la Matlab (code analysis)
  • Large (> RAM) dataset handling (bigmemory, ff)
  • Many incremental performance improvements (e.g. less internal copying)
  • Next: GPU/massive multicore...?

20. What Benefit Concurrency?

  • Real-time (streaming, to be more precise) data analysis
  • Growing interest in using R for streaming data, not just offline analysis
  • GUI toolkit integration
  • Fine-grained control over independent task execution
  • "I believe that explicit concurrency management tools (i.e. a threads toolkit) are what we really need in R at this point." - Luke Tierney, 2001

21. Will There Be A Multithreaded R?

  • Short answer is: probably not
  • At least not in its current incarnation
  • Internal workings of the interpreter not particularly amenable to concurrency:
    • Functions can manipulate caller state (- vs.