79
High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst [email protected] 8/4/2014 https://polsci.umass.edu/profiles/ denny matthew j/workshop-materials

High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst [email protected]

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

High Performance Computing and BigData Analytics: An Introduction

Matthew J. DennyUniversity of Massachusetts Amherst

[email protected]

8/4/2014

https://polsci.umass.edu/profiles/

denny matthew j/workshop-materials

Page 2: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Overview

What we will go over tonight:

1. What is High Performance Computing

(HPC)/Big Data Analytics?

2. Strategy - work smart, not hard.

3. Software and programming choices.

4. Hardware.

5. Resources.

Page 3: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Warning:

You will not receive direct,

career-advancing professional

compensation for developing HPC

and Big Data skills.

You still have to work on somethingimportant/interesting.

Page 4: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

1. What is it?

Page 5: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

What it is not

My Data Are So Big...

Page 6: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

What it is

I An approach more than a specific set

of tools

I Effort to scale up analysis.

I Determining the most efficient way tocomplete your analysis task.

I Time

I Resources

Page 7: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

High Performance Computing

I Run analysis faster.

I Run analysis at larger scale.

I Work on more complex problems.

Page 8: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

How?

I Make use of low overhead, high speed

programming languages (C, C++,

Fortran, Java, etc.)

I Parallelization.

I Efficient implementation.

I Good scheduling.

Page 9: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Big Data Analytics

I Work with larger datasets.

I Efficiently analyze large datasets.

I Leverage large amounts of data.

Page 10: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

How?

I Use memory efficient data structures and

programming languages.

I More RAM.

I Databases.

I Efficient inference procedures.

I Good scheduling.

Page 11: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

How they fit together

High Performance

Computing

Page 12: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

2. Strategy

Page 13: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Checklist

1. Understand your challenge.

2. Determine which approach is

appropriate.

3. Exercise Constraint.

4. Work smarter, not harder.

Page 14: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Hardware constraints

I RAM = computer working memory –determines size of datasets you can work on.

I CPU = processor, determines speed ofanalysis and degree of parallelization.

Page 15: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Look at your activity monitor!

Page 16: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Other factors to consider

I Different analysis packages are designed

for different scales.

I Know your data.

I When does the project have to be

complete?

Page 17: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Understand your challenge

I Large dataset.

I Analysis takes a long time to run.

I Analysis requires many replications.

Page 18: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Large dataset – panel data, event historydata

I Determine memory requirements.

I Use memory efficient software.

I Find a high RAM computer.

I Break problem up.

Page 19: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Long run time – MCMC, optimization

I Determine approximate run time.

I Less than a month? – Just let it run.

I Reliable power, turn off automatic

updates, save periodically if possible.

I Implement in a faster language (C++

gives 1,000+ times speedup over R)

Page 20: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Many replications – cross validation,bootstrapping

I Write code to run one instance.

I Use looping, try subset first.

I Parallelize.

I Use several computers/cluster.

Page 21: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

2.1 Parallelization

Page 22: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Homogeneous parallelization

A B FEDC

A

A B FEDC

B C

Homogeneous Task

Results

Multiple Processors

A B C D E F

A B C D FE

Results

Heterogeneous Task

Page 23: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Heterogeneous parallelization

A B FEDC

A

A B FEDC

B C

Homogeneous Task

Results

Multiple Processors

A B C D E F

A B C D FE

Results

Heterogeneous Task

Page 24: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Map-Reduce

A B D E F

A B C D FE

Reduce

C

A B C D FE

A B D E F

A B C D FE

C

A B C D FE

Update

Map

Map

Reduce

Page 25: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

2.2 General Advice

Page 26: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

It takes time...

I Implementing HPC can take

more time than you save.

I Reuse somebody else’s code where

possible – GOOGLE IT!

I You will not be professionally

rewarded for HPC skills.

I Get help, find a collaborator.

Page 27: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Exercise restraint

I Weight the costs of an HPC project

before pursuing it – get advice.

I HPC resources are expensive, be careful

with your money.

I Invest in resources/languages that will

be transferable to other projects.

Page 28: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Know the benefits

I Developing HPC skillset can make you a

valuable collaborator.

I HPC skills most valuable when they let

you work on a problem you could

not otherwise.

I Industry loves HPC, can get internships

in data science.

Page 29: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

3. Software andProgramming Choices

Page 30: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Important considerations

1. Software platform can make a big

difference in speed/efficiency.

2. Programming speed and readability vs.

run time.

3. Repetitive tasks can be automated.

4. Remote access is valuable.

Page 31: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Software choices

I Stata, SAS, SPSS, Matlab

I Readable syntax, fast programming,

less control.

I R, Python

I Flexible, more control, harder to code.

I C++, C, Fortran, Java, CUDA

I Most control, fastest, hardest to code.

Page 32: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Stata

I Memory efficient

I Reasonably fast.

I Not flexible.

I Have to pay for multithreaded version.

Page 33: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Python

I Great for file I/O.

I Easy to read

I Incredibly flexible.

I Can interface with other languages.

Page 34: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

R

I Access to many statistical packages.

I Existing code base for HPC.

I Not memory efficient, or fast.

Page 35: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

C++

I Very fast.

I Interface with R.

I Difficult to program.

Page 36: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

3.1 Software Packages

Page 37: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

R packages for HPC

I snowfall – cluster paralellization

I sfClusterApplyLB( )

I parallel – included in base R

I mclapply( ) or foreach

I biglm – high memory regression

I bigglm( )

Page 38: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

C++ in R

I Rcpp – allows for the integration of C++

code in R.

I RcppArmadillo – Gives access to

Armadillo libraries.

I RStudio – IDE of R with built in C++

debugging

Page 39: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

3.2 ProgrammingChoices

Page 40: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Efficient R programming

I Loops are slow in R, but faster than

doing something by hand

I Built-in functions are mostly written in

C – much faster!

I Subset data before processing when

possible.

I Test with system.time({ code })

Page 41: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Loops are “slow” in R

system.time({

vect <- c(1:10000000)

total <- 0

#check using a loop

for(i in 1:length(as.numeric(vect))){

total <- total + vect[i]

}

print(total)

})

[1] 5e+13

user system elapsed

7.641 0.062 7.701

Page 42: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

And fast in C

system.time({

vect <- c(1:10000000)

#use the builtin R function

total <- sum(as.numeric(vect))

print(total)

})

[1] 5e+13

user system elapsed

0.108 0.028 0.136

Page 43: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Summing over a sparse dataset

#number of observations

numobs <- 100000000

#observations we want to check

vec <- rep(0,numobs)

#only select 100 to check

vec[sample(1:numobs,100)] <- 1

#combine data

data <- cbind(c(1:numobs),vec)

Page 44: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Conditional checking

system.time({

total <- 0

for(i in 1:numobs){

if(data[i,2] == 1)

total <- total + data[i,1]

}

print(total)

})

[1] 5385484508

user system elapsed

199.917 0.289 200.350

Page 45: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Subsetting

system.time({

dat <- subset(data, data[,2] ==1)

total <- sum(dat[,1])

print(total)

})

[1] 5385484508

user system elapsed

5.474 1.497 8.245

Page 46: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

3.3 Remote Access

Page 47: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Overview

I Connect from your laptop to an HPC

resource.

I Secure shell (SSH), graphical interface

(RDP/VNC) or web interface (RStudio

Server)

I Cluster job scheduling.

Page 48: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

SSH

Page 49: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Connecting to a remotedesktop/Workstation

Page 50: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

RStudio Server over Internet (Linux only)

Page 51: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

How a cluster works

Page 52: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Connecting to a cluster

Page 53: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Job scheduling on a cluster

I System to share resources.

I Job Schedulers (Moab ,Grid Engine,

LoadLeveler, SLURM, LSF).

I SSH −→ FTP data to local directory

−→ submit ”job”

bsub -n 4 -R "rusage[mem=2048]"

-W 0:10 -q long example.sh

Page 54: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

example.sh

The file has now been created and saved:

Page 55: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Connecting to your owndesktop/Workstation

I You will need an IP address:

Example – 35.2.23.132

I Static vs. Dynamic

I Your university should be able to provide

a static IP for free.

I Dyn DNS – $25 per year

http://dyn.com/remote-access/

Page 56: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Security concerns

I If you get a static IP, people will try to

hack into your computer.

I Set firewall (see course handout)

I Use a Virtual Private Network (VPN).

Page 57: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

4. Hardware

Page 58: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Classes of hardware

1. Supercomputers

2. Mainframes

3. Cluster Computing Resources

4. Servers

5. HPC Workstations

6. Consumer Desktops

7. GPGPU

Page 59: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Supercomputers

I Used when all computing resources are

needed to solve one problem.I Physics, engineering, materials science

Page 60: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Mainframes

I Used for large database applications.

I Business analytics, healthcare.

Page 61: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Cluster Computing Resources

I Flexible, used for parallel and high

memory tasks.

I General purpose academic computing

infrastructure.

Page 62: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Servers

I Most often used for hosting websites.

I Can be useful for long jobs, high

memory.

Page 63: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

HPC Workstations

I Personal mid-size high memory and

parallel computing.

I For people who moderate resources

constantly.

Page 64: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Desktop

I Everything, depending on how long you

are willing to wait.I Will run 95% of what you want to do.

Page 65: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

General Purpose GPU Computing

I Problems that break down to small,

interdependent parts.I Bootstrapping, complex looping,

optimization.

Page 66: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Pricing tiers

I Cluster Access : Usually free through

your institution but often requires

application/faculty sponsorship.

I HPC Workstation: $8,000-$15,000 –

Not a good investment for most.

I Desktop: $700-$2000 – Often a very

good investment.

Page 67: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

My suggestion

I Ask a faculty member for access. TRY

THIS FIRST.

I Investing in a desktop with 4C/8T (Intel

i7) and 16GB of ram is often a smart

idea if it will not get in the way of

conference attendance.

I Access a university cluster only once you

are confident a desktop can no longer

meet your needs.

Page 68: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Upgrades

I Old computers are good for HPC tasks

that simply take a while to run.

I Locate computer in academic office for

free electricity/internet/easier remote

access.

I Relatively cheap upgrades can

dramatically improve performance.

I BENCHMARK

Page 69: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Know your motherboard

I How many RAM Slots?

I Peripherals, CPU, GPU slots.

Page 70: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

RAM

I Work with larger datasets.

I 16GB Kits – ($100-150)

I 32GB Kits – ($200-300)

Page 71: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Solid state hard drive

I General system performance, data I/O.

I $0.40-$0.80 per GB

I Leave 15-20% free.

Page 72: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Check review sites

Page 73: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Things to remember

I Get an Uninterrupted Power Supply

(UPS) for stable power.

I Put a sign on your computer that says

don’t touch.

Page 74: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Summary

I Don’t buy it unless you absolutely

need it.

I Most resources can be borrowed/had for

free.

I More powerful resources require more

time to learn.

Page 75: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

5. Resources

Page 76: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

R HPC resources

I Tim Churches – parallelization tutorial

https://github.com/timchurches/smaRts/blob/master/

parallel-package/R-parallel-package-example.md

I Introduction to Scientific Programming and

Simulation Using R

Page 77: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Rcpp/C++ resources

I Dirk Eddelbuettel – Rcpp

http://www.rcpp.org/

I Hadley Wickham – Advanced R

http://adv-r.had.co.nz/

I Armadillo library API documentation

http://arma.sourceforge.net/docs.html

Page 78: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu

Before tomorrow:

I Download RStudio.

I Get snowfall, biglm, Rcpp and

RcppArmadillo packages.

I Get Cygwin and PuTTY if you have a

Windows machine.