The R Project for Statistical Computing · Statistical Tables using R · Monte Carlo method · Resampling from data

Introduction to R – a computing software for statistical analysis

Krzysztof Podgórski
Department of Mathematics and Statistics
University of Limerick

September 8, 2009


Quotation of the lecture

“I was still a couple of miles above the clouds when it broke, and with such violence I fell to the ground that I found myself stunned, and in a hole nine fathoms under the grass, when I recovered, hardly knowing how to get out again. Looking down, I observed that I had on a pair of boots with exceptionally sturdy straps. Grasping them firmly, I pulled with all my might. Soon I had hoist myself to the top and stepped out on terra firma without further ado.”

R. E. Raspe, Singular Travels, Campaigns and Adventures of Baron Munchausen, 1786.


Outline

1 The R Project for Statistical Computing

2 Statistical Tables using R

3 Monte Carlo method

4 Resampling from data


Downloading and installing the R-package

R can be downloaded from the following website: http://www.r-project.org/index.html


The highlights

The package is available for all popular operating systems:
    Windows
    Mac OS X
    Linux

It is free!

Everyone (knowledgeable enough) can contribute to the software by writing a package

Packages are available for download through a convenient facility

It is fairly well documented, and the documentation is available either from the program help menu or from the website

It is the top choice of statistical software among academic statisticians and is also very popular in industry, especially among biostatisticians and medical researchers (mostly due to the large Bioconductor project, which is built on top of R)

It is a powerful tool not only for doing statistics but also for all kinds of scientific programming
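As an illustration of the package facility and the built-in documentation mentioned above, here is a minimal sketch of a typical console session. The package name "boot" and the choice of CRAN mirror are assumptions made for the example; they are not named in the slides.

```r
# Install a contributed package only if it is not already present.
# "boot" is an illustrative choice, not one named in the slides.
pkg <- "boot"
if (!requireNamespace(pkg, quietly = TRUE)) {
  # The mirror URL is an assumption; any CRAN mirror can be used.
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Load the package into the current session.
library(pkg, character.only = TRUE)

# Documentation is available from the console as well as from the help menu.
help("mean")   # equivalent to typing ?mean
```

In an interactive session the same steps are usually typed directly, for example `install.packages("boot")` followed by `library(boot)`.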


What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools one can find implemented
    linear and nonlinear modelling,
    classical statistical tests,
    time-series analysis,
    classification,
    clustering,
    ...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes
    an effective data handling and storage facility,
    a suite of operators for calculations on arrays, in particular matrices,
    a large, coherent, integrated collection of intermediate tools for data analysis,
    graphical facilities for data analysis and display either on-screen or on hardcopy, and
    a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
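The facilities listed above can be tried out in a few lines. The following is only a sketch: the data values, the variable names (`d`, `A`, `b`, `fit`, `fact`) and the hypothesised mean in the t-test are made up for illustration and do not come from the lecture.

```r
# Data handling: a small data frame (values are made up for illustration).
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Operators for calculations on arrays, in particular matrices.
A <- matrix(c(2, 0, 1, 3), nrow = 2)   # filled column by column
b <- c(1, 2)
A %*% b        # matrix-vector product
solve(A, b)    # solution x of the linear system A x = b
t(A)           # transpose

# Statistical tools: a linear model and a classical one-sample t-test.
fit <- lm(y ~ x, data = d)
coef(fit)                 # estimated intercept and slope
t.test(d$y, mu = 5)       # H0: the mean of y equals 5 (arbitrary value)

# Programming language: conditionals, loops, a user-defined recursive
# function, and simple output.
fact <- function(n) {
  if (n <= 1) 1 else n * fact(n - 1)
}
for (k in 1:5) {
  cat(k, "! = ", fact(k), "\n", sep = "")
}
```

All functions used here (`data.frame`, `matrix`, `solve`, `lm`, `t.test`, `cat`) are part of the standard R distribution, so no additional packages are needed.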

Page 18: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,time-series analysis,classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 19: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implemented

linear and nonlinear modelling,classical statistical tests,time-series analysis,classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 20: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,

classical statistical tests,time-series analysis,classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 21: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,

time-series analysis,classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 22: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,time-series analysis,

classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 23: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,time-series analysis,classification,

clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 24: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,time-series analysis,classification,clustering,

...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 25: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,time-series analysis,classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 26: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,time-series analysis,classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 27: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,time-series analysis,classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includes

an effective data handling and storage facility,a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.

Page 28: Introduction to R a computing software for statistical ...nunez/mastertecnologiastelecomu... · It is the top choice of statistical software among academic statisticians but also

The R Project for Statistical Computing Statistical Tables using R Monte Carlo method Resampling from data

What is R? – only some basic information

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical and graphical techniques, and is highly extensible. Among its tools

one can find implementedlinear and nonlinear modelling,classical statistical tests,time-series analysis,classification,clustering,...

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced,including mathematical symbols and formulae where needed.

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It

includesan effective data handling and storage facility,

a suite of operators for calculations on arrays, in particular matrices,a large, coherent, integrated collection of intermediate tools for data analysis,graphical facilities for data analysis and display either on-screen or on hardcopy, anda well-developed, simple and effective programming language which includes conditionals, loops,user-defined recursive functions and input and output facilities.


Outline

1 The R Project for Statistical Computing

2 Statistical Tables using R

3 Monte Carlo method

4 Resampling from data


Table of a normal distribution

The following is a fragment of the table of values of F(x), the standard normal cumulative distribution function.


The same “table” in R

Here is some simple code in R that produces the same values as in the table.

# Preceding a line with the symbol '#' makes it a comment in R.
# The following line produces a single value of the standard normal cumulative
# distribution function. It is the value corresponding to the first value in the table.
pnorm(-3.4)
# [1] 0.0003369293

# Then the first row of the table
z = seq(-3.4, -3.31, by = 0.01)
pnorm(z)
# [1] 0.0003369293 0.0003494631 0.0003624291 0.0003758409 0.0003897124
# [6] 0.0004040578 0.0004188919 0.0004342299 0.0004500872 0.0004664799

# And all values from the table
z = seq(-3.4, 3.4, by = 0.01)
pnorm(z)

# Output: 681 values printed ten per row (abridged here to the first, middle and last rows)
#   [1] 0.0003369293 0.0003494631 0.0003624291 0.0003758409 0.0003897124 0.0004040578 0.0004188919 0.0004342299 0.0004500872 0.0004664799
#   ...
# [341] 0.5000000000 0.5039893563 0.5079783137 0.5119664734 0.5159534369 0.5199388058 0.5239221827 0.5279031702 0.5318813720 0.5358563926
#   ...
# [681] 0.9996630707
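The long vector printed above can also be arranged the way a printed table is. The following is a minimal sketch, assuming the usual layout in which rows give z to one decimal place and columns give the second decimal; the names `rows`, `cols` and `tab` are illustrative and not part of the original slides.

```r
# Rows: z to one decimal place; columns: the second decimal of z (assumed layout)
rows <- seq(0, 3.4, by = 0.1)
cols <- seq(0, 0.09, by = 0.01)

# outer() evaluates pnorm(row + col) for every row/column pair
tab <- round(outer(rows, cols, function(a, b) pnorm(a + b)), 4)
dimnames(tab) <- list(sprintf("%.1f", rows), sprintf("%.2f", cols))

# First three rows of the table, and the entry for z = 1.96
print(tab[1:3, ])
print(tab["1.9", "0.06"])
```

The entry in row 1.9 and column 0.06 is F(1.96), which rounds to 0.975, the value commonly used for 95% intervals.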


There is more than meets the eye in the table

It is not only the table values that can be explored for the standard normal distribution using R. Recall that the normal distribution, equally often referred to as a Gaussian distribution, is defined by the density

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$

- The density represents the distribution of probability for a random variable associated with it.
- The area under the density represents probability, so the total area under it is equal to one.
- The area accumulated up to a certain value z represents the probability that the corresponding random variable takes a value smaller than z; this probability defines the cumulative distribution function F(z), which is tabulated.
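These statements about areas can be checked numerically. The sketch below uses R's `integrate` for numerical integration of the density `dnorm`; the evaluation point `z0 = 1.5` is an arbitrary choice made for illustration.

```r
# Total area under the standard normal density: should be (numerically) one
total <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Area accumulated up to z0, compared with the cumulative distribution function
z0 <- 1.5  # arbitrary illustration point
area <- integrate(dnorm, lower = -Inf, upper = z0)$value

print(total)
print(c(area = area, pnorm = pnorm(z0)))
```

The two numbers in the last line should agree up to the accuracy of the numerical integration, since `pnorm` is exactly the accumulated area F(z).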


All this can be seen in R

The following code explores various aspects of the standard normal distribution.

# Plotting the density function of the standard normal variable
z = seq(-3, 3, by = 0.01)
plot(z, dnorm(z), type = 'l', col = "red", lwd = 4)

# Plotting the cumulative distribution function (the one from the table)
plot(z, pnorm(z), type = 'l', col = "red", lwd = 4)

# And plotting them one on top of the other
par(mfrow = c(2, 1))
plot(z, dnorm(z), type = 'l', col = "red", lwd = 4)
plot(z, pnorm(z), type = 'l', col = "red", lwd = 4)

# Side by side
par(mfrow = c(1, 2))
plot(z, dnorm(z), type = 'l', col = "red", lwd = 4)
plot(z, pnorm(z), type = 'l', col = "red", lwd = 4)


Outline

1 The R Project for Statistical Computing

2 Statistical Tables using R

3 Monte Carlo method

4 Resampling from data


Deciding for Poisson model

Recall that the Poisson model is given by

$$P(X = x \mid \theta) = \frac{\theta^{x} e^{-\theta}}{x!}.$$

It is relatively easy to demonstrate that the mean value of this distribution is equal to θ and that the variance is also equal to θ (so the standard deviation is √θ). How?

Thus for a sample of observations x = (x_1, ..., x_n) it is reasonable to consider both

$$\hat\theta_1 = \bar{x}, \qquad \hat\theta_2 = \overline{x^2} - \bar{x}^2$$

as estimators of θ.

Important Question

Which one is better?
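For a concrete sample the two estimators can be computed directly. The sketch below uses a simulated sample; the seed, the sample size and the value θ = 4 are assumptions made for illustration. Note that R's `var` divides by n − 1, whereas the second estimator divides by n, so the two differ by the factor (n − 1)/n.

```r
set.seed(1)            # arbitrary seed, for reproducibility only
n <- 20
x <- rpois(n, lambda = 4)

theta1 <- mean(x)                # estimator based on the sample mean
theta2 <- mean(x^2) - mean(x)^2  # estimator based on the sample variance (divisor n)

print(c(theta1 = theta1, theta2 = theta2))

# theta2 agrees with var(x) after rescaling by (n - 1) / n
print(all.equal(theta2, var(x) * (n - 1) / n))
```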


Monte Carlo study

A computational approach to the problem is to draw many samples from the Poisson distribution and check which of the estimators performs better. This is what is done in the following R code:

# Generating in a loop samples of size n=20 from the Poisson distribution with parameter theta=4
# and evaluating their means and variances
n = 20
theta = 4
N = 1000
means = vector('numeric', N)
vars = vector('numeric', N)
for (i in 1:N) {
  x = rpois(n, theta)
  means[i] = mean(x)
  vars[i] = var(x)
}

# Plotting the two histograms one above the other
par(mfrow = c(2, 1))
hist(means)
hist(vars)

It is quite clear from the graphs that the estimator based on the mean is better than the one based on the variance: its values are concentrated much more tightly around θ = 4.
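The visual impression can be backed by numbers. The following is a minimal sketch, assuming the same settings as in the study (n = 20, θ = 4, N = 1000) and an arbitrary seed; it compares the mean squared errors of the two estimators over the simulated samples. The names `est` and `mse` are illustrative.

```r
set.seed(2009)  # arbitrary seed, for reproducibility only
n <- 20
theta <- 4
N <- 1000

# Each column of 'est' holds both estimates computed from one simulated sample
est <- replicate(N, {
  x <- rpois(n, theta)
  c(mean = mean(x), var = var(x))
})

# Mean squared error of each estimator around the true theta
mse <- rowMeans((est - theta)^2)
print(mse)
```

For the Poisson distribution the variance of the sample mean is θ/n, here 0.2, so the first entry should be close to that value; the second is typically several times larger.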


Outline

1 The R Project for Statistical Computing

2 Statistical Tables using R

3 Monte Carlo method

4 Resampling from data


Nitrate ion concentration measurements in a certain chemical lab

Also in the file Table2_1.txt

0.51 0.51 0.51 0.50 0.51 0.49 0.52 0.53 0.50 0.47
0.51 0.52 0.53 0.48 0.49 0.50 0.52 0.49 0.49 0.50
0.49 0.48 0.46 0.49 0.49 0.48 0.49 0.49 0.51 0.47
0.51 0.51 0.51 0.48 0.50 0.47 0.50 0.51 0.49 0.48
0.51 0.50 0.50 0.53 0.52 0.52 0.50 0.50 0.51 0.51


The mean concentration

# Getting data in a vector
x = scan('Table2_1.txt')

mean(x)
# [1] 0.4998

sd(x)
# [1] 0.01647385

Question

What is the error of this determination of the nitrate concentration?


Pulling yourself up by your bootstraps, i.e. something for nothing

If we repeated our experiment of collecting 50 measurements of nitrate concentration many times, we would see the range of the error.

But it would be a waste of resources and not a viable method.

Instead we resample ‘new’ data from our data and use the new samples so obtained to assess the error.

The following R code does the job.

# Resampling with replacement from the data in the vector x
m = mean(x)
bootstrap = vector('numeric', 500)
for (i in 1:500) {
  bootstrap[i] = mean(sample(x, replace = TRUE)) - m
}

# The distribution of the estimation error
hist(bootstrap)

We can safely say that the nitrate concentration is 0.4998 ± 0.005.
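The ± figure can be read off the bootstrap distribution numerically rather than from the histogram by eye. The sketch below is one way to do it, assuming the data are typed in directly from the table of measurements instead of being read from Table2_1.txt; the seed and the number of resamples `B` are arbitrary choices, and the name `boot_err` is illustrative.

```r
# The 50 nitrate measurements from the table
x <- c(0.51, 0.51, 0.51, 0.50, 0.51, 0.49, 0.52, 0.53, 0.50, 0.47,
       0.51, 0.52, 0.53, 0.48, 0.49, 0.50, 0.52, 0.49, 0.49, 0.50,
       0.49, 0.48, 0.46, 0.49, 0.49, 0.48, 0.49, 0.49, 0.51, 0.47,
       0.51, 0.51, 0.51, 0.48, 0.50, 0.47, 0.50, 0.51, 0.49, 0.48,
       0.51, 0.50, 0.50, 0.53, 0.52, 0.52, 0.50, 0.50, 0.51, 0.51)

set.seed(1)   # arbitrary seed, for reproducibility only
B <- 2000     # number of bootstrap resamples (assumed)
m <- mean(x)
boot_err <- replicate(B, mean(sample(x, replace = TRUE)) - m)

# Bootstrap standard error and the range containing 95% of the errors
print(sd(boot_err))
print(quantile(boot_err, c(0.025, 0.975)))

# Classical standard error of the mean, for comparison
print(sd(x) / sqrt(length(x)))
```

Twice the standard error is roughly 0.005, which is consistent with the ± 0.005 quoted for the nitrate concentration.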


Gaussian distribution, or is it?

Abraham de Moivre, a French mathematician residing in England, proved in an article of 1733 that coin-toss counts follow the normal law.

Anecdote: Shortly before his death he noted that it was necessary for him to sleep a quarter of an hour longer each day. He computed that the day on which he would have to sleep 24 hours was 27 November 1754. He declared this date to be the day of his death, and indeed he died on that day.

Pierre-Simon, marquis de Laplace, a great French mathematician and physicist, published an article in 1778 (when Gauss was two years old).

Anecdote: His brain was removed by his physician and kept for many years; it was reportedly smaller than the average brain.

Carl Friedrich Gauss, a great German mathematician and physicist, published an article in 1809.

Anecdote: While his wife was lying sick, Gauss became engrossed in a theoretical problem. When the doctor told him that his wife was dying, Gauss waved him away. "Tell her," he ordered without looking up, "to wait a minute until I've finished."


Stigler’s law of eponymy (Stephen Stigler, statistics professor, University of Chicago)

No scientific discovery is named after its original discoverer.

attributed to Robert K. Merton
