23
Basic Statistics with Microsoft Excel Helen Dixon

Basic Statistics with Microsoft Excel

Embed Size (px)

DESCRIPTION

Basic Statistics with Microsoft Excel. Helen Dixon. Aim and Objectives. Aim of today’s course To illustrate how Excel can be used to carry out some basic statistical analyses and tests Objectives To show you how to use some of the statistical worksheet functions available within Excel - PowerPoint PPT Presentation

Citation preview

Page 1: Basic Statistics  with Microsoft Excel

Basic Statistics with Microsoft Excel

Helen Dixon

Page 2: Basic Statistics  with Microsoft Excel

Aim and Objectives Aim of today’s course

To illustrate how Excel can be used to carry out some basic statistical analyses and tests

ObjectivesTo show you how to use some of the

statistical worksheet functions available within Excel

To show you how to use some of the tools available in the Analysis ToolPak

To make you aware of the limitations of Excel

Page 3: Basic Statistics  with Microsoft Excel

Why use Excel? Software more accessible Previous familiarity with software Easy to format output Better charting facilities than some

statistical applications Access to other key Excel facilities Easy to use results with other applications

Page 4: Basic Statistics  with Microsoft Excel

Problems with Excel Errors due to rounding, missing data or extreme

values Not suitable for very large data sets Output labelled or arranged inappropriately Need to repeat processes for different variables

or options No record of analyses Some algorithms are numerically unstable - little

or no information about algorithms employed Analysis ToolPak results are not dynamic and

may vary with results generated by functions

Page 5: Basic Statistics  with Microsoft Excel

Statistical Functions

Frequency Distributions Mean, Median and Mode Percentiles and Quartiles Deviation and Squared Deviation about

the Mean Variance and Standard Deviation Covariance and the Correlation Coefficient

Page 6: Basic Statistics  with Microsoft Excel

Frequency Use COUNTIF to count how many times

an item appears in a list=COUNTIF(range, criteria)

Use FREQUENCY to calculate how often values occur within a range=FREQUENCY(data_array, bins_array)

Can also use Histogram tool in Analysis Toolpak

Page 7: Basic Statistics  with Microsoft Excel

Mean, Median, Mode

Use AVERAGE or AVERAGEA to calculate the arithmetic mean=AVERAGE(number1, number2, etc.)

Use MEDIAN to return the middle number=MEDIAN(number1, number2, etc)

Use MODE to return the most common value=MODE(number1, number2, etc)

Page 8: Basic Statistics  with Microsoft Excel

Percentiles and Quartiles Use PERCENTILE to return the kth percentile of

a data set =PERCENTILE(array, percentile) Percentile argument is a value between 0 and 1

Use QUARTILE to return the given quartile of a data set =QUARTILE(array, quart) Quart is 1, 2, 3 or 4 IQR = Q3-Q1

May return different values to statistical package

Page 9: Basic Statistics  with Microsoft Excel

Variance and Standard Deviation Use VAR, VARA, VARP or VARPA to

calculate the variance for a rangeE.g. =VAR(value1, value 2, etc.)Squared deviations about the mean/N or /n-1

Use STDEV, STDEVA, STDEVP or STDEVPA to calculate the standard deviation for a range=E.g. =STDEV(value1, value2, etc.)Positive square root of variance

Page 10: Basic Statistics  with Microsoft Excel

Covariance and the Correlation Coefficient Use COVAR to calculate the covariance

=COVAR(array1, array2)Average of products of deviations for each data point

pairDepends on units of measurement

Use CORREL to return the correlation coefficient =CORREL(array1, array2)Returns value between -1 and +1

Also available in Analysis ToolPak

Page 11: Basic Statistics  with Microsoft Excel

Probability Numerical measure of the likelihood that

an event will occur Some probabilities that can be calculated

using Excel:Binomial ProbabilitiesPoisson ProbabilitiesHypergeometric ProbabilitiesNormal ProbabilitiesExponential Probabilities

Page 12: Basic Statistics  with Microsoft Excel

Binomial Probabilities

Use BINOMDIST to compute binomial distribution probabilities and cumulative binomial probabilities=BINOMDIST(number_s, trials, probability_s,

cumulative)Calculates the probability that a sequence of

independent trials with two possible outcomes will have a given number of successes

Cumulative is either TRUE or FALSE

Page 13: Basic Statistics  with Microsoft Excel

Poisson Probabilities

Use POISSON to compute Poisson Probabilities=POISSON(x, mean, cumulative)Shows the probability of x occurrences of an

event over a specified interval of time or space

Page 14: Basic Statistics  with Microsoft Excel

Hypergeometric Probabilities Use HYPGEOMDIST to compute

hypergeometric probabilities=HYPGEOMDIST(sample_s,

number_sample, population_s, number_pop)Computes the probability of x successes

(sample_s) in n trials (number_sample) when the trials are dependent

Similar to Binomial except trials are not independent – probability of success changes from trial to trial

Does not compute cumulative probabilities

Page 15: Basic Statistics  with Microsoft Excel

Normal Probabilities Use NORMSDIST or NORMDIST to

compute the cumulative probability=NORMSDIST(z)=NORMDIST(x, mean, standard_dev,

cumulative) Use NORMSINV or NORMINV to compute

the z or x value given a cumulative probability=NORMSINV(probability)=NORMINV(probability, mean, standard_dev)

Page 16: Basic Statistics  with Microsoft Excel

Exponential Probabilities Use EXPONDIST to compute exponential

probabilities=EXPONDIST(x, lambda, cumulative)x is the random variableLambda is 1/meanUseful in computing probabilities for the time it

takes to complete a task

Page 17: Basic Statistics  with Microsoft Excel

Analysis ToolPak

Descriptive Statistics Correlation Linear Regression t-Tests z-Tests ANOVA Covariance

Page 18: Basic Statistics  with Microsoft Excel

Descriptive Statistics

Mean, Median, Mode Standard Error Standard Deviation Sample Variance Kurtosis Skewness Confidence Level for

Mean

Range Minimum Maximum Sum Count kth Largest kth Smallest

Page 19: Basic Statistics  with Microsoft Excel

Correlation and Regression

Correlation is a measure of the strength of linear association between two variables Values between -1 and +1 Values close to -1 indicate strong negative relationship Values close to +1 indicate strong positive relationship Values close to 0 indicate weak relationship

Linear Regression is the process of finding a line of best fit through a series of data points Can also use the SLOPE, INTERCEPT, CORREL and RSQ

functions

Page 20: Basic Statistics  with Microsoft Excel

t-Tests and z-Tests

Used to test hypotheses by comparing means If sample means are equal suggests both

samples came from same population t-Test – n <30

Equal or unequal variances or paired testCheck result using TTEST function

z-Test – n>30Used for means with known variances

Page 21: Basic Statistics  with Microsoft Excel

ANOVA: Analysis of Variances

Compares variances in two or more data sets If difference is found it can be assumed that the

means of the data sets are different Single Factor – use instead of t-Test for more than 2

samples Two Factor with Replication – useful when data can

be classified along 2 different dimensions Two Factor without Replication – as above but only

one observation for each pair

Page 22: Basic Statistics  with Microsoft Excel

PivotTables

Use for crosstabulations Data must be in tabular format: columns

with headings, no blank columns Easy to pivot data Easy to create PivotCharts Can summarise and analyse data without

affecting data source

Page 23: Basic Statistics  with Microsoft Excel

Final Tips

Excel only suitable for basic analysis using small data sets

Later versions of Excel more reliable than Excel 97

Check Analysis TookPak results with worksheet functions

Check overall results by hand or with dedicated statistical package