Excel Functions for Statistics

8/8/2019 Excel Functions for Statistics


Basic Statistics with Microsoft Excel


DESCRIPTIVE STATISTICS: FREQUENCIES

Using COUNTIF to Construct a Frequency Distribution

The spreadsheet below shows how COUNTIF can be used to calculate how many times each country appears in the list in Column A.

To calculate Frequency, select D2 and enter the function:

=COUNTIF(A$2:A$17, C2)

A$2:A$17 is the range to be evaluated (the absolute row reference ($) keeps the range fixed so that the function fills down correctly).
C2 is the country to be counted.

To calculate Relative Frequency, divide Frequency (D2) by Total (D7).

To calculate Percent Frequency, multiply Relative Frequency (E2) by 100.

Note: Cells D2:F7 show the formulas used. Cells D11:F16 show the results.
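The same frequency, relative frequency and percent frequency columns can be sketched outside Excel. The Python below is only an illustration: the country list is hypothetical and stands in for Column A, which is not reproduced here.

```python
from collections import Counter

# Hypothetical data standing in for the country list in Column A.
countries = ["France", "Spain", "Italy", "France",
             "Spain", "France", "Italy", "France"]

frequency = Counter(countries)       # plays the role of COUNTIF for each country
total = sum(frequency.values())      # the Total cell

for country, count in sorted(frequency.items()):
    relative = count / total         # Relative Frequency = Frequency / Total
    percent = relative * 100         # Percent Frequency = Relative Frequency * 100
    print(f"{country:<8}{count:>4}{relative:>8.3f}{percent:>8.1f}")
```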



Using FREQUENCY to Construct a Frequency Distribution

The FREQUENCY function involves the use of array formulas that provide multiple values (in this case the class frequencies) as output.

1. Select cells D2:D5 where the frequencies are to appear.
2. Type the formula: =FREQUENCY(A2:A16,{9,14,19,24})
3. Press CTRL + SHIFT + ENTER and the array formula will be entered into each of the cells D2:D5.

Because we entered an array formula, the formula that Excel displays in each of the cells is the same, but the values are not: they are the frequencies for each class. The class upper limits in the second argument of the FREQUENCY function tell Excel which frequency to put in each cell within the range of the array formula.
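To make the role of the class upper limits concrete, here is a minimal Python sketch of the counting rule. The function name frequency and the data values are assumptions made for illustration; each value is counted in the first class whose upper limit is greater than or equal to it.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per class; bins holds the class upper limits in ascending order."""
    counts = [0] * (len(bins) + 1)    # the last slot counts values above the final limit
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts

# Hypothetical data standing in for A2:A16.
data = [6, 8, 9, 11, 12, 12, 14, 15, 16, 17, 18, 19, 21, 22, 24]
print(frequency(data, [9, 14, 19, 24]))
```

The sketch returns one more count than there are upper limits. The extra entry holds any values above the last limit and is zero for this data.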

Using PivotTable Report to Construct a Crosstabulation

PivotTable Report provides a general tool for summarising the data for two or more variables simultaneously.

1. Select the Data menu and choose PivotTable and PivotChart Report.
2. Choose Microsoft Excel list or database.
3. Choose PivotTable and select Next.
4. Enter the data range in the Range box and select Next.
5. Select New Worksheet (if required).
6. Click on the Layout button.
7. Drag the field buttons to the ROW, COLUMN and DATA sections of the diagram as appropriate.
8. Double click the Sum of field button in the data section.
9. Choose Count under Summarise by: and click OK.
10. Click OK and then Finish.
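The crosstabulation that the PivotTable produces with Count as the summary is simply a count of each (row, column) combination. A minimal Python sketch follows, with hypothetical records and category labels that are not taken from the handout.

```python
from collections import Counter

# Hypothetical records: (row variable, column variable).
records = [("Good", "Low"), ("Good", "High"), ("Poor", "Low"),
           ("Good", "Low"), ("Poor", "Low")]

counts = Counter(records)                  # count of each (row, column) pair
rows = sorted({r for r, _ in records})     # labels for the ROW section
cols = sorted({c for _, c in records})     # labels for the COLUMN section

print("".ljust(8) + "".join(c.rjust(6) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(counts[(r, c)]).rjust(6) for c in cols))
```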



DESCRIPTIVE STATISTICS: NUMERICAL METHODS

The following spreadsheet shows the functions used to calculate the mean, median, mode, percentiles and quartiles for a cell range named hours.

Displaying the Mean, Median, and Mode

=AVERAGE(array), =MEDIAN(array), =MODE(array)

Percentiles and Quartiles

=PERCENTILE(array, percentile) where percentile is between 0 and 1
=QUARTILE(array, quart) where quart is 1, 2, 3 or 4
1st Quartile = 25th Percentile, 2nd Quartile = 50th Percentile, 3rd Quartile = 75th Percentile
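The link between quartiles and percentiles can be checked with a short Python sketch. It assumes the usual interpolation rule for PERCENTILE, in which the p-th percentile sits at position p*(n-1) in the sorted data and values are interpolated linearly between neighbours. The function name percentile and the hours values are hypothetical.

```python
def percentile(values, p):
    """p-th percentile (0 <= p <= 1), interpolating linearly between sorted values."""
    data = sorted(values)
    rank = p * (len(data) - 1)     # fractional position in the sorted data
    lower = int(rank)
    fraction = rank - lower
    if lower + 1 < len(data):
        return data[lower] + fraction * (data[lower + 1] - data[lower])
    return data[lower]

# Hypothetical values for the range named hours.
hours = [12, 15, 18, 20, 25, 30, 34, 40, 41]
quartiles = [percentile(hours, q / 4) for q in (1, 2, 3)]
print(quartiles)    # 1st, 2nd and 3rd quartiles
```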

Deviation and Squared Deviation About the Mean

A deviation about the mean is the difference between the value of an observation and the mean. The sum of the deviations about the mean will always equal 0.
To calculate the square of a value enter =A1^2.
The sample variance is the sum of the Squared Deviations divided by n-1. In the example above this will be 41320/4 = 10330.
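The calculation can be traced step by step in Python. The five data values below are hypothetical and are not the spreadsheet's figures.

```python
# Hypothetical sample of n = 5 observations.
data = [2, 4, 6, 8, 10]

n = len(data)
mean = sum(data) / n
deviations = [x - mean for x in data]        # deviation about the mean
squared = [d ** 2 for d in deviations]       # squared deviation, as in =A1^2

sum_of_deviations = sum(deviations)          # always 0, up to rounding
sample_variance = sum(squared) / (n - 1)     # divide by n - 1, not n

print(sum_of_deviations, sum(squared), sample_variance)
```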



Sample Variance and Sample Standard Deviation

To calculate the Variance use =VAR(range).
To calculate the Standard Deviation use =STDEV(range).

Using the Descriptive Statistics Tool

The Descriptive Statistics Tool is one of Excel's Data Analysis Tools (which are available from the Analysis ToolPak Add-In) and allows the user to compute a variety of statistics at once.

1. Make sure that the Analysis ToolPak has already been installed. (If not, go to Tools/Add-Ins and select Analysis ToolPak.)
2. Select Data Analysis from the Tools menu.
3. Choose Descriptive Statistics from the list of Analysis Tools. The Descriptive Statistics box will open.
4. In the Input Range: box, enter the range for your data (B3:B18).
5. Select Columns in the Grouped By: section.
6. Select Labels in first row.
7. Enter where you want the statistics to appear in the Output Range: box (D3).
8. Select Summary statistics.
9. Click OK.
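For comparison, most of the figures in a summary of this kind can be reproduced with Python's statistics module. This is a sketch under assumptions: the data values are hypothetical, and only a subset of the statistics is computed (kurtosis and skewness are left out).

```python
import math
import statistics

# Hypothetical data standing in for the input range.
data = [10, 12, 12, 15, 21]

summary = {
    "Mean": statistics.mean(data),
    "Standard Error": statistics.stdev(data) / math.sqrt(len(data)),
    "Median": statistics.median(data),
    "Mode": statistics.mode(data),
    "Standard Deviation": statistics.stdev(data),    # sample standard deviation
    "Sample Variance": statistics.variance(data),    # divides by n - 1
    "Range": max(data) - min(data),
    "Minimum": min(data),
    "Maximum": max(data),
    "Sum": sum(data),
    "Count": len(data),
}

for name, value in summary.items():
    print(f"{name:<20}{value:>10.4f}")
```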

Covariance and the Correlation Coefficient

Covariance is a measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship. The correlation coefficient is another measure of linear association between two variables that takes on values between -1 and +1. Values near +1 indicate a strong positive linear relationship, values near -1 indicate a strong negative linear relationship, and values near 0 indicate the lack of a linear relationship.

The covariance function =COVAR() treats the data as a population and the correlation function =CORREL() treats the data as a sample. The result obtained using the covariance function must be adjusted to provide the sample covariance. The formula for the population covariance requires dividing by the total number of observations in the data set, but the formula for the sample covariance requires dividing by the total number of observations minus 1. Therefore, to compute the sample covariance, multiply the population covariance by n/(n-1).
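The n/(n-1) adjustment can be verified with a small Python sketch. The function names and the paired data are hypothetical; population_covariance divides by n, which is the behaviour described above for COVAR.

```python
def population_covariance(x, y):
    """Covariance dividing by n (population form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def sample_covariance(x, y):
    """Population covariance rescaled by n / (n - 1)."""
    n = len(x)
    return population_covariance(x, y) * n / (n - 1)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(population_covariance(x, y), sample_covariance(x, y))
```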



PROBABILITY

Computing Posterior Probabilities

The spreadsheet below shows the prior probabilities for two mutually exclusive events A1 and A2.

Prior probability is the initial estimate of the probability of an event.
Conditional probability is the probability of an event given that another event has occurred.
Joint probability is the probability of two events both occurring (the intersection of two events); in the case above it is the prior probability multiplied by the conditional probability.
Posterior probability is the revised probability of an event based on additional information.
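These four definitions chain together into a short calculation: joint = prior × conditional, and each posterior is its joint probability divided by the sum of the joint probabilities. The Python sketch below uses hypothetical priors and conditionals for A1 and A2, since the spreadsheet values are not reproduced here.

```python
# Hypothetical inputs for two mutually exclusive events A1 and A2.
priors = [0.65, 0.35]          # P(A1), P(A2)
conditionals = [0.02, 0.05]    # P(B | A1), P(B | A2)

# Joint probability = prior * conditional.
joints = [p * c for p, c in zip(priors, conditionals)]

# P(B) is the sum of the joint probabilities.
p_b = sum(joints)

# Posterior probability = joint / P(B).
posteriors = [j / p_b for j in joints]

print(joints, p_b, posteriors)
```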



DISCRETE PROBABILITY DISTRIBUTIONS

Expected Value, Variance, and Standard Deviation

Expected value is a measure of the mean or central location of a random variable.

Variance is a measure of the variability or dispersion of a random variable.
Standard deviation is the positive square root of the variance.

The SUMPRODUCT function multiplies each value in one range by the corresponding value in another range and sums the products.
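A Python sketch of the same idea follows. The helper sumproduct and the probability distribution are hypothetical; the expected value is the sum of x·f(x) and the variance is the sum of (x − mean)²·f(x).

```python
import math

def sumproduct(a, b):
    """Multiply corresponding values and sum the products, like SUMPRODUCT."""
    return sum(p * q for p, q in zip(a, b))

# Hypothetical discrete distribution: values of x and their probabilities f(x).
x = [0, 1, 2, 3]
f = [0.1, 0.2, 0.3, 0.4]

expected_value = sumproduct(x, f)
squared_deviations = [(value - expected_value) ** 2 for value in x]
variance = sumproduct(squared_deviations, f)
standard_deviation = math.sqrt(variance)

print(expected_value, variance, standard_deviation)
```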

Binomial Probabilities

A binomial experiment has the following four properties:

1. The experiment consists of a sequence of n identical trials.
2. Two outcomes are possible on each trial: a success and a failure.
3. The probability of a success, denoted by p, does not change from trial to trial. Consequently the probability of a failure, denoted by 1-p, does not change from trial to trial.
4. The trials are independent.

Excel's BINOMDIST function can be used to compute binomial probabilities and cumulative binomial probabilities. The spreadsheet below shows how to calculate the probability of 0, 1, 2 and 3 successful outcomes given 3 trials if each trial has a 0.3 probability of success.
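The four probabilities for n = 3 and p = 0.3 can be reproduced from the binomial formula. This Python sketch is illustrative only; the function name binomdist and its argument order are chosen here to resemble the Excel call.

```python
from math import comb

def binomdist(x, n, p, cumulative):
    """Binomial probability of x successes in n trials; cumulative sums 0..x."""
    def pmf(k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

for successes in range(4):
    print(successes,
          round(binomdist(successes, 3, 0.3, False), 3),
          round(binomdist(successes, 3, 0.3, True), 3))
```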



If you prefer to use the Insert Function command, select the Statistical category and click on BINOMDIST. Enter the arguments for the first calculation as shown, using absolute referencing where necessary.

Using POISSON to Compute Poisson Probabilities

A Poisson probability distribution is a probability distribution showing the probability of x occurrences of an event over a specified interval of time or space. The POISSON function requires three arguments and has the following syntax:
=POISSON(x, mean, cumulative)

If you prefer to use the Insert Function command, select the Statistical category and click on POISSON. Enter the arguments for the first calculation as shown, using absolute referencing where necessary.

To calculate the cumulative probabilities, the third argument will be TRUE.
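The effect of the third argument can be seen in a short Python sketch of the Poisson formula, f(x) = mean^x · e^(−mean) / x!. The function name poisson and the mean of 2 are hypothetical choices for illustration.

```python
from math import exp, factorial

def poisson(x, mean, cumulative):
    """Poisson probability of x occurrences; cumulative sums 0..x."""
    def pmf(k):
        return mean ** k * exp(-mean) / factorial(k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

# Hypothetical mean of 2 occurrences per interval.
for occurrences in range(5):
    print(occurrences,
          round(poisson(occurrences, 2, False), 4),
          round(poisson(occurrences, 2, True), 4))
```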



Using HYPGEOMDIST to Compute Hypergeometric Probabilities

The HYPGEOMDIST function is used to compute the probability of x successes in n trials when the trials are dependent. HYPGEOMDIST will only compute probabilities, not cumulative probabilities, and has four arguments: x, n, r, and N. Its syntax is:
=HYPGEOMDIST(sample_s, number_sample, population_s, number_pop)

If a group of 5 children contains 3 girls and 2 boys, the probability that 2 children selected from the group are both girls will be 0.3, that is =HYPGEOMDIST(2, 2, 3, 5).
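The 0.3 result can be checked by counting combinations: there are C(3,2) ways to choose 2 girls and C(5,2) ways to choose any 2 children. A minimal Python sketch follows; the function name hypgeomdist is chosen here for illustration and mirrors the argument order above.

```python
from math import comb

def hypgeomdist(sample_s, number_sample, population_s, number_pop):
    """Probability of sample_s successes in a sample drawn without replacement."""
    ways_successes = comb(population_s, sample_s)
    ways_failures = comb(number_pop - population_s, number_sample - sample_s)
    return ways_successes * ways_failures / comb(number_pop, number_sample)

# 2 girls in a sample of 2, drawn from 5 children of whom 3 are girls.
print(hypgeomdist(2, 2, 3, 5))
```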



CONTINUOUS PROBABILITY DISTRIBUTIONS

Normal Probabilities

In a normal probability distribution the probability density function is bell shaped and determined by its mean μ and standard deviation σ. A standard normal probability distribution is a normal distribution with a mean of zero and a standard deviation of one.

Excel has two functions for computing probabilities and z values for a standard normal probability distribution: NORMSDIST and NORMSINV. The NORMSDIST function is used to compute the cumulative probability given a z value and its syntax is =NORMSDIST(z) where z is the value for which you want the distribution. The NORMSINV function is used to compute the z value given a cumulative probability and has the syntax =NORMSINV(probability) where probability is a value between 0 and 1. The letter S reminds us that the functions relate to the standard normal probability distribution.

The NORMSDIST function provides the area under the standard normal curve to the left of a given z value. For nonnegative z values, the NORMSDIST function provides the same cumulative probability we would obtain if we used a cumulative normal probabilities table. However, unlike a table, the NORMSDIST function provides cumulative probabilities for negative z values as well.

To calculate the probability of z being in an interval you must calculate the value of NORMSDIST at the upper endpoint and subtract the value of NORMSDIST at the lower endpoint of the interval.

To calculate the area under the standard normal curve to the right of a given z value you must subtract the cumulative probability from 1.
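The three cases (left tail, interval and right tail) can be sketched in Python with statistics.NormalDist, whose cdf method returns the area to the left of a value. The wrapper name normsdist and the z values are chosen here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()    # mean 0, standard deviation 1

def normsdist(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)

left_tail = normsdist(1.0)                      # P(z <= 1)
interval = normsdist(1.0) - normsdist(-1.0)     # P(-1 <= z <= 1): upper minus lower
right_tail = 1 - normsdist(1.0)                 # P(z >= 1): 1 minus cumulative

print(round(left_tail, 4), round(interval, 4), round(right_tail, 4))
```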



The NORMSINV function is the inverse of the NORMSDIST function; it takes a cumulative probability (lower tail area) as input and provides the z value corresponding to that cumulative probability. To work out the z value for an upper tail probability, subtract the probability from 1 and use the result as the input.

Two similar functions, NORMDIST and NORMINV, are available for computing the cumulative probability and the x value for any normal distribution. The NORMDIST function provides the area under the normal curve to the left of a given value of the random variable x. Its syntax is =NORMDIST(x, mean, standard_dev, cumulative). If cumulative is TRUE it returns the cumulative distribution function; if FALSE it returns the probability density function (height of the curve).

The NORMINV function is the inverse of NORMDIST: it takes a cumulative probability as input and provides the value of x corresponding to that cumulative probability. Its syntax is =NORMINV(probability, mean, standard_dev).
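The inverse relationship can be illustrated in Python with statistics.NormalDist, whose inv_cdf method takes a cumulative probability. The mean of 100, standard deviation of 15 and the probabilities used are hypothetical.

```python
from statistics import NormalDist

# Standard normal: z value for a lower tail area, and for an upper tail area.
z_lower_tail = NormalDist().inv_cdf(0.975)        # cumulative probability 0.975
z_upper_tail = NormalDist().inv_cdf(1 - 0.025)    # 0.025 in the upper tail

# Any normal distribution (hypothetical mean 100, standard deviation 15).
dist = NormalDist(mu=100, sigma=15)
cumulative = dist.cdf(115)             # area to the left of x = 115
height = dist.pdf(115)                 # height of the density curve at x = 115
x_value = dist.inv_cdf(cumulative)     # recovers x from the cumulative probability

print(round(z_lower_tail, 3), round(cumulative, 4), round(height, 5), round(x_value, 3))
```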



Exponential Probability Distribution

The EXPONDIST function can be used to compute exponential probabilities. Its syntax is =EXPONDIST(x, lambda, cumulative) where x is the value of the random variable, lambda is 1/μ (the reciprocal of the mean) and cumulative is set to TRUE to return the cumulative probability.
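The cumulative exponential probability is P(X ≤ x) = 1 − e^(−lambda·x). The Python sketch below uses a hypothetical mean of 15, so lambda = 1/15; the function name expondist is chosen here for illustration and covers only the cumulative case.

```python
from math import exp

def expondist(x, lam):
    """Cumulative exponential probability P(X <= x) for rate lam = 1 / mean."""
    return 1 - exp(-lam * x)

mean = 15             # hypothetical mean of the distribution
lam = 1 / mean        # lambda is the reciprocal of the mean

print(round(expondist(6, lam), 4))                        # P(X <= 6)
print(round(expondist(18, lam) - expondist(6, lam), 4))   # P(6 <= X <= 18)
```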
