Excel Functions for Statistics

  • Upload
    imidken

  • 8/8/2019 Excel Functions for Statistics

    Basic Statistics with Microsoft Excel

    Excel Functions for Statistics

    DESCRIPTIVE STATISTICS: FREQUENCIES

    Using COUNTIF to Construct a Frequency Distribution

    The spreadsheet below shows how COUNTIF can be used to calculate how many times each country appears in the list in Column A.

    To calculate Frequency select D2 and enter the function:

    =COUNTIF(A$2:A$17, C2)

    A2:A17 is the range to be evaluated (an absolute row reference ($) is needed to make sure the function will fill down correctly). C2 is the country to be counted.

    To calculate Relative Frequency, divide Frequency (D2) by Total (D7).

    To calculate Percent Frequency, multiply Relative Frequency (E2) by 100.

    Note: Cells D2:F7 show the formulas used. Cells D11:F16 show the results.
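
    As a rough cross-check outside Excel, the same three columns can be sketched in Python. The country list below is hypothetical (the handout's Column A data is not reproduced here), and Counter stands in for one COUNTIF per distinct country.

```python
from collections import Counter

# Hypothetical stand-in for the country names in Column A.
countries = ["France", "Spain", "France", "Italy",
             "Spain", "France", "Italy", "France"]

# Counter plays the role of COUNTIF applied once per distinct country.
frequency = Counter(countries)
total = sum(frequency.values())

# Relative frequency = frequency / total; percent frequency = relative * 100.
relative = {country: count / total for country, count in frequency.items()}
percent = {country: 100 * share for country, share in relative.items()}

for country in sorted(frequency):
    print(f"{country:<8}{frequency[country]:>4}"
          f"{relative[country]:>8.3f}{percent[country]:>8.1f}")
```

    The relative frequencies should always sum to 1 and the percent frequencies to 100, which is a quick way to check the Total row.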

    Using FREQUENCY to Construct a Frequency Distribution

    The FREQUENCY function uses an array formula, which returns multiple values (in this case the class frequencies) as output.

    1. Select cells D2:D5, where the frequencies are to appear.

    2. Type the formula: =FREQUENCY(A2:A16,{9,14,19,24})

    3. Press CTRL + SHIFT + ENTER and the array formula will be entered into each of the cells D2:D5.

    Because we entered an array formula, the formula that Excel displays in each of the cells is the same, but the values are not: they are the frequencies for each class. The class upper limits in the second argument of the FREQUENCY function tell Excel which frequency to put in each cell within the range of the array formula.
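
    The binning rule can be sketched in Python as follows. This is an illustration under assumptions: the data values are hypothetical, and the function mimics FREQUENCY by placing each value in the first class whose upper limit is greater than or equal to it, with one extra slot for values above the highest limit.

```python
from bisect import bisect_left

def frequency(data, upper_limits):
    """Count values per class; a value goes to the first class
    whose upper limit is greater than or equal to the value."""
    # One extra slot at the end holds values above the highest limit.
    counts = [0] * (len(upper_limits) + 1)
    for value in data:
        counts[bisect_left(upper_limits, value)] += 1
    return counts

# Hypothetical data standing in for A2:A16 (15 observations).
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27, 22, 23, 22, 21, 33]
upper_limits = [9, 14, 19, 24]

class_counts = frequency(data, upper_limits)
print(class_counts)
```

    With four upper limits the sketch returns five counts; the fifth only matters when some values exceed the last upper limit.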

    Using PivotTable Report to Construct a Crosstabulation

    PivotTable Report provides a general tool for summarising the data for two or more variables simultaneously.

    1. Select the Data menu and choose PivotTable and PivotChart Report.
    2. Choose Microsoft Excel list or database.
    3. Choose PivotTable and select Next.
    4. Enter the data range in the Range box and select Next.
    5. Select New Worksheet (if required).
    6. Click on the Layout button.
    7. Drag the field buttons to the ROW, COLUMN and DATA sections of the diagram as appropriate.
    8. Double click the Sum of field button in the data section.
    9. Choose Count under Summarise by: and click OK.
    10. Click OK and then Finish.
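
    What the resulting crosstabulation contains can be sketched in Python: a count for every combination of the row variable and the column variable. The records and category names below are hypothetical and only illustrate the Count summary chosen in step 9.

```python
from collections import Counter

# Hypothetical records: (row variable, column variable) for each list entry.
records = [
    ("Good", "Low"), ("Good", "High"), ("Excellent", "High"),
    ("Good", "Low"), ("Excellent", "Low"), ("Excellent", "High"),
]

# Count how often each (row, column) combination occurs.
cell_counts = Counter(records)
row_labels = sorted({row for row, _ in records})
column_labels = sorted({column for _, column in records})

# Print the table with one line per row label.
print(f"{'':<12}" + "".join(f"{c:>6}" for c in column_labels))
for row in row_labels:
    cells = "".join(f"{cell_counts[(row, c)]:>6}" for c in column_labels)
    print(f"{row:<12}{cells}")
```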

    DESCRIPTIVE STATISTICS: NUMERICAL METHODS

    The following spreadsheet shows the functions used to calculate the mean, median, mode, percentiles and quartiles for a cell range named hours.

    Displaying the Mean, Median, and Mode

    =AVERAGE(array), =MEDIAN(array), =MODE(array)

    Percentiles and Quartiles

    =PERCENTILE(array, percentile) where percentile is between 0 and 1

    =QUARTILE(array, quart) where quart is 0, 1, 2, 3 or 4 (0 returns the minimum value and 4 the maximum)

    1st Quartile = 25th Percentile, 2nd Quartile = 50th Percentile, 3rd Quartile = 75th Percentile
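
    The relationship between percentiles and quartiles can be sketched in Python. The sketch assumes the linear interpolation rule that PERCENTILE is understood to use (position p*(n-1) in the sorted data); the values standing in for the hours range are hypothetical.

```python
import math

def percentile(values, p):
    """Percentile by linear interpolation at position p * (n - 1)."""
    ordered = sorted(values)
    rank = p * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

def quartile(values, quart):
    """Quartile number quart (0 to 4) is the (quart / 4) percentile."""
    return percentile(values, quart / 4)

# Hypothetical data standing in for the range named hours.
hours = [10, 20, 30, 40, 50]

print(percentile(hours, 0.25), quartile(hours, 1))  # same value
print(quartile(hours, 2))                           # the median
```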

    Deviation and Squared Deviation About the Mean

    The sum of the deviations about the mean will always equal 0.

    To calculate the square of a value enter =A1^2.

    The sample variance is the sum of the squared deviations (each deviation being the difference between the value of an observation and the mean) divided by n-1. In the example above this will be 41320/4 = 10330.
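
    The same calculation can be sketched step by step in Python. The five observations are hypothetical (they are not the handout's data), and the result is compared with statistics.variance and statistics.stdev, which also divide by n-1.

```python
import statistics

# Hypothetical sample of five observations.
observations = [2, 4, 6, 8, 10]
n = len(observations)
mean = sum(observations) / n

# Deviation about the mean and its square, one per observation.
deviations = [x - mean for x in observations]
squared_deviations = [d ** 2 for d in deviations]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)
sample_std_dev = sample_variance ** 0.5

print(sum(deviations))                      # always 0 (up to rounding)
print(sample_variance, statistics.variance(observations))
print(sample_std_dev, statistics.stdev(observations))
```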

    Sample Variance and Sample Standard Deviation

    To calculate the Variance use =VAR(range).

    To calculate the Standard Deviation use =STDEV(range).

    Using the Descriptive Statistics Tool

    The Descriptive Statistics Tool is one of Excel's Data Analysis Tools (which are available from the Analysis ToolPak Add-In) and allows the user to compute a variety of statistics at once.

    1. Make sure that the Analysis ToolPak has already been installed. (If not, go to Tools/Add-Ins and select Analysis ToolPak.)
    2. Select Data Analysis from the Tools menu.
    3. Choose Descriptive Statistics from the list of Analysis Tools. The Descriptive Statistics box will open.

    4. In the Input Range: box, enter the range for your data (B3:B18).
    5. Select Columns in the Grouped By: section.
    6. Select Labels in first row.
    7. Enter where you want the statistics to appear in the Output Range: box (D3).
    8. Select Summary statistics.
    9. Click OK.

    Covariance and the Correlation Coefficient

    Covariance is a measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship. The correlation coefficient is another measure of linear association between two variables that takes on values between -1 and +1. Values near +1 indicate a strong positive linear relationship, values near -1 indicate a strong negative linear relationship, and values near 0 indicate the lack of a linear relationship.

    The covariance function =COVAR() treats the data as a population and the correlation function =CORREL() treats the data as a sample. The result obtained using the covariance function must be adjusted to provide the sample covariance. The formula for the population covariance requires dividing by the total number of observations in the data set, but the formula for the sample covariance requires dividing by the total number of observations minus 1. Therefore, to compute the sample covariance, multiply the population covariance by n/(n-1).
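
    The n/(n-1) adjustment can be checked with a short Python sketch. The paired observations are hypothetical; the population covariance divides by n as COVAR is described as doing, and the correlation coefficient is computed directly from the deviations.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of the cross products of deviations about the two means.
cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

population_cov = cross_products / n           # divides by n
sample_cov = population_cov * n / (n - 1)     # adjusted to divide by n - 1

# Correlation coefficient: cross products scaled by both spreads.
sum_sq_x = sum((a - mean_x) ** 2 for a in x)
sum_sq_y = sum((b - mean_y) ** 2 for b in y)
correlation = cross_products / math.sqrt(sum_sq_x * sum_sq_y)

print(population_cov, sample_cov, correlation)
```

    The correlation coefficient needs no such adjustment, because the divisor cancels between the numerator and the denominator.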

    PROBABILITY

    Computing Posterior Probabilities

    The spreadsheet below shows the prior probabilities for two mutually exclusive events A1 and A2.

    Prior probability is the initial estimate of the probability of an event.

    Conditional probability is the probability of an event given that another event has occurred.

    Joint probability is the probability of two events both occurring (the intersection of the two events); in the case above it is the prior probability multiplied by the conditional probability.

    Posterior probability is the revised probability of an event based on additional information.
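
    These four definitions chain together as in the following Python sketch. The prior and conditional probabilities are hypothetical values chosen for illustration, and B is an assumed name for the event that supplies the additional information.

```python
# Hypothetical prior probabilities of two mutually exclusive events.
priors = {"A1": 0.65, "A2": 0.35}

# Hypothetical conditional probabilities P(B | Ai) of the new information B.
conditionals = {"A1": 0.02, "A2": 0.05}

# Joint probability P(Ai and B) = prior * conditional.
joints = {event: priors[event] * conditionals[event] for event in priors}

# P(B) is the sum of the joint probabilities.
probability_b = sum(joints.values())

# Posterior probability P(Ai | B) = joint / P(B).
posteriors = {event: joints[event] / probability_b for event in joints}

for event in priors:
    print(event, priors[event], conditionals[event],
          round(joints[event], 4), round(posteriors[event], 4))
```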

    DISCRETE PROBABILITY DISTRIBUTIONS

    Expected Value, Variance, and Standard Deviation

    Expected value is a measure of the mean or central location of a random variable.

    Variance is a measure of the variability or dispersion of a random variable.

    Standard deviation is the positive square root of the variance.

    The SUMPRODUCT function multiplies each value in one range by the corresponding value in another range and sums the products.
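
    A minimal Python sketch shows how a SUMPRODUCT-style helper gives all three measures. The probability distribution is hypothetical, and sumproduct is an illustrative helper written for this example, not a library function.

```python
import math

def sumproduct(first, second):
    """Multiply corresponding values of two ranges and sum the products."""
    return sum(a * b for a, b in zip(first, second))

# Hypothetical discrete distribution: values of x and their probabilities f(x).
x_values = [0, 1, 2, 3]
probabilities = [0.1, 0.3, 0.4, 0.2]

# Expected value: sum of x * f(x).
expected_value = sumproduct(x_values, probabilities)

# Variance: sum of (x - expected value)^2 * f(x).
squared_deviations = [(x - expected_value) ** 2 for x in x_values]
variance = sumproduct(squared_deviations, probabilities)

standard_deviation = math.sqrt(variance)
print(expected_value, variance, standard_deviation)
```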

    Binomial Probabilities

    A binomial experiment has the following four properties:

    1. The experiment consists of a sequence of n identical trials.
    2. Two outcomes are possible on each trial: a success and a failure.
    3. The probability of a success, denoted by p, does not change from trial to trial. Consequently the probability of a failure, denoted by 1-p, does not change from trial to trial.
    4. The trials are independent.

    Excel's BINOMDIST function can be used to compute binomial probabilities and cumulative binomial probabilities. The spreadsheet below shows how to calculate the probability of 0, 1, 2 and 3 successful outcomes given 3 trials if each trial has a 0.3 probability of success.

    If you prefer to use the Insert Function command, select the Statistical category and click on BINOMDIST. Enter the arguments for the first calculation as shown, using absolute referencing where necessary.
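
    The example of 3 trials with a 0.3 probability of success can be reproduced with a short Python sketch. The binomdist function below is an illustrative helper whose argument order follows BINOMDIST(number_s, trials, probability_s, cumulative); it is not Excel's implementation.

```python
from math import comb

def binomdist(number_s, trials, probability_s, cumulative):
    """Binomial probability of number_s successes, or of at most
    number_s successes when cumulative is True."""
    def pmf(k):
        return (comb(trials, k)
                * probability_s ** k
                * (1 - probability_s) ** (trials - k))
    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)

# Probabilities of 0, 1, 2 and 3 successes in 3 trials with p = 0.3.
for successes in range(4):
    print(successes,
          round(binomdist(successes, 3, 0.3, False), 3),
          round(binomdist(successes, 3, 0.3, True), 3))
```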

    Using POISSON to Compute Poisson Probabilities

    A Poisson probability distribution is a probability distribution showing the probability of x occurrences of an event over a specified interval of time or space. The POISSON function requires three arguments and has the following syntax: =POISSON(x, mean, cumulative).

    If you prefer to use the Insert Function command, select the Statistical category and click on POISSON. Enter the arguments for the first calculation as shown, using absolute referencing where necessary.

    To calculate the cumulative probabilities, the third argument will be TRUE.
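
    The effect of the third argument can be sketched in Python. The poisson function is an illustrative helper that follows the argument order of POISSON(x, mean, cumulative), and the mean of 3 occurrences per interval is a hypothetical value.

```python
from math import exp, factorial

def poisson(x, mean, cumulative):
    """Poisson probability of exactly x occurrences, or of at most
    x occurrences when cumulative is True."""
    def pmf(k):
        return exp(-mean) * mean ** k / factorial(k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

# Hypothetical mean of 3 occurrences per interval.
mean = 3
for occurrences in range(5):
    print(occurrences,
          round(poisson(occurrences, mean, False), 4),
          round(poisson(occurrences, mean, True), 4))
```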

    Using HYPGEOMDIST to Compute Hypergeometric Probabilities

    The HYPGEOMDIST function is used to compute the probability of x successes in n trials when the trials are dependent. HYPGEOMDIST will only compute probabilities, not cumulative probabilities, and has four arguments: x, n, r, and N. Its syntax is: =HYPGEOMDIST(sample_s, number_sample, population_s, number_pop)

    If a group of 5 children contains 3 girls and 2 boys, the probability that 2 children selected from the group are both girls will be 0.3: =HYPGEOMDIST(2, 2, 3, 5).
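
    The 0.3 result can be verified by counting combinations, as in this Python sketch. The hypgeomdist function is an illustrative helper using the same argument names as the syntax above.

```python
from math import comb

def hypgeomdist(sample_s, number_sample, population_s, number_pop):
    """Probability of sample_s successes in a sample of number_sample drawn
    without replacement from number_pop items, population_s of which
    are successes."""
    ways_successes = comb(population_s, sample_s)
    ways_failures = comb(number_pop - population_s, number_sample - sample_s)
    return ways_successes * ways_failures / comb(number_pop, number_sample)

# 2 girls in a sample of 2, drawn from 5 children of whom 3 are girls.
both_girls = hypgeomdist(2, 2, 3, 5)
print(both_girls)
```

    Here there are 3 ways to choose 2 of the 3 girls and 10 ways to choose any 2 of the 5 children, giving 3/10.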

    CONTINUOUS PROBABILITY DISTRIBUTIONS

    Normal Probabilities

    In a normal probability distribution the probability density function is bell shaped and determined by its mean μ and standard deviation σ. A standard normal probability distribution is a normal distribution with a mean of zero and a standard deviation of one.

    Excel has two functions for computing probabilities and z values for a standard normal probability distribution: NORMSDIST and NORMSINV. The NORMSDIST function is used to compute the cumulative probability given a z value and its syntax is =NORMSDIST(z) where z is the value for which you want the distribution. The NORMSINV function is used to compute the z value given a cumulative probability and has the syntax =NORMSINV(probability) where probability is a value between 0 and 1. The letter S reminds us that the functions relate to the standard normal probability distribution.

    The NORMSDIST function provides the area under the standard normal curve to the left of a given z value. For nonnegative z values, the NORMSDIST function provides the same cumulative probability we would obtain if we used a cumulative normal probabilities table. However, unlike a table, the NORMSDIST function provides cumulative probabilities for negative z values as well.

    To calculate the probability of z being in an interval you must calculate the value of NORMSDIST at the upper endpoint and subtract the value of NORMSDIST at the lower endpoint of the interval.

    To calculate the area under the standard normal curve to the right of a given z value you must subtract the cumulative probability from 1.
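
    The three calculations (left tail, interval and right tail) can be sketched with Python's statistics.NormalDist, whose cdf method returns the area to the left of a value. The normsdist wrapper and the z values of -1 and 1 are illustrative choices, not part of the handout.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist()

def normsdist(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)

left_of_one = normsdist(1.0)                          # area to the left of z = 1
between_minus_one_and_one = normsdist(1.0) - normsdist(-1.0)
right_of_one = 1 - normsdist(1.0)                     # upper tail

print(round(left_of_one, 4))
print(round(between_minus_one_and_one, 4))
print(round(right_of_one, 4))
```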

    The NORMSINV function is the inverse of the NORMSDIST function; it takes a cumulative probability (lower tail area) as input and provides the z value corresponding to that cumulative probability. To work out the z value for an upper tail probability, subtract the probability from 1 and use the result as the input to NORMSINV.

    Two similar functions, NORMDIST and NORMINV, are available for computing the cumulative probability and the x value for any normal distribution. The NORMDIST function provides the area under the normal curve to the left of a given value of the random variable x. Its syntax is =NORMDIST(x, mean, standard_dev, cumulative). If cumulative is TRUE it returns the cumulative distribution function; if FALSE it returns the probability density function (height of the curve).

    The NORMINV function is the inverse of NORMDIST: it takes a cumulative probability as input and provides the value of x corresponding to that cumulative probability. Its syntax is =NORMINV(probability, mean, standard_dev).
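
    The inverse relationship can be sketched with Python's statistics.NormalDist, using inv_cdf in the role of NORMSINV and NORMINV, cdf for the cumulative form of NORMDIST, and pdf for the height of the curve. The mean of 100, standard deviation of 15 and the probabilities used are hypothetical.

```python
from statistics import NormalDist

# z value for a lower tail area of 0.975.
z_lower_tail = NormalDist().inv_cdf(0.975)

# z value for an upper tail area of 0.05: subtract from 1 first.
z_upper_tail = NormalDist().inv_cdf(1 - 0.05)

# Hypothetical normal distribution with mean 100 and standard deviation 15.
scores = NormalDist(mu=100, sigma=15)
cumulative_probability = scores.cdf(115)   # area to the left of x = 115
curve_height = scores.pdf(100)             # density at the mean

# Inverting the cumulative probability recovers the original x value.
x_value = scores.inv_cdf(cumulative_probability)

print(round(z_lower_tail, 3), round(z_upper_tail, 3))
print(round(cumulative_probability, 4), round(curve_height, 5))
print(round(x_value, 6))
```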

    Exponential Probability Distribution

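    The EXPONDIST function can be used to compute exponential probabilities. Its syntax is =EXPONDIST(x, lambda, cumulative) where x is the value of the random variable, lambda is 1/μ (the reciprocal of the mean) and cumulative is set to TRUE to obtain the cumulative probability of a value less than or equal to x.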
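
    The cumulative exponential probability has a simple closed form, 1 - e^(-lambda*x), which the following Python sketch evaluates. The expondist function is an illustrative helper following the argument order above (with rate in place of lambda, a reserved word in Python), and the mean of 15 and x value of 6 are hypothetical.

```python
from math import exp

def expondist(x, rate, cumulative):
    """Exponential distribution with rate = 1 / mean: cumulative
    probability of a value <= x, or the density at x."""
    if cumulative:
        return 1 - exp(-rate * x)
    return rate * exp(-rate * x)

# Hypothetical mean of 15, so lambda = 1/15.
mean = 15
probability_at_most_6 = expondist(6, 1 / mean, True)
probability_between_6_and_18 = (expondist(18, 1 / mean, True)
                                - expondist(6, 1 / mean, True))

print(round(probability_at_most_6, 4))
print(round(probability_between_6_and_18, 4))
```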