Very Basic Statistics

Page 1: Very Basic Statistics

Very Basic Statistics

Page 2: Very Basic Statistics

Course Content

• Data Types

• Descriptive Statistics

• Data Displays

Page 3: Very Basic Statistics

Data Types

Page 4: Very Basic Statistics

Variables

• Quantitative Variable

– A variable that is counted or measured on a numerical scale

– Can be continuous or discrete (a discrete variable is a count, so always a whole number)

• Qualitative Variable

– A non-numerical variable that can be classified into categories, but can’t be measured on a numerical scale

– Can be nominal or ordinal

Page 5: Very Basic Statistics

Continuous Data

• Continuous data is measured on a scale.

• The data can take almost any numeric value and can be recorded at many different points on the scale.

• For example:

– Temperature (39.25 °C)

– Time (2.468 seconds)

– Height (1.25 m)

– Weight (66.34 kg)

Page 6: Very Basic Statistics

Discrete Data

• Discrete data is based on counts, for example:

– The number of cars parked in a car park

– The number of patients seen by a dentist each day

• Only separate, whole-number values are possible, e.g. a dentist could see 10, 11 or 12 people, but not 12.3 people.

Page 7: Very Basic Statistics

Nominal Data

• A Nominal scale is the most basic level of measurement. The variable is divided into categories and objects are ‘measured’ by assigning them to a category.

• For example,

• Colours of objects (red, yellow, blue, green)

• Types of transport (plane, car, boat)

• There is no order of magnitude to the categories i.e. blue is no more or less of a colour than red.

Page 8: Very Basic Statistics

Ordinal Data

• Ordinal data is categorical data where the categories can be placed in a logical ascending order, e.g.

• 1 – 5 scoring scale, where 1 = poor and 5 = excellent

• Strength of a curry (mild, medium, hot)

• There is some measure of magnitude: a score of ‘5 – excellent’ is better than a score of ‘4 – good’.

• But this says nothing about the degree of difference between the categories, i.e. we cannot assume that a customer who thinks a service is excellent is twice as happy as one who thinks the same service is good.
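As a rough illustration of the ordinal idea, the sketch below models a 1 – 5 scoring scale in Python. The class name `Rating` and the labels for scores 2 and 3 are assumptions made for this example; the slides only name poor, good and excellent.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Hypothetical 1-5 scoring scale; only the order of the values is meaningful."""
    POOR = 1
    FAIR = 2        # assumed label, not given in the slides
    AVERAGE = 3     # assumed label, not given in the slides
    GOOD = 4
    EXCELLENT = 5


responses = [Rating.GOOD, Rating.POOR, Rating.EXCELLENT, Rating.GOOD]

# Ordering comparisons are valid for ordinal data.
ranked = sorted(responses)
best = max(responses)

# Arithmetic such as Rating.EXCELLENT / Rating.GOOD runs in Python,
# but the result has no meaning on an ordinal scale.
print([r.name for r in ranked], best.name)
```

Sorting and taking the maximum rely only on the order of the categories, which is exactly the information an ordinal scale carries.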

Page 9: Very Basic Statistics

Task 1

• Look at the following variables and decide whether each is qualitative or quantitative and, within that, nominal or ordinal, or discrete or continuous

– Age

– Year of birth

– Sex

– Height

– Number of staff in a department

– Time taken to get to work

– Preferred strength of coffee

– Company size

Page 10: Very Basic Statistics

Descriptive Statistics

Page 11: Very Basic Statistics

Session Content

• Measures of Location

• Measures of Dispersion

Page 12: Very Basic Statistics

Measures of Location

Page 13: Very Basic Statistics

Common Measures

• Measures of location summarise the data with a single number

• There are three common measures of location

• Mean

• Mode

• Median

• Quartiles are another measure

Page 14: Very Basic Statistics

Mean

• The mean (more precisely, the arithmetic mean) is commonly called the average.

• In formulas the mean is usually represented by $\bar{x}$, read as ‘x-bar’.

• The formula for calculating the mean from ‘n’ individual data-points is:

$$\bar{x} = \frac{\sum x}{n}$$

x-bar equals the sum of the data divided by the number of data-points.

Page 15: Very Basic Statistics

Pros & Cons

• Advantages

– basic calculation is easily understood

– all data values are used in the calculation

– used in many statistical procedures.

• Disadvantages

– It may not be an actual ‘meaningful’ value, e.g. an average of 2.4 children per family.

– Can be greatly affected by extreme values in a dataset. e.g. seven students take a test and receive the following scores.

40 42 45 50 53 54 99

– The average score is 54.7 – but is this really representative of the group?

– If the extreme value of 99 is dropped, the average falls to 47.3
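The effect of the extreme value can be checked with a few lines of Python. This is a minimal sketch; the function name `mean` and the variable names are chosen for this example only.

```python
def mean(values):
    """Arithmetic mean: the sum of the data divided by the number of data-points."""
    return sum(values) / len(values)


scores = [40, 42, 45, 50, 53, 54, 99]

mean_all = mean(scores)                                  # uses every score
mean_without_99 = mean([s for s in scores if s != 99])   # extreme value dropped

print(round(mean_all, 1), round(mean_without_99, 1))
```

Rounded to one decimal place, the two results are 54.7 and 47.3, the figures quoted on the slide.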

Page 16: Very Basic Statistics

Mode

• The mode represents the most commonly occurring value within a dataset.

• We usually find the mode by creating a frequency distribution in which we tally how often each value occurs.

• If we find that every value occurs only once, the distribution has no mode.

• If we find that two or more values are tied as the most common, the distribution has more than one mode.
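The tallying approach described above can be sketched in Python with `collections.Counter`. The function name `modes` is an assumption for this example, and it expects a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return a list of modes: empty if every value occurs once, several if tied."""
    counts = Counter(values)          # frequency distribution: value -> tally
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs only once: no mode
    return [value for value, count in counts.items() if count == highest]


print(modes(["silver", "red", "silver", "blue"]))   # one mode
print(modes([1, 2, 3]))                             # no mode
print(modes([1, 1, 2, 2, 3]))                       # two modes
```

Because the tally works on any hashable value, the same function handles qualitative data such as colours.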

Page 17: Very Basic Statistics

Pros & Cons

• Advantages

– easy to understand

– not affected by outliers (extreme values)

– can also be obtained for qualitative data

e.g. when looking at the frequency of colours of cars we may find that silver occurs most often

• Disadvantages

– not all sets of data have a modal value

– some sets of data have more than one modal value

– multiple modal values are often difficult to interpret

Page 18: Very Basic Statistics

Task 2

• The following values are the ages of students in their first year of a course

18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18

• Find the mean age of the students

• Find the modal value

• In your opinion which is the better measure of location for this data set?

Page 19: Very Basic Statistics

Median

• Median means middle, and the median is the middle of a set of data that has been put into rank order.

• Specifically, it is the value that divides a set of data into two halves, with one half of the observations being larger than the median value, and one half smaller.

18 24 29 30 32

Half the data < 29 (18, 24); half the data > 29 (30, 32)

Page 20: Very Basic Statistics

Finding the Median from Individual Data

• Step 1: Arrange the observations in increasing order, i.e. rank order. The median will be the number that corresponds to the middle rank.

• Step 2: Find the middle rank with the following formula: Middle rank = ½*(n+1)

• Step 3: Identify the value of the median

• If ‘n’ is an odd number, the middle rank will fall on an observation. The median is then the value of that observation.

Page 21: Very Basic Statistics

Finding the Median from Individual Data

• If ‘n’ is an even number, the middle rank will fall between two observations. In this case the median is equal to the arithmetic mean of the values of the two observations.

40 42 45 50 53 54 70 99

Position of Median = ½*(n+1) = ½*(8+1) = 4.5

Median = (data-point 4 + data-point 5) / 2

Median = (50 + 53) / 2 = 51.5
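The three steps for finding the median can be sketched in Python as follows. The function name `median` is chosen for this example; it assumes a non-empty list of numbers.

```python
def median(values):
    """Median via the middle-rank rule: middle rank = (n + 1) / 2."""
    ranked = sorted(values)               # Step 1: rank order
    n = len(ranked)
    middle_rank = (n + 1) / 2             # Step 2: 1-based middle rank
    lower = int(middle_rank)              # rank at or just below the middle
    if middle_rank == lower:              # Step 3, n odd: falls on an observation
        return ranked[lower - 1]
    # n even: mean of the two observations either side of the middle rank
    return (ranked[lower - 1] + ranked[lower]) / 2


print(median([18, 24, 29, 30, 32]))
print(median([40, 42, 45, 50, 53, 54, 70, 99]))
```

For the five-value list the middle rank is 3, giving 29; for the eight-value list the middle rank is 4.5, giving the mean of 50 and 53, which is 51.5.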

Page 22: Very Basic Statistics

Pros & Cons

• Advantages

– the concept is easy to understand

– the median can be determined for any type of data (with the exception of nominal)

– the median is not unduly influenced by extreme values in the dataset

• Disadvantages

– data must be arranged in rank order (ascending or descending)

– cannot combine medians in statistical calculations as with mean values

Page 23: Very Basic Statistics

Task 3

• Using the student age data below, find the median age

18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18

Page 24: Very Basic Statistics

Quartiles

• Quartiles are particular percentiles: the lower quartile is the 25th percentile and the upper quartile is the 75th percentile

• Lower quartile – 25% of the data is below this

• Position of Q1 = ¼*(n+1)

• Upper quartile – 75% of the data is below this

• Position of Q3 = ¾*(n+1)

• If a quartile falls on an observation, the value of the quartile is the value of that observation.

• For example, if the position of a quartile is 20, its value is the value of the 20th observation.

Page 25: Very Basic Statistics

Quartiles

• If a quartile lies between observations, the value of the quartile is the value of the lower observation plus the specified fraction of the difference between the two observations.

40 42 45 50 53 54 70 99

Position of Upper Quartile = ¾*(n+1) = 6.75

Upper quartile = data-point 6 + 0.75*(data-point 7 – data-point 6)

Upper quartile = 54 + 0.75*(70 – 54) = 66
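The position-and-interpolation rule above can be sketched in Python. The function name `quantile_by_position` is an assumption for this example, and other software may use slightly different quartile conventions.

```python
def quantile_by_position(values, fraction):
    """Value at position fraction * (n + 1), interpolating between observations."""
    ranked = sorted(values)
    n = len(ranked)
    position = fraction * (n + 1)         # 1-based position in the ranked data
    if position < 1 or position > n:
        raise ValueError("position falls outside the data; too few observations")
    lower_rank = int(position)            # whole part of the position
    remainder = position - lower_rank     # fractional part of the position
    lower_value = ranked[lower_rank - 1]
    if remainder == 0:
        return lower_value                # quartile falls on an observation
    upper_value = ranked[lower_rank]
    return lower_value + remainder * (upper_value - lower_value)


scores = [40, 42, 45, 50, 53, 54, 70, 99]
print(quantile_by_position(scores, 0.25), quantile_by_position(scores, 0.75))
```

For these eight scores the upper quartile sits at position 6.75 and evaluates to 66, matching the worked example; the lower quartile sits at position 2.25 and evaluates to 42.75.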

Page 26: Very Basic Statistics

Task 4

• Using the student age data below find the upper and lower quartiles

18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18

Page 27: Very Basic Statistics

Measures of Dispersion

Page 28: Very Basic Statistics

Common Measures

• The dispersion in a set of data is the variation among the set of data values.

• It measures whether they are all close together, or more scattered.

[Figure: report turnaround time (days)]

Page 29: Very Basic Statistics

Common Measures

• The four common measures of spread are

• the range

• the inter-quartile range

• the variance

• the standard deviation

Page 30: Very Basic Statistics

Range

• The range is the difference between the largest and the smallest values in the dataset i.e. the maximum difference between data-points in the list.

• It is sensitive to only the most extreme values in the list. The range of a list is 0 if and only if all the data-points in the list are equal.

[Figure: the range marked on a Days axis, from 4 to 16]
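A short Python sketch of the range, reusing the eight scores from the median example. The function name `data_range` is chosen for this example.

```python
def data_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)


scores = [40, 42, 45, 50, 53, 54, 70, 99]

print(data_range(scores))        # depends only on 40 and 99
print(data_range(scores[:-1]))   # dropping 99 changes the range sharply
print(data_range([7, 7, 7]))     # all data-points equal
```

Dropping the single value 99 cuts the range from 59 to 30, which shows how sensitive it is to the extremes; a list of equal values has a range of 0.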

Page 31: Very Basic Statistics

Pros & Cons

• Advantages

– best for symmetric data with no outliers

– easy to compute and understand

– good option for ordinal data

• Disadvantages

– doesn’t use all of the data, only the extremes

– very much affected if the extremes are outliers

– only shows maximum spread, does not show shape

Page 32: Very Basic Statistics

Task 5

• Using the student age data find the range of the data.

18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18

Page 33: Very Basic Statistics

Inter-quartile Range

• Inter-quartile range = upper quartile – lower quartile

• Essentially describes how much the middle 50% of your dataset varies

• Example: if all patients in a dentist’s surgery took more or less the same time to be treated, with only one or two exceptionally quick or long appointments, you would expect the inter-quartile range to be very small

• But if all appointments were either very quick or very long, with few in between, then the inter-quartile range would be larger.
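The sketch below computes the inter-quartile range in Python using the ¼*(n+1) and ¾*(n+1) positions from the Quartiles slides. The function names are assumptions for this example, and the second dataset, with 99 replaced by 999, is hypothetical and included only to show the effect of an extreme value.

```python
def value_at_position(ranked, position):
    """Value at a 1-based position, interpolating between neighbouring observations."""
    lower_rank = int(position)
    remainder = position - lower_rank
    lower_value = ranked[lower_rank - 1]
    if remainder == 0:
        return lower_value
    return lower_value + remainder * (ranked[lower_rank] - lower_value)


def interquartile_range(values):
    """Upper quartile minus lower quartile, using positions (n+1)/4 and 3(n+1)/4."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 3:
        raise ValueError("need at least three observations for these positions")
    q1 = value_at_position(ranked, 0.25 * (n + 1))
    q3 = value_at_position(ranked, 0.75 * (n + 1))
    return q3 - q1


scores = [40, 42, 45, 50, 53, 54, 70, 99]
extreme = [40, 42, 45, 50, 53, 54, 70, 999]   # hypothetical: 99 replaced by 999

print(interquartile_range(scores), interquartile_range(extreme))
print(max(scores) - min(scores), max(extreme) - min(extreme))
```

Both datasets give an inter-quartile range of 23.25, while the range jumps from 59 to 959, which illustrates why the inter-quartile range is described as more stable than the range.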

Page 34: Very Basic Statistics

Pros & Cons

• Advantages

– Good for ordinal data

– Ignores extreme values

– More stable than the range because it ignores outliers

• Disadvantages

– Harder to calculate and understand

– Doesn’t use all the information (ignores half of the data-points, not just the outliers)

• Tails almost always matter in data and these aren’t included

• Outliers can also sometimes matter and again these aren’t included.

Page 35: Very Basic Statistics

Task 6

• Using the student age data find the inter-quartile range.

18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18

Page 36: Very Basic Statistics

Variance and Standard Deviation

(σ², s²) = (population notation, sample notation)

• The variance (s²) and standard deviation (s) are measures of the deviation or dispersion of observations (x) around the mean (μ for a population, x̄ for a sample) of a distribution

• Variance is an ‘average’ squared deviation from the mean

Page 37: Very Basic Statistics

Variance and Standard Deviation

• The standard deviation (SD) is the square root of the variance.

– small SD = values cluster closely around the mean

– large SD = values are scattered

[Figure: distributions on a Days axis with the mean (10) and 1 SD either side of the mean marked]

Page 38: Very Basic Statistics

Variance and Standard Deviation

• The following formulae define these measures

Variance

Population: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Sample: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard Deviation

Population: $\sigma = \sqrt{\sigma^2}$

Sample: $s = \sqrt{s^2}$
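As a sketch of the standard definitions, the Python below computes the variance as the sum of squared deviations from the mean, divided by N for a population or by n – 1 for a sample, and the standard deviation as its square root. The function names and the `sample` flag are assumptions for this example; the data are the seven test scores used on the Mean slides.

```python
import math


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return squared_deviations / divisor


def standard_deviation(values, sample=True):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values, sample))


scores = [40, 42, 45, 50, 53, 54, 99]

print(variance(scores, sample=False), standard_deviation(scores, sample=False))
print(variance(scores, sample=True), standard_deviation(scores, sample=True))
```

The sample versions are always a little larger than the population versions because the same sum of squared deviations is divided by n – 1 rather than N.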

Page 39: Very Basic Statistics

Variance

• Advantages:

– uses all of the data values

• Disadvantages:

– the variance is measured in the original units squared

– extreme values or outliers affect the variance considerably

– hard to calculate manually

Page 40: Very Basic Statistics

Standard Deviation

• Advantages:

– same units of measurement as the values

– useful in theoretical work and statistical methods and inference

• Disadvantages:

– hard to calculate manually

Page 41: Very Basic Statistics

Task 7

• Using the student age data find the variance and the standard deviation

18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18

Page 42: Very Basic Statistics

Session Summary

• Measures of Location

– Mean

– Mode

– Median

– Quartiles

• Measures of Dispersion

– Range

– Interquartile Range

– Variance

– Standard Deviation

Page 43: Very Basic Statistics

Data Displays

Page 44: Very Basic Statistics

Session Content

– Histograms

– Run charts

– Box plots

– Bar charts

– Pareto charts

– Pie charts

– Scatter plots

– Contingency tables

Page 45: Very Basic Statistics

Histograms

[Figure: Histogram of dataset 1 (normal). x-axis: dataset 1 (normal), 45.0 to 90.0; y-axis: Frequency, 0 to 30]

Page 46: Very Basic Statistics

Run Charts

[Figure: Time Series Plot of Time Taken. x-axis: Day (Mon to Fri, repeated over four weeks); y-axis: Time Taken, 25.0 to 35.0]

Page 47: Very Basic Statistics

Boxplots

[Figure: Boxplot of dataset 1 (normal), dataset 2 (exponential) and dataset 3 (uniform). y-axis: Data, 0 to 400]

Page 48: Very Basic Statistics

Bar Charts

[Figure: Chart of Frequency. x-axis: Causes of Medication Errors (missed dose, wrong patient, wrong dose, wrong time, wrong medicine); y-axis: Frequency, 0 to 20]

Page 49: Very Basic Statistics

Pareto Charts

[Figure: Pareto Chart of Causes of Medication Errors. Left axis: Frequency, 0 to 40; right axis: Percent, 0 to 100]

Causes of Medication Errors   Frequency   Percent   Cum %
wrong dose                           18      45.0    45.0
wrong time                           15      37.5    82.5
wrong medicine                        4      10.0    92.5
wrong patient                         2       5.0    97.5
Other                                 1       2.5   100.0
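The Percent and Cum % rows of a Pareto chart can be derived from the frequencies alone, as this Python sketch shows. The pairing of each frequency with a category label is an assumption read off the chart's axis labels; the frequencies 18, 15, 4, 2 and 1 are those shown on the slide.

```python
# Frequencies from the slide; the label attached to each count is an assumption.
errors = {
    "wrong dose": 18,
    "wrong time": 15,
    "wrong medicine": 4,
    "wrong patient": 2,
    "Other": 1,
}

total = sum(errors.values())

# A Pareto chart orders categories from most to least frequent.
ordered = sorted(errors.items(), key=lambda item: item[1], reverse=True)

rows = []
running = 0
for cause, count in ordered:
    running += count
    percent = 100 * count / total          # share of all errors
    cumulative = 100 * running / total     # running share, plotted as the line
    rows.append((cause, count, percent, cumulative))

for row in rows:
    print(row)
```

The first two categories together account for 82.5% of the errors, which is the kind of concentration a Pareto chart is designed to reveal.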

Page 50: Very Basic Statistics

Pie Charts

[Figure: Pie Chart of Causes of Medication Errors. Categories: missed dose, wrong patient, wrong dose, wrong time, wrong medicine. Slice labels (count, percent): 18, 45.0%; 15, 37.5%; 4, 10.0%; 2, 5.0%; 1, 2.5%]

Page 51: Very Basic Statistics

Scatterplots

[Figure: Scatterplot of Weight Loss vs Time on Diet. x-axis: Time on Diet, 0 to 25; y-axis: Weight Loss, 0 to 80]

Page 52: Very Basic Statistics

Contingency Tables

                       Colour of eyes
Colour of hair   Brown   Green/grey   Blue   Total
Black               50           54     41     145
Brown               38           46     48     132
Fair                22           30     31      83
Ginger              10           10     20      40
Total              120          140    140     400 = N
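The row, column and grand totals of the contingency table can be recomputed from the cell counts, as in this Python sketch. The nested-dictionary layout and variable names are choices made for this example.

```python
# Cell counts from the table: hair colour -> eye colour -> number of people.
table = {
    "Black":  {"Brown": 50, "Green/grey": 54, "Blue": 41},
    "Brown":  {"Brown": 38, "Green/grey": 46, "Blue": 48},
    "Fair":   {"Brown": 22, "Green/grey": 30, "Blue": 31},
    "Ginger": {"Brown": 10, "Green/grey": 10, "Blue": 20},
}

# Row totals: one per hair colour.
row_totals = {hair: sum(eyes.values()) for hair, eyes in table.items()}

# Column totals: one per eye colour.
column_totals = {}
for eyes in table.values():
    for colour, count in eyes.items():
        column_totals[colour] = column_totals.get(colour, 0) + count

grand_total = sum(row_totals.values())     # N

print(row_totals, column_totals, grand_total)
```

The row totals and the column totals each sum to the same grand total N = 400, which is a quick consistency check on any contingency table.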

Page 53: Very Basic Statistics

Session Summary

– Histograms

– Run charts

– Box plots

– Bar charts

– Pareto charts

– Pie charts

– Scatter plots

– Contingency tables

Page 54: Very Basic Statistics

Course Summary

• Data Types

• Descriptive Statistics

• Data Displays