© aSup-2011
Sampling Survey
SAMPLING SURVEY TECHNIQUE
POPULATIONS and SAMPLES
THE POPULATION is the set of all the individuals of interest in a particular study
THE SAMPLE is a set of individuals selected from a population, usually intended to represent the population in a research study
The sample is selected from the population
The results from the sample are generalized to the population
DATA COLLECTION TECHNIQUES
Data collection: census (population) or sampling (sample)
Sampling: probability or non-probability
PARAMETER and STATISTIC
A parameter is a value, usually a numerical value, that describes a population. A parameter may be obtained from a single measurement, or it may be derived from a set of measurements from the population
A statistic is a value, usually a numerical value, that describes a sample. A statistic may be obtained from a single measurement, or it may be derived from a set of measurements from the sample
SAMPLING ERROR
Although samples are generally representative of their population, a sample is not expected to give a perfectly accurate picture of the whole population
There is usually some discrepancy between a sample statistic and the corresponding population parameter; this discrepancy is called sampling error
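A minimal Python sketch of the difference between a parameter, a statistic, and sampling error. The population values, the sample size of 10, and the seed are made up for illustration and do not come from the slides.

```python
import random
import statistics

# Hypothetical population of 1,000 scores (illustrative values only).
rng = random.Random(2011)
population = [rng.randint(20, 70) for _ in range(1000)]

# Parameter: a value that describes the whole population.
parameter = statistics.mean(population)

# Statistic: the same kind of value, computed from a sample of that population.
sample = rng.sample(population, 10)
statistic = statistics.mean(sample)

# Sampling error: the discrepancy between the statistic and the parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:+.2f}")
```

Drawing a different sample (another seed) gives a different statistic, while the parameter stays fixed.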
TWO KINDS OF NUMERICAL DATA
Numerical data generally fall into two major categories:
1. Counted frequencies: enumeration data
2. Measured metric or scale values: measurement or metric data
Statistical procedures deal with both kinds of data
DATUM and DATA
The measurement or observation obtained for each individual is called a datum or, more commonly, a score or raw score
The complete set of scores or measurements is called the data set or simply the data
After data are obtained, statistical methods are used to organize and interpret the data
VARIABLE
A variable is a characteristic or condition that changes or has different values for different individuals
A constant is a characteristic or condition that does not vary but is the same for every individual
Example: in a research study comparing vocabulary skills for 12-year-old boys, vocabulary skill is a variable, while age and sex are constants
QUALITATIVE and QUANTITATIVE Categories
Qualitative: the classes of objects are different in kind. There is no reason for saying that one is greater or less, higher or lower, better or worse than another.
Quantitative: the groups can be ordered according to quantity or amount. It may be that the cases vary continuously along a continuum which we recognize.
DISCRETE and CONTINUOUS Variables
A discrete variable consists of separate, indivisible categories. No values can exist between two neighboring categories.
A continuous variable is divisible into an infinite number of fractional parts
○ It should be very rare to obtain identical measurements for two different individuals
○ Each measurement category is actually an interval that must be defined by boundaries called real limits
CONTINUOUS Variables
Most interval-scale measurements are taken to the nearest unit (foot, inch, cm, mm), depending upon the fineness of the measuring instrument and the accuracy we demand for the purposes at hand.
And so it is with most psychological and educational measurement. A score of 48 means from 47.5 to 48.5
We assume that a score is never a point on the scale, but occupies an interval from a half unit below to a half unit above the given number.
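A small sketch of the half-unit rule in Python. The function name `real_limits` and its `unit` parameter are illustrative choices, not notation from the slides.

```python
def real_limits(score, unit=1.0):
    """Return the (lower, upper) real limits of a score measured to the nearest `unit`."""
    half = unit / 2
    return score - half, score + half

# A score of 48 measured to the nearest whole unit occupies 47.5 to 48.5.
print(real_limits(48))
```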
FREQUENCIES, PERCENTAGES, PROPORTIONS, and RATIOS
Frequency is defined as the number of objects or events in a category.
Percentage (P) is defined as the number of objects or events in a category per 100 cases, that is, the frequency divided by the total number of cases and multiplied by 100.
Proportion (p): whereas with percentages the base is 100, with proportions the base or total is 1.0
Ratio is a fraction. The ratio of a to b is the fraction a/b. A proportion is a special ratio, the ratio of a part to a total.
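As an illustration, the three quantities can be computed side by side. The category labels and counts below are hypothetical, and `summarize` is just a helper name chosen for this sketch.

```python
from collections import Counter

def summarize(cases):
    """Return frequency (f), proportion (p, base 1.0) and percentage (P, base 100) per category."""
    total = len(cases)
    return {
        category: {"f": f, "p": f / total, "P": 100 * f / total}
        for category, f in Counter(cases).items()
    }

# Hypothetical data: 20 cases in three categories.
cases = ["A"] * 5 + ["B"] * 12 + ["C"] * 3
for category, row in sorted(summarize(cases).items()):
    print(category, row)
```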
MEASUREMENTS and SCALES (Stevens, 1946)
Nominal
Ordinal
Interval
Ratio
FREQUENCY DISTRIBUTION, GRAPH, and PERCENTILE
A class of 40 students has just had its Perceptual Speed test scores returned. Aside from the primary question about your grade, you'd like to know more about how you stand in the class
How does your score compare with others in the class? What was the range of performance?
What more can you learn by studying the scores?
Score of PERCEPTUAL SPEED Test
29 47 45 40 48 48 49 45
40 49 48 37 48 46 55 67
36 53 25 58 33 33 43 42
32 51 48 54 47 40 38 44
46 50 28 44 52 49 56 48
Taken from Guilford p.55
OVERVIEW
When a researcher finishes the data collection phase of an experiment, the results usually consist of pages of numbers
The immediate problem for the researcher is to organize the scores into some comprehensible form so that any trend in the data can be seen easily and communicated to others
This is the job of descriptive statistics: to simplify the organization and presentation of data
One of the most common procedures for organizing a set of data is to place the scores in a FREQUENCY DISTRIBUTION
GROUPED SCORES
After we obtain a set of measurements (data), a common next step is to put them in a systematic order by grouping them in classes
With numerical data, combining individual scores often makes it easier to display the data and to grasp their meaning. This is especially true when there is a wide range of values.
TWO GENERAL CUSTOMS IN THE SIZE OF CLASS INTERVAL
1. We should prefer not fewer than 10 and not more than 20 class intervals.
○ More commonly, the number of class intervals used is 10 to 15.
○ An advantage of a small number of class intervals is that we have fewer frequencies with which to deal
○ An advantage of a larger number of class intervals is higher accuracy of computation
TWO GENERAL CUSTOMS IN THE SIZE OF CLASS INTERVAL
2. The second custom in choosing the class interval is that certain sizes of interval (in score units) are preferred. Those sizes are 2, 3, 5, 10, and 20. These five interval sizes will take care of almost all sets of data
HOW TO CONSTRUCT A GROUPED FREQUENCY DISTRIBUTION
Step 1 : find the lowest score and the highest score
Step 2 : find the range by subtracting the lowest score from the highest
Step 3 : divide the range by 10 and by 20 to determine the largest and the smallest acceptable interval widths. Choose a convenient width (i) within these limits
Score of PERCEPTUAL SPEED Test
29 47 45 40 48 48 49 45
40 49 48 37 48 46 55 67
36 53 25 58 33 33 43 42
32 51 48 54 47 40 38 44
46 50 28 44 52 49 56 48
Range = 67 - 25 = 42; 42 / 10 = 4.2 and 42 / 20 = 2.1
WHERE TO START CLASS INTERVAL
It is natural to start the intervals with their lowest scores at multiples of the size of the interval.
When the interval is 3, start with 24, 27, 30, 33, etc.; when the interval is 4, start with 24, 28, 32, 36, etc.
HOW TO CONSTRUCT A GROUPED FREQUENCY DISTRIBUTION
Step 4 : determine the score at which the lowest interval should begin. It should ordinarily be a multiple of the interval.
Step 5 : record the limits of all class intervals, placing the interval containing the highest score value at the top. Make the intervals continuous and of the same width
Step 6 : using the tally system, enter the raw scores in the appropriate class intervals
Step 7 : convert each tally to a frequency
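The seven steps can be sketched in Python as follows. This is one possible reading of the procedure, applied to the Perceptual Speed scores; the function name `grouped_frequencies` is an illustrative choice, and the tally of Step 6 is replaced by a direct count.

```python
scores = [29, 47, 45, 40, 48, 48, 49, 45,
          40, 49, 48, 37, 48, 46, 55, 67,
          36, 53, 25, 58, 33, 33, 43, 42,
          32, 51, 48, 54, 47, 40, 38, 44,
          46, 50, 28, 44, 52, 49, 56, 48]

def grouped_frequencies(scores, width):
    """Return (lower, upper, f) rows, highest class interval first."""
    lowest, highest = min(scores), max(scores)        # Step 1
    start = (lowest // width) * width                 # Step 4: a multiple of the width
    rows = []
    lower = start
    while lower <= highest:                           # Step 5: continuous, equal widths
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in scores)  # Steps 6-7: count the scores
        rows.append((lower, upper, f))
        lower += width
    return rows[::-1]                                 # highest interval at the top

value_range = max(scores) - min(scores)               # Step 2
print("range:", value_range)
print("widths between", value_range / 20, "and", value_range / 10)  # Step 3
for lower, upper, f in grouped_frequencies(scores, 4):
    print(f"{lower} - {upper}: {f}")
```

With a width of 4 this yields 11 class intervals starting at 24, and with a width of 3 it yields 15.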
FREQUENCY DISTRIBUTION TABLE
X max = 67, X min = 25, RANGE = 42
Interval = 3 (C.i = 15)    Interval = 4 (C.i = 11)
SCORE                      SCORE
66 - 68                    64 - 67
63 - 65                    60 - 63
60 - 62                    56 - 59
57 - 59                    52 - 55
54 - 56                    48 - 51
51 - 53                    44 - 47
48 - 50                    40 - 43
45 - 47                    36 - 39
42 - 44                    32 - 35
39 - 41                    28 - 31
36 - 38                    24 - 27
33 - 35
30 - 32
27 - 29
24 - 26
PERCEPTUAL SPEED
SCORE     f    Xc     Lower Exact Limit   Upper Exact Limit
64 - 67   1    65.5   63.5                67.5
60 - 63   0    61.5   59.5                63.5
56 - 59   2    57.5   55.5                59.5
52 - 55   4    53.5   51.5                55.5
48 - 51   11   49.5   47.5                51.5
44 - 47   8    45.5   43.5                47.5
40 - 43   5    41.5   39.5                43.5
36 - 39   3    37.5   35.5                39.5
32 - 35   3    33.5   31.5                35.5
28 - 31   2    29.5   27.5                31.5
24 - 27   1    25.5   23.5                27.5
WARNING!!
Although a grouped frequency distribution can make it easier to interpret data, some information is lost.
In the table, we can see that more people scored in the interval 48 - 51 than in any other interval
However, unless we have all the original scores to look at, we would not know whether the 11 scores in this interval were all 48s, all 49s, all 50s, or all 51s, or were spread throughout the interval in some way
This problem is referred to as GROUPING ERROR
The wider the class interval width, the greater the potential for grouping error
STEM and LEAF DISPLAY
In 1977, J.W. Tukey presented a technique for organizing data that provides a simple alternative to a frequency distribution table or graph
This technique, called a stem and leaf display, requires that each score be separated into two parts.
The first digit (or digits) is called the stem, and the last digit (or digits) is called the leaf.
Data
83 82 63
62 93 78
71 68 33
76 52 97
85 42 46
32 57 59
56 73 74
74 81 76
Stem & Leaf Display
3 | 2 3
4 | 2 6
5 | 6 2 7 9
6 | 2 8 3
7 | 1 6 4 3 8 4 6
8 | 3 5 2 1
9 | 3 7
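A short Python sketch of a stem and leaf display for the same 24 scores. One difference from the slide is an assumption of this sketch: the leaves are sorted within each stem, whereas the slide lists them in the order the scores were read.

```python
from collections import defaultdict

data = [83, 82, 63, 62, 93, 78, 71, 68, 33, 76, 52, 97,
        85, 42, 46, 32, 57, 59, 56, 73, 74, 74, 81, 76]

def stem_and_leaf(scores):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    display = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        display[stem].append(leaf)
    return dict(display)

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The length of each row shows the frequency for that stem, so the display doubles as a rough histogram.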
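GROUPED FREQUENCY DISTRIBUTION HISTOGRAM AND A STEM AND LEAF DISPLAY
[Figure: the stem and leaf display (stems 3 to 9) turned on its side, next to a histogram of the same scores with class intervals from 30 to 90 on the horizontal axis and frequencies from 0 to 7 on the vertical axis]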
MAKING GRAPH: POLYGON and HISTOGRAM
MAKING GRAPH: POLYGON
POLYGON
[Figure: frequency polygon. Horizontal axis X: class-interval midpoints from 21.5 to 69.5; vertical axis f: frequencies from 0 to 12]
PERCEPTUAL SPEED
[Figure: frequency polygon of the Perceptual Speed scores. Horizontal axis X: class-interval midpoints from 21.5 to 69.5; vertical axis f: frequencies from 0 to 12]
MAKING GRAPH: HISTOGRAM
HISTOGRAM
[Figure: histogram. Horizontal axis X: class-interval exact limits from 23.5 to 67.5; vertical axis f: frequencies from 0 to 12]
POLYGON and HISTOGRAM
[Figure: frequency polygon drawn over the histogram. Horizontal axis X: exact limits from 23.5 to 67.5; vertical axis f: frequencies from 0 to 12]
THE SHAPE OF A FREQUENCY DISTRIBUTION
Symmetrical
It is possible to draw a vertical line through the middle so that one side of the distribution is a mirror image of the other
Skewed
The scores tend to pile up toward one end of the scale and taper off gradually at the other end
The skew is positive when the tail points toward the high scores and negative when the tail points toward the low scores
LEARNING CHECK
Describe the shape of the distribution for the data in the following table
X   f
5   4
4   6
3   3
2   1
1   1
The distribution is negatively skewed
PERCENTILES and PERCENTILE RANKS
The percentile system is widely used in educational measurement to report the standing of an individual relative to the performance of a known group. It is based on the cumulative percentage distribution.
A percentile is a point on the measurement scale below which a specified percentage of the cases in the distribution falls
The rank or percentile rank of a particular score is defined as the percentage of individuals in the distribution with scores at or below the particular value
When a score is identified by its percentile rank, the score is called a percentile
Suppose, for example, that A has a score of X = 78 on an exam and we know that exactly 60% of the class had scores of 78 or lower
Then A's score X = 78 has a percentile rank of 60%, and A's score would be called the 60th percentile
Percentile Rank refers to a percentage
Percentile refers to a score
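A minimal sketch of the "at or below" definition of percentile rank, applied to the Perceptual Speed scores from earlier in the deck. It counts raw scores directly and does no interpolation within real limits, which is a simplifying assumption of this sketch.

```python
scores = [29, 47, 45, 40, 48, 48, 49, 45,
          40, 49, 48, 37, 48, 46, 55, 67,
          36, 53, 25, 58, 33, 33, 43, 42,
          32, 51, 48, 54, 47, 40, 38, 44,
          46, 50, 28, 44, 52, 49, 56, 48]

def percentile_rank(scores, value):
    """Percentage of scores that are at or below `value`."""
    at_or_below = sum(1 for x in scores if x <= value)
    return 100 * at_or_below / len(scores)

# 28 of the 40 scores are 48 or lower, so a score of 48 has a percentile rank of 70.
print(percentile_rank(scores, 48))
```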
CENTRAL TENDENCY
Mean, Median, and Mode
OVERVIEW
The general purpose of descriptive statistical methods is to organize and summarize a set of scores
Perhaps the most common method for summarizing and describing a distribution is to find a single value that defines the average score and can serve as a representative for the entire distribution
In statistics, the concept of an average or representative score is called central tendency
OVERVIEW
The purpose of central tendency is to provide a single summary figure that best describes the central location of an entire distribution of observations
It also helps simplify the comparison of two or more groups tested under different conditions
Three measures are most commonly used in education and the behavioral sciences: the mode, the median, and the arithmetic mean
The MODE
A common meaning of mode is 'fashionable', and it has much the same implication in statistics
In an ungrouped distribution, the mode is the score that occurs with the greatest frequency
In grouped data, it is taken as the midpoint of the class interval that contains the greatest number of scores
The symbol for the mode is Mo
The MEDIAN
The median of a distribution is the point along the scale of possible scores below which 50% of the scores fall; it is therefore another name for P50
Thus, the median is the value that divides the distribution into halves
Its symbol is Mdn
The ARITHMETIC MEAN
The arithmetic mean is the sum of all the scores in the distribution divided by the total number of scores
Many people call this measure the average, but we will avoid this term because it is sometimes used indiscriminately for any measure of central tendency
For brevity, the arithmetic mean is usually called the mean
The ARITHMETIC MEAN
Some symbolism is needed to express the mean mathematically. We will use the capital letter X as a collective term to specify a particular set of scores (be sure to use capital letters; lower-case letters are used in a different way)
We identify an individual score in the distribution by a subscript, such as X1 (the first score), X8 (the eighth score), and so forth
You remember that n stands for the number in a sample and N for the number in a population
Properties of the Mode
The mode is easy to obtain, but it is not very stable from sample to sample
Further, when quantitative data are grouped, the mode may be strongly affected by the width and location of the class intervals
There may be more than one mode for a particular set of scores. In a rectangular distribution the ultimate is reached: every score shares the honor! For these reasons, the mean or the median is often preferred with numerical data
However, the mode is the only measure that can be used for data that have the character of a nominal scale
Properties of the Mean
Unlike the other measures of central tendency, the mean is responsive to the exact position of each score in the distribution
Inspect the basic formula ΣX/n. Increasing or decreasing the value of any score changes ΣX and thus also changes the value of the mean
The mean may be thought of as the balance point of the distribution, to use a mechanical analogy. There is an algebraic way of stating that the mean is the balance point:
$\sum (X - \bar{X}) = 0$
Properties of the Mean
The sum of the negative deviations from the mean exactly equals, in absolute value, the sum of the positive deviations
The mean is more sensitive to the presence (or absence) of scores at the extremes of the distribution than are the median or (ordinarily) the mode
When a measure of central tendency should reflect the total of the scores, the mean is the best choice because it is the only measure based on this quantity
The MEAN of Ungrouped Data
The mean (M), commonly known as the arithmetic average, is computed by adding all the scores in the distribution and dividing by the number of scores or cases
$M = \frac{\sum X}{N}$
The MEAN of Grouped Data
When data come to us grouped, or when they are too lengthy for comfortable addition without the aid of a calculating machine, or when we are going to group them for other purposes anyway, we find it more convenient to apply another formula for the mean:
$M = \frac{\sum f X_c}{N}$
X         Xc   f   f.Xc
20 - 24   22   1   22
15 - 19   17   4   68
10 - 14   12   7   84
5 - 9     7    5   35
0 - 4     2    3   6
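The grouped-data formula can be checked against the table with a few lines of Python. The variable names are illustrative; the midpoints Xc and frequencies f are the ones listed in the table.

```python
# (midpoint Xc, frequency f) for each class interval, highest interval first
table = [(22, 1), (17, 4), (12, 7), (7, 5), (2, 3)]

n = sum(f for _, f in table)                # N = sum of the frequencies
sum_f_xc = sum(xc * f for xc, f in table)   # sum of f * Xc
mean = sum_f_xc / n                         # M = sum(f * Xc) / N

print(n, sum_f_xc, mean)  # 20 215 10.75
```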
The MEDIAN of Ungrouped Data
Method 1: When N is an odd number, list the scores in order (lowest to highest); the median is the middle score in the list
Method 2: When N is an even number, list the scores in order (lowest to highest), and then locate the median by finding the point halfway between the middle two scores
The MEDIAN of Ungrouped Data
Method 3: When there are several scores with the same value in the middle of the distribution, for example 1, 2, 2, 3, 4, 4, 4, 4, 4, 5
There are 10 scores (an even number), so you normally would use method 2 and average the middle pair to determine the median
By this method, the median would be 4
[Figure: two frequency histograms, each with X from 0 to 5 on the horizontal axis and f from 1 to 5 on the vertical axis]
The MEDIAN of Grouped Data
There are 10 scores (an even number), so you normally would use method 2 and average the middle pair to determine the median. By this method the median would be 4
In many ways, this is a perfectly legitimate value for the median. However, when you look closely at the distribution of scores, you probably get the clear impression that X = 4 is not in the middle
The problem comes from the tendency to interpret the score of 4 as meaning exactly 4.00 instead of meaning an interval from 3.5 to 4.5
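One common way to resolve this is to interpolate within the real limits of the tied score. The sketch below assumes that each score occupies an interval of one unit and that the tied cases are spread evenly through that interval; the function name and this convention are assumptions of the sketch, not a procedure stated on the slides.

```python
def interpolated_median(scores, unit=1):
    """Median found by interpolating inside the real limits of the middle score value."""
    half = len(scores) / 2
    for value in sorted(set(scores)):
        below = sum(1 for x in scores if x < value)   # cases below this value
        f = sum(1 for x in scores if x == value)      # cases tied at this value
        if below < half <= below + f:
            lower_real_limit = value - unit / 2
            return lower_real_limit + (half - below) / f * unit
    raise ValueError("scores must not be empty")

scores = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
# Four scores lie below 3.5; one more of the five 4s is needed to reach 50%,
# so the median is one fifth of the way from 3.5 to 4.5.
print(round(interpolated_median(scores), 2))
```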
THE MODE
The word MODE means the most common observation among a group of scores
In a frequency distribution, the mode is the score or category that has the greatest frequency
SELECTING A MEASURE OF CENTRAL TENDENCY
How do you decide which measure of central tendency to use? The answer depends on several factors
Note that the mean is usually the preferred measure of central tendency. Because the mean uses every score in the distribution, it typically produces a good representative value
The goal of central tendency is to find the single value that best represents the entire distribution
SELECTING A MEASURE OF CENTRAL TENDENCY
Besides being a good representative, the mean has the added advantage of being closely related to variance and standard deviation, the most common measures of variability
This relationship makes the mean a valuable measure for purposes of inferential statistics
For these reasons, and others, the mean generally is considered to be the best of the three measures of central tendency
SELECTING A MEASURE OF CENTRAL TENDENCY
But there are specific situations in which it is impossible to compute a mean or in which the mean is not particularly representative
It is in these conditions that the mode and the median are used
WHEN TO USE THE MEDIAN
1. Extreme scores or skewed distributions
When a distribution has a few extreme scores, that is, scores that are very different in value from most of the others, the mean may not be a good representative of the majority of the distribution. The problem comes from the fact that one or two extreme values can have a large influence and cause the mean to be displaced
WHEN TO USE THE MEDIAN
2. Undetermined values
Occasionally, we will encounter a situation in which an individual has an unknown or undetermined score
Person   Time (min.)
1        8
2        11
3        12
4        13
5        17
6        Never finished
Notice that person 6 never completed the puzzle. After one hour, this person still showed no sign of solving the puzzle, so the experimenter stopped him or her
WHEN TO USE THE MEDIAN
2. Undetermined values
There are two important points to be noted:
○ The experimenter should not throw out this individual's score. The whole purpose of using a sample is to gain a picture of the population, and this individual tells us that part of the population cannot solve this puzzle
○ This person should not be given a score of X = 60 minutes. Even though the experimenter stopped the individual after 1 hour, the person did not finish the puzzle. The score that is recorded is the amount of time needed to finish. For this individual, we do not know how long this is
WHEN TO USE THE MEDIAN
3. Open-ended distribution
A distribution is said to be open-ended when there is no upper limit (or lower limit) for one of the categories
Number of children (X)   f
5 or more                3
4                        2
3                        2
2                        3
1                        6
0                        4
Notice that it is impossible to compute a mean for these data because you cannot find ΣX
WHEN TO USE THE MEDIAN
4. Ordinal scale
When scores are measured on an ordinal scale, the median is always appropriate and is usually the preferred measure of central tendency
WHEN TO USE THE MODE
Nominal scales
Because nominal scales do not measure quantity, it is impossible to compute a mean or a median for data from a nominal scale
Discrete variables
Discrete variables exist only in indivisible categories
Describing shape
The mode identifies the location of the peak(s). If you are told that a set of exam scores has a mean of 72 and a mode of 80, you should have a better picture of the distribution than would be available from the mean alone
CENTRAL TENDENCY AND THE SHAPE OF THE DISTRIBUTION
Because the mean, the median, and the mode are all trying to measure the same thing (central tendency), it is reasonable to expect that these three values should be related
There are situations in which all three measures will have exactly the same value, and situations in which they will differ
The relationships among the mean, median, and mode are determined by the shape of the distribution
SYMMETRICAL DISTRIBUTION SHAPE
For a symmetrical distribution, the right-hand side will be a mirror image of the left-hand side
By definition, the mean and the median will be exactly at the center because exactly half of the area in the graph will be on either side of the center
Thus, for any symmetrical distribution, the mean and the median will be the same
SYMMETRICAL DISTRIBUTION SHAPE
If a symmetrical distribution has only one mode, it will also be exactly in the center of the distribution. All three measures of central tendency will have the same value
A bimodal distribution will have the mean and the median together in the center with the modes on each side
A rectangular distribution has no mode because all X values occur with the same frequency. Still the mean and the median will be in the center and equivalent in value
MEASURES OF VARIABILITY
Knowing the central value of a set of measurements tells us much, but it does not by any means give us the total picture of the sample we have measured
Two groups of six-year-old children may have the same average IQ of 105. Suppose one group contains no individuals with IQs below 95 or above 115, and the other includes individuals with IQs ranging from 75 to 135
We recognize immediately that there is a decided difference between the two groups in variability or dispersion
[Figure: IQ distributions of the BLUE and RED groups on an axis from 75 to 135]
The BLUE group is decidedly more homogeneous than the RED group with respect to IQ
Purpose of Measures of Variability
To explain and to illustrate the methods of indicating degree of variability or dispersion by the use of single numbers
The three customary values to indicate variability are
○ The total range
○ The semi-interquartile range Q, and
○ The standard deviation S
The TOTAL RANGE
The total range is the easiest and most quickly ascertained value, but it is also the most unreliable
The range of the BLUE group (from an IQ of 95 to one of 115) is 20 points. The range of the RED group is from 75 to 135, or 60 points
The range is given by the highest score minus the lowest score
The RED group has three times the range of the BLUE group
The SEMI-INTERQUARTILE RANGE Q
The Q is one-half the range of the middle 50 percent of the cases
First we find by interpolation the range of the middle 50 percent, or interquartile range, then divide this range by 2
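[Figure: a distribution divided by Q1, Q2, and Q3 into the lowest quarter, low middle quarter, high middle quarter, and highest quarter; the distances Q2 - Q1 and Q3 - Q2 together make up Q3 - Q1 = 2Q]
$Q = \frac{Q_3 - Q_1}{2}$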
The STANDARD DEVIATION S
Standard deviation is by far the most commonly used indicator of degree of dispersion and is the most dependable estimate of the variability in the population from which the sample came
The S is a kind of average of all deviations from the mean
$S = \sqrt{\frac{\sum x^2}{n - 1}}$, where x is the deviation of a score from the mean
As a general concept, the standard deviation is often symbolized by SD, but much more often by simply S
In verbal terms, S is the square root of the arithmetic mean of the squared deviations of measurements from their mean (with n - 1 in place of n as the divisor when S is computed from a sample, as in the formula for S)
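A step-by-step sketch of the computation of S with n - 1 as the divisor, cross-checked against Python's `statistics.stdev`, which uses the same divisor. The eight scores are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical scores

mean = sum(scores) / len(scores)              # M = 5.0
deviations = [x - mean for x in scores]       # x = X - M for every score
sum_sq = sum(d ** 2 for d in deviations)      # sum of squared deviations = 32.0
s = math.sqrt(sum_sq / (len(scores) - 1))     # S = sqrt(sum x^2 / (n - 1))

print(mean, sum_sq, round(s, 3))
print(math.isclose(s, statistics.stdev(scores)))
```

The deviations themselves sum to zero, which is the balance-point property of the mean.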
Interpretation of a Standard Deviation
The usual and most accepted interpretation of an S is in terms of the percentage of cases included within the range from one S below the mean to one S above the mean
In a normal distribution the range from -1σ to +1σ contains 68.27 percent of the cases
If the mean = 29.6 and S = 10.45, we say that about two-thirds of the cases lie between 19.15 and 40.05
Interpretation of a Standard Deviation
One of the most common sources of variance in statistical data is individual differences, where each measurement comes from a different person
Interpretation of a Standard Deviation
Consider giving a test of n items to a group of persons
Before the first item is given to the group, as far as any information from this test is concerned, the individuals are all alike. There is no variance
Now administer the first item to the group. Some pass it and some fail. Some now have scores of 1, and some have scores of zero
There are two groups of individuals. There is this much variation, this much variance
Interpretation of a Standard Deviation
Give a second item. Of those who passed the first, some will pass the second and some will fail it, and so on.
There are now three possible scores: 0, 1, and 2.
More variance has been introduced
Carry the illustration further, adding item by item
The differences between scores will keep increasing, and also, by computation, the variance and variability
Another rough check is to compare the S obtained with the total range of measurements
In very large samples (N = 500 or more) the S is about one-sixth of the total range
In other words, the total range is about six S
In smaller samples the ratio of range to S can be expected to be smaller (see Guilford & Fruchter p.71)
Ratios of the Total Range to the Standard Deviation in a Distribution for Different Values of N
Rough check for a computed SD
○ The actual percentage of cases between -1 SD and +1 SD should not deviate much from 68 percent
○ In very large samples (N = 500 or more) the SD is about one-sixth of the total range
N    Range/S   N     Range/S   N      Range/S
5    2.3       40    4.3       400    5.9
10   3.1       50    4.5       500    6.1
15   3.5       100   5.0       700    6.3
20   3.7       200   5.5       1000   6.5
z-Score:
Location of Scores and Standardized Distribution
PREVIEW
In particular, we will convert each individual score into a new, standardized score, so that the standardized score provides a meaningful description of its exact location within the distribution
We will use the mean as a reference point to determine whether the individual is above or below average
The standard deviation will serve as a yardstick for measuring how much an individual differs from the group average
EXAMPLE
Suppose you received a score of X = 76 on a statistics exam. How did you do?
It should be clear that you need more information to predict your grade
Your score could be one of the best scores in the class, or it might be the lowest score in the distribution
X = 76, the best score or the lowest score?
To find the location of your score, you must have information about the other scores in the distribution
If the mean were μ = 70, you would be in a better position than if the mean were μ = 86
Obviously, your position relative to the rest of the class depends on the mean
X = 76 and μ = 70
However, the mean by itself is not sufficient to tell you the exact location of your score
At this point, you know that your score is six points above the mean
○ Six points may be a relatively big distance and you may have one of the highest scores in the class, or
○ Six points may be a relatively small distance and you are only slightly above the average
THE z-SCORE FORMULA
$z = \frac{X - \mu}{\sigma}$
z-Score and Location In a Distribution
One of the primary purposes of a z-score is to describe the exact location of a score within a distribution
The z-score accomplishes this goal by transforming each X value into a signed number (+ or -), so that:
○ The sign tells whether the score is located above (+) or below (-) the mean, and
○ The number tells the distance between the score and the mean in terms of the number of standard deviations
If every X value is transformed into a z-score, then the distribution of z-scores will have the following properties:
○ Shape: the shape of the z-score distribution will be the same as the original distribution of raw scores. Each individual has exactly the same relative position in the X distribution and in the z-score distribution
○ The Mean: the z-score distribution will always have a mean of zero. A subject whose score equals the mean is transformed into z = 0
○ The Standard Deviation: the z-score distribution will always have a standard deviation of 1. A subject whose score is one S above the mean is transformed into z = +1
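A brief Python sketch of the z-score formula and of the mean and standard deviation properties. X = 76 and μ = 70 come from the example slides; the two σ values and the five-score distribution are hypothetical. The population standard deviation (`statistics.pstdev`) is used so that the standardized scores have a standard deviation of exactly 1.

```python
import statistics

def z_score(x, mu, sigma):
    """Signed distance of x from the mean, in standard deviation units."""
    return (x - mu) / sigma

# Same raw score and mean, two hypothetical standard deviations.
print(z_score(76, 70, 3))    # six points is a big distance when sigma is small
print(z_score(76, 70, 12))   # six points is a small distance when sigma is large

# Standardizing a whole (hypothetical) distribution.
scores = [3, 4, 5, 6, 7]
mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)
z = [z_score(x, mu, sigma) for x in scores]
print(round(statistics.mean(z), 10), round(statistics.pstdev(z), 10))
```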
PROBABILITY and NORMAL DISTRIBUTION
In simpler terms, the normal distribution is symmetrical with a single mode in the middle. The frequency tapers off as you move farther from the middle in either direction
[Figure: normal curve with the mean μ and the standard deviation σ marked]
THE DISTRIBUTION OF SAMPLE MEANS
OVERVIEW
Whenever a score is selected from a population, you should be able to compute a z-score
And, if the population is normal, you should be able to determine the probability value for obtaining any individual score
In a normal distribution, a z-score of +2.00 corresponds to an extreme score out in the tail of the distribution, and a score at least this large has a probability of only p = .0228
THE DISTRIBUTION OF SAMPLE MEANS
Two separate samples probably will be different even though they are taken from the same population
The samples will have different individuals, different scores, different means, and so on
The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population
COMBINATION
Consider a population that consists of 5 scores: 3, 4, 5, 6, and 7
Mean population = ?
Construct the distribution of sample means for n = 1, n = 2, n = 3, n = 4, n = 5
$_nC_r = \frac{n!}{r!\,(n-r)!}$
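The exercise can be worked through with a short Python sketch. It assumes, as the combination formula suggests, that samples are drawn without replacement and that order does not matter; exact fractions are used so that the means are not rounded.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

population = [3, 4, 5, 6, 7]
population_mean = Fraction(sum(population), len(population))

def sample_means(population, n):
    """Means of all possible samples of size n, drawn without replacement."""
    return [Fraction(sum(sample), n) for sample in combinations(population, n)]

for n in range(1, len(population) + 1):
    means = sample_means(population, n)
    # The number of possible samples matches the combination formula.
    assert len(means) == comb(len(population), n)
    mean_of_means = sum(means) / len(means)
    print(n, len(means), [str(m) for m in means], "mean of means:", mean_of_means)
```

For every sample size, the mean of the sample means equals the population mean, while the sample means spread out less as n grows.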
SAMPLING DISTRIBUTION
A sampling distribution is a distribution of statistics obtained by selecting all the possible samples of a specific size from a population
CENTRAL LIMIT THEOREM
For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n will have a mean of μ and a standard deviation of σ/√n and will approach a normal distribution as n approaches infinity
The STANDARD ERROR OF MEAN
The value we will be working with is the standard deviation of the distribution of sample means, and it is called the standard error of the mean, σM
Remember the sampling error: there typically will be some error between the sample and the population
The σM measures exactly how much difference should be expected on average between a sample mean M and the population mean μ
The MAGNITUDE of THE σM
Determined by two factors:
○ The size of the sample, and
○ The standard deviation of the population from which the sample is selected
PROBABILITY AND THE DISTRIBUTION OF SAMPLE MEANS
The primary use of the distribution of sample means is to find the probability associated with any specific sample
Because the distribution of sample means presents the entire set of all possible Ms, we can use proportions of this distribution to determine probabilities
EXAMPLE
The population of scores on the SAT forms a normal distribution with μ = 500 and σ = 100. If you take a random sample of n = 16 students, what is the probability that the sample mean will be greater than M = 540?
$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{16}} = 25$
$z = \frac{M - \mu}{\sigma_M} = \frac{540 - 500}{25} = 1.6$
For z = 1.6, Area C (the tail beyond z) gives p = .0548
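The same example can be reproduced numerically. This sketch replaces the unit normal table with the complementary error function from Python's `math` module; the variable names are illustrative.

```python
import math

mu, sigma = 500, 100    # population mean and standard deviation
n, m = 16, 540          # sample size and the sample mean of interest

sigma_m = sigma / math.sqrt(n)            # standard error of the mean
z = (m - mu) / sigma_m                    # z-score of the sample mean
p = 0.5 * math.erfc(z / math.sqrt(2))     # upper-tail area of the normal curve beyond z

print(sigma_m, z, round(p, 4))
```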
TYPES OF SAMPLING
Random/probability sampling designs
For a random or probability sampling design, every element in the population must have an equal and independent chance of being selected for the sample.
There are two advantages of random/probability samples:
1. Because they represent the total sampling population, the inferences drawn from such samples can be generalized to the total sampling population.
2. Statistical tests based on the theory of probability can be applied only to data collected from random samples.
Methods of drawing a random sample
The fishbowl draw: if the total population is small, an easy procedure is to write each element on a separate slip of paper, put the slips into a box, and pick them out one by one without looking, until the number of slips selected equals the sample size that has been decided upon
Methods of drawing a random sample
A computer program
A table of random numbers: most books on research methodology and statistics include a table of random numbers in their appendices. The sample can be selected from the table according to the prescribed procedure
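As an illustration of the computer-program option, the fishbowl draw can be imitated with Python's standard `random` module. The element names, the sample size, and the seed are hypothetical; `fishbowl_draw` is just a helper name chosen for this sketch.

```python
import random

def fishbowl_draw(elements, sample_size, seed=None):
    """Draw `sample_size` elements without replacement, each with an equal chance."""
    rng = random.Random(seed)
    return rng.sample(list(elements), sample_size)

# Hypothetical sampling frame of 40 students.
population = [f"student_{i:02d}" for i in range(1, 41)]
sample = fishbowl_draw(population, 8, seed=1)
print(sample)
```

Fixing the seed makes the draw repeatable for checking; leaving it as `None` gives a fresh draw each time.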