Definitions
Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed.
Sample: A subset of the population.
We desire knowledge about an entire population, but it is most often prohibitively expensive to study every element, so we select a representative sample from the population and study the individual items in the sample.
Descriptive Statistics: The collection, presentation, and description of the sample data.
Inferential Statistics: The technique of interpreting the values resulting from the descriptive techniques and making decisions and drawing conclusions about the population.
Section 1.1, Page 4
Definitions
Parameter: A numerical value summarizing all the data of a population. For example, the average high school grade point of all Shoreline students is 3.20. We often use Greek letters to identify parameters: μ = 3.20.
Statistic: A numerical value summarizing the sample data. For example, the average grade point of a sample of Shoreline students is 3.18. We would use the symbol $\bar{x}$: $\bar{x} = 3.18$.
The statistic corresponds to the parameter. We usually don’t know the value of the parameter, so we take a sample and estimate it with the corresponding statistic.
Sampling Variation: While the parameter of a population is considered a fixed number, the corresponding statistic will vary from sample to sample. Also, different populations give rise to more or less sampling variability. Considering the variable age, samples of 60 students from a community college would have less variability than samples of 60 people from a Seattle neighborhood.
Section 1.1, Page 4
Variables
Variable: A characteristic of interest about each element of a population.
Data: The set of values collected for the variable from each of the elements that belong to the sample.
Variability: The extent to which data values for a particular variable differ from each other.
Numerical or Quantitative Variable: A variable that quantifies an element of the population. The HS grade point of a student is a numerical variable. Numerical variables are numbers for which math operations make sense. The average grade point of a sample makes sense.
Continuous Numerical Variable: The variable can take on an uncountable number of values between two points on the number line. An example is the weight of people.
Discrete Numerical Variable: The variable can take on a countable number of values between two points on the number line. An example is the price of statistics textbooks.
Section 1.1, Page 8
Variables (2)
Section 1.1, Page 8
Categorical or Qualitative Variable: A variable that describes or categorizes an element of a population. The gender of a person would be a categorical variable. The categories are male and female.
Nominal Categorical Variable: A categorical variable that describes or names an element of a population; it may use a number as a label. An example is a telephone area code. It is a number, but not a numerical variable used in math operations: the average area code does not make sense.
Ordinal Categorical Variable: A categorical variable that incorporates an ordered position or ranking. An example would be a survey response that ranks “very satisfied” ahead of “satisfied” ahead of “somewhat satisfied.” Limited math operations may be done with ordinal variables.
Problems
Problems, Page 19
Problems
Section 1.3, Page 20
Observational Studies and Experiments
Section 1.3, Page 12
Observational Study: Researchers collect data without modifying the environment or controlling the process being observed. Surveys and polls are observational studies. Observational studies cannot establish causality.
Example: For a randomly selected high school, researchers collect data on each student's grade point and whether the student has music training, to see if there is a relationship between the two variables.
Experiments: Researchers collect data in a controlled environment. The investigator controls or modifies the environment and observes the effect of a variable under study. Experiments can establish causality.
Example: Randomly divide a sample of people with migraine headaches into a control group and a treatment group. Give the treatment group an experimental medication and the control group a placebo, and then measure and compare the reduction in the frequency and severity of headaches for both groups.
Single-Stage Sampling Methods
Section 1.3, Page 13
Single-stage sampling: A sample design in which the elements of the sampling frame are treated equally and there is no subdividing or partitioning of the frame.
Simple Random Sample: Sample selected in such a way that every element of the population has an equal probability of being selected and all samples of size n have an equal probability of being selected.
Example: Select a simple random sample of 6 students from a class of 30.
1. Number the students from 1 to 30 on the roster.
2. Get 6 non-recurring random numbers between 1 and 30.
3. The six students who match the six random numbers are the sample.
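The three steps above can be sketched in Python. Here `random.sample` plays the role of the calculator or random-number table and draws 6 distinct roster numbers. The function name `simple_random_sample` and the fixed `seed` (used only to make a run repeatable) are illustrative choices and do not come from the slides.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Return n distinct roster numbers drawn from 1..population_size.

    random.sample draws without replacement (no repeated numbers),
    so every possible group of n students is equally likely.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), n))

# Steps 1-3: roster numbers 1-30, six non-recurring random numbers.
sample = simple_random_sample(30, 6, seed=1)
print(sample)
```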
Multistage Sampling Designs
Section 1.3, Page 15
Multistage Sampling: A sample design in which the elements of the sampling frame are subdivided and the sample is chosen in more than one stage.
Stratified Random Sampling: A sample is selected by stratifying the population, or sampling frame, and then selecting a number of items from each of the strata by means of a simple random sampling technique. The strata are usually subgroups of the sampling frame that are each homogeneous but different from one another.
Example: Select a sample of six students from a class of 30 so that the sample contains an equal number of males and females.
1. List the males and females separately.
2. Take a simple random sample of 3 students from each group.
3. The six students selected are the sample.
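One way to sketch the stratified example is to draw a separate simple random sample from each stratum and then combine the results. The 15/15 split of the class and the student labels are hypothetical, because the slide does not give the roster. `stratified_sample` is an illustrative helper name.

```python
import random

# Hypothetical roster of 30 students, split 15 males / 15 females (assumption).
males = [f"M{i}" for i in range(1, 16)]
females = [f"F{i}" for i in range(1, 16)]

def stratified_sample(strata, per_stratum, seed=None):
    """Take a simple random sample of per_stratum items from each stratum."""
    rng = random.Random(seed)
    sample = []
    for stratum in strata:
        sample.extend(rng.sample(stratum, per_stratum))
    return sample

# Steps 1-3: list the groups separately, take 3 from each, combine.
sample = stratified_sample([males, females], per_stratum=3, seed=2)
print(sample)
```

The sample is guaranteed to have 3 students from each group. A single simple random sample of 6 from the whole class would not guarantee that balance.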
Multi-Stage Sampling Designs
Section 1.3, Page 16
Cluster Sample: A sample obtained by stratifying the population, or sampling frame, and then selecting some or all of the items from some, but not all, of the strata.
The strata are usually easily identified subgroups of the sampling frame that are similar to each other. This is often the most economical way to sample a large population.
Example: Take a sample of 300 Catholics in the Seattle area.
1. Get a list of the Catholic parishes in the Seattle area.
2. Take a random sample of 3 parishes.
3. In each parish, select a simple random sample of 100 parishioners.
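The parish example is a two-stage design: first sample clusters, then sample members within the chosen clusters. In the sketch below, the frame of 40 parishes with 500 members each is invented purely for illustration; a real frame would come from the actual parish list. `two_stage_sample` is a hypothetical helper name.

```python
import random

# Hypothetical sampling frame: 40 parishes with 500 parishioners each.
parishes = {
    f"Parish {p}": [f"P{p}-member{m}" for m in range(1, 501)]
    for p in range(1, 41)
}

def two_stage_sample(frame, n_clusters, n_per_cluster, seed=None):
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters (parishes).
    chosen = rng.sample(sorted(frame), n_clusters)
    # Stage 2: simple random sample of members within each chosen parish.
    sample = []
    for name in chosen:
        sample.extend(rng.sample(frame[name], n_per_cluster))
    return chosen, sample

chosen, sample = two_stage_sample(parishes, n_clusters=3, n_per_cluster=100, seed=3)
print(chosen, len(sample))
```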
Problem
a. Find the mean, variance, and standard deviation.
b. Find the 5-number summary.
c. Make a box and whisker display and label the numbers.
d. Calculate the interquartile range and the range.
e. Describe the shape of the distribution.
Problems, Page 50
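The data for this problem are not reproduced here, so the sketch below uses a small hypothetical data set to show parts a, b, and d. Quartile conventions differ between textbooks and software. This sketch assumes the convention used by the TI-83's 1-Var Stats, where Q1 and Q3 are the medians of the lower and upper halves and the overall median is excluded when n is odd. Your textbook's depth-based method can give slightly different quartiles for some data sets.

```python
import statistics

def five_number_summary(data):
    """(min, Q1, median, Q3, max); Q1/Q3 are medians of the lower/upper
    halves, excluding the overall median when n is odd (TI-83 style)."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

# Hypothetical data (not the textbook's).
data = [2, 4, 4, 5, 7, 8, 9, 10, 14]

mean = statistics.mean(data)
variance = statistics.variance(data)   # sample variance, divisor n - 1
std_dev = statistics.stdev(data)       # sample standard deviation
summary = five_number_summary(data)
iqr = summary[3] - summary[1]          # interquartile range Q3 - Q1
data_range = summary[4] - summary[0]   # max - min
print(mean, variance, round(std_dev, 3), summary, iqr, data_range)
```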
Summary of Probability Formulas
Equally Likely Outcomes: P(A) = n(A)/n
Complement: P(A) = 1 - P(not A); P(not A) = 1 - P(A)
General Addition Rule: P(A or B) = P(A) + P(B) – P(A and B)
If A and B are disjoint, P(A and B) = 0.
Then the Special Addition Rule: P(A or B) = P(A) + P(B)
General Multiplication Rule: P(A and B) = P(A)×P(B|A)
If A and B are independent, P(B|A) = P(B).
Then the Special Multiplication Rule: P(A and B) = P(A)×P(B)
Odds
If the odds for A are a:b, then the odds against A are b:a. The probability of A is a/(a+b). The probability of not A is b/(a+b).
Chapter 4
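To check the formulas above on a concrete case, we can count outcomes for one roll of a fair die using exact fractions. The events A ("even") and B ("greater than 4") are chosen only for illustration. The helper names `P` and `odds_to_probability` are hypothetical.

```python
from fractions import Fraction

# Sample space for one roll of a fair die; all outcomes equally likely.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {5, 6}      # greater than 4

def P(event):
    """Equally likely outcomes: P(E) = n(E)/n."""
    return Fraction(len(event), len(S))

# General addition rule matches counting the union directly.
assert P(A | B) == P(A) + P(B) - P(A & B)

# Complement rule.
assert P(S - A) == 1 - P(A)

# General multiplication rule, with P(B|A) found by counting inside A.
P_B_given_A = Fraction(len(A & B), len(A))
assert P(A & B) == P(A) * P_B_given_A

def odds_to_probability(a, b):
    """If the odds for A are a:b, then P(A) = a/(a+b)."""
    return Fraction(a, a + b)

# Here P(B|A) = 1/3 = P(B), so A and B happen to be independent.
print(P(A | B), P_B_given_A, P(B), odds_to_probability(1, 5))
```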
Problems
Problems, Page 95
Problem
Problems, Page 95
Problems
Problems, Page 97
Problems
Problems, Page 99
Z Score Problems
Problems, Page 52
Problems
Problems, Page 132
Problems
Problems, Page 133
6.51 IQ scores are normally distributed with a mean of 100 and a standard deviation of 16. Find the following:
a. The 66th percentile.
b. The 80th percentile.
c. The minimum score required to be in the top 10%.
d. The minimum score required to be in the top 25%.
6.52 Find the two z-scores that bound the middle 30% of the standard normal distribution.
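Both problems are inverse-normal lookups, and Python's standard-library `statistics.NormalDist` can do them through `inv_cdf`. "Top 10%" means the 90th percentile, and the middle 30% leaves 35% in each tail. This is a sketch for checking table or calculator answers and does not replace the textbook's method.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)
z = NormalDist()  # standard normal

p66 = iq.inv_cdf(0.66)     # 6.51 a
p80 = iq.inv_cdf(0.80)     # 6.51 b
top10 = iq.inv_cdf(0.90)   # 6.51 c: top 10% starts at the 90th percentile
top25 = iq.inv_cdf(0.75)   # 6.51 d: top 25% starts at the 75th percentile

# 6.52: middle 30% leaves 0.35 in each tail.
z_low, z_high = z.inv_cdf(0.35), z.inv_cdf(0.65)

print(round(p66, 1), round(p80, 1), round(top10, 1), round(top25, 1))  # 106.6 113.5 120.5 110.8
print(round(z_low, 2), round(z_high, 2))                               # -0.39 0.39
```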
Problems
Problems, Page 149
Problems
Problems, Page 151
Problems
Problems, Page 50
Problems
Problems, Page 179
the standard deviation is 5 seconds.
Problems
Test the claim that the BMI of cardiovascular technologists is different from the BMI of the general population. Use α = .05. Assume the BMI of the population of cardiovascular technologists is normally distributed.
a. State the necessary hypotheses.
b. Is the sampling distribution normal? Why?
c. Find the p-value.
d. State your conclusion.
e. If you made an error, what type of error did you make?
Problems, Page 181
Problems
Problems, Page 179
Problems
a. Find the 98% confidence interval.
b. Find the critical value.
c. Find the margin of error.
d. Find the standard error.
e. What assumption must we make about the population to have a t sampling distribution?
f. What are the proper words to describe the confidence interval?
g. If you wanted a margin of error of one minute with a 98% confidence interval for this data, how large must the sample be?
Problems, Page 205
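Part g asks for the smallest n with z*·σ/√n ≤ E, which gives n = (z*·σ/E)², rounded up. This sketch assumes the common textbook approach of using the normal critical value z* for sample-size planning, with the sample standard deviation standing in for σ. The standard deviation of 4 minutes is hypothetical, because the problem's data are not reproduced here. `sample_size_for_mean` is an illustrative name.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin (rounded up)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)

# Hypothetical sigma = 4 minutes; margin of error 1 minute; 98% confidence.
n = sample_size_for_mean(sigma=4, margin=1, confidence=0.98)
print(n)  # 87
```

Always round up, because rounding down would give a margin of error slightly larger than requested.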
Problems
a. Find the p-value.
b. State your conclusion.
c. What is the name of the probability model used for the sampling distribution?
d. What is the mean of the sampling distribution?
e. What is the value of the standard error?
f. If your conclusion is in error, what type of error is it?
Problems, Page 205
Problems
Problems, Page 208
Problems
a. Check the conditions for a normal sampling distribution.
b. State the hypotheses.
c. Find the p-value.
d. State your conclusion.
e. If you make an error in your conclusion, what type is it?
f. Find the mean of the sampling distribution.
g. Find the standard error of the sampling distribution.
Problems, Page 207
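Questions about normality conditions, the mean of the sampling distribution, and its standard error suggest a test about a single proportion, although the problem's data are not reproduced here. Under that assumption, the sketch below carries out a one-proportion z test on hypothetical counts. Under H0, p̂ has mean p0 and standard error √(p0(1 − p0)/n). The "> 5" threshold in the conditions check is one common rule of thumb; some texts use 10.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, alternative="two-sided"):
    """Normal-approximation z test for H0: p = p0."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return p_hat, se, z, p_value

# Hypothetical data: 60 successes in 100 trials; H0: p = 0.5, Ha: p > 0.5.
n, p0 = 100, 0.5
conditions_met = n * p0 > 5 and n * (1 - p0) > 5   # rule of thumb (threshold varies)
p_hat, se, z, p_value = one_proportion_z_test(60, n, p0, alternative="greater")
print(conditions_met, round(z, 2), round(p_value, 4))
```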
Dependent and Independent Samples
Section 10.1, Page 208
Problems
a. Test the hypothesis that the people increased their knowledge. Use α = .05 and assume normality. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. Find the 90% confidence interval for the mean increase in test scores.
Problems, Page 231
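With dependent (paired) samples, the analysis works on the differences d = after − before and runs a one-sample t test on them. The scores below are hypothetical, since the problem's data are not reproduced here. Python's standard library has no t distribution, so this sketch stops at the test statistic and its degrees of freedom. The p-value and the t critical value for the interval would then come from a t table or the calculator.

```python
import math
import statistics

# Hypothetical before/after test scores for 8 people (not the textbook's data).
before = [60, 72, 55, 80, 65, 70, 58, 75]
after = [66, 75, 61, 82, 64, 78, 63, 78]

d = [a - b for a, b in zip(after, before)]   # paired differences, after - before
n = len(d)
d_bar = statistics.mean(d)                   # sample mean difference
s_d = statistics.stdev(d)                    # sample st. dev. of differences
se = s_d / math.sqrt(n)                      # standard error of d_bar
t = d_bar / se                               # H0: mu_d = 0 vs Ha: mu_d > 0
df = n - 1
print(d_bar, round(s_d, 3), round(t, 3), df)
```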
Problems
a. State the hypotheses (assume normality).
b. Find the p-value, and state your conclusion.
c. Find the 95% confidence interval for the difference of the means, Gouda – Brie.
d. Find the mean and standard error of the sampling distribution.
Problems, Page 232
Problems
a. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What model is used for the sampling distribution, and what are the mean of the sampling distribution and its standard error?
d. Find the 98% confidence interval for the difference in proportions, men – women.
Problems, Page 234
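For a difference of two proportions, the test usually uses the pooled proportion for the standard error, because H0 says p1 = p2. The confidence interval uses the unpooled standard error. The counts below (120 of 200 men, 100 of 200 women) are hypothetical, and the two-sided alternative is an assumption. The helper names are illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts: men first, so the difference is men - women.
z, p_value = two_proportion_z_test(120, 200, 100, 200)
low, high = two_proportion_ci(120, 200, 100, 200, confidence=0.98)
print(round(z, 3), round(p_value, 4), (round(low, 4), round(high, 4)))
```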
Summary of Chi-Square Applications
Goodness of Fit Test
Given one categorical variable with a fixed set of proportions for the categories.
Ha: The observed data does not fit the proportions.
Calculate expected values (Ho true proportion × total observations).
Observed and expected data in List Editor; PRGM: GOODFIT
Test for Independence
Given two categorical variables measured on the same population.
Ha: The variables are not independent (they are related).
Observed data in Matrix Editor; Stat-Tests-χ2 Test
Test for Homogeneity
Given one categorical variable and two or more populations.
Ha: The proportions for the categories are not the same for all populations.
Observed data in Matrix Editor; Stat-Tests-χ2 Test
Chapter 12, Summary
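For the independence and homogeneity tests, the calculator builds the expected counts from the observed matrix as (row total × column total) / grand total, and then sums (O − E)²/E over all cells with df = (rows − 1)(columns − 1). The sketch below reproduces that arithmetic on a hypothetical 2 × 3 table. It also reports the smallest expected cell, which is useful when checking the conditions.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

# Hypothetical 2 x 3 contingency table (not the textbook's data).
observed = [[20, 30, 50],
            [30, 20, 50]]
expected = expected_counts(observed)
stat = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)
smallest_expected = min(min(row) for row in expected)
print(expected, stat, df, smallest_expected)
```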
Chi-Square Distribution
Fair Die Example
Now we need a sampling distribution for the χ² statistic, χ² = 2.2, so we can calculate the probability of getting χ² ≥ 2.2 when the true proportions are all equal to 1/6.
χ² Distribution for 5 df
This is the distribution of all possible χ² statistics calculated from all possible samples of 60 observations when there are 6 proportions, or cells. Note that the degrees of freedom equal the number of proportions minus 1: df = 6 − 1 = 5.
Finding the p-value on the TI-83, given the χ² statistic and df:
PRGM – CHI2DIST
LOWER BOUND: 2.2
UPPER BOUND: 2ND E99
df: 5
Output: P-VALUE = 0.8208
The null hypothesis cannot be rejected.
Section 11.2, Page 240
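The calculator's upper-tail area can also be computed without a χ² table. A χ² variable with df degrees of freedom is a gamma variable, so P(χ² ≥ x) = 1 − P(df/2, x/2), where P is the regularized lower incomplete gamma function. The sketch below evaluates P with its standard power series. The six die counts are hypothetical and were chosen only so that they sum to 60 and give χ² = 2.2; they are not the textbook's data.

```python
import math

def chi2_upper_tail(stat, df):
    """P(chi-square >= stat) for the given df, via 1 - P(df/2, stat/2)."""
    a, x = df / 2, stat / 2
    if x <= 0:
        return 1.0
    # Series: P(a, x) = e^-x x^a / Gamma(a) * sum x^k / (a (a+1) ... (a+k)).
    term = 1.0 / a
    total = term
    k = a
    for _ in range(10_000):
        k += 1
        term *= x / k
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower

def chi_square_gof(observed, proportions):
    """Goodness-of-fit statistic with expected = proportion * total."""
    n = sum(observed)
    expected = [p * n for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls, chosen only so that chi-square = 2.2.
counts = [13, 8, 8, 12, 9, 10]
stat = chi_square_gof(counts, [1 / 6] * 6)
p = chi2_upper_tail(stat, df=len(counts) - 1)
print(round(stat, 2), round(p, 4))  # 2.2 0.8208
```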
Problems
a. Perform a hypothesis test to see if the preferences are not all the same. State the hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
Problems, Page 252
Problems
a. Test the hypothesis that the size of the community a person was reared in is independent of the size of the community the person now resides in. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied? What is the value of the smallest expected cell?
Section 11.3, Page 254
The F-Distribution
Sec 10.5, Page 226
Each sample must be from a normal distribution.
Problem
Problems, Page 234
Set up the problem so that the F-statistic is greater than 1 (place the larger sample variance in the numerator).
a. State the necessary hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
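The setup rule above means the F-statistic is always the larger sample variance divided by the smaller one. The degrees of freedom follow the same order: the numerator df comes from the sample whose variance is on top. The standard deviations and sample sizes below are hypothetical. For a two-tailed alternative, the upper-tail area beyond this F is doubled to get the p-value.

```python
def f_ratio(s1, n1, s2, n2):
    """Return (F, df_num, df_den) with the larger sample variance on top,
    so that F >= 1."""
    v1, v2 = s1 ** 2, s2 ** 2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

# Hypothetical samples (not the textbook's data).
F, df_num, df_den = f_ratio(s1=3.0, n1=10, s2=6.0, n2=16)
print(F, df_num, df_den)  # 4.0 15 9
```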
Problems
Sec 12.1, Page 268
a. State the necessary hypotheses.
b. Sketch the side-by-side box plots. Does it appear that the means are all the same?
c. Find the p-value and state your conclusion.
d. What is the name of the model used for the sampling distribution?
Problems
Problems, Page 268
City           Sample Size   Sample Mean   Sample St. Dev.
Atlanta        6             24.67         7.76
Boston         7             33.00         9.56
Dallas         7             30.86         7.58
Philadelphia   5             32.20         7.47
Seattle        5             27.40         9.40
St. Louis      6             25.83         10.03
a. Test the hypothesis that not all the mean commute times are the same. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the F-statistic, the numerator df, and the denominator df?
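Part d can be computed directly from the table's summary statistics. The between-groups sum of squares is Σ nᵢ(x̄ᵢ − x̄)², the within-groups sum of squares is Σ (nᵢ − 1)sᵢ², and F is the ratio of their mean squares with df = (k − 1, N − k). Because the table's means and standard deviations are rounded, this F may differ slightly from one computed on raw data. Python's standard library has no F distribution, so the p-value would still come from the calculator or a table. `anova_from_summary` is an illustrative helper name.

```python
# (sample size, sample mean, sample st. dev.) from the commute-time table.
groups = {
    "Atlanta": (6, 24.67, 7.76),
    "Boston": (7, 33.00, 9.56),
    "Dallas": (7, 30.86, 7.58),
    "Philadelphia": (5, 32.20, 7.47),
    "Seattle": (5, 27.40, 9.40),
    "St. Louis": (6, 25.83, 10.03),
}

def anova_from_summary(groups):
    """One-way ANOVA F-statistic from (n, mean, sd) for each group."""
    stats = list(groups.values())
    N = sum(n for n, _, _ in stats)
    k = len(stats)
    grand_mean = sum(n * m for n, m, _ in stats) / N
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in stats)
    ss_within = sum((n - 1) * s ** 2 for n, _, s in stats)
    df_between, df_within = k - 1, N - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within

F, df_num, df_den = anova_from_summary(groups)
print(round(F, 3), df_num, df_den)  # 0.983 5 30
```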