STA 291 Spring 2010 Lecture 3 Dustin Lueker


STA 291-021 Summer 2007

Sampling Plans

  • Simple Random Sampling (SRS)
      • Each possible sample of a given size has the same probability of being selected
  • Stratified Random Sampling
      • The population can be divided into a set of non-overlapping subgroups (the strata)
      • SRSs are drawn from each stratum
  • Cluster Sampling
      • The population can be divided into a set of non-overlapping subgroups (the clusters)
      • The clusters are then selected at random, and all individuals in the selected clusters are included in the sample
  • Systematic Sampling
      • Useful when the population exists as a list
      • A value K is specified. Then one of the first K individuals is selected at random, after which every Kth observation is included in the sample

STA 291 Spring 2010 Lecture 2

Descriptive Statistics

  • Summarize data
  • Condense the information from the dataset
      • Graphs
      • Table
      • Numbers
  • Interval data
      • Histogram
  • Nominal/Ordinal data
      • Bar chart
      • Pie chart

Data Table: Murder Rates

  • Difficult to see the big picture from these numbers
  • We want to try to condense the data

  Alabama        11.6
  Alaska          9.0
  Arizona         8.6
  Arkansas       10.2
  California     13.1
  Colorado        5.8
  Connecticut     6.3
  Delaware        5.0
  DC             78.5
  Florida         8.9
  Georgia        11.4
  Hawaii          3.8

Frequency Distribution

  • A listing of intervals of possible values for a variable
  • Together with a tabulation of the number of observations in each interval

  Murder Rate   Frequency
  0-2.9              5
  3-5.9             16
  6-8.9             12
  9-11.9            12
  12-14.9            4
  15-17.9            0
  18-20.9            1
  >21                1
  Total             51

Conditions for intervals

  • Equal length
  • Mutually exclusive
      • Any observation can only fall into one interval
  • Collectively exhaustive
      • All observations fall into an interval
  • Rule of thumb:
      • If you have n observations then the number of intervals should approximately
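The four sampling plans that open this lecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered units, the four groups of 25, the seed, and all function names are hypothetical and do not come from the slides.

```python
import random


def simple_random_sample(population, n, rng):
    """SRS: every possible sample of size n is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate SRS of n_per_stratum units from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def systematic_sample(population, k, rng):
    """Pick one of the first k units at random, then every k-th unit after it."""
    start = rng.randrange(k)
    return population[start::k]


# Hypothetical population: 100 units labelled 1..100, split into 4 groups of 25.
rng = random.Random(291)
population = list(range(1, 101))
groups = {f"group{i}": population[i * 25:(i + 1) * 25] for i in range(4)}

print(simple_random_sample(population, 10, rng))
print(stratified_sample(groups, 3, rng))
print(cluster_sample(groups, 2, rng))
print(systematic_sample(population, 10, rng))
```

The same groups are used here both as strata and as clusters only to keep the sketch short. The difference is in what gets sampled: stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.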

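As a rough sketch of how a frequency distribution can be tabulated, the code below counts the twelve murder rates listed in the data table into intervals of width 3, with one open-ended class at the top. The function name, the half-open interval convention `[lo, hi)`, and treating the top class as `>= 21` are assumptions made for this example; the slides label the classes `0-2.9`, `3-5.9`, and so on.

```python
# Murder rates for the twelve states shown in the data table.
rates = {
    "Alabama": 11.6, "Alaska": 9.0, "Arizona": 8.6, "Arkansas": 10.2,
    "California": 13.1, "Colorado": 5.8, "Connecticut": 6.3, "Delaware": 5.0,
    "DC": 78.5, "Florida": 8.9, "Georgia": 11.4, "Hawaii": 3.8,
}


def frequency_distribution(values, width=3.0, start=0.0, last_lower=21.0):
    """Count values in equal-width intervals [lo, lo + width) plus one open-ended class."""
    values = list(values)
    if min(values) < start:
        raise ValueError("all values must be >= start")
    n_closed = round((last_lower - start) / width)
    counts = [0] * (n_closed + 1)
    for v in values:
        if v >= last_lower:
            counts[-1] += 1                        # open-ended top class
        else:
            counts[int((v - start) // width)] += 1
    labels = [f"[{start + i * width:g}, {start + (i + 1) * width:g})"
              for i in range(n_closed)]
    labels.append(f">= {last_lower:g}")
    return list(zip(labels, counts))


dist = frequency_distribution(rates.values())
for label, count in dist:
    print(f"{label:>10}  {count}")
print(f"{'Total':>10}  {sum(c for _, c in dist)}")
```

Half-open intervals make the classes mutually exclusive and collectively exhaustive by construction: every non-negative value lands in exactly one class.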

Relative Frequencies

  • Relative frequency for an interval
      • Proportion of sample observations that fall in that interval
  • Sometimes percentages are preferred to relative frequencies

Frequency, Relative Frequency, and Percentage Distribution

  Murder Rate   Frequency   Relative Frequency   Percentage
  0-2.9              5             .10                10
  3-5.9             16             .31                31
  6-8.9             12             .24                24
  9-11.9            12             .24                24
  12-14.9            4             .08                 8
  15-17.9            0             0                   0
  18-20.9            1             .02                 2
  >21                1             .02                 2
  Total             51             1                 100

Frequency Distributions

  • Notice that we had to group the observations into intervals because the variable is measured on a continuous scale
  • For discrete data, grouping may not be necessary
      • Except when there are many categories
  • Intervals are sometimes called classes
  • Class Cumulative Frequency
      • Number of observations that fall in the class and in smaller classes
  • Class Relative Cumulative Frequency
      • Proportion of observations that fall in the class and in smaller classes
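The derived columns can all be computed from the class frequencies alone. The sketch below uses the frequencies from the murder rate table; the function name and the printed layout are choices made for this example.

```python
from itertools import accumulate

classes = ["0-2.9", "3-5.9", "6-8.9", "9-11.9", "12-14.9", "15-17.9", "18-20.9", ">21"]
freq = [5, 16, 12, 12, 4, 0, 1, 1]


def distribution_columns(freq):
    """Return relative, percentage, cumulative and relative cumulative frequencies."""
    n = sum(freq)
    rel = [f / n for f in freq]          # proportion in each class
    pct = [100 * r for r in rel]         # same information as a percentage
    cum = list(accumulate(freq))         # count in this class and all smaller classes
    rel_cum = [c / n for c in cum]       # proportion in this class and all smaller classes
    return rel, pct, cum, rel_cum


rel, pct, cum, rel_cum = distribution_columns(freq)
for row in zip(classes, freq, rel, pct, cum, rel_cum):
    print("{:<8} {:>3} {:>5.2f} {:>4.0f} {:>3} {:>5.2f}".format(*row))
```

Dividing the cumulative count by n gives 45/51, about 0.88, for the 9-11.9 class. Adding up relative frequencies that have already been rounded (.10 + .31 + .24 + .24) gives .89 instead, so small differences of this kind are rounding effects rather than changes in the data.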

Frequency and Cumulative Frequency

  Murder Rate   Frequency   Relative    Cumulative   Relative Cumulative
                            Frequency   Frequency    Frequency
  0-2.9              5        .10           5             .10
  3-5.9             16        .31          21             .41
  6-8.9             12        .24          33             .65
  9-11.9            12        .24          45             .89
  12-14.9            4        .08          49             .97
  15-17.9            0        0            49             .97
  18-20.9            1        .02          50             .99
  >21                1        .02          51            1
  Total             51       1             51            1

Histogram (Interval Data)

  • Use the numbers from the frequency distribution to create a graph
  • Draw a bar over each interval; the height of the bar represents the relative frequency for that interval
  • Bars should be touching
      • Equally extend the width of the bar at the upper and lower limits so that the bars are touching
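A plain-text rendering gives a feel for how the histogram follows from the frequency distribution. In this sketch each bar is drawn sideways, with a length proportional to the relative frequency of its class; the function names and the `scale` parameter (characters per unit of relative frequency) are assumptions for illustration, and a real histogram would be drawn with a plotting tool.

```python
classes = ["0-2.9", "3-5.9", "6-8.9", "9-11.9", "12-14.9", "15-17.9", "18-20.9", ">21"]
freq = [5, 16, 12, 12, 4, 0, 1, 1]


def bar_lengths(freq, scale=50):
    """Bar length in characters, proportional to each class's relative frequency."""
    n = sum(freq)
    return [round(scale * f / n) for f in freq]


def text_histogram(classes, freq, scale=50):
    """One line per class: the label followed by a bar of '#' characters."""
    return [f"{label:>8} |{'#' * length}"
            for label, length in zip(classes, bar_lengths(freq, scale))]


for line in text_histogram(classes, freq):
    print(line)
```

Because consecutive lines are adjacent, the bars touch, which matches the convention for interval data. The open-ended top class is drawn like the others here, although it does not have the same width.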

[Figure: Histogram]

[Figure: Histogram w/o DC]

Bar Graph (Nominal/Ordinal Data)

  • Histogram: for interval (quantitative) data
  • Bar graph is almost the same, but for qualitative data
  • Difference:
      • The bars are usually separated to emphasize that the variable is categorical rather than quantitative
      • For nominal variables (no natural ordering), order the bars by frequency, except possibly for a category "other" that is always last

Pie Chart (Nominal/Ordinal Data)

  • First Step
      • Create a frequency distribution

  Highest Degree Obtained   Frequency (Number of Employees)
  Grade School                  15
  High School                  200
  Bachelors                    185
  Masters                       55
  Doctorate                     70
  Other                         25
  Total                        550

Bar graph

  • If the data is ordinal, classes are presented in the natural ordering
  • We could display this data in a bar chart

Pie Chart

  • Pie is divided into slices
  • Area of each slice is proportional to the frequency of each class
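Since the area of each slice is proportional to the class frequency, the slice angles follow directly from the frequency distribution. The sketch below computes them for the degree data and also shows the bar-ordering rule for nominal variables; both function names are made up for this example. Note that the degree data is ordinal, so on an actual bar graph its classes would stay in their natural order; the ordering function only illustrates the nominal case.

```python
degrees = {
    "Grade School": 15, "High School": 200, "Bachelors": 185,
    "Masters": 55, "Doctorate": 70, "Other": 25,
}


def pie_angles(freq):
    """Slice angle in degrees, proportional to each class's frequency."""
    total = sum(freq.values())
    return {label: 360 * f / total for label, f in freq.items()}


def nominal_bar_order(freq, last="Other"):
    """Order categories by decreasing frequency, keeping `last` at the end."""
    main = sorted((k for k in freq if k != last), key=lambda k: freq[k], reverse=True)
    return main + ([last] if last in freq else [])


angles = pie_angles(degrees)
for label, angle in angles.items():
    print(f"{label:<13} {degrees[label]:>4} {angle:>7.1f} degrees")
print(nominal_bar_order(degrees))
```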

[Figure: Pie Chart for Highest Degree Achieved]

Stem and Leaf Plot

  • Write the observations ordered from smallest to largest
  • Looks like a histogram sideways
  • Contains more information than a histogram, because every single observation can be recovered
  • Each observation represented by a stem and leaf
      • Stem = leading digit(s)
      • Leaf = final digit

  Stem   Leaf          #
    20   3             1
    19
    18
    17
    16
    15
    14
    13   135           3
    12   7             1
    11   334469        6
    10   2234          4
     9   08            2
     8   03469         5
     7   5             1
     6   034689        6
     5   0238          4
     4   46            2
     3   0144468999   10
     2   039           3
     1   67            2
         ----+----+----+----+

Stem and Leaf Plot

  • Useful for small data sets
      • Fewer than 100 observations
  • Can also be used to compare groups
      • Back-to-Back Stem and Leaf Plots, using the same stems for both groups
      • Murder Rate Data from U.S. and Canada
  • Note: it doesn't really matter whether the smallest stem is at top or bottom of the table
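The construction of a stem and leaf plot can be sketched as follows, using the murder rates from the data table earlier in the lecture with DC left out (an assumption made here so the plot stays short). The function name and the `leaf_unit` parameter are hypothetical; the sketch assumes non-negative values recorded to one decimal place, so the stem is the whole-number part and the leaf is the tenths digit.

```python
from collections import defaultdict

# Rates for the listed states, excluding DC (78.5) to avoid a long run of empty stems.
rates = [11.6, 9.0, 8.6, 10.2, 13.1, 5.8, 6.3, 5.0, 8.9, 11.4, 3.8]


def stem_and_leaf(values, leaf_unit=0.1):
    """Return 'stem | leaves' lines, smallest stem first, empty stems included."""
    leaves = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)       # e.g. 11.6 -> 116
        stem, leaf = divmod(scaled, 10)     # 116 -> stem 11, leaf 6
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    return [f"{stem:>3} | {''.join(str(d) for d in leaves.get(stem, []))}"
            for stem in range(lo, hi + 1)]


for line in stem_and_leaf(rates):
    print(line)
```

Sorting the values first means the leaves on each stem come out in increasing order, so every observation can be read back from the plot.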

Stem and Leaf Plot

  PRESIDENT     AGE    PRESIDENT     AGE    PRESIDENT     AGE
  Washington     67    Fillmore       74    Roosevelt      60
  Adams          90    Pierce         64    Taft           72
  Jefferson      83    Buchanan       77    Wilson         67
  Madison        85    Lincoln        56    Harding        57
  Monroe         73    Johnson        66    Coolidge       60
  Adams          80    Grant          63    Hoover         90
  Jackson        78    Hayes          70    Roosevelt      63
  Van Buren      79    Garfield       49    Truman         88
  Harrison       68    Arthur         56    Eisenhower     78
  Tyler          71    Cleveland      71    Kennedy        46
  Polk           53    Harrison       67    Johnson        64
  Taylor         65    McKinley       58    Nixon          81
                                            Reagan         93
                                            Ford           93

  Stem   Leaf

Summary of Graphical and Tabular Techniques

  • Discrete data
      • Frequency distribution
  • Continuous data
      • Grouped frequency distribution
  • Small data sets
      • Stem and leaf plot
  • Interval data
      • Histogram
  • Categorical data
      • Bar chart
      • Pie chart
  • Grouping intervals should be of same length, but may be dictated more by subject-matter considerations

Good Graphics

  • Present large data sets concisely and coherently
  • Can replace a thousand words and still be clearly understood and comprehended
  • Encourage the viewer to compare two or more variables
  • Do not replace substance by form
  • Do not distort what the data reveal

Bad Graphics

  • Don't have a scale on the axis
  • Have a misleading caption
  • Distort by using absolute values where relative/proportional values are more appropriate
  • Distort by stretching/shrinking the vertical or horizontal axis
  • Use bar charts with bars of unequal width

Sample/Population Distribution

  • Frequency distributions and histograms exist for the population as well as for the sample
  • Population distribution vs. sample distribution
  • As the sample size increases, the sample distribution looks more and more like the population distribution

Population Distribution

  • The population distribution for a continuous variable is usually represented by a smooth curve
      • Like a histogram that gets finer and finer
      • Similar to the idea of using smaller and smaller rectangles to calculate the area under a curve when learning how to integrate
  • Symmetric distributions
      • Bell-shaped
      • U-shaped
      • Uniform
  • Not symmetric distributions:
      • Left-skewed
      • Right-skewed
      • Skewed

Skewness

[Figure panels: Symmetric, Right-skewed, Left-skewed]
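The Sample/Population Distribution point above, that the sample distribution looks more and more like the population distribution as the sample size increases, can be illustrated with a small simulation. Everything specific here is an assumption for the sake of the sketch: the population is taken to be exponential with rate 1 (a right-skewed shape), the classes are [0, 1), [1, 2), [2, 3) and an open-ended class from 3 upward, and the function names and seed are made up.

```python
import math
import random

EDGES = [0.0, 1.0, 2.0, 3.0]   # lower limits of the classes; the last class is x >= 3


def population_proportions(rate=1.0):
    """Exact class proportions for an exponential population with the given rate."""
    cdf = [1 - math.exp(-rate * e) for e in EDGES] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(EDGES))]


def sample_proportions(n, rng, rate=1.0):
    """Relative frequencies of the same classes in a random sample of size n."""
    counts = [0] * len(EDGES)
    for _ in range(n):
        x = rng.expovariate(rate)
        # Classes have width 1 and start at 0, so int(x) is the class index.
        counts[min(int(x), len(EDGES) - 1)] += 1
    return [c / n for c in counts]


def max_gap(sample, population):
    """Largest difference between sample and population class proportions."""
    return max(abs(s - p) for s, p in zip(sample, population))


rng = random.Random(0)
pop = population_proportions()
for n in (10, 100, 1000, 100000):
    print(n, round(max_gap(sample_proportions(n, rng), pop), 4))
```

The gap shrinks on average as n grows, though for any single pair of sample sizes it need not decrease, since each sample is random.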