Lesson 1
Chapter 1: Basic Statistical Concepts

Michael Akritas
Department of Statistics
The Pennsylvania State University

Outline

1 Populations, Samples, and Census

2 Some Sampling Concepts

3 Random Variables and Statistical Populations

4 Basic Graphics for Data Visualization

5 Proportions, Averages, Variances and Percentiles

Proportions: Population- and Sample-

Averages: Population- and Sample-

Variance: Population- and Sample-

Sample Percentiles and the Box Plot


Introduction to R

• R is a GNU project. The GNU project (a recursive acronym for "GNU's Not Unix"), sponsored by the Free Software Foundation, was launched in 1984 to develop a complete Unix-like operating system which is free software.
• To find out about R go to http://www.R-project.org/.
• See also the NY Times article http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?pagewanted=all
• To download R go to http://cran.r-project.org/.


• You can start using R as a calculator: 2*4; 2**3; sqrt(16); sin(pi); cos(2*pi); log(exp(1)); log(10,base=10)
• Try some simple commands: 1:10, seq(1,10), seq(1,10,1), seq(2,10,2). Also, rep(1,5), rep("a",5), rep(seq(1,4),2) or rep(1:4,2), c(rep(0,5),rep(1,7)).
• You can store the numbers in "objects": x=c(rep(0,5),rep(1,7)). Try x=seq(2,10,2); sum(x); mean(x). Try also x/2; x**2; sqrt(x).
• Define functions: f=function(x){x**2}. Try f(2); f(c(2,3)).
• Integrate a function: integrate(f, 0, 3). Try also g=function(x){x**(-2)}; integrate(g, 1, Inf).
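The commands above can be collected into one short script. The following is a minimal sketch that uses only base R; the names x, f and g follow the slide, while area_f and area_g are names introduced here for illustration. The comments give the values R should return.

```r
# R as a calculator
2 * 4               # 8
2^3                 # 8 (R also accepts 2**3 as a synonym for 2^3)
sqrt(16)            # 4
sin(pi)             # about 1.2e-16: very close to 0, but not exactly 0 (floating point)
cos(2 * pi)         # 1
log(exp(1))         # 1
log(10, base = 10)  # 1

# Sequences and repetition
1:10
seq(2, 10, 2)            # 2 4 6 8 10
rep(1:4, 2)              # 1 2 3 4 1 2 3 4
c(rep(0, 5), rep(1, 7))  # five 0s followed by seven 1s

# Store numbers in an object and summarize them
x <- seq(2, 10, 2)
sum(x)    # 30
mean(x)   # 6

# Define functions and integrate them numerically
f <- function(x) { x^2 }
f(c(2, 3))                             # 4 9
area_f <- integrate(f, 0, 3)$value     # approximately 9
g <- function(x) { x^(-2) }
area_g <- integrate(g, 1, Inf)$value   # approximately 1
```

Note that integrate() returns a list; the numerical value of the integral is stored in its value component.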


Why Statistics?

Example (Examples of Engineering/Scientific Studies)

• Comparing the compressive strength of two or more cement mixtures.
• Comparing the effectiveness of three cleaning products in removing four different types of stains.
• Predicting failure time on the basis of stress applied.
• Assessing the effectiveness of a new traffic regulatory measure in reducing the weekly rate of accidents.
• Testing a manufacturer's claim regarding a product's quality.
• Studying the relation between salary increases and employee productivity in a large corporation.


These studies require Statistics due to the intrinsic variability:

• The compressive strength of different preparations of the same cement mixture will differ. The figure in http://sites.stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf shows 32 compressive strength measurements (in megapascals) of test cylinders (6 in. diameter, 12 in. high), using a water/cement ratio of 0.4, measured on the 28th day after they were made.
• Under the same stress, two beams fail at different times.
• The proportion of defective items of a certain product will differ from batch to batch.

Intrinsic variability renders the objectives of the case studies, as stated, ambiguous.


The objectives of the case studies can be made precise if stated in terms of averages or means.

• Comparing the average hardness of two different cement mixtures.
• Predicting the average failure time on the basis of stress applied.
• Estimation of the average coefficient of thermal expansion.
• Estimation of the average proportion of defective items.

Moreover, because of variability, the words "average" and "mean" have a technical meaning which can be made clear through the concepts of population and sample.


Definition
A population is a well-defined collection of objects or subjects, of relevance to a particular study, which are exposed to the same treatment or method.

• Population members are called units.
• The objective of a study is to investigate certain characteristic(s) of the units of the population(s) of interest.


Example (Populations and Unit Characteristics)
• All water samples taken from a lake. Characteristics: Mercury concentration; Concentration of other pollutants.
• All items of a certain manufactured product (that have been, or will be, produced). Characteristic: Proportion of defective items.
• All students enrolled in Big Ten universities during the 2013-14 academic year. Characteristics: Favorite type of music; Political affiliation.
• Two types of cleaning products. Characteristic: Cleaning effectiveness.


Populations that consist of the same type of units but differ in the treatment, or method, applied to them are called treatment populations.

Example (Treatment Populations)
• The concentration of pollutants in water samples is analyzed by two different labs. Water samples sent to Lab 1 constitute population 1, and those sent to Lab 2 constitute population 2.
• The time to failure of beams is studied under different stress conditions. The beams subjected to each stress condition constitute different populations.


Full (i.e. population-level) understanding of a characteristic requires the examination of all population units, i.e. a census.

For example, full understanding of the relation between salary and productivity of a corporation's employees requires obtaining these two characteristics from all employees.

However,
• Taking a census can be time consuming and expensive: the 2000 U.S. Census cost $6.5 billion, while the 2010 Census cost $13 billion.
• Moreover, a census is not feasible if the population is hypothetical or conceptual, i.e. not all members are available for examination.

Because of the above, we typically settle for examining all units in a sample, which is a subset of the population.


Due to the intrinsic variability, the sample properties/attributes of the characteristic of interest will differ from those of the population. For example:

• The average mercury concentration in 25 water samples will differ from the overall mercury concentration in the lake.
• The proportion in a sample of 100 PSU students who favor the use of solar energy will differ from the corresponding proportion of all PSU students.
• The relation between a bear's chest girth and weight in a sample of 10 bears will differ from the corresponding relation in the entire population of 50 bears in a forested region.


The GOOD NEWS is that, if the sample is suitably drawn, then sample properties approximate the population properties.

[Figure: Population and sample relationships between chest girth and weight of black bears. Axes: Chest Girth, Weight.]


Sampling Variability

Sample properties of the characteristic of interest also differ from sample to sample. For example:

1 The number of US citizens, in a sample of size 20, who favor expanding solar energy will (most likely) be different from the corresponding number in a different sample of 20 US citizens.

2 The average mercury concentration in two sets of 25 water samples drawn from a lake will differ.

The term sampling variability is used to describe such differences in the characteristic of interest from sample to sample.
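Sampling variability is easy to see in a simulation. The sketch below is only an illustration: the population is a set of 10,000 simulated values standing in for concentration measurements (the exponential distribution, its rate and the seed are arbitrary choices made here, not part of the lecture).

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Hypothetical statistical population: 10,000 simulated concentration values
population <- rexp(10000, rate = 2)

# Two independent simple random samples of size 25
sample1 <- sample(population, 25)
sample2 <- sample(population, 25)

mean(population)  # population average (a parameter)
mean(sample1)     # average of the first sample (a statistic)
mean(sample2)     # average of the second sample; differs from the first
```

Both sample averages should be close to the population average, but neither equals it and they do not equal each other.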


[Figure: Illustration of Sampling Variability. Axes: Chest Girth, Weight.]


Population-level properties/attributes of the characteristic(s) of interest are called (population) parameters.

Examples of parameters include averages, proportions, percentiles, and correlation coefficients.

• The corresponding sample properties/attributes of characteristics are called statistics.
• Sample statistics approximate the corresponding population parameters but are not equal to them.
• Statistical inference deals with the uncertainty issues which arise in approximating parameters by statistics.
• The tools of statistical inference include point and interval estimation, hypothesis testing and prediction.


Example (Examples of Estimation, Hypothesis Testing and Prediction)

• Estimation (point and interval) would be used in the task of estimating the coefficient of thermal expansion of a metal, or the air pollution level.
• Hypothesis testing would be used for deciding whether to take corrective action to bring the air pollution level down, or whether a manufacturer's claim regarding the quality of a product is false.
• Prediction arises in cases where we would like to predict the failure time on the basis of the stress applied, or the age of a tree on the basis of its trunk diameter.


• For valid statistical inference the sample must be representative of the population. For example, a sample of PSU basketball players is not representative of PSU students, if the characteristic of interest is height.
• Typically it is hard to tell whether a sample is representative of the population. So, we define a sample to be representative if it allows for valid statistical inference (a circular definition!).
• The only guarantee for that comes from the method used to select the sample (sampling method).
• The good news is that there are several sampling methods that guarantee representativeness.


Definition
A sample of size n is a simple random sample (s.r.s.) if the selection process ensures that every sample of size n has an equal chance of being selected.

In simple random sampling every member of the population has the same chance of being included in the sample. The reverse, however, is not true.

Example
To select a sample of 2 students from a population of 20 male and 20 female students, one selects at random one male and one female student. Is this a s.r.s.? (Does every student have the same chance of being included in the sample?)
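The counting behind this example can be checked with a few lines of R. This is a sketch of the arithmetic only; the variable names are introduced here for illustration.

```r
N <- 40   # population size: 20 male and 20 female students
n <- 2    # sample size

# Number of possible samples of size 2 under simple random sampling
n_srs <- choose(N, n)              # 780

# Number of samples that the one-male-one-female scheme can produce
n_scheme <- 20 * 20                # 400

# Chance that a given student is included in the sample
p_incl_srs    <- n / N             # 0.05 under simple random sampling
p_incl_scheme <- 1 / 20            # 0.05 under the scheme

# Samples of size 2 that the scheme can never produce (same-sex pairs)
n_impossible <- n_srs - n_scheme   # 380
```

Every student has the same chance of inclusion under both schemes, yet 380 of the 780 possible samples cannot occur under the one-male-one-female scheme, so not every sample of size 2 is equally likely.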


Another sampling method for obtaining a representative sample is called stratified sampling.

Definition
A stratified sample consists of simple random samples from each of a number of groups (which are non-overlapping and make up the entire population) called strata.

• Examples of strata include: ethnic groups, age groups, and production facilities.
• If the units in the different strata differ in terms of the characteristic under study, stratified sampling is preferable to s.r.s. For example, if different production facilities differ in terms of the proportion of defective products, a stratified sample is preferable.
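A stratified sample can be drawn in R by taking a simple random sample within each stratum. The sketch below assumes a hypothetical population of 300 items from three production facilities labeled A, B and C; the data frame, the stratum sizes and the per-stratum sample size of 10 are all made up for illustration.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Hypothetical population: 300 items from three production facilities (strata)
items <- data.frame(
  id       = 1:300,
  facility = rep(c("A", "B", "C"), times = c(150, 100, 50))
)

# Simple random sample of 10 item ids within each stratum
by_stratum <- split(items$id, items$facility)
strat_ids  <- unlist(lapply(by_stratum, function(ids) sample(ids, 10)))

# Rows of the population that were selected
strat_sample <- items[items$id %in% strat_ids, ]
table(strat_sample$facility)   # 10 items from each of A, B and C
```

Taking the same number of units from each stratum is only one possible design; sample sizes proportional to stratum sizes are another common choice.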


How do we select a s.r.s. of size n from a population of N units?

STEP 1: Assign to each unit a number from 1 to N.
STEP 2: Write each number on a slip of paper, place the N slips of paper in an urn, and shuffle them.
STEP 3: Select n slips of paper at random, one at a time.

Alternatively, the entire process can be performed in software like R. We will see this in the next lab session.
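As a preview, here is a minimal sketch of how the three steps might look in base R using sample(); the values N = 50 and n = 10 and the seed are arbitrary choices for illustration.

```r
set.seed(3)  # arbitrary seed for reproducibility

N <- 50   # hypothetical population size
n <- 10   # desired sample size

# Steps 1-3 in one call: draw n of the labels 1, ..., N without replacement
srs_labels <- sample(1:N, n)
srs_labels
```

The units whose labels appear in srs_labels form the simple random sample.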


• Sampling without replacement simply means that a population unit can be included in a sample at most once. For example, a simple random sample is obtained by sampling without replacement: once a unit's slip of paper is drawn, it is not placed back into the urn.
• Sampling with replacement means that after a unit's slip of paper is chosen, it is put back in the urn. Thus a population unit could be included in the sample anywhere between 0 and n times. Rolling a die can be thought of as sampling with replacement from the numbers 1, 2, ..., 6.
• Though conceptually undesirable, sampling with replacement is easier to work with from a mathematical point of view.
• When a population is very large, sampling with and without replacement are practically equivalent.
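The difference between the two schemes, and why it fades for large populations, can be sketched in R. The helper p_no_repeat() is a name introduced here for illustration; it computes the chance that n draws with replacement from N units happen to contain no repeated unit.

```r
set.seed(4)  # arbitrary seed for reproducibility

# Without replacement (the default of sample): no label can appear twice
sample(1:6, 3)

# With replacement: ten rolls of a die
rolls <- sample(1:6, 10, replace = TRUE)

# Chance that n draws with replacement from N units contain no repeats:
# (N/N) * ((N-1)/N) * ... * ((N-n+1)/N)
p_no_repeat <- function(N, n) prod((N - 0:(n - 1)) / N)

p_small <- p_no_repeat(6, 3)      # 6/6 * 5/6 * 4/6 = 5/9, about 0.556
p_large <- p_no_repeat(1e6, 10)   # very close to 1
```

When N is a million and n is 10, a sample drawn with replacement almost never contains a repeat, which is the sense in which the two schemes are practically equivalent.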


• Non-representative samples arise whenever the sampling plan is such that a part, or parts, of the population of interest are either excluded from, or systematically under-represented in, the sample. This is called selection bias.
• Two examples of non-representative samples are self-selected and convenience samples.
• A self-selected sample often occurs when people are asked to send in their opinions in surveys or questionnaires. For example, in a political survey, often those who feel that things are running smoothly or who support an incumbent will (apathetically) not respond, whereas those activists who strongly desire change will voice their opinions.


• Convenience samples are made up from the most easily accessed units. For example, randomly selecting students from your classes will not result in a sample that is representative of all PSU students since your classes are mostly composed of students with the same major as you.

Example (The Literary Digest poll of 1936)
The magazine had been extremely successful in predicting the results in US presidential elections, but in 1936 it predicted a 3-to-2 victory for Republican Alf Landon over the Democratic incumbent Franklin Delano Roosevelt. Worth noting is that this prediction was based on 2.3 million responses (out of 10 million questionnaires sent). On the other hand, Gallup correctly predicted the outcome of that election by surveying only 50,000 people.


Variable = a Numerical Characteristic

If the characteristic of interest can be measured and expressed as a number, e.g. thermal expansion of a metal, hardness of cement, mercury concentration, or number of accidents, it is called quantitative.

Examples of non-quantitative characteristics are gender, make of car, eye color, strength category, political affiliation. Such characteristics are called categorical or qualitative.

Because statistical procedures are applied to numerical data sets, the categories of a categorical characteristic are labeled with arbitrarily chosen numbers (e.g. 'male' = −1, 'female' = +1).

A characteristic expressed as a number is called a variable.


Types of Variables

Qualitative variables are a particular kind of discrete variable. Quantitative variables can also be discrete.

All variables expressing counts, such as the number of earthquakes, the number of fish caught, etc., are discrete.

Quantitative variables expressing measurements on a continuous scale are examples of continuous variables.

Measurements of length, strength, weight, or time to failure are examples of continuous variables.

When two or more characteristics are measured on each population unit, we have bivariate or multivariate variables.

Example of bivariate: Salary increase and productivity.
Example of multivariate: Age, income, education level.


Random Variables

• When a unit is randomly sampled from a population, the value of its variable will be denoted by X (or Y, or Z, etc.).
• Because of the intrinsic variability, X is not known a priori and thus it is called a random variable (r.v.).
• The population from which a random variable is drawn is called the underlying population of the r.v.
• The collection of the variable values of all population units is called the statistical population.
• The statistical population of a r.v. is NOT the same as the set of values a variable can take.


Example
1 A list of the weight of every PSU student is the statistical population of the r.v. weight.
2 A list of 1s and 0s representing every student's opinion on whether solar energy should be expanded is the statistical population of the r.v. expressing opinion on solar energy.


Sampling from the Statistical Population

It should be intuitively clear that taking a sample of n units from some population and recording the variable of each sampled unit is equivalent to taking a sample of n units from the statistical population of the random variable.

Henceforth, the word sample will also be used to denote a sample from the statistical population. Regarding such a sample:

1 It consists of units of the statistical population, i.e. numbers.
2 The numbers are not known a priori, so they are r.v.'s.
3 A sample of size n will be denoted by $X_1, X_2, \ldots, X_n$.


Histograms and Stem and Leaf Plots

• In histograms the range of the data is divided into bins, and a box is constructed above each bin.
• The height of each box is the bin's frequency. Alternatively, the heights can be adjusted so the histogram's area is one.
• R will automatically choose the number of bins but it also allows user-specified intervals. Moreover, R offers the option of constructing a smooth histogram.
• In stem and leaf plots each observation gets split into its stem, which is the beginning digit(s), and its leaf, which is the first of the remaining digits.
• Stem and leaf plots retain more information about the original data but do not offer as much flexibility in selecting the bins.


The R data set faithful

• The histogram, with superimposed smooth histogram, for a sample of 272 eruption durations from the Old Faithful geyser is shown in http://stat.psu.edu/~mga/401/fig/HistOldFaith1.pdf
• The stem and leaf plot for the same data set is shown in http://stat.psu.edu/~mga/401/fig/StemLeaf.pdf
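Plots of this kind can be produced with base R, since faithful ships with R. The following is a sketch under that assumption; the titles and labels are choices made here, and the linked figures may have been drawn with different settings.

```r
x <- faithful$eruptions   # eruption durations (minutes) from the built-in data set
length(x)                 # 272

# Histogram scaled so that its total area is one, with a smooth
# density estimate (the "smooth histogram") drawn on top
h <- hist(x, freq = FALSE,
          main = "Old Faithful eruption durations",
          xlab = "Duration (minutes)")
lines(density(x))

# Stem and leaf plot of the same data
stem(x)
```

With freq = FALSE the bar heights are densities, so the bar areas (height times bin width) add up to one; user-specified bins can be supplied through the breaks argument of hist().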


Scatterplots

• The basic scatterplot is useful for exploring the relationship between two variables. An enhanced version identifies subclasses of data. See http://stat.psu.edu/~mga/401/fig/BearsChG_W_by_S.pdf
• A scatterplot matrix is a matrix of scatterplots for all pairs of variables in a data set. See http://stat.psu.edu/~mga/401/fig/BearMeas_by_S.pdf. It helps identify the best single predictor of weight.
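The bear measurements are not bundled with R, so the sketch below uses the built-in iris data as a stand-in (an assumption made here purely for illustration) to show a basic scatterplot, an enhanced version that identifies subclasses by color, and a scatterplot matrix.

```r
# Integer code (1, 2, 3) for the subclass (species) of each observation
species_col <- as.integer(iris$Species)

# Basic scatterplot of two variables
plot(iris$Petal.Length, iris$Petal.Width,
     xlab = "Petal length", ylab = "Petal width")

# Enhanced version: color identifies the subclass
plot(iris$Petal.Length, iris$Petal.Width, col = species_col,
     xlab = "Petal length", ylab = "Petal width")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 1)

# Scatterplot matrix for all pairs of the four measurements
pairs(iris[, 1:4], col = species_col)
```

In the scatterplot matrix, the panel in which the points fall closest to a curve or line against the response of interest suggests the best single predictor.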


Scatterplots Continued

• Scatterplots with marginal histograms show histograms of the two variables in the margins of the scatterplot. See http://stat.psu.edu/~mga/401/fig/BearMeas_with_MarginalHist.pdf
• 3D scatterplots are useful for exploring the relationship between three variables. For example, http://stat.psu.edu/~mga/401/fig/TempProdElect2.pdf gives a three-dimensional view of the joint effect of temperature and production volume on electricity consumed.


Pie Charts and Bar Graphs

• Pie charts and bar graphs are used with count data to display the proportion of each category in a sample.
• The pie chart is popular in the mass media and one of the most widely used statistical charts in the business world.
• It is a circular chart, where the circle is divided into sections whose areas represent proportions.
• The pie chart in http://www.stat.psu.edu/~mga/401/fig/LvMsPie.pdf displays information on the November 2011 light vehicle market share of car companies (source: http://wardsauto.com/keydata/USSalesSummary0702.xls).


• According to Stevens' power law, bar lengths are better than section areas for comparing the different proportions.
• Bar graphs resemble histograms, with the heights of the bars equal to the proportion of each category. The bar graph display of the November 2011 light vehicle market share data is shown in http://stat.psu.edu/~mga/401/fig/LvMsBar2.pdf.

Remark: When the bars are arranged in decreasing order of height, the bar graph is also called a Pareto chart. The Pareto chart is one of the key tools used in quality control, where it is often used to represent the most common sources of defects in a manufacturing process, or the most frequent reasons for customer complaints, etc. [Google Pareto principle]
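The three displays can be sketched in base R as follows. The counts are hypothetical values for four made-up categories A to D; they are not the market share data referred to above.

```r
# Hypothetical counts for four categories (not the market share data)
counts <- c(A = 30, B = 50, C = 15, D = 5)
props  <- counts / sum(counts)   # proportions 0.30, 0.50, 0.15, 0.05

# Pie chart: section areas represent the proportions
pie(props, main = "Pie chart")

# Bar graph: bar heights represent the proportions
barplot(props, ylab = "Proportion", main = "Bar graph")

# Pareto chart: the same bars arranged in decreasing order of height
pareto <- sort(props, decreasing = TRUE)
barplot(pareto, ylab = "Proportion", main = "Pareto chart")
```

Comparing the first two plots side by side shows why bar lengths are easier to rank than section areas.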


The Most Common Parameters

For a univariate statistical population these are:

• The proportion. For example, the proportion of Honda Accords that will require warranty repair work in 36,000 miles.
• The average. For example, the average failure time at a given stress level.
• The variance and standard deviation. These parameters quantify the intrinsic variability.
• The median and other percentiles. Can be used to quantify both location and variability.


Proportions are relevant whenever the variable of interest is categorical, or has been categorized.

Definition
1 If the population has $N$ units, and $N_i$ units are in category $i$, then the population proportion for category $i$ is

$$p_i = \frac{\#\{\text{population units of category } i\}}{\#\{\text{population units}\}} = \frac{N_i}{N}.$$

2 If a sample of size $n$ is taken, and $n_i$ sample units are in category $i$, then the sample proportion for category $i$ is

$$\hat{p}_i = \frac{\#\{\text{sample units of category } i\}}{\#\{\text{sample units}\}} = \frac{n_i}{n}.$$


Example
1 In a sample of 1000 adults, 72% favor tougher penalties for drunk driving. Is the correct notation for 0.72 $p$ or $\hat{p}$?
2 In a population of 80 engineering majors taking a required statistics class, 40 are enthusiastic about having computer labs. In a simple random sample of 20 of these students, 8 are enthusiastic. What is the correct notation for 40/80 = 0.5 and for 8/20 = 2/5?

Always remember that, under simple random sampling, $\hat{p}$ approximates, but in general is different from, $p$.
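The relation between the population and sample proportions can be explored by simulation. The sketch below encodes the 80 engineering majors as 1s (enthusiastic) and 0s (not enthusiastic); the coding, the seed and the variable names are choices made here for illustration.

```r
set.seed(5)  # arbitrary seed for reproducibility

# Statistical population of 80 students: 1 = enthusiastic, 0 = not (40 of each)
population <- c(rep(1, 40), rep(0, 40))
p <- mean(population)           # population proportion: 40/80 = 0.5

# Simple random sample of 20 students
smp   <- sample(population, 20)
p_hat <- mean(smp)              # sample proportion; changes from sample to sample
```

Rerunning the last two lines without resetting the seed gives a new sample and, typically, a different value of p_hat, while p stays fixed at 0.5.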


Consider a population of $N$ units, and let $v_1, v_2, \ldots, v_N$ denote the statistical population corresponding to some variable. Then the population average or population mean, denoted by $\mu$, is the arithmetic average of all values in the statistical population. Thus,
\[ \mu = \frac{1}{N}\sum_{i=1}^{N} v_i. \]
If the random variable $X$ denotes the value of the variable of a randomly selected population unit, then a synonymous terminology for the population mean is expected value of $X$, or mean value of $X$, and is denoted by $\mu_X$ or $E(X)$.


Example
In a population of 500 tin plates, the number of plates with 0, 1 and 2 scratches is $N_0 = 190$, $N_1 = 160$ and $N_2 = 150$. Thus, in the statistical population $v_1, \ldots, v_{500}$, 190 of the $v_i$ equal 0, 160 equal 1, and 150 equal 2. The population mean is
\[ \mu = \frac{1}{500}\sum_{i=1}^{500} v_i = \frac{0 \times N_0}{500} + \frac{1 \times N_1}{500} + \frac{2 \times N_2}{500} = 0.92. \]
If a tin plate is selected at random and $X$ is the rv denoting the number of scratches, the mean value of $X$ is 0.92 and we write $\mu_X = 0.92$, or $E(X) = 0.92$.
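The calculation in the tin plate example can be checked in R. This is a minimal sketch; the object names (scratches, counts, v, mu, mu_weighted) are illustrative choices, not notation from the lesson.

```r
# Number of scratches and how many of the 500 plates have that many scratches.
scratches <- c(0, 1, 2)
counts    <- c(190, 160, 150)

# Build the statistical population v_1, ..., v_500 explicitly and average it.
v  <- rep(scratches, times = counts)
mu <- mean(v)

# Equivalent shortcut: weight each value by its count, as in the displayed formula.
mu_weighted <- sum(scratches * counts) / sum(counts)

mu
mu_weighted
```

Both expressions give the population mean 0.92 from the example.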


If a sample of size $n$ is taken, and $x_1, x_2, \ldots, x_n$ denote the variable values of the sample units, then the sample average or sample mean, denoted by $\bar{x}$, is
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. \]
Under s.r. sampling, a sample mean approximates, but in general is different from, the population mean.

Example
If a s.r. sample of $n = 100$ is taken from the 500 tin plates, it could be that there are $n_0 = 40$, $n_1 = 34$ and $n_2 = 26$ plates with 0, 1 and 2 scratches. In this case, $\bar{x} = 0.86$.
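As a sketch of the sample-mean example in R: the first part reproduces the sample described above, and the second part draws a new s.r. sample from the population of 500 plates. The object names and the seed are assumptions made for illustration.

```r
# Sample from the example: 40, 34 and 26 plates with 0, 1 and 2 scratches.
x <- rep(c(0, 1, 2), times = c(40, 34, 26))
x_bar <- mean(x)          # (0*40 + 1*34 + 2*26) / 100
x_bar

# A new simple random sample of 100 plates from the population of 500 plates.
population <- rep(c(0, 1, 2), times = c(190, 160, 150))
set.seed(401)             # arbitrary seed, only for reproducibility
x_new <- sample(population, size = 100)
mean(x_new)               # generally close to, but different from, mu = 0.92
```

The first sample mean is the 0.86 of the example; the second depends on which plates happen to be selected.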


Proportions are Averages!

A proportion is a special case of a mean. To see this:

Consider the example with the tin plates, where $N_1 = 160$ out of $N = 500$ have one scratch, and let the variable $X$ take the value 1 if a tin plate has one scratch and the value 0 otherwise.
Note that for the statistical population, $v_1, \ldots, v_{500}$, of this variable, 160 of the $v_i$ are equal to 1 and 340 are equal to 0. Thus,
\[ \mu_X = \frac{160}{500} = 0.32, \quad \text{which equals } p = \frac{N_1}{N}. \]
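The same identity can be seen numerically. The sketch below assumes the tin plate population is stored as a vector of scratch counts; the names v and x are illustrative.

```r
# Tin plate population: 190, 160 and 150 plates with 0, 1 and 2 scratches.
v <- rep(c(0, 1, 2), times = c(190, 160, 150))

# Indicator variable: 1 if the plate has exactly one scratch, 0 otherwise.
x <- as.numeric(v == 1)

mean(x)                   # mean of the indicator variable
sum(v == 1) / length(v)   # proportion N_1 / N of plates with one scratch
```

Both lines return 0.32, since averaging 0/1 values is the same as counting the ones and dividing by the number of units.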


Let $v_1, v_2, \ldots, v_N$ be a statistical population with mean $\mu$.

Definition
The population variance, $\sigma^2$, is defined as
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (v_i - \mu)^2. \]
The standard deviation is the positive square root of the variance: $\sigma = \sqrt{\sigma^2}$.

If the rv $X$ denotes a randomly selected value from the statistical population, then a synonymous terminology for the population variance is variance of $X$, and is denoted by $\sigma_X^2$, or $\mathrm{Var}(X)$. The standard deviation of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.


A simpler computational formula for the variance is
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} v_i^2 - \mu^2. \]

Example
Consider the tin plate example, so the statistical population $v_1, \ldots, v_{500}$ has 190 of the $v_i$ equal to 0, 160 equal to 1, 150 equal to 2, and $\mu = 0.92$. Then,
\[ \sigma^2 = \frac{0 \times 190}{500} + \frac{1 \times 160}{500} + \frac{4 \times 150}{500} - 0.92^2 = 0.6736. \]
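A short R check that the definition and the computational formula agree for the tin plate population. The object names are illustrative. Note that R's built-in var() divides by n − 1, so it is deliberately not used here for a population variance.

```r
# Tin plate population: 190, 160 and 150 plates with 0, 1 and 2 scratches.
v  <- rep(c(0, 1, 2), times = c(190, 160, 150))
mu <- mean(v)

# Definition: average squared deviation from the population mean.
sigma2_def <- mean((v - mu)^2)

# Computational formula: average of the squares minus the square of the mean.
sigma2_short <- mean(v^2) - mu^2

sigma2_def
sigma2_short
sqrt(sigma2_def)   # population standard deviation
```

Both variance expressions evaluate to 0.6736, matching the example.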


If $x_1, x_2, \ldots, x_n$ denotes a sample from the statistical population, the sample variance and its computational formula are:
\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]. \]
The sample standard deviation is $S = \sqrt{S^2}$. Under s.r. sampling, $S^2$ approximates, but in general is different from, $\sigma^2$.

Example
Consider the s.r. sample of $n = 100$ tin plates, which has 40, 34 and 26 plates with 0, 1 and 2 scratches. Then,
\[ S^2 = \frac{1}{99}\left[138 - 73.96\right] = 0.647. \]
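The sample variance of the example can be reproduced in R, where var() uses the divisor n − 1. This is a sketch; the object names are illustrative.

```r
# Sample from the example: 40, 34 and 26 plates with 0, 1 and 2 scratches.
x <- rep(c(0, 1, 2), times = c(40, 34, 26))
n <- length(x)

sum(x^2)       # 138, the first term inside the brackets
sum(x)^2 / n   # 73.96, the second term inside the brackets

# Computational formula and the built-in function (both divide by n - 1).
s2_formula <- (sum(x^2) - sum(x)^2 / n) / (n - 1)
s2_builtin <- var(x)

s2_formula
s2_builtin
sd(x)          # sample standard deviation S
```

Both values agree with the 0.647 of the example after rounding to three decimals.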


Why Divide by n − 1?

Because this assures that the average of the sample variances resulting from all possible samples is equal to the population variance.

Example
The variance of the population {0,1}, which corresponds to tossing a fair coin, is 0.25 (why?). The possible samples of size two, taken with replacement, are {0,0}, {0,1}, {1,0}, {1,1}. Verify that the four sample variances average to 0.25.
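The verification asked for in the coin example can be sketched in R by listing all four samples. The object names are illustrative, and the comparison with the divisor n is added here only to show what goes wrong without the n − 1.

```r
# All samples of size two, drawn with replacement from the population {0, 1}.
samples <- expand.grid(first = c(0, 1), second = c(0, 1))

# Sample variance of each sample with divisor n - 1 (this is what var() computes).
s2_n_minus_1 <- apply(samples, 1, var)

# The same quantity with divisor n instead, for comparison.
s2_n <- apply(samples, 1, function(x) mean((x - mean(x))^2))

pop_var <- mean((c(0, 1) - 0.5)^2)   # population variance of {0, 1}

pop_var
mean(s2_n_minus_1)   # average over the four samples, divisor n - 1
mean(s2_n)           # average over the four samples, divisor n
```

With divisor n − 1 the four sample variances (0, 0.5, 0.5, 0) average to 0.25, the population variance; with divisor n they average to 0.125, which underestimates it.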


Roughly speaking, the $(1 - \alpha)100$th sample percentile separates the part having the $(1 - \alpha)100\%$ smaller values from that which has the $\alpha 100\%$ larger values. Thus:

The 90th sample percentile separates the largest 10% from the lower 90% of the values in the data set.
The 50th sample percentile is also called the sample median.
The 25th, the 50th and the 75th sample percentiles are also called sample quartiles. The 25th and 75th percentiles are the lower quartile and upper quartile, respectively.
The distance between the lower and upper quartiles is called the interquartile range or IQR.


Order Statistics as Sample Percentiles

Let $X_1, \ldots, X_n$ be a s.r. sample from a continuous distribution. The ordered sample values are denoted
\[ X_{(1)}, X_{(2)}, \ldots, X_{(n)}. \]
Thus, $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$.
$X_{(i)}$, the $i$th smallest sample value, is defined to be the $\left[100\,\frac{i - 0.5}{n}\right]$-th sample percentile.


Example
A s.r.s. of 10 black bears' weights is: 154 158 356 446 40 154 90 94 150 142. Give the order statistics, and state the population percentiles they estimate.
Solution: The R command

sort( c(154, 158, 356, 446, 40, 154, 90, 94, 150, 142) )

returns the order statistics: 40, 90, 94, 142, 150, 154, 154, 158, 356, 446. These order statistics estimate the 5th, 15th, 25th, 35th, 45th, 55th, 65th, 75th, 85th and 95th population percentiles, respectively. For example, $X_{(3)} = 94$ is the $100(3 - 0.5)/10 = 25$th sample percentile and estimates the corresponding population percentile. [In R the percentiles are obtained with: 100*(1:10 - 0.5)/10.]


In the above example none of the order statistics corresponds to the median or the 90th percentile. In general, if $n$ is even, none of the order statistics corresponds to the median. For example,

If $n = 5$ then $X_{(3)}$, the 3rd smallest value, is the $100\,\frac{2.5}{5} = 50$th sample percentile or median.
If $n = 4$ then $X_{(2)}$ is the $100\,\frac{1.5}{4} = 37.5$th sample percentile, while $X_{(3)}$ is the $100\,\frac{2.5}{4} = 62.5$th sample percentile. Thus, none of the ordered values is the median.

Depending on $n$, the above definition may not identify other percentiles of interest. In such cases, we use interpolation.


Percentiles in R

R uses a different interpolation algorithm for evaluating sample percentiles from a given data set. With the data set in the object x, the R commands for percentiles

median(x)
quantile(x,0.25)
quantile(x,c(0.3,0.7,0.9))
summary(x)

give, respectively, the median, the 25th percentile, the 30th, 70th and 90th percentiles, and a five number summary of the data consisting of $x_{(1)}$, $q_1$, $\tilde{x}$, $q_3$, and $x_{(n)}$ (the output of summary(x) also includes the sample mean).


Example
Using the previous sample of 10 black bear weights, estimate the population median, 70th, 80th and 90th percentiles.
Solution: With the sample values in the object w, i.e.

w=c(154, 158, 356, 446, 40, 154, 90, 94, 150, 142)

the R command

quantile(w,c(0.5, 0.7, 0.8, 0.9))

returns 152.0, 155.2, 197.6, 365.0 for the sample median, 70th, 80th and 90th percentiles, respectively.
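To see how R's default interpolation differs from the order-statistic definition, where X(i) is the 100(i − 0.5)/n-th sample percentile, the sketch below compares the default of quantile() with its type = 5 option. According to the R documentation, type = 5 places the i-th order statistic at probability (i − 0.5)/n; using it here to mirror the lesson's definition is an illustrative choice, not something the lesson prescribes.

```r
w <- c(154, 158, 356, 446, 40, 154, 90, 94, 150, 142)

# Default algorithm (type = 7): interpolates between X(3) = 94 and X(4) = 142.
q25_default <- quantile(w, 0.25)

# type = 5: the 25th percentile is exactly the 3rd order statistic.
q25_type5 <- quantile(w, 0.25, type = 5)

q25_default
q25_type5
sort(w)[3]

# With type = 5, the percentiles 5, 15, ..., 95 recover the order statistics.
probs <- (1:10 - 0.5) / 10
all_type5 <- quantile(w, probs, type = 5)
all_type5
```

For these data the default returns 106 for the 25th percentile, while type = 5 returns 94, the value given by the order-statistic definition.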


The five number summary of the data given by the "summary(x)" command in R is the basis for the boxplot.

A boxplot displays the central 50% of the data with a box:
the lower and upper edges are at $q_1$ and $q_3$, respectively;
a line inside the box represents the median.

The lower 25% and upper 25% of the data are represented by lines (or whiskers) which extend from each edge of the box:
The lower (upper) whisker extends from $q_1$ ($q_3$) until the smallest (largest) observation within 1.5 interquartile ranges from $q_1$ ($q_3$).
Observations further from the box than the whisker ends (i.e. smaller than $q_1 - 1.5 \times \mathrm{IQR}$ or larger than $q_3 + 1.5 \times \mathrm{IQR}$) are called outliers, and are plotted individually.

See http://sites.stat.psu.edu/~mga/401/fig/BoxplotOzoneR.pdf
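The whisker and outlier rules can be worked through by hand in R. The sketch below applies them to the black bear weights from the earlier example, taking q1 and q3 from quantile(); the object names are illustrative. One caveat: R's boxplot() computes the box edges as hinges (see fivenum()) rather than with quantile(), so the box it draws may differ slightly from these values.

```r
w <- c(154, 158, 356, 446, 40, 154, 90, 94, 150, 142)

q1  <- unname(quantile(w, 0.25))
q3  <- unname(quantile(w, 0.75))
iqr <- q3 - q1

# Limits beyond which observations are flagged as outliers.
lower_limit <- q1 - 1.5 * iqr
upper_limit <- q3 + 1.5 * iqr

# Whiskers end at the most extreme observations that are still within the limits.
inside       <- w[w >= lower_limit & w <= upper_limit]
whisker_ends <- range(inside)

# Observations outside the limits are plotted individually.
outliers <- sort(w[w < lower_limit | w > upper_limit])

c(q1 = q1, q3 = q3, IQR = iqr)
whisker_ends
outliers
```

Here q1 = 106, q3 = 157 and IQR = 51, so the whiskers end at 40 and 158, and the two heaviest bears (356 and 446) are outliers. A call such as boxplot(w) draws the corresponding plot.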


Example
Scientists have been monitoring the ozone hole since 1980. See the images shown in http://ozonewatch.gsfc.nasa.gov/. The 14 ozone measurements (Dobson units) given in http://stat.psu.edu/~mga/401/Data/OzoneData.txt were taken in 2002 from the lower stratosphere, between 9 and 12 miles altitude. Give the five number summary of this data and construct the box plot.
Solution: Read the data into the R object oz using


oz=read.table("http://stat.psu.edu/~mga/401/Data/OzoneData.txt", header=T)

Then, use the command summary(oz) (or summary(oz$OzoneData)) to get the five number summary of this data. For the boxplot use boxplot(oz, col="grey"), or boxplot(oz$OzoneData, col="grey").


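Hand Calculation of Sample Median

Definition
Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ denote the ordered sample values in a sample of size $n$. The sample median is defined as
\[ \tilde{X} = \begin{cases} X_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\[4pt] \dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even} \end{cases} \]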


Example (Relation Between $\tilde{X}$ and $\bar{X}$)
Find the sample median of $X_1 = 2.3$, $X_2 = 3.2$, $X_3 = 1.8$, $X_4 = 2.5$, $X_5 = 2.7$.
Solution. We first order the values from smallest to largest:

$X_{(1)} = 1.8$, $X_{(2)} = 2.3$, $X_{(3)} = 2.5$, $X_{(4)} = 2.7$, $X_{(5)} = 3.2$.

Since the sample size is odd, $\tilde{X} = X_{\left(\frac{n+1}{2}\right)} = X_{(3)} = 2.5$.

For this data, $\bar{X} = \tilde{X} = 2.5$.
If $X_{(5)}$ is changed to 4.2, then $\bar{X} = 2.7$ but $\tilde{X} = 2.5$. Thus $\bar{X}$ is affected by outliers, whereas $\tilde{X}$ is not.
In general, if the histogram of the data is positively skewed, $\bar{X} > \tilde{X}$, and if it is negatively skewed, $\bar{X} < \tilde{X}$.
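The effect of the changed observation on the mean and the median can be checked in R. The object names x and x_changed are illustrative.

```r
x <- c(2.3, 3.2, 1.8, 2.5, 2.7)
mean(x)     # sample mean of the original data
median(x)   # sample median of the original data

# Replace the largest value, X(5) = 3.2, by 4.2.
x_changed <- replace(x, which.max(x), 4.2)
mean(x_changed)     # the mean moves up
median(x_changed)   # the median is unchanged
```

The mean moves from 2.5 to 2.7 while the median stays at 2.5, as stated in the example.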


Hand Calculation of Sample Quartiles and Sample IQR

Definition
The sample lower quartile or SLQ is defined as
the median of the smallest $n/2$ values, if $n$ is even
the median of the smallest $(n + 1)/2$ values, if $n$ is odd

The sample upper quartile or SUQ is defined as
the median of the largest $n/2$ values, if $n$ is even
the median of the largest $(n + 1)/2$ values, if $n$ is odd

The sample interquartile range, or SIQR, is defined as
\[ \mathrm{SIQR} = \mathrm{SUQ} - \mathrm{SLQ}. \]


Example
Find the lower and upper quartiles of the $n = 9$ observations 9.39, 7.04, 7.17, 13.28, 9.00, 7.46, 21.06, 15.19, 7.50.
Solution. Since $n$ is odd, the SLQ is the median of the

smallest 5 (= $(n + 1)/2$) values: 7.04, 7.17, 7.46, 7.50, 9.00

and the SUQ is the median of the

largest 5 (= $(n + 1)/2$) values: 9.00, 9.39, 13.28, 15.19, 21.06.

Thus SLQ = 7.46, and SUQ = 13.28.
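The hand-calculation rule can be written as a small R function. This is a sketch: hand_quartiles is a hypothetical helper name, not a built-in R function, and it relies on the fact that ceiling(n/2) equals n/2 for even n and (n + 1)/2 for odd n.

```r
# Hypothetical helper implementing the hand-calculation definition of SLQ, SUQ and SIQR.
hand_quartiles <- function(x) {
  s <- sort(x)
  n <- length(s)
  k <- ceiling(n / 2)                 # n/2 if n is even, (n + 1)/2 if n is odd
  slq <- median(s[1:k])               # median of the k smallest values
  suq <- median(s[(n - k + 1):n])     # median of the k largest values
  c(SLQ = slq, SUQ = suq, SIQR = suq - slq)
}

obs <- c(9.39, 7.04, 7.17, 13.28, 9.00, 7.46, 21.06, 15.19, 7.50)
res <- hand_quartiles(obs)
res

# For comparison: the 2nd and 4th entries of fivenum() are the lower and upper hinges.
fivenum(obs)[c(2, 4)]
```

For these nine observations the function returns SLQ = 7.46 and SUQ = 13.28, as in the example, and the hinges reported by fivenum() take the same values. The default quantile() can give different quartiles because it interpolates differently.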


Go to next lesson: http://stat.psu.edu/~mga/401/course.info/lesson2.pdf

Go to the Stat 401 home page: http://stat.psu.edu/~mga/401/course.info/
