Class Schedule Class next week Holiday the next week (Monday schedule) Exam the next week So only...

Preview:

Citation preview

Class Schedule

• Class next week

• Holiday the next week (Monday schedule)

• Exam the next week

• So only one more class before the first exam

• Will cover Chapters 6, 3, 2. Only material included in class as shown in Course Outline.

Class 3

• Return homewk from first class• Get soc sec # from 2 students• Review Assignment 2• Collect Assignment 2• Quiz

• Continue Ungrouped Data• Start Grouped Data

Very Important Hint for Excel

• I will be putting models for calculation on the web site.

• You will see the values, not the formulas

• BUT, you can change to view the formulas

• Hold down control and hit the accent key.

• It’s the same key, upper left of the keyboard, that we use for the tilde ~.

Copying Formulas

• To copy to the next cells, highlight the cell and drag the little box in the lower right of the cell.

• Remember that the cells to be used in the formula will be adjusted

• To copy the value in the cell not the location, use a $ before the column letter or before the row number or before each.

Formatting Excel

• To make a column wide enough for the label or values, double-click on the right border

• To highlight cells adjacent to each other, click on one, hold down shift key and move to the other cells.

• To underline a cell, not just the letters in it, use the little box on the toolbar

• To merge several cells and center the label, use the little box with 2 arrows

Excel Doing All the Work

• You now know the summation sign on the toolbar. Click on the little arrow next to it. We will find lots of “functions” there.

• Find the Mean. Click on an empty cell, probably the one just under the mean you already calculated. Click on the functions arrow.

• Click Average & highlight the column of ages, Enter.

• You should have the same answer.

To Calculate the Variance

• Put ages in a column & get the sum

• Make a labeled cell for n

• Make a labeled cell for n-1

• Put label “Mean” & under it put

=, click on the cell with sum, then /, click on cell with value for n, Enter.

Deviations from the Mean• Go to the column next to the values, label it

X-X• In the top cell, put = ,click on the cell with the

first value, put minus sign, put in the row and column for the mean. This last value needs dollar signs, e.g. $b$4.

Hit enter.• Highlight that cell and pull down the corner

box. • When all cells are filled, go to below the

column and get the autosum.

Take the squares of the dev’ns

• Label the next column “squares”

• In the top cell, put =, click on the top deviation, put *, click on the top dev’n again.

• Pull down to all cells in the column

• Take an autosum.

Variance

• The sum of the deviations squared, divided by

n-1is the variance

• So in an appropriate cell, put =, highlight the cell with the sum, /, highlight the n-1 cell

• Put the label “variance” or s2 next to it.

Variance• Click cell under the variance you

calculated• Click Autofunction (the little arrow by the

summation sign)• Click More Functions & In select a

category, choose statistical• Go down to VAR• Next to the Number 1 box, click on the

little graph, highlight the ages data, Enter, Enter.

• Should have same answer.

Standard Deviation

• May be the most widely used measure of dispersion

• Related to the Variance

• It’s just the square root of the variance.

Root-Mean-Square

• Descriptive name for the St Dev

• (∑(Xn-X)2/n-1)-1/2

• We used the square to get rid of neg signs in the sum but then we go back to the same range by taking a square root later.

• Go back to excel and get the standard deviation by taking square root and by letting excel do it directly from data

Standard Deviation

• Find a suitable empty cell. • Go to autofunctions and choose math & trig

functions and click on SQRT.• Highlight the variance so that it will be used by

the function.• Put the label st. dev., or s, in an empty cell

above or next to your answer

Autofunction Standard Deviation

• Choose another appropriate empty cell

• Label it

• In the empty cell, go to autofunction, choose statistical, click on STDEV.

• Choose the range of the original data.

• Should get same answer as the calculated one.

To Compare s from 2 Samples

• Use Coefficient of Variation

• s/X times 100. Answer is a percentage.

• Why use CV?

CV

• Removes differences due to units of measurement

• We might want to know whether serum cholesterol levels, measured in mg/100ml,are more variable than body weight, measured in pounds

Another Advantage of CV• Variance or st.dev. if applied to 2 groups

with very different means may give misleading idea of variability

• Wts of 11year-old boys compared to weights of 25 yr-olds11’s, X = 80, s = 1025’s, X = 145, s = 10

Are they equally variable?• CV’s = 12.5% & 6.9%• Young boys wt more variable

Finding the Quartiles

• Have to find locations for Q3 and Q1

• For Q3 loc, find n*3/4

• If it’s not an integer, take the next higher whole number.

• Find it in the array. That is Q3

• For Q1 loc, find n/4. If it’s not an integer, take the next higher whole number.

• Find it in the array. That is Q1

Interquartile Range

• Q3 – Q1

• Use autofunction to check your answer.

Percentiles

• Location = n * p/100

• e.g., the 90th percentile is located at

n * 90/100. If its not an integer, go to the next higher integer.

• Pick that value from the array

• 10% of the values will be higher than P90

and 90% of the values will be lower.

Special Problem

• Biostatistics books give these directions for finding the percentiles

• However, excel autofunction for percentile uses an interpolation method

• Our answers will not agree

End of Ungrouped Descriptives

• Continue these topics using Grouped Data

Grouped Data

Grouped Data

We do this all the time

Change in my pocket

3 dimes, 2 nickels, 4 quarters, 6 pennies

(3X10) + 2(5) + (4X25) + (6X1)

Sum = $1.46

Group Mean or Weighted Mean

Average Age of Dr. Jones’ Patients

12, 14, 15, 14, 8, 7, 8, 9, 14, 12

n = 10

∑ = (2X12) + (3X14) + (2X8) + 7 + 9 + 15

X = (24 + 42 + 16 + 31)/10

= 113/10

= 11.3

Frequency

• No. of times each value occurs is called its frequency

• Instead of n, use ∑fi

Grouped Variance

• Use exactly the same concept as for Grouped Mean

Types of Data

• Before we continue with grouping, let’s go back and look at types of data

• Ch. 2

Types of Values

• Nominal

• Ordinal

• Ranked

• Discrete

• Continuous

Nominal

• Comes from “name”• Qualitative • Classes: diabetes, asthma …• Types or categories, may be coded in

numbers: diabetes =1, asthma = 2 …• May be completely qualitative or semi-

quantitative, e.g over 45, over 65 … • In statistics, use the frequency of

occurrence of each class

Ordinal

• When classes progress whether in ascending order or descending order

• Staging of Cancer

Stage 1 is least serious. Each more serious. Stage 2, 3 to Stage 4

• An order exists but there is no intrinsic magnitude, e.g. the difference between Stage 2 and Stage 3 may be much greater than the difference between Stage 3 and Stage 4

Ranked

• Very similar to ordinal data

• Difference is that the order depends on a quantitative difference

• e.g. Rank students according to highest average

• Rank diseases according to number of deaths caused by each

• But still, differences between the sequential ranks may not be equal

Quantitative Data

• Has intrinsic magnitude

• A count or a weight

• Types: Discrete & Continuous

• Types: Interval & Ratio

Discrete vs Continuous

Discrete• Eggs. We quantify them by counting.• Number of births. Cannot be a fraction.

Continuous• Butter. We quantify it by weight.• Can be fractional• We can make differences infinitesimally

small

Continuous

Can make intervals infinitesimally small

Height: 5 feet 6 inches, or 5 ft. 6 1/4 in. or 66.21 inches

Ages: Even if given in years, it is a continuous variable. 10 yrs or 10 ½ yrs

or 122 months or even months, days, etc.

Interval Data

• Differences are equal

• Temperature

Difference between 20 and 30 degrees

equals difference between 30 and 40

• These do not have a real zero

Interval vs Ratio Data

• This distinction not made in your text.Interval

• Temperature. The scale we use does not have a true zero. Intervals are equal but ratios are not. 400 is not twice as warm as 200.

Ratio• Weight or height. They do have a true

zero. 4 inches is twice as long as 2 inches.

Data Presentation

• Start with Frequency Tables.

• Next class, graphs.

Frequency Distribution

• Group values within intervals

• All intervals of equal size

• Intervals contiguous, not overlapping

• Not too many groups, not too few

Look at a Frequency Distribution

• Pagano page 12, Table 2.6

• How many intervals?

• Does it look like a good number?

• Look at the distribution of data. Is it clear?

Interval Sizes

• Table 2.6 of Pagano• Look at left hand column. LL of first

interval is • 80• LL of second interval is• 120• What is the interval size• 120-80

Class Limits

• We can call the groups “classes” as well as “intervals”.

• Why doesn’t the first class go from 80 to 120?

• Would be ambiguous. Where would we put levels of 120, in the first class or the second?

Using a Frequency Distribution

• Can we use it to calculate measures of central tendency?

• Measures of dispersion?

• Yes to both questions, of course.

• But how, we no longer have the actual values.

• Usually, use midpoints of each class

Midpoint of Intervals

• Use midpoint as though it were the actual value for the individuals in the interval

• Calculate midpoint: (LL + UL)/2

• (80 + 119)/2 = 99.5

• For each of the next classes, just add 40 to 99.5.

Write out Mid-pts

• Or even re-write the table using mid-pts

• 99.5, 139.5, 179.5, 219.5, 259.5, 299.5, 339.5, 379.5

• Use this to find the mean

Mean from F.D.

• Mid-pt., mi, times frequency, fi, within that class

• Take the sum

• Also take the sum of the frequencies.

• Mean = ∑ fi, mi, / ∑ fi,

• Before we do an example, let’s take a look at calculating variance or st. dev.

Variance from Grouped Data

• From ungrouped data ∑(Xn-X)2/n-1.

Review this. Does it make sense?

• For grouped data (∑fi(mi-X)2/(∑fi-1)

• What did we do?

• Substitute m for X and the sum of fi for n.

How get Standard Deviation?

• Just take square root of the variance.

Other Measures

• Mode very easy, just the class with highest frequency.

• Range, just highest limit minus lowest limit

• Median and quartiles, I think they’re not in your book for grouped data. If we don’t come across it, will omit.

Calculating from Frequency Distributions

• Go to Course Outline on Internet.

• Click on Calculations from Grouped.

End of Class 3

• Hope we get this far.

• Check assignments if we do not get this far in Class 3.

• You only have to do the topics covered in class.

Get a Data Set

• Later we will use the cd from back of book

• For now, go to our course outline, lect 3, data set 1, death rates. Click & open.

• Save onto your computer hard drive. Put it into a place you can find again.

• At home, do this also.

• In fact, save under two names: one call death_crude”, other call “death_freqdist”

Making the Frequency Distribution

• Make another copy of the data. Highlight both columns & pull the copy square.

• Insert a column between the raw data and the new data.

• Make an array. Highlight the column of data. Click on the Sort Ascending & when it asks about the adjoining data, click to include it.

Making a Frequency Table

• Have to get data. Take cd from back of book.

• Put in cd slot. If it doesn’t open automatically, go to My Computer & choose that drive.

Recommended