Descriptive Statistics

Descriptive Statistics & Counting Responses and Graphs

Part One: Descriptive Statistics

The dataset has 747 observations on 19 variables. Part one of the assignment focuses on three variables:

1) Age of respondents
2) Total family income in quartiles
3) RS highest degree

Statistics

| | Age of Respondent | Total Family Income in quartiles | RS Highest Degree |
|---|---|---|---|
| N Valid | 744 | 747 | 747 |
| N Missing | 3 | 0 | 0 |

Descriptive Statistics

| | N | Mean | Std. Deviation | Variance |
|---|---|---|---|---|
| Age of Respondent | 744 | 40.41 | 11.158 | 124.511 |
| RS Highest Degree | 747 | 1.79 | 1.200 | 1.440 |
| Total Family Income in quartiles | 747 | 2.57 | 1.144 | 1.309 |
| Valid N (listwise) | 744 | | | |

Let's look at the data and start with descriptive statistics.

Valid N (listwise) - This is the number of cases with non-missing values on all of the variables listed.

N - This is the number of valid observations for the variable. The total number of observations is the sum of N and the number of missing values.

Mean - This is the arithmetic mean across the observations. It is the most widely used measure of central tendency and is commonly called the average. The mean is sensitive to extremely large or small values.

Std. Deviation - The standard deviation is the square root of the variance. It measures the spread of a set of observations. The larger the standard deviation, the more spread out the observations are.

Variance - The variance is a measure of variability. It is the sum of the squared distances of each data value from the mean, divided by the variance divisor (N - 1 for the sample variance reported here). The corrected SS is that sum of squared distances from the mean.

In the above sample, the means of Age of Respondent, RS Highest Degree and Total Family Income in quartiles are 40.41, 1.79 and 2.57 respectively. The standard deviations of the three variables are 11.158, 1.200 and 1.144, and the variances are 124.511, 1.440 and 1.309 respectively.

The frequency distribution for each variable is displayed below:

Age of Respondent

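| Age | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 19 | 1 | 0.1 | 0.1 | 0.1 |
| 20 | 5 | 0.7 | 0.7 | 0.8 |
| 21 | 4 | 0.5 | 0.5 | 1.3 |
| 22 | 3 | 0.4 | 0.4 | 1.7 |
| 23 | 14 | 1.9 | 1.9 | 3.6 |
| 24 | 15 | 2.0 | 2.0 | 5.6 |
| 25 | 20 | 2.7 | 2.7 | 8.3 |
| 26 | 16 | 2.1 | 2.2 | 10.5 |
| 27 | 14 | 1.9 | 1.9 | 12.4 |
| 28 | 25 | 3.3 | 3.4 | 15.7 |
| 29 | 20 | 2.7 | 2.7 | 18.4 |
| 30 | 20 | 2.7 | 2.7 | 21.1 |
| 31 | 23 | 3.1 | 3.1 | 24.2 |
| 32 | 16 | 2.1 | 2.2 | 26.3 |
| 33 | 22 | 2.9 | 3.0 | 29.3 |
| 34 | 18 | 2.4 | 2.4 | 31.7 |
| 35 | 32 | 4.3 | 4.3 | 36.0 |
| 36 | 28 | 3.7 | 3.8 | 39.8 |
| 37 | 24 | 3.2 | 3.2 | 43.0 |
| 38 | 32 | 4.3 | 4.3 | 47.3 |
| 39 | 28 | 3.7 | 3.8 | 51.1 |
| 40 | 23 | 3.1 | 3.1 | 54.2 |
| 41 | 23 | 3.1 | 3.1 | 57.3 |
| 42 | 20 | 2.7 | 2.7 | 59.9 |
| 43 | 27 | 3.6 | 3.6 | 63.6 |
| 44 | 17 | 2.3 | 2.3 | 65.9 |
| 45 | 23 | 3.1 | 3.1 | 69.0 |
| 46 | 23 | 3.1 | 3.1 | 72.0 |
| 47 | 25 | 3.3 | 3.4 | 75.4 |
| 48 | 10 | 1.3 | 1.3 | 76.7 |
| 49 | 19 | 2.5 | 2.6 | 79.3 |
| 50 | 15 | 2.0 | 2.0 | 81.3 |
| 51 | 21 | 2.8 | 2.8 | 84.1 |
| 52 | 10 | 1.3 | 1.3 | 85.5 |
| 53 | 9 | 1.2 | 1.2 | 86.7 |
| 54 | 7 | 0.9 | 0.9 | 87.6 |
| 55 | 15 | 2.0 | 2.0 | 89.7 |
| 56 | 7 | 0.9 | 0.9 | 90.6 |
| 57 | 12 | 1.6 | 1.6 | 92.2 |
| 58 | 13 | 1.7 | 1.7 | 94.0 |
| 59 | 5 | 0.7 | 0.7 | 94.6 |
| 60 | 7 | 0.9 | 0.9 | 95.6 |
| 61 | 5 | 0.7 | 0.7 | 96.2 |
| 62 | 6 | 0.8 | 0.8 | 97.0 |
| 63 | 6 | 0.8 | 0.8 | 97.8 |
| 64 | 1 | 0.1 | 0.1 | 98.0 |
| 65 | 3 | 0.4 | 0.4 | 98.4 |
| 66 | 1 | 0.1 | 0.1 | 98.5 |
| 67 | 1 | 0.1 | 0.1 | 98.7 |
| 68 | 1 | 0.1 | 0.1 | 98.8 |
| 72 | 2 | 0.3 | 0.3 | 99.1 |
| 73 | 1 | 0.1 | 0.1 | 99.2 |
| 74 | 1 | 0.1 | 0.1 | 99.3 |
| 75 | 1 | 0.1 | 0.1 | 99.5 |
| 77 | 2 | 0.3 | 0.3 | 99.7 |
| 78 | 1 | 0.1 | 0.1 | 99.9 |
| 82 | 1 | 0.1 | 0.1 | 100.0 |
| Total (valid) | 744 | 99.6 | 100.0 | |
| Missing (NA) | 3 | 0.4 | | |
| Total | 747 | 100.0 | | |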
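As a cross-check, the mean, variance and standard deviation reported in the Descriptive Statistics table can be recomputed from the age frequency table. The following is a minimal Python sketch, not the procedure that produced the SPSS output. The names `age_freq` and `describe` are illustrative, and the divisor N - 1 (sample variance) is an assumption that happens to reproduce the reported figures.

```python
import math

# Frequency of each age, copied from the "Age of Respondent" table (valid cases only).
age_freq = {
    19: 1, 20: 5, 21: 4, 22: 3, 23: 14, 24: 15, 25: 20, 26: 16, 27: 14, 28: 25,
    29: 20, 30: 20, 31: 23, 32: 16, 33: 22, 34: 18, 35: 32, 36: 28, 37: 24, 38: 32,
    39: 28, 40: 23, 41: 23, 42: 20, 43: 27, 44: 17, 45: 23, 46: 23, 47: 25, 48: 10,
    49: 19, 50: 15, 51: 21, 52: 10, 53: 9, 54: 7, 55: 15, 56: 7, 57: 12, 58: 13,
    59: 5, 60: 7, 61: 5, 62: 6, 63: 6, 64: 1, 65: 3, 66: 1, 67: 1, 68: 1,
    72: 2, 73: 1, 74: 1, 75: 1, 77: 2, 78: 1, 82: 1,
}


def describe(freq):
    """Return (n, mean, sample variance, sample std. deviation) for a frequency table."""
    n = sum(freq.values())
    mean = sum(value * count for value, count in freq.items()) / n
    # Corrected sum of squares: squared distances from the mean, weighted by frequency.
    corrected_ss = sum(count * (value - mean) ** 2 for value, count in freq.items())
    variance = corrected_ss / (n - 1)  # assumed divisor: N - 1 (sample variance)
    return n, mean, variance, math.sqrt(variance)


n, mean, variance, std_dev = describe(age_freq)
print(n, round(mean, 2), round(variance, 3), round(std_dev, 3))
# prints: 744 40.41 124.511 11.158
```

The result agrees with the reported N, mean, variance and standard deviation for Age of Respondent, which also confirms that the frequency table is internally consistent.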

Total Family Income in quartiles

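| Income | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 24,999 or less | 174 | 23.3 | 23.3 | 23.3 |
| 25,000 to 39,999 | 194 | 26.0 | 26.0 | 49.3 |
| 40,000 to 59,999 | 156 | 20.9 | 20.9 | 70.1 |
| 60,000 or more | 223 | 29.9 | 29.9 | 100.0 |
| Total | 747 | 100.0 | 100.0 | |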

RS Highest Degree

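| Degree | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Less than HS | 52 | 7.0 | 7.0 | 7.0 |
| High school | 390 | 52.2 | 52.2 | 59.2 |
| Junior college | 54 | 7.2 | 7.2 | 66.4 |
| Bachelor | 165 | 22.1 | 22.1 | 88.5 |
| Graduate | 86 | 11.5 | 11.5 | 100.0 |
| Total | 747 | 100.0 | 100.0 | |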

BAR CHART: A bar chart is helpful in graphically describing (visualizing) your data. It will often be used in addition to inferential statistics.

The bar chart of age shows that ages 35 and 38 are the most common, each accounting for 4.3% of respondents, while relatively few respondents are aged 59 to 82.

The bar chart of highest degree shows that high school is the most common level, with 52.2% of respondents, while less than high school and junior college are the least common, with 7.0% and 7.2% respectively.

The chart of total family income shows that 29.9% of families (223) have an income of 60,000 or more, and 20.9% (156) have an income between 40,000 and 59,999.
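The charts themselves are not reproduced in this text. As a rough stand-in, the sketch below draws a text bar chart of the RS Highest Degree percentages using only the Python standard library. The function name `text_bar_chart` and its `scale` parameter are illustrative choices, not part of the original SPSS workflow.

```python
# Percentages copied from the "RS Highest Degree" frequency table.
degree_percent = {
    "Less than HS": 7.0,
    "High school": 52.2,
    "Junior college": 7.2,
    "Bachelor": 22.1,
    "Graduate": 11.5,
}


def text_bar_chart(percentages, scale=2.0):
    """Render one line per category, with roughly one '#' per `scale` percentage points."""
    width = max(len(label) for label in percentages)
    lines = []
    for label, pct in percentages.items():
        bar = "#" * round(pct / scale)
        lines.append(f"{label:<{width}} | {bar} {pct:.1f}%")
    return lines


print("\n".join(text_bar_chart(degree_percent)))
```

The longest bar belongs to High school, which matches the reading of the bar chart given above.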

Part Two: Counting Responses and Graphs

1. Create Frequency table of importance of high income (jobinc). What are the percents of "Very important" and "Fourth"?

Statistics

| | Importance of HIGH INCOME |
|---|---|
| N Valid | 497 |
| N Missing | 250 |

Importance of HIGH INCOME

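| | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| MOST IMPT | 101 | 13.5 | 20.3 | 20.3 |
| SECOND | 145 | 19.4 | 29.2 | 49.5 |
| THIRD | 151 | 20.2 | 30.4 | 79.9 |
| FOURTH | 81 | 10.8 | 16.3 | 96.2 |
| FIFTH | 19 | 2.5 | 3.8 | 100.0 |
| Total (valid) | 497 | 66.5 | 100.0 | |
| Missing: NAP | 237 | 31.7 | | |
| Missing: NA | 13 | 1.7 | | |
| Missing: Total | 250 | 33.5 | | |
| Total | 747 | 100.0 | | |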

From this table, 101 respondents (13.5% of all 747 cases, 20.3% of valid responses) rate high income as most important, and 81 respondents (10.8% of all cases, 16.3% of valid responses) rank it fourth.
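The difference between Percent and Valid Percent comes from the 250 missing cases: Percent divides by all 747 cases, while Valid Percent and Cumulative Percent divide by the 497 valid responses only. A minimal Python sketch of that arithmetic follows. The names `valid_counts`, `missing_count` and `frequency_table` are illustrative, and rounding to one decimal place is an assumption made to match the table's display.

```python
# Counts copied from the "Importance of HIGH INCOME" table.
valid_counts = {"MOST IMPT": 101, "SECOND": 145, "THIRD": 151, "FOURTH": 81, "FIFTH": 19}
missing_count = 250  # NAP (237) + NA (13)


def frequency_table(valid_counts, missing_count):
    """Return rows of (label, frequency, percent, valid percent, cumulative percent)."""
    n_valid = sum(valid_counts.values())
    n_total = n_valid + missing_count
    rows = []
    running = 0  # running count of valid cases, used for the cumulative percent
    for label, count in valid_counts.items():
        running += count
        rows.append((
            label,
            count,
            round(100 * count / n_total, 1),    # Percent: share of all cases
            round(100 * count / n_valid, 1),    # Valid Percent: share of valid cases
            round(100 * running / n_valid, 1),  # Cumulative Percent over valid cases
        ))
    return rows


for row in frequency_table(valid_counts, missing_count):
    print(row)
```

The rows for MOST IMPT and FOURTH come out as 13.5 / 20.3 and 10.8 / 16.3, the same figures shown in the table.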

2. Create Frequency table of level of education (RS Highest Degree), then create Pie chart and Bar chart. What level has the highest percent?

Statistics

| | RS Highest Degree |
|---|---|
| N Valid | 747 |
| N Missing | 0 |

RS Highest Degree

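| Degree | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Less than HS | 52 | 7.0 | 7.0 | 7.0 |
| High school | 390 | 52.2 | 52.2 | 59.2 |
| Junior college | 54 | 7.2 | 7.2 | 66.4 |
| Bachelor | 165 | 22.1 | 22.1 | 88.5 |
| Graduate | 86 | 11.5 | 11.5 | 100.0 |
| Total | 747 | 100.0 | 100.0 | |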

Both the pie chart and the bar chart show that high school has the highest percentage, at 52.2%.

3. Create a Histogram of ages of respondents. What is the mean? And Standard Deviation? (See answers from the histogram)

The histogram reports a mean of 40.41 and a standard deviation of 11.158.