Unit 1: Introduction to data
Lecture 2: Exploratory data analysis

Statistics 101

Nicole Dalzell

May 14, 2015

Warm-Up and Data Basics

Announcements

Statistics 101 (Nicole Dalzell) U1 - L2: EDA May 14, 2015 2 / 1


Review

Example Study: A researcher is interested in whether or not cats will choose to sleep less if they have toys to entertain themselves. She divides 250 cats (adults and kittens) into two rooms, with adult cats in one room and baby kittens in the other room. Within each room she erects a fence, randomly placing half the cats (or kittens) on each side of the fence. On one side of the fence she scatters a variety of cat toys. For 1 day, the researcher records the number of hours each cat spends sleeping.

What is the research question?

What are the explanatory and response variables?

Is this an Experimental or Observational study?

What are the controls and treatments?

Is blocking employed in this study?



Types of Variables Example

Still our cat example:

Cat   Age       Toys  # of Naps  Weight (lbs)
1     adult     1     3          8
2     juvenile  1     5          9
3     adult     0     2          10.5
4     adult     1     8          12.25
...   ...       ...   ...        ...
250   adult     0     5          11.67

What types of variables are these:

Age?

Toys?

# of Naps?

Weight?


Warm-Up and Data Basics Sampling Methods

Population to sample

It is usually not feasible to collect information on the entire population due to high costs of data collection, so statisticians instead work with samples that are (hopefully) representative of the populations they come from.

[Figure: a population with a sample drawn from it]

We try to understand certain features of the population as a whole using summary statistics and graphs based on these samples.



Obtaining good samples

Almost all statistical methods are based on the notion of implied randomness.

If observational data are not collected in a random framework from a population, these statistical methods (the estimates and errors associated with the estimates) are not reliable.

The most commonly used random sampling techniques are simple, stratified, and cluster sampling.



Simple random sample

Randomly select cases from the population; each case is equally likely to be selected.

[Figure: simple random sample]



Stratified sample

Strata are homogeneous; take a simple random sample from each stratum.

[Figure: stratified sample, with the population divided into Strata 1 to 6]



Cluster sample

Clusters are not necessarily homogeneous; take a simple random sample from a random sample of clusters. Usually preferred for economic reasons.
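
The three sampling schemes described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 12 labeled cases, the grouping into three groups, the sample sizes, the seed, and all function names are hypothetical and not part of the lecture.

```python
import random

rng = random.Random(101)  # fixed seed (hypothetical) so the sketch is reproducible

# Hypothetical population: 12 cases, each tagged with a group label.
population = [(case_id, f"group{case_id % 3}") for case_id in range(12)]

def group_by_label(cases):
    """Collect cases into a dict keyed by their group label."""
    groups = {}
    for case in cases:
        groups.setdefault(case[1], []).append(case)
    return groups

def simple_random_sample(cases, n):
    """Every case is equally likely to be selected."""
    return rng.sample(cases, n)

def stratified_sample(cases, n_per_stratum):
    """Treat each group as a stratum and sample from every one of them."""
    sample = []
    for members in group_by_label(cases).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(cases, n_clusters, n_per_cluster):
    """Treat each group as a cluster: pick some clusters at random, then sample within them."""
    groups = group_by_label(cases)
    chosen = rng.sample(sorted(groups), n_clusters)
    sample = []
    for label in chosen:
        sample.extend(rng.sample(groups[label], n_per_cluster))
    return sample

srs = simple_random_sample(population, 4)
strat = stratified_sample(population, 2)
clust = cluster_sample(population, 2, 2)
```

The only structural difference between the last two functions is which groups contribute: stratified sampling draws from every group, while cluster sampling draws only from a random subset of groups.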

[Figure: cluster sample, with the population divided into Clusters 1 to 9]



Participation question

A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments. Which approach would likely be the least effective?

(a) Simple random sampling

(b) Cluster sampling

(c) Stratified sampling


Warm-Up and Data Basics Exploratory Data Analysis

Explore the Data

When you taste a spoonful of chili and decide it doesn’t taste spicy enough, that’s exploratory analysis.

For data analysis, we perform exploratory data analysis, or EDA, to determine trends in features that may be present in the data.

The distribution of a variable is a list of possible values the variable can take and how often it takes each of those values.

Distributions are critical to assessing the probability of events.

Plots are almost always useful for visualizing relationships and distributions in the data.
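
As a small illustration of the definition of a distribution given above, the sketch below tabulates how often each value occurs. It assumes only the five "# of Naps" values visible in the cat example table (cats 1 to 4 and cat 250); the variable names are hypothetical.

```python
from collections import Counter

# "# of Naps" for the five cats visible in the example table.
naps = [3, 5, 2, 8, 5]

counts = Counter(naps)
n = len(naps)

# Distribution: each observed value with its count and its relative frequency.
distribution = {value: (count, count / n) for value, count in sorted(counts.items())}

for value, (count, share) in distribution.items():
    print(f"{value} naps: {count} cat(s), proportion {share:.1f}")
```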



Visualizing numerical variables

Intensity map: Useful for displaying the spatial distribution.

Dot plot: Useful when individual values are of interest.

Histogram: Provides a view of the data density, and is especially convenient for describing the shape of the data distribution.

Box plot: Especially useful for displaying the median, quartiles, unusual observations, as well as the IQR.



Why visualize?

Describe the spatial distribution of race/ethnicity in the US.

http://demographics.coopercenter.org/DotMap/index.html



Why visualize?

And let’s take a closer look at Durham.



Scatterplot

Scatterplots are useful for visualizing the relationship between two numerical variables.

Do life expectancy and total fertility appear to be associated or independent?

Was the relationship the same throughout the years, or did it change?

http://www.gapminder.org/world



Cars: ... vs. weight

From the cars data:

[Figure: two scatterplots from the cars data: miles per gallon (city rating) vs. weight (pounds), and price ($1000s) vs. weight (pounds)]

What do these scatterplots reveal about the data? How might they be useful?



Numerical Variables Basic Plots

World Bank Data

This is public-use data available for download from http://data.worldbank.org/topic/energy-and-mining.

What does the distribution of energy use per capita look like across different countries?
Is energy use fairly uniform across different countries?
If not, can we distinguish groups of countries that use more than others?

      Country.Name            X2011
37    Afghanistan
50    Angola                  672.74
63    Albania                 689.03
76    Arab World              1806.90
89    United Arab Emirates    7407.01
102   Argentina               1966.97
115   Armenia                 916.26
128   American Samoa
141   Antigua and Barbuda
154   Australia               5500.79
167   Austria                 3927.92
180   Azerbaijan              1369.32
193   Burundi
206   Belgium                 5348.97
219   Benin                   384.56
232   Burkina Faso
245   Bangladesh              204.72
258   Bulgaria                2615.04





Stacked Dot Plot

Higher bars represent areas where there are more observations, which makes it a little easier to judge the center and the shape of the distribution.

[Figure: stacked dot plot of gpa; axis from 3.0 to 4.0]



Dot Plot: Why visualize?

[Figure: dot plot of weight, in ounces; axis from 0 to 4000]

Do you see anything out of the ordinary?





Why visualize?

What type of variable is average number of hours of sleep per night? Is this reflected in the dot plot below? If not, what might be the reason?

[Figure: dot plot of average number of hours of sleep per night; axis from 4 to 9]



Dot Plot: World Bank Data

[Figure: Energy Data Dot Plot; x-axis: energy per capita, 0 to 15000]



Histogram

[Figure: histogram, "Energy Use in 2011 (World Bank Data)"; x-axis: Energy Use (kg oil equivalent per capita), 0 to 15000; y-axis: Number of Countries, 0 to 80]

Country.Name            X2011
Afghanistan
Angola                  672.74
Albania                 689.03
Arab World              1806.90
United Arab Emirates    7407.01
Argentina               1966.97
Armenia                 916.26

Bins    0-2000   2001-4000   4001-6000   6001-8000   ...
Count   92       38          18          10          ...
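
The bin counts above come from sorting each country's value into an interval and counting. A minimal sketch of that step follows, assuming only the six non-missing X2011 values shown in the table excerpt and bins of the form [lower, lower + width), which differs slightly from the slide's 0-2000, 2001-4000 labeling; `bin_counts` is a hypothetical helper name.

```python
from collections import Counter

# Non-missing X2011 values from the table excerpt (Afghanistan has no value).
energy_use = [672.74, 689.03, 1806.90, 7407.01, 1966.97, 916.26]

def bin_counts(values, width):
    """Count observations per bin; each bin covers [lower, lower + width)."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))

counts = bin_counts(energy_use, 2000)
for lower, count in counts.items():
    print(f"{lower}-{lower + 2000}: {count}")
```

Changing `width` changes how much detail the histogram shows, which is the trade-off the bin width slides explore.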



Histogram: Bin Width



[Figure: three histograms of the energy data (y-axis: Number of Countries), each drawn with a different bin width]



Bin Width

Which one(s) of these histograms are useful? Which reveal too much about the data? Which hide too much?

[Figure: four histograms of extracurricular hrs / week (y-axis: frequency), each drawn with a different bin width]



Histogram



[Figure: histogram of the energy data; x-axis from 0 to 15000, y-axis: Number of Countries, 0 to 80]

Provides a view of the data density.

Very useful for examining the shape of a distribution.


Numerical Variables Distribution Shapes

Histogram



[Figure: histogram of the energy data; y-axis: Number of Countries]

This distribution is right skewed and unimodal.



Shape: Skewness

We describe histograms as right skewed, left skewed, or symmetric.

[Figure: three example histograms with different skewness]

Histograms are said to be skewed to the side of the long tail.



Shape: Modality

The mode is defined as the most frequent observation in the data set. Does the histogram have a single prominent peak (unimodal), several prominent peaks (bimodal/multimodal), or no apparent peaks (uniform)?

[Figure: four example histograms with different modality]

In order to determine modality, it’s easiest to step back and imagine a density curve over the histogram. Use the limp spaghetti method.


Shape and Skew

How would you describe this distribution?

[Figure: histogram of average number of hours spent on school work per day; x-axis from 2 to 10]



Shape: Why does this matter?

[Figure: two panels, "Symmetric Distribution" and "Bimodal Distribution"; x-axis: Value, y-axis: Height]




Which of these variables do you expect to be uniformly distributed?

(a) weights of adult females

(b) salaries of a random sample of people from North Carolina

(c) house prices

(d) birthdays of classmates (day of the month)



Commonly observed shapes of distributions

modality: unimodal, bimodal, multimodal, uniform

skewness: right skew, left skew, symmetric



Density Curves

A Density Curve is a smoothed density histogram where the area under the curve is 1.

To draw a density curve from a histogram, simply connect the peaks of the histogram with a smooth line, and normalize the values of the y-axis such that the area under the curve is 1.
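
The normalization step can be sketched as follows. This covers only the rescaling of bar heights so that the total area is 1, not the smoothing; the counts and bin width are hypothetical, and `density_heights` is an illustrative name.

```python
def density_heights(counts, bin_width):
    """Rescale bin counts so that the bars' total area (height * width) is 1."""
    total = sum(counts)
    return [count / (total * bin_width) for count in counts]

counts = [5, 3, 2]   # hypothetical bin counts
bin_width = 2.0      # hypothetical bin width

heights = density_heights(counts, bin_width)
area = sum(height * bin_width for height in heights)
print(heights, area)
```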



Unusual Observations

Are there any unusual observations or potential outliers?

[Figure: two example histograms to inspect for unusual observations]


Application exercise: Shapes of distributions



Describing Your Pictures

Bell Shaped: Data is bell shaped if the majority of the data is clustered around the center value (mean) with very few data points lying either way above or way below this value.

Right Skewed: Data is positively skewed if you have several unusually large data points creating a long tail to the right.

Left Skewed: Data is negatively skewed if you have several unusually small data points creating a long tail to the left.

Bimodal: Data is bimodal if it has two large clusters of data points.

Symmetric: Data is symmetric if it looks like a mirror image around its center.

Uniformly Distributed: Data is evenly spread across all possible values.


Descriptive Statistics Center

Mean

The sample mean, denoted as x̄, can be calculated as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\text{Sum of Data Points}}{\text{Number of Data Points}},$$

where $x_1, x_2, \cdots, x_n$ represent the n observed values.

The population mean is a parameter computed the same way but is denoted as µ. It is often not possible to calculate µ since population data is rarely available.

x̄ is an estimate of µ based on the observed data.

The sample mean is a sample statistic, or a point estimate of the population mean. This estimate may not be perfect, but if the sample is good (representative of the population) it is usually a good guess.



Median

The median is the value that splits the data in half when ordered in ascending order.

0, 1, 2, 3, 4 → 2

If there is an even number of observations, then the median is the average of the two values in the middle.

$$0, 1, 2, 3, 4, 5 \rightarrow \frac{2 + 3}{2} = 2.5$$

Since the median is the midpoint of the data, 50% of the values are below it. Hence, it is also the 50th percentile.
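
The mean and median rules can be checked on the two small data sets used in the median example. The sketch below is illustrative only; `sample_mean` and `median` are hypothetical helper names, with Python's `statistics.median` used as a cross-check.

```python
import statistics

odd = [0, 1, 2, 3, 4]
even = [0, 1, 2, 3, 4, 5]

def sample_mean(values):
    """Sum of data points divided by number of data points."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sample_mean(odd), median(odd))    # 2.0 2
print(sample_mean(even), median(even))  # 2.5 2.5

# Cross-check against the standard library.
assert median(odd) == statistics.median(odd)
assert median(even) == statistics.median(even)
```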



Mean vs. Median (Link)

If the distribution is symmetric, the center is the mean.
    Symmetric: mean ≈ median

If the distribution is skewed or has outliers, the center is the median.
    Right-skewed: mean > median
    Left-skewed: mean < median

[Figure: three distributions (right-skewed, left-skewed, symmetric), each with the mean and the median marked]
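
The relationship between mean and median under skew can be seen on a few made-up numbers. The three data sets below are hypothetical and chosen only so that the tails are obvious; `compare_center` is an illustrative helper.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value creates a long right tail
left_skewed = [-x for x in right_skewed]   # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

def compare_center(values):
    """Return (mean, median) so the two measures of center can be compared."""
    return statistics.mean(values), statistics.median(values)

for name, data in [("right-skewed", right_skewed),
                   ("left-skewed", left_skewed),
                   ("symmetric", symmetric)]:
    mean, median = compare_center(data)
    print(f"{name}: mean = {mean}, median = {median}")
```

The mean is pulled toward the long tail while the median stays near the bulk of the data, which is why the median is the preferred center for skewed distributions.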



Back to our Energy Data



[Figure: histogram of the energy data; x-axis from 0 to 15000, y-axis: Number of Countries, 0 to 80]

Mean: 2532.631
Median: 1593.7



Measures of Center

The Mean of a dataset is what we commonly refer to as the average.

The Median of a dataset is the middle value of your data. You find the median of your data by ordering from smallest to largest, then finding the value where 50% of your data is above and below that value.

The Trimmed Mean is the calculation of the mean after removing a few of the very large and very small observations.
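
A trimmed mean can be sketched as below. The lecture does not say how many observations to remove, so the parameter `k` (number dropped from each end) and the example data are assumptions made for illustration.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest observations."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must leave at least one observation")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [1, 3, 4, 4, 5, 40]  # hypothetical data with one very large value

plain = sum(data) / len(data)
trimmed = trimmed_mean(data, 1)
print(plain, trimmed)
```

Dropping one value from each end moves the result from 9.5 to 4.0 here, much closer to where most of the data sit.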



Are you typical?

http://www.youtube.com/watch?v=4B2xOvKFFz4

How useful are centers alone for conveying the true characteristics of a distribution?


http://mih5.github.io/statapps/histogram/histogram.html


Describing distributions of numerical variables

When describing distributions of numerical variables always mention:

Shape: skewness, modality
Center: an estimate of a typical observation in the distribution (mean, median, mode, etc.)
Unusual observations: observations that stand out from the rest of the data that may be suspected outliers
Spread: measure of variability in the distribution (SD, IQR, range, etc.)

[Figure: three distributions, each plotted on an axis from −3 to 3]


Descriptive Statistics Spread

Measures of Spread

The population Variance, σ², is the average squared deviation of the observations from the mean.

The population Standard Deviation, σ, is the square root of the variance.

The Interquartile Range (IQR) measures the spread of the middle 50% of your data, and is visually depicted in box plots.
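
The population variance and standard deviation can be computed directly from their definitions. The data below are hypothetical, chosen so the arithmetic comes out in whole numbers; the results are cross-checked against Python's `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population values

mu = sum(data) / len(data)                               # population mean
variance = sum((x - mu) ** 2 for x in data) / len(data)  # average squared deviation
sd = math.sqrt(variance)                                 # square root of the variance

print(mu, variance, sd)

# Cross-check against the standard library's population formulas.
assert variance == statistics.pvariance(data)
assert sd == statistics.pstdev(data)
```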

Link



Box Plot

The box in a box plot represents the middle 50% of the data, and the thick line in the box is the median.

[Figure: box plot of # of study hours / week; axis from 10 to 40]



Anatomy of a Box Plot

[Figure: box plot of # of study hours / week (axis from 0 to 40) with labeled parts: lower whisker, Q1 (first quartile), median, Q3 (third quartile), upper whisker, max whisker reach, suspected outliers]


http://mih5.github.io/statapps/


Measures of Location

The 25th percentile is also called the first quartile, Q1.

The 50th percentile is also called the median.

The 75th percentile is also called the third quartile, Q3.

summary(d$study_hours)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   3.00   10.00   15.00   17.42   20.00   40.00   13.00

Between Q1 and Q3 is the middle 50% of the data. The range these data span is called the interquartile range, or the IQR.

IQR = 20 − 10 = 10
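
The same quartile and IQR calculation can be sketched in Python. The class data behind the summary are not shown, so the eight values below are hypothetical, picked so that the quartiles match the summary (10, 15, 20). Quartile conventions vary between software packages; this sketch assumes the "inclusive" method of `statistics.quantiles`.

```python
import statistics

# Hypothetical study-hours data chosen to reproduce Q1 = 10, median = 15, Q3 = 20.
study_hours = [3, 10, 10, 15, 15, 20, 20, 40]

q1, med, q3 = statistics.quantiles(study_hours, n=4, method="inclusive")
iqr = q3 - q1

print(q1, med, q3, iqr)
```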



Whiskers and Outliers

Whiskers of a box plot can extend up to 1.5 × IQR away from the quartiles.

max upper whisker reach: Q3 + 1.5 × IQR = 20 + 1.5 × 10 = 35

max lower whisker reach: Q1 − 1.5 × IQR = 10 − 1.5 × 10 = −5

An outlier is defined as an observation beyond the maximum reach of the whiskers. It is an observation that appears extreme relative to the rest of the data.
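
The whisker rule translates into a short check. The sketch below uses Q1 = 10 and Q3 = 20 from the study-hours example; the function names and the three test values (the minimum, median, and maximum from the summary) are chosen here for illustration.

```python
def whisker_reach(q1, q3, k=1.5):
    """Return the (lower, upper) maximum whisker reach: k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def suspected_outliers(values, q1, q3):
    """Observations that fall beyond the maximum reach of the whiskers."""
    lower, upper = whisker_reach(q1, q3)
    return [v for v in values if v < lower or v > upper]

lower, upper = whisker_reach(10, 20)
print(lower, upper)                               # -5.0 35.0
print(suspected_outliers([3, 15, 40], 10, 20))    # [40]
```

The maximum of 40 study hours lies beyond the upper reach of 35, so it would be flagged as a suspected outlier, while the minimum of 3 is well within the lower reach.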



Outliers (cont.)

Why is it important to look for outliers?

Identify extreme skew in the distribution.

Identify data collection and entry errors.

Provide insight into interesting features of the data.



Why visualize?

What does a response of 0 mean in this distribution?

[Figure: plot of the number of drinks it takes students to get drunk; axis from 0 to 12]



Example: Visualizing

What does our Energy Data look like?

[Figure: Energy Use Data Boxplot; y-axis: Energy Usage, 0 to 15000]



Who uses the most energy?

     Country.Name            X2011
1    Iceland                 17964.44
2    Qatar                   17418.69
3    Trinidad and Tobago     15691.29
4    Kuwait                  10408.28
5    Brunei Darussalam       9427.09
6    Oman                    8356.29
7    Luxembourg              8045.90
8    United Arab Emirates    7407.01
9    Bahrain                 7353.16
10   Canada                  7333.28
11   North America           7062.22
12   United States           7032.35
13   Saudi Arabia            6738.42
14   Singapore               6452.33
15   Finland                 6449.04




Which of the following is false about the distribution of average number of hours students study daily?

[Figure: box plot of the average number of hours students study daily; axis from 2 to 10]

 Min.  1st Qu.  Median   Mean  3rd Qu.    Max.
1.000    3.000   4.000  3.821    5.000  10.000

(a) There are no students who don’t study at all.
(b) 75% of the students study more than 5 hours daily, on average.
(c) 25% of the students study less than 3 hours, on average.
(d) IQR is 2 hours.

