Lecture 1: Introduction Math Boot Camp Will Terry Department of Political Science University of Oregon September 16, 2013

Lecture 1: Introduction

Math Boot Camp

Will Terry

Department of Political Science University of Oregon

September 16, 2013

Objectives of Math Camp

Have a good time learning about the wonders of math(s)!

Get ready for PS545-546….

Objectives of PS545-546

• The objectives of our sequence are twofold:

(1.) to improve your ability to read mainstream quantitative research, and

(2.) to provide a broad overview of the main tools of quantitative analysis.

• We will focus on the linear regression model.

• You will become familiar with Stata.

Statistical software

• This course will focus on practical computing skills that you might find useful in your future research.

– There are reasons to spend some time with R to appreciate the capabilities of statistical computing.

– Given the limited time, we will focus on developing Stata skills as much as possible.

• We will master the basic components of statistical computing.

– Data management

– Estimating regression models

– Graphing

The standard political science stats education

I. Basic probability theory

- random variables

- PDFs

- CDFs

II. Statistical inference theory

- confidence intervals, hypothesis testing, p-values, etc.

III. Linear regression analysis

- the workhorse model of the social sciences

IV. Binary Outcome Models & Other Extensions of the Basic Linear Model

V. Time Series Cross Sectional Models

 

First, some key terms…

Causality

Phenomenon Y (e.g., income) is affected by factor X (e.g., gender).

Statistical inference

Drawing conclusions about the world based on characteristics of sample data. Typically we are interested in understanding “population parameters.”

Independent variable (syn. “regressor”, RHS var)

The variable that is exogenously manipulated or changed.

Dependent variable (syn. “regressand”, LHS var)

Its value “depends” on the value taken by the independent variables.

Random variables and hypothesis testing

Random Variable (RV)

A variable whose values are determined by chance.

 

Probability Density Function (PDF)

Describes how an RV is “distributed,” i.e., how likely it is that the RV takes a value in any particular range.

Parameter

Characteristic or measure that describes a population.

 

Statistic (not to be confused with Statistics)

Characteristic or measure obtained from a sample.

 


Common ways to distinguish variables

Qualitative Variables

Variables that take non-numerical values. (e.g., eye color; gun ownership)

 

Quantitative Variables

Variables that take numerical values. (e.g., number of credit cards in one’s wallet; time elapsed since the Compromise of 1877)

 

Discrete Variables

Variables which assume a finite or countable number of possible values. Usually obtained by counting. (e.g., the number of credit cards in one’s wallet)

 

Continuous Variables

Variables which can assume any value within an interval, so the set of possible values is uncountably infinite. Usually obtained by measurement. (e.g., time elapsed since the Compromise of 1877)

Hypothesis testing terminology

Population

All subjects possessing a common characteristic that is being studied.

 

Sample

A subgroup or subset of the population.

Statistics

Collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions.

Hypothesis testing

Research design

• Research design is the means by which we attempt to uncover causal relationships between variables using data that we collect.

• In the jargon of the trade, the objective is to “identify” the effect of a “treatment.”

• Conceptually, one wants to make a comparison between two identical subjects—one who received the treatment, and one who did not.

• A pure experiment is the gold standard. Unfortunately, this ideal is generally infeasible in the social sciences.

Language of research design

Treatment group

The group that receives the treatment.

Control group

The group that does not receive the treatment.

Experimental data

Data derived from a process whereby the researcher determines the receipt of the treatment.

Non-experimental data (syn. “observational data”)

Data in which the administration of the treatment is determined by factors beyond the researcher’s control.

The standard political science stats education

I. Basic probability theory

- random variables

- PDFs

- CDFs

II. Statistical inference theory

- confidence intervals, hypothesis testing, p-values, etc.

III. Linear regression analysis

- the workhorse model of the social sciences

IV. Binary Outcome Models & Other Extensions of the Basic Linear Model

V. Time Series Cross Sectional Models

 

Linear regression analysis

A. Univariate regression model

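$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ (There is one IV)

B. Multivariate regression model

$y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \varepsilon_i$ (There are two IVs)

$y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_N x_{Ni} + \varepsilon_i$ (There are N IVs)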
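The slide does not say how the coefficients are estimated. As a minimal sketch, assuming ordinary least squares (OLS) and made-up data, the one-IV model can be fit in plain Python using the closed-form OLS formulas: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept follows from the means. The function name `ols_univariate` and the numbers are illustrative only.

```python
def ols_univariate(x, y):
    """Return (b0_hat, b1_hat) for y_i = b0 + b1 * x_i + e_i, estimated by OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares around the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = s_xy / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Made-up data: true intercept 2, true slope 3, plus small errors that
# happen to average zero and be uncorrelated with x in this sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
errors = [0.5, -0.5, 0.0, -0.5, 0.5]
y = [2.0 + 3.0 * xi + ei for xi, ei in zip(x, errors)]

b0_hat, b1_hat = ols_univariate(x, y)
print(b0_hat, b1_hat)
```

Because the illustrative errors are constructed to be uncorrelated with x, the estimates recover the true intercept and slope; with real data they would differ from the truth by sampling error.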

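IV. Binary dependent variable models

Used when the dependent variable takes one of two possible values:

$$\text{Democrat}_i = \begin{cases} 1 & \text{if citizen } i \text{ is a Democrat} \\ 0 & \text{if citizen } i \text{ is not a Democrat} \end{cases}$$

$$\text{Democrat}_i = f(\text{gender}_i, \text{income}_i, \text{race}_i, \text{age}_i)$$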
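The slide leaves the function f unspecified. One common choice, assumed here purely for illustration, is the logistic function, which maps a linear combination of the independent variables to a probability between 0 and 1. The coefficients below are hypothetical, not estimates, and race is left out only to keep the sketch short.

```python
import math


def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration.
coef = {"intercept": -0.5, "female": 0.4, "income": -0.01, "age": 0.005}


def prob_democrat(female, income, age):
    """Modeled probability that Democrat_i = 1 under the assumed logit form."""
    z = (coef["intercept"]
         + coef["female"] * female
         + coef["income"] * income   # income in thousands of dollars (assumption)
         + coef["age"] * age)
    return logistic(z)


p = prob_democrat(female=1, income=50, age=40)
print(round(p, 3))
```

The point of the functional form is that the predicted value is always a valid probability, which a straight line fit to a 0/1 outcome does not guarantee.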

V. Time series cross sectional models

State      Year   GDP per capita   Ave. education (years)
Alabama    1970   $5,000           10.3
Alabama    1980   $9,500           11.2
Alabama    1990   $11,200          12.4
Illinois   1970   $7,000           9.3
Illinois   1980   $12,500          10.2
Illinois   1990   $17,200          13.7
New York   1970   $6,000           8.4
New York   1980   $11,500          10.1
New York   1990   $18,000          14.5

Used when the researcher observes the objects of analysis at multiple points in time.

(These data have both time series and cross section features.)
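To make the two dimensions concrete, here is a small sketch that stores the Alabama and Illinois rows of the table in "long" format (one record per state-year) and then slices them both ways. The variable names are illustrative; in the course itself this kind of data management would be done in Stata.

```python
from collections import defaultdict

# One record per state-year: (state, year, GDP per capita in $, avg. education in years).
rows = [
    ("Alabama", 1970, 5000, 10.3),
    ("Alabama", 1980, 9500, 11.2),
    ("Alabama", 1990, 11200, 12.4),
    ("Illinois", 1970, 7000, 9.3),
    ("Illinois", 1980, 12500, 10.2),
    ("Illinois", 1990, 17200, 13.7),
]

by_state = defaultdict(list)  # time series: one unit followed over time
by_year = defaultdict(list)   # cross section: many units at one point in time

for state, year, gdp, educ in rows:
    by_state[state].append((year, gdp))
    by_year[year].append((state, gdp))

print(by_state["Alabama"])  # Alabama's GDP per capita over time
print(by_year[1980])        # all states' GDP per capita in 1980
```

Holding the state fixed gives a time series; holding the year fixed gives a cross section.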

What we won’t cover in PS545-6 but might be useful in your dissertation, future research, etc.

A. Maximum likelihood estimation (MLE) and other procedures

B. Model selection

C. Simultaneous equations/IV estimation

D. Matching

E. Non-parametric models

F. Case study selection for qualitative research

And much, much more!

Causality and research design

• Causality is often difficult to determine (wait for the next slide), which is why research design is important.

• An experiment is the gold standard.

• If a treated subject and a control subject are the same in every respect (as they are in a perfect experiment), we can logically attribute any difference in the observed outcome to receipt of the treatment.

• In the social sciences, we generally can’t run experiments, so we use statistical techniques to make the treatment and control groups as alike as we can.

Common difficulties in determining causality

One variable causes another, but how do you know which is causal?

Douglas firs <--?--> Rainfall

Two variables cause each other.

Expected closeness of race <--> Candidate expenditures

An omitted third variable causes both. (One reason correlation ≠ causation.)

Old age --> Bad Driving

Old age --> Gray Hair

If one were to look only at the relationship between Bad Driving and Gray Hair, one might be led to the erroneous conclusion that Gray Hair causes people to drive badly (or that Bad Driving causes Gray Hair).

How could one test these competing hypotheses?

Recall the relationship between ice cream consumption and the NY homicide rate…

A research design schematic

R denotes randomized assignment.

N denotes non-randomized assignment.

X denotes receipt of the treatment.

O denotes that the subject is tested.

Some basic mathematical tools

We will review some basic mathematical tools:

- Functions

- Summation operators

- Differential Calculus

Functions

A function is a rule that assigns exactly one value to each input of a specified type.

A function expresses the intuitive idea that one quantity (the argument of the function, also known as the input) completely determines another quantity (the value, or the output).
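A short illustration of that definition, using a squaring rule chosen only as an example: each input is assigned exactly one output, although two different inputs may share the same output.

```python
def square(x):
    """Rule: assign to each real input x the single output x * x."""
    return x * x


inputs = [-2, -1, 0, 1, 2]
outputs = [square(x) for x in inputs]

# -2 and 2 map to the same output, which is allowed.
# What a function rules out is one input mapping to two different outputs.
print(list(zip(inputs, outputs)))
```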

Summation operators

Summation operators are a useful way to represent the sum of a large set of numbers:

$$\sum_{i=1}^{N} x_i = x_1 + x_2 + \dots + x_{N-1} + x_N$$

The index i indicates which numbers in the set are to be included in the sum.

The product operator works in a similar fashion:

$$\prod_{i=1}^{N} x_i = x_1 \times x_2 \times \dots \times x_N$$

Summation operators

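Suppose your data were $\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\} = \{-100, -10, -1, 0, 1, 10, 100\}$.

Compute the following:

$\sum_{i=1}^{7} x_i$

$\sum_{i=1}^{3} x_i$

$\sum_{i=3}^{5} x_i$

$\sum_{i=1}^{7} 8(x_i)$

$\sum_{i \neq 3} \frac{x_i}{4}$

$\sum_{i \text{ is an odd number}} x_i$

$\sum_{i=1}^{7} x_i$

$\sum_{i \geq 6} x_i$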
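One way to check hand computations of this kind is a few lines of Python. The sketch below uses the data from the slide and a hypothetical helper `x_at` that converts the slide's 1-based index i into Python's 0-based list position; the index restrictions are written as conditions inside `sum`. `math.prod` plays the role of the product operator.

```python
import math

x = [-100, -10, -1, 0, 1, 10, 100]  # x_1, ..., x_7 from the slide


def x_at(i):
    """Return x_i using the slide's 1-based index."""
    return x[i - 1]


indices = range(1, len(x) + 1)  # i = 1, ..., 7

total = sum(x_at(i) for i in indices)                    # i = 1..7
first_three = sum(x_at(i) for i in range(1, 4))          # i = 1..3
middle = sum(x_at(i) for i in range(3, 6))               # i = 3..5
scaled = sum(8 * x_at(i) for i in indices)               # summand 8 * x_i
odd_only = sum(x_at(i) for i in indices if i % 2 == 1)   # i is odd
tail = sum(x_at(i) for i in indices if i >= 6)           # i >= 6
product_all = math.prod(x_at(i) for i in indices)        # product over i = 1..7

print(total, first_three, middle, scaled, odd_only, tail, product_all)
```

Note that a constant multiplying every term can be pulled outside the sum, so `scaled` always equals `8 * total`.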

Sample mean and sample variance

Every population has a mean (μ) and a variance (σ²); this implies it has a standard deviation (σ) as well.

The population mean tells you where the population is “centered.” There’s a sense in which the mean is the middle of the data.

The population variance (or standard deviation) measures how “spread out” individuals in the population are. (Both are always non-negative.)

The sample mean and sample variance are two fundamental statistics. They estimate the parameters of the population the data were drawn from.

$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

$$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})^2$$
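The two formulas translate directly into code. This is a minimal sketch on made-up data, using the 1/N divisor exactly as written on the slide. Python's `statistics.pvariance` also divides by N, whereas `statistics.variance` divides by N − 1, so the two built-ins give different answers on the same data.

```python
import statistics


def sample_mean(xs):
    """mu_hat = (1/N) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """sigma_hat^2 = (1/N) * sum of (x_i - mu_hat)^2, the slide's 1/N form."""
    mu_hat = sample_mean(xs)
    return sum((x - mu_hat) ** 2 for x in xs) / len(xs)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

mu_hat = sample_mean(data)
sigma2_hat = sample_variance(data)
sigma_hat = sigma2_hat ** 0.5  # standard deviation is the square root of the variance

print(mu_hat, sigma2_hat, sigma_hat)
print(statistics.pvariance(data))  # divides by N, matches sigma2_hat
print(statistics.variance(data))   # divides by N - 1, slightly larger
```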

Derivatives

Loosely speaking, a derivative can be thought of as how much one quantity is changing in response to changes in some other quantity.
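A rough numerical illustration of that idea: nudge the input by a small amount h and see how much the output changes per unit of input. The sketch uses a central difference and the example function f(x) = x², both chosen for illustration; the helper name `derivative` is hypothetical.

```python
def derivative(f, x, h=1e-6):
    """Approximate the derivative of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    return x ** 2


# The exact derivative of x^2 is 2x, so the slope at x = 3 should be close to 6.
slope_at_3 = derivative(f, 3.0)
print(slope_at_3)
```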

Integrals

A definite integral of a function can be represented as the signed area of the region bounded by its graph.
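The word "signed" matters: area below the horizontal axis counts as negative. A minimal sketch, assuming a midpoint Riemann sum as the approximation method and f(x) = x as the example function (the helper name `riemann_midpoint` is hypothetical):

```python
def riemann_midpoint(f, a, b, n=1000):
    """Approximate the definite integral of f from a to b using n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) for k in range(n)) * width


def f(x):
    return x


# From 0 to 1 the graph lies above the axis: a triangle with area 1/2.
area_positive = riemann_midpoint(f, 0.0, 1.0)

# From -1 to 1 the negative and positive triangles cancel.
area_signed = riemann_midpoint(f, -1.0, 1.0)

print(area_positive, area_signed)
```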

Math Camp game plan: Time to get down to business…

In the remainder of this lecture we will discuss some elementary results in a branch of mathematics called Real Analysis—i.e., the branch of math that studies real numbers.

Q: Why do we care about Real Analysis?

A: Because it provides the logical structure that undergirds the math we use as social scientists.

The next few slides follow a text that is slightly more advanced than we need, but let’s follow along to develop a few ideas about the real number line…

The set of real numbers: Special symbols

The real number line

The set of real numbers: Properties

Inequalities

Roots

A cheat sheet of handy rules re real numbers

(see the Math Camp website for the complete sheet)

Quadratic equations

Absolute value

Achilles and the tortoise

Bounds

Intervals

Next lecture…

Functions and graphs

- Functions

- Graphs

- Functional forms