
Everything I wish I had known about research design and data analysis… Statlab Workshop Fall 2006 Kyle Hood and Frank Farach



Outline of a paper

  • Introduction
  • Theory
  • Data Description
  • Analysis
  • Conclusion

Identifying a Question

  • Tradeoff between work in and results
      • Easy to do, trivial results
      • Result is interesting, but difficulty is high
  • New tools open up new questions
      • New statistical or computational tools make formerly difficult questions approachable
  • New theory opens up new questions

Introduction

  • Topic
      • Most general level
  • Question
      • What is the question you want to answer?
      • Be specific
      • Ask only what you can answer
  • Review the Literature
  • “Stay the course”

Theory

  • Categorize your theory
      • Descriptive vs. causal
  • Write down your theory
      • In paragraph form
      • Using a statistical model
  • Identify testable hypotheses from theory
  • Do you need statistics after all?
      • Quantitative vs. qualitative research

Variables

  • Dependent Variable (response, outcome, criterion)
  • Independent Variables (explanatory or predictor variables)
      • Treatment Variable
      • Covariates / Confounding Variables
  • Categorical and Continuous Variables
  • Remember: the types of variables we choose determine the statistics we use

You need Data

  • Think about analyses early!
  • Collecting your own data
      • Retrospective, prospective, experimental & observational methods
  • Can find most data you’ll need on-line!
      • Statlab Webpage (http://statlab.stat.yale.edu)
      • Advisors
      • Yale StatCat (http://ssrs.yale.edu/statcat/)
      • ICPSR (http://www.icpsr.umich.edu)
      • Reference Librarian (Julie Linden)

Endogeneity

  • Problem: “Independent” variables are not really independent
      • The “dependent” and “independent” variables are determined in equilibrium (example: effect of education on wages)
      • Treatment effects will be biased
  • Modeling approaches to deal with this
      • Assumption-based methods
      • Instruments
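
The education-and-wages problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the unobserved "ability" term, the instrument, the coefficients, and all function names are invented for illustration and do not come from the workshop. Ability raises both education and wages, so the ordinary regression slope overstates the effect of education, while an instrument that moves education but not wages directly recovers a value close to the true effect.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=1.0, seed=0):
    """Simulate hypothetical education/wage data with an unobserved confounder."""
    rng = random.Random(seed)
    educ, wage, instr = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)         # unobserved: affects education AND wages
        z = rng.gauss(0, 1)               # instrument: affects education only
        x = z + ability + rng.gauss(0, 1)
        y = true_effect * x + 2.0 * ability + rng.gauss(0, 1)
        educ.append(x)
        wage.append(y)
        instr.append(z)
    return educ, wage, instr


educ, wage, instr = simulate()

# Naive regression slope: biased because education is correlated with ability.
ols_slope = cov(educ, wage) / cov(educ, educ)

# Instrumental-variable (Wald) estimate: uses only variation driven by z.
iv_slope = cov(instr, wage) / cov(instr, educ)

print(f"true effect 1.00, OLS {ols_slope:.2f}, IV {iv_slope:.2f}")
```

The instrument only helps if it really is unrelated to the omitted factor, which is itself an assumption that cannot be checked from the data alone.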

So, you want to make a survey

  • Extensive on-line resources and software
  • Question types determine analyses
      • Open- vs. closed-ended questions, Likert scales, rank order data
      • Assumptions of normality
  • Validity
      • Internal & external validity
  • Pilot testing
      • You need variance to analyze!
  • Sample size
      • It depends: power, effect size, cost (UCLA power calculator)
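
To make "it depends" concrete, here is a rough sketch of how sample size follows from power and effect size. It assumes a two-sided comparison of two group means and uses the standard normal approximation, so it is not the method of the UCLA calculator mentioned above, and an exact t-based calculation will give slightly larger numbers. The function name and defaults are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized mean difference (Cohen's d).
    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_medium = n_per_group(0.5)   # a "medium" effect
n_small = n_per_group(0.2)    # a "small" effect needs far more cases
print(n_medium, n_small)
```

Halving the expected effect size roughly quadruples the required sample, which is where cost enters the decision.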

Once You’ve Found or Collected your data

  • Download the data and documentation
      • StatTransfer (Statlab)
  • Determine data file type
      • Probably a text file (.txt, .dat, .raw)
      • Converting text & delimited files
  • Choose a statistical software program
      • SPSS, Stata, SAS, Matlab, Excel, R, C++
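
As one illustration of converting a delimited text file, the sketch below reads tab-separated text into typed records. The column names (id, age, group) and the sample rows are made up for the example; a real file's layout should be taken from its documentation.

```python
import csv
import io

# Hypothetical tab-delimited data, standing in for a downloaded .txt/.dat file.
raw_text = "id\tage\tgroup\n1\t34\tcontrol\n2\t29\ttreatment\n3\t\tcontrol\n"


def read_delimited(handle, delimiter="\t"):
    """Read delimited text into a list of dicts, converting types explicitly."""
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        age = record["age"].strip()
        rows.append({
            "id": int(record["id"]),
            "age": int(age) if age else None,   # blank field = missing value
            "group": record["group"],
        })
    return rows


rows = read_delimited(io.StringIO(raw_text))
print(rows)
```

For a file on disk, the same function can be given `open(path, newline="")` in place of the `io.StringIO` object.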

Managing your data

  • Back up all Master Data Files
      • CDR/CDRW, USB Key
  • Codebook
      • All codes
      • Adding variables, cases, computing new variables
  • Keep a roadmap
      • Keep a log of all analyses with what you have done
      • Save syntax files
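
A codebook and a log do not need special software. The following is a minimal sketch of one way to keep both next to the data; the variable names, codes, and helper functions are hypothetical.

```python
# Hypothetical codebook: one entry per variable, with every code documented.
CODEBOOK = {
    "sex": {"label": "Respondent sex",
            "values": {1: "male", 2: "female", 9: "missing"}},
    "educ": {"label": "Years of education", "values": None},  # continuous
}

analysis_log = []   # running record of what was done, in order


def decode(variable, code):
    """Translate a stored code into its documented meaning."""
    values = CODEBOOK[variable]["values"]
    if values is None:
        return code                      # continuous variable: no value labels
    if code not in values:
        raise ValueError(f"undocumented code {code!r} for {variable}")
    return values[code]


def add_variable(name, label, values=None):
    """Document a newly computed variable and note it in the log."""
    CODEBOOK[name] = {"label": label, "values": values}
    analysis_log.append(f"added variable {name}: {label}")


add_variable("log_wage", "Natural log of hourly wage")
print(decode("sex", 2), analysis_log)
```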

Syntax Files

  • What are they?
      • Text files used to enter commands in bulk
  • Is it worth learning?
      • You will make mistakes, need to make changes
      • SPSS and many other programs let you use pull down menus
  • How do I know what to write?
      • Program’s manual provides the underlying command

So, how do I analyze my data?

  • Correlation
      • Correlation allows you to quantify relationships between variables (r, r-squared)
      • Regression allows prediction of dependent variable based on one or more independent variables
  • Group differences
      • t-test & ANOVA
      • Chi-square for categorical and frequency data
  • Significance v. effect size
  • More Complex Models
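
The correlation and regression ideas above can be worked through by hand on a tiny data set. This sketch uses made-up numbers and plain Python instead of a statistics package; the function name is illustrative, and it covers only the one-predictor case.

```python
import math


def correlation_and_regression(x, y):
    """Return Pearson r, r-squared, and the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)       # strength of the linear relationship
    slope = sxy / sxx                    # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x
    return {"r": r, "r_squared": r ** 2, "slope": slope, "intercept": intercept}


# Made-up example data: x is the independent variable, y the dependent one.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

fit = correlation_and_regression(x, y)
predicted_at_6 = fit["intercept"] + fit["slope"] * 6   # prediction from the line
print(fit, predicted_at_6)
```

Here r-squared is the share of variance in y accounted for by x, which is one common way to report effect size alongside a significance test.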

Descriptive Statistics

  • Variables
      • Dependent Variable(s)
      • Independent Variable(s)
      • Important Covariates
  • Graphs
  • Summary Statistics on Key Variables
      • Number, Mean, Minimum, Maximum, Standard Deviation
  • Cross-Tabs
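
The summary statistics and cross-tabs listed above can be sketched in a few lines. The scores, group labels, and pass mark below are invented for the example, and the standard deviation shown is the sample (n - 1) version.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(values):
    """Number, mean, minimum, maximum, and sample standard deviation."""
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "sd": stdev(values),
    }


def crosstab(row_var, col_var):
    """Count how often each (row, column) combination occurs."""
    return Counter(zip(row_var, col_var))


# Made-up data: one continuous score and one categorical group per case.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
passed = [s >= 5 for s in scores]        # hypothetical pass mark of 5

summary = summarize(scores)
table = crosstab(group, passed)
print(summary)
print(table)
```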

Putting Output into a Paper

  • Cut and Paste
  • Graphs
      • Cut and Paste into Word Processing document
      • Save as .jpeg or .tif file
  • Tables
      • Cut and Paste
      • Format in Word Processing document
      • Import into Excel, format, and then place in Word

More Advanced Analysis

  • Multivariate techniques help to account for confounding factors and allow for testing change over time and more complex hypotheses…
  • (See: Tabachnick & Fidell, Using Multivariate Statistics)

  1) Be honest about your abilities.
  2) Ask for help.
  3) You are best off including only techniques that you fully understand.

Take Away Messages

  1) Determine your question, methods and statistics before you start
  2) Keep a codebook of everything
  3) Keep a log of all commands issued
  4) Save data at every step
  5) Ask for help
  6) Don’t get in over your head