Getting Started with STATA By: Katie Droll

Page 1: Getting Started with STATA

Getting Started with STATA

By: Katie Droll

Page 2: Getting Started with STATA

Embrace Stata!

• Stata is your statistical buddy!

• If you put in a bit of effort to learn the basics, you should find the program quite easy and very helpful.

• Statistical software can be very intimidating your first time around. Stay patient!

Page 3: Getting Started with STATA

The Stata windows

• Command window: enter commands here!

• Results window: this is where non-graphic output is printed

• Variables window: lists the variables in your dataset

• Review window: lists all commands; click on a command to rerun it

• Graph window: click on the graph & copy it into a Word document

Page 4: Getting Started with STATA

How do I enter data?

• Retrieve data from stored data files:
  – EASY: Open .dta files from the textbook CD-ROM
  – HARDER: Import ASCII data from .txt or .raw files
    • But also useful outside the context of class

• Manually enter variables & data values:
  – EASY: Use the data editor
  – HARDER: Use the input command
    • Time consuming if there is a lot of data
    • Prone to errors: typos!

Page 5: Getting Started with STATA

Where is the stored data?

• Textbook CD-ROM
  – Datasets for the chapter examples are under the appropriate ‘chapter’ folder under Stata
  – Datasets for the homework problems in Appendix B of the book should also be here, under ‘exercise’

• On the course website
  – Under ‘Statistical Computing’ > ‘Datasets’
  – Save the .dta file on your computer

Page 6: Getting Started with STATA

Retrieving .DTA files

• Command line:

use "E:\Stata\exercise\nurshome.dta", clear

-OR-

• Point and Click: Go to ‘File’ > ‘Open…’, select your CD drive, then go to ‘Stata’ > ‘exercise’ OR ‘chapn’

Page 7: Getting Started with STATA

Importing .txt OR .raw data files

• Remove the variable names and any other symbols (such as ‘*’) from the top of the .txt file, then save!

Command:

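infile str20 strvar1 numvar2 using "C:\Unicef.txt"

• infile: the command that imports the data

• str20: marks the next variable as a ‘string’ variable and gives its length (here, up to 20 characters)

• strvar1 numvar2: the variable names

• using "C:\Unicef.txt": the file pathname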

Page 8: Getting Started with STATA

Entering data using the editor

• Go to Data > Data Editor

• Enter your data as you would in a spreadsheet program like Excel

• Double-click on the variable names (var1) to edit them and add variable labels

• Click Preserve, and then close the data editor window

• You cannot run analyses on this data until you preserve the data and close the data editor!

Page 9: Getting Started with STATA

Entering data using input

input str18 name age
"Joe Smith" 15
"Ricky Bobby" 24
"Wilma Flintstone" 27
end

input str5 first str10 last age
Joe Smith 15
Ricky Bobby 24
Wilma Flintstone 27
end

input year cigs
1900 54
1910 151
1920 665
1930 1485
1940 1976
1950 3522
1960 4171
1970 3985
1980 3851
1990 2828
end

• input starts data entry; end exits data entry

• str tells STATA the variable is a string, and the number after str is the length of the string variable (str18 holds up to 18 characters)

• You must use "" if there are any spaces in a string value

Page 10: Getting Started with STATA

Summarizing data

list                          prints your dataset to the results window
summarize variable            prints summary statistics in the results window
summarize variable, detail    provides additional summary statistics
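As a minimal sketch of these three commands, the block below re-enters the year/cigs values from the input slide and summarizes them. It assumes nothing beyond those values; the clear at the top is added so the block can be rerun in a fresh or used session.

```stata
* Start from an empty dataset, then enter the cigarette data from the input slide
clear
input year cigs
1900 54
1910 151
1920 665
1930 1485
1940 1976
1950 3522
1960 4171
1970 3985
1980 3851
1990 2828
end

* Print every observation to the results window
list

* Number of observations, mean, standard deviation, minimum, and maximum
summarize cigs

* Adds percentiles, variance, skewness, and kurtosis
summarize cigs, detail
```

If you leave out the variable name, summarize reports on every variable in the dataset.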

Page 11: Getting Started with STATA

Lab #1 Main Topics

Page 12: Getting Started with STATA

Bar Charts

graph bar cigs, over(year) title("Cigarette Consumption Per Person, US") b2title("Year") ytitle("Number of Cigarettes") ylabel(0(2000)4000)

[Figure: bar chart titled "Cigarette Consumption Per Person, US"; x-axis: Year (1900 to 1990); y-axis: Number of Cigarettes (0 to 4,000)]

Page 13: Getting Started with STATA

Box plot

graph box cigs, title("Cigarette Consumption per Person, US") ytitle("Number of Cigarettes")

graph box resident, medtype(line) box(1, fcolor(magenta) lcolor(purple)) title(Box plot of Nursing Home Residents)

[Figure: box plot titled "Boxplot of Nursing Home Residents"; y-axis: nursing home residents (0 to 80)]

Page 14: Getting Started with STATA

Histogram

histogram resident, ytitle(Distribution of Residents) xtitle(Number of residents) title(Histogram of the Distribution of Residents)

[Figure: histogram titled "Histogram of the Distribution of Residents"; x-axis: Number of residents (0 to 80); y-axis: Distribution of Residents (0 to .025)]

Page 15: Getting Started with STATA

Save commands!

• Open a do-file editor: Window > Do-file Editor > New Do-file

• Copy and paste commands into this file to save them for later use

• You can also copy and paste commands into a simple .txt file or a Word file

• Please include important output (results & graphs) in your homework, along with the commands that produced the included output.

Page 16: Getting Started with STATA

Saving commands to a log file

• Before your Stata session begins, you want to give Stata the following command:

log using "C:\Temp\myfile.log", noproc

• After you are done writing your Stata commands, you can close the log file by using the Log button located just below the Prefs menu (it looks like a scroll with a traffic light next to it).

• From within Stata, you can examine the contents of that Log file with the command:

type "C:\Temp\myfile.log"

• To run that file as a program (referred to as a "do-file" in Stata), you can simply issue the Stata command:

do "C:\Temp\myfile.log"
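The commands above rely on the noproc option, which asks for a log of typed commands only. Recent Stata releases appear to have replaced that option with a separate cmdlog command; treat this as an assumption and check help log in your version. The sketch below shows both kinds of log. The file name mycommands.do and the use of Stata's shipped auto dataset are illustrative choices, not part of the original slides, and the folder C:\Temp is assumed to exist.

```stata
* Full log: records commands AND their output as plain text
log using "C:\Temp\myfile.log", text replace
sysuse auto, clear
summarize price
log close

* Command-only log: records just what you type in the Command window
* ("mycommands.do" is a hypothetical file name)
cmdlog using "C:\Temp\mycommands.do", replace
sysuse auto, clear
summarize price
cmdlog close
```

A full log mixes output with commands, so it will generally not run with do. A command-only log holds just the typed commands and can be tidied in the Do-file Editor before it is rerun.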

Page 17: Getting Started with STATA

Putting Stata output into homework

• Simply highlight what you want from the results window (including the command), then copy [Ctrl-C] and paste [Ctrl-V] into your homework document

• To copy and paste graphs, just click on the graph before copying it. You can use [Ctrl-C] or Right-click Copy

• After you copy & paste the output into your homework, change the font to a monospace (fixed pitch) font, i.e. fonts in which each character has the same width. This will line up your output!

• Examples: Courier New, SAS Monospace

Page 18: Getting Started with STATA

Lab #2 Main Topics

Page 19: Getting Started with STATA

Labels

• Save organ.dta from the website to your computer, and open it in Stata

• The names of the afflicted organs are just labels. To see what the raw data look like, you can list them without the labels as follows:

list, nolabel

• You can see what the association of label and value is by listing the labels:

label list
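Since organ.dta lives on the course website, here is a hedged sketch of the same idea using the auto dataset that ships with Stata. That is an assumption made so the example is self-contained: the variable foreign and its value label origin belong to that dataset, not to organ.dta.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* With labels: foreign is displayed as Domestic / Foreign
list make foreign in 1/5

* Without labels: the same rows show the stored numeric codes
list make foreign in 1/5, nolabel

* describe reports which value label is attached to the variable
describe foreign

* Show the code-to-text mapping for that value label
label list origin
```

The same pattern should apply to organ.dta, with organ in place of foreign.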

Page 20: Getting Started with STATA

Summarizing data by categorical groups

• If we want to do some exploratory analysis of our data set, we can first produce some descriptive statistics for the survival of each organ. To do that we must sort the observations by organ.

sort organ

• Then we can summarize the data by organ as follows:

by organ: summarize survival
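The two-step sort-then-by pattern can also be written in one line with bysort. The sketch below uses Stata's shipped auto dataset as a stand-in for organ.dta (an assumption made for self-containment), with foreign playing the role of organ and price the role of survival.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Two steps, as on the slide: sort first, then summarize within each group
sort foreign
by foreign: summarize price

* One step: bysort sorts and applies the by-group prefix together
bysort foreign: summarize price

* A compact table with one row of statistics per group
tabstat price, by(foreign) statistics(n mean sd min max)
```

If the data are not sorted by the grouping variable, by on its own stops with a "not sorted" error, which is why the sort step matters.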

Page 21: Getting Started with STATA

Side-by-side box plots

• We can even generate side-by-side box plots for the survival from diagnosis for each affected organ as follows:

graph box survival, by(organ) ytitle("Length of Survival (days from diagnosis)")

[Figure: side-by-side box plots, one panel per affected organ (Breast, Bronchus, Colon, Ovary, Stomach); y-axis: Length of Survival (days from diagnosis), 0 to 4,000]

Page 22: Getting Started with STATA

Creating a new variable as a function of an existing variable

• The first conclusion from the box plot is that women with breast cancer have the longest survival. This is consistent with the descriptive statistics produced by the summarize command.

• Another conclusion is that the variability in the length of survival is not the same in all cases: breast and ovarian cancer show large variability (indicated by the length of the box), while the rest of the cancers have very small variability. This will actually be a problem later on, so we take a transformation of the original survival times. A logarithmic transformation is usually a good bet. We do this as follows:

generate lsurv=ln(survival)
label var lsurv "Log-transformed survival"
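A minimal sketch of the same transformation on Stata's shipped auto dataset follows. The dataset and the names price and lprice are stand-ins chosen for self-containment; they are not from organ.dta.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* New variable = natural log of an existing variable
generate lprice = ln(price)
label variable lprice "Log-transformed price"

* ln() returns missing for zero or negative input, so check for missing values
count if missing(lprice)

* Compare the original and transformed scales
summarize price lprice
```

The missing-value check is worth keeping for survival data, since a recorded survival time of 0 days would produce a missing log value.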

Page 23: Getting Started with STATA

Box plot of log survival

• To include the overall box plot of survival in the side-by-side box plots, you just add the option total:

graph box lsurv, by(organ,total) ytitle("Log-transformed Survival (days from diagnosis)")

[Figure: side-by-side box plots, one panel per affected organ (Breast, Bronchus, Colon, Ovary, Stomach) plus Total; y-axis: Log-transformed Survival (days from diagnosis), 3 to 8]

Page 24: Getting Started with STATA

Histograms by group

• We can also generate histograms of the (log-transformed) survival time for each type of cancer, as well as for the total, as follows:

hist lsurv, freq by(organ, total)

[Figure: frequency histograms of log-transformed survival, one panel per affected organ (Breast, Bronchus, Colon, Ovary, Stomach) plus Total; x-axis: Log-transformed survival (2 to 8); y-axis: Frequency (0 to 15)]

Page 25: Getting Started with STATA

Selecting groups to summarize

• To get descriptive statistics within only the breast and ovarian cancer groups, add an if qualifier to the summarize command (here breast and ovary are coded 1 and 4; | means "or"):

by organ: summarize survival if organ==1 | organ==4, detail
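The if qualifier can be sketched on Stata's shipped auto dataset, with rep78 (a repair rating coded 1 to 5) standing in for organ. The dataset and variable names are assumptions made for self-containment, and inlist() is shown as an optional shorthand for a chain of "or" conditions.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Pooled statistics for the observations in groups 1 and 4 only
summarize price if rep78 == 1 | rep78 == 4, detail

* Same restriction written with inlist()
summarize price if inlist(rep78, 1, 4), detail

* Separate statistics for each selected group, as on the slide
bysort rep78: summarize price if inlist(rep78, 1, 4), detail
```

Without the by prefix the selected groups are pooled into one summary; with it, each selected group gets its own block of output.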

Page 26: Getting Started with STATA

Especially for Point-and-click People!

• If you don’t like entering commands, you can also use the menus in Stata to point and click your way through the analyses.

• To summarize data: Data > Describe Data > ‘choose an option here’

• Graphs: Graphics > Bar Chart, Histogram, Box plot, ‘and many other options’

• This is a great way to explore the program, and learn about the various capabilities of Stata

• Still, please remember to include the command from the results window in your homework