1 How to enter data in SPSS 1.1 Introduction of SPSS 1.2 Data Entry 1.3 Data Cleaning using SPSS


How to enter data in SPSS

1.1 Introduction of SPSS

1.2 Data Entry

1.3 Data Cleaning using SPSS


Figure. Statistical software packages most commonly cited in the NEJM and JAMA between 1998 and 2002 (number of articles citing each package): SAS 302, SPSS 87, STATA 80, Epi Info 49, SUDAAN 43, S-PLUS 33, StatXact 18, BMDP 9, StatView 9, Statistica 8.


Before you perform any analysis in SPSS, set up the following options.

Go to Edit, Options.


SPSS for Windows has three windows:

Data Editor

Syntax Editor, which displays syntax files

Viewer (or Draft Viewer), which displays output files

The Data Editor has two parts:

Data View window, which displays data from the active file in spreadsheet format

Variable View window, which displays metadata or information about the data in the active file, such as variable names and labels, value labels, formats, and missing value indicators.


SPSS Data View


SPSS Variable View


1.2 Data Entry into SPSS

There are two ways to enter data into SPSS:

1. Type the data directly into SPSS in Data View.

2. Enter the data in other software, such as Excel, and then import it into SPSS.

Let’s start with the second option, using data in Excel.


Figure 1. Data from Hell


Data from Heaven


How to move from Hell to Heaven (1):

1. Add a patient ID number.

2. Delete the first row, which contains the title of the project.

3. Delete the two rows under the variable names.

4. Delete the two rows between the groups.

5. Delete the row of averages at the bottom.

6. Add a variable called Group and code the first 10 patients (Drug A) as 1 and the next 10 as 2.

7. Change the variable names to 8 characters or fewer with no spaces (you can use digits, but do not start with a digit; avoid symbols).

8. Insert 2 columns before BP, named SYSBP and DIASBP. Delete the BP text column.

9. Change missing-value entries (NA, unknown, ?) to blanks.

10. Change an age of 6 months to 0.5 (years). Fix any other errors.

11. Code males as 1 and females as 2.

12. Code complications as 0 for no and 1 for yes.

13. Go back to the source and complete the missing information.

14. If a column was entered as a string (words), you may have to select the column and format the cells to change it to numeric.
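The spreadsheet fixes in steps 8 to 12 are rule-based, so they can also be scripted. The Python sketch below is only an illustration under assumptions: the raw column names (`ID`, `BP`, `Age`, `Sex`, `Complications`) and spellings such as `6 months` or `120/80` are hypothetical, because the actual "Data from Hell" sheet appears only as a screenshot.

```python
# Entries treated as missing (step 9); compared case-insensitively.
MISSING_CODES = {"", "na", "unknown", "?"}


def clean_cell(value):
    """Return None for a missing-value code, otherwise the trimmed text."""
    text = str(value).strip()
    return None if text.lower() in MISSING_CODES else text


def parse_age_years(value):
    """Convert '45' to 45.0 and '6 months' to 0.5 years (step 10)."""
    text = clean_cell(value)
    if text is None:
        return None
    number, *unit = text.split()
    if unit and unit[0].lower().startswith("month"):
        return float(number) / 12
    return float(number)


def clean_row(raw):
    """Apply steps 8-12 to one raw spreadsheet row (a dict of strings)."""
    sex_codes = {"m": 1, "male": 1, "f": 2, "female": 2}   # step 11
    yes_no = {"no": 0, "n": 0, "yes": 1, "y": 1}           # step 12
    row = {"ID": int(raw["ID"])}
    bp = clean_cell(raw["BP"])                              # step 8: split "120/80"
    if bp is None:
        row["SYSBP"], row["DIASBP"] = None, None
    else:
        systolic, diastolic = bp.split("/")
        row["SYSBP"], row["DIASBP"] = int(systolic), int(diastolic)
    row["AGE"] = parse_age_years(raw["Age"])
    sex = clean_cell(raw["Sex"])
    row["SEX"] = None if sex is None else sex_codes[sex.lower()]
    compl = clean_cell(raw["Complications"])
    row["COMPL"] = None if compl is None else yes_no[compl.lower()]
    return row


print(clean_row({"ID": "1", "BP": "120/80", "Age": "6 months",
                 "Sex": "Female", "Complications": "No"}))
# prints {'ID': 1, 'SYSBP': 120, 'DIASBP': 80, 'AGE': 0.5, 'SEX': 2, 'COMPL': 0}
```

An unrecognized code (for example `Sex` = `x`) raises a `KeyError`, which is deliberate: such values should be checked against the source (step 13) rather than silently coded.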


General guidelines for data entry

1. Give each variable a valid name: 8 characters or fewer, with no spaces or punctuation, beginning with a letter rather than a digit. Use short, easy-to-remember names. Avoid the following variable names: TEST, ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH. These words are used in SPSS syntax, and if they were permitted, the software would not be able to distinguish between a command and a variable. Each variable name must be unique; duplicates are not allowed. Variable names are not case sensitive: NEWVAR, NewVar, and newvar are all considered identical.

2. Encode categorical variables. Convert letters and words to numbers.

3. Avoid mixing symbols with data. Convert them to numbers.

4. Give each patient a unique, sequential case number (ID). Place this ID number in the first column on the left.


5. Each variable should be in its own column.

Avoid this:

Animal
Control1
Control2
Experiment1
Experiment2

Change to:

Animal  Group
1       0
2       0
3       1
4       1

* For two groups, use 0/1 coding, with 0 as the reference group.

* Do not combine variables in one column.

6. All data for a project should be in one spreadsheet. Do not include graphs or summary statistics in the spreadsheet.


7. Each patient should be entered on a single line (row). Do not copy a patient’s information to another row to perform a subgroup analysis.

8. However, when data are collected repeatedly on the same patient, it is recommended to enter each patient-day observation on its own line to ease data management. SPSS has a convenient feature (Data, Restructure) to convert from the longitudinal format to the horizontal format. When there are only a few repeats (2 or 3), the horizontal format may be preferred for simplicity.

Date      ID  SYSBP
1/2/2005  1   130
1/3/2005  1   120
1/4/2005  1   120
3/1/2005  2   110
3/2/2005  2   140

Longitudinal data entry

ID  SYSBP1  SYSBP2  SYSBP3
1   130     120     120
2   110     140

Horizontal data entry


9. For yes/no questions, enter “0” for no and “1” for yes. Do not leave blanks for no. Do not enter “?”, “*”, or “NA” for missing data, because this tells the statistical program that the variable is a string variable. String variables cannot be used in any arithmetic computation.

10. Put ordinal variables into one column if their categories are mutually exclusive.

Avoid:

Pain
Mild  Moderate  Severe
1     0         0
0     1         0
0     0         1

Preferred:

Pain
1
2
3

(1 = mild, 2 = moderate, 3 = severe)

11. Do not make columns wider than 8 characters unless absolutely essential.
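Two of the guidelines above are mechanical enough to check or automate. The Python sketch below is an illustration only: it tests names against the rules in guideline 1 and reshapes the longitudinal blood-pressure example from guideline 8 into the horizontal layout. The function names are hypothetical; within SPSS itself you would use its restructuring feature instead.

```python
import re

# The SPSS reserved keywords, plus TEST, which the guideline also advises avoiding.
WORDS_TO_AVOID = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
                  "NE", "NOT", "OR", "TO", "WITH", "TEST"}
# Guideline 1: start with a letter, then letters or digits only, 8 characters at most.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def is_valid_name(name):
    """Return True if `name` follows the naming rules of guideline 1."""
    return bool(NAME_PATTERN.fullmatch(name)) and name.upper() not in WORDS_TO_AVOID


def duplicate_names(names):
    """Names are not case sensitive, so NEWVAR, NewVar and newvar collide."""
    seen, duplicates = set(), []
    for name in names:
        if name.upper() in seen:
            duplicates.append(name)
        seen.add(name.upper())
    return duplicates


def long_to_wide(rows, id_var="ID", value_var="SYSBP"):
    """Reshape one-row-per-visit data into one row per patient (guideline 8).

    Assumes `rows` is already in chronological order within each patient.
    Patients with fewer visits get None (a blank cell) in the later columns.
    """
    visits = {}
    for row in rows:
        visits.setdefault(row[id_var], []).append(row[value_var])
    n = max((len(v) for v in visits.values()), default=0)
    return [{id_var: pid,
             **{f"{value_var}{k + 1}": (v[k] if k < len(v) else None)
                for k in range(n)}}
            for pid, v in visits.items()]
```

Because `long_to_wide` numbers the columns in the order the rows arrive, sort the longitudinal data by ID and date before reshaping.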


Entering Date in Excel.

In Excel, go to Format, Cells, select Date under Category, and then choose a format you like under Type.


Entering Time in Excel.

In Excel, go to Format, Cells, select Time under Category, and then choose a format you like under Type.


Entering Date / Time in Excel.

In Excel, go to Format, Cells, select Time under Category, and then choose a date/time format.


Entering Date, Time in SPSS

In SPSS, open Variable View, click Type for the variable to which you want to assign a date format, click Date, and select a format of your choice.


Importing data from Excel spreadsheet into SPSS.

In SPSS, go to File, Open, Data. Select the type of file you want to open (for example, Excel), and then select the name of the file.


Exporting data from SPSS to Excel.

In SPSS, go to File, Save As. Select the type of file you want to save into (for example, Excel), and give the name of the file you want to save into.


Data merging in SPSS (1)

1. Make sure that both files are sorted by the key variable in ascending order.

2. In SPSS, open Data from Hell to Heaven.sav.

3. Select Add Variables under Data, Merge Files.


Data merging in SPSS (2)

4. Select the dataset you want to merge into the working file.


Data merging in SPSS (3)

5. Click Match cases on key variables in sorted files.

6. Click Both files provide cases.

7. Highlight ID in the Excluded Variables box, then click ► next to Key Variables.


Note in Data merging in SPSS (3)

Cases must be sorted in the same order in both data files. If one or more key variables are used to match cases, both data files must be sorted in ascending order of the key variable(s). Variables in the second data file whose names duplicate variable names in the working data file are excluded by default, because Add Variables assumes that these variables contain duplicate information. Therefore, before you merge data files, carefully check any two variables with the same name. Even if the two variables contain different information, SPSS automatically excludes the variable from the file being merged in (Birthday.sav).
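To make the matching rule concrete, here is a minimal Python sketch of a one-to-one merge of two files sorted by ID, keeping cases from both files and excluding duplicate variable names as described in the note. It illustrates the logic only and is not how SPSS is implemented; the function name and the variables `SEX`, `AGE`, and `BDATE` are hypothetical, and `None` stands in for a missing value.

```python
def merge_add_variables(working, other, key="ID"):
    """One-to-one match merge of two case lists sorted by `key`.

    Keeps cases from both files ("Both files provide cases"). Variables in
    `other` whose names already exist in `working` are excluded.
    """
    for label, rows in (("working", working), ("other", other)):
        keys = [row[key] for row in rows]
        if keys != sorted(keys):
            raise ValueError(f"{label} file is not sorted by {key}")

    work_vars = list(dict.fromkeys(name for row in working for name in row))
    new_vars = [name for name in dict.fromkeys(n for row in other for n in row)
                if name not in work_vars and name != key]
    columns = ([key] if key not in work_vars else []) + work_vars + new_vars

    def combine(w_row, o_row):
        out = dict.fromkeys(columns)            # every variable starts as None
        if o_row is not None:
            out[key] = o_row[key]
            out.update({name: o_row[name] for name in new_vars if name in o_row})
        if w_row is not None:
            out.update(w_row)                   # working-file values are kept
        return out

    merged, i, j = [], 0, 0
    while i < len(working) or j < len(other):
        if j == len(other) or (i < len(working) and working[i][key] < other[j][key]):
            merged.append(combine(working[i], None))       # only in working file
            i += 1
        elif i == len(working) or other[j][key] < working[i][key]:
            merged.append(combine(None, other[j]))         # only in other file
            j += 1
        else:
            merged.append(combine(working[i], other[j]))   # same ID in both
            i += 1
            j += 1
    return merged
```

If the second file had an `AGE` variable holding different information, it would be dropped here just as in SPSS, which is why the note recommends checking same-named variables before merging.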


1.3 Data Cleaning in SPSS

1. Re-coding existing variables

2. Creating new variables

3. Creating a new variable from existing variables

4. Data labeling and formatting


Data cleaning in SPSS (1): Recoding existing variables (1)

ID  Group (old)  Group (new)
1   A            0
2   A            0
3   B            1
4   B            1

We want to use numeric coding for group instead of A and B.


Data cleaning in SPSS (1): Recoding existing variables (2)

From the SPSS menu, go to Transform, Recode Into Same Variables.


Data cleaning in SPSS (1): Recoding existing variables (3)

1. Move Group from the variable list into the String Variables box.

2. Click Old and New Values to proceed.


Data cleaning in SPSS (1): Recoding existing variables (4)

1. Type the old value and the new value you want to convert it into.

2. Click Add (to change or remove a pair, click Change or Remove).

3. When all values are listed in the Old --> New box, click Continue.

4. Click OK to execute the commands.
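Conceptually, the recode above is a lookup from old values to new values. A minimal Python sketch of that idea (the function name is hypothetical, and this is not SPSS's implementation):

```python
def recode_same_variable(rows, var, mapping):
    """Replace old values of `var` in place using `mapping` (old -> new).

    Values that do not appear in `mapping` are left unchanged.
    """
    for row in rows:
        row[var] = mapping.get(row[var], row[var])
    return rows


data = [{"ID": 1, "Group": "A"}, {"ID": 2, "Group": "A"},
        {"ID": 3, "Group": "B"}, {"ID": 4, "Group": "B"}]
recode_same_variable(data, "Group", {"A": 0, "B": 1})
print([row["Group"] for row in data])   # [0, 0, 1, 1]
```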


Data Cleaning in SPSS (2): Creating a new variable for diastolic blood pressure (DiasBP)

In SPSS, go to Variable View, then type DiasBP in the last row under Name.

Go back to Data View and type the diastolic blood pressure values directly, separating them from SysBP. For ease of data entry, you can move DiasBP right after SysBP. Then edit SysBP so that it contains only the systolic values.


Data Cleaning in SPSS (3)

Computing a patient’s age from the birthday and the date enrolled into the study.
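SPSS has its own date functions for this step; to show only the arithmetic involved, here is a hedged Python sketch. It assumes both dates are already true date values (not text) and offers two common definitions of age, whole years and decimal years; which one a study needs is a choice the slide leaves open. The function names are hypothetical.

```python
from datetime import date


def age_completed_years(birth, enrolled):
    """Age in whole years on the enrollment date."""
    years = enrolled.year - birth.year
    if (enrolled.month, enrolled.day) < (birth.month, birth.day):
        years -= 1                      # birthday not yet reached that year
    return years


def age_decimal_years(birth, enrolled):
    """Age as a decimal, approximating a year as 365.25 days."""
    return (enrolled - birth).days / 365.25


birth, enrolled = date(1960, 5, 20), date(2005, 3, 1)
print(age_completed_years(birth, enrolled))   # 44
print(round(age_decimal_years(birth, enrolled), 2))
```

The decimal form matches the earlier convention of recording 6 months as 0.5 years.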


Data Cleaning in SPSS (4): Data labeling and formatting (1) Specifying Type of Variable

HT 61.00 68.00 47.00 66.00 72.00 67.00 72.00 72.00 66.00 60.00 61.00 59.00 73.00 65.00 71.00 68.00 69.00 66.00 66.00 68.00


Data Cleaning in SPSS (4): Data labeling and formatting (2)

Data Labeling


Data Cleaning in SPSS (4): Data labeling and formatting (3)

Variable Formatting


Data Cleaning in SPSS (4): Data labeling and formatting (4)

Specifying missing values


Data Cleaning in SPSS (4): Data labeling and formatting (5)

Measurement category


Retrieve data property from existing files in SPSS (1)

This feature (Data, Copy Data Properties) is extremely handy when you need to build a similar database for an expanded or new group of patients. Rather than re-creating variable labels, formats, and so on, you can save time by copying this information from an existing file.

Now create a copy of “Data from heaven.sav”, delete the formats and labels you just created, and save it as “Data from hell to heaven without format.sav”.

Note: Before you run this command, make sure that the variable types match between the two datasets.
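The idea behind copying data properties, and why matching types matters, can be sketched in Python. This is an illustration under assumptions, not the SPSS feature itself: each dataset's metadata is modeled as a hypothetical dictionary of variable name to properties (`type`, `label`, `value_labels`, `format`), and variables whose types differ are skipped and reported.

```python
def copy_properties(source, target, props=("label", "value_labels", "format")):
    """Copy variable properties from `source` to `target` metadata in place.

    Both arguments map variable name -> dict with at least a "type" key.
    Names are matched case-insensitively, as SPSS variable names are.
    Variables whose types differ are skipped and returned so you can fix them.
    """
    by_name = {name.upper(): meta for name, meta in source.items()}
    skipped = []
    for name, meta in target.items():
        src = by_name.get(name.upper())
        if src is None:
            continue                        # variable not in the source file
        if src["type"] != meta["type"]:
            skipped.append(name)            # types must match before copying
            continue
        for prop in props:
            if prop in src:
                meta[prop] = src[prop]
    return skipped
```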



Retrieve data property from existing files in SPSS (2)


Retrieve data property from existing files in SPSS (3)


Using syntax in SPSS:

The great advantage of SPSS is that it produces high-quality graphs and statistical analyses through easy point-and-click operations. However, some people criticize SPSS because analyses done this way are hard to reproduce later. In fact, SPSS has a powerful command syntax that can be saved and rerun.

Throughout the course, I will provide “how to” boxes for all analyses used in the class; here I show how to save your commands as syntax. I highly recommend using syntax to keep a better record of what has been done.


Using syntax in SPSS (1): Creating a new syntax file


Using syntax in SPSS (2): Editing a syntax file


Using syntax in SPSS (3): Saving a syntax file


Using syntax in SPSS (4): Opening an existing syntax


Using syntax in SPSS (5): Example syntax

I find syntax very handy, especially when you get tired of clicking so many times!


Using syntax in SPSS (6): Recording syntax from a command dialog box

You can in fact use the command dialog boxes (the point-and-click method) as your main tool and still save what you did as syntax by clicking Paste instead of OK. Later, you can simply run the syntax to repeat the analysis.

Step 1


Step 2: Syntax saved by the previous Paste command


Using syntax in SPSS (7): Executing the syntax


Data confidentiality

Data need to be stored in a secure, locked place and backed up daily or weekly. When you send your data to a biostatistician for further statistical analysis, delete patient names, social security numbers, medical record numbers, and actual dates (birth dates, admission dates, etc.).
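Scripting the de-identification step helps apply it the same way every time. A minimal Python sketch, assuming hypothetical identifier names (`NAME`, `SSN`, `MRN`) and, instead of simply deleting dates, replacing each date with the number of days since the patient's own enrollment date so that intervals survive. That substitution is one common option, not something the slide prescribes, so agree on it with your biostatistician.

```python
from datetime import date

# Hypothetical names of direct-identifier variables to drop before sharing.
IDENTIFIERS = {"NAME", "SSN", "MRN"}


def deidentify(row, date_vars, reference_var):
    """Drop identifier variables and replace actual dates with day offsets.

    Each variable in `date_vars` becomes the number of days since the
    patient's own `reference_var` date, so calendar dates are not shared.
    """
    reference = row[reference_var]
    clean = {}
    for var, value in row.items():
        if var.upper() in IDENTIFIERS:
            continue
        if var in date_vars:
            clean[var] = None if value is None else (value - reference).days
        else:
            clean[var] = value
    return clean


row = {"ID": 7, "NAME": "Jane Doe", "MRN": "12345",
       "ENROLL": date(2005, 1, 2), "ADMIT": date(2005, 1, 10), "SYSBP": 130}
print(deidentify(row, {"ENROLL", "ADMIT"}, "ENROLL"))
# prints {'ID': 7, 'ENROLL': 0, 'ADMIT': 8, 'SYSBP': 130}
```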


Communication with a biostatistician:

Most statisticians prefer to receive data in SPSS format or in the format of the statistical software they use. An advantage of entering data directly into a statistical package such as SPSS is that you can store variable labels and value labels in the file.

Also answer the following questions:

What is the name of your study?
What is the purpose of your study?
What is the type of your study?
Will all subjects be included in the analysis?
Were there any matched (repeated) measures?
How will outliers be defined and handled?
Have the data been cleaned?
What is the goal, and what is the deadline for it?

When communicating with a biostatistician, also describe the research problem, the study hypothesis, and the primary comparison you are interested in. Explain any variables that need to be controlled for. Explain the code used for missing values.