Using EpiData and SPSS
Shahzad Asghar Arain
[email protected] 92 312 514 9114
http://shahzadasghar.info
References
Public domain (PDF) book on data management: Bennett, et al. (2001). Data Management for Surveys and Trials: A Practical Primer Using EpiData. The EpiData Documentation Project. http://www.epidata.dk/downloads/dmepidata.pdf
EpiData Association website: http://www.epidata.dk/
Importing raw data into SPSS: http://www.ats.ucla.edu/stat/spss/modules/input.htm
Data Management
  Planning data needs
  Data collection
  Data entry and control
  Validation and checking
  Data cleaning and variable transformation
  Data backup and storage
  System documentation
  Other
Types of Database Management Systems (DBMSs)
Spreadsheets (e.g., Excel, SPSS Data Editor)
  Prone to error, data corruption, and mismanagement
  Lack data controls, limited programmability
  Suitable only for small and didactic projects
  Also good for last-step data cleaning
Commercial DBMS programs (e.g., MySQL, Oracle, Access)
  Limited data control, good programmability
  Slow and expensive
  Powerful and widely available
Public domain programs (e.g., EpiData, Epi Info)
  Controlled data entry, good programmability
  Suitable for research and field use
We will use two platforms:
EpiData
  controlled data entry
  data documentation
  export (“write”) data
SPSS
  import (“read”) data
  analysis
  reporting
What is EpiData?
EpiData is a small computer program (1.2 MB) for simple or programmed data entry and data documentation
It is highly reliable
It runs on Windows computers
It runs on Macs and Linux only with emulator software
Interface
  pull-down menus
  work bar
History of EpiInfo & EpiData
1976–1995: EpiInfo (DOS program) created by CDC (in the wake of the swine flu epidemic)
  Small, fast, reliable, 100,000+ users worldwide
1995–2000: DOS dies a slow, painful death
2000: CDC releases EpiInfo2000
  Based on Microsoft Jet (Access) data engine
  Large, slow, unreliable (resembled EpiInfo in name only)
2001: Loyal EpiInfo user group decides it needs a real “EpiInfo for Windows”
  Creates open source public domain program
  Calls program “EpiData”
Goal: Create & Maintain Error-Free Datasets
Two types of data errors
  Measurement error (i.e., information bias) – discussed last couple of weeks
  Processing errors = errors that occur during data handling – discussed this week
Examples of data processing errors
  Transpositions (91 instead of 19)
  Copying errors (O instead of 0)
  Additional processing errors described on p. 18.2
Avoiding Data Processing Errors
Manual checks (e.g., handwriting legibility)
Range and consistency checks* (e.g., do not allow hysterectomy dates for men)
Double entry and validation*
  Operator 1 enters data
  Operator 2 enters data in separate file
  Check files for inconsistencies
Screening during analysis (e.g., look for outliers)
* covered in lab
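The double entry procedure above can be sketched in a few lines of Python. This is only an illustration of the idea, not how EpiData implements validation; the function name `compare_entries`, the `id` key, and the sample records are all assumptions made for the example.

```python
def compare_entries(first, second, key="id"):
    """Return (key, field, value1, value2) tuples for every disagreement."""
    second_by_key = {rec[key]: rec for rec in second}
    first_keys = {rec[key] for rec in first}
    problems = []
    for rec in first:
        other = second_by_key.get(rec[key])
        if other is None:
            # Record entered by operator 1 but not by operator 2.
            problems.append((rec[key], None, "present", "missing"))
            continue
        for field, value in rec.items():
            if other.get(field) != value:
                problems.append((rec[key], field, value, other.get(field)))
    for k in second_by_key:
        if k not in first_keys:
            # Record entered by operator 2 but not by operator 1.
            problems.append((k, None, "missing", "present"))
    return problems


# Hypothetical data: operator 2 transposed the age in record 1 (91 instead of 19).
operator1 = [{"id": 1, "age": "19", "sex": "1"}, {"id": 2, "age": "46", "sex": "1"}]
operator2 = [{"id": 1, "age": "91", "sex": "1"}, {"id": 2, "age": "46", "sex": "1"}]

mismatches = compare_entries(operator1, operator2)
for record_id, field, v1, v2 in mismatches:
    print(f"record {record_id}, field {field}: {v1!r} vs {v2!r}")
```

Each reported mismatch is then resolved by going back to the paper form, since the comparison alone cannot tell which operator was right.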
Controlled Data Entry
Criteria for accepting & rejecting data
Types of data controls
  Range checks (e.g., restrict AGE to reasonable range)
  Value labels (e.g., SEX: 1 = male, 2 = female)
  Jumps (e.g., if “male,” jump to Q8)
  Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)
  Must enters
  etc.
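To make the data controls above concrete, here is a minimal Python sketch of a record checker. It is not EpiData's .CHK syntax; the field names, the 0 to 120 age range, and the function `check_record` are assumptions chosen for illustration. Jumps are an entry-screen behaviour and are not modelled here.

```python
SEX_LABELS = {1: "male", 2: "female"}  # value labels from the slide
AGE_RANGE = (0, 120)                   # assumed "reasonable range"
MUST_ENTER = ("age", "sex")            # assumed must-enter fields


def check_record(rec):
    """Return a list of error messages; an empty list means the record is accepted."""
    errors = []
    # Must enter: these fields may not be left blank.
    for field in MUST_ENTER:
        if rec.get(field) is None:
            errors.append(f"{field}: must enter")
    # Range check on age.
    age = rec.get("age")
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        errors.append("age: out of range")
    # Value labels double as a list of legal codes.
    sex = rec.get("sex")
    if sex is not None and sex not in SEX_LABELS:
        errors.append("sex: not a legal value")
    # Consistency check across two fields.
    if SEX_LABELS.get(sex) == "male" and rec.get("hysterectomy") == "yes":
        errors.append("hysterectomy: inconsistent with sex = male")
    return errors


print(check_record({"age": 45, "sex": 1, "hysterectomy": "no"}))
print(check_record({"age": 450, "sex": 1, "hysterectomy": "yes"}))
```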
Data Processing Steps
1. File naming conventions
2. Variable types and names
3. QES (questionnaire) development
4. Convert .QES file to .REC (record) file
5. Add .CHK file
6. Enter data in .REC file
7. Validate data (double entry procedure)
8. Document data (codebook)
9. Export data to SPSS
10. Import data into SPSS
Filenaming and File Management
c:\path\filename.ext
A web address is a good example of a filename, e.g., http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt
Some systems are case sensitive (Unix)
Others are not (Windows)
Always be aware of
  Physical location (local, removable, network)
  Path (folders and subfolders)
  Filename (proper)
  Extension
Demo Windows Network Explorer: right-click Start Bar > Explore
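As a small illustration of the four parts listed above, Python's standard `pathlib` module can split the slide's example `c:\path\filename.ext` and show the case-sensitivity difference between Windows-style and Unix-style paths. The second pair of paths is made up for the comparison.

```python
from pathlib import PurePosixPath, PureWindowsPath

p = PureWindowsPath(r"c:\path\filename.ext")

parts = {
    "location": p.drive,       # drive letter, e.g. a local or network drive
    "path": str(p.parent),     # folders and subfolders
    "filename": p.stem,        # filename proper
    "extension": p.suffix,     # extension, including the dot
}
print(parts)

# Windows-style paths compare without regard to case; Unix-style paths do not.
same_on_windows = PureWindowsPath(r"C:\Path\DATA.REC") == PureWindowsPath(r"c:\path\data.rec")
same_on_unix = PurePosixPath("/path/DATA.REC") == PurePosixPath("/path/data.rec")
print(same_on_windows, same_on_unix)
```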
Extension   Software program
.qes        EpiInfo/EpiData questionnaire
.rec        EpiInfo/EpiData records (data)
.chk        EpiInfo/EpiData check (controls & labels)
.not        EpiData notes (data documentation)
.sav        SPSS permanent data file
.sps        SPSS syntax file (program)
.txt        Generic (flat) text data
.htm        Web Browser
.doc        Microsoft Word
.xls        Microsoft Excel
Selected EpiData Variable Types
Variable Type         Examples
Text                  _ _
                      <A >
Numeric               ##
                      ##.#
Date                  <mm/dd/yyyy>
                      <dd/mm/yyyy>
Auto ID               <IDNUM>
Soundex (sanitized)   <S >
EpiData Variable Names
The variable name is based on the text that occurs before the variable type indicator code
EpiData's variable-naming defaults vary depending on installation
Create variable names exactly as specified
To be safe, denote variable names in {curly brackets}
For example, to create a two-byte numeric variable called age, use the question:
What is your {age}? ##
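The naming rule can be illustrated with a short Python sketch that pulls the curly-bracket name and the numeric field width out of a question line. This is not EpiData's own parser; the function `parse_question` is hypothetical and handles only one bracketed name and one `#` field per line.

```python
import re


def parse_question(line):
    """Return (variable_name, field_width), or None if either part is missing."""
    name_match = re.search(r"\{(\w+)\}", line)      # text inside {curly brackets}
    field_match = re.search(r"#+(?:\.#+)?", line)   # numeric field such as ## or ##.#
    if not name_match or not field_match:
        return None
    # Width counts every character of the field, including a decimal point.
    return name_match.group(1), len(field_match.group(0))


print(parse_question("What is your {age}? ##"))
```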
Demo / Work Along
Create QES file [demo.qes]
Convert QES to REC [demo.rec]
Create CHK file [demo.chk]
Create double entry file [demo2.rec]
Enter data
Validate data
Fname    Lname    DOB         SEX   DEATHAGE
John     Snow     3/15/1813   1     45
George   Orwell   6/25/1903   1     46
Codebooks
Contain info that helps users decipher data file content and structure
Includes:
  Filename(s)
  File location(s)
  Variable names
  Coding schemes
  Units
  Anything else you think might be useful
EpiData codebook generators
File Structure Codebook
Full codebook contains descriptive statistics (demo)
Notice descriptive statistics
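A codebook with descriptive statistics can be approximated in a few lines of Python. The sketch below is not EpiData's codebook generator; it uses the two demo records from the work-along table (DOB omitted), and the function name `build_codebook` and the output layout are assumptions.

```python
from statistics import mean

# The two demo records from the work-along table.
records = [
    {"FNAME": "John", "LNAME": "Snow", "SEX": 1, "DEATHAGE": 45},
    {"FNAME": "George", "LNAME": "Orwell", "SEX": 1, "DEATHAGE": 46},
]


def build_codebook(rows):
    """Summarise each variable: count, type, and basic statistics if numeric."""
    codebook = {}
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] is not None]
        entry = {"n": len(values),
                 "type": type(values[0]).__name__ if values else "unknown"}
        if values and all(isinstance(v, (int, float)) for v in values):
            entry.update(min=min(values), max=max(values), mean=mean(values))
        codebook[name] = entry
    return codebook


codebook = build_codebook(records)
for name, entry in codebook.items():
    print(name, entry)
```

A real codebook would also record the coding scheme (for example, SEX: 1 = male, 2 = female), units, and file locations, which have to be supplied by the person documenting the data.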
Conversion of Data File
Requires a common intermediate file format
Examples of common intermediate files
  .TXT = plain text
  .DBF = dBase program
  .XLS = Excel
Steps
  Export .REC file to .TXT file
  Import .TXT file into SPSS
  Save permanent .SAV file
Plain (“raw”) TXT data
  plain ASCII data format
  no column demarcations
  no variable names
  no labels
tox-samp.txt    tox-samp.not
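Because raw text data has no column demarcations or variable names, whoever reads it needs the column layout from the documentation. The Python sketch below shows the idea with a made-up layout and made-up data lines; it does not reflect the actual contents of tox-samp.txt or tox-samp.not.

```python
# Hypothetical layout: (variable name, start column, end column), 1-based inclusive.
LAYOUT = [("id", 1, 3), ("sex", 4, 4), ("age", 5, 6)]


def parse_fixed_width(line, layout=LAYOUT):
    """Cut one raw data line into named fields using column positions."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


# Made-up raw lines: no delimiters, no names, no labels.
raw_lines = ["001145", "002246"]
rows = [parse_fixed_width(line) for line in raw_lines]
for row in rows:
    print(row)
```

This is the same information an SPSS syntax file supplies when it tells SPSS where each variable sits in the raw file.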
SPSS Data Export / Import
[Diagram: REC, TXT (raw data), SPS (syntax), SAV]
Lines beginning with * are comments (ignored by the command interpreter)
The next set of commands shows file location and structure via SPSS command syntax
Labels being imported into SPSS
Delete the * if you want this command to run
Ethics of Data Keeping
Confidentiality (sanitized files – free of identifiers)
Beneficence
Equipoise
Informed consent (To what extent?)
Oversight (IRB)