Data preparation for use in SEM

Ned Kock



Data in table format

• Each column corresponds to a manifest variable.
• Some groups of columns correspond to a latent variable.
• Each row often contains the answers from one subject under a particular condition, and is also known as a “case”.
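As an illustration only, the table layout described above can be held in Python as a list of manifest-variable names plus one list of values per case. The column names, the values, and the rule used to group indicator columns into latent variables (a shared name prefix followed by a number, e.g. Eff1, Eff2) are hypothetical assumptions, not something WarpPLS or SPSS requires.

```python
import re

# Hypothetical data table: each column is a manifest variable (indicator),
# each row is one case (one subject under one condition).
COLUMNS = ["Eff1", "Eff2", "Eff3", "Sat1", "Sat2", "Age"]
CASES = [
    [5, 4, 5, 3, 4, 34],
    [2, 3, 2, 4, 4, 51],
    [4, 4, 3, 5, 5, 28],
]


def group_indicators(columns):
    """Group manifest-variable columns by the latent variable they measure.

    Assumes a hypothetical naming convention in which indicators of the same
    latent variable share a prefix followed by a number (Eff1, Eff2 -> Eff).
    A column without a trailing number becomes a single-indicator group.
    """
    groups = {}
    for name in columns:
        match = re.fullmatch(r"(.*?)(\d+)", name)
        latent = match.group(1) if match else name
        groups.setdefault(latent, []).append(name)
    return groups


# Every case must supply one value per manifest variable.
assert all(len(case) == len(COLUMNS) for case in CASES)
print(group_indicators(COLUMNS))
# {'Eff': ['Eff1', 'Eff2', 'Eff3'], 'Sat': ['Sat1', 'Sat2'], 'Age': ['Age']}
```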

Missing values

• A missing value is an empty cell in a data table.
• Missing values are a fact of life in many areas of research, including behavioral research.
• In behavioral research, missing values may be present when:
  – Respondents do not answer one or more questions in a questionnaire.
  – A researcher empties a data cell because a respondent answered a question with unusable data; e.g., by responding with “0” (zero) when asked for his or her age.
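The second situation, a researcher emptying a cell that holds unusable data, can be sketched as follows, with None standing for an empty cell. The function name, the column name, and the rule that an age of zero or less is unusable are hypothetical examples.

```python
def blank_unusable(rows, column, is_unusable):
    """Return copies of `rows` in which unusable values in `column` are emptied (set to None)."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy, so the original data is left untouched
        value = row.get(column)
        if value is not None and is_unusable(value):
            row[column] = None
        cleaned.append(row)
    return cleaned


# Hypothetical responses; None is a question the respondent left unanswered.
responses = [
    {"Age": 34, "Sat1": 4},
    {"Age": 0, "Sat1": 5},      # "0" is not a usable answer for age
    {"Age": 51, "Sat1": None},  # question not answered
]
cleaned = blank_unusable(responses, "Age", lambda age: age <= 0)
print([row["Age"] for row in cleaned])  # [34, None, 51]
```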

Examples of missing values

Datasets with missing values are a common occurrence in behavioral research, as well as other types of research.

Percentage of missing data

A simple Excel formula can be used to calculate the percentage of missing data for a manifest variable.

How much is too much?

As a rule of thumb, up to 10% missing data in a manifest variable (column) is acceptable. More than that can lead to problems.

Supporting source: Kline, R.B. (1998), Principles and Practice of Structural Equation Modeling, The Guilford Press, New York, NY.
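As an alternative to a spreadsheet formula, the same per-column calculation, combined with the 10% rule of thumb, can be sketched in Python. The table below is hypothetical and stored column by column, with None marking a missing cell; a column exactly at 10% is treated as acceptable.

```python
def missing_percentage(values):
    """Percentage of cells in one column that are missing (None)."""
    if not values:
        raise ValueError("column has no cells")
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def columns_over_threshold(table, threshold=10.0):
    """Map each column whose missing percentage exceeds `threshold` to that percentage."""
    flagged = {}
    for name, values in table.items():
        pct = missing_percentage(values)
        if pct > threshold:
            flagged[name] = pct
    return flagged


# Hypothetical table stored column by column; None marks a missing cell.
table = {
    "Sat1": [4, 5, None, 3, 4, 4, 2, 5, 3, 4],     # 1 of 10 missing: 10%
    "Sat2": [None, 5, None, 3, 4, 4, 2, 5, 3, 4],  # 2 of 10 missing: 20%
}
print(columns_over_threshold(table))  # {'Sat2': 20.0}
```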

Dealing with missing values

• A first step is to ensure that no more than 10% of the data is missing in each column of a data table.
• This can be accomplished by randomly removing rows that have missing values until every column meets the 10% rule of thumb.
• The remaining missing cells can then be filled using one of several available techniques, such as replacing missing values with:
  – The column mean.
  – The mean of nearby points.
  – A number obtained through linear interpolation.
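A minimal Python sketch of these steps follows, with None marking a missing cell. Two choices in it are assumptions rather than part of the slide: rows are removed at random only from among those missing a value in the currently worst column (removing a complete row cannot lower any column's percentage), and "nearby points" means up to `span` observed values on each side of a gap, in row order. The nearby-mean and interpolation techniques only make sense when row order is meaningful, e.g. when cases are sorted in time.

```python
import random
import statistics


def drop_rows_until_ok(rows, threshold=0.10, seed=None):
    """Randomly drop rows with missing values (None) until no column has more
    than `threshold` (a fraction) of its cells missing."""
    rng = random.Random(seed)
    rows = [list(row) for row in rows]
    while rows and rows[0]:
        n_cols = len(rows[0])
        fractions = [sum(row[j] is None for row in rows) / len(rows) for j in range(n_cols)]
        worst = max(range(n_cols), key=lambda j: fractions[j])
        if fractions[worst] <= threshold:
            break
        # Only rows missing a value in the worst column are removal candidates.
        candidates = [i for i, row in enumerate(rows) if row[worst] is None]
        rows.pop(rng.choice(candidates))
    return rows


def impute_column_mean(values):
    """Replace each missing value with the mean of the observed values
    (raises statistics.StatisticsError if nothing was observed)."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def impute_nearby_mean(values, span=2):
    """Replace each missing value with the mean of up to `span` observed
    values on each side of it, by row position."""
    if span < 1:
        raise ValueError("span must be at least 1")
    result = list(values)
    for i, v in enumerate(values):
        if v is None:
            before = [x for x in values[:i] if x is not None][-span:]
            after = [x for x in values[i + 1:] if x is not None][:span]
            if before or after:
                result[i] = statistics.fmean(before + after)
    return result


def impute_linear(values):
    """Fill interior gaps by linear interpolation between the nearest observed
    values; leading and trailing gaps are left as None."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


rows = [[None, 1], [None, 2]] + [[k, k] for k in range(8)]  # column 0: 20% missing
print(len(drop_rows_until_ok(rows, seed=0)))  # 8
print(impute_linear([1, None, 3, None, None, 6]))  # [1, 2.0, 3, 4.0, 5.0, 6]
```

Removing rows shrinks the sample, so it is worth checking how many cases remain before moving on to imputation.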

Replacing missing values with SPSS

Creating a source data file for WarpPLS

• Source data files contain the data used in a WarpPLS analysis.
• They are often referred to as “raw data files”.
• Source data files should be prepared as follows:
  – They should be Excel files (.xls or .xlsx), or plain text files with the variable names first, followed by each data case with its values in the same order as the variable names (missing data points do not have to be imputed a priori).
  – For text files, variable names and numeric data should be separated from each other by tabs.
  – For text files, the file name extension should be .txt.
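A minimal sketch of producing such a tab-delimited .txt file from Python with the standard csv module. The function names are hypothetical, and writing missing values as empty fields is an assumption; check the WarpPLS user manual for how your version expects missing data to be marked in text files.

```python
import csv
import io


def format_source_txt(names, cases):
    """Return tab-delimited text: variable names on the first line, then one line per case.

    Missing values (None) become empty fields; this is an assumption, not a
    documented WarpPLS requirement.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    for number, case in enumerate(cases, start=1):
        if len(case) != len(names):
            raise ValueError(f"case {number} has {len(case)} values, expected {len(names)}")
        writer.writerow(["" if v is None else v for v in case])
    return buffer.getvalue()


def write_source_txt(path, names, cases):
    """Write the text to `path`, which must use the .txt extension.

    encoding="ascii" makes Python raise UnicodeEncodeError on non-ASCII text.
    """
    if not path.lower().endswith(".txt"):
        raise ValueError("text source files should use the .txt extension")
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(format_source_txt(names, cases))


print(format_source_txt(["Sat1", "Sat2"], [[4, 5], [3, None]]), end="")
```

Calling `write_source_txt("survey.txt", names, cases)`, where survey.txt is a hypothetical file name, saves the same text to disk.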

Using Excel to create a .txt file

Important tips

• One widely available file format that usually works well for a .txt file is the ASCII tab-delimited format.
• If you are using Excel to create a .txt file, save the Excel-formatted file first, and then create the .txt file under a different name.
• With Excel, have only one worksheet, containing the raw data.
• You can also create tab-delimited .txt files using SPSS, in which case it is important to instruct SPSS to write the variable names into the .txt file.
  – Excel does this by default.
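Before importing, a tab-delimited file can be checked for the problems these tips guard against. The checks below (ASCII-only content, a non-numeric first line of variable names, the same number of fields on every line, numeric or empty values) are assumptions about what a well-formed file looks like; the function is hypothetical and not part of WarpPLS.

```python
def _is_number(text):
    """True if `text` parses as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def check_source_text(text):
    """Return a list of problems found in tab-delimited source data text (empty if none)."""
    problems = []
    if not text.isascii():
        problems.append("file contains non-ASCII characters")
    lines = text.splitlines()
    if not lines:
        return problems + ["file is empty"]
    header = lines[0].split("\t")
    if any(_is_number(name) for name in header):
        problems.append("first line looks numeric; variable names may be missing")
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(f"line {number}: {len(fields)} fields, expected {len(header)}")
        for field in fields:
            # Empty fields are allowed: they stand for missing values.
            if field.strip() and not _is_number(field):
                problems.append(f"line {number}: non-numeric value {field!r}")
    return problems


print(check_source_text("Sat1\tSat2\n4\t5\n3\tx\n"))  # ["line 3: non-numeric value 'x'"]
```

To check a file on disk, read it with `Path("survey.txt").read_text(encoding="latin-1")` (survey.txt being a hypothetical name); Latin-1 decoding never fails, so any non-ASCII bytes are reported by the check instead of raising an error.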

Reading a raw data file in WarpPLS

File import wizard

Viewing and accepting data

Acknowledgements

Adapted text, illustrations, and ideas from the following sources were used in the preparation of the preceding set of slides:

1. Kock, N. (2010), WarpPLS 1.0 User Manual, ScriptWarp Systems, Laredo, TX.

2. Kline, R.B. (1998), Principles and Practice of Structural Equation Modeling, The Guilford Press, New York, NY.

3. MS Excel, SPSS, and WarpPLS software applications.

4. Rencher, A.C. (1998), Multivariate Statistical Inference and Applications, John Wiley & Sons, New York, NY.

5. SPSS web site: www.spss.com.

6. WarpPLS software.
