CDIS 5400 Dr Brenda Louw 2010 Data Organization in Communication Disorders


Objectives

Demonstrate knowledge and understanding of the data organization phase of research

Overview

Measurements in Communication Disorders: nominal, ordinal, interval, and ratio scales

Data preparation

Visual representation of data

Descriptive statistics: frequencies and percentages, measures of central tendency, measures of variability, shapes of distributions

Readings

Schiavetti et al., 2011: Chapter 5, pp. 176-191; Chapter 6, pp. 231-250

Introduction

After the data collection procedures have been carried out, raw data in quantitative research need to be arranged and organized

Organization is required before data can be interpreted with regard to the structure of the research design

Decide on the most appropriate way of representing the data, e.g., visual representations and descriptive statistics

MEASUREMENTS

Instrumental measures of physical variables, e.g., formant frequency, sound pressure level.

Observer measures of behavioral variables, e.g., perceived vocal pitch, loudness.

Correlations between physical and behavioral variables may be high (e.g., vocal fundamental frequency and perceived vocal pitch) or weak (e.g., intensity of an acoustical signal and its loudness).

DEFINITION OF MEASUREMENT

• Stanley Smith Stevens (1906-1973), psychologist; founder of Harvard's Psycho-Acoustical Laboratory; authored a milestone textbook, the 1400+ page "Handbook of Experimental Psychology" (1951).

"Measurement is defined as the assignment of numerals to objects or events according to rules."

Introduced different kinds of scales (or levels of measurement): nominal, ordinal, interval, and ratio

Data distribution

Obtained measures on one or more variables form a distribution

Schiavetti et al., 2011, p. 179, Table 5.1: Four types of distributions: nominal, ordinal, interval, ratio

NOMINAL SCALE

The process of measurement represents the naming of attributes of objects or events; those are classified into mutually exclusive categories.

Each member of a named class is identical.

e.g., a number assigned to gender (female = 1; male = 2), a test result (pass = 1; fail = 0) or a clinical condition (stutterer = 1; nonstutterer = 0)

The numbers above are just labels assigned for identification purposes; they do not represent magnitude.

ORDINAL SCALE

In addition to identifying the members of a category, the magnitude of the attribute of objects or events is ranked.

e.g., severity of an impairment (mild, moderate, severe), pitch of a tone (low, middle, high), level of agreement (strongly disagree, moderately disagree, undecided, moderately agree, strongly agree)

Objects, events or observations are ordered but we do not know the size of the differences between each object measured.

e.g., the pitch of signal A is higher than that of signal B, which in turn is higher than the pitch of signal C. The scale gives the rank-order: A > B > C but does not specify the distances between the categories.

INTERVAL SCALE

Provides information about which participants have higher or lower scores and how much they differ

Specifies both the order among categories and the fixed distances among them

Equal intervals means that the difference between the units is the same anywhere on the scale

e.g., the dates on a calendar, temperature measurement (with C and F scales), language test scores, e.g., PPVT-R, TOLD, CELF.

RATIO SCALE

Has all the features of an interval level measurement plus a true zero

3 characteristics:
Ability to arrange numbers on a continuum
Ability to specify the amount and the differences between amounts
Ability to identify absolute zero relative to a characteristic

E.g., the difference between 20 and 30 is the same as between 55 and 65

A score of 60 represents 2x as much of the characteristic as a score of 30

RATIO SCALE

E.g., time intervals

Speech intelligibility measured with a word-recognition test by counting the number of words spoken by a speaker that are heard correctly by a listener; the score can be zero; 80% intelligibility means twice as many words are recognized correctly as for 40% intelligibility.
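The difference between interval and ratio scales can be illustrated with a short sketch. It contrasts the intelligibility example (true zero) with Celsius temperature (arbitrary zero). The Kelvin conversion and the function name `celsius_to_kelvin` are additions for illustration only and are not part of the lecture material.

```python
# Illustrative sketch: ratio statements hold on a ratio scale
# but not on an interval scale.

def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale, arbitrary zero) to Kelvin (true zero)."""
    return celsius + 273.15

# Ratio scale: percentage of words recognized has a true zero,
# so 80% really is twice 40%.
intelligibility_ratio = 80 / 40

# Interval scale: 20 C looks like "twice" 10 C, but the zero point is arbitrary.
naive_ratio = 20 / 10
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Differences, by contrast, are meaningful on both kinds of scale.
diff_celsius = 20 - 10
diff_kelvin = celsius_to_kelvin(20) - celsius_to_kelvin(10)

print(intelligibility_ratio)
print(naive_ratio, round(true_ratio, 3))
print(diff_celsius, round(diff_kelvin, 2))
```

The naive ratio of 2 disappears once the temperatures are expressed on a scale with a true zero, while the 10-degree difference is preserved.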

Factors affecting quality of measurement

Test environment: distraction, background noise, interruption, poor lighting

Instrument calibration

Instructions to subjects

Observer bias: decision criteria may change from one session to another; knowledge of the purpose of the experiment

ETSU

Questions

Researchers classified preschool children's history of ME (middle ear) infections as frequent, moderate, or infrequent based on the number of reported episodes. What scale of measurement was used? ORDINAL SCALE

What level of measurement would allow you to say that a score of 30 was 3x as much as a score of 10? RATIO SCALE

Reliability of measurement

Describes the degree to which we can depend on a measurement.

In general, there is always an error component in a measurement.

Systematic error (e.g., an improperly calibrated audiometer).

Behavior may change over time (within minutes, hours, days, etc.) due to fatigue, memory, etc.

Task’s difficulty may change within one measurement.

Reliability: the consistency of a measure

Does a test always measure the same thing?

$x_i = t_i + e_i$

$x_i$ = subject score; $t_i$ = true score; $e_i$ = error
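A minimal sketch of the score model (subject score = true score + error), assuming made-up numbers: the true score, the list of random errors, and the systematic offset are all hypothetical. It shows why random error tends to cancel over repeated measurements while systematic error (such as a miscalibrated instrument) does not.

```python
# Sketch of observed score = true score + error for repeated
# measurements of one subject. All values are hypothetical.
true_score = 50.0

# Hypothetical random errors (fatigue, attention, etc.); they sum to zero here.
random_errors = [-3.0, 2.0, 1.0, -1.0, 1.0]

# Hypothetical systematic error, e.g. an instrument that reads 5 units too high.
systematic_error = 5.0

observed_random = [true_score + e for e in random_errors]
observed_biased = [x + systematic_error for x in observed_random]

mean_random = sum(observed_random) / len(observed_random)
mean_biased = sum(observed_biased) / len(observed_biased)

print(mean_random)  # random errors cancel in the mean
print(mean_biased)  # the systematic error remains in the mean
```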

Validity of measurement

Describes the degree to which it measures what it purports to measure.

Reliability is the consistency of measurement; validity is appropriateness of measurement.

A measurement may be reliable but may not be correct; however, it needs to be reliable before we can assess its validity.

It is relatively easy to determine validity of a measurement of a physical attribute (e.g., the sound pressure level).

It may be difficult to assess the validity of a behavioral measure (e.g., linguistic performance is used to evaluate language-processing strategies; a duration discrimination test).

DATA PREPARATION

Involves:
Checking or logging in data
Checking data for accuracy
Developing data base structure
Entering data
Transforming data
Developing and documenting a data base structure that integrates various measurements

Checking or logging in data

Data may come from different sources at different times, e.g., surveys, coded interview data, pretest or posttest data, observation data

Set up a procedure for logging information and keeping track of it

Create a data base that enables assessment of which data have been entered and which still need to be entered

The data base is a critical component of good research record keeping

Programs available, e.g., SAS

Critical to retain original data records (IRB)

Checking data for accuracy

Screen data for accuracy when received

Clarify problems or errors

Quality of measurement is crucial

Questions to ask:
Is all relevant contextual information included, e.g., date, SLP, etc.?
Questionnaires: are all questions answered?

Developing data base structure

Manner in which data is stored

Can be the same as used for logging in the data

Can use data base programs or statistical programs

Generate code book: describes data and where and how it can be accessed

For each variable: variable name, description, format (e.g., number), instrument/method of collection, date, group/respondent, variable location in data base, notes

Indispensable tool for the analysis team; comprehensive documentation enables other researchers to subsequently analyze the data

Entering data

Double entry ensures a high level of data accuracy

Program identifies discrepancies for correction

Alternatively, enter data once and spot check records on a random basis
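Double entry can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular program: the function name `find_discrepancies`, the record layout, and the participant data are all hypothetical.

```python
# Sketch of double-entry verification: the same records are keyed in twice,
# and every field that differs between the two passes is flagged so it can
# be checked against the original record.

def find_discrepancies(first_entry, second_entry):
    """Return (record_id, field, first_value, second_value) for each mismatch."""
    discrepancies = []
    for record_id in sorted(set(first_entry) | set(second_entry)):
        first = first_entry.get(record_id, {})
        second = second_entry.get(record_id, {})
        for field in sorted(set(first) | set(second)):
            if first.get(field) != second.get(field):
                discrepancies.append(
                    (record_id, field, first.get(field), second.get(field))
                )
    return discrepancies

# Hypothetical records keyed by participant ID.
entry_a = {
    "P01": {"age_months": 48, "severity": "mild"},
    "P02": {"age_months": 52, "severity": "severe"},
}
entry_b = {
    "P01": {"age_months": 48, "severity": "mild"},
    "P02": {"age_months": 25, "severity": "severe"},  # digits transposed
}

for item in find_discrepancies(entry_a, entry_b):
    print(item)
```

Each flagged tuple points to one field of one record, which keeps the follow-up check against the original data record small.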

Data transformation

Transform raw data into a form that is more useful or usable in analyses

Missing values: analysis programs may treat blanks as missing values or require an assigned value, e.g., -99; this is program specific

Scale totals

Categories: collapse many variables into categories, e.g., mild, moderate, severe
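The three transformations listed here (missing-value codes, scale totals, collapsing into categories) might look as follows in a small sketch. The -99 code comes from the slide; the cut-offs, function names, and item scores are hypothetical and chosen only for illustration.

```python
MISSING_CODE = -99  # missing-value code taken from the slide's example

def scale_total(item_scores):
    """Sum item scores; return None if any item carries the missing code."""
    if any(score == MISSING_CODE for score in item_scores):
        return None
    return sum(item_scores)

def severity_category(total, mild_max=10, moderate_max=20):
    """Collapse a scale total into three ordered categories.

    The cut-offs are hypothetical and only illustrate the idea.
    """
    if total is None:
        return "missing"
    if total <= mild_max:
        return "mild"
    if total <= moderate_max:
        return "moderate"
    return "severe"

# Hypothetical item scores for three participants.
responses = {
    "P01": [2, 3, 1, 2],
    "P02": [5, -99, 4, 6],
    "P03": [7, 8, 6, 5],
}

for participant, items in responses.items():
    total = scale_total(items)
    print(participant, total, severity_category(total))
```

Treating the missing code explicitly avoids the common mistake of summing -99 into a scale total.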

VISUAL REPRESENTATION OF DATA

Present distribution of data for visual inspection before performing further data analysis

Tables and graphs have the advantage of showing the overall contour of the distribution

Frequency tables, histograms, polygons, and cumulative frequency distributions are used
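As a rough sketch of a frequency table with a cumulative frequency column and a text histogram, assuming a small set of hypothetical error counts (the data are invented for illustration):

```python
from collections import Counter

# Hypothetical error counts for 12 participants.
scores = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1, 5, 0]

frequencies = Counter(scores)
total = len(scores)

# Build rows of (score, frequency, cumulative frequency, cumulative percent).
cumulative = 0
rows = []
for value in sorted(frequencies):
    count = frequencies[value]
    cumulative += count
    rows.append((value, count, cumulative, 100 * cumulative / total))

print("score  freq  cum_freq  cum_%  histogram")
for value, count, cum_count, cum_percent in rows:
    print(f"{value:>5}  {count:>4}  {cum_count:>8}  {cum_percent:>5.1f}  {'*' * count}")
```

The row of asterisks gives a quick impression of the contour of the distribution before any statistics are computed.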

American Psychological Association (APA, 2001) provides guidelines for organization and formatting of tables

Charts and graphs
Good for verbal presentations and posters
Primary means for presenting and interpreting single subject design research
Types: pie chart, scatter plot, column and bar graphs, line graph

Pie chart: percentages of observations that fit particular categories

[Pie chart. Figure 4.6: Participants' perception of competence in neonatal nursery (n = 41). Categories: Most of the time, Sometimes, Rarely, Question unanswered. Slice labels: 32%, 53%, 5%, 10%.]

e.g., pie chart: nominal data by %

Bar and column graphs

Illustrate magnitude or frequency of one or more variables

E.g. group differences on measures such as frequency counts, percentages of occurrence

Orientation of the display relative to the axes differs

Column graph: columns originate from the horizontal axis

Bar graph: bars originate from the vertical axis

Bar graphs

[FIGURE 5.3: Consultations of the children with CL/P with health care professionals (n=80). Categories: Plastic surgeon, SLT & Audiologist, Paediatrician, ENT specialist, Dentist. Value axis: 0% to 120%. Data labels: 100%, 100%, 44%, 41%, 3.70%.]

[FIGURE 5.6: Feeding, speech and hearing problems among the children: parental reports (n=80). Categories: Lip only, Palate only, Lip & Palate, Submucous cleft palate. Series: Feeding, Speech, Hearing. Value axis: 0 to 35.]

[Figure 4.8: Presence of Disorders affecting Communication Development in HIV/AIDS infected children (n=39). Categories: Developmental Delay, Acute Otitis Media with Effusion, Chronic Otitis Media, Acute Otitis Media without Effusion. Value axis: 0 to 20.]

[Figure 4.4: Participants' indication of roles regarding intervention specifically directed at staff and team members (n = 39). Categories: Consultation with other professions in multi- or transdisciplinary team (VV54), Attendance of ward rounds with other professions (VV55), In-service training & guidance of staff/team members (VV57). Value axis: Percentage participants, 0 to 80. Data labels: 74, 28, 64.]

Scatter plot

Illustrates relationship between two or even three continuous measures

Used in correlation research

e.g., scatter plot: correlational research

A positive correlation indicates that an increase in one variable is associated with an increase in the other.

A negative correlation indicates that an increase in one variable is associated with a decrease in the other.

A correlation close to zero indicates little or no linear relationship between the two variables.
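The sign of a correlation can be illustrated with a minimal sketch of the Pearson correlation coefficient. The slides do not name a specific coefficient, so Pearson's r is an assumption here, and the paired measurements are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical paired measurements for five children.
age_months = [24, 30, 36, 42, 48]
vocabulary = [50, 120, 200, 310, 400]  # rises with age
errors = [20, 16, 11, 7, 2]            # falls with age

r_positive = pearson_r(age_months, vocabulary)
r_negative = pearson_r(age_months, errors)

print(round(r_positive, 3))
print(round(r_negative, 3))
```

Plotting the same pairs as a scatter plot would show points rising from left to right for the first pair and falling for the second.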

DESCRIPTIVE STATISTICS

Reporting data in numerical form is more common than by visual representation

Describes basic features of the data

Provides summaries about the sample and measures

Forms basis of quantitative analysis of data

Also referred to as summary statistics

Descriptive statistics describe patterns and general trends in a data set. In most cases, descriptive statistics are used to examine or explore one variable at a time. The relationship between two variables can also be described with correlation and regression.

Descriptive statistics

Frequencies and percentages
Measures of central tendency
Measures of variability
Shapes of distributions

FREQUENCIES AND PERCENTAGES

Frequencies and percentages are the primary descriptive statistics for nominal level measures

E.g., frequency count for the number of participants who fit a nominal category or behaviors that fit a particular classification

Percentage of occurrence: divide the number of participants in a category by the total number of participants across all categories and multiply by 100

E.g., primary work setting of recently graduated audiologists (Nelson, 2009)

Work setting          Frequency   Percentage
School                    6           10
University                5            8
Hospital                 16           27
Physician's office       16           27
Health clinic            15           25
Other                     2            3
Total                    60          100
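The percentage column of the work-setting table can be reproduced from the frequency column. The following sketch uses the frequencies reported in the table (Nelson, 2009); the variable names are illustrative.

```python
# Frequencies from the work-setting table (Nelson, 2009).
frequencies = {
    "School": 6,
    "University": 5,
    "Hospital": 16,
    "Physician's office": 16,
    "Health clinic": 15,
    "Other": 2,
}

total = sum(frequencies.values())

# Percentage of occurrence = category count / total count * 100.
percentages = {
    setting: 100 * count / total for setting, count in frequencies.items()
}

for setting, percent in percentages.items():
    print(f"{setting:<20} {frequencies[setting]:>3} {round(percent):>4}")
print(f"{'Total':<20} {total:>3}")
```

Rounding each percentage to a whole number gives the values shown in the table.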

TABLE 4.2: Comparison between the Rossetti Infant-Toddler Language Scale (Rossetti, 2006) and the Surveillance Tool for Communication Development results at first screen (n=55)

Tool                                              Passed (typical development)   Failed (delayed development)
                                                  Number    Percentage           Number    Percentage
Rossetti Infant-Toddler Language Scale              36        65.5%                19        34.5%
Surveillance Tool for Communication Development     38        69.1%                17        30.9%
Difference                                           2         3.6%                 2         3.6%

Question 1: How many subjects had only one error?

Note the difference between reporting absolute values and reporting percentages; we can't draw any inferences about the general population if we don't know the total number of subjects in our sample.

MEASURES OF CENTRAL TENDENCY

Estimate of the centre of a distribution of values

Attempt to quantify what we mean by the "typical" or "average" score in a data set

Questions are often asked about how groups differ from each other "on average"

3 types of estimates: mean, median, mode

Mean (M)

The mean, or "average", is the most widely used measure of central tendency. The mean is defined as the sum of all the data scores divided by n (the number of scores in the distribution).

APA (2001) symbol: M

Most appropriate for data at interval or ratio levels of measurement

Mean continued

The mean is often used to characterize the "typical" value in a distribution.

E.g., 55, 60, 65, 70, 75, 75, 75, 80, 85, 90, 95: M = 75.0

The mean can be influenced profoundly by one extreme data point (an "outlier"); in that case, rather consider the median as the measure of central tendency.

E.g., 30, 55, 60, 65, 70, 75, 75, 80, 85, 90, 95: M = 70.9

Mean continued

If we use the letter x to represent the variable being measured, the mean is defined as:

$\bar{x} = \frac{\sum x}{n}$

The mean can yield a score that did not occur in the data set
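The effect of an outlier on the mean can be checked with the two data sets from the mean slides. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# The two data sets from the slides: identical except that one 75
# is replaced by an outlier of 30 in the second set.
scores = [55, 60, 65, 70, 75, 75, 75, 80, 85, 90, 95]
scores_with_outlier = [30, 55, 60, 65, 70, 75, 75, 80, 85, 90, 95]

mean_plain = mean(scores)
mean_outlier = mean(scores_with_outlier)

print(mean_plain, median(scores))
print(round(mean_outlier, 1), median(scores_with_outlier))
```

The single low score pulls the mean down while the median stays at the same value, which is why the median is preferred when outliers are present.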

Median (Mdn)

Score found at the exact middle of the set of values (the middlemost score)

Can be determined as long as numbers can be ranked (ordered from low to high, smallest to largest)

Median is the point that separates the upper half of the data from the lower half

Median = 50th percentile

Distributions of qualitative data do not have a median.

Median continued

E.g., scores in a data set: 5, 7, 6, 1, 8. Sorting the data, we have: 1, 5, 6, 7, 8. Six is the "middle score". To locate it, compute (n+1)/2, where n is the number of data points. In the example above, n = 5, so (n+1)/2 = 3 and the median is the 3rd score in the sorted distribution, i.e., 6.

Median continued

If n is an even number, the median is defined as one half of the sum of the two data points that hold the two nearest locations to (n+1)/2. For example, suppose the data are 1, 4, 6, 5, 8, 0. The sorted distribution is 0, 1, 4, 5, 6, 8; n = 6, and (n+1)/2 = 3.5. This is not an integer. So the median is one half of the sum of the 3rd and 4th scores in the sorted distribution, i.e., 4.5. Notice that the median may not be an actual value in the data set.
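The (n+1)/2 rule for odd and even n can be written out as a short sketch. The function name `median_by_position` is hypothetical; the two data sets are the ones used in the median examples.

```python
def median_by_position(scores):
    """Median located with the (n + 1) / 2 rule on the sorted scores."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based location of the median
    if position.is_integer():
        # Odd n: the median is the score at that position.
        return ordered[int(position) - 1]
    # Even n: average the two scores on either side of the position,
    # e.g. the 3rd and 4th scores when the position is 3.5.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

print(median_by_position([5, 7, 6, 1, 8]))
print(median_by_position([1, 4, 6, 5, 8, 0]))
```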

Mode

Based on frequency information

Most frequently occurring score in a distribution

Order scores and count each one

E.g., 15, 20, 21, 20, 36, 15, 25, 15: 15 occurs 3x and is the mode

In some distributions there is more than one modal value
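Counting each score to find the mode, including the case of more than one modal value, can be sketched as follows. The function name `modes` and the second data set are illustrative assumptions.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs with the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([15, 20, 21, 20, 36, 15, 25, 15]))  # slide example
print(modes([1, 1, 2, 3, 3]))                   # hypothetical bimodal data
```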

MEASURES OF VARIABILITY

Range
Variance
Standard deviation
Interquartile range

Range

Spread from lowest to highest value in distribution of data

E.g. scores ranged from ..to.., the range was …

The smaller the range, the less variability in the distribution / the larger the range, the more variability in a distribution

Variance

Variance represents variability

Determined by finding the mean of the values in the distribution and determining how far each value in the distribution deviates from the mean

Variance and SD are based upon all the scores in the group.

Standard deviation (SD)

Most important statistic for organizing data

Small SD: scores of the distribution do not spread out from the mean very much; group relatively homogeneous

Large SD: wide dispersion of scores; group heterogeneous

Standard deviation

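Standard deviation (for a sample) is the square root of the variance:

$SD = \sqrt{\text{variance}} = \sqrt{SS/(N-1)}$

where SS is the sum of squared deviations from the mean and N is the number of scores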
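The sample formula can be worked through step by step. This sketch reuses the first data set from the mean slides and compares the hand calculation with `statistics.stdev`, which also divides by N - 1; the variable names are illustrative.

```python
import math
from statistics import stdev

scores = [55, 60, 65, 70, 75, 75, 75, 80, 85, 90, 95]

n = len(scores)
mean_score = sum(scores) / n

# SS: sum of squared deviations of each score from the mean.
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)

# Sample variance divides SS by N - 1; SD is its square root.
variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)

# Range for comparison: highest score minus lowest score.
score_range = max(scores) - min(scores)

print(sum_of_squares, variance, round(sd, 2), score_range)
print(round(stdev(scores), 2))
```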

"Outliers" can wildly change the distribution’s standard deviation.

Interquartile range (Q)

When a set of scores has an extremely high or low value, use an alternative to the range

Interquartile range is similar to the range but represents the difference between the score that falls at the 75th percentile and the 25th percentile

Characterizes the middle 50% of the data
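A minimal sketch of the interquartile range, assuming Python's `statistics.quantiles` with its default "exclusive" method. Quartile conventions differ slightly between textbooks and programs, so other tools may give somewhat different values for the same data.

```python
from statistics import quantiles

scores = [55, 60, 65, 70, 75, 75, 75, 80, 85, 90, 95]

# n=4 splits the data into quartiles and returns the three cut points:
# 25th percentile, 50th percentile (median), 75th percentile.
q1, q2, q3 = quantiles(scores, n=4)

interquartile_range = q3 - q1

print(q1, q2, q3)
print(interquartile_range)
```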

Shapes of distributions

Distributions are patterns of scores

Distribution of a variable provides information about individual cases and about group scores

Distributions of categorical variables are displayed in bar graphs

Distributions of continuous variables are displayed in line graphs or histograms

Normal distribution

Scores expressed in standardized scores (z)

Dispersion around the mean is symmetrical

Half the scores fall above and half below the mean

34% fall within one standard deviation below the mean and 34% fall within one standard deviation above the mean

So 68% of scores fall within ±1 standard deviation of the mean

In standardized (z) form: mean of zero and standard deviation of one

Normal distribution

The probability density function represents how likely each value of the random variable is. The red line is the standard normal distribution.

Normal distribution continued

In a normal distribution:
50% of all scores are above the mean and 50% of the scores are below the mean
the mean, median, and mode are the same
most scores are near the mean; the farther from the mean a score is, the fewer the subjects who attained that score

In a normal distribution, the mean plus 3 SDs and the mean minus 3 SDs encompass almost all the scores (about 99.7%)
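The proportions of scores within 1 and 3 standard deviations of the mean can be checked with the standard library's `statistics.NormalDist`. The helper names `proportion_within` and `z_score`, and the example score, are illustrative assumptions.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def z_score(score, mean, sd):
    """Express a raw score as a standardized (z) score."""
    return (score - mean) / sd

print(round(proportion_within(1), 4))
print(round(proportion_within(3), 4))

# Hypothetical test with mean 75 and SD 10: a raw score of 85 is 1 SD above.
print(z_score(85, 75, 10))
```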

Measures of skewness

Lack of symmetry in a distribution

Skewness statistic: Sk

Sk = 0: symmetrical distribution

Positively skewed: scores tend to cluster toward the lower end of the scale (that is, the smaller numbers) with increasingly fewer scores at the upper end of the scale (that is, the larger numbers).

In our example we have a positively skewed distribution; the majority of subjects had 0, 1, or 2 errors.

Skewness continued

Negatively skewed: most of the scores tend to occur toward the upper end of the scale while increasingly fewer scores occur toward the lower end.

E.g., a negatively skewed distribution would be age at retirement. Most people retire in their mid 60s or older, with increasingly fewer retiring at increasingly earlier ages.

[Figure: three distribution shapes. Panels: Positively skewed, Symmetrical, Negatively skewed.]
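The sign of a skewness statistic can be illustrated with a short sketch. The slides do not specify how Sk is computed, so the moment-based formula used here (third central moment divided by the second central moment raised to 1.5) is an assumption, and all three data sets are hypothetical.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2 ** 1.5 (one common definition)."""
    n = len(scores)
    mean_score = sum(scores) / n
    m2 = sum((x - mean_score) ** 2 for x in scores) / n
    m3 = sum((x - mean_score) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

# Hypothetical data: most subjects make few errors, a few make many.
error_counts = [0, 0, 0, 1, 1, 1, 2, 2, 3, 6]

# Hypothetical data: most people retire in their mid 60s, a few much earlier.
retirement_ages = [48, 58, 62, 64, 65, 65, 66, 66, 67, 67]

symmetric = [1, 2, 3, 4, 5]

print(round(skewness(error_counts), 2))     # positive: tail toward high scores
print(round(skewness(retirement_ages), 2))  # negative: tail toward low scores
print(skewness(symmetric))                  # zero: symmetrical
```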

A distribution is called unimodal if there is only one major "peak" in the distribution of scores when represented as a histogram. A distribution is "bimodal" if there are two major peaks.

Measures of Kurtosis

Concentration of scores around the centre of the distribution

Evaluated relative to the bell-shaped normal distribution

Distribution may be flat or peaked in shape
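A flat versus a peaked distribution can be contrasted with a kurtosis statistic. The slides give no formula, so the excess kurtosis used in this sketch (fourth central moment divided by the squared second central moment, minus 3 so that a normal distribution scores about zero) is an assumption, and both data sets are hypothetical.

```python
def excess_kurtosis(scores):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (one common definition)."""
    n = len(scores)
    mean_score = sum(scores) / n
    m2 = sum((x - mean_score) ** 2 for x in scores) / n
    m4 = sum((x - mean_score) ** 4 for x in scores) / n
    return m4 / m2 ** 2 - 3

# Hypothetical flat distribution: every score occurs equally often.
flat_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Hypothetical peaked distribution: scores pile up at the centre.
peaked_scores = [1, 5, 5, 5, 5, 5, 5, 5, 5, 9]

print(round(excess_kurtosis(flat_scores), 2))    # negative: flatter than normal
print(round(excess_kurtosis(peaked_scores), 2))  # positive: more peaked than normal
```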

Conclusion

Once data has been organized by:
Measurements: nominal, ordinal, interval, and ratio scales
Data preparation
Visual representation of data
Descriptive statistics

Statistical procedures can be employed to analyze the results

Two types of statistics: descriptive and inferential
