CDIS 5400, Dr Brenda Louw, 2010
Data Organization in Communication Disorders
Objectives
Demonstrate knowledge and understanding of the data organization phase of research
Overview
Measurements in Communication Disorders: nominal, ordinal, interval, and ratio scales
Data preparation
Visual representation of data
Descriptive statistics
  Frequencies and percentages
  Measures of central tendency
  Measures of variability
  Shapes of distributions
Readings
Schiavetti et al., 2011, Chapter 5, pp. 176-191; Chapter 6, pp. 231-250
Introduction
After the data collection procedures have been carried out, raw data in quantitative research need to be arranged and organized
Organization is required before data can be interpreted with regard to the structure of the research design
Decide on the most appropriate way of representing the data, e.g. visual representations and descriptive statistics
MEASUREMENTS
Instrumental measures of physical variables, e.g., formant frequency, sound pressure level.
Observer measures of behavioral variables, e.g., perceived vocal pitch, loudness.
Correlations between physical and behavioral variables may be high (e.g., vocal fundamental frequency and perceived vocal pitch) or weak (e.g., intensity of an acoustical signal and its loudness).
DEFINITION OF MEASUREMENT
• Stanley Smith Stevens (1906-1973), psychologist; founder of Harvard's Psycho-Acoustical Laboratory; authored a milestone textbook, the 1400+ page "Handbook of Experimental Psychology" (1951).
"Measurement is defined as the assignment of numerals to objects or events according to rules."
Introduced different kinds of scales (or levels of measurements): nominal, ordinal, interval, and ratio
Data distribution
Obtained measures on one or more variables form a distribution
Schiavetti et al., 2011, p. 179, Table 5.1: Four types of distributions
Nominal, ordinal, interval, ratio
NOMINAL SCALE
The process of measurement represents the naming of attributes of objects or events, which are classified into mutually exclusive categories.
Each member of a named class is identical.
e.g., a number assigned to gender (female = 1; male = 2), a test result (pass = 1; fail = 0) or a clinical condition (stutterer = 1; nonstutterer = 0)
The numbers above are just labels assigned for identification purposes; they do not represent magnitude.
ORDINAL SCALE
In addition to identifying the members of a category, the magnitude of the attribute of objects or events is ranked.
e.g., severity of an impairment (mild, moderate, severe), pitch of a tone (low, middle, high), agreement ratings (strongly disagree, moderately disagree, undecided, moderately agree, strongly agree)
Objects, events or observations are ordered but we do not know the size of the differences between each object measured.
e.g., the pitch of signal A is higher than that of signal B, which in turn is higher than the pitch of signal C. The scale gives the rank-order: A > B > C but does not specify the distances between the categories.
INTERVAL SCALE
Provides information about which participants have higher or lower scores and how much they differ
Specifies both the order among categories and the fixed distances among them
Equal intervals means that the difference between the units is the same anywhere on the scale
e.g., the dates on a calendar, temperature measurement (Celsius and Fahrenheit scales), language test scores, e.g. PPVT-R, TOLD, CELF.
Ratio scale
Has all the features of an interval level measurement plus a true zero
3 characteristics:
  Ability to arrange numbers on a continuum
  Ability to specify the amount and differences between amounts
  Ability to identify absolute zero relative to a characteristic
E.g. the difference between 20 and 30 is the same as between 55 and 65
A score of 60 represents 2x as much of the characteristic as a score of 30
RATIO SCALE
E.g., time intervals
Speech intelligibility measured with a word-recognition test by counting the number of words spoken by a speaker that are heard correctly by a listener; it can be zero; 80% intelligibility means twice as many words are recognized correctly as for 40% intelligibility.
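The difference between interval and ratio scales can be checked with a small calculation. The sketch below is an illustration only, not part of the original slides: it compares two temperatures in Celsius (an interval scale with an arbitrary zero) with the same temperatures in kelvin (a ratio scale with a true zero). The function name is hypothetical.

```python
def celsius_to_kelvin(c):
    """Convert an interval-scale Celsius reading to ratio-scale kelvin."""
    return c + 273.15

a_c, b_c = 10.0, 20.0  # two example temperatures in Celsius

# Differences are meaningful on both scales (equal intervals).
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are meaningful only when zero is a true zero.
ratio_c = b_c / a_c  # numerically 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)  # only slightly above 1

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```

The difference is the same on both scales, but the ratio changes when the zero point changes, which is why "twice as much" statements need a ratio scale.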
Factors affecting quality of measurement
Test environment: distraction, background noise, interruption, poor lighting
Instrument calibration
Instructions to subjects
Observer bias: decision criteria may change from one session to another; knowledge of the purpose of the experiment
Questions
Researchers classified preschool children's history of middle ear (ME) infections as frequent, moderate, or infrequent based on the number of reported episodes. What scale of measurement was used? ORDINAL SCALE
What level of measurement would allow you to say that a score of 30 was 3x as much as a score of 10? RATIO SCALE
Reliability of measurement
Describes the degree to which we can depend on a measurement.
In general, there is always an error component in a measurement.
Systematic error (e.g., an improperly calibrated audiometer).
Behavior may change over time (within minutes, hours, days, etc.) due to fatigue, memory, etc.
Task’s difficulty may change within one measurement.
Reliability: the consistency of a measure
Does a test always measure the same thing?
$x_i = t_i + e_i$
$x_i$ = subject's observed score; $t_i$ = true score; $e_i$ = error
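The observed-score model above can be illustrated with a short simulation. This is a sketch under stated assumptions, not a method from the slides: the true score, the size of the random error, and the +3 systematic error (standing in for a miscalibrated instrument) are all made-up values.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

true_score = 50.0  # t_i, a hypothetical true score
n_trials = 1000    # number of repeated measurements

# Random error only: e_i is drawn from a distribution centred on zero.
random_only = [true_score + random.gauss(0, 5) for _ in range(n_trials)]

# Random error plus a constant systematic error of +3 (assumed miscalibration).
with_bias = [x + 3.0 for x in random_only]

print(round(mean(random_only), 1), round(mean(with_bias), 1))
```

Averaging many measurements reduces the effect of random error, but a systematic error shifts every score and does not average out.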
Validity of measurement
Describes the degree to which it measures what it purports to measure.
Reliability is the consistency of measurement; validity is appropriateness of measurement.
A measurement may be reliable but may not be correct; however, it needs to be reliable before we can assess its validity.
It is relatively easy to determine validity of a measurement of a physical attribute (e.g., the sound pressure level).
It may be difficult to assess the validity of a behavioral measure (e.g., linguistic performance is used to evaluate language-processing strategies; a duration discrimination test).
DATA PREPARATION
Involves:
Checking or logging in data
Checking data for accuracy
Developing data base structure
Entering data
Transforming data
Developing and documenting data base structure that integrates various measurements
Checking or logging in data
Data may come from different sources at different times e.g. surveys, coded-interview pretest or posttest data, observation data
Set up procedure for logging information and keeping track
Create a data base that shows which data have been entered and which still need to be entered
A data base is a critical component of good research record keeping
Programs are available, e.g. SAS
Critical to retain original data records (IRB)
Checking data for accuracy
Screen data for accuracy when received
Clarify problems or errors
Quality of measurement is crucial
Questions to ask:
Is all relevant contextual information included, e.g. date, SLP, etc.?
Questionnaires: are all questions answered?
Developing data base structure
Manner in which data is stored
Can be the same as used for logging in the data
Can use data base programs or statistical programs
Generate code book: describes data and where and how it can be accessed
continued
For each variable: variable name, description, format (e.g. number), instrument/method of collection, date, group/respondent, variable location in data base, notes
Indispensable tool for analysis team; comprehensive documentation enables other researchers to subsequently analyze the data
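A code book entry with the fields listed above could be stored as a simple record. The sketch below is one possible layout, not a prescribed format; the class name, field names, and example values are hypothetical.

```python
from dataclasses import dataclass, asdict

@dataclass
class CodebookEntry:
    """One code book record; fields follow the list on the slide."""
    name: str
    description: str
    data_format: str
    instrument: str
    date: str
    respondent: str
    location: str
    notes: str = ""

# Hypothetical entry for a vocabulary test score.
entry = CodebookEntry(
    name="ppvt_raw",
    description="Raw score on receptive vocabulary test",
    data_format="number",
    instrument="PPVT-R",
    date="2010-03-15",
    respondent="child",
    location="column 4 of scores table",
)

# Index the code book by variable name for quick lookup.
codebook = {entry.name: asdict(entry)}
print(codebook["ppvt_raw"]["instrument"])
```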
Entering data
Double entry ensures high level of data accuracy
Program identifies discrepancies for correction
Or: enter data once and spot check records on a random basis
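Double entry can be sketched as a comparison of two independently typed copies of the same records. The function and the record layout below are hypothetical and only illustrate the idea; statistical packages offer their own tools for this.

```python
def find_discrepancies(first_entry, second_entry):
    """Return (record_id, field, value1, value2) for every mismatch."""
    problems = []
    for record_id in sorted(set(first_entry) | set(second_entry)):
        rec1 = first_entry.get(record_id, {})
        rec2 = second_entry.get(record_id, {})
        for field in sorted(set(rec1) | set(rec2)):
            if rec1.get(field) != rec2.get(field):
                problems.append((record_id, field, rec1.get(field), rec2.get(field)))
    return problems

# Two passes over the same (made-up) records; the second has a typing error.
entry_a = {"P01": {"age": 4, "score": 36}, "P02": {"age": 5, "score": 19}}
entry_b = {"P01": {"age": 4, "score": 36}, "P02": {"age": 5, "score": 91}}

discrepancies = find_discrepancies(entry_a, entry_b)
print(discrepancies)  # [('P02', 'score', 19, 91)]
```

Each flagged value is then checked against the original data record before it is corrected.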
Data transformation
Transform raw data into form that is more useful or usable in analyses
Missing values: analysis programs treat blanks as missing values; otherwise assign a value, e.g. -99 (program specific)
Scale totals
Categories: collapse many variables into categories, e.g. mild, moderate, severe
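The three transformations above can be sketched in a few lines. Everything here is illustrative: the -99 code follows the slide's example, while the function names and the severity cut-offs are made up.

```python
MISSING = -99  # missing-value code, as in the slide's example

def recode_missing(values):
    """Replace blanks (None or empty string) with the missing-value code."""
    return [MISSING if v is None or v == "" else v for v in values]

def scale_total(item_scores):
    """Sum the item scores, skipping values coded as missing."""
    return sum(v for v in item_scores if v != MISSING)

def severity_category(total, mild_max=10, moderate_max=20):
    """Collapse a numeric total into three categories (cut-offs are made up)."""
    if total <= mild_max:
        return "mild"
    if total <= moderate_max:
        return "moderate"
    return "severe"

items = recode_missing([3, "", 5, None, 6])
total = scale_total(items)
print(items, total, severity_category(total))
```

Note that the missing-value code must be excluded from totals; otherwise -99 would be added in as if it were a real score.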
VISUAL REPRESENTATION OF DATA
Present distribution of data for visual inspection before performing further data analysis
Tables and graphs have the advantage of showing the overall contour of the distribution
Frequency tables, histograms, polygons, and cumulative frequency distributions are used
continued
American Psychological Association (APA, 2001) provides guidelines for organization and formatting of tables
Charts and graphs
Good for verbal presentations and posters
Primary means for presenting and interpreting single subject design research
Pie chart
Scatter plot
Column and bar graphs
Line graph
Continued…
Pie chart: percentages of observations that fit particular categories
e.g. Pie chart: nominal data by %
[Figure 4.6: Participants' perception of competence in neonatal nursery (n = 41). Pie chart with categories: Most of the time, Sometimes, Rarely, Question unanswered.]
Bar and column graphs
Illustrate magnitude or frequency of one or more variables
E.g. group differences on measures such as frequency counts, percentages of occurrence
Orientation of the display relative to axis differs
Column graph: originates from horizontal axis
Bar graph: originates from vertical axis
Bar graphs
[FIGURE 5.3 Consultations of the children with CL/P with health care professionals (n=80). Column graph; categories: Plastic surgeon, SLT & Audiologist, Paediatrician, ENT specialist, Dentist; vertical axis: percentage.]
[FIGURE 5.6 Feeding, speech and hearing problems among the children: parental reports (n=80). Column graph; categories: Lip only, Palate only, Lip & Palate, Submucous cleft palate; series: Feeding, Speech, Hearing.]
[Figure 4.8 Presence of Disorders affecting Communication Development in HIV/AIDS infected children (n=39). Bar graph; categories: Developmental Delay, Acute Otitis Media with Effusion, Chronic Otitis Media, Acute Otitis Media without Effusion.]
[Figure 4.4: Participants' indication of roles regarding intervention specifically directed at staff and team members (n = 39). Bar graph; categories: Consultation with other professions in multi- or transdisciplinary team (VV54), Attendance of ward rounds with other professions (VV55), In-service training & guidance of staff/team members (VV57); horizontal axis: Percentage participants.]
Scatter plot
Illustrates relationship between two and even three continuous measures
Used in correlation research
e.g. Scatter plot: Correlational Research
A positive correlation indicates that an increase of one variable can be associated with an increase of the other.
A negative correlation indicates that an increase of one variable can be associated with a decrease of the other.
A correlation close to zero indicates little or no linear relationship between the two variables.
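The three cases above can be illustrated by computing the Pearson correlation coefficient for small made-up data sets. The slides do not name a specific coefficient, so Pearson's r is an assumption here, and the data are invented for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]     # tends to increase with x
falling = [10, 8, 6, 4, 2]   # decreases as x increases
flat = [3, 1, 4, 1, 3]       # no linear trend

print(round(pearson_r(x, rising), 2))   # positive
print(round(pearson_r(x, falling), 2))  # negative
print(round(pearson_r(x, flat), 2))     # close to zero
```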
DESCRIPTIVE STATISTICS
Reporting data in numerical form is more common than by visual representation
Describes basic features of the data
Provides summaries about the sample and measures
Forms basis of quantitative analysis of data
Also referred to as summary statistics
Descriptive statistics describe patterns and general trends in a data set. In most cases, descriptive statistics are used to examine or explore one variable at a time. The relationship between two variables can also be described with correlation and regression.
Descriptive statistics
Frequencies and percentages
Measures of central tendency
Measures of variability
Shapes of distributions
FREQUENCIES AND PERCENTAGES
Frequencies and percentages are the primary descriptive statistics for nominal level measures
E.g. frequency count for number of participants who fit a nominal category or behaviors that fit a particular classification
Percentage of occurrence: dividing the number of participants in a category by the total number of participants across all categories and multiplying by 100
Continued
E.g. Primary work setting of recently graduated audiologists (Nelson, 2009)
Work setting          Frequency   Percentage
School                6           10
University            5           8
Hospital              16          27
Physician's office    16          27
Health clinic         15          25
Other                 2           3
Total                 60          100
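The percentages in the work-setting table can be reproduced from the frequency counts. The sketch below uses the counts from the table; the variable names are arbitrary, and rounding to whole percentages is assumed because that is how the table reports them.

```python
# Frequency counts from the work-setting table.
counts = {
    "School": 6,
    "University": 5,
    "Hospital": 16,
    "Physician's office": 16,
    "Health clinic": 15,
    "Other": 2,
}

total = sum(counts.values())  # total number of participants

# Percentage of occurrence = category count / total * 100, rounded.
percentages = {setting: round(100 * n / total) for setting, n in counts.items()}

for setting, n in counts.items():
    print(f"{setting:20s} {n:3d} {percentages[setting]:3d}%")
print(f"{'Total':20s} {total:3d}")
```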
Tool                                              Passed: number   Passed: %   Failed: number   Failed: %
Rossetti Infant-Toddler Language Scale            36               65.5%       19               34.5%
Surveillance Tool for Communication Development   38               69.1%       17               30.9%
Difference                                        2                3.6%        2                3.6%
Passed = showed typical development; failed = showed delayed development.
TABLE 4.2: Comparison between the Rossetti Infant-Toddler Language Scale (Rossetti, 2006) and the Surveillance Tool for Communication Development results at first screen (n=55)
Question 1: How many subjects had only one error?
Note the difference between reporting absolute values and reporting percentages;
we can’t draw any inferences about the general population if we don’t know the total number of subjects in our sample.
MEASURES OF CENTRAL TENDENCY
Estimate of the centre of a distribution of values
Attempt to quantify what we mean by the "typical" or "average" score in a data set.
Questions are often asked about how groups differ from each other "on average".
3 types of estimates: mean, median, mode
Mean (M)
The mean, or "average", is the most widely used measure of central tendency.
The mean is defined as the sum of all the data scores divided by n (the number of scores in the distribution).
APA (2001) symbol: M
Most appropriate for data at interval or ratio levels of measurement
Mean continued
The mean is often used to characterize the "typical" value in a distribution.
E.g. 55, 60, 65, 70, 75, 75, 75, 80, 85, 90, 95: M = 75.0
The mean can be influenced profoundly by one extreme data point (an "outlier"); in that case, rather consider the median as the measure of central tendency.
E.g. 30, 55, 60, 65, 70, 75, 75, 80, 85, 90, 95: M = 70.9
Mean continued
If we use the letter x to represent the variable being measured, the mean is defined as:
$\bar{x} = \frac{\sum x}{n}$
Can yield score that did not occur in data set
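The definition of the mean can be applied directly to the two example data sets from the earlier mean slide. This is a minimal sketch; the function name is arbitrary.

```python
def mean(scores):
    """Sum of all scores divided by the number of scores."""
    return sum(scores) / len(scores)

scores = [55, 60, 65, 70, 75, 75, 75, 80, 85, 90, 95]
with_outlier = [30, 55, 60, 65, 70, 75, 75, 80, 85, 90, 95]

print(mean(scores))                  # 75.0
print(round(mean(with_outlier), 1))  # 70.9
```

Replacing one score of 75 with 30 pulls the mean down by about four points, and 70.9 is a value that does not occur in the data set.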
Median (Mdn)
Score found at the exact middle of the set of values
Middlemost score
Can be determined as long as numbers can be ranked (ordered from low to high, smallest to largest)
Median is the point that separates the upper half of the data from the lower half
Median = 50th percentile
Distributions of qualitative data do not have a median.
Median continued
E.g. scores in a data set: 5, 7, 6, 1, 8. Sorting the data, we have: 1, 5, 6, 7, 8. Six is the "middle score". Compute (n+1)/2, where n is the number of data points. In the example above, n = 5, and so the median is the 3rd score in the sorted distribution, i.e., 6.
Median continued
If n is an even number, the median is defined as one half of the sum of the two data points that hold the two nearest locations to (n+1)/2. For example, suppose the data are 1, 4, 6, 5, 8, 0. The sorted distribution is 0, 1, 4, 5, 6, 8; n = 6, and (n+1)/2 = 3.5. This is not an integer. So the median is one half of the sum of the 3rd and 4th scores in the sorted distribution, i.e., 4.5. Notice that the median may not be an actual value in the data set.
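The odd and even cases described above can be written as one short function. This is a sketch of the rule as stated on the slides, using the two example data sets; the function name is arbitrary.

```python
def median(scores):
    """Middle score of the sorted data; mean of the two middle scores if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2 in 1-based counting is index mid.
        return ordered[mid]
    # Even n: average the two scores nearest to position (n+1)/2.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 7, 6, 1, 8]))     # 6
print(median([1, 4, 6, 5, 8, 0]))  # 4.5
```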
Mode
Based on frequency information
Most frequently occurring score in a distribution
Order scores and count each one
E.g. 15, 20, 21, 20, 36, 15, 25, 15: 15 occurs 3x and is the mode
Some distributions have more than one modal value
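Counting each score and picking the most frequent one can be sketched as follows. The function returns a list so that distributions with more than one modal value are handled; the second data set is made up to show that case.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs with the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)

print(modes([15, 20, 21, 20, 36, 15, 25, 15]))  # [15]
print(modes([1, 1, 2, 2, 3]))                   # [1, 2]
```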
MEASURES OF VARIABILITY
Range
Variance
Standard deviation
Interquartile range
Range
Spread from lowest to highest value in distribution of data
E.g. scores ranged from ..to.., the range was …
The smaller the range, the less variability in the distribution / the larger the range, the more variability in a distribution
Variance
Variance represents variability
Determined by finding the mean of the values in the distribution and determining how far each value in the distribution deviates from the mean
Variance and SD are based upon all the scores in the group.
Standard deviation (SD)
Most important statistic for organizing data
Small SD: scores of the distribution do not spread out from the mean very much; group relatively homogeneous
Large SD: wide dispersion of scores; group heterogeneous
Standard deviation
Standard deviation (for a sample) is the square root of the variance:
$SD = \sqrt{\text{variance}} = \sqrt{\frac{SS}{N-1}}$, where SS is the sum of squared deviations from the mean
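The sample variance and standard deviation can be computed step by step from the formula above. The sketch below reuses the two example data sets from the mean slides (one with an extreme low score) to show how a single outlier changes the result; the function names are arbitrary.

```python
from math import sqrt

def sample_variance(scores):
    """Sum of squared deviations from the mean (SS) divided by N - 1."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)
    return ss / (n - 1)

def sample_sd(scores):
    """Square root of the sample variance."""
    return sqrt(sample_variance(scores))

scores = [55, 60, 65, 70, 75, 75, 75, 80, 85, 90, 95]
with_outlier = [30, 55, 60, 65, 70, 75, 75, 80, 85, 90, 95]

print(sample_variance(scores))            # 150.0
print(round(sample_sd(scores), 2))        # about 12.25
print(round(sample_sd(with_outlier), 2))  # about 18.28
```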
"Outliers" can wildly change the distribution’s standard deviation.
Interquartile range (Q)
When a set of scores has an extremely high or low value, use an alternative to the range
Interquartile range is similar to the range but represents the difference between the score that falls at the 75th percentile and the 25th percentile
Characterizes the middle 50% of the data
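The range and the interquartile range can be compared on data with and without an extreme score. This is a sketch with one caveat: textbooks and software use slightly different conventions for locating the 25th and 75th percentiles, so the values below reflect the default ("exclusive") method of Python's `statistics.quantiles`, not necessarily the method in the course text. The data are the example scores from the mean slides.

```python
from statistics import quantiles

def value_range(scores):
    """Spread from the lowest to the highest value."""
    return max(scores) - min(scores)

def interquartile_range(scores):
    """Difference between the 75th and 25th percentile cut points."""
    q1, _q2, q3 = quantiles(scores, n=4)
    return q3 - q1

scores = [55, 60, 65, 70, 75, 75, 75, 80, 85, 90, 95]
with_outlier = [30, 55, 60, 65, 70, 75, 75, 80, 85, 90, 95]

print(value_range(scores), interquartile_range(scores))
print(value_range(with_outlier), interquartile_range(with_outlier))
```

The single low score increases the range far more than it increases the interquartile range.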
Shapes of distributions
Distributions are patterns of scores
Distribution of a variable provides information about individual cases and about group scores
Distributions of categorical variables are displayed in bar graphs
Distributions of continuous variables are displayed in line graphs or histograms
Normal distribution
Scores expressed in standardized scores (z)
Dispersion around mean is symmetrical
Half the scores fall above and half below the mean
34% fall within one standard deviation below the mean and 34% fall within one standard deviation above the mean
So 68% of scores fall within ±1 standard deviation from the mean
Mean of zero and standard deviation of one
Normal distribution
The probability density function represents how likely each value of the random variable is. The red line is the standard normal distribution.
Normal distribution continued
In a normal distribution:
50% of all scores are above the mean and 50% of the scores are below the mean
the mean, median, and mode are the same
most scores are near the mean; the farther from the mean a score is, the fewer the subjects who attained that score.
In a normal distribution, the mean plus 3 SDs and the mean minus 3 SDs encompass almost all the scores (about 99.7%)
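The proportions quoted for the normal distribution can be checked numerically. The sketch below uses the standard relationship between the normal distribution and the error function, which is available in Python's `math` module; the function name is arbitrary.

```python
from math import erf, sqrt

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * proportion_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```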
Measures of skewness
Lack of symmetry in distribution
Skewness statistic: Sk
Sk = 0: symmetrical distribution
Positively skewed: scores tend to cluster toward the lower end of the scale (that is, the smaller numbers) with increasingly fewer scores at the upper end of the scale (that is, the larger numbers).
In our example we have a positively skewed distribution; the majority of subjects had 0, 1, or 2 errors.
Skewness continued
Negatively skewed: most of the scores tend to occur toward the upper end of the scale while increasingly fewer scores occur toward the lower end.
E.g. negatively skewed distribution would be age at retirement. Most people retire in their mid 60s or older, with increasingly fewer retiring at increasingly earlier ages.
[Figure: three distribution curves labelled Positively skewed, Symmetrical, Negatively skewed]
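The sign of a skewness statistic can be illustrated with small made-up data sets. The slides do not define how Sk is calculated, so the moment-based coefficient below (third central moment divided by the cubed standard deviation) is an assumption; the data are invented to mimic the error-count and retirement-age examples.

```python
def skewness(scores):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

errors = [0, 0, 0, 1, 1, 1, 2, 2, 3, 6]                  # most scores low: positive skew
symmetric = [1, 2, 3, 4, 5]                              # symmetrical: skew of zero
retirement = [50, 58, 62, 64, 65, 65, 66, 66, 67, 67]    # most scores high: negative skew

print(skewness(errors) > 0, skewness(symmetric) == 0, skewness(retirement) < 0)
```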
A distribution is called unimodal if there is only one major "peak" in the distribution of scores when represented as a histogram. A distribution is "bimodal" if there are two major peaks.
Measures of Kurtosis
Concentration of scores around the centre of the distribution
Evaluated relative to the bell-shaped normal distribution
flat or peaked in shape
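Kurtosis relative to the normal distribution can be illustrated in the same way. The slides do not give a formula, so the excess kurtosis below (fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores zero) is an assumed definition, and the two data sets are invented.

```python
def excess_kurtosis(scores):
    """Fourth central moment / (second central moment)^2, minus 3."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # scores spread evenly: flatter than normal
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]    # scores piled at the centre: more peaked

print(excess_kurtosis(flat) < 0, excess_kurtosis(peaked) > 0)
```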
Conclusion
Once data has been organized by:
Measurements: nominal, ordinal, interval, and ratio scales
Data preparation
Visual representation of data
Descriptive statistics
statistical procedures can be employed to analyze the results
Two types of statistics: descriptive and inferential