Experimental Design Student Saturday Session

Submitted by Gloria Barrett and Floyd Bullard, Virginia Advanced Study Strategies, January 2011

Student Notes - Prep Session Topic: Experimental Design

A free response question dealing with sampling or experimental design has appeared on every AP Statistics exam. The question is designed to assess your understanding of fundamental concepts and generally consists of multiple parts. Important vocabulary and key concepts related to experimental design include:

• The difference between an experiment and an observational study
• Characteristics of a well-designed experiment; specifically:
  o Control: keeping extraneous variables as constant as possible;
  o Replication: having at least two experimental units in each treatment group;
  o Randomization: making sure that treatments are allocated to experimental units at random;
  o and (sometimes) Blocking: see a few more details below.
• Control group
• Explanatory variables: factors, levels, and treatments
• Experimental units and subjects
• Response variable
• Sources of possible bias
• Placebo effect
• Confounding
• Blinding (single blind and double blind)
• Blocking, blocks
• Completely randomized design
• Randomized block design, including matched pairs design
• Types of conclusions that can be drawn from experiments and observational studies
• Scope of inference

Be sure you understand that:

• Experiments are studies in which the researcher imposes a treatment on experimental units. If no treatment is assigned or imposed, the study is called an observational study.

• Experimental units are the smallest independent “objects” to which treatments are assigned and on which a response is measured. Consider an experiment that is designed to determine which of several types of fish food will result in the greatest weight gain for fish. If tanks contain several fish, and food is added to the water in the tank, then the tank is the experimental unit (not the individual fish), since the fish in a tank are not independent of one another, but tanks are independent of one another.

• Some experiments have a control group (a group of experimental units that receive no treatment or receive only a placebo), but this is not necessary for a well-designed experiment. Sometimes different treatments are simply compared with one another.

• Replication refers to having multiple experimental units in each treatment group (repeating the treatment), not to repeating the entire experiment.

• In an experiment, randomization refers to randomly assigning experimental units to the treatments. Often the experimental units are not a random sample of the population of interest. While this is not a problem with the experimental design, it may limit the scope of inference for the experimental results. (Note that random samples are necessary in surveys.)

• The purpose of random assignment (of experimental units to treatment groups or of treatments to experimental units) is to even out extraneous variables and make treatment groups that are approximately similar in all respects except for the treatment.

• In a double blind experiment, someone must know which treatment the experimental unit received! The subjects (assuming they are people) are blind to which treatments they are receiving, and anyone who interacts with the subjects should also be blinded to treatments.

• Also, if the response variable is in any way a subjective evaluation, then the person performing that evaluation must be blind to what treatments were applied. But obviously, some person or people on the research team must have a record of what treatments have been applied to what subjects!
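The random assignment described above can be sketched in a few lines of Python. This is only an illustration of a completely randomized design for the fish food example: the tank labels, the three food names, the number of tanks, and the function name `completely_randomized_assignment` are all hypothetical and are not part of the original notes.

```python
import random


def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly assign experimental units to treatments in equal-sized groups."""
    rng = random.Random(seed)      # a seed makes the example reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)          # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # Deal the shuffled units out to the treatments in turn
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical setup: 12 tanks (the experimental units) and 3 fish foods
tanks = [f"tank_{i}" for i in range(1, 13)]
foods = ["food_A", "food_B", "food_C"]

assignment = completely_randomized_assignment(tanks, foods, seed=2011)
for food, assigned_tanks in assignment.items():
    print(food, sorted(assigned_tanks))
```

Each food ends up with four tanks, so there is replication (more than one experimental unit per treatment), and which tanks get which food is decided by chance rather than by the researcher.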

• A confounding variable is one that affects the response variable and also is related to group membership. A variable that affects the response variable and is not related to group membership (that is, the variable would be expected to even out across the groups) is not a confounding variable. You may refer to this type of variable as an extraneous variable.

• For example: It has been observed that people who take long vacations have, on average, significantly longer lifespans than people who don’t. Can we conclude that vacationing is a way to extend your lifespan? Not necessarily: a person’s income could be a confounding variable, since people with higher incomes are more likely to be able to take long vacations, and they’re also more likely to afford health care that could lead to longer lifespans. Note that something like exercise would probably be an extraneous variable and not a confounding variable. Exercise may indeed be associated with longer lifespans, but is there an association between getting exercise and taking long vacations?

• It is best to avoid using the term lurking variable. It will almost always get you in trouble!

• Blocks are groups of experimental units that are homogeneous with respect to some inherent characteristic that is expected to affect the response to treatments.

• Blocks are considered a form of control: blocks help control known sources of variability among the experimental units so that the experimenter is better able to detect differences in the response variable that are due to the treatments.

• “Blocking is used to control the factors you can see; randomization helps balance the ones you cannot see.” Richard L. Scheaffer, AP Statistics Chief Faculty Consultant, 1997-1999
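A randomized block design, as described in the bullets above, randomizes separately inside each block. The following Python sketch assumes a hypothetical study in which subjects are blocked by age group; the block names, subject labels, treatment names, and the function `randomized_block_assignment` are invented for illustration.

```python
import random


def randomized_block_assignment(blocks, treatments, seed=None):
    """Within each block, randomly assign units to treatments in equal numbers."""
    rng = random.Random(seed)
    plan = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)      # randomization happens inside the block only
        for i, unit in enumerate(shuffled):
            plan[unit] = (block_name, treatments[i % len(treatments)])
    return plan


# Hypothetical blocks: units within a block are similar on a known characteristic
blocks = {
    "under_40": ["s1", "s2", "s3", "s4"],
    "40_and_over": ["s5", "s6", "s7", "s8"],
}
treatments = ["treatment_1", "treatment_2"]

plan = randomized_block_assignment(blocks, treatments, seed=7)
for unit in sorted(plan):
    print(unit, plan[unit])
```

Because every block contributes the same number of units to each treatment, the blocking characteristic cannot differ between the treatment groups, while randomization within blocks still balances the variables the experimenter cannot see.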

Multiple Choice Questions from 2002 Exam

1. Which of the following is a key distinction between well designed experiments and observational studies?
   A. More subjects are available for experiments than for observational studies.
   B. Ethical constraints prevent large-scale observational studies.
   C. Experiments are less costly to conduct than observational studies.
   D. An experiment can show a direct cause-and-effect relationship, whereas an observational study cannot.
   E. Tests of significance cannot be used on data collected from an observational study.

22. A study of existing records of 27,000 automobile accidents involving children in Michigan found that about 10 percent of children who were wearing a seatbelt (group SB) were injured and that about 15 percent of children who were not wearing a seatbelt (group NSB) were injured. Which of the following statements should NOT be included in a summary report about this study?
   A. Driver behavior may be a potential confounding factor.
   B. The child's location in the car may be a potential confounding factor.
   C. This study was not an experiment, and cause-and-effect inferences are not warranted.
   D. This study demonstrates clearly that seat belts save children from injury.
   E. Concluding that seatbelts save children from injury is risky, at least until the study is independently replicated.

25. A new medication has been developed to treat sleep-onset insomnia (difficulty in falling asleep). Researchers want to compare this drug to a drug that has been used in the past by comparing the length of time it takes subjects to fall asleep. Of the following, which is the best method for obtaining this information?
   A. Have subjects choose which drug they are willing to use, then compare the results.
   B. Assign the two drugs to the subjects on the basis of their past sleep history without randomization, then compare the results.
   C. Give the new drug to all subjects on the first night. Give the old drug to all subjects on the second night. Compare the results.
   D. Randomly assign the subjects to two groups, giving the new drug to one group and no drug to the other group, then compare the results.
   E. Randomly assign the subjects to two groups, giving the new drug to one group and the old drug to the other group, then compare the results.

MC Answers: 1-D, 22-D, 25-E

2006B #5

5. When a tractor pulls a plow through an agricultural field, the energy needed to pull that plow is called the draft. The draft is affected by environmental conditions such as soil type, terrain, and moisture. A study was conducted to determine whether a newly developed hitch would be able to reduce draft compared to the standard hitch. (A hitch is used to connect the plow to the tractor.) Two large plots of land were used in this study. It was randomly determined which plot was to be plowed using the standard hitch. As the tractor plowed that plot, a measurement device on the tractor automatically recorded the draft at 25 randomly selected points in the plot. After the plot was plowed, the hitch was changed from the standard one to the new one, a process that takes a substantial amount of time. Then the second plot was plowed using the new hitch. Twenty-five measurements of draft were also recorded at randomly selected points in this plot.

A. What was the response variable in this study? Identify the treatments. What were the experimental units?

B. Given that the goal of the study is to determine whether a newly developed hitch reduces draft compared to the standard hitch, was randomization used properly in this study? Justify your answer.

C. Given that the goal of the study is to determine whether a newly developed hitch reduces draft compared to the standard hitch, was replication used properly in this study? Justify your answer.

D. Plot of land is a confounding variable in this experiment. Explain why.

AP® STATISTICS 2006 SCORING GUIDELINES (Form B)

Question 5

Intent of Question

The primary goals of this question are to assess a student’s ability to: (1) identify the response variable, treatments, and experimental units in a study; (2) critique the use of randomization and replication; (3) recognize and explain why a particular variable is a confounding variable.

Solution

Part (a): The response variable was the amount of draft. The two treatments were the standard hitch and the new hitch. The experimental units were the two large plots of land.

Part (b): Yes, the two hitches (treatments) were randomly assigned to the two plots (experimental units).

Part (c): No, each treatment (type of hitch) was applied to only one experimental unit (plot of land). Replication is used to repeat the treatments on different experimental units so general patterns can be observed. There is no replication in this study.

Part (d): Although 25 measurements were taken at different locations in the two plots, each hitch was used in one plot (experimental unit) only. Thus, if a difference in the draft is observed we will not know whether the difference is due to the hitch or the plot. In statistical language, the treatments (hitches) are confounded with the plots.

Scoring

Parts (a), (b), (c), and (d) are scored as essentially correct (E), partially correct (P), or incorrect (I). Each essentially correct response is worth 1 point; each partially correct answer is worth 1/2 point.

Part (a) is essentially correct (E) if the response variable, treatments, and experimental units are correctly identified.
Part (a) is partially correct (P) if two of the three components of the experiment are correctly identified.
Part (a) is incorrect (I) if one or less of the three components of the experiment is correctly identified.

Note: Responses to parts (b), (c) and (d) must be considered with respect to the experimental units identified in part (a).

Part (b) is essentially correct (E) if the student correctly discusses the use of randomization in this experiment with respect to assignment of the two hitches for use in the two plots OR with respect to the experimental units identified in part (a).
Part (b) is partially correct (P) if the student recognizes the use of randomization in the experiment but provides an incomplete or unclear discussion.
Part (b) is incorrect (I) if the student
• Recognizes that randomization was used properly but does not provide a justification. That is, a naked answer of “YES” is scored as incorrect.
OR
• Provides a discussion that does not address the issue of randomization, e.g., the student indicates that the plots should be more alike.

Part (c) is essentially correct (E) if the student recognizes that replication was not used properly and provides a correct justification.
Part (c) is partially correct (P) if the student recognizes that replication was not used properly but provides an incomplete justification that reveals some understanding of replication.
Part (c) is incorrect (I) if the student
• Recognizes that replication was not used properly but does not provide a justification.
OR
• Fails to recognize that replication was not properly used, e.g., incorrectly argues that the 25 measurements taken on each experimental unit (plot) provide proper replication.

Part (d) is essentially correct (E) if the student provides a valid explanation of confounding in this experiment.

Part (d) is partially correct (P) if the student provides an incomplete explanation that indicates an understanding of confounding in this experiment. For example, the student indicates that differences in plot conditions can affect draft but fails to link this to the inability to distinguish between plot differences and hitch effects.
Part (d) is incorrect (I) if the student
• Provides a textbook definition of confounding with no attempt to describe the confounding variable in this experiment.
OR
• Fails to address the issue of confounding.

Alternative solutions for part (d):
• Each treatment was used in only one plot. Therefore, any differences caused by the differences in plots (e.g., soil hardness, moisture level, etc.) cannot be separated from differences in the two treatments.
• Because only one plot of land is assigned to each hitch, if no difference is found it could be due to a superior hitch in a poor field (highly compacted) being compared with an inferior hitch in a good field. The effect of the hitch is masked by the differences in the plot.

4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response

If a response is between two scores (for example, 2½ points), use a holistic approach to determine whether to score up or down depending on the strength of the response and communication.

Sample: 5A
Score: 4

This essay correctly identifies the basic parts of the experiment. The response variable is the draft, or energy required to pull the plow. The treatments are the two types of hitches, and the experimental units are the two large plots of land.

Part (b) clearly recognizes that the two large plots of land (experimental units) were randomly assigned to be plowed with one of the two hitches (treatments). The essay goes on to discuss the random selection of 25 points in each large plot at which measurements were made. This is random sampling, which should not be confused with randomization, the act of randomly assigning experimental units to treatments. Randomly selecting the points is a good thing to do, however, because it helps to avoid bias, and this discussion does not contradict the first sentence in which the appropriate use of randomization is recognized. A weakness in this response is that it indicates that randomization reduces variability. This is not correct. Since random assignment of experimental units to treatments converts potential sources of bias into random variation, randomization reduces bias but it does not reduce variability.

Part (c) clearly recognizes that there is no replication in this study because each hitch was used to plow only one plot. Each hitch must be used in more than one plot to have replication. (The draft measurements taken at the 25 randomly selected points in each plot would be averaged to obtain a single average draft measurement for the plot, but taking several measurements on the same experimental unit is not true replication.)

Since each hitch is used on only one plot, and varying conditions across plots can affect the energy needed to pull the plow, any difference in draft due to the different hitches cannot be distinguished from the difference in the conditions in the two plots.

The overall strength of responses and level of communication provided by this essay more than compensate for the weakness in the response to part (b), so this essay was scored as essentially correct.
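The point about replication can be made concrete with a small Python sketch. The draft readings below are simulated, not real data: the means, spread, plot labels, and variable names are all invented for illustration. The sketch averages the 25 readings within each plot, as the commentary describes, and then counts how many experimental units each hitch actually has.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical draft readings (arbitrary units): 25 points per plot.
# These numbers are simulated for illustration only.
measurements = {
    ("standard", "plot_1"): [rng.gauss(50, 3) for _ in range(25)],
    ("new", "plot_2"): [rng.gauss(47, 3) for _ in range(25)],
}

# Collapse the repeated readings to one average per experimental unit (plot)
plot_means = {}
for (hitch, plot), values in measurements.items():
    plot_means.setdefault(hitch, []).append(mean(values))

# Count experimental units per treatment
units_per_treatment = {hitch: len(means) for hitch, means in plot_means.items()}
print(units_per_treatment)  # {'standard': 1, 'new': 1}
```

With only one plot per hitch, the 50 readings give no information about how much draft varies from plot to plot, so a hitch effect and a plot effect cannot be told apart.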

2003 #4

4. Because of concerns about employee stress, a large company is conducting a study to compare two programs (tai chi or yoga) that may help employees reduce their stress levels. Tai chi is a 1,200-year-old practice, originating in China, that consists of slow, fluid movements. Yoga is a practice, originating in India, that consists of breathing exercises and movements designed to stretch and relax muscles. The company has assembled a group of volunteer employees to participate in the study during the first half of their lunch hour each day for a 10-week period. Each volunteer will be assigned at random to one of the two programs. Volunteers will have their stress levels measured just before beginning the program and 10 weeks later at the completion of it.

A. A group of volunteers who work together ask to be assigned to the same program so that they can participate in that program together. Give an example of a problem that might arise if this is permitted. Explain to this volunteer group why random assignment to the two programs will address this problem.

B. Someone proposes that a control group be included in the design as well. The stress level would be measured for each volunteer assigned to the control group at the start of the study and again 10 weeks later. What additional information, if any, would this provide about the effectiveness of the two programs?

C. Is it reasonable to generalize the findings of this study to all employees of this company? Explain.

AP® STATISTICS 2003 SCORING GUIDELINES

Question 4

Solution

Part (a): For example, a deadline in the department where the group of volunteers works has been moved back, lowering the stress levels of those working in the department. If the volunteers from this department were all in the same treatment group, this change in stress level could mistakenly be attributed to the treatment. Without random assignment of volunteers to the two programs, it is possible that the two treatment groups could differ in some way that affects the outcome of the experiment. Randomization "evens out" the possible effects of potentially confounding variables.

Part (b): Without the control group, the company could compare the two treatments, but would not be able to say whether the observed reduction in stress was attributable to participation in the programs. For example, a change in the work environment during this period might have reduced the stress level of all employees. The addition of a control group would enable the company to assess the magnitude of the mean reduction attributable to each treatment, as opposed to just determining if the two programs differ.

Part (c): It is not reasonable to generalize the findings of this study to all employees, because the participants in this experiment were volunteers and volunteers may not be representative of the population OR the participants were not randomly selected from the company employees.

Scoring

Each component is scored as either essentially correct (E), partially correct (P), or incorrect (I).

Part (a) has two components: the example, and the randomization. The example is scored as essentially correct (E) if it contains each of the elements listed below:

Elements, each with a sample statement:
1. Identify a plausible example of a problem
   Sample statement: “Because a deadline has been moved back…”
2. Relate the identified problem to the change in stress level (the response)
   Sample statement: “…lowering the stress levels of those working in the department. This change in stress level…”
3. …and state that the identified problem effects can not be distinguished from the difference in treatment effects
   Sample statement: “...could mistakenly be attributed to the treatment.” (Note: A construction such as “can’t tell the difference” is OK here.)

The example is scored as partially correct (P) if the response contains 2 of the 3 components.

The randomization is scored as essentially correct (E) if the student gives a reason for the necessity of random assignment. Possibilities include:
• clearly stating in context that randomization is relied upon to create comparable groups
• clearly stating in context that randomization controls for the effects of potentially confounding variables or reduces bias. (Both “Avoiding” bias and “Eliminating” bias are incorrect (I).)

The randomization is scored as partially correct (P) if the statement about randomization is not in context or is poorly communicated.

Note: Constructions such as “split up” and “divided into” can be interpreted to indicate randomization.

Part (b) is scored as essentially correct (E) if the student
1. indicates that a control group does provide additional information AND
2. explains that the control group allows the company to determine if either or both treatments are effective in reducing stress OR explains that the control group provides a baseline for comparison

Part (b) is scored as partially correct (P) if the student indicates there is additional information, even if the student’s explanation is incorrect.

Note: Stating that the “passage of time” reduces stress is not sufficient; the student must specify that there is a confounding variable that operates through time.

Part (c) is scored as essentially correct (E) if it
1. indicates that it is not reasonable to generalize to all employees AND
2. gives an explanation that the participants were not randomly selected from the company employees OR gives an explanation tied to the use of volunteers

Note: Simply using the word “volunteer” in the explanation is not sufficient.

Part (c) is scored as partially correct (P) if the student explicitly says that it is not reasonable to generalize to all employees, even if the student’s explanation is incorrect.

Part (c) is scored as incorrect (I) if the student indicates that it is reasonable to generalize to all employees.

Sample A – Score 4

In part (a) this student has identified a plausible problem, i.e. that the group members may belong to an outside organization. Membership in a massage therapy club could lower their stress during the experiment, and this could be confounded with the effect of the tai chi or yoga. Randomization “more or less” distributes the group evenly.

In part (b) this student correctly states that the control group would allow a clear comparison with one or both of the treatments, and the absence of a treatment.

Finally, in part (c) this student not only correctly identified the problem as the subjects being volunteers, but went on to explain why the use of volunteers might be a problem (i.e., the volunteers “are very likely the ones who needed the stress reduction the most”).
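To see why random assignment tends to split up a group of coworkers, here is a short worked calculation in Python. The numbers are assumptions made only for illustration: the exam question does not say how many volunteers there are or how large the coworker group is, so 40 volunteers (20 per program) and 5 coworkers are hypothetical, as is the function name.

```python
from math import comb


def prob_group_kept_together(n_volunteers, group_size):
    """Chance that random assignment into two equal programs puts an
    entire group of coworkers in the same program.

    Assumes n_volunteers is even and group_size is at most half of it.
    """
    half = n_volunteers // 2
    # Ways to fill one program so that it contains the whole coworker group
    ways_one_program = comb(n_volunteers - group_size, half - group_size)
    total_ways = comb(n_volunteers, half)
    # The group could land together in either of the two programs
    return 2 * ways_one_program / total_ways


# Hypothetical numbers: 40 volunteers, 5 of whom work together
p = prob_group_kept_together(n_volunteers=40, group_size=5)
print(f"{p:.3f}")
```

Under these assumed numbers the chance is below 5 percent, so the coworkers will usually be divided between tai chi and yoga, and anything special about their department is unlikely to be tied to just one program.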