Properties of Well-Designed Studies

Learning Objectives

By the end of this lecture, you should be able to:

– Define ‘control group’
– Contrast with ‘experimental group’
– Give examples of different methods for creating a control group
– Define blinding, group stratification, group randomization
– Speculate on possible stratifications for optimal study design

As with the previous lecture, this too is not a “numbers” oriented lecture. It’s not difficult, but there is a little bit of terminology involved. You have only ‘gotten it’ when you can describe these terms in your own words with examples. It may take a couple of views/reviews.

How can we reduce bias?
A major objective in research.

•Control group: Any good experiment should include a control group.

•Blinding: The subjects and, ideally, the researchers as well do NOT know which individuals received the ‘treatment’ and which individuals were in the control group until the experiment is completed.

•Randomize the groups: We will see that it is frequently necessary to stratify your subjects into groups (beyond just the experimental and control groups). Subjects should then be assigned to the experimental and control groups randomly.

The ideal experiment:

A “randomized, double-blind, controlled” trial.

Avoiding bias when conducting your experiment

Assume that all data is biased – it’s just a matter of degree…

A reputable journal will only publish studies that demonstrate a significant effort to minimize bias.

Comparative experiments
Experiments are comparative in nature: We compare the response to a treatment to:

– Another treatment
– No treatment (a control)
– An older / original treatment (another form of control)
– A placebo (another form of control)
– Any combination of the above

A control is a group to which an experimental treatment is NOT administered. It serves as a reference mark for comparison (e.g., a group of subjects that do not receive the “new” drug, or a group of subjects that is given a placebo).

A placebo is a fake treatment, such as a sugar pill. This is to test the hypothesis that the response is due to the actual treatment and not to the subject’s belief that they were treated. In many studies, the control group is given a placebo.

Without a control group, you should be very, very skeptical about any conclusions drawn as a result of the experiment!

Control Group

• Any proper study will always discuss the “controls”. The “control group” refers to the group that was used as a comparison group with the “treatment” group.

• As was said previously: Without a control group, you should be very, very skeptical about any conclusions that come out of the experiment!

Example of experimental and control groups:
Suppose you are a pharmaceutical company that has come up with what you believe is a breakthrough drug for diabetes.

• Experimental Group: In your study, you will give one group your new wonder-drug. This group is called the experimental group.
• Control Group: For comparison, any decent study will include a control group. Examples of control groups:
– Give this group a placebo (perhaps the most common ‘control’)
– Give this group the “older” version of the drug
– Give this group NO drug (however, you then sacrifice ‘blinding’)

Control Group - Examples

[Figure: chart showing the control group alongside 3 experimental groups. Control and treatment groups are often graphed together like this to highlight differences, or the lack of differences.]

Placebos

Can exist in many forms:

– In a drug trial, the placebo might be a completely inert drug that looks exactly like the experimental drug and is administered in the same way.

– In studies evaluating acupuncture, a great choice for placebo was a needle that felt exactly like an acupuncture needle, but did not actually penetrate the skin.

– In a study involving prayer, the experimental group was prayed for, while the placebo group was only told that they were being prayed for.

Blinding
Blinded: If the patient doesn’t know if they are in the experimental group or in the control group, the study is said to be ‘blinded’.

Double-Blinded: When both the subject AND the people involved in carrying out the experiment (e.g. researchers, nurses, etc) don’t know who is in the control group and who is in the experimental group. Double-blinded studies are strongly preferred over single-blinded studies.

Example: In clinical drug trials, a patient is sometimes given a bar code, which they wear on a wristband. The medications are not labeled by name either; they carry only a bar code. The researcher/nurse giving the medication scans the wristband and matches it with the appropriate medication bar code. So neither the patient nor the researcher knows whether the patient is getting the treatment or the placebo/control. Only at the end of the study will the patients and researchers find out who was in the “experimental group” and who was in the “control group”.
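The bar-code idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_blinded_allocation`, the `MED-` code format, and the patient IDs are all hypothetical, and a real trial would use a dedicated allocation system. The point is that the person dispensing sees only codes, while the table that says what each code means is kept sealed until the study ends.

```python
import random

def build_blinded_allocation(patient_ids, seed=None):
    """Return (dispense, unblinding_key) for a two-arm blinded trial.

    dispense:       wristband ID -> medication code (all the nurse ever sees)
    unblinding_key: medication code -> arm (kept sealed until the study ends)
    All names and formats here are hypothetical.
    """
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)  # the random order decides who lands in which arm
    # Unique numbers that carry no information about the arm
    numbers = rng.sample(range(10**6), len(ids))
    dispense = {}
    unblinding_key = {}
    for i, (pid, number) in enumerate(zip(ids, numbers)):
        code = f"MED-{number:06d}"
        dispense[pid] = code
        unblinding_key[code] = "treatment" if i % 2 == 0 else "placebo"
    return dispense, unblinding_key

patients = [f"P{i:03d}" for i in range(10)]
dispense, unblinding_key = build_blinded_allocation(patients, seed=3)
print(dispense["P000"])  # a code only; it says nothing about the arm
```

Neither the patient nor the nurse can tell the arm from the code; only whoever holds `unblinding_key` can, and they are not involved in giving the medication.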

Designing “controlled” experiments

Sir Ronald Fisher, the “father of statistics”, was sent to Rothamsted Agricultural Station in the United Kingdom to evaluate the success of various fertilizer treatments.

Fisher found that the data from experiments that had been going on for decades was basically worthless because of poor experimental design.

– Fertilizer had been applied to a field one year and not the following year, in order to compare the yield of grain produced with vs without the fertilizer.
– What are the flaws in this research methodology?
• It may have rained more or been sunnier during different years.
• The seeds used may have differed between years as well.
– In one case, fertilizer was applied to one field and not applied to a nearby field in the same year. BUT:
• The two fields might have had different soil, sun exposure, water, drainage, and farming history (that is, the two fields may have been farmed differently in previous years).
• In other words, many factors affecting the results were “uncontrolled.”

Any suggestions for a valid control group?

Setting up ‘controls’
• In this particular experiment, you’d like to “control for” the various confounding variables that exist in this experiment:
– Different soil
– Different sun exposure
– Different water drainage
– Different farming patterns
– etc (it would be possible to come up with several others)

• Fisher came up with a very clever experiment design that did a terrific job of “controlling for” the confounding variables.

Fisher’s (elegant!) solution:

• In the same field and same year, apply fertilizer to randomly spaced plots within the field. Analyze plants from similarly treated plots together.

• This was a great solution! Both the experimental group (the fertilized areas) and the control group (the non-fertilized areas) were exposed to the same sunlight, weather, drainage, farming patterns, etc.

[Figure: a field divided into plots; the fertilized plots, marked “F”, are scattered irregularly among the unfertilized plots.]

Note how in this experiment there is:
•A control group: The areas that were not fertilized
•Randomization: The plots were randomized to either the fertilizer group or the control group.
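As a rough illustration of randomizing plots within a single field, here is a minimal Python sketch. The field dimensions, the number of fertilized plots, and the function name `assign_plots` are assumptions made for the example; they are not taken from Fisher's actual experiment.

```python
import random

def assign_plots(n_rows, n_cols, n_fertilized, seed=None):
    """Randomly choose which plots of an n_rows x n_cols field get fertilizer."""
    rng = random.Random(seed)
    plots = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    fertilized = set(rng.sample(plots, n_fertilized))
    # "F" = fertilized (experimental group), "." = unfertilized (control group)
    return [["F" if (r, c) in fertilized else "." for c in range(n_cols)]
            for r in range(n_rows)]

# Hypothetical field: 6 rows of 12 plots, half of them fertilized
field = assign_plots(n_rows=6, n_cols=12, n_fertilized=36, seed=1)
for row in field:
    print(" ".join(row))
```

Because chance alone decides which plots are fertilized, differences in soil, sun, or drainage across the field are spread over both groups rather than lining up with the treatment.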

Randomization
Recall how with samples, we randomize so that no one group is over-represented. Similarly, when we place subjects into an experimental or control group, we are careful to do so randomly. (We don’t hand-pick which group our buddy goes into to make sure “he gets the good stuff.”)

Key Point: All decent studies will randomize which subjects are in the control group vs which are in the experimental group.

For example, if you are comparing a new cancer treatment vs the ‘older’ treatment, which patients get the new treatment and which get the older treatment must be decided at random.

Completely randomized experimental designs: Individuals are randomly assigned to groups, then the groups are randomly assigned to treatments.
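The two-step procedure just described (randomize individuals into groups, then randomize groups to treatments) might look like the following sketch. The subject labels and the function name `completely_randomized` are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Randomly split subjects into groups, then randomly assign groups to treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Step 1: deal the shuffled subjects into one group per treatment
    k = len(treatments)
    groups = [shuffled[i::k] for i in range(k)]
    # Step 2: randomly decide which group receives which treatment
    labels = list(treatments)
    rng.shuffle(labels)
    return dict(zip(labels, groups))

subjects = [f"subject_{i:02d}" for i in range(1, 21)]
assignment = completely_randomized(subjects, ["experimental", "control"], seed=42)
for label, members in assignment.items():
    print(label, len(members))
```

The researcher never chooses who goes where; the only inputs are the list of subjects and the list of treatments.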

Completely randomized designs

Group 1 is the “experimental group”

Group 2 is the “control group”

Which of the two groups is the control group?

Some key principles of experimental design

• Control the effects of lurking variables on the response, by comparing the treatment you are interested in with a second group who either receives a placebo, or a different treatment.

• Randomize – use some kind of randomization technique to assign subjects to treatments – in other words, the researcher does not pick who goes in the treatment group and who goes in the control group.

• Blind: This is another major factor – particularly in medical trials. Neither the experimenter nor the subjects should be aware which subjects are receiving the experimental treatment and which subjects are receiving the control treatment.

Stratification
Individuals (or observations) in a study must be properly stratified (grouped) to try to ensure that no one batch of people/observations is over-represented in the control group or in any of the experimental group(s).

Example: Testing a new cancer treatment vs the old treatment:
– Both treatments must be given to patients with similar severity of disease. So you might stratify based on the stage of the disease.
– You might suspect that people of different ethnic groups (specifically Northern European ancestry) will respond differently to your medication. So you might stratify based on those with Northern European ancestry and those without.
– etc

Example: Suppose you suspect that men and women would respond differently to the treatment. What is one change you should make to your study?

– Answer: try to ensure that you place about equal numbers of men in each group (control group and each experimental group). Do the same with the women.

This process of organizing your subjects into various blocks according to certain categories (age, race, severity of illness, etc) is called stratification.

In a block, or stratified, design, subjects are divided into groups, or blocks, prior to experiments, to test hypotheses (i.e. theories) about differences between the groups.

You can stratify based on the treatment, but you can also stratify based on the subjects (e.g. different ages, different races, different stages of disease, etc).

For example, suppose you are evaluating three different acne treatments on a group of teenagers between 14 and 16 years old. You would want to randomize into a minimum of four groups (one group for each treatment, and the control group).

Can you spot a potentially major flaw in this study?
Gender! At this age, there are all kinds of hormonal changes affecting teenagers, and they affect acne production differently in males vs females. So you would want to stratify based on gender as well.

As a result, in order to do this study properly, we would need eight groups!

Boys: 3 treatments + control. Girls: 3 treatments + control.
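One way to picture the eight groups is the sketch below: subjects are first split into strata (here, by gender), and only then randomized to the four arms within each stratum. The 40 made-up teenagers, the arm names, and the function name `stratified_assignment` are assumptions for illustration only.

```python
import random
from collections import defaultdict

ARMS = ["treatment_1", "treatment_2", "treatment_3", "control"]

def stratified_assignment(subjects, stratum_of, arms, seed=None):
    """Randomize subjects to arms separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    groups = defaultdict(list)
    for stratum in sorted(strata):
        members = strata[stratum]
        rng.shuffle(members)  # random order within the stratum
        for i, subject in enumerate(members):
            # Deal subjects round-robin so arms stay balanced within the stratum
            groups[(stratum, arms[i % len(arms)])].append(subject)
    return dict(groups)

# Hypothetical sample: 20 girls and 20 boys
teens = [{"id": i, "gender": "girl" if i % 2 == 0 else "boy"} for i in range(40)]
groups = stratified_assignment(teens, lambda t: t["gender"], ARMS, seed=7)
for key in sorted(groups):
    print(key, len(groups[key]))
```

With these made-up numbers, each of the eight (gender, arm) groups ends up with five teenagers, so neither boys nor girls are over-represented in any arm.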

Block aka “stratified” designs

We divide the subjects into groups, or blocks, prior to the experiments.

This allows us to test hypotheses about differences between the groups.

(Note: There also must be a fourth group for each block, the control. However, it is not shown in this diagram.)

Stratifying into two blocks of three groups

A researcher wishes to examine the relationship between resting pulse rate and age. A sample of 52 people had their pulse rate measured at rest in the lab. Would you stratify?

Answer: Yes. Fitness level: People who do lots of endurance sports typically have lower resting rates. Similarly, gender: Men and women typically have different resting pulse rates, so this experiment should also be stratified by gender.

To stratify, or not to stratify…

A researcher wants to determine if BST, a hormone intended to spur greater milk production, works as advertised. A farming research facility makes available 60 cattle. Can you think of possible stratifications you might need?

Answer: Different breeds of cattle may respond differently to this hormone. As a result, you should consider stratifying by breed.

Weaknesses in experimental design
• There is no such thing as the perfect experiment. Your goal is to decide whether any of the limitations in the design are significant enough to limit the validity of the conclusions.
• Unfortunately, outside of reputable journals, badly designed experiments are extremely common.
– Which is not to say that “reputable” journals do not also allow shoddy research to slip through at times – it most certainly does happen!

Example of a randomized, double-blind controlled trial

A major cancer center is excited to hear about a promising new treatment for pancreatic cancer. So:
• They contact all of the patients in their files with this condition.
• They find 408 patients who agree to be in their trial.
• They exclude from their trial 11 patients who say they are moving out of state, since that group cannot be monitored by the center.
• They exclude 43 others from the trial because they have other significant medical illnesses which would be confounding.

• Stratification: Now they have 354 patients remaining. They suspect that men and women will respond differently to the drug. They also suspect that people will respond differently based on their age. So they stratify based on both of these variables.

– Gender: 190 are female and 164 are male.
– Age: They use the age groups: 20-40 / 40-60 / 60-80

• Randomization and Control: Within each of these 6 groups (the 3 age groups, each of which is also stratified by gender), the patients are randomly assigned to receive either the usual treatment (the control group) or the new treatment (the experimental group). We now have 12 different groups! But that’s okay, provided that each group is of a reasonable size.

• Blinding: The researchers set up the study to be double-blinded. That is, neither the patients nor the physicians know which patient is receiving which treatment. They will not find out until the study has been completed.

• Very good! Yet, there are still some flaws in the design of this study…
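The numbers in this example can be checked with a short calculation. The sketch below uses only the figures given above; the variable names are made up, and the average group size is just a rough guide, since the example does not say how many patients fall into each age group.

```python
from itertools import product

enrolled = 408
moving_out_of_state = 11
other_illnesses = 43
remaining = enrolled - moving_out_of_state - other_illnesses  # 354

genders = {"female": 190, "male": 164}
age_groups = ["20-40", "40-60", "60-80"]
arms = ["usual treatment (control)", "new treatment (experimental)"]

strata = list(product(genders, age_groups))        # 2 genders x 3 age groups
groups = list(product(genders, age_groups, arms))  # each stratum split into 2 arms

# Average only: the real group sizes depend on the age distribution
average_group_size = remaining / len(groups)

print(remaining, len(strata), len(groups), average_group_size)
```

An average of about 30 patients per group is why the “provided that each group is of a reasonable size” caveat matters: every additional stratification variable multiplies the number of groups and shrinks each one.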

Limitations/Flaws in the pancreatic cancer study?

• Stage of cancer – Drugs will affect the cancer differently depending on how advanced the disease is when the treatment begins.

• Choice of age groups – The choice seems a bit arbitrary.
• Lack of placebo control – It’s always great to have a placebo group as one of your controls, but often, you cannot. In this case, there are ethical constraints.

Ethics: Why couldn’t we use a placebo as the control?
It would not be ethical to take patients with cancer and randomly give one block of them no treatment at all just for the purpose of improving the validity of your experiment.

• Thoughts?
– Survey: Obtained 36,000 physician office fax numbers, delivered ~16,000 faxes and received ~700 replies. Their respondents were mostly private practice physicians, and mostly mid-career. (Source: http://www.dpmafoundation.org/physician-attitudes-on-medicine.html)
– The Doctor Patient Medical Association (DPMA) and the Patient Power Alliance (PPA) work to repeal health care reform and call themselves "a nonpartisan association of doctors and patients dedicated to preserving free choice in medicine." The organization is a member of the National Tea Party Federation and the "American Grassroots Coalition".
– Note which magazine published this article - hardly a fly-by-night magazine!
• I.e. even legitimate magazines and news sources are frequently guilty of publishing “studies” and other polls that are so riddled with flaws as to be completely meaningless.

Example – Claudication Study (on web page)
• Methods: first thing they mention is IRB approval; Randomized; Design: 3 groups; Location (Northwestern)
• Inclusion & Exclusion Criteria: defining the population
• Measurement: How they measured the results – sometimes straightforward, sometimes can be a huge and contentious issue. How do you measure pain symptoms? How do you measure improvement?
• Blinding: Obviously could not be double-blinded since patients knew their ‘treatment’. However, researchers were blinded. They just saw the data results. They did not know which patients were in which group as the experiment was going on.
• Details: Many other issues and techniques employed by the study are explained in careful detail.
• Stratifications (Blocks): Claudication vs No Claudication.
• Control group: Nutritional consulting, regular meetings with data-gathering team, etc, but NO exercise.
• Outcomes: In particular note the very frequent mention of p-values and confidence intervals. These are very important, and we will be learning about them.
• Charts and graphs:
– p159: Breakdown of stratifications. Also note the ‘exclusion’ disclaimer at the bottom of the graph. If you’re gonna leave people out of your analysis, you’d better explain why. In this case, 4 were left out in the end because they did not respond to follow-up.
– Table 1, p.170: A careful breakdown and description of the people in each stratum (block)
• Conclusion: A study should at some point summarize the researchers’ recommendations on what the study can tell us. In this study it is in the very last paragraph: “Physicians should recommend supervised treadmill exercise programs for PAD patients regardless of whether they have classic symptoms of intermittent claudication”.
