How Blind is Double-Blind?


The double-blind randomized controlled trial (RCT) has been the gold standard of clinical research for the last fifty years. But the double-blind RCT might just as well be called an aluminum standard or a lead standard if blinds are regularly broken. Which, in fact, they are.

Relatively few studies ask participants, at the end (or in the middle) of the study, whether they were able to guess which treatment arm they were in (placebo versus active treatment). But in studies that do, it’s surprising how many patients (and clinicians, the other half of the double blind) are easily able to guess which group they were in.

In one double-blind study by Rabkin et al. (Psychiatry Research, Vol. 19, Issue 1, September 1986, pp. 75–86), 137 depression sufferers were divided into groups and given placebo, imipramine (a tricyclic antidepressant), or phenelzine (a monoamine oxidase inhibitor) for six weeks. At the end of the study, patients and doctors were asked to guess which treatment each patient had received. Some 78% of patients and 87% of doctors correctly guessed whether they were dealing with a placebo or a medicine. The authors had this to say:

Clinical outcome, treatment condition, and their interaction each contributed to guessing accuracy, while medication experience and side effects assessed only in week 6 did not. Accuracy was high, however, even when cases were stratified for clinical outcome, indicating that other cues were available to the patients and doctors. These may include patterns and timing of side effects and clinical response not detectable in this end-point analysis.

Another double-blind study, reported by Margraf et al. in the Journal of Consulting and Clinical Psychology (1991, Vol. 59, No. 1, pp. 184–187), involved 59 panic-disorder patients who got either placebo, imipramine, or alprazolam (Xanax). Four weeks into the 8-week study, patients and physicians were asked to guess which treatment each patient was receiving. Among patients, 95% of those taking an active drug guessed they were taking an active drug and 44% of placebo patients guessed they were taking placebo. That’s not so unusual, considering that the drugs involved (Xanax, especially) are strong, and strongly therapeutic for the specific disorder being studied (panic disorder). What was interesting was that physicians (who were not subject to the effects of the drugs) also guessed correctly at a high rate (89% for imipramine, 100% for alprazolam, 72% for placebo). What’s more, imipramine patients were better at telling the two drugs apart than the Xanax users were. Among subjects who got imipramine, 71% guessed imipramine. Only 50% of alprazolam patients guessed correctly.
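To put a single number on how leaky each arm of a trial like this was, one option is a per-arm blinding index: the share of correct guesses minus the share of incorrect guesses, along the lines of the index proposed by Bang and colleagues. The sketch below is illustrative only. The function name is hypothetical, and it assumes every patient made a guess (the summary above does not say whether "don't know" was an allowed answer), so the patient guess rates quoted above are plugged in as fractions.

```python
def blinding_index(correct: float, incorrect: float, dont_know: float = 0.0) -> float:
    """Per-arm index: (correct - incorrect) / all responses.

    +1 means everyone guessed their arm correctly (blind fully broken),
     0 is what pure guessing would produce, and
    -1 means everyone guessed wrong.
    Inputs may be counts or fractions, as long as they share one unit.
    """
    total = correct + incorrect + dont_know
    if total <= 0:
        raise ValueError("need at least one response")
    return (correct - incorrect) / total


# Patient guess rates quoted above, ASSUMING no "don't know" answers.
active_arms = blinding_index(correct=0.95, incorrect=0.05)
placebo_arm = blinding_index(correct=0.44, incorrect=0.56)

print(f"active arms: {active_arms:+.2f}")
print(f"placebo arm: {placebo_arm:+.2f}")
```

Under that assumption the active arms come out at roughly +0.9 and the placebo arm at roughly -0.12. In other words, the blind was close to fully broken for patients on a drug, while placebo patients did no better than guessing.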

It could be argued that guessing whether you’re taking an anxiolytic or not isn’t really that hard. Drugs like caffeine and nicotine, on the other hand, are more subtle in their effects and have been shown to produce very high rates of placebo response. (People’s hearts race, they get jittery, etc. when you give them decaf and tell them it’s regular coffee.) So studies of those drugs are, arguably, a more telling test of whether blinds hold.

Mooney et al., in a 2004 paper in Addictive Behaviors, reported the results of a meta-analysis of double-blind nicotine-replacement-therapy studies. In 12 of 17 studies analyzed, participants were able to guess treatment assignments at rates significantly above chance. This was even true of patch therapies, where nicotine is released very slowly into the bloodstream over a long period of time.
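What does "significantly above chance" mean in practice? A minimal sketch, assuming a two-arm trial with equal allocation (so a pure guesser is right half the time), is an exact one-sided binomial test on the number of correct guesses. The function name and the counts below are hypothetical and are not taken from Mooney et al. or any other study mentioned here.

```python
from math import comb


def binomial_tail(correct: int, total: int, chance: float = 0.5) -> float:
    """Exact one-sided p-value: probability of at least `correct`
    right guesses out of `total` if everyone were purely guessing."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical example: 70 of 100 participants guess their arm correctly.
p_value = binomial_tail(correct=70, total=100)
print(f"p = {p_value:.2g}")
```

With three arms or unequal allocation, the `chance` value would need to change, which is one reason guess rates from different studies are hard to compare directly.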

Blind breakage is potentially damaging to the credibility of any study, but it’s especially damaging for studies in which placebo response is intrinsically high and drug efficacy is slight, because even a slight boost in perceived efficacy (from blind penetration) can hugely affect results. It’s well known that patients who expect positive results from a treatment tend to experience positive outcomes more often (whether from placebo or drug) than patients who expect negative results. This effect can make a low-efficacy drug appear more efficacious than it actually is once the blind is broken. Irving Kirsch and others have called this the “enhanced placebo effect.” Kirsch invokes “enhanced placebo effect” to explain the effectiveness of antidepressants, which owe 70% to 80% of their efficacy to ordinary placebo effect. The other 20% to 30% of their effectiveness, Kirsch says, comes not from drug action but enhanced placebo effect. Patients in antidepressant studies, once they begin to notice side effects like sexual dysfunction or gastrointestinal discomfort, know that they’re getting an active drug, and they tend to respond better because of that knowledge.

Greenberg et al. did a meta-analysis of 22 antidepressant studies that were subject to stricter-than-usual blinding requirements. (These were rigorously controlled 3-arm studies in which there was a placebo control arm and two active-treatment arms: one including the new drug under study and one including an older “reference” medication.) The authors came away with three conclusions:

1. Effect sizes were quite modest and approximately one half to one quarter the size of those previously reported under more transparent conditions.

2. Effect sizes that were based on clinician outcome ratings were significantly larger than those that were based on patient ratings.

3. Patient ratings revealed no advantage for antidepressants beyond the placebo effect.

What can be done about the blind-breakage problem? Some have advocated using a study design that has three or even four arms (placebo, active placebo, reference drug, new drug), or even five arms (if you include a wait-list group that gets absolutely nothing, but whose progress is nonetheless monitored). Some have actually proposed backing out correct-guessers’ data entirely from studies after they’re conducted. Others have suggested giving the active drug surreptitiously so the patient doesn’t even know he or she is getting anything at all. (But this brings up ethical issues with informed consent.) There’s also the concept of triple-blindness, usually taken to mean that the people who assess outcomes or analyze the data are kept unaware of treatment assignments as well.

There are also those (e.g., Irving Kirsch) who have suggested a two-by-two matrix design, consisting of placebo and active-drug groups which are then each split into two arms, one totally blind, the other deliberately misinformed (placebo-receivers told that they’ve gotten active drug and active-drug-receivers told that they’ve gotten placebo).
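The four cells of such a design are easy to lose track of, so here is a rough sketch of how participants might be spread evenly across them. Everything in it (the labels, the `told` and `assign` helpers, the fixed seed) is a hypothetical illustration of the layout described above, not a procedure taken from Kirsch or anyone else.

```python
import random
from itertools import product

RECEIVED = ("placebo", "active drug")
INFORMATION = ("blind", "misinformed")


def told(received: str, information: str) -> str:
    """What a participant in a given cell is told about their treatment."""
    if information == "blind":
        return "not told"
    # Misinformed arms are told the opposite of what they actually receive.
    return "active drug" if received == "placebo" else "placebo"


def assign(participant_ids, seed: int = 0) -> dict:
    """Shuffle participants, then deal them round-robin into the four cells."""
    cells = list(product(RECEIVED, INFORMATION))
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    return {pid: cells[i % len(cells)] for i, pid in enumerate(ids)}


for pid, (received, information) in sorted(assign(range(8)).items()):
    print(pid, received, information, "| told:", told(received, information))
```

Comparing outcomes across the rows (what people got) and across the columns (what people believed) is what would let the drug's effect be separated from the effect of expectation.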

In the excellent paper by Even et al. (The British Journal of Psychiatry, 2000, 177: 47-51), the authors present a seven-point Blindness Assessment and Protection Checklist (BAPC) and recommend that researchers provide BAPC analysis as part of their published papers.

My opinion? If degree-of-blindness is measurable (which it is), then researchers should, in fact, measure it and disclose it as part of any study that’s purported to be “blinded.” Otherwise, all of us (researchers, clinicians, academics, students, and lay public) are truly operating in the blind.
