Interpreting Medical Findings with Caution
Thu, 27 Oct. 2011 - 4:26 p.m. MT
Credit: ARA Staff - American Running Association
A drug, carefully tested by its manufacturer and approved by the FDA, went on the market in the not-so-distant past. People began to report side effects, so an independent researcher decided to conduct his own study. He put half of the subjects on the drug and half on a placebo. This randomized, double-blind, placebo-controlled study then reported a dismaying number of serious side effects, with 15% of the subjects contracting eye problems and 8% going blind. The Institutional Review Board halted the study, statistical analysis confirmed the rates of these side effects, and the study, which concluded that the drug was harmful and should be avoided, was published in a peer-reviewed medical journal.
A Case Study
The study in question was published in Biological Psychiatry in 1993, and the drug it denounced was aspartame, the widely used sugar substitute. Meanwhile, the FDA, the European Food Safety Authority, and a recent review of all the other published literature have found aspartame to be safe. So was this one study enough to change the verdict on aspartame? Was it the beginning of a new series of dissenting data, or was it deeply flawed? The answer is that it was deeply flawed.
The author’s admitted bias against aspartame, based on his past observation that many of his depressed patients reported headaches from the drug, set the study off on the wrong foot immediately. It compared a group of depressed patients to a group of non-depressed controls, each randomly assigned to a placebo capsule or a capsule containing the aspartame equivalent of 10–12 cans of diet soda daily. The study used a crossover design: each subject took one pill daily for seven days, took nothing for three days (the “washout period”), then took the other pill daily for another seven days. The non-depressed controls were volunteers who worked in the same institution as the study author, which raises the possibility that they knew of, or even shared, his bias.
What’s more, the “15% of subjects” with eye problems turned out to be two people, and the “8% of subjects” who went blind was a single person. There were only 13 subjects in the entire study. The blind subject turned out to have suffered a retinal detachment, causing blindness in one eye, while he was taking the placebo. He alone accounted for half of the 15%; the other half was a subject with a subconjunctival hemorrhage, a common, painless and harmless condition in which a bit of blood leaks out under the surface of the white of the eye, leaving a tiny spot. It can be thought of as an eye bruise, and it often occurs spontaneously, for no apparent reason. It had never before been associated with taking aspartame, and in fact it was not even listed on the symptom checklist given to the patients during the study.
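The arithmetic alone shows how fragile these percentages are. A quick sketch (my own illustration, not part of the study’s analysis):

```python
# Illustrative arithmetic (mine, not the study's): with only 13 subjects,
# a single person's outcome moves any reported rate by about 7.7
# percentage points.
n = 13
per_subject = 100 / n        # percentage points one person contributes
eye_problems = 2 * 100 / n   # the "15%": just two people
blindness = 1 * 100 / n      # the "8%": a single person

print(round(per_subject, 1))   # 7.7
print(round(eye_problems, 1))  # 15.4
print(round(blindness, 1))     # 7.7
```

With a denominator this small, the difference between an alarming headline rate and no finding at all is one or two individuals.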
This is called mining the data. When your hypothesis doesn’t pan out, you go back and look at what other rash conclusions you can draw from the specific occurrences during the study. Concluding from these data that aspartame causes eye problems is no different from concluding that it causes hiccups because one of your subjects got them during the study. As for the placebo-group subject with the retinal detachment, the author blamed a delayed reaction to aspartame, since the subject was in the aspartame-first group. But that subject had taken his last dose of aspartame six days earlier, and the author-chosen washout period was only three days. If the author is suggesting that delayed reactions to aspartame can occur after twice the length of his own designated washout period, he is changing midway through what he purports to be testing: the study was not designed to detect delayed reactions, and the author evidently didn’t believe in them, or he would have made the washout period much longer. Blaming everything on aspartame, whether it occurs in the aspartame group or the placebo group, collapses the idea of a meaningful control group: there would be no way to falsify the hypothesis. There are myriad other problems with this study, from variations in medications among the depressed subjects to the fact that partial data from the two subjects who had not finished when the study was terminated were thrown into the statistical analysis (only 11 subjects completed the protocol).
How to Spot Bad Science
The first line of defense when you hear of a new study’s findings is to ask who disagrees with it and why; any legitimate conclusion will have many studies buttressing it. Patients who believe aspartame gives them headaches have been tested in double-blind studies before, and they don’t get headaches when they don’t know they’re getting aspartame. The five non-depressed volunteers in this aspartame study didn’t get headaches from aspartame either: three had believed before the testing that the drug would cause them, yet four of the five reported headaches while taking the placebo, and only one headache occurred on aspartame. Are we to conclude that not taking aspartame is four times as likely to cause headaches as taking aspartame? Of course not.
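To see how little counts this small can demonstrate in either direction, here is an illustrative exact test (my own arithmetic, not anything the study ran) on the volunteers’ headache tallies, one of five on aspartame versus four of five on placebo:

```python
from math import comb

# Hand-rolled two-sided Fisher exact test (illustrative; the study did not
# perform this analysis) on the 2x2 table [[a, b], [c, d]].
def fisher_two_sided(a, b, c, d):
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    # Probability of the observed table under the hypergeometric null.
    p_obs = comb(row1, a) * comb(n - row1, col1 - a) / denom
    # Sum the probabilities of all tables at least as extreme.
    p = 0.0
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    for x in range(lo, hi + 1):
        px = comb(row1, x) * comb(n - row1, col1 - x) / denom
        if px <= p_obs + 1e-12:
            p += px
    return p

# Headaches: 1 of 5 volunteers on aspartame, 4 of 5 on placebo.
p = fisher_two_sided(1, 4, 4, 1)
print(round(p, 3))  # 0.206
```

Even a fourfold difference in headache counts yields a p-value of about 0.21 with five subjects, nowhere near conventional significance, which is exactly why no conclusion should be drawn in either direction from samples this small.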
Listed below are seven indicators that a scientific claim lies well outside the bounds of rational scientific discourse. They come from Robert L. Park, PhD, who published them in the Chronicle of Higher Education in 2003. Park wishes us to note that these are only warning signs; even a claim with several of the signs could be legitimate.
1. The discoverer pitches the claim directly to the media. The integrity of science rests on the willingness of scientists to expose new ideas and findings to the scrutiny of other scientists. Thus, scientists expect their colleagues to reveal new findings to them initially. An attempt to bypass peer review by taking a new result directly to the media, and thence to the public, suggests that the work is unlikely to stand up to close examination by other scientists.
2. The discoverer says that a powerful establishment is trying to suppress his or her work. The idea is that the establishment will presumably stop at nothing to suppress discoveries that might shift the balance of wealth and power in society. Often, the discoverer describes mainstream science as part of a larger conspiracy that includes industry and government.
3. The scientific effect involved is always at the very limit of detection. Alas, there is never a clear photograph of a flying saucer, or the Loch Ness monster. All scientific measurements must contend with some level of background noise or statistical fluctuation. But if the signal-to-noise ratio cannot be improved, even in principle, the effect is probably not real and the work is not science.
4. Evidence for a discovery is anecdotal. If modern science has learned anything in the past century, it is to distrust anecdotal evidence. Because anecdotes have a very strong emotional impact, they serve to keep superstitious beliefs alive in an age of science. The most important discovery of modern medicine is not vaccines or antibiotics, it is the randomized double-blind test, by means of which we know what works and what doesn't. Contrary to the saying, "data" is not the plural of "anecdote."
5. The discoverer says a belief is credible because it has endured for centuries. There is a persistent myth that hundreds or even thousands of years ago, long before anyone knew that blood circulates throughout the body, or that germs cause disease, our ancestors possessed miraculous remedies that modern science cannot understand. Much of what is termed "alternative medicine" is part of that myth.
6. The discoverer has worked in isolation. The image of a lone genius who struggles in secrecy in an attic laboratory and ends up making a revolutionary breakthrough is a staple of Hollywood's science-fiction films, but it is hard to find examples in real life. Scientific breakthroughs nowadays are almost always syntheses of the work of many scientists.
7. The discoverer must propose new laws of nature to explain an observation. A new law of nature, invoked to explain some extraordinary result, must not conflict with what is already known. If we must change existing laws of nature or propose new laws to account for an observation, it is almost certainly wrong.
Recent Controversy
But even among the best-designed studies and clinical trials there exist conflicting evidence and conflicting ways to interpret data. In 1985, the Nurses’ Health Study, run out of the Harvard Medical School and the Harvard School of Public Health, reported that women taking estrogen had only a third as many heart attacks as women who had never taken the drug. This appeared to confirm the belief that women were protected from heart attacks until they passed through menopause, and that it was estrogen that bestowed the protection; it became the basis of the therapeutic wisdom for the next 17 years. Faith in the protective powers of estrogen evaporated in July 2002, when a large clinical trial, the Women’s Health Initiative, concluded that hormone-replacement therapy (HRT) constituted a potential health risk for all postmenopausal women: while it might protect them against osteoporosis and perhaps colorectal cancer, those benefits would be outweighed by increased risks of heart disease, stroke, blood clots, and breast cancer. But in June, The New England Journal of Medicine reported that HRT may indeed protect women against heart disease if they begin taking it during menopause, while remaining decidedly deleterious for women who begin later in life.
The catch with observational studies like the Nurses’ Health Study, no matter how well designed and how many tens of thousands of subjects they might include, is that they have a fundamental limitation. They can distinguish associations between two events—that women who take HRT have less heart disease, for instance, than women who don’t. But they cannot inherently determine causation—the conclusion that one event causes the other; that HRT protects against heart disease. As a result, observational studies can only provide what researchers call hypothesis-generating evidence—what a defense attorney would call circumstantial evidence.
Testing these hypotheses in any definitive way requires a randomized-controlled trial—an experiment, not an observational study—and these clinical trials typically provide the flop to the flip-flop rhythm of medical wisdom.
Even the Nurses’ Health Study, one of the biggest and best of these studies, cannot be used to reliably test small-to-moderate risks or benefits. Proponents of the value of these studies for telling us how to prevent common diseases—including the epidemiologists who do them, and physicians, nutritionists and public-health authorities who use their findings to argue for or against the health benefits of a particular regimen—will argue that they are never relying on any single study. Instead, they base their ultimate judgments on the “totality of the data,” which includes all the observational evidence, any existing clinical trials and any laboratory work that might provide a biological mechanism to explain the observations.
What to Believe?
So how should we respond the next time we’re asked to believe that an association implies cause and effect, that some medication or some facet of our diet or lifestyle is either killing us or making us healthier? We can fall back on several guiding principles.
One is to assume that the first report of an association is incorrect or meaningless, no matter how big that association might be. After all, it’s the first claim in any scientific endeavor that is most likely to be wrong. Only after that report is made public will the authors have the opportunity to be informed by their peers of all the many ways that they might have simply misinterpreted what they saw. The regrettable reality, of course, is that it’s this first report that is most newsworthy. So be skeptical.
If the association appears consistently in study after study, population after population, but is small—in the range of tens of a percent—then doubt it. For the individual, such small associations, even if real, will have only minor effects or no effect on overall health or risk of disease. They can have enormous public-health implications, but they’re also small enough to be treated with suspicion until a clinical trial demonstrates their validity.
If the association involves some aspect of human behavior, which is, of course, the case with the great majority of the epidemiology that attracts our attention, then question its validity. If taking a pill, eating a diet or living in proximity to some potentially noxious aspect of the environment is associated with a particular risk of disease, then other factors of socioeconomic status, education, medical care and the whole gamut of healthy-user effects are as well. These will make the association, for all practical purposes, impossible to interpret reliably.
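A toy simulation (entirely hypothetical numbers, chosen only for illustration) makes the healthy-user effect concrete: a hidden trait that drives both who takes a pill and who avoids disease produces a strong observational association even when the pill itself does nothing.

```python
import random

random.seed(0)

# Hypothetical simulation of the healthy-user effect. The pill has zero
# causal effect on disease; a hidden "healthy user" trait makes people
# both more likely to take the pill and less likely to fall ill.
def simulate(n=100_000):
    pill_cases = pill_total = other_cases = other_total = 0
    for _ in range(n):
        healthy_user = random.random() < 0.5
        takes_pill = random.random() < (0.8 if healthy_user else 0.2)
        # Disease risk depends only on the hidden trait, not the pill.
        disease = random.random() < (0.05 if healthy_user else 0.15)
        if takes_pill:
            pill_total += 1
            pill_cases += disease
        else:
            other_total += 1
            other_cases += disease
    # Relative risk of disease: pill takers vs non-takers.
    return (pill_cases / pill_total) / (other_cases / other_total)

rr = simulate()
print(round(rr, 2))  # well below 1.0 despite zero causal effect
```

An observational study of this population would report that pill takers have roughly half the disease risk of non-takers, an association that a randomized trial, which breaks the link between the hidden trait and pill-taking, would correctly show to be nothing at all.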
For more information on verifying media claims, see Cutting Through the Haze of Health Marketing Claims in this issue.
REFERENCES:
Skeptic, 2007, Vol. 13, No. 3, pp. 59-63.
“Do We Really Know What Makes Us Healthy?” by Gary Taubes, New York Times, September 16, 2007, http://www.nytimes.com/2007/09/16/magazine/16epidemiology-t.html?pagewanted=1&_r=1
“The Seven Warning Signs of Bogus Science,” by Robert L. Park, PhD, Chronicle of Higher Education, January 31, 2003, http://chronicle.com/article/The-Seven-Warning-Signs-of/13674
(RUNNING & FITNEWS® September / October 2007 • Volume 25, Number 5)