Why many published papers are not rubbish
2005.09.05
Mathematics, Rants, Natural Sciences

As usual, science reporters exaggerate the findings of a particular researcher in order to make a more shocking headline.

New Scientist, the self-proclaimed "World's No. 1 Science and Technology News Service", carried this article by Kurt Kleiner on August 30th, with the headline "Most Scientific Papers are Probably Wrong". The article cites an essay published in Public Library of Science: Medicine, written by John P. A. Ioannidis, an epidemiologist at the University of Ioannina School of Medicine in Greece. While the essay certainly carries some weight and raises important concerns for the scientific community, New Scientist's report makes a mountain out of a molehill, owing to a clear lack of understanding on the part of the reporter.

One of the most important things Mr. Kleiner overlooked when writing the NS article is the following note, printed in the left-hand margin of the PLoS essay:

The Essay section contains opinion pieces on topics of broad interest to a general medical audience.
The essay was published in PLoS:Medicine, whose implied audience is medical professionals. The words of caution written by Dr. Ioannidis are thus addressed to others in his own field. Yet Mr. Kleiner wrote in the opening paragraph of his report:
Most published scientific research papers are wrong, according to a new analysis. Assuming that the new paper is itself correct, problems with experimental and statistical methods mean that there is less than a 50% chance that the results of any randomly chosen scientific paper are true.
This point of view is completely wrong and demonstrates a clear ignorance, on the part of this particular reporter and perhaps of the general public he reflects, of the nature of scientific research.

When Dr. Ioannidis wrote his essay, he clearly had in mind papers published in the field of medicine, and especially those published in epidemiology. A cursory glance at the references he included with the essay shows that he based his evidence on experience with papers in epidemiology, and a careful reading of the essay shows that most of the hypothetical situations he constructed apply most naturally to human medical research. In the summary of the essay he stated:

...a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance.
And I agree completely with Dr. Ioannidis that, in the present atmosphere, there is indeed a high likelihood that research findings drawn from experiments that suffer from the influences listed above are false. However, I take issue with Mr. Kleiner's generalization of Dr. Ioannidis's analysis to all of science.

The traditional Scientific Method proceeds through a series of steps: Characterization, Hypothesis, Prediction, and Experiment. Yet within this notion of a falsifiable hypothesis verified with repeatable experiments, there are different ways of looking for the answer. In certain scientific fields, the ideal is to isolate the characteristic in question and establish a correlation of that characteristic with some observed quantity. In particular, classical physics and chemistry are prime examples of such science. In classical physics, a hypothesis is typically expressed as a mathematical function that relates one observable quantity (which we call the control, or independent variable) to another (which we call the effect, or dependent variable) under the strict assumption that all other things shall be equal.

For example, we look at the classical experiment of Galileo, where he dropped two objects from the leaning tower of Pisa to demonstrate that things accelerate equally under gravity, regardless of weight. An experiment that verifies such a hypothesis must then drop two objects of the same shape and size from the same spot at the same time, such that the only differing characteristic between the two objects is their weight. (Of course, it would not be possible to drop two objects from precisely the same spot at precisely the same time. One possibility is to drop the objects twice, each time from the same height at the same time, but with the two objects switching places between the first and second drop. But let us not dwell on the small details.)

The key is the repeatability of the experiment. Suppose Galileo wrote down his results from dropping balls off the Pisa tower, and his younger brother Bob decides he wants to see for himself. He sees that Galileo decided to drop balls of the same shape and size from the same height at the same time. Bob can then go find his own two balls, his own tower, and his own weekend, and repeat the experiment. And if Bob finds the results to be precisely the same, it is reasonable to expect that there isn't some unaccounted-for outside force that could have influenced the experiment. (As a counterexample, suppose Galileo didn't restrict himself to two balls of the same size, and Bob decides to take a large cannon ball and an even larger foam ball to drop from the Eiffel Tower. As we can expect, the foam ball will drift down rather slowly. Then the hypothesis "two balls will drop at the same speed from the same height" would be shown false, because it didn't take into account the effect of a third outside force--drag--which depends on the size of the balls.)

This same experimental spirit is still very much alive in modern physics and physical chemistry, even though experiments are more complicated nowadays, with multi-dimensional controls for the scientists to worry about. The essence, however, is unchanging: an experiment must control all other influences that can be regulated and vary only those variables that appear in the hypothesis.
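To make Bob's counterexample concrete, here is a rough numerical sketch of the two drops. Everything specific in it is my own assumption for illustration, not a measurement: the ball radii and densities, the 300 m drop height, the drag coefficient of 0.47 for a sphere, and the simple quadratic drag model stepped forward in small time increments.

```python
import math

G = 9.81             # gravitational acceleration, m/s^2
AIR_DENSITY = 1.225  # kg/m^3, assumed sea-level air

def sphere_mass(radius, density):
    """Mass of a solid sphere of the given radius (m) and density (kg/m^3)."""
    return density * (4.0 / 3.0) * math.pi * radius ** 3

def fall_time(mass, radius, height, drag_coefficient=0.47, dt=1e-3):
    """Approximate time (s) for a sphere to fall `height` metres from rest.

    Uses quadratic air drag and plain fixed-step time stepping, so the
    result is only as accurate as the step size `dt` allows.
    """
    area = math.pi * radius ** 2
    k = 0.5 * AIR_DENSITY * drag_coefficient * area
    t = v = y = 0.0
    while y < height:
        a = G - k * v * v / mass  # gravity minus drag deceleration
        v += a * dt
        y += v * dt
        t += dt
    return t

HEIGHT = 300.0                                 # assumed drop height, m
cannon = (sphere_mass(0.10, 7870.0), 0.10)     # assumed iron ball, 10 cm radius
foam = (sphere_mass(0.20, 30.0), 0.20)         # assumed foam ball, 20 cm radius

# With drag switched off, the two balls land together, as Galileo claimed.
print(fall_time(*cannon, HEIGHT, drag_coefficient=0.0),
      fall_time(*foam, HEIGHT, drag_coefficient=0.0))

# With drag switched on, the foam ball lags well behind the cannon ball.
print(fall_time(*cannon, HEIGHT), fall_time(*foam, HEIGHT))
```

With drag turned off the two times agree, and with drag turned on the foam ball takes noticeably longer. That is the sense in which the uncontrolled variable, size, breaks the naive hypothesis.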

A second type of scientific ideal is what I like to call the science of statistical correlation. This ideal manifests more often in the life sciences, such as biology, medicine, anthropology, sociology, psychology, etc. In the study of life, an unavoidable problem is that no two (macro-)organisms are identical. While simple micro-organisms can be grown such that we produce a colony of identical life-forms (since bacteria et al. undergo binary fission, which effectively replicates them), the same cannot be had for higher life-forms. There will be inherent differences between test subjects when it comes to life and medicine. Therefore, we cannot specify as strict a hypothesis as in the case of physics and chemistry. We can only observe and experiment with those test subjects available to us. (In a bizarre way, the study of astronomy is quite similar: there's nothing we can do about the objects we observe, and we can only take data and draw conclusions about them.) In these scientific fields, what ends up happening is what we call statistical sampling. Rather than experimenting on human beings as individuals, medical researchers experiment on humankind as a whole, or at least on subgroups of humankind (say, all humans of a certain age). But it would be rather impractical to test every living human one by one, so scientists make a leap of faith that depends on statistics:

they assume that human beings are homogeneous, so that a large enough subset of human beings will accurately represent the human kind as a whole.
The experiments are then designed and executed on a sample group, while the results are interpreted and generalized to a larger group. For example, suppose we want to test the effectiveness of a certain drug on Caucasian French men between the ages of 25 and 29. We would go to France, pick out, say, 1000 white guys between those ages, and do our experiment on them by administering the experimental drug to half of them and some placebo to the other half. We then observe whether there's any difference between those two groups. If there's a quantifiable difference, we then claim that the drug does produce some reaction among the target group of Caucasian French males aged 25 to 29. However, because we cannot control the people to be exactly like one another, we cannot expect the effects of the drug to be identical on all people. So, in fact, it is even possible that what we had was a statistical fluke: it just happens that for some reason, all 500 of our test subjects behaved one way and all 500 of the placebo subjects behaved the other. It is unlikely, but possible.

Now, the problem is the following: if I ask you to flip a coin 8 times, it is damn unlikely that you will flip 8 heads in a row (the chance is 1 in 256). But if I ask a million people to flip a coin 8 times, statistics say that over 3000 of them should get 8 heads in a row. With enough eyes investigating a scientific problem, and with statistical tools in the background, it is quite likely that some of the "positive results" are indeed statistical flukes. This is the gist of Dr. Ioannidis's essay. It is something every experimental scientist was taught in school: that statistics can work both for and against experiments--that using statistical methods can reduce the complexity of an experiment, but also introduces errors into the results. Dr. Ioannidis just reminds his colleagues not to take every published result on faith alone, since many of them might be such statistical flukes. (Interestingly, part of his large "50% false" claim comes from the habit in the scientific community of publishing only positive results. If negative results were also published routinely, they would easily invalidate many of the statistical flukes that contribute to those false conclusions.)
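For anyone who wants to check the coin-flip arithmetic, a few lines of Python will do it. This is just a sketch assuming fair coins and independent flips; the function name `expected_all_heads` is my own.

```python
from fractions import Fraction

def expected_all_heads(people, flips):
    """Expected number of people who get heads on every one of `flips` fair tosses."""
    p_all_heads = Fraction(1, 2) ** flips  # chance for one person
    return people * p_all_heads

print(Fraction(1, 2) ** 8)                      # 1/256
print(float(expected_all_heads(1_000_000, 8)))  # 3906.25
```

So a single flipper has a 1 in 256 chance, but a million flippers should produce nearly four thousand "flukes" between them.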

A last model of science that I would like to mention is that used in theoretical physics and mathematics. In a sense, these fields are closer to metaphysics than to science: the investigation depends not on experimental validation of a hypothesis, but on demonstrating the truth of a hypothesis from prior axioms. In this framework, there are no variables to control and no statistics to apply, and in fact proofs conclusively demonstrate the truth of a theory. This is rather unlike the previous two models of the scientific method.

With the scientific method, hypotheses can only be falsified, not proven. Theories remain theories. There is no way to conclusively prove a hypothesis. The scientific process is a process of elimination. The experiments weed out hypotheses that can be shown to be false. What we are left with will be, to a certain extent, a good approximation of reality. In certain types of experiments, however, it is best to heed Dr. Ioannidis's cautionary essay and realize that sometimes the lack of a negative result does not necessarily imply the presence of a positive one. But in many other fields, such as physics and chemistry, such worries can be safely set aside because of the nature of the experiments that are designed and performed.

Posted at 02:34:15 EDT by W
