Early-stage research often gets dinged for not including enough trial subjects to be statistically valid. But adhering to the dogma of large sample sizes is counterproductive, says Peter Bacchetti, a biostatistician at the University of California, San Francisco. Because most early-stage trials fail, large sample sizes waste time on unsuccessful ideas, he argues in a perspective piece published online today (June 15) in Science Translational Medicine. Insisting on them can even keep innovative treatments from moving forward, because trials that can't recruit enough patients are never performed.
This week, Bacchetti took time to speak with The Scientist about why sample size isn’t everything, and what scientists can use instead to measure a study’s worth.
The Scientist: Why does starting small make sense?
Peter Bacchetti: Because you don’t know what’s going to happen, and sometimes that means you can learn enough from something small, or have a better idea where to go next. Having a large study means spending a lot of money, it means taking a lot of time, and it can also mean taking a lot of risk. That’s not always justified right off the bat with a new idea.
TS: Why are sample size calculations often counterproductive?
PB: There is this idea that you must prove that you have 80 percent power in order for a study to be worth doing. That means given some particular assumption about what’s really true, you’ll have an 80 percent chance of getting a P-value of less than .05.
The goal of 80 percent power isn’t well justified. It’s not like it’s worthless until you reach 80 percent power, and then it’s good. Studies that have less than 80 percent power may still be very worthwhile and worth doing.
What I talk about in the article is the reality of diminishing marginal returns. Each subject you add produces less information than the one who came before. So you get less and less for each additional person you add.
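The two ideas in this answer can be made concrete with a minimal Python sketch. It assumes the simplest textbook setup, which is not necessarily the one Bacchetti analyzes. The setup is a two-sided, two-sample z-test at alpha = 0.05 that compares means between two equal-sized groups. The true effect is expressed as a standardized difference, and the example uses a hypothetical 0.5 standard deviations. The function names power_two_sample and n_for_power are illustrative and do not come from the article. The sketch shows two things. First, power rises smoothly with sample size rather than switching from worthless to worthwhile at 80 percent. Second, each added block of subjects buys a smaller gain in power than the block before.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, SD 1


def power_two_sample(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the assumed true difference in means divided by the SD.
    Normal approximation; the tiny chance of rejecting in the wrong
    direction is ignored.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    shift = effect_size * math.sqrt(n_per_group / 2)   # mean of the z-statistic under the assumed effect
    return _STD_NORMAL.cdf(shift - z_crit)


def n_for_power(effect_size: float, target_power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest per-group n whose approximate power reaches target_power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(target_power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    d = 0.5  # hypothetical standardized effect size
    print("n per group for 80% power:", n_for_power(d))  # → 63
    previous = power_two_sample(d, 10)
    for n in range(20, 101, 10):
        current = power_two_sample(d, n)
        # Power keeps climbing, but each extra 10 subjects per group adds less.
        print(f"n={n:3d}  power={current:.3f}  gain from last 10: {current - previous:.3f}")
        previous = current
```

Under these assumptions, the per-group n for 80 percent power is 63. There is nothing special about that threshold, because power is already substantial below it and keeps rising above it with shrinking increments. A real trial would more likely use a t-test or a dedicated power package, which typically gives a slightly larger n.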
TS: How can a study be useful if it doesn't have statistical power?
PB: Sample size can often be a distraction from more important issues. Deciding whether a study is worth doing could depend on how interesting the idea is, whether the design is sound or susceptible to bias, what the potential upsides are if it turns out to be right.
TS: How does sample size play into later-stage trials?
PB: It makes sense to weigh the costs when you’re planning later studies, as well. In reality, investigators do. They don’t do a sample size calculation and then use that sample size regardless of how infeasible it may be. You can’t say “well, a thousand subjects, pretty good but it’s not quite significant under these assumptions, so double the cost of the study, go up to 1100 subjects.” That typically would not make sense.
TS: You mention in your paper that requiring a certain sample size can often lead reviewers to reject proposals that may actually be worth funding. What should reviewers do instead?
PB: Well, we've recommended they assume the proposed sample size is a reasonable choice unless there's a really extreme, obvious case. For example, if you're studying something that's so rare that you're not even likely to see an instance of it in your study, then of course you're not likely to learn anything.
Sometimes people will fall back on the argument that more is always better. Sample size calculation can't be as simple as that; otherwise every study would include everyone on the planet.
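Bacchetti's extreme case is a condition so rare that a study is unlikely to observe even one instance. Simple arithmetic is enough to check for it. The Python sketch below assumes that each subject independently has the same probability p of the event. Under that assumption, the chance of seeing at least one case among n subjects is 1 - (1 - p)^n. The function names and the example numbers (a 1-in-10,000 event, 200 subjects) are hypothetical.

```python
def expected_events(p: float, n: int) -> float:
    """Expected number of events among n subjects, each with probability p."""
    return p * n


def prob_at_least_one(p: float, n: int) -> float:
    """Chance of observing at least one event, assuming independent subjects."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p, n = 1 / 10_000, 200  # hypothetical rare event and study size
    print(f"expected events: {expected_events(p, n):.3f}")
    print(f"P(at least one event): {prob_at_least_one(p, n):.4f}")
```

With these numbers, the study expects about 0.02 events and has roughly a 2 percent chance of seeing any at all. That is the kind of extreme mismatch in which, as he notes, the study is unlikely to learn anything.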
P. Bacchetti et al., “Breaking free of sample size dogma to perform innovative translational research,” Science Translational Medicine, 3: 1-4, 2011.