Composite Endpoints in Clinical Trials

There’s a right way and a wrong way to boost the statistical sensitivity of this type of clinical study.

Jul 1, 2016
Sarah C.P. Williams

© BRYAN SATALINO

It’s a moment every clinical researcher dreads: you crunch the numbers for an upcoming trial and realize you’ll need to recruit tens of thousands of participants to show a statistically significant effect for the therapy you’re testing. You don’t see any way to change most of your variables linked to trial size. But what if you change your endpoints?

In recent years, a growing number of clinical trials have used composite endpoints—multiple events all treated as one endpoint—as a way to boost the power of a study so that fewer participants are needed. “Say you’re designing a study to look at heart attacks, and it looks like you’ll need 40,000 patients,” says Joshua Stolker, a cardiologist at Mercy Clinic in Saint Louis. “But if you use a combined endpoint that considers both heart attacks and hospitalizations, suddenly you only need 20,000. Then you add in revascularization surgeries, and you only need 5,000 patients.”

In that hypothetical example, researchers who chose to combine all three outcomes would be testing whether their intervention changed the number of heart attacks patients experienced, the number of hospitalizations, or the number of revascularization surgeries they required. The study would need fewer patients to reach a firm conclusion because increasing the number of possible outcomes in their endpoint makes it more likely that one of them would occur in any given patient. But the design limits the study’s conclusion, because a composite endpoint lumps together all the outcomes, making it hard to conclude which outcome is affected by the intervention. The drug may have decreased the number of surgeries but not heart attacks or hospitalizations, for example, or affected any other combination of the three measures.
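To see why a more common endpoint shrinks a trial, here is a minimal Python sketch using the textbook normal-approximation formula for comparing two proportions. The event rates, the 20 percent relative reduction, and the function name `patients_per_arm` are illustrative assumptions, not figures from Stolker or from any trial. The sketch also assumes the drug has the same relative effect on every component, which is exactly the assumption the experts quoted here warn may not hold.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(control_rate, relative_reduction, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to compare two event proportions.

    Uses the standard normal-approximation formula (two-sided test).
    """
    p1 = control_rate                               # event rate without treatment
    p2 = control_rate * (1.0 - relative_reduction)  # event rate with treatment
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical control-arm event rates: each added outcome makes it more
# likely that at least one event occurs in a given patient.
scenarios = {
    "heart attack only": 0.02,
    "heart attack or hospitalization": 0.05,
    "heart attack, hospitalization, or revascularization": 0.10,
}

# Assumed (hypothetical) 20 percent relative reduction for every endpoint.
sample_sizes = {
    name: patients_per_arm(rate, relative_reduction=0.20)
    for name, rate in scenarios.items()
}

for name, n in sample_sizes.items():
    print(f"{name}: about {n} patients per arm")
```

Under these assumptions the required enrollment falls steeply as the combined event rate rises, but the calculation says nothing about which component the treatment actually changes.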

“It can be really hard to understand these studies,” says Lisa Schwartz, a professor of medicine and a medical communication researcher at Dartmouth College. “A drug reduces your chances of this or this or that; what does that really mean?”

Stolker, Schwartz, and other experts in statistics and study design say that composite endpoints are overused—or, at the very least, often improperly used. The Scientist asked these experts for their advice on proper design and interpretation of the statistical approach. Here’s what they said.

Weigh the importance of endpoints

Because it’s so hard to determine which outcomes are truly affected by a trial that uses composite endpoints, Stolker says it’s key to select appropriate endpoints before a study begins. His number one piece of advice: choose endpoints that are relatively similar in how clinically important they are to patients.

“We have so many trials that are being driven by endpoints that don’t matter very much,” says Stolker. “It’s kind of a chronic pet peeve of lots of us doing outcomes research.”

In cardiology, for instance, multiple endpoints can be used to evaluate drugs or procedures that treat heart disease. Researchers can measure the frequency of procedures such as bypasses or angioplasties that patients receive to treat vessel blockages; the number of strokes or heart attacks patients experience; how many patients are hospitalized; rates of death; or in the case of many studies with composite endpoints, all of the above. But would patients be equally apt to take a drug with side effects if its only benefit was a slight downtick in the rate of hospitalizations, but not in the rate of strokes, heart attacks, or death?

That question plays out constantly in the field, says Stolker. The 2006 DREAM trial testing rosiglitazone (Avandia) concluded that the drug was effective for a composite endpoint combining death and new diabetes diagnoses (Lancet, 368:1096-105). The results left clinicians and patients wondering whether the drug improved only one of the two drastically different outcomes and, if so, which one.

Curious how patients rank the relative value of commonly used cardiac endpoints, Stolker and his colleagues surveyed 785 cardiovascular patients and 164 authors of recent clinical trials about which negative outcomes they considered most important for an intervention to reduce (Circulation, 130:1254-61, 2014). While clinicians rated death as the top outcome to prevent, many patients saw it differently. “If you’re doing a treatment for people in their 80s, maybe the only thing that matters is reducing strokes,” says Stolker. “They don’t care if they’re hospitalized five times, but they’re terrified of becoming a vegetable.”

So what do Stolker’s ranking results have to do with composite endpoints? Studies with composite endpoints are typically designed to give equal weight to all endpoints. But that means you must select endpoints that are—from a patient’s perspective—equal. Stolker suggests surveying a small group of patients before a trial to get a sense of whether they view your endpoints as relatively similar in importance. Or, he says, you could use weighted endpoints, adding a statistical twist to the standard idea of composite endpoints. In that case, different values are assigned to different outcomes within a composite endpoint, rather than treating all outcomes as statistically identical. But using this approach could require more patients and complicate the analysis.

“Weighted endpoints are an idea that’s been very slow to catch on, but they really make sense,” Stolker says. If a trial combines endpoints of strokes and hospitalizations, you could assign more value to occurrences of strokes, and less to hospitalizations. “This can really change the way you’d interpret the study and whether it’s positive or negative to do one therapy over another,” he says.    
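A toy Python sketch shows how weighting can flip a study’s apparent result. Everything in it is a hypothetical assumption made for illustration: the event counts, the weights (a stroke counted as 1.0, a hospitalization as 0.2), and the simple weighted-sum scoring. Published weighting schemes differ, and real composites usually track each patient’s first event rather than raw event totals.

```python
# Hypothetical event counts per 1,000 patients in each arm (illustrative only).
events = {
    "control": {"stroke": 20, "hospitalization": 100},
    "treatment": {"stroke": 30, "hospitalization": 70},
}

# Hypothetical weights: a stroke counts five times as much as a hospitalization.
weights = {"stroke": 1.0, "hospitalization": 0.2}


def unweighted_total(counts):
    """Standard composite: every event counts the same."""
    return sum(counts.values())


def weighted_total(counts, event_weights):
    """Weighted composite: each event contributes its assigned weight."""
    return sum(event_weights[event] * n for event, n in counts.items())


for arm, counts in events.items():
    print(
        f"{arm}: unweighted = {unweighted_total(counts)}, "
        f"weighted = {weighted_total(counts, weights):.1f}"
    )
```

In this made-up example the treatment arm has fewer total events, so the unweighted composite favors the treatment. Once strokes count for more, the weighted score favors the control arm.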

Decompose effects

Even choosing appropriate, equally important endpoints doesn’t fully solve the problem. By their nature, composite endpoints often leave unanswered questions about the effects of an intervention. Are the outcomes influenced similarly by the intervention, or does one outcome increase and another decrease? It’s a particular issue when using the common composite endpoint “event-free survival,” says radiation oncologist Loren Mell of the University of California, San Diego.

“Event-free survival, or overall survival, is a composite endpoint because it includes death from multiple different causes,” says Mell. In cancer, he explains, that means deaths included in a study could be deaths from the cancer itself or from the toxicity of a strong systemic drug. “For a researcher, it’s a very different story if a drug is completely inert or if it’s effective but the survival is being offset by toxicity,” he says.

Mell and his colleagues looked at 158 studies linking patterns of gene expression to cancer outcome and using event-free survival as an endpoint (BMC Proceedings, 9 (Suppl 1):A17, 2015). Only 15 of the studies, or about 10 percent, specifically reported the effects on both cancer and noncancer events. That means that in 90 percent of the studies, readers can’t conclude whether the genes studied have an effect on cancer or on death from another cause, says Mell. “It’s this problem of saying, ‘I showed that X affects either A or B and therefore X affects A.’ It’s a logical fallacy that is repeated in the scientific literature again and again.”

Mell’s advice: if you’re using event-free survival or overall survival as a composite endpoint, do the extra statistics to show the effect of your gene of interest or new therapy on different events. Depending on the details of your study, you may not have enough participants to show a statistically significant effect on each outcome, but you could still crunch the numbers to offer a glimpse of what the results suggest.
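As a rough illustration of that kind of breakdown, the Python sketch below reports a relative risk and an approximate 95 percent confidence interval for a composite and for each of its components. The death counts and the helper `relative_risk` are hypothetical and are not drawn from Mell’s analysis. The sketch uses simple proportions with a log-scale confidence interval; a real survival analysis would more likely use time-to-event and competing-risks methods.

```python
from math import exp, log, sqrt

N_PER_ARM = 1000  # hypothetical number of patients in each arm

# Hypothetical deaths by cause; each patient can die of only one cause,
# so the composite "any death" is the sum of the two components.
deaths = {
    "cancer death": {"treatment": 150, "control": 200},
    "non-cancer death": {"treatment": 100, "control": 50},
}
deaths["any death (composite)"] = {
    arm: deaths["cancer death"][arm] + deaths["non-cancer death"][arm]
    for arm in ("treatment", "control")
}


def relative_risk(events_treated, events_control, n_treated, n_control, z=1.96):
    """Relative risk with an approximate 95% CI computed on the log scale."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    se = sqrt(1 / events_treated - 1 / n_treated
              + 1 / events_control - 1 / n_control)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


results = {
    outcome: relative_risk(c["treatment"], c["control"], N_PER_ARM, N_PER_ARM)
    for outcome, c in deaths.items()
}

for outcome, (rr, low, high) in results.items():
    print(f"{outcome}: RR = {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With these invented counts the composite shows no difference between arms, while the components move in opposite directions: fewer cancer deaths, more deaths from other causes. That is the scenario Mell describes, in which an effective drug’s benefit is offset by toxicity.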

Keep it simple

In some fields, there are reasons beyond reducing costs and trial sizes to use composite endpoints. In multifaceted diseases like Alzheimer’s, for example, composite endpoints are required to capture the complexity of disease.

“Depending on where people are in the process of disease, certain cognitive areas are impacted more than others,” says Alette Wessels, a neuropsychologist who leads the development of outcomes measures for Alzheimer’s trials at Eli Lilly. “And then, as you can imagine, there is variability between subjects.”

So a single endpoint—measuring memory, for instance—might not capture the severity of Alzheimer’s in all patients. Combining endpoints that reflect memory, executive functioning, and language could capture cognitive deficits more completely.

Yet problems in interpreting composite endpoints still arise, says Wessels. “There are a lot of different composites; a lot of researchers come up with their own version by combining different items. And data comparison or results comparison is very difficult if everyone is doing something different.” There are three different tools used to assess Alzheimer’s patients, she explains, and each contains many data points collected by testing patients and interviewing them or their caregiver. Many researchers create their own composite endpoints by picking and choosing different items out of the three scales. Ideally, Wessels says, researchers in a field like Alzheimer’s should settle on a single, accepted composite endpoint to use in a broad range of studies.

Whether you’re using a composite endpoint that has been developed by others or combining outcomes into a new composite, Wessels says simplicity is important. “The more statistical manipulation you’re doing when you’re combining things, the more difficult it is, at the end of the day, to figure out what’s driving any treatment effect.”

Communicate clearly

Most problems with composite endpoints, says Schwartz, could be solved by better discussions of endpoints in research papers. In 2010, Schwartz analyzed a collection of studies from various fields that used composite endpoints (BMJ, 341:c3920). Only one of 40 papers included a discussion of how the authors chose components of the endpoint; 13 of the papers had inconsistent definitions of their composite, making it unclear what outcomes were included. Moreover, among the 16 trials that had a statistically significant composite at the end, 11 misleadingly used language implying that the intervention affected an individual component of the composite.

Authors commonly use “and” when discussing their results, Schwartz says. “But if you have a composite, it reduces [the risk of] this, this, or that. It doesn’t reduce this, this, and that. It’s very subtle language but it’s very important.” Incorrect language and wording in research papers can be propagated through media coverage of a paper and lead to faulty news articles about the research.

“The clearest thing to say is that a drug affects the chance of any one of these things happening,” Schwartz suggests. “Don’t give the message that it affects all of these things.” She also echoes what others have suggested about teasing apart the effects within a composite. “Give people a sense about where you’re most confident the effect was,” she says.