The pink ribbons of the month of October are a visual reminder of how much primacy US society places on cancer research. Indeed, the National Cancer Institute’s 2021 budget is $6.56 billion out of the $42.9 billion allocated to the entirety of the National Institutes of Health, and cancer’s hefty research funding carries over into scientific publications. A literature analysis of the journal database PubMed published Wednesday (October 27) in Trends in Genetics reveals that the vast majority of human genes have been linked to cancer in some way. The database contains papers on 17,371 different human genes, according to the study, 87.7 percent of which mention cancer. Meanwhile, cancer studies are exceedingly common throughout the database, and there are several times more papers that focus on cancer than on other severe medical conditions such as strokes.
The Scientist spoke with João Pedro de Magalhães, an aging and longevity researcher at the University of Liverpool who conducted the analysis as part of his effort to understand, as he phrases it, “the science of science.” De Magalhães explains that he first became interested in longevity as a child when he realized his own mortality, and now he pursues research on how the aging process might be slowed down. As he began to analyze genetic factors, he realized that cancer studies dominate the literature when it comes to genetic analyses, revealing how human biases not only shape but potentially muddle scientific research.
The Scientist: Given that the bulk of your research is focused on aging and longevity, where did the idea for this cancer genetics paper come from?
João Pedro de Magalhães: The idea of what I call publication bias or researcher bias, that’s been big in my mind for some time already. It’s very simple. The idea is that we know some genes and processes and diseases are way more studied than others.
[My colleagues and I] do quite a lot of systems analyses. We know, for example, aging is not controlled by a single gene. There may be predisposed factors, there may be mutations in a gene that make you age faster. Another example is cancer or Alzheimer’s. There may be genes or mutations that cause a lot of damage or that predispose you to Alzheimer’s or cancer. Most complex phenotypes, like aging, longevity, cancer, Alzheimer’s disease, cardiovascular disease, and so on—they’re caused by interactions between multiple genes and the environment. So you have to study those interactions between different components of different systems.
One thing that worries me in this kind of analysis is this publication bias. You’re going to have a lot more information for some genes than others, so how do you control for that? Basically, the idea is that yes, there are some interesting human biases that influence how we do studies . . . so you have topics and genes that are going to be more studied than others.
When you’re researching genes, you always find some genes associated with cancer, and that’s where the idea for this analysis came from. There must be a lot more studies of cancer than anything else. . . . I think it’s quite an unexplored topic. Not cancer—I’m talking about the science of science and these human biases in research. It’s also an issue, I find, that we have genes that are way more studied than others. Sometimes for good reasons, but sometimes just for historical reasons. I would like to see science be more efficient, in a way.
TS: It’s probably not terribly shocking that cancer, which takes myriad forms, has been linked in some way to lots of human genes. But were you surprised when you saw that scientists have studied such an overwhelming majority of our genes within the context of cancer?
JPdM: I was surprised that the numbers were so high. Of all the genes that have at least one paper, I think it’s nearly 90 percent of them. When you look at genes with over 100 papers, nearly all of them have at least one mentioning cancer. I think the overall results were not surprising. . . . But the overall magnitude of the effect, that was surprising to me.
TS: What was the overall goal for highlighting how many human genetics papers focus on or at least mention cancer? Where do you see researchers going from here?
JPdM: There are two conclusions or two end goals you can take from the study.
The first one is what I was mentioning earlier: If you’re doing a systematic network analysis of any process, it’s important to take into consideration the amount of studies that each of your genes has. If you’re doing a gene network for anything, really, you need to at least be aware of that as a potential confounding factor. With large-scale analysis, like network analysis, by and large these kinds of biases are not taken into account. Trying to correct for them or at least be aware of them when you interpret the results, I think, will improve the insight you gain and the quality of the analysis.
The second conclusion is we feel a little more cautious about what you call a cancer gene or a cancer-associated gene. The point is that nearly every gene can be called a cancer-associated gene. If you’re writing a paper or a grant application, you can also say “Hey, this gene has also been associated with cancer” about the vast majority of human genes.
We need to be careful . . . when you’re interpreting your results or writing a grant application. When you’re doing a large-scale study, when you have lots of hits, a lot of them will be cancer-related genes. So that’s something to be aware of when you’re interpreting your results.
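To illustrate the first conclusion, here is a minimal sketch of one simple way to account for how heavily each gene has been studied. It is not the method used in the study. It assumes a hypothetical per-gene score (here, network degree) and hypothetical paper counts, fits a straight line of score against log10(1 + paper count), and keeps the residuals as a study-bias-adjusted score. The function name `residualize`, the gene labels, and all numbers are invented for illustration.

```python
import math


def residualize(scores, paper_counts):
    """Remove the linear effect of log10(1 + paper count) from per-gene scores.

    Returns one residual per gene: the part of the score that a
    least-squares line on publication volume does not explain.
    """
    x = [math.log10(1 + c) for c in paper_counts]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(scores) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # Every gene is equally studied, so there is no trend to remove;
        # just center the scores.
        return [yi - mean_y for yi in scores]
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, scores)]


# Hypothetical toy data: four made-up genes, their paper counts,
# and their degree in some gene network.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
paper_counts = [9, 99, 999, 9999]
degree = [2.0, 4.0, 6.0, 11.0]

adjusted = residualize(degree, paper_counts)
for gene, value in zip(genes, adjusted):
    print(f"{gene}: adjusted score {value:+.2f}")
```

A positive adjusted score marks a gene that is more connected than its publication volume alone would predict under this toy model. A real analysis would likely need a more careful model, and the residuals still show association, not causation.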
TS: In your article, you wrote “In a scientific world where everything and every gene can be associated with cancer, the challenge is determining which are the key drivers of cancer and more promising therapeutic targets.” Can you elaborate on how those challenges might manifest?
JPdM: Having a lot of genes associated with a phenotype doesn’t mean they’re important. Maybe there’s a correlation with cancer, but it doesn’t mean they’re causal.
If a gene is targeted or inhibited, it doesn’t mean there’s going to be a clinical benefit. So the main problem is going from association to causation and then identifying which are the biggest targets.
I would say that, arguably, compared to other processes or other diseases, cancer is more straightforward to study. You can get cell lines to study cancer, for example. So the experimental methods needed to study cancer are not as elaborate as for other diseases. Finding associations with cancer is actually not that complicated. It’s easier than for other diseases. But an association, a correlation, does not mean causation. And it does not mean it’s a good therapeutic target. So a next step would be to find the key drivers of cancer. We know some of them, but not all, and what are some of the promising therapeutic targets.
TS: You mentioned in your paper, perhaps a little bit humorously, that scientists could now justify studying and attract grant money to explore just about any gene thanks to possible cancer ties. Would you say that’s a real phenomenon among researchers? Say I’m a geneticist interested in studying an underexplored gene. Would I have an advantage if I pointed out that there may potentially be a cancer tie-in to my work?
JPdM: I think so, yes. I’m not that involved in grant applications related to cancer. If you have a very weak correlation to aging, you can apply for a grant and say “Hey, this [gene] is potentially related to aging, and we want to study it in that context.” But your question raises different questions on how you assess scientific research.
Should we just be focusing on the genes that have been more studied? Or should we study the genes that are understudied as well? That’s an open question. I would say the general approach when you apply for a grant [is that] you need to have preliminary data to justify why you want to study whatever you want to study. Some would argue maybe that’s the wrong approach. Maybe we should actually study genes that have not been well studied yet and that should be more supported. I wouldn’t take a side on that discussion, but I think that’s just something to think about in the context of these results.
So answering your question, yes. You can use a cancer association as justification for applying for funding, even if it’s not very strong. But then how you judge and how you evaluate grants, that’s a whole can of worms.
TS: Do you see any downsides to this major bias toward cancer research? Is it overshadowing other, important work?
JPdM: I think that’s a broader question again. What should we study? What should we invest our research money in? Because it’s easier to study cancer than, say, Alzheimer’s disease, you’re going to have more publications and more information about genes associated with cancer than genes associated with Alzheimer’s. We have levels of funding for cancer that are much higher than for aging, much higher than for other diseases. Whether that’s what we should be doing or not—I think that’s not so much a scientific question. It’s a societal question.
I think we should invest in studying cancer and in finding therapies for cancer. I think we should invest in discovering therapies for Alzheimer’s disease, for heart disease. How exactly you allocate that in terms of percentage, I think that’s a difficult question. I don’t have an answer to it.
Editor’s note: This interview has been edited for brevity.