Previous approaches to cell-type identification were based on identifying the presence of a small set of known markers. Current high-throughput, single-cell sequencing methods, on the other hand, enable quantifiable cell-type classification with little or no prior knowledge, revealing previously unidentified variations in cellular phenotypes across numerous tissue types. But, given that cells do not cluster perfectly into distinct units, what portion of this heterogeneity truly defines a novel cell type and what portion can instead be attributed to variations in cell state or to methodological artifacts?
In practice, these single-cell experiments analyze thousands of cells, often with sets of 40,000+ genetic predictors. It is therefore tempting to perform stratification after stratification to continually identify new cell types, resulting in groupings that go far beyond “type” and into functionally irrelevant and arbitrary categories. To move forward with rigor, researchers need an agreed-upon framework that specifies which criteria will be used for cell-type identification.
Form versus function
One could argue that a biological definition of a cell type should be based on a combination of the cell’s developmental origin and current function. In contrast to cell type, a cell state may be thought of as the phenotype that occurs when a cell changes any of its extraneous variables, such as location, morphology, or phase of the cell cycle, while maintaining the core characteristics that define its specific cell type. For example, a fibroblast is still a fibroblast as it is dividing, and a neuron is still a neuron as it is firing, but a stem cell is no longer a stem cell as it differentiates into a neuron.
Unfortunately, function is not easy to assess in a high-throughput manner. However, because the transcriptome and epigenome are highly correlated with a cell’s developmental origin, type, and state, single-cell sequencing methods can be used as a proxy to indirectly assay cell function. To link sequencing results with biological definitions, we can think of the molecular markers of cell type as the set of genes that are similarly expressed across all cells with identical function and that are consistent across all states. Conversely, markers of cell state would be transient within a given cell type and can be expressed within different cell types that transition through similar cell states.
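To make this distinction concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not an established method: the gene names, expression values, tolerance, and the function `classify_genes` are all hypothetical. A gene is called a type marker if its mean expression is stable across states within every cell type but differs between types, and a state marker if it shifts between states within at least one type.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (cell type, cell state, {gene: expression level}).
CELLS = [
    ("neuron", "resting",
     {"GENE_A": 9.0, "GENE_B": 1.0, "GENE_C": 0.5, "GENE_D": 2.0}),
    ("neuron", "firing",
     {"GENE_A": 9.2, "GENE_B": 8.0, "GENE_C": 0.4, "GENE_D": 2.1}),
    ("fibroblast", "resting",
     {"GENE_A": 0.3, "GENE_B": 1.1, "GENE_C": 0.6, "GENE_D": 2.0}),
    ("fibroblast", "dividing",
     {"GENE_A": 0.2, "GENE_B": 1.0, "GENE_C": 7.5, "GENE_D": 1.9}),
]


def classify_genes(cells, tol=1.0):
    """Label each gene as a type marker, a state marker, or uninformative.

    `tol` is an assumed, arbitrary threshold on the difference in mean
    expression; a real analysis would use a statistical test instead.
    """
    # gene -> (cell type, cell state) -> list of expression levels
    values = defaultdict(lambda: defaultdict(list))
    for cell_type, state, expression in cells:
        for gene, level in expression.items():
            values[gene][(cell_type, state)].append(level)

    labels = {}
    for gene, groups in values.items():
        # Mean expression per state, collected under each cell type.
        by_type = defaultdict(list)
        for (cell_type, _state), levels in groups.items():
            by_type[cell_type].append(mean(levels))

        # Does expression shift between states inside any one type?
        varies_within_type = any(
            max(means) - min(means) > tol for means in by_type.values()
        )
        # Does the per-type average differ between types?
        type_means = [mean(means) for means in by_type.values()]
        varies_between_types = max(type_means) - min(type_means) > tol

        if varies_within_type:
            labels[gene] = "state marker"
        elif varies_between_types:
            labels[gene] = "type marker"
        else:
            labels[gene] = "uninformative"
    return labels
```

With this toy data, `classify_genes(CELLS)` labels GENE_A a type marker (high in neurons, low in fibroblasts, steady across states), GENE_B and GENE_C state markers, and GENE_D uninformative. The hard threshold is the weakest part of the sketch; where that line is drawn is exactly the kind of choice a shared framework would have to settle.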
In this way, single-cell sequencing provides a conceptual framework by which to assess cell type. Indeed, many models have been proposed for how to use single-cell sequencing to reproducibly define a cell type. They all share the basic concept of moving away from relying on a small set of marker genes and toward an accumulated representation of a unified gene signature across a set of cells. The models differ, however, in where they draw the boundary around that set of cells.
For example, some models propose to purely discriminate cell types based on distances after a clustering procedure has been completed. Clustering cells based on the molecular signatures that are present in a given study is useful within any single experiment, but fails to unify results across studies. There are also models that completely abolish the use of cell types and instead promote deriving a continuum along which all cells exist. Conceptually, a continuum is likely closest to reality. However, it is difficult to implement in practice as the axes for the continuum would necessarily be redefined after every study, and generating these axes requires dimensionality-reduction methods that, when applied to increasingly complex systems, could lead to a loss of information that would otherwise distinguish cells that are functionally distinct.
A third option—one that has proven useful in other disciplines, such as species discrimination—is a hierarchical taxonomic definition of cell type. This approach has been useful for many of the early cell-type identification papers as it supports classification across multiple layers of what can define a cell type. For example, within the brain, the first layer of cell type could discriminate neuronal and non-neuronal cells; the second layer, GABAergic and glutamatergic neurons; and the third layer, specific GABAergic cell types such as parvalbumin versus somatostatin neurons. Importantly, in such a hierarchical model, the state of the cell could easily be worked in by adding another branch to the tree. These subtypes could be discriminated based on the researcher’s discretion and then the resulting definitions could be accumulated to create a reproducible and overarching framework for cell-type identification. Moving forward, this hierarchical model could withstand restructuring based on follow-up functional studies and the identification of new information as we move toward defining cell types within the entire human body.
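As a rough sketch of what such a hierarchy might look like in code, the brain example above can be written as a nested tree in Python. The root label "brain cell", the state label "firing", and the helper names `lineage`, `shared_level`, and `add_branch` are assumptions made for illustration; they do not come from any published taxonomy or library.

```python
# Each key is a label; each value holds that label's sub-branches.
TAXONOMY = {
    "brain cell": {  # assumed root label
        "neuronal": {
            "GABAergic": {"parvalbumin": {}, "somatostatin": {}},
            "glutamatergic": {},
        },
        "non-neuronal": {},
    }
}


def lineage(tree, label, path=()):
    """Return the path from the root to `label`, or None if it is absent.

    If a label occurs on several branches, the first match in
    depth-first order is returned.
    """
    for name, children in tree.items():
        here = path + (name,)
        if name == label:
            return here
        found = lineage(children, label, here)
        if found is not None:
            return found
    return None


def shared_level(tree, a, b):
    """Return the deepest label that two entries have in common."""
    path_a, path_b = lineage(tree, a), lineage(tree, b)
    if path_a is None or path_b is None:
        raise KeyError("label not found in taxonomy")
    shared = None
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared = x
    return shared


def add_branch(tree, parent, child):
    """Attach `child` (a subtype or a cell state) beneath `parent`."""
    path = lineage(tree, parent)
    if path is None:
        raise KeyError(parent)
    node = tree
    for name in path:
        node = node[name]
    node.setdefault(child, {})
```

Here `shared_level(TAXONOMY, "parvalbumin", "somatostatin")` returns "GABAergic", while comparing "parvalbumin" with "glutamatergic" returns "neuronal", so two studies that resolve cells to different depths can still be reconciled at the layer they share. A cell state can be worked in with a call such as `add_branch(TAXONOMY, "parvalbumin", "firing")`, and the tree can be restructured later without discarding the layers above it.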
Traditionally, cells have been defined by the tissue to which they belonged and their particular functional role or morphology. Researchers subscribed to this classification scheme for many decades because tools to refine the definition of a cell did not yet exist. Only very recently have techniques to probe single-cell genomes, along with statistical methods for analyzing large, multidimensional datasets, reached the point where we can begin to collect large amounts of information on individual cells. Use of these tools has revealed remarkable heterogeneity among cells of the same traditional type. Cells exist in different degrees of maturation, activation, plasticity, and morphology. Once we begin to consider all of the subtle cell-to-cell variations, it becomes clear that the number of cell types is much greater than ever imagined.
The future of understanding these cell types largely depends on the choices that we make in how to define them. But the concept of a cell type will continue to be refined as technology allows for greater genetic and functional assessments of individual cells and better analyses for high-dimensional data. Thus, as the tools to evaluate cellular heterogeneity continue to evolve, so too will the discussion over how to define the subtypes we are able to identify.
Sara B. Linker and Tracy A. Bedrosian are postdoctoral research fellows in the Laboratory of Genetics at the Salk Institute for Biological Studies, where Fred H. Gage is a professor and Vi and John Adler Chair for Research on Age-Related Neurodegenerative Disease. Read their feature on heterogeneity within the brain, “Advancing Techniques Reveal the Brain’s Impressive Diversity.”