A sampling of free software for flow cytometry data analysis
December 1, 2015
[Image adapted from Bendall et al., Science, 332:687-96, 2011]
Flow cytometers guide fluorescently labeled cells one by one past a series of lasers and detectors to record their physical and molecular characteristics. Researchers using these techniques can survey tens or even hundreds of thousands of cells, gathering information that allows them not only to enumerate known cell types (such as CD4+ and CD8+ T cells) but also to identify novel subpopulations they may never have known were there.
But data collection is only the first part of the story; in flow cytometry it’s the analysis that counts. At a minimum, flow cytometry software packages must be able to load raw data files, transform the recorded intensity values onto a logarithmic scale, “gate” the information (i.e., identify threshold values to define whether a cell expresses a given marker), and plot the results.
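The minimal pipeline described above (load, transform, gate) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from any package in this roundup: synthetic intensities stand in for a loaded data file, a plain log10 transform with a clipping floor is used (many tools instead use related transforms, such as arcsinh or logicle, that behave better near zero), and the gate is a single hand-picked threshold. Plotting is left out to keep the example dependency-free.

```python
import math
import random


def log_transform(values, floor=1.0):
    """Map raw intensities onto a log10 scale. Values below `floor` are clipped
    to `floor` so zero or negative readings still get a defined value."""
    return [math.log10(max(v, floor)) for v in values]


def threshold_gate(log_values, threshold):
    """Flag each cell as marker-positive (True) or marker-negative (False)."""
    return [v >= threshold for v in log_values]


def percent_positive(flags):
    """Share of gated-in cells, as a percentage."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


# Synthetic stand-in for one fluorescence channel: a dim population
# centered near 10^1 and a bright one centered near 10^3 (arbitrary units).
random.seed(0)
raw = [10 ** random.gauss(1.0, 0.2) for _ in range(700)]
raw += [10 ** random.gauss(3.0, 0.2) for _ in range(300)]

logged = log_transform(raw)
flags = threshold_gate(logged, threshold=2.0)  # gate placed between the two peaks
print(f"{percent_positive(flags):.1f}% of cells are marker-positive")
```

On a log scale the two populations sit two decades apart, which is what makes a single threshold between them a sensible gate.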
Flow cytometers come with their own software packages, of course, and third-party commercial analysis tools also exist. But these packages are often slow to adopt advances coming out of academic labs, says Ryan Brinkman, a Distinguished Scientist in the Terry Fox Laboratory at the BC Cancer Agency in Vancouver. As a result, a healthy collection of free and open-source alternatives has sprung up over the past decade or so. Brinkman even co-organizes periodic open competitions, called FlowCAP, to test them (flowcap.flowsite.org).
These freeware applications are typically not as polished as commercial suites, concedes J. Paul Robinson, director of the Purdue University Cytometry Laboratories, who maintains a catalog of free software tools on his website (cyto.purdue.edu/flowcyt/software/Catalog.htm). Many, in fact, are command-line tools that require programming prowess and a solid understanding of flow cytometry. They may also be buggy and slow to respond to user suggestions or technical support queries. When you're getting software for free, Robinson notes, "you get it warts and all." Thus, advises Brinkman, in many cases it makes sense for wet-lab biologists to collaborate with bioinformaticians rather than go it alone. "It has to be a back-and-forth," he explains. "There's very, very few people that have the expertise in both these domains."
Still, for many researchers, these tools solve problems they cannot otherwise handle. Here The Scientist rounds up some of the available options.
Bioconductor
Overview: Bioconductor is an open-source, community-driven project based on the statistical programming language R. It includes tools for analyzing next-generation DNA sequencing, mass spectrometry, high-throughput screening, and gene expression data. Of its 1,104 packages, 38 are for flow cytometry.
According to Brinkman, who has coauthored several reviews of cytometry software tools (Immunity, 42:591-92, 2015; PLOS Comput Biol, 9:e1003365, 2013), these modules provide a support infrastructure for everything from importing flow cytometry data files to automated gating, clustering, and visualization of the resulting data. “Somebody new to the field . . . doesn’t have to worry about how to process the data, how to write all those tools to read in files and visualize them. They have all that infrastructure, [and] they can just plug in their own little bit of that whole pipeline,” he says.
The traditional Bioconductor flow cytometry pipeline revolves around such modules as flowCore, flowViz, and flowStats. In 2014, Greg Finak and Raphael Gottardo of the Fred Hutchinson Cancer Research Center in Seattle extended that infrastructure with a new module called OpenCyto, which enables higher-level interaction with the data (PLOS Comput Biol, 10:e1003806, 2014). Using OpenCyto, researchers can build complex gating hierarchies (basically, cellular inheritance trees) from a simple Excel template, says Finak, specifying, for instance, that CD4+ T cells are a subset of CD3+ cells, which are themselves a subset of live, single cells. "Without OpenCyto, you have to do that all by hand." Those templates are also reusable, Finak adds. "That is usually not the case for scripts written as one-offs to analyze a specific data set."
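To make the idea of a template-driven hierarchy concrete, here is a minimal Python sketch. The CSV layout, column names, channel names, and single-threshold gates are all hypothetical; OpenCyto is an R package with its own template format and gating methods. The point is only that a reusable table of parent-child relationships can drive the gating, so each population is automatically restricted to the cells its parent kept.

```python
import csv
import io

# Hypothetical template in CSV form: one row per population, naming its parent
# and a single threshold gate (values assumed to be on a log scale already).
# This is NOT OpenCyto's real template format.
TEMPLATE = """population,parent,channel,min_value
live_single,root,live_score,0.5
CD3+,live_single,CD3,2.0
CD4+,CD3+,CD4,2.0
CD8+,CD3+,CD8,2.0
"""


def load_hierarchy(text):
    """Parse the template into {population: (parent, channel, min_value)}."""
    return {
        row["population"]: (row["parent"], row["channel"], float(row["min_value"]))
        for row in csv.DictReader(io.StringIO(text))
    }


def gate(cells, hierarchy, population):
    """Return the cells in `population`: apply every ancestor's gate first,
    then this population's own gate, so a child is always a subset of its parent."""
    if population == "root":
        return list(cells)
    parent, channel, min_value = hierarchy[population]
    return [c for c in gate(cells, hierarchy, parent) if c[channel] >= min_value]


cells = [
    {"live_score": 1.0, "CD3": 2.5, "CD4": 3.0, "CD8": 0.5},  # CD4+ T cell
    {"live_score": 1.0, "CD3": 2.5, "CD4": 0.4, "CD8": 2.8},  # CD8+ T cell
    {"live_score": 1.0, "CD3": 0.3, "CD4": 2.2, "CD8": 0.2},  # CD4-bright but CD3-negative
    {"live_score": 0.1, "CD3": 2.6, "CD4": 3.1, "CD8": 0.3},  # dead cell
]

hierarchy = load_hierarchy(TEMPLATE)
for population in hierarchy:
    print(population, len(gate(cells, hierarchy, population)))
```

The third cell is CD4-bright yet never counted as a CD4+ T cell, because it fails the CD3+ parent gate; the same template could be reused unchanged on another data set.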
System requirements: R programming environment
Plugging in: OpenCyto sports a plug-in architecture that allows researchers to integrate hot-off-the-press third-party algorithms with just a few lines of code. A similar capability was recently added to the popular commercial tool FlowJo: users write a small "wrapper" of code, about 20 lines' worth of Java, around the algorithm they want to plug into FlowJo, and they're done. According to FlowJo CEO Mike Stadnisky, this architecture lets researchers leverage FlowJo's user experience and infrastructure with tools that have not yet become mainstream. The company has launched a website called FlowJo Exchange where users can share their tools and documentation; several scripts are already available. Four in-house plug-ins were slated to post by the end of November, and "two to three" third-party plug-ins should join them by year's end.
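The "wrapper" idea can be sketched in Python as a small registry that a host application consults by name. Everything here (the registry, the decorator, the stand-in algorithm) is hypothetical and is not FlowJo's Java plug-in API or OpenCyto's R interface; it only shows how a few lines of adapter code let a host call an outside algorithm without knowing its internals.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical host-side registry mapping plug-in names to gating functions.
Gate = Callable[[List[float]], List[bool]]
PLUGINS: Dict[str, Gate] = {}


def register(name: str) -> Callable[[Gate], Gate]:
    """Decorator that makes a wrapped algorithm available to the host by name."""
    def decorator(func: Gate) -> Gate:
        PLUGINS[name] = func
        return func
    return decorator


def third_party_split(values: List[float], cut: float) -> List[bool]:
    """Stand-in for an external algorithm with its own calling convention."""
    return [v > cut for v in values]


@register("median_split")
def median_split(values: List[float]) -> List[bool]:
    """The wrapper: adapts the external function to the host's signature."""
    return third_party_split(values, statistics.median(values))


def run_plugin(name: str, values: List[float]) -> List[bool]:
    """What the host does: look up a plug-in by name and call it."""
    return PLUGINS[name](values)


print(run_plugin("median_split", [1.0, 5.0, 3.0, 9.0]))  # [False, True, False, True]
```

The host only ever sees the uniform signature; swapping in a different external algorithm means writing a new wrapper, not changing the host.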
CytoFlow and FlowCytometryTools
[Image: Eugene Yurtsev]
github.com/bpteague/cytoflow; eyurtsev.github.io/FlowCytometryTools/
Overview: If the programming language Python is more your speed, give CytoFlow and FlowCytometryTools a look.
Both provide libraries of functions for Python-conversant biologists to transform flow cytometry data files from raw numbers into biological insight.
According to developer Brian Teague, a postdoctoral associate at MIT, CytoFlow (which includes an optional point-and-click user interface) “handles all the basic traditional cytometry tasks,” including data import, transformation, gating, statistical analyses, and plotting. Unlike many tools, which focus on the physical minutiae of how an experiment was performed, CytoFlow takes a broader view, says Teague. Users can indicate, say, that their cells were treated with different concentrations of a drug, each with three replicates, and allow the software to plot those data as a series. “It’s much less focused on physically what you did, and it’s much more focused on how the experiment was designed to answer the question that you wanted to ask.”
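A design-centered analysis of this kind amounts to tagging each sample with its experimental conditions and summarizing by condition rather than by file. The following Python sketch assumes hypothetical per-sample records with a drug concentration, a replicate number, and a precomputed percent-positive value; it does not use CytoFlow's own API.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical per-sample records: the experimental design (drug concentration,
# replicate number) plus one statistic already computed from each sample's cells.
samples = [
    {"drug_uM": 0, "replicate": 1, "pct_positive": 4.0},
    {"drug_uM": 0, "replicate": 2, "pct_positive": 5.0},
    {"drug_uM": 0, "replicate": 3, "pct_positive": 6.0},
    {"drug_uM": 1, "replicate": 1, "pct_positive": 18.0},
    {"drug_uM": 1, "replicate": 2, "pct_positive": 20.0},
    {"drug_uM": 1, "replicate": 3, "pct_positive": 22.0},
    {"drug_uM": 10, "replicate": 1, "pct_positive": 58.0},
    {"drug_uM": 10, "replicate": 2, "pct_positive": 60.0},
    {"drug_uM": 10, "replicate": 3, "pct_positive": 62.0},
]


def summarize_by(records, factor, statistic):
    """Pool replicates that share a level of `factor`; return
    {level: (mean, sample standard deviation)} sorted by level."""
    groups = defaultdict(list)
    for record in records:
        groups[record[factor]].append(record[statistic])
    return {
        level: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for level, values in sorted(groups.items())
    }


for dose, (avg, sd) in summarize_by(samples, "drug_uM", "pct_positive").items():
    print(f"{dose:>3} uM: {avg:.1f} ± {sd:.1f}% positive")
```

Because the grouping key is a design factor rather than a file name, the same function can summarize by replicate, time point, or any other condition recorded with the samples.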
FlowCytometryTools offers similar functionality, but also includes features to simplify the handling of microtiter plate–based data. For instance, users can map wells to their associated data, outputting the data in a grid that matches the plate layout. A small graphical user interface for defining gating parameters is included.
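Mapping wells onto a plate-shaped grid is straightforward once each well's data are keyed by a well ID. This sketch assumes a 96-well plate (rows A through H, columns 1 through 12) and hypothetical per-well summary values; it is not FlowCytometryTools' interface, which has its own plate-handling features.

```python
import string

PLATE_ROWS = string.ascii_uppercase[:8]  # rows A-H of a 96-well plate
PLATE_COLS = range(1, 13)                # columns 1-12


def plate_grid(well_values, missing=None):
    """Arrange {well_id: value}, e.g. {"A1": 12.5}, into 8 lists of 12 values
    laid out like the physical plate; wells with no data get `missing`."""
    return [[well_values.get(f"{row}{col}", missing) for col in PLATE_COLS]
            for row in PLATE_ROWS]


# Hypothetical per-well summary values (say, percent positive).
values = {"A1": 4.0, "A2": 20.0, "A3": 60.0, "B1": 5.0, "H12": 99.0}
grid = plate_grid(values)

# Print the grid with row letters and column numbers, "." marking empty wells.
print("    " + " ".join(f"{col:>4}" for col in PLATE_COLS))
for label, row in zip(PLATE_ROWS, grid):
    print(f"{label:<4}" + " ".join("   ." if v is None else f"{v:4.0f}" for v in row))
```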
System requirements: Python
In development: One of the advantages of programming-based tools is reproducibility. "If I used a script to do an analysis (instead of pointing and clicking), then I can run exactly the same analysis on other samples," Teague says. Yet, like many tools in this roundup, CytoFlow and FlowCytometryTools are still very much in development. CytoFlow, says Teague, "is alpha software that is useful for some things, but it is not yet feature-complete." Such packages can be difficult to install and may crash intermittently. Forewarned is forearmed.
Single Cell Debarcoder
Overview: One difficulty with high-throughput experiments, says Stanford University biologist Garry Nolan, is that different wells of a microtiter plate may receive slightly different treatments, leading to experimental noise. Nolan’s lab devised a sample barcoding strategy to circumvent that issue (Nature Protocols, 10:316-33, 2015). Essentially, each sample in a plate receives a unique mixture of fluorescent or metal labels, which act as a barcode for each well. The samples are then combined into a single tube, mixed with antibodies, and analyzed as one. That not only improves reproducibility, Nolan says, it also reduces experimental time and reagent use.
The trick is to work out which cells in the raw data came from which well. Single Cell Debarcoder (SCD) does precisely that, using a user-defined table of barcodes to assign each cell to the well it came from. "From the standpoint of any subsequent analysis, it's as if you had done 96 individual wells," Nolan explains.
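In its simplest form, debarcoding is a table lookup: call each barcode channel on or off, then find the well whose barcode matches. The Python sketch below assumes a hypothetical six-channel barcode table and a fixed on/off threshold; the published SCD algorithm assigns cells with a more sophisticated per-cell scoring, so treat this only as an illustration of the idea.

```python
# Hypothetical barcode table: for each well, which of six barcode channels carry
# a label (1) and which do not (0). Real tables come from the experiment design.
BARCODES = {
    "A1": (1, 1, 1, 0, 0, 0),
    "A2": (1, 1, 0, 1, 0, 0),
    "A3": (1, 0, 1, 1, 0, 0),
    "A4": (0, 1, 1, 1, 0, 0),
}
WELL_FOR_CODE = {code: well for well, code in BARCODES.items()}


def debarcode(barcode_intensities, threshold):
    """Call each barcode channel on or off with a fixed threshold, then look the
    resulting pattern up in the table. Returns the well ID, or None if no well matches."""
    code = tuple(int(v >= threshold) for v in barcode_intensities)
    return WELL_FOR_CODE.get(code)


cells = [
    (3.1, 2.9, 0.2, 3.0, 0.1, 0.3),  # pattern of well A2
    (0.2, 2.7, 3.3, 2.8, 0.1, 0.2),  # pattern of well A4
    (0.2, 0.1, 0.3, 0.2, 0.1, 0.2),  # no labels detected: unassigned
]
print([debarcode(cell, threshold=1.5) for cell in cells])  # ['A2', 'A4', None]
```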
System requirements: Windows (64-bit) or Mac OS X; the algorithm is also included in Cytobank (cytobank.org), a cloud-based analysis tool with both free and premium tiers.
Watch out for doublets: When scanning for new cell populations, it’s easy to be fooled by droplets containing two cells instead of one. “Doublets are a huge problem in any flow cytometry platform,” Nolan explains. “These kinds of artifacts need to be removed from the system as soon as possible.” To do so, SCD employs “sparse” barcoding, an error-correction strategy in which only a subset of possible barcodes is used. These are defined such that it is possible to recognize when a droplet has a barcode that should not normally exist—for instance, if a droplet contains two cells. “If two cells from two different wells come together, the merged barcodes would be illegal,” he says.
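The error-correction logic can be checked in a few lines. This Python sketch assumes, for illustration, a scheme in which every legal barcode labels exactly three of six channels. Because a doublet carries the labels of both cells, two cells from different wells always show at least four labeled channels, a pattern no well was ever given.

```python
from itertools import combinations

N_CHANNELS = 6   # barcode channels (assumed for illustration)
K_POSITIVE = 3   # every legal barcode has exactly this many labeled channels

# Only patterns with exactly K_POSITIVE of N_CHANNELS "on" are used, so just
# C(6, 3) = 20 of the 2**6 = 64 possible on/off patterns are legal barcodes.
LEGAL = {
    tuple(1 if ch in on else 0 for ch in range(N_CHANNELS))
    for on in combinations(range(N_CHANNELS), K_POSITIVE)
}


def is_legal(code):
    """True if the on/off pattern is one of the barcodes actually assigned."""
    return tuple(code) in LEGAL


def merge(code_a, code_b):
    """A doublet carries both cells' labels: the channel-wise OR of their codes."""
    return tuple(a | b for a, b in zip(code_a, code_b))


# Every pair of distinct wells: each merged pattern has at least K_POSITIVE + 1
# channels on, so none of these doublets can pass as a legal barcode.
cross_well = [merge(a, b) for a, b in combinations(sorted(LEGAL), 2)]
print(len(LEGAL), "legal barcodes;",
      sum(is_legal(code) for code in cross_well), "legal cross-well doublets")
```

Note the limit of this check: two cells from the same well merge into that well's own legal barcode, so barcoding alone cannot flag them, and other doublet filters are still needed.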
Cytospec and PlateAnalyzer
[Image adapted from J.P. Robinson et al., Expert Opin Drug Discov, 7:679-93, 2012]
cyto.purdue.edu/Purdue_software
Overview: Most flow cytometers employ a complicated optical setup in which each color channel has its own bandpass filters and photomultiplier tube (PMT) detector. That makes the systems relatively expensive and complicated to build and maintain. An alternative approach, called spectral analysis, swaps those components for a simpler configuration in which each cell's color signature is projected onto a single multichannel PMT. Purdue University's Robinson built the first such instrument in 2004 (it has since been licensed to Sony), and his lab developed Cytospec as a general-purpose analytical tool to make sense of the data.
PlateAnalyzer is a separate application for handling high-throughput experiments, such as 384-well drug screening assays and high-content cytometry studies. Among other things, Robinson says, PlateAnalyzer can generate dose-response curves directly from instrument data, and provides “a new approach to high-content data analysis by the incorporation of its unique ‘logic maps’ that allow the investigator to visually map out the analysis pathway, making it faster and easier to visualize and analyze big data sets.”
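Dose-response analysis of this sort reduces per-well measurements to a curve and a summary value such as the EC50, the dose giving a half-maximal response. The Python sketch below uses hypothetical percent-positive values and a simple linear interpolation on a log-dose axis instead of the four-parameter logistic (Hill) fit more commonly used; it is not PlateAnalyzer's method.

```python
import math


def ec50_by_interpolation(doses, responses):
    """Estimate the dose giving a half-maximal response by linear interpolation
    on a log10 dose axis. Assumes positive doses sorted in ascending order and a
    response that rises with dose; returns None if the half-max is never crossed."""
    low, high = min(responses), max(responses)
    half = low + (high - low) / 2
    points = list(zip(doses, responses))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 <= half <= r1 and r1 != r0:
            frac = (half - r0) / (r1 - r0)  # fractional position between the two doses
            log_d0, log_d1 = math.log10(d0), math.log10(d1)
            return 10 ** (log_d0 + frac * (log_d1 - log_d0))
    return None


# Hypothetical plate readout: percent marker-positive cells at each drug dose (uM).
doses = [0.01, 0.1, 1.0, 10.0, 100.0]
pct_positive = [2.0, 10.0, 40.0, 90.0, 98.0]
print(f"EC50 ≈ {ec50_by_interpolation(doses, pct_positive):.2f} uM")  # EC50 ≈ 1.58 uM
```

Interpolating on log dose rather than raw dose matches how dose series are usually spaced (in decades), which keeps the estimate from being pulled toward the higher dose of each interval.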
System requirements: Windows (32- or 64-bit)
Automated PCA: According to Robinson, one key feature of Cytospec is automated, one-click principal component analysis (PCA), a statistical approach capable of identifying new cell populations in highly complex data sets, where they might not be readily apparent.
“It uniquely allows you to compare traditional parameters with PCA,” Robinson explains—that is, it enables users to determine which specific flow cytometry variables the PCA algorithm is filtering. “That allows me to look at, say, different cell sizes and see how that fits from a principal component space,” he says. “That’s not necessarily logical, but it turns out it’s extremely useful.”
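The comparison Robinson describes amounts to inspecting PCA loadings, which show how strongly each original parameter contributes to a principal component. The following dependency-free Python sketch finds the first principal component by power iteration on a covariance matrix of synthetic data; the parameter names and data are hypothetical, and the source does not describe how Cytospec itself computes PCA. In practice, parameters are often standardized first so that channels with large raw variances do not dominate.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Power iteration: repeatedly multiply a vector by the covariance matrix and
    renormalize. It converges to the eigenvector with the largest eigenvalue
    (the first principal component) unless the start vector is orthogonal to it."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Synthetic events: forward and side scatter both track a hidden "cell size",
# while the third parameter is an unrelated marker (all names hypothetical).
random.seed(1)
params = ["FSC", "SSC", "marker"]
events = []
for _ in range(2000):
    size = random.gauss(0.0, 1.0)
    events.append([size + random.gauss(0.0, 0.3),
                   size + random.gauss(0.0, 0.3),
                   random.gauss(0.0, 0.5)])

pc1 = first_principal_component(covariance_matrix(events))
for name, loading in zip(params, pc1):
    print(f"{name:>6}: {loading:+.2f}")
```

Because the two scatter parameters are built to share a hidden size factor here, they should carry nearly all of the first component's weight, while the unrelated marker's loading should sit near zero.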