It’s time for agricultural researchers to take better advantage of the massive amount of data they produce and move into an era of “big science.” Open data—data that are publicly available and have been intentionally prepared for reuse and innovation by others—is the path to the research breakthroughs of the future. But to make that move, agricultural science must change in significant ways. That’s what our seven-person task force concluded in a report published March 11 for the Council for Agricultural Science and Technology.
Small science, in which researchers work alone or in little groups and analyze, interpret, and share only their most important results, is no longer adequate to improve world food production, nutrition, food safety, and disease prevention, all while protecting the environment. Teams from different disciplines, working with robust data sets and widely shared information, are what’s required today to make substantive progress on complex problems that cut across traditional disciplines.
But there are impediments to implementing big science in agricultural research. One is the lack of access to the data that are already out there. Plenty of data are produced but not shared. Lack of data accessibility diminishes the value of public investment in science as it is a barrier to making better decisions in agriculture. Without evidence-based decision making, policy leans too much and too often on expert opinion and partial information.
In agricultural research, approaches to research design and data collection are rarely standardized across studies, and data access and use by others remain dependent on individual agreements and one-time, trial-and-error solutions for data transfer.
We need an infrastructure to support data sharing and its routine synthesis into agricultural practice and policy. This would empower a statistical approach for combining multiple studies and lead to a better understanding of experimental results. The concept is not new. It is done routinely in medicine and other disciplines to translate science into effective practice. It should happen in agriculture too.
To that end, we call for the creation of centralized “knowledgebases” linking emerging institutional and discipline repositories for agricultural research data. At the National Center for Biotechnology Information, for example, genetic sequence data from thousands of species are housed, as are the tools developed to search them. As a result, researchers are rapidly identifying the functions of genes, causes of diseases, and human evolutionary signatures. This simply could not be done if genetic and genomic data were dispersed across hundreds of individual researchers’ lab computers. The challenge is extending this model to the diverse methods and scales of studies designed to understand the interactions of plant and animal genetics with the varied and ever-changing management technologies and environments in which they are used.
Although challenging to achieve, establishing a similar knowledgebase infrastructure for the agricultural sciences would facilitate the organization of research data according to the principles articulated in the acronym F.A.I.R.: data must be Findable, Accessible, Interoperable, and Reusable. Such knowledgebases would not only serve as portals to data reuse but would also create value-added data products based on the anticipated interests and objectives of scientists and innovators seeking data for new projects. Examples include fusions of data from similar field experiments with regional mapping data of soil properties and in-stream monitoring for water quality to support sustainability metrics for food supply chains. But how can we do this?
First, agriculture researchers will need to partner with data scientists who know how to assemble information into forms that can be mined for trends, filtered for promising ideas, and translated for end-users. Colleges and universities will need to reorient themselves to support team science and data sharing. Undergraduate and graduate curricula must include some understanding of data sciences and their use in food-systems research. Professional assessment and reward systems for professors have to factor in the broader array of faculty activities associated with team science.
A strong data-sharing program should include information not currently fully represented by peer-reviewed publications. For example, data sets inclusive of all information on crops, soils, environment, and other meta-data acquired by individual researchers and universities need to be embedded with each scholarly publication; at present, publications include only a fraction of the data and meta-data researchers collect.
Regarding the peer-reviewed literature itself, more needs to be done to publish studies with negative results and verification studies that confirm previously published research results. Many journals specifically discourage authors from submitting “non-novel” results; researchers have responded by skipping the time-consuming process of pushing their less scientifically exciting results into this formal record of science. Yet these results are critical to an unbiased foundation for evidence-based agricultural practice. One idea: the use of “registered reports,” in which peer-approved proposals are registered prior to data collection, assuring that authors will publish their results no matter the findings.
Other data inefficiencies also exist in agricultural science. At present, individual US Department of Agriculture agencies such as the Agricultural Research Service, the Farm Service Agency, the Forest Service, and the National Agricultural Statistics Service separately maintain and pay for their own geospatial data platforms. It is inefficient for multiple organizations to create their own stop-gap solutions. Pooling data sets and creating one-stop shopping is more cost-effective and efficient, and it simplifies data discovery.
One of the keys to making data more available is changing the business model of agencies that fund agricultural research. Granting agencies should require data sharing and pooling whenever possible. And perhaps funding agencies should pay directly, in proportion to the amount they award, for the infrastructure needed to create a “national center for agricultural research information.” It’s an investment that would pay for itself many times over.
Sylvie Brouder is the Wickersham Chair of Excellence in Agriculture Research and a professor in the Department of Agronomy at Purdue University. She led the Ag Data Taskforce publication for the Council for Agricultural Science and Technology.