By Michael Lucibella
Scientists receiving federal funds will soon have to include plans for public access to much of their raw data. This requirement was spelled out in February, in the same memo from the executive branch's Office of Science and Technology Policy (OSTP) that dealt with the open availability of journal papers derived from federally funded research.
Though there has been no official statement on the issue, scientists applying for federal grants will likely start seeing requirements for a data management plan in the coming months. Michael Lubell, APS Director of Public Affairs, said that after meetings with representatives from various funding agencies, including the NSF and the DOE, it's clear that a framework is just starting to take shape.
Many of the details of how and where this data will be stored are still unclear, and the timeframe is uncertain. The sheer volume and diversity of data across all disciplines of science pose a huge challenge to anyone trying to put together a single centralized database of research data. It's also possible that individual agencies, or even publishers, might be the stewards of the raw data files.
Who will store the data will have to be worked out on a field-by-field or even discipline-by-discipline basis.
"What they're looking for is not something that would be burdensome but something that will enable readers to assess how important that result is in terms of their own work," Lubell said.
Generally speaking, the data that would have to be included are the individual data points used in preparing a published paper. Data points expunged from the final analysis will likely have to be included as well, so that scientists can evaluate why those points were eliminated.
However, the policy would not extend to objects or lab notebooks. It would likely also not extend to computer code, though talks on this point are continuing.
Victoria Stodden, a statistician at Columbia University and an expert on open data, expressed concern at the prospect that computer code might not be included in the requirements. She said that computers have dramatically changed the way that scientists process their data, and knowing how is as important as knowing the raw data points.
"Now things have changed, computation adds this extra level of complexity," Stodden said, adding that without codes, "results are typically not replicable."
The February memorandum from OSTP called for a road map to make all scientific journal articles based on federally funded research freely available for everyone to see after a year. The memo also stipulated that "scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze."
The memo set an August 22 deadline for the funding agencies to submit a plan for their data management. Several agencies have received extensions, and no plans have been publicly released.
When reached for comment, OSTP spokesperson Rick Weiss said that "[The] OSTP will be working with the agencies to get the various submitted plans finalized."
In June, APS announced that, along with 75 other publishers and organizations, it will participate in the Clearinghouse for the Open Research of the United States, or CHORUS, which would be an online platform that links to open access journal articles stored on publishers' servers. However, the service focuses solely on research articles.
"The concept of CHORUS is to maximize existing infrastructure, standards and processes," said Andi Sporkin, spokesperson for CHORUS. "The maintenance of raw data requires a new infrastructure."
Though policies and guidelines are still forthcoming, Lubell encouraged researchers to start thinking ahead about raw data management.
"Everyone needs to start thinking about 'How am I going to manage my data?' … 'What is the plan, and how do I ensure a long life for this so it doesn't get destroyed?'" Lubell said. "That is what the community will have to start thinking about right now."