The National Academies Press has finally published “The Future of Scientific Knowledge Discovery in Open Networked Environments: Summary of a Workshop.” Despite the title, this is actually a 200-page compendium of academic thinking about sharing scientific data. The knowledge discovery alluded to is that which might come about if scientific data were extensively documented, curated, preserved, and shared via some sort of open practice system. While the word “data” is not in the title, it occurs over 1,200 times in the report.
The report is largely an exercise in envisioning the possible opportunities for scientific discovery presented by extensive data sharing, based on large-scale, present-day examples. Given the vastness of this issue and the slow pace of its unfolding, the fact that this workshop happened over a year ago does not detract from the report’s importance. Happily, the PDF version is free and searchable. One of the best features is that the post-presentation discussions are included alongside the presentations. Interestingly, the discussants remain anonymous.
The context for this report may be as important as its content, if not more so. The workshop was a flagship event for the newly formed Board on Research Data and Information (BRDI) of the US National Academy of Sciences, locally known as “Birdie.” NAS is the chief science adviser to the US government, and it is standard practice for it to pass judgment on new federal science initiatives. Creating BRDI is a de facto endorsement of data policy as a legitimate federal science issue, and the board’s activities are worth tracking.
The focus of the report is promise, not policy, but there are many major policy issues between the lines. Given the disciplined case approach, these issues tend to be scattered, especially among the discussions, so searching and scanning may be the best way to find them. For example, the word “journal” occurs about 40 times and “literature” about 70 times. Some of these occurrences are policy- and publishing-related, while others are not. (I also recommend Todd Carpenter’s recent Kitchen article on data metadata issues.)
While the presenters are almost all academics, one may be sure that the data policy people from the funding agencies were in the audience. The workshop does include a “Government perspective” presentation by Walter Warnick, director of the Energy Department’s Office of Scientific and Technical Information. Significantly, his presentation focuses on cost and the federal budget, which is by far the biggest data policy issue. (Disclosure: I co-authored some of Walt’s material related to the Knowledge Investment Curve.)
It seems clear that even in good budget times, universal data preparation, curation, preservation, and sharing are not feasible, and these are far from good times. The present approach is to fund what can be justified on a case-by-case basis, and that is likely to be about all for the foreseeable future. We therefore need good mechanisms for selecting which data to share, and a better awareness of the issues.