I recently read a paper from Los Alamos National Laboratory (LANL), “Using Architectures for Semantic Interoperability to Create Journal Clubs for Emergency Response.” Without diving too deeply into the technical weeds, what the paper describes is:
[A] process for leveraging emerging semantic web and digital library architectures and standards to (1) create a focused collection of bibliographic metadata, (2) extract semantic information, (3) convert it to the Resource Description Framework/Extensible Markup Language (RDF/XML), and (4) integrate it so that scientific and technical responders can share and explore critical information in the collections.
Why recommend creating a semantic research repository in RDF/XML? Let’s step back and take a look at this interesting use case:
Problem (specific to the LANL paper): Prevent bioterrorist incident outbreaks and the spread of viruses that have the potential to develop into a catastrophic pandemic.
Desired outcome: Assemble an appropriate group of experts who quickly receive access to customized research and tools, which enable them to collaboratively head off large-scale crises.
To respond proactively to an emerging emergency, in this case the potential threat of a SARS pandemic, the authors outline a process that leverages both Web 2.0 (social networking) and Web 3.0 (semantic) capabilities and uses RDF/XML to normalize data without confining its meaning or its future expansion possibilities. The process mobilizes expert research groups in alignment with situational specifics and provides them with customized research information, along with visualization and analytical tools, enabling them to quickly and collaboratively generate solutions and curtail the impact of the biological threat.
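To make the normalization step concrete, here is a minimal sketch of serializing one bibliographic record to RDF/XML using only Python’s standard library. The RDF and Dublin Core namespaces are standard, but the record itself, the identifier, and the helper function are invented for illustration; a production harvester would of course do far more.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for RDF/XML and Dublin Core bibliographic elements.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

def record_to_rdfxml(uri, title, creators, subject):
    """Serialize one bibliographic record as an RDF/XML description."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": uri})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    for name in creators:
        ET.SubElement(desc, f"{{{DC}}}creator").text = name
    ET.SubElement(desc, f"{{{DC}}}subject").text = subject
    return ET.tostring(root, encoding="unicode")

# Hypothetical record, for illustration only.
print(record_to_rdfxml(
    "info:doi/10.0000/example",
    "SARS Coronavirus Genome Analysis",
    ["A. Researcher", "B. Collaborator"],
    "emerging infectious disease",
))
```

The point of the exercise is that once records share this shape, any RDF-aware tool can merge collections from different sources without bespoke conversion code.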
Extrapolating from the article—and expanding a bit on what the authors have proposed—a generalized process outline would look something like this:
Condition: Urgent need for expert research response
Content: Using a semantic repository of technical articles (developed via the harvesting, augmenting, and mapping processes, which the article describes)
Activity 1: Dynamically assemble specialist “journal clubs” or researcher networks based on biographical metadata, such as expertise, affiliation, publication history, relationships, and geography, to quickly form a collaborative emergency response team
Activity 2: Facilitate the equally dynamic creation of custom knowledge collections, driven by semantic search supported by the enriched metadata contained in normalized RDF/XML
Activity 3: Provide visualization tools and other analytical capabilities to support collaborative problem-solving by the expert group
Close the loop: Capture process and outcome information, scenarios considered, and implementation recommendations, and provide routes for republication and sharing, with or without further peer review
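Activity 1 above boils down to filtering researcher profiles against situational requirements. A toy sketch of that matching step, in Python; the profile fields mirror the metadata the outline mentions, but the records, the selection rule, and the publication-count authority proxy are my own invention, not the LANL implementation:

```python
# Toy researcher profiles; fields echo the metadata named in Activity 1:
# expertise, geography, and publication history.
PROFILES = [
    {"name": "Chen",  "expertise": {"virology", "epidemiology"},
     "region": "Asia-Pacific", "publications": 42},
    {"name": "Ortiz", "expertise": {"biostatistics"},
     "region": "Americas", "publications": 17},
    {"name": "Klein", "expertise": {"virology", "genomics"},
     "region": "Europe", "publications": 63},
]

def assemble_journal_club(profiles, required_expertise, min_pubs=10):
    """Select researchers whose expertise overlaps the situation's needs."""
    team = [p for p in profiles
            if p["expertise"] & required_expertise
            and p["publications"] >= min_pubs]
    # Rank the ad hoc club by publication history as a crude authority proxy.
    return sorted(team, key=lambda p: p["publications"], reverse=True)

club = assemble_journal_club(PROFILES, {"virology"})
print([p["name"] for p in club])  # → ['Klein', 'Chen']
```

In a real system the profiles would themselves be derived from the RDF repository, so the same enriched metadata drives both collection-building and team-building.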
The LANL author team is not alone in exploring this terrain.
Collexis is another highly visible proponent of semantic technologies in the scholarly research industry. Their BioMedExperts, for example, accomplishes a number of the functions proposed by the LANL group:
BioMedExperts contains the research profiles of more than 1.8 million life science researchers, representing over 24 million connections from over 3,500 institutions in more than 190 countries. . . . profiles were generated from author and co-author information from 18 million publications published in over 20,000 journals.
BioMedExperts includes visualization and linear/hierarchical tools for browsing and refining result sets, authority metrics, and networking tools to facilitate conversations among geographically dispersed researchers. For the curious, free access is available on the site. Collexis has also recently announced a project with Elsevier SCOPUS, University of North Carolina, and North Carolina State to create a statewide expert network. From the press release:
Once implemented, it will be the largest statewide research community of its kind. The web community created will have fully populated information on publications, grant data, and citations from over 15,000 researchers across all research disciplines.
There is active debate on the Web about the potential for Web 3.0 technologies and the standards that will be adopted to support them. Writing for O’Reilly Community, Kurt Cagle has remarked:
My central problem with RDF is that it is a brilliant technology that tried to solve too big a problem too early on by establishing itself as a way of building “dynamic” ontologies. Most ontologies are ultimately dynamic, changing and shifting as the requirements for their use change, but at the same time such ontologies change relatively slowly over time.
This means that the benefit of specifying a complex RDF Schema on an ontology — which can be a major exercise in hair pulling — is typically only advantageous in the very long term for most ontologies, and that in general the flexibility offered by RDF in that regard is much like trying to build a skyscraper out of silly putty.
As of January 2009, when Cagle wrote this, RDF had failed to garner widespread support from the Web community, but it has since gained significant traction, including incorporation into the Drupal 7 core.
Tim Berners-Lee, for his part, wants raw data to come online so that datasets can be related to one another and applied together for multidisciplinary purposes, like combining genomics data and protein data to try to cure Alzheimer’s. He urged “raw data now,” and an end to “hugging your data” — i.e., keeping it private — until you can make a beautiful web site for it.
Berners-Lee said his dream is already on its way to becoming a reality, but that it will require a format for tagging data and understanding relationships between different pieces of it in order for a search to turn up something meaningful. Some current efforts are DBpedia, a project aimed at extracting structured information from Wikipedia, and OpenStreetMap, an editable map of the world.
The promise within this alphabet soup of technologies is that semantic Web standards will support the development of utilities that:
- Provide access to large repositories of information that would otherwise be unwieldy to search quickly
- Surface relationships within complex data sets that would otherwise be obscured
- Are highly transferable
- Deliver democratized access to research information
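The second bullet is the one that distinguishes semantic stores from plain search indexes: once content is expressed as subject–predicate–object triples, hidden relationships fall out of simple joins. A minimal in-memory sketch; the triples and the two-hop query are illustrative inventions, not any particular store’s API:

```python
# A handful of subject-predicate-object triples, the shape in which a
# semantic repository holds its facts.
triples = [
    ("paper:1", "cites", "paper:2"),
    ("paper:2", "cites", "paper:3"),
    ("paper:1", "hasTopic", "topic:coronavirus"),
    ("paper:3", "hasTopic", "topic:coronavirus"),
    ("author:chen", "wrote", "paper:1"),
    ("author:klein", "wrote", "paper:3"),
]

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

def authors_by_topic(topic):
    """Surface an implicit relationship: authors linked only through a topic."""
    papers = {s for s, _, _ in query(p="hasTopic", o=topic)}
    return sorted({s for s, _, o in query(p="wrote") if o in papers})

print(authors_by_topic("topic:coronavirus"))  # → ['author:chen', 'author:klein']
```

Neither author cites the other, yet the shared-topic join connects them — exactly the kind of obscured relationship the bullet describes, and what SPARQL queries do at scale over real RDF stores.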
But there are risks. Building sites that depend on semantic technologies and RDF/XML can take longer and cost more up front. In a stalled economy, long-term financial vision is harder to come by, but those who have it may truly leapfrog their competitors. In addition, there are concerns about accuracy, authority, and security within these systems that their architects must address before they can reach the mainstream.
In our industry, which depends on research authority, one may wonder whether this is an all-or-nothing proposition. Without speed and consistent delivery of reliable results, projects such as these may fail to meet user expectations and be dead in the water. On the flip side, if RDF/XML and its successors can accomplish what they purport to, they will drive significant advances in research by providing the capacity to dynamically derive rich meaning from relationships as well as content.
Thanks to David Wojick for sharing the LANL paper that contributed to this post.