Study on Journal Article Data Mining from
Publishing Research Consortium
There's a very interesting new report out on Journal Data Mining;
it was prepared by Eefke Smit and Maurits van der Graaf on
behalf of the Publishing Research Consortium, so it has a strong
publisher perspective, but as far as I know it's the first extensive
look at the issues involved in practical and operational large-scale
data mining of the journal literature. One of the really interesting
things that emerges from the report, at least the way I read it, is
that many of the commercial publishers seem to be thinking about
literature mining as a separate activity, not included in traditional
electronic subscription arrangements (site licenses) that they have
with research libraries. (Indeed, many such licenses forbid bulk
downloading of journal articles, which in the absence of text mining
facilities built into the vendor platforms is a prerequisite for such
mining; even if such facilities exist, they essentially mean that the
publishers control the evolution of mining technology.) Rather, the
publishers seem to envision a future where they'll do business directly
with potential literature miners.
This is one of several issues framed by the report which I think
merit very careful thought by research library leaders, and broad
conversations engaging faculty.
The report is at:
http://www.publishingresearch.net/documents/PRCSmitJAMreport20June2011VersionofRecord.pdf
and there is an accompanying press release at
Disclosure: I was one of the many people interviewed for this
study, presumably at least in part because of my 2006 paper on open
computation.
Clifford Lynch
Director, CNI