Notes from BiblioHack 2012
==========================

Executive Summary
-----------------

There are some interesting tools in development at the hackathon. However, none meet our immediate needs in ViBRANT. The tools would require too much work from us both to be generally usable and to be tailored for ViBRANT. As such, the tools represent too great a diversion of resource from our immediate milestones and deliverables. While the tools are of interest for their long term potential, and their development should be monitored for such, they cannot be recommended as immediate additions to our toolset.

Background information
----------------------

Wednesday 13 and Thursday 14 June 2012
Queen Mary, University of London.

Announcement: http://devcsi.ukoln.ac.uk/2012/05/10/bibliohack/
Journal: http://okfnpad.org/bibliohack

DevCSI organised the event in conjunction with several groups. The most active of these was the Open Biblio group of the OKFN (Open Knowledge Foundation, http://blog.okfn.org/), who wrote up the event on their blog (http://openbiblio.net/).

Developments relevant to ViBRANT
--------------------------------

Of particular interest to ViBRANT were four existing tools that attendees, who included several of the tools' developers, hoped to work on during the two days.

BibServer
~~~~~~~~~

(https://github.com/okfn/bibserver)

Is a generic bibliographic reference manager with a more sophisticated front-end and search facilities than we currently have in RefBank.

There were, however, many installation problems (full details available on request). We did get BibServer installed on OS X, Windows, and various flavours of UNIX, but it took most of the two days. Once we began to use the software we ran into scalability and encoding problems, and so failed to load any of our target reference collections: the German national bibliography (http://thedatahub.org/dataset/deutsche-nationalbibliografie-dnb), the metadata for 8547 phylogeny-related BioMedCentral articles (http://www.citeulike.org/user/testtest87), and the Catalogue of Life bibliography. Note that the software as it stands has other issues for use within ViBRANT, such as BibJSON (http://bibjson.org/) being the only export format.
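To illustrate what working around a BibJSON-only export might involve, here is a minimal sketch that turns one BibJSON-style record into a BibTeX entry. It is not part of BibServer. The field names (``title``, ``author`` as a list of objects with ``name``, ``year``, ``journal`` as an object with ``name``, ``identifier`` as a list of ``type``/``id`` pairs) follow the BibJSON convention as I understand it and should be checked against real exports. The function name ``bibjson_to_bibtex`` and the sample record are hypothetical.

```python
def bibjson_to_bibtex(record: dict, key: str) -> str:
    """Render one BibJSON-style record as a BibTeX entry.

    Assumption: the record uses BibJSON-like field names. No escaping of
    special BibTeX characters is attempted, so this is a sketch only.
    """
    fields = {}
    if "title" in record:
        fields["title"] = record["title"]
    # Authors are assumed to be a list of objects with a "name" key.
    authors = [a["name"] for a in record.get("author", []) if "name" in a]
    if authors:
        fields["author"] = " and ".join(authors)
    if "year" in record:
        fields["year"] = str(record["year"])
    # The journal is assumed to be an object with a "name" key.
    journal = record.get("journal")
    if isinstance(journal, dict) and "name" in journal:
        fields["journal"] = journal["name"]
    # Keep a DOI if one is listed among the identifiers.
    for ident in record.get("identifier", []):
        if ident.get("type") == "doi" and "id" in ident:
            fields["doi"] = ident["id"]
    entry_type = record.get("type", "misc")
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = {
        "type": "article",
        "title": "An example title",
        "author": [{"name": "A. Author"}, {"name": "B. Author"}],
        "year": "2012",
        "journal": {"name": "Example Journal"},
        "identifier": [{"type": "doi", "id": "10.0000/example"}],
    }
    print(bibjson_to_bibtex(sample, "example2012"))
```

Even a converter this small has to make choices about character escaping and unmapped fields, which is part of the extra work a single export format would push onto us.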

PubCrawler [the software not the activity :-o ]
~~~~~~~~~~

(https://bitbucket.org/wwmm/pub-crawler)

Is a tool to crawl scholarly publishers' web-sites: publisher->journal->issue->article.

There is a set of common Java code on top of which tailored Java routines perform article-level reference harvesting from a publisher's web-site. Unfortunately, quite a lot of the code is tailored, and the tool is not so generic as to make a significant reduction in the effort required to harvest references from a new publisher.
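The split between common and tailored code can be sketched as follows. This is an illustration of the general pattern only, written in Python rather than Java, and it is not PubCrawler's actual code: the class and method names (``PublisherCrawler``, ``journals``, ``issues``, ``articles``, ``InMemoryCrawler``) are hypothetical, and the stand-in "publisher" reads from a dictionary instead of a web-site.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class ArticleRef:
    """One harvested article reference (hypothetical minimal shape)."""
    journal: str
    issue: str
    title: str


class PublisherCrawler(ABC):
    """Common code: walks publisher -> journal -> issue -> article.

    The three abstract methods are the publisher-specific part that
    would have to be written afresh for each new publisher.
    """

    @abstractmethod
    def journals(self) -> Iterable[str]:
        """List the publisher's journals."""

    @abstractmethod
    def issues(self, journal: str) -> Iterable[str]:
        """List the issues of one journal."""

    @abstractmethod
    def articles(self, journal: str, issue: str) -> Iterable[str]:
        """List the article titles in one issue."""

    def crawl(self) -> Iterator[ArticleRef]:
        # The traversal itself is shared by every publisher.
        for journal in self.journals():
            for issue in self.issues(journal):
                for title in self.articles(journal, issue):
                    yield ArticleRef(journal, issue, title)


class InMemoryCrawler(PublisherCrawler):
    """Stand-in for a tailored crawler; reads a nested dict, not a site."""

    def __init__(self, site: Dict[str, Dict[str, List[str]]]) -> None:
        self._site = site

    def journals(self) -> Iterable[str]:
        return self._site.keys()

    def issues(self, journal: str) -> Iterable[str]:
        return self._site[journal].keys()

    def articles(self, journal: str, issue: str) -> Iterable[str]:
        return self._site[journal][issue]


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    demo = {"Journal A": {"Vol 1 (1)": ["First article", "Second article"]}}
    for ref in InMemoryCrawler(demo).crawl():
        print(ref)
```

In a real crawler the three publisher-specific methods would contain the page fetching and HTML parsing, which is where most of the effort lies; the shared traversal is the small part.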

During the hackathon we could not install PubCrawler and so did not develop the hoped-for BMC crawler.

TEXTUS
~~~~~~

(project home http://textusproject.org/, source code https://github.com/okfn/textus)

Is a tool for adding metadata to text. Its heritage is in the humanities, being built on the OpenShakespeare annotator, and so it would need more work to meet ViBRANT's needs.

Currently, TEXTUS has no support for bibliographic references. TEXTUS's author hoped to address that deficiency during the hackathon, and while he wrote no code he went away satisfied because he had spent the two days learning about reference management and what needs adding to TEXTUS.

The Pundit
~~~~~~~~~~

(http://thepund.it/)

Is a 'novel semantic web annotation tool', but was too unstable to use so the people looking at it transferred to working on the very similar TEXTUS.

Conclusion
----------

The Open Biblio project gained because there are now better instructions for the installation of, and a comprehensive write up of various bugs and issues with, their BibServer, PubCrawler and TEXTUS software.

ViBRANT gained from the two days because, in a very concentrated fashion and with access to several of the tools' developers and other experienced developers and users, I learned that these tools are not yet suitable for our needs. All are too unstable and in need of development, especially for scalability and internationalisation. Far from offering a short cut to meeting ViBRANT's deliverables, they would require us to divert resource to support their development.

I was also able to raise ViBRANT's profile with several interested parties and am still following up the new contacts. Updates to follow.
