WP7: Biodiversity literature access and data mining

Workpackage 7 will be led by the UK’s Open University, who are experts in data mining biodiversity literature, and will engage partners at the Karlsruhe Institute for Technology, Pensoft Publishing (covering data formats in contemporary literature) and the Natural History Museum, London. Key aspects of this work include developing i) the infrastructure to support the creation and ongoing maintenance of community-constructed digital bibliographies within the Scratchpad virtual research environment; ii) a robust, federated search mechanism and context-sensitive ranking of search results for biodiversity literature; iii) web services to recover certain content elements, such as taxonomic names, author names and locality, from within text blocks; iv) the means to identify structural elements (text blocks) of different types within published documents; and v) the infrastructure to support annotation and correction of documents by citizen scientists and others. These research activities build on the EU-funded INOTAXA project at the Natural History Museum, London and the Plazi project run from the Karlsruhe Institute for Technology.

WP7 is the key link with the agINFRA project. Slides from the agINFRA kick-off meeting are attached below.

WP lead: 
David Morse
Lead partner: 
OU

M7.27 – Publish ViBRANT NLP corpus

Date: 
31/10/2013
Deliverable or Milestone: 
Milestone

To develop, refine and assess our data mining work, and the similar work of others aiming to mine biodiversity texts, a substantial gold standard corpus is required. None currently exists. This milestone addresses the community need.

Reporter: 
Dauvit King
Completed: 
Completed

M7.26 – Workpackage software packaged

Date: 
31/10/2013
Deliverable or Milestone: 
Milestone

For sustainability and to formalise our contractual agreement with the EC, we need to package the software in industry standard formats, following accepted coding conventions and using version control. Software to be deposited in the Scratchpads git repository.

Reporter: 
Dauvit King
Completed: 
Completed

[ViBRANT] Future Citations notes

Dear all

Last week I attended a JISC organised hackathon on citations. It was a useful couple of days, covering a variety of topics. Relevant to ViBRANT we explored:

1. the nature of citations

Notes from BiblioHack 2012

Notes on BiblioHack
===================

Executive Summary
-----------------

M7.25 Enhance reference parser to parse references in bulk uploads

Date: 
31/10/2012
Deliverable or Milestone: 
Milestone

Opening RefBank to allow people to upload references requires that the reference parser is enhanced so that it can cope with the variety of formats that will be presented. This is not simply to cope with well-structured references that adhere to reference format standards, but also common mistakes and errors in the formats. M7.25 builds on the functionality that will be developed in M7.21, M7.22, M7.23 and M7.24.

Actually completed earlier, but only got around to writing the report today, Tuesday 30 October 2012. Dauvit

Reporter: 
Dauvit King
Completed: 
Completed

M7.24 Upload service for complete bibliographies

Date: 
01/06/2012
Deliverable or Milestone: 
Milestone

In order to allow individuals to upload bibliographies that they have compiled, a bulk upload service is required. This milestone will address that requirement. M7.24 requires the functionality that will be developed in M7.21, M7.22 and M7.23.

This milestone was defined in M7.15.

----

Milestone original delivery date was Friday 29 June 2012.
Brought forward to Friday 1 June in line with changing emphasis within WP7 to favour progress on bibliographic reference handling at the expense of mark up processing. See milestones M7.16 and M7.23.

Reporter: 
Dauvit King
Completed: 
Completed

M7.23 Extend RefBank import routines to support other widely used bibliographic formats, e.g. BibTeX, RIS, etc.

Date: 
01/06/2012
Deliverable or Milestone: 
Milestone

As the milestone states, RefBank needs extending so that import from other widely used bibliographic formats, such as BibTeX and RIS, is supported. This will facilitate populating the database by bulk upload of personal bibliographies. Note that this milestone relies on there being a working attribution (origin) mechanism (see Milestone M7.21) if people who upload their bibliographies are to be credited. Bulk upload will be implemented by Milestone M7.24.
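To illustrate the kind of import this milestone covers, here is a minimal sketch of reading RIS records in Python. It is not the RefBank import code: the function name parse_ris and the sample record are invented for illustration, and the sketch only assumes the usual RIS layout of a two-character tag, two spaces, a hyphen and a value, with each record closed by an ER line.

```python
import re
from collections import defaultdict

# Usual RIS layout: "AU  - Smith, J."; a record ends with an "ER  -" line.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Hypothetical helper: split RIS text into {tag: [values]} dicts, one per record."""
    records = []
    current = defaultdict(list)
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if not match:
            # Blank, malformed and continuation lines are skipped in this
            # sketch rather than aborting the whole upload.
            continue
        tag, value = match.groups()
        if tag == "ER":
            if current:
                records.append(dict(current))
            current = defaultdict(list)
        else:
            # Tags such as AU may repeat, so every tag maps to a list.
            current[tag].append(value.strip())
    if current:
        # Tolerate a final record that lacks its ER terminator.
        records.append(dict(current))
    return records


# Invented sample record, for illustration only.
sample = """TY  - JOUR
AU  - Smith, J.
AU  - Jones, A.
TI  - An example title
PY  - 2012
ER  - 
"""
records = parse_ris(sample)
```

Keeping every tag as a list preserves repeated fields such as multiple authors, and skipping unrecognised lines is one simple way to stay tolerant of imperfect personal bibliographies.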

This milestone was defined in M7.15.

----

Milestone original delivery date was Friday 29 June 2012.

Reporter: 
Dauvit King
Completed: 
Completed

M7.22 Import bibliographies from Pensoft to RefBank

Date: 
04/05/2012
Deliverable or Milestone: 
Milestone

Develop the infrastructure to import bibliographic information automatically from Pensoft to RefBank. Typically, this will be the metadata about each publication and the bibliography from the end of the paper. Note that this milestone relies on there being a working attribution (origin) mechanism - see Milestone M7.21.

This milestone was defined in M7.15.

Reporter: 
Dauvit King
Completed: 
Completed

M7.21 Add metadata to cover origin of bibliographies

Date: 
13/04/2012
Deliverable or Milestone: 
Milestone

Metadata and processing code added to RefBank to cover origin of bibliographies.

Use cases:
1) Bibliographies taken from a publication, in which case the origin is the bibliographic details of the original publication or (possibly) the RefBank ID of the original publication, which would be accurate but not user friendly.
2) Bibliographies contributed by a particular author, in which case, attribution of the bibliography is appropriate.
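The two use cases above could be modelled along the following lines. This is only a sketch under stated assumptions: the Origin class, its field names and the label method are hypothetical and do not describe the actual RefBank metadata schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Origin:
    """Hypothetical record of where a bibliography came from."""
    kind: str                          # "publication" or "contributor"
    citation: Optional[str] = None     # bibliographic details of the source publication
    refbank_id: Optional[str] = None   # identifier of the source publication, if known
    contributor: Optional[str] = None  # person to credit for a contributed bibliography

    def label(self) -> str:
        """Return the most readable description of the origin that is available."""
        if self.kind == "publication":
            # Prefer the human-readable citation over the opaque identifier.
            return self.citation or self.refbank_id or "unknown publication"
        return self.contributor or "unknown contributor"


# Invented values, for illustration only.
from_paper = Origin(kind="publication", citation="Smith 2012", refbank_id="ID-1")
from_person = Origin(kind="contributor", contributor="A. Author")
```

Preferring the citation over the identifier reflects the point made in use case 1: the identifier is accurate but harder for a person to read.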

Reporter: 
Dauvit King
Completed: 
Completed

M7.19 - Review of pilot of reference de-duplication software

Date: 
31/07/2013
Deliverable or Milestone: 
Milestone

Working with community contributed references to RefBank means that our repository will have a large number of duplicate references arising from:

Reporter: 
Dauvit King
Completed: 
Completed

D7.3 - Literature search

Date: 
31/10/2013
Deliverable or Milestone: 
Deliverable

This, the third and final deliverable of workpackage seven, was originally conceived of as an "Enhanced search facility to locate concepts based on linguistics and proximity rules." It was superseded during the project by the need to provide a breadth of coverage to enable a bibliography of life.

Reporter: 
Dauvit King
Completed: 
Completed

M7.20 - Workpackage software documentation produced

Date: 
31/10/2013
Deliverable or Milestone: 
Milestone

To encourage uptake and use, and to support future enhancement and maintenance after completion of ViBRANT, all workpackage-produced software must be fully documented to a consistent quality, informed by the appropriate standards.

Reporter: 
Dauvit King
Completed: 
Completed

M7.17 - Review of pilot mark up processes within the Scratchpad infrastructure

Date: 
23/11/2012
Deliverable or Milestone: 
Milestone

Originally planned for 31 July 2012. However, following confirmation at ManComm7 to pull forward bibliographic work in preference to data mining work, this milestone was deferred. The delivery date was further affected by a re-plan to include more enhancements to RefBank during year two than originally envisaged.

Eventually re-scheduled in line with revised date for M7.16 (http://vbrant.eu/content/m716-mark-modules-delivering-outline-mark). M7.16 was completed 28 September 2012.

Reporter: 
Dauvit King
Completed: 
Completed

D7.2 - Mark-up modules

Date: 
29/11/2012
Deliverable or Milestone: 
Deliverable

This deliverable involved extending and integrating the GoldenGATE interactive mark-up tool (http://plazi.org/?q=GoldenGATE) within the Scratchpad infrastructure. GoldenGATE is our tool of choice because it has the mechanisms for handling the stylised structures common in taxonomic literature. Should integration of the complete tool prove difficult, GoldenGATE’s modular structure will permit it to be decomposed so that individual modules can be integrated into the Scratchpad infrastructure or deployed as web services.

Reporter: 
Dauvit King
Completed: 
Completed

M7.18 - First integration phase complete

Date: 
27/11/2013
Deliverable or Milestone: 
Milestone

Notes

This milestone represents implementation of sustainable links between the bibliography service and mark-up services developed by this work package, and Scratchpads.

The bibliography service integration is achieved through a Scratchpads-to-RefBank harvester program.

The mark-up modules integration is achieved through the standard OBOE interface.

Reporter: 
Dauvit King
Completed: 
Completed

M7.16 - Mark-up modules delivering outline mark-up

Date: 
28/09/2012
Deliverable or Milestone: 
Milestone

E.g. for article boundaries, treatment boundaries, headings and authors

----

Milestone original delivery date was Thursday 31 May 2012.

Deferred to Friday 29 June 2012 in line with changing emphasis within WP7 to favour progress on bibliographic reference handling at the expense of mark up processing. See milestones M7.23 and M7.24.

Deferred again - see Rescheduling below.

Reporter: 
Dauvit King
Completed: 
Completed

D7.1 - Community contributed bibliography

Date: 
30/11/2011
Deliverable or Milestone: 
Deliverable

A functional community-contributed bibliography with unique identifiers at publication unit level(s) and links to publicly available digital copies where possible.

The report contains a description of the community constructed bibliography, covering:

  • architectural and implementation issues,
  • a brief description of the functionality of the bibliography,
  • the way forward, including architectural and functional developments

The prototype is hosted at http://plazi2.cs.umb.edu:8080/RefBank/search

Reporter: 
David Morse
Completed: 
Completed

M7.15 - Define further milestones in the light of usage and feedback

Date: 
29/02/2012
Deliverable or Milestone: 
Milestone

Additional milestones defined to monitor and break down work programme for year 2. Milestones added to the list of milestones and deliverables on the ViBRANT website.

Reporter: 
Dauvit King
Completed: 
Completed

D6.3 - Data publication workflow

Date: 
30/11/2013
Deliverable or Milestone: 
Deliverable

The present deliverable describes several workflows and tools developed or upgraded by Pensoft in the course of the ViBRANT project.

Work package: 
Reporter: 
Lyubomir Penev
Completed: 
Completed

M6.10 - Use cases of existing standards of XML mark up tagging and semantic enhancement collected and reviewed

Date: 
28/02/2011
Deliverable or Milestone: 
Milestone
Work package: 
Reporter: 
Lyubomir Penev
Completed: 
Completed
Attachment                                     Size
agINFRA_ViBRANT_Roberts.pdf                    1.79 MB
WP4_OCR_Morse.pdf                              1.4 MB
WP4_agStor_Morse.pdf                           612.6 KB
WP4_Scratchpads_Roberts.pdf                    4.32 MB
WP5_BHL_Morse_opt.pdf                          1.07 MB
WP5_Policies_Morse.pdf                         1.32 MB
Report_BibliographyDataFormatsAndServices.odt  125.35 KB