This year's great TDWG conference has just finished in Fremantle, Western Australia. I attended along with 180 other biodiversity informatics people, and I thought I would point out the most interesting things from my side.
Darwin Core
The most exciting thing for me is the chance to soon put forward a new Darwin Core for ratification, one that more closely resembles Dublin Core and possibly consists of 3 namespaces.
While working on the IPT it seemed that most of the information we deal with in biodiversity informatics is centered around only 3 entities: Taxon, Occurrence and SamplingSite. In my eyes it would make sense to separate the elements of a new DarwinCore standard according to those core entities. The normative standard will be decoupled from the implementation technology and consist only of natural language definitions with URIs for the elements of the standard. Specific application schemas (called profiles in DC) will then define or recommend how to use DarwinCore within an XML, RDF, XHTML, OGC application schema or tab file environment. Even datatyping can be left to the respective schemas, so that dates, for example, can be expressed in their native formats. I will give examples of the different representations soon in another blog entry.
Combining these 3 core entities with the notion of one-to-many "extensions", the star schema of the IPT, one can handle quite a rich definition of data. Extensions for multiple identifications can be added to occurrences, while SPM-like species descriptions, geographic distributions or invasiveness status can be added to the taxon. And still the whole standard and the exchanged data can be extremely simple! Btw, simplicity was probably the most mentioned idea at TDWG this year (a really nice talk by Roger Hyam in our Wild Ideas! session was about creating a Species Index with SiteMaps). Controlled vocabularies like BasisOfRecord or Ranks should be expressed in simple ASCII files like the ISO country code one, with a code, label, definition and examples. This also allows for easy translation into different languages because the codes stay the same.
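To make the star schema and the code-based vocabularies a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the column names (`id`, `coreId`, `identifiedBy`), the sample rows, the vocabulary codes `S` and `O` and their labels are made up and are not taken from the IPT or from any ratified vocabulary.

```python
import csv
import io
from collections import defaultdict

# Core file: one row per occurrence (hypothetical column names and data).
OCCURRENCES = (
    "id\tscientificName\tbasisOfRecord\n"
    "occ-1\tPuma concolor\tS\n"
    "occ-2\tBanksia grandis\tO\n"
)

# Extension file: zero or more identifications per occurrence, linked by coreId.
IDENTIFICATIONS = (
    "coreId\tidentifiedBy\tscientificName\n"
    "occ-1\tA. Smith\tFelis concolor\n"
    "occ-1\tB. Jones\tPuma concolor\n"
)

# Vocabulary files: code, label, definition, examples (illustrative terms only).
VOCAB_EN = (
    "code\tlabel\tdefinition\texamples\n"
    "S\tPreserved specimen\tA physical specimen held in a collection\therbarium sheet\n"
    "O\tObservation\tA record of an organism seen in the field\tbird count\n"
)
# A translation replaces only the human-readable columns; the codes are unchanged.
VOCAB_DE = (
    "code\tlabel\tdefinition\texamples\n"
    "S\tKonserviertes Exemplar\tEin physischer Beleg in einer Sammlung\tHerbarbogen\n"
    "O\tBeobachtung\tNachweis eines Organismus im Feld\tFeldnotiz\n"
)


def read_tab(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def attach_extension(core_rows, extension_rows, name):
    """Return copies of the core rows, each with its extension rows under `name`."""
    by_core = defaultdict(list)
    for row in extension_rows:
        by_core[row["coreId"]].append(row)
    return [dict(core, **{name: by_core.get(core["id"], [])}) for core in core_rows]


def load_vocabulary(text):
    """Index a vocabulary file by its code column."""
    return {row["code"]: row for row in read_tab(text)}


records = attach_extension(
    read_tab(OCCURRENCES), read_tab(IDENTIFICATIONS), "identifications"
)
english = load_vocabulary(VOCAB_EN)
german = load_vocabulary(VOCAB_DE)

for record in records:
    code = record["basisOfRecord"]
    print(
        record["id"],
        len(record["identifications"]),
        english[code]["label"],
        german[code]["label"],
    )
```

The exchanged data only ever carries the code, so the same record can be displayed in any language for which a vocabulary file exists.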
Integrated Publishing Toolkit
Surprisingly many people asked for IPT demos after screenshots were shown in some GBIF talks. Tim and I gave demos nearly every day, and the publishing tool was well received in general. Alpha testers (mainly for usability) for the public instance at GBIF were recruited, and new ideas arose or vague ones materialised, e.g. for validation/annotations:
* validation can be done through external services that adhere to a simple API. The validation would be asynchronous: the request sends a token, a link to the full dataset (either DwC XML or tab file dumps) and a callback handler. Once the validation is done, the handler receives the token and a link to the validation report, an XML file that contains annotations (unstructured text) about records together with a probability and a list of suggested property changes.
* provide an API to push datasets into the IPT via a REST service
* BCI collection and institution code validation and lookup of GUIDs during mapping.
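The asynchronous validation idea from the first bullet could be sketched roughly as follows. This is only my reading of the message flow, not an agreed API: the class names, field names and `example.org` URLs are hypothetical, and the "validator" here is a local function instead of a real HTTP service.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ValidationRequest:
    token: str         # opaque id used to match the report to the request
    dataset_url: str   # link to the full dataset dump (DwC XML or tab file)
    callback_url: str  # handler the validator notifies when it is done


@dataclass
class ValidationCallback:
    token: str
    report_url: str    # link to the XML validation report


@dataclass
class Annotation:
    """One entry of a validation report (hypothetical structure)."""
    record_id: str
    comment: str        # unstructured text
    probability: float  # how likely the validator thinks the problem is real
    suggestions: list = field(default_factory=list)  # (property, new value) pairs


class Publisher:
    """Stands in for the IPT side: submits datasets and receives callbacks."""

    def __init__(self):
        self.pending = {}  # token -> dataset_url
        self.reports = {}  # token -> report_url

    def submit(self, dataset_url, callback_url):
        request = ValidationRequest(uuid.uuid4().hex, dataset_url, callback_url)
        self.pending[request.token] = dataset_url
        return request

    def handle_callback(self, callback):
        if callback.token not in self.pending:
            raise KeyError(f"unknown token: {callback.token}")
        del self.pending[callback.token]
        self.reports[callback.token] = callback.report_url


def run_validator(request):
    """Pretend validator: a real service would download request.dataset_url,
    write a report and then call request.callback_url over HTTP."""
    report_url = f"http://validator.example.org/reports/{request.token}.xml"
    return ValidationCallback(request.token, report_url)


publisher = Publisher()
request = publisher.submit(
    "http://ipt.example.org/datasets/birds.txt",
    "http://ipt.example.org/validation/callback",
)
# Some time later the validator finishes and calls back with the same token.
publisher.handle_callback(run_validator(request))
print(publisher.reports[request.token])

example = Annotation(
    record_id="occ-1",
    comment="Coordinates fall outside the stated country",
    probability=0.8,
    suggestions=[("country", "AU")],
)
```

The token is what lets the publisher stay stateless between request and callback: it only needs to remember which tokens are still pending.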
TAG & GUIDs
Greg Whitbread will likely be leading the Technical Architecture Group. Refining and shrinking the core ontology (owl) was seen as an important issue for the coming years.
Originally I had planned to question the uptake of LSIDs again, having been in favour of PURLs since the beginning. After long debates that some of us seemed to have been through before, we came to the following conclusions, while keeping the LSID recommendation:
* LSIDs look much more stable in printed publications. That means they should really be resolvable through proper LSID resolution via DNS SRV records
* LSIDs are on the agenda of many projects already
* Proxying LSIDs with HTTP removes many troubles, especially when used with RDF. The strong recommendation is to always use the proxied version in RDF abouts.
* Changing the domain used within the LSID might cause problems (e.g. when an institution changes its name). It might be better to have a single central LSID authority
* pure UUIDs with a central resolver would be great. The resolver would have to know about all existing UUIDs in our domain, but it could be mirrored and sharded easily. Maybe something to think about
* central PURLs and LSIDs (only) require a registration of services/authorities
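For readers who have not handled LSIDs in code, here is a small Python sketch of the two resolution paths mentioned in the list: the HTTP-proxied form for use in RDF, and the DNS name one would query for an SRV record. The proxy base URL and the sample LSID are placeholders I made up. The `_lsid._tcp` label follows the usual `_service._proto.name` SRV scheme and is, as far as I recall, what the LSID specification uses, so treat it as an assumption too.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder: replace with the base URL of whichever HTTP proxy you actually use.
PROXY_BASE = "http://proxy.example.org/"


@dataclass(frozen=True)
class Lsid:
    authority: str  # usually a domain name owned by the issuing institution
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return Lsid(*parts[2:])


def proxied_url(text):
    """Return the HTTP-proxied form, e.g. for use in rdf:about."""
    parse_lsid(text)  # validate before building the URL
    return PROXY_BASE + text


def srv_query_name(lsid):
    """DNS name to query for the SRV record of the LSID authority (assumed label)."""
    return f"_lsid._tcp.{lsid.authority}"


sample = "urn:lsid:example.org:names:12345"
lsid = parse_lsid(sample)
print(proxied_url(sample))
print(srv_query_name(lsid))
```

The sketch also shows why a domain change hurts: the authority is baked into the identifier itself, so both the SRV lookup and any printed LSID depend on it.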
Names
Great introductory talks were given by Rich Pyle about zoological and botanical nomenclatural differences and by Nico Cellinese about the phylocode. If you have always felt on shaky ground with nomenclature or taxon concepts, you should really watch these talks!