Saturday, February 12, 2011

Watch this: Zoological Nomenclature as it unfolds

There is a new flurry of debate on the ICZN list (the mailing list of the International Commission on Zoological Nomenclature, the institution that is supposed to take care of the scientific naming of zoological taxa (species, genera, etc.) and everything related to it in order to provide stability in names) about the gender of the species epithet. This discussion breaks out every now and then, and it would not be so strange if there actually were a system that provided that stability.

But there isn't anything like that, and we have been waiting a long time to get there. We have Zoobank, which is supposedly getting yet another overhaul to become part of an even bigger system (the Global Names Usage Bank within the Global Names Architecture). In this corner of the biodiversity informatics world three creeds prevail: we need to provide the all-encompassing system, the shell that can harbor all the names; there are people out there (the community, or the crowd) who do the work and want to chip in; and the relevant stuff is in the past.

It seems to me that all this is a misreading of Facebook, Flickr and Google, which provide exactly the kind of environment these people envision. They are all chasing legacy names, but are these really that important?

Taxonomists publish approximately 17,000 new species and, I would guess, about 85,000 redescriptions per year, while the biomedical field uses a few hundred species names in a domain that produces millions of papers a year. But, with very few exceptions, there is no system in place that automatically collects new names and adds them to these databases.

Zoobank has, at best, a very tedious interface for adding data manually, and, as we know from a GBIF-sponsored project to add Zootaxa names, that manual entry simply does not happen. Instead of spending the effort to bring this system up to date, an overhaul of Zoobank has been under way for several years with no end in sight.

There is no debate about this; instead, all the effort is focused on gender agreement and similar matters. And it could not be more abstruse:

If a proper genus and species group name combination was UNIQUE and STABLE, why would we need LSID?
To a computer,
267B9A8B-372C-45EC-BFE5-661AF13CABC8
and
Stomosis arachnophila
are both UNIQUE and Stable codes.

So, why does a computer have to have a long unrecognizable (to humans) LSID? [Chris Thompson]


(The answer is simple: so that the computer knows what we are talking about even when we cannot agree on its proper name, because we cannot agree on the ending of the species epithet.)
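To make that answer concrete, here is a minimal Python sketch, assuming a hypothetical registry in which each taxon record carries a fixed random identifier (standing in for an LSID) alongside its human-readable name. When someone changes the epithet ending for gender agreement, an index keyed on the name string silently loses track of the record, while an index keyed on the identifier still resolves it. The "-us" ending below is purely illustrative and is not a claim about the correct gender of *Stomosis*.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Taxon:
    # Hypothetical record: the identifier never changes, the displayed name may.
    taxon_id: str
    genus: str
    epithet: str

    @property
    def name(self) -> str:
        return f"{self.genus} {self.epithet}"


# One record, indexed two ways: by its name string and by its opaque identifier.
record = Taxon(taxon_id=str(uuid.uuid4()), genus="Stomosis", epithet="arachnophila")
by_name = {record.name: record}
by_id = {record.taxon_id: record}

# Someone "corrects" the epithet ending (illustrative change only).
record.epithet = "arachnophilus"

print(record.name in by_name)    # False: the old string key no longer matches
print(record.taxon_id in by_id)  # True: the identifier still finds the record
```

The point is not that UUIDs are better names, but that they do not move when the name does.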

The only systematic collector of names, Zoological Record, struggles along with, if I am right, 25 employees (zoologists and others) to decipher the cryptic and well-hidden information in our taxonomic publications in order to produce its reference work on the zoological literature.
The other large initiative, the Biodiversity Heritage Library, whose important goal is to bring as much of the legacy print literature as possible into the digital world, collects names only as a side task, and, because of copyright issues, mostly from publications older than 70 years, which are thus irrelevant to the user on the street.

We need something different, like PubMed and PubMed Central, where publishers submit their papers to be archived and made discoverable in a format that includes all the relevant information.

We have to create something like a "BHL-Modern" for our field that does exactly this: whenever something is published, the document is published in a form that can easily be read into a dedicated database, the content is checked for validity during the submission process (in case nomenclatural acts are included), and then all the information becomes available, including treatments, names, bibliographic records and links to external resources such as DNA sequences, images, etc.
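The submission flow described above can be sketched roughly as follows in Python. Everything here is hypothetical: the `Submission` and `NewName` records, the `validate` and `submit` functions, and especially the individual checks are placeholders chosen for illustration, not a statement of what the Code actually requires. The sketch only shows the shape of the idea: validate at submission time, reject with a list of problems, and otherwise store the publication and index its names in one step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NewName:
    genus: str
    epithet: str
    type_specimen: Optional[str] = None  # hypothetical field for a type designation


@dataclass
class Submission:
    title: str
    authors: list[str]
    year: int
    new_names: list[NewName] = field(default_factory=list)
    external_links: list[str] = field(default_factory=list)  # e.g. DNA, images


def validate(sub: Submission) -> list[str]:
    """Return a list of problems; the checks are illustrative placeholders."""
    problems = []
    if not sub.title or not sub.authors:
        problems.append("missing bibliographic data")
    for n in sub.new_names:
        if not n.genus[:1].isupper():
            problems.append(f"genus not capitalized: {n.genus}")
        if not n.epithet.islower():
            problems.append(f"epithet not lowercase: {n.epithet}")
        if n.type_specimen is None:
            problems.append(f"no type designated for {n.genus} {n.epithet}")
    return problems


def submit(sub: Submission, db: dict[str, list]) -> list[str]:
    """Check at submission time; store and index only if nothing is wrong."""
    problems = validate(sub)
    if problems:
        return problems  # rejected: the author fixes and resubmits
    db["publications"].append(sub)
    for n in sub.new_names:
        db["names"].append(f"{n.genus} {n.epithet}")
    return []


# Usage with made-up names.
db: dict[str, list] = {"publications": [], "names": []}
good = Submission("A new fly", ["Doe"], 2011, [NewName("Examplea", "nova", "holotype A1")])
bad = Submission("Another fly", ["Doe"], 2011, [NewName("Examplea", "Secunda")])

print(submit(good, db))  # []
print(submit(bad, db))   # ['epithet not lowercase: Secunda', 'no type designated for Examplea Secunda']
print(db["names"])       # ['Examplea nova']
```

The design choice that matters is that names enter the database as a by-product of publishing, rather than through a separate manual step afterwards.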

We now have a system that is close to this; only the Zoobank part is missing, and it should probably be dealt with differently: by doing this part without Zoobank, either internally at BHL-Modern or at Zoological Record, which has the manpower to operate it.
A prototype of such a system is Pensoft with its suite of journals, such as Zookeys and the Journal of Hymenoptera Research, which produce TaxPub (NLM DTD-based) output with all the taxonomic elements semantically marked up. This output is then read into Plazi, where the treatments are available for use by EOL, GBIF and whoever requests them. Zoobank is included, but only through an awkward interface that is fed manually. Right now this is not complete, but it is open to criticism: the schema can be modified, and the elements to be included in the publication can be defined to fulfill the purposes of the ICZN, Zoological Record and others.
The good thing about this development is that something is alive and growing: it is a real system that is being used, fed by authors, and paid for, and since it is open access, it is open to all sorts of experiments and, most importantly, open to all users without limit.