Sunday, February 13, 2011

Kurt Pickett dead

After a long illness, my colleague and friend Kurt died a couple of days ago.

The last time I listened to Kurt was when he presented his lecture at the meeting of the International Society of Hymenopterists in Köszeg. He wasn't able to fly and delivered it via a remote link, a lecture in which he experimented with both content and style. Even in that moment he was innovative and driven by his endless curiosity. It is sad that he is no longer among us: an inspiring and outspoken person with a formidable mix of intellect, field and lab work, and style. I will miss him.

Kurt's lab

Saturday, February 12, 2011

Watch this: Zoological Nomenclature as it unfolds

There is a new flurry of debate on the ICZN list (the list of the International Commission on Zoological Nomenclature, the institution that is supposed to take care of the scientific naming of zoological taxa (species, genera, etc.) and everything related to it in order to provide stability in names) about the gender of the species epithet. This is a discussion that breaks loose every now and then, and it wouldn't be as strange if there actually were a system that provides the stability.

But there isn't anything like that, and we have been waiting a long time to get there. We have Zoobank, which supposedly is getting yet another overhaul to become part of an even bigger system (the Global Name Architecture as part of the Global Names Usage Bank). In this corner of the biodiversity informatics world three creeds exist: we need to provide the all-encompassing system, the shell that can harbor all the names; there are people out there (the community, or crowd) who do the work and want to chip in; and the relevant stuff is in the past.

It seems to me that all of this rests on a misconception of Facebook, Flickr and Google, which provide exactly the kind of environment that these people envision. They are all chasing legacy names, but are those really that important?

Taxonomy publishes approximately 17,000 new species and, I guess, ca. 85,000 redescriptions per year; the biomedical field uses a few hundred species names in a domain that produces millions of papers a year. But with very few exceptions, there is no system in place that automatically collects new names and adds them to those databases.

Zoobank has at best a very tedious interface that allows data to be added manually, something that does not happen, as we know from a GBIF-sponsored project to add Zootaxa names. Instead of all the effort going into getting this system up to date, an overhaul of Zoobank has now been under way for several years with no end in sight.

There is no debate about this; instead, all the effort is focused on gender agreement and similar matters. And it could not be more abstruse:

If a proper genus and species group name combination was UNIQUE and STABLE, why would we need LSID?
To a computer,
267B9A8B-372C-45EC-BFE5-661AF13CABC8
and
Stomosis arachnophila
are both UNIQUE and Stable codes.

So, why does a computer have to have a long unrecognizeable (to humans) LSID? [Chris Thompson]


(the answer is simple: so that the computer knows what we are talking about when we cannot agree on the proper name because we cannot agree on the ending of the species epithet)
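That answer can be illustrated with a minimal sketch. It assumes a hypothetical lookup table in which two spellings of the same name resolve to the identifier quoted above; the second spelling, with a different gender ending, is invented here purely for illustration, and nothing in the sketch reflects how Zoobank or any LSID resolver actually works.

```python
# Hypothetical registry: every spelling variant of a name points to one
# opaque identifier. The identifier is the one quoted in the post; the
# second spelling is an invented gender variant used only for illustration.
NAME_TO_ID = {
    "Stomosis arachnophila": "267B9A8B-372C-45EC-BFE5-661AF13CABC8",
    "Stomosis arachnophilus": "267B9A8B-372C-45EC-BFE5-661AF13CABC8",
}


def same_taxon_name(a: str, b: str) -> bool:
    """Return True when both spellings are registered under one identifier."""
    id_a = NAME_TO_ID.get(a)
    id_b = NAME_TO_ID.get(b)
    return id_a is not None and id_a == id_b


# Comparing the strings fails as soon as the ending differs ...
strings_equal = "Stomosis arachnophila" == "Stomosis arachnophilus"
# ... while comparing the identifiers behind them does not.
ids_equal = same_taxon_name("Stomosis arachnophila", "Stomosis arachnophilus")
print(strings_equal, ids_equal)
```

The name stays the label for humans; the identifier is what lets a machine keep track of the object while the spelling is still being argued over.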

The only systematic collector of names, Zoological Record, has struggled all along with, if I am right, 25 employees (zoologists etc.) to decipher the cryptic and well-hidden information in our taxonomic work to produce their reference work of zoological literature.
The other large initiative, the Biodiversity Heritage Library, whose important goal is to convert as much legacy print publication as possible into the digital world, collects names only as an aside, and because of copyright issues most of its content is older than 70 years and thus irrelevant for the user on the street.

Something different is needed, like PubMed and PubMed Central, where publishers submit their papers to be archived and made discoverable via a form that includes all the relevant information.

We have to create something like a BHL-Modern for our field that does exactly this: whenever something is published, the document is issued in a form that can easily be read into a dedicated database, the content is checked for validity (in case nomenclatural acts are included) during the submission process, and then all the information becomes available, including treatments, names, bibliographic records and links to external resources such as DNA, images, etc.

We now have a system that is close to this; only the Zoobank part is missing, and it should probably be dealt with differently, by doing this part without Zoobank, either internally at BHL-Modern or at Zoological Record, which has the manpower to operate it.
A prototype of such a system is Pensoft and its suite of journals, such as Zookeys and the Journal of Hymenoptera Research, which produce taxpub NLM DTD based output with all the taxonomic elements semantically marked up. This output is then read into Plazi, where the treatments are available for use by EOL, GBIF and whoever requests them. Zoobank is included, but only through an awkward interface that is fed manually. Right now this is not complete, but it is open to criticism: the schema can be modified, and the elements to be included in the publication can be defined to fulfill the purposes of ICZN, Zoological Record and others.
The good thing about this development is that there is something alive that is growing; it is a real system that is being used, fed by authors, and paid for, and since it is open access it is open to all sorts of experiments and, most importantly, to all users without limits.

Thursday, February 10, 2011

Collaborations in our Field: Synergies and US Virtual Herbarium

United States Virtual Herbarium Workshop

United States Virtual Herbarium Workshop at the Missouri Botanical Garden in St Louis from February 23-25, 2010 at the Monsanto Center. (see report)

What is a National Virtual Herbarium? Well, in part that will be determined by this workshop but at the least the United States Virtual Herbarium (USVH) will support nearly all of the operations performed by a traditional herbarium. The USVH will serve as a storage and distribution center for knowledge about plants. It also must serve as a center of knowledge creation as is the case with physically located herbaria. From the information resource perspective, which of course I would take, the USVH must support the acquisition, evaluation, analysis, storage, dissemination and "weeding" of information in support of decision making for society.

Acquisition of information in herbaria is in many forms beginning with plant specimens, but including collections metadata, publications, field notes, photographs and many other materials. If there were a U.S. Virtual Herbarium it would be possible to streamline acquisition because many botanists send pieces of the same specimen to several herbaria. There is no need to redo all that data entry… like copy cataloging. There is a great opportunity for authority control and other quality control operations. There are a large number of museum digitization projects occurring throughout the country. There could be economies of scale through greater coordination of these efforts. Likewise, evaluation of information quality could be better managed. For example, if one specimen undergoes a redetermination all of its twins should be reevaluated as well wherever they are held. Likewise, georeferencing could be shared. Bryan Heidorn


This seems to be an effort worth studying and copying to our domain, but also one to examine closely to find out what it really does and what is missing.

Tuesday, February 08, 2011

Journal of Hymenoptera Research Open Access and taxpub based

On February 8, Pensoft published the first gold Open Access and NLM Taxpub based issue of the Journal of Hymenoptera Research. It is one of the first journals Pensoft publishes for a scientific society besides its own in-house journals such as Zookeys. The implications go beyond changing from a traditional PDF-based journal to a semantically enhanced, taxpub NLM DTD based one: the new format allows immediate distribution of its content to a set of external aggregators such as Plazi, Encyclopedia of Life, Wikispecies or the Global Biodiversity Information Facility, and not least to PubMed Central for archiving, one of the big issues in taxonomic literature. The big questions will be whether the journal will break even, whether it will attract more readers, and whether the members of the society will continue their annual subscription now that one of the returns is open and freely available.

I am very glad that this development keeps its momentum. The community of Hymenoptera taxonomists has developed, and is still developing, very advanced tools that, if brought together, will make the field of Hymenoptera taxonomy very attractive. Norm Johnson's Hymenoptera Name Server and John Noyes' Chalcidoid Database are just two very large databases with well over 100,000 names and related bibliographic records included. Antbase spearheaded the development of online domain-specific digital libraries, and antweb the development of standardized digital imagery; then there are the Hymenoptera Anatomy Ontology and, not least, advanced systems like Norm Johnson's that allow XML publishing straight out of the databases (see an example here).

What is a treatment?

On the way to defining, or better, providing a description of "treatment" for the forthcoming release of taxpub, we once again stumbled, as we did many times in our meetings, over the term "treatment". To us it always seemed clear that a treatment is the scientific description of a taxon, including a Latinized name of the nominate taxon, followed by one or several elements such as references to older literature citing this taxon and putting it in relation (nov. comb., syn., etc.), a description (a verbatim morphological description; that is why the whole element is called treatment and not description), distribution (a summary of the materials cited), materials citation (including references to the original specimens or observations used for the analysis), biology, ecology, host relationships, etymology, etc.
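To make that list of elements concrete, here is a minimal sketch of a treatment as a plain data structure. It is only an illustration under stated assumptions: the class and field names are hypothetical and are not the element names of the taxpub schema, and the example values are placeholders.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Treatment:
    """Illustrative container only; field names are hypothetical, not taxpub elements."""

    taxon_name: str                                   # Latinized name heading the treatment
    literature_references: List[str] = field(default_factory=list)  # earlier citations, nov. comb., syn., ...
    description: str = ""                             # verbatim morphological description
    distribution: str = ""                            # summary of the materials cited
    materials_citations: List[str] = field(default_factory=list)    # specimens or observations examined
    biology: str = ""
    ecology: str = ""
    host_relationships: str = ""
    etymology: str = ""

    def present_sections(self) -> List[str]:
        """List the optional elements that are actually filled in."""
        return [
            f.name
            for f in fields(self)
            if f.name != "taxon_name" and getattr(self, f.name)
        ]


example = Treatment(
    taxon_name="Aus bus",  # placeholder name, not a real taxon
    description="Placeholder morphological description.",
    materials_citations=["Placeholder specimen record"],
)
sections = example.present_sections()
print(sections)
```

Only the name is mandatory in this sketch; every other element may be present or absent, which mirrors the "one or several elements" wording above.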

In earlier publications we referred to treatments as follows:

"The presentation of names or treatments of species in taxonomic literature is not individual in the sense described above. The content of these treatments may be of high scientific value, it may be singular and new, but it derives fundamental meaning only in the context of scientific conventions that have long been established and practiced. Taxonomic treatments are formulated in a highly standardized language following highly standardized criteria. They adhere to rules and pre-defined logic.
They are not "individual", nor "original" in the sense of copyright law. They are thus data, but not "works", and therefore belong to the public domain." (Agosti & Egloff, 2009)

A key feature of this literature is the taxonomic “treatment”: publications or (more frequently) sections of publications documenting the features or distribution of a related group of organisms (called a “taxon”, plural “taxa”) in ways adhering to highly formalized conventions. Some of these are over a century old and are maintained by scientific commissions accepted by the profession. Two of the most significant are the international standard for naming animals, the International Code for Zoological Nomenclature (ICZN), and the corresponding code for plants, the International Code for Botanical Nomenclature (ICBN). (Catapano, 2010)


Winston (Winston, J. E. 1999. Describing species. Practical taxonomic procedures for biologists. Columbia University Press, New York. 518 pp) uses this term only once (p. 370): "...to find a monographic treatment with a key that covered all the known species". For her, treatments are "species descriptions". She doesn't present an explicit definition (at least, I haven't found one), but discusses what belongs in it on page 83:

"However, there is a common basic structure: a heading that consists of scientific name, name, author, and date, followed by a synonymy (a list of previous references to that species), and then the main body of the description, which may include etymology, diagnosis, taxonomic discussion, ecology, and distribution sections. After the somewhat looser style of textbooks and research articles, taxonomic descriptions may at first seem mystifying, rather like a racetrack program in which each entering horse's form is described in tiny print and strange abbreviations. In fact, the standard taxonomic description bears a strong resemblance to the information given on each horse entered in a particular race, giving similar vital statistics (animal's name, parentage, date of "birth," description, and past performance) and packing a considerable amount of information about the organism into a very small space."