Friday, July 3, 2009

Biodiversity databases and OGC standards don't play well together


Some days ago Tim and I had some discussions about how to provide OGC services for biodiversity databases, such as the Global Registry of Migratory Species. This time the discussion started again because of the discovery of the new INSPIRE Geoportal Viewer. For those who don't know it, the INSPIRE directive is pushing the creation of a common infrastructure for sharing geospatial data within Europe. The plan is to do that using open standards like the ones from the Open Geospatial Consortium (OGC).

The typical use case described for these Spatial Data Infrastructures (SDIs) is a registry of services (a Catalog Service) where a user can find geospatial data. The data is available through web services, like Web Map Service (WMS) or Web Feature Service (WFS). With a web or desktop GIS client you discover and connect to the services, display different layers, do some analysis, print results, etc. All by using open standards and mixing data from lots of different places.

The INSPIRE registry and Geoportal Viewer are a typical example. In the viewer you can select from a list of available services (registered in the catalog service) or enter your own WMS service URL.

When you select a service, you get a list of available layers on it. For example in the Spanish SDI you get this:
The way this works internally is by sending a GetCapabilities request to the WMS server. The returned XML document lists the available layers, with metadata on how to query them, who owns them, etc.
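
As a rough illustration of what a generic client does at this step, the sketch below builds a GetCapabilities URL and pulls the layer names out of a response. The endpoint URL and the sample XML are invented for the example (a real WMS 1.1.1 response carries far more metadata, and WMS 1.3.0 adds an XML namespace), and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def capabilities_url(base_url, version="1.1.1"):
    """Build a WMS GetCapabilities request URL for the given endpoint."""
    params = {"SERVICE": "WMS", "VERSION": version, "REQUEST": "GetCapabilities"}
    return base_url + "?" + urlencode(params)


def layer_names(capabilities_xml):
    """Return the <Name> of every named <Layer> in a WMS 1.1.1 capabilities document."""
    root = ET.fromstring(capabilities_xml)
    return [layer.findtext("Name") for layer in root.iter("Layer")
            if layer.find("Name") is not None]


# Hypothetical, heavily trimmed response; a real one also lists formats, SRS, etc.
SAMPLE_CAPABILITIES = """
<WMT_MS_Capabilities version="1.1.1">
  <Capability>
    <Layer>
      <Title>Example SDI</Title>
      <Layer><Name>rivers</Name><Title>Rivers</Title></Layer>
      <Layer><Name>roads</Name><Title>Roads</Title></Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>
"""

print(capabilities_url("http://example.org/wms"))
print(layer_names(SAMPLE_CAPABILITIES))  # ['rivers', 'roads']
```

The client shows exactly this list to the user, which is why one layer per species does not scale.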

But what happens when you have a database like the GBIF cache with 1.8 million species? You cannot create a layer per species, or the GetCapabilities document will be megabytes and megabytes, impossible for any client to parse. In any case, who wants to be given a list of 1.8M species to select a layer from?
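
To get a feel for the size, here is a back-of-the-envelope estimate. It assumes a deliberately minimal, made-up layer entry (just a name and a title). Real entries also carry bounding boxes, styles and SRS lists, so an actual document would be considerably larger.

```python
# Hypothetical minimal layer entry; real capabilities entries are much longer.
LAYER_TEMPLATE = "<Layer><Name>species_{id}</Name><Title>Species {id}</Title></Layer>\n"


def estimated_size_mb(n_layers):
    """Estimate the layer-list size in MB (10^6 bytes) for n_layers entries.

    Uses the widest id for every entry, so it slightly overestimates
    the total for this minimal template.
    """
    entry = LAYER_TEMPLATE.format(id=n_layers)
    return len(entry.encode("utf-8")) * n_layers / 1_000_000


print(f"{estimated_size_mb(1_800_000):.0f} MB")
```

Even with this stripped-down entry the layer list alone comes out at well over 100 MB.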

Well, the way to make it work is to add a filter to the WMS request, specifying for example the species_id you are interested in. But those generic clients do not support specifying this kind of filter.
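
A minimal sketch of such a filtered request follows. The endpoint, the layer name and the `species_id` property are placeholders, and `FILTER` is a vendor-specific parameter (GeoServer accepts it) rather than part of core WMS, which is exactly why generic clients do not know about it.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def species_filter(property_name, species_id):
    """Return an OGC Filter XML fragment matching one species identifier."""
    return (
        "<Filter><PropertyIsEqualTo>"
        f"<PropertyName>{property_name}</PropertyName>"
        f"<Literal>{species_id}</Literal>"
        "</PropertyIsEqualTo></Filter>"
    )


def getmap_url(base_url, layer, species_id,
               bbox=(-180, -90, 180, 90), size=(256, 256)):
    """Build a WMS GetMap URL restricted to a single species."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": "image/png",
        # Vendor-specific parameter; "species_id" is a hypothetical attribute name.
        "FILTER": species_filter("species_id", species_id),
    }
    return base_url + "?" + urlencode(params)


url = getmap_url("http://example.org/wms", "demo:occurrences", 13839800)
print(url)

# Round-trip check: the filter survives URL encoding intact.
print(parse_qs(urlparse(url).query)["FILTER"][0])
```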

To me that means that current OGC clients, like the ones used by INSPIRE, GEOSS, national SDIs, etc., are not able to handle biodiversity OGC services. Or, to say it in a different way, OGC services are not prepared to handle biodiversity databases with lots of species.

What are the possibilities?

1) OGC supports in its capabilities documents things like "Hey! I am not a service with a set of layers, I am a datastore with potentially millions of layers. So if you want to grab anything from me, you are going to have to provide a filter in your request". This implies that OGC does some work and, more importantly, that software clients support that work. I do not think this will happen within the next few years.

2) Create a set of interesting biodiversity layers. We could match our biodiversity databases against the IUCN list of endangered species, create richness maps, etc., but access to primary data per species would not be possible.

If you think about the potential customers, we should probably be thinking about a predefined list of layers that we could all create on our OGC services and that might be interesting for lots of people. Richness, endangered species, kingdoms, whatever...

Another possibility is that we create portals where the user filters for a species and then gets a "customized, dynamic GetCapabilities document" that includes the filter in the URL. That would be possible. But with Catalog Services, like GEOSS, where there will be thousands of services, is biodiversity going to be so special that users will go to one of our websites before continuing in their wonderful world of web services workflows? I doubt it.

Next week I am going to Geoweb 09 as an invited speaker to talk about biodiversity and the challenges of sharing it on the Geoweb. I would love to hear what you think about using OGC services within our community, or about any other issue related to geospatial data and analysis.

7 comments:

Donald Hobern said...

Anything short of a full model for filter parameters will not meet all cases - a user might only be interested in records of frogs from the 1970s.

However - just speaking off the top of my head - wouldn't a hierarchical catalogue model be an option?

The top-level catalogue could indicate that GBIF offers a catalogue of biodiversity data layers. That catalogue includes a layer for all data and a catalogue service for each kingdom. Each kingdom catalogue offers a layer for all data for that kingdom and a catalogue service for each phylum. Etc.

A client could then open nested catalogue requests to allow a user to tunnel to the desired layer.

Of course that might be just as big a change to the OGC standards, but it seems a plausible approach.
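
The nested-catalogue idea can be sketched as a simple tree that a client walks one level at a time. The class, the layer naming scheme and the sample taxa are illustrative assumptions only; nothing like this exists in the OGC catalogue standards.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """A catalogue offering one layer for its taxon plus child catalogues."""
    taxon: str
    layer: str
    children: dict = field(default_factory=dict)

    def add(self, taxon):
        """Register a child catalogue for a lower-rank taxon and return it."""
        child = Catalogue(taxon, f"{self.layer}/{taxon}")
        self.children[taxon] = child
        return child


def tunnel(root, path):
    """Follow taxon names down from the root and return the layer reached."""
    node = root
    for name in path:
        node = node.children[name]
    return node.layer


root = Catalogue("all", "gbif")
animalia = root.add("Animalia")
chordata = animalia.add("Chordata")
chordata.add("Amphibia")

print(tunnel(root, ["Animalia", "Chordata", "Amphibia"]))
# gbif/Animalia/Chordata/Amphibia
```

Each request only returns one level of the tree, so no single response has to list every species.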

sean.gillies@mac.com said...

In GIS parlance, your species (and their ranges, etc) would be features, not feature *types*, yes? I see all kinds of problems with the OGC's architecture, but I don't think you have one here.

Javier de la Torre said...

Yes. The whole species database would be a FeatureType and each species would be a Feature. In DB terms there is a very big table with millions of species records, each with a POINT geometry.

Of course you can share this data like this using OGC (well, with some scaling issues I will comment on in another post). Check out http://geoserver.gbif.org/ for an example of that. There you can find one feature type called gbifDensityLayer. This layer has 1.8 million species features behind it. For users to make sense of it, they have to specify in their request the species they are interested in. Without this filter it does not make sense to query this feature type.
For example here is a request:
http://geoserver.gbif.org/wms?LAYERS=gbif%3AgbifDensityLayer&VERSION=1.0.0&TRANSPARENT=true&FORMAT=image%2Fpng&FILTER=(%3CFilter%3E%3CAnd%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Etype%3C%2FPropertyName%3E%3CLiteral%3E1%3C%2FLiteral%3E%3C%2FPropertyIsEqualTo%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Econcept%3C%2FPropertyName%3E%3CLiteral%3E13839800%3C%2FLiteral%3E%3C%2FPropertyIsEqualTo%3E%3C%2FAnd%3E%3C%2FFilter%3E)&SERVICE=WMS&REQUEST=GetMap&STYLES=&EXCEPTIONS=application%2Fvnd.ogc.se_inimage&SRS=EPSG%3A4326&BBOX=-180,-90,0,90&WIDTH=256&HEIGHT=256

Inside the URL there is a filter using the attribute "concept" of the feature type. How would you know that this is the kind of required filter that you have to use when accessing this datastore in a "generic" client? This information is not exposed in the capabilities document.
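
To see what the request above actually asks for, the FILTER parameter can be decoded and parsed. The sketch below uses a shortened copy of that URL (only the LAYERS, REQUEST and FILTER parameters are kept) and assumes the surrounding parentheses are just wrappers around one filter.

```python
from urllib.parse import urlparse, parse_qs
import xml.etree.ElementTree as ET

# Shortened copy of the request above; the FILTER value is unchanged.
URL = (
    "http://geoserver.gbif.org/wms?LAYERS=gbif%3AgbifDensityLayer&REQUEST=GetMap"
    "&FILTER=(%3CFilter%3E%3CAnd%3E"
    "%3CPropertyIsEqualTo%3E%3CPropertyName%3Etype%3C%2FPropertyName%3E"
    "%3CLiteral%3E1%3C%2FLiteral%3E%3C%2FPropertyIsEqualTo%3E"
    "%3CPropertyIsEqualTo%3E%3CPropertyName%3Econcept%3C%2FPropertyName%3E"
    "%3CLiteral%3E13839800%3C%2FLiteral%3E%3C%2FPropertyIsEqualTo%3E"
    "%3C%2FAnd%3E%3C%2FFilter%3E)"
)

# parse_qs percent-decodes the values; strip the wrapping parentheses.
query = parse_qs(urlparse(URL).query)
filter_xml = query["FILTER"][0].strip("()")

# Collect each equality condition as property name -> literal value.
root = ET.fromstring(filter_xml)
conditions = {
    cond.findtext("PropertyName"): cond.findtext("Literal")
    for cond in root.iter("PropertyIsEqualTo")
}

print(conditions)  # {'type': '1', 'concept': '13839800'}
```

Both property names, and the fact that they are required, are knowledge the client has to bring with it.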

Javier de la Torre said...

Hi Donald,

I am not sure I understand your proposal. It would be as if the WMS/WFS/WCS services became a kind of catalog service too. That sounds more complicated to add to the stack, but it would definitely be a solution. I would propose something simpler, as a start.

In the capabilities you express the kinds of queries you accept, including free-text searches, ranges and selection between choices (single or multiple choice). It would not be possible to provide browsing capabilities as you propose, but it would not be that complicated. Additionally, the capabilities would carry text describing the searchable fields, so that clients can create nice interfaces that the user can understand.
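
A sketch of what such a declaration could look like on the client side, once parsed out of the capabilities. Everything here (the class, the field names, the three query kinds as strings) is a hypothetical illustration of the idea, not an existing OGC extension.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueryableField:
    """One searchable field a service advertises in its capabilities."""
    name: str
    description: str          # human-readable text for building the interface
    kind: str                 # "free_text", "range" or "choice"
    choices: Sequence[str] = ()
    multiple: bool = False    # whether several choices may be selected
    required: bool = False

    def accepts(self, value):
        """Check that a user-supplied value fits this field's query kind."""
        if self.kind == "free_text":
            return isinstance(value, str) and value != ""
        if self.kind == "range":
            low, high = value
            return low <= high
        if self.kind == "choice":
            values = value if self.multiple else [value]
            return all(v in self.choices for v in values)
        raise ValueError(f"unknown kind: {self.kind}")


FIELDS = [
    QueryableField("scientific_name", "Scientific name of the species",
                   "free_text", required=True),
    QueryableField("year", "Year the record was collected", "range"),
    QueryableField("kingdom", "Taxonomic kingdom", "choice",
                   choices=("Animalia", "Plantae", "Fungi"), multiple=True),
]

print(FIELDS[1].accepts((1970, 1979)))             # True
print(FIELDS[2].accepts(["Animalia", "Rocks"]))    # False
```

With a description like this a generic client could build a search form, for example for records of frogs from the 1970s, without knowing anything about biodiversity.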

Maybe we can add some of this stuff as extensions to the standards. At the beginning it might be that only we support it, but we can work on the open source clients to support it. Then maybe push OGC to accept those extensions... well, I am not an OGC process expert... but I will try to find one here in Vancouver this week.

Donald Hobern said...

Javier,

My suggestion was actually that the catalog service should allow you to register other catalog services as well as WFS/WMS/WCS services and that this would allow you to have nested catalogs, which could correspond to nested taxa.

Javier de la Torre said...

Ahh Donald, I see. But that would mean registering taxonomies in the catalog services. If there is a change in the taxonomy, the catalog services will have to be updated... I am not sure catalog services are the best place to handle this. I think they would quickly become out of date, no?

I have lunch today with people from Galdos to talk about registries. I don't know much about them, so I am very curious about what they offer. In the end, taxonomy should be a big registry, no? :)

sean.gillies@mac.com said...

Javier, I see where you're coming from now. Are you open to an approach that comes from the web instead of the OGC? You could use an OpenSearch description document:

http://www.opensearch.org/Specifications/OpenSearch/1.1#OpenSearch_description_document

There remains a problem, of course: how do OGC clients find the OpenSearch doc from your service's capabilities? From HTML, you'd use a link with rel="search", but there's no such thing for OGC services.
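
For illustration, a minimal OpenSearch 1.1 description document for a species-filtered map service could be generated like this. The service name, the endpoint and the `species` query parameter in the template are made up for the example; only the element names and the `{searchTerms}` placeholder come from the OpenSearch specification.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr


def description_document(short_name, description, template, mime_type="image/png"):
    """Return a minimal OpenSearch 1.1 description document as a string."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">\n'
        f"  <ShortName>{escape(short_name)}</ShortName>\n"
        f"  <Description>{escape(description)}</Description>\n"
        f"  <Url type={quoteattr(mime_type)} template={quoteattr(template)}/>\n"
        "</OpenSearchDescription>\n"
    )


def fill_template(template, search_terms):
    """Substitute the user's search terms into the URL template."""
    return template.replace("{searchTerms}", quote(search_terms))


# Hypothetical endpoint and parameter name.
TEMPLATE = "http://example.org/wms?SERVICE=WMS&REQUEST=GetMap&species={searchTerms}"

print(description_document("GBIF species", "Map of records for one species", TEMPLATE))
print(fill_template(TEMPLATE, "Rana temporaria"))
```

A client that found this document would know how to turn a species name into a request, which is the piece the capabilities document does not provide.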