Friday, October 24, 2008

Identifying good images on Google cache for scientific names

I have been working on ways to represent taxonomic trees more intuitively for non-biologists. The best approach I have found so far is to show a contextual image together with the name. This helps a lot, especially at higher ranks, where most of the names are unfamiliar to most people, me included.
The problem is where to get images for all scientific names. The best source I have managed to find is the Google AJAX API, which returns the same content you get when you search for images on Google. The service is fast and reliable, but it has one problem: it is not content aware. They are trying (check this and, much better, this), but it is still not there. So sometimes the results you get back just don't make any sense. There are some very pornographic examples of this, but to keep it kid-friendly, check the phylum Labyrinthulomycota. The first image you will get is of this nice researcher:
Well, that is not much help for getting an idea of what this phylum is about. So for a while I had been thinking of building a little application where people can help me select pictures that actually convey what is behind a cold scientific name. Today I had a little bit of time and wanted to deploy something on AppEngine.

So here I am presenting a very simple application to ask for collaboration on this task. There are only 13 million names that I need to find a picture for, but I think I have enough friends :D

The application is very simple and I have not added any visual effects, but I just wanted to give it a try.
The rankings I get from this will be released soon through an API for anybody else to use.

Things I would like to incorporate:
1) Make it a game, more precisely a GWAP ("game with a purpose"), like Google Image Labeler or any other toy from Luis von Ahn. (I have wanted to link to his site for sooo long.)

2) Allow multiple rankings. Right now each name only gets evaluated once, so a single malicious contributor can ruin all this.

3) More sources, especially Flickr.

4) Nice UI so that Sergio, our new blogger here, is happy :)
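
To make point 2 a bit more concrete, here is a minimal sketch of how several reviews per name could be combined so that one bad vote does not decide the result. It is not the code running in the application. The vote format (name, image URL, good or not), the function name best_images and the min_votes threshold are all assumptions made up for this illustration.

```python
from collections import defaultdict


def best_images(votes, min_votes=3):
    """Pick one image per name from many reviews (hypothetical helper).

    votes: iterable of (name, image_url, is_good) tuples, an assumed format.
    An image is only accepted once it has at least min_votes reviews and
    more than half of them are positive.
    """
    # (name, image_url) -> [positive reviews, total reviews]
    tally = defaultdict(lambda: [0, 0])
    for name, url, is_good in votes:
        counts = tally[(name, url)]
        if is_good:
            counts[0] += 1
        counts[1] += 1

    best = {}  # name -> (image_url, share of positive reviews)
    for (name, url), (good, total) in tally.items():
        if total < min_votes:
            continue  # not enough reviews yet to trust this image
        score = good / total
        if score <= 0.5:
            continue  # the majority did not find the image helpful
        if name not in best or score > best[name][1]:
            best[name] = (url, score)

    return {name: url for name, (url, _score) in best.items()}
```

With a rule like this, a single malicious reviewer can delay a decision but cannot pick the image alone. The threshold of three reviews is arbitrary and would need tuning against real data.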

When I get a decent amount of reviews from people I will post some stats and an example application.

Come on, send the link to everybody and help everybody understand scientific names better!

onur güngör said...

Hmm. That's a promising implementation of human computation. I too work on implementing a game which incorporates human computation.

I would like to communicate with you about the subject.

I think you can message me using my gmail handle.