But this post talks about a different idea, and it is awesome. Now that they have all this geotagged pictures and they know they are in a particular neighborhood or country (WOEID) they are creating shapefiles from these areas based on the coordinates of the different pictures! Maybe is easier to explain it with a picture:
The one in the left is a London polygon and the right for United States. They were created by aggregating all the coordinates of the different pictures found in London and in the US.
Again. They take all the coordinates, POINTs, in the DB for a certain WOEID and with them they generate a polygon. If they have enough data it looks more or less ok. They have a good discussion about the quality of the polygons and the threshold set to generate them. I like very much the introduction they make to Alpha shapes and the links they provide.
Additionally they also provide the source code of the tool they use to process the data. It is called Clustr.
Now, lets apply this concept to Biodiversity, if you haven't already figure it out. Think of their geotagged pictures as PRIMARY DATA, the WOE as SCIENTIFIC NAMES and the polygons you get out of them as DERIVED SPECIES DISTRIBUTION polygons.
I have been investigating this for a long time already, specially trough the Biodiversity Atlas project that is now stopped, but hopefully will start soon. We can take their source code and apply it to GBIF data and generate "derived, unchecked and uncompleted" species distributions based on GBIF data on a massive way! And the same they try in Flickr, the more primary data we get into the system the better the distributions will start looking like.
But there is another idea... why dont ask Flickr to not only process their polygons based on WOE Ids but also on tags? So if we tag pictures in Flickr with scientific names, or better GUIDs, they can then try to generate by themselves the distributions.
I particularly dont see Flickr as the best place to handle species distributions discussions, and will, hopefully, try to convince at least one big biodiversity project to let me try this way. Most of you can probably imagine the incredible API we can create once we have a lot of species distributions accessible in such a system. I will write another post about it soon, but think of:
- Which species could live in my garden?
- Which species habitats this new road will cross?
- Where should I create a new Protected Area to preserve as much biodiversity as possible?
- What species could I find in the track I will hike this weekend?
So maybe instead of so much niche modeling projects we should start thinking on how to manage the vast amount of primary data (and other sources) we have already and how to curate it and complete it. I dream of a scientific community joined together to create a complete information system to know where species are. Imagine like a Wikipedia but for Species Distributions.