Wednesday, June 25, 2008

Convex Hull over GBIF "points"

I found yesterday a post where Convex Hull was explained and the source code made available for Javascript and PHP. "The convex hull may be easily visualized by imagining an elastic band stretched open to encompass the given points; when released, it will assume the shape of the required convex hull." (from wikipedia). I talked about it months ago with Tim and I thought that would be fun to adapt it to AS3 and use it in BiodiversityAtlas. Months ago we talked that one way to create "polygons" from gbif straight points was to do a Convex Hull over them.
But when finished I realized it might dont make much sense... the convex hull over GBIF points always display enormous polygons over land and water that dont look very nice. Look at the next picture.
This is Puma concolor convex hull over GBIF data. Well this is exactly what a convex Hull is so I am not sure what I was expecting.

There is other types of convex hull like the Orthogonal convex hull that might make more sense in some scenarios, but in general this does not look good to me.

By the way, the convex hull algorithm i am using, quickhull, is amazingly fast.

If you want to try for yourself I have uploaded the test app for you to try. If anybody is interested in the source code let me know and I will post it.


Tim Robertson said...

[12:54:36 PM] Tim says: I was wondering about this the other day... the only thing I could come up with is a simple grouping of cells from the cell density, and then doing multiple convex hulls on groups of points from cells.
So basically starting with
1) group all cells into groups of adjacent cells with a pretty simple algorithm (cell_id2-cell_id1 = 1 and the same for the verticals)
2) for each group, run the convex hull on the points within the cells

[12:55:57 PM] Javier de la Torre says: yeah, but you also add then the problem of inner rings
[12:56:07 PM] Tim says: right - i forgot 3
[12:56:19 PM] Tim says: 3) remove inner rings :D
[12:56:22 PM] Javier de la Torre says: so I am not sure of this is a better solution than what we already have
[12:57:31 PM] Javier de la Torre says: and in some cases it would work very bad... with distributions, local, that have like two "centers" and a little corridor connecting them... convex hull will enmascarate this...
[12:57:31 PM] Tim says: is there an opposite of convex hull? Convex hull is like a big rubber band that shrinks around the points right? Is there one that expands until it reaches points? If so, we could easily find holes that are bigger than 1 degree cell from the cell groups
[12:57:50 PM] Javier de la Torre says: yep.ortogonal convex hull
[12:57:56 PM] Tim says: Yeah, like an invasives corridor...
[12:58:14 PM] Tim says: so we can do doughnuts if there is an inverse converse hull thing
[12:58:44 PM] Tim says: but for corridors... I guess you would need to get clever on the cells and detect corridors...
[12:58:45 PM] Javier de la Torre says: is that what you are looking for? -->
[12:59:12 PM] Tim says: hmm no
[12:59:16 PM] Javier de la Torre says: look at the dots outside of the main area..
[1:00:03 PM] Tim says: that would handle corridors, not doughnuts
[1:00:40 PM] Javier de la Torre says: Im not sure if this would get any better to the actual grid areas we construct...
[1:01:23 PM] Javier de la Torre says: we can smooth the rectangles we create to make them look better... but any other algorithm like you are proposing looks too much for me...
[1:01:33 PM] Javier de la Torre says: then I would rather go to localized niche modelling around the dots
[1:03:12 PM] Tim says: hmmm... localised niche modelling I have to be convinced of
[1:04:51 PM] Javier de la Torre says: I dont like none of them... so I would not try to create automatic distributions from primary data... i rather leave it in a grid... construct polygons from this and use it together with other sources... so I consider this test to be a fail.
[1:05:30 PM] Tim says: Ok... cells are 111Km sqaures at most - this is probably ok globally. Perhaps centi cells?
[1:05:37 PM] Tim says: for zoomed in?
[1:06:06 PM] Javier de la Torre says: centi and even smaller
[1:06:30 PM] Javier de la Torre says: i think people should be able to define their grid at the level they want.
[1:07:02 PM] Tim says: yeah... map reduce to the grid of your choice ;)
[1:07:03 PM] Javier de la Torre says: i will write today another post today about a grid editor at different scales.

Stewart Macdonald said...

So did you guys ever come up with a way to generate species distribution maps that is better than a simple convex polygon? I'm working in this area at the moment with Australian taxa, but I'm struggling to get some decent working. Generating these maps by eye is very easy, but automating it and taking into account outliers and geographical barriers is next to impossible!


Tim Robertson said...

Hi Stewart,
We have not really been looking at this too much, as all this is a side project for us and we have both been busy with work - the next thing I want to try is the AlphaShape algorithm. It is basically a ConvexHull that has a maximum vertex length - thus any line in the polygon that is longer than the maximum length is split and you end up with several polygons from the points instead of a single one. It won't handle 'doughnuts' very well, but would be better for the Puma concolor one shown.
Javi and I are both always happy to chat about ideas if you want to grab us on skype. Are you working with the Atlas of Living Australia project by any chance?

Javier de la Torre said...

I have fun trying these things, but as I already declared I am not sure if this is the best way to go. I mean, the idea itself, trying to create distribution polygons from spread points I am not sure if it can be generalized, even if it look obvious sometimes in our eyes.

I think right now that there are two possible ways to handle this problem, specially for world wide distributions as in GBIF:

1) Grid System: Try to be totally agnostic. This way noone can complain that the distribution is wrong as it is just a visualization representation and that only depend on the precision you have for the data. Of course that can introduce errors, but somehow are obvious from the system... the user is aware of inconsistencies due to the method.

2) Localized Niche Modeling: If you want to be smarter, better to be based on something that is related to biology and not to pure geometric algorithms. By localized I mean doing very limited niche modeling only around groups of points, therefore not introducing errors like distributions not considering physical barriers.

In any case Tim, do you have any link to this AlphaShape thing? Would be fun to try it.

Tim Robertson said...

I guess for somewhere like Australia one might have enough environmental data to do niche modeling... but I would imagine you need to do some kind of grouping of taxa types, and then select the environmental params specific to the group. is a nice approach - nice modeling on 0.5 degree grids, that are then verified by people, who can add additional rules (e.g. does not lie above 55 degrees north, or does not cross this boundary).

AlphaShapes... not link, and difficult to find info. The world database on protected areas guys gave me some code (VB I think) to look at. I can send it on if you mail me Stewart - timrobertson100 on gmail.

Stewart Macdonald said...

Thanks for the replies. I like the grid idea. Tim, I'll email you 'off-list'.