[Thuban-list] Patch: classifiers

Daniel Calvelo Aros dcalvelo at minag.gob.pe
Mon Feb 9 07:58:40 CET 2004


Hi all.

The attached patch adds four classifiers to Thuban's existing three.

1. "Custom": a debugged version of my earlier "visual classifier". I
found it quite useful alongside the current classification window and
*after* another classifier has been used. It starts with the current
classification breaks and lets you edit them. I used a very ugly hack to
do that. Could knowledgeable people take a look and advise? Note that it
only works if you classify, then *close* the classgen dialog and reopen it.

2. "Distribution discontinuities": this is based on a very nice idea and code
from J.P. Grimmeau of the ULB (thanks Moritz!). It automatically finds
discontinuities in the data distribution and selects breaks accordingly. I'm
not sure about the proper way to give credit, though.

3. "Distribution classes": this is based on a k-means classification of the
data, followed by a simple break-finding rule (nearest centroid). It also
fixes the number of classes although an initial value is required.

4. "Optimized distribution classes": an extra optimization pass after the
previous clustering. The optimization is a simulated annealing applied with a
target function based on Jenks & Caspall (1971). It's not really useful for
now, but all the hooks are there to use it as a "natural breaks" finder.
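
To make the idea behind item 2 concrete, here is a minimal sketch that
places breaks in the widest gaps between sorted values. It is a stand-in for
illustration only, not J.P. Grimmeau's method and not the code in the patch.
The function name gap_breaks and the n_breaks parameter are hypothetical; a
truly automatic version would also have to decide how many gaps count as
discontinuities.

```python
def gap_breaks(values, n_breaks):
    """Return n_breaks break values placed in the widest gaps of the data.

    Assumption: a "discontinuity" is simply a large gap between two
    neighbouring distinct values. The real algorithm may differ.
    """
    data = sorted(set(values))
    if n_breaks < 0 or n_breaks >= len(data):
        raise ValueError("n_breaks must be in [0, number of distinct values)")
    # One (gap width, midpoint) pair per pair of neighbouring values.
    gaps = [(b - a, (a + b) / 2.0) for a, b in zip(data, data[1:])]
    # Keep the widest gaps and report their midpoints in ascending order.
    widest = sorted(gaps, reverse=True)[:n_breaks]
    return sorted(mid for _, mid in widest)
```

For example, gap_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 2) puts one break
between 3 and 10 and another between 12 and 50.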

Since the last three algorithms are pretty sophisticated for the average GIS
user (AFAICT), I included a short description in the interface itself.
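
For readers who prefer code to a description, item 3 can be sketched roughly
as follows: a one-dimensional k-means, then breaks taken as midpoints between
neighbouring centroids, which is what a nearest-centroid rule amounts to on a
line. This is a sketch under assumptions, not the patch's code: the names
kmeans_1d and breaks_from_centroids are hypothetical, and the quantile-based
starting centroids are just one possible choice.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar data; returns the centroids, sorted."""
    data = sorted(values)
    if k < 1 or k > len(data):
        raise ValueError("k must be between 1 and len(values)")
    # Assumed initialisation: centroids at evenly spaced quantiles.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / float(len(c)) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return sorted(centroids)


def breaks_from_centroids(centroids):
    """Nearest-centroid rule: each break is the midpoint of two neighbours."""
    ordered = sorted(centroids)
    return [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]
```

A value below a break is closer to the lower centroid and a value above it is
closer to the upper one, so k centroids always yield k - 1 breaks.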

I tried to keep the algorithms generic, so they could be used elsewhere.
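
As an illustration of what a generic, Thuban-independent version of the
optimization pass in item 4 could look like, here is a small simulated
annealing sketch. Everything in it is an assumption: the cost function (sum
of absolute deviations from each class mean) is only a stand-in for a
Jenks & Caspall style target function, the linear cooling schedule and step
size are arbitrary, and the names class_cost and anneal_breaks are
hypothetical. It expects at least one break.

```python
import math
import random


def class_cost(data, breaks):
    """Sum of absolute deviations of every value from its class mean."""
    bounds = [float("-inf")] + sorted(breaks) + [float("inf")]
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        members = [v for v in data if lo <= v < hi]
        if members:
            mean = sum(members) / float(len(members))
            total += sum(abs(v - mean) for v in members)
    return total


def anneal_breaks(data, breaks, steps=2000, start_temp=1.0, seed=0):
    """Perturb one break at a time; return (best breaks, best cost)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    span = max(data) - min(data)
    current = list(breaks)
    current_cost = class_cost(data, current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Linear cooling; stays strictly positive because step < steps.
        temp = start_temp * (1.0 - step / float(steps))
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-0.05, 0.05) * span
        cost = class_cost(data, candidate)
        delta = cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
    return sorted(best), best_cost
```

Starting from the breaks produced by a clustering step, the returned cost is
never worse than the starting cost, because the best solution seen so far is
tracked separately from the current one.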

The zipped file includes a patch (mainly for both classgen.py files) and some
new files.

Please give it a try and I will do my best to improve the thing.

Some notes for the more guts-savvy people: these classifiers need to load the
entire dataset into memory to massage it properly, which can become pretty
slow on large datasets. Are there plans to allow access to large datasets (I
mean, larger than available Python memory)? I'm sure many things in the code
can be adapted to progressive calculations and other tricks.
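
One example of such a progressive calculation is computing summary statistics
in a single pass over chunks of values, so the whole dataset never has to sit
in memory. The sketch below uses Welford's online update for mean and
variance. The names RunningStats and summarize are hypothetical and are not
part of Thuban or of the patch; how Thuban would deliver the chunks is left
open.

```python
class RunningStats(object):
    """Single-pass count, min, max, mean and variance (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = None
        self.maximum = None

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value

    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


def summarize(chunks):
    """Consume any iterable of chunks (lists, generators of rows, ...)."""
    stats = RunningStats()
    for chunk in chunks:
        for value in chunk:
            stats.update(value)
    return stats
```

Statistics like these are enough for classifiers that only need the range or
spread of the data; methods that need the full sorted distribution would
require other tricks, such as sampling or histogram binning.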

BTW, has anybody thought about interfacing Thuban with Terralib? Lots of
interesting algorithms in there!

Daniel.

PS. I couldn't manage to reverse my previous patch for HSV ramps, so it's also
included here. Sorry for the inconvenience!

PPS. Still working on getting "natural breaks" to work...

-- Daniel Calvelo Aros
-- Dirección General de Información Agraria
-- Ministerio de Agricultura del Perú
-- (51-1)424-9001

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hsv+class.zip
Type: application/zip
Size: 13952 bytes
Desc: not available
Url : http://www.intevation.de/pipermail/thuban-list/attachments/20040209/cb33d901/hsvclass.zip

