[Thuban-list] visual classifier
putler at sauder.ubc.ca
Thu Dec 4 23:38:51 CET 2003
The "natural breaks" method as done in ArcView is also known as Jenks Method. One citation that might help in your literature search (given the references it contains) is Jenks, George F. and Fred C. Caspall (1971), "Error in Choroplethic Maps: Definition, Measurement, Reduction," Annals of the Association of American Geographers 61 (2, June), 217-244.
Jenks' method is actually based on a method proposed by Walter Fisher. The cite to Fisher's work is Fisher, Walter D. (1958), "On Grouping for Maximum Homogeneity," American Statistical Association Journal, December. Fisher's method turns out to be pretty computationally intensive, so my guess is that ESRI doesn't actually use it in ArcView. Ultimately, "natural breaks" amounts to single-variable cluster analysis. Given this, a K-Means cluster analysis algorithm could be used. Another approach (used as an alternative to Fisher's method in xlstat, a set of add-in multivariate statistics macros for Excel) is an algorithm proposed by Anderberg (Anderberg, M.R. (1973), Cluster Analysis for Applications, Academic Press, New York). A benefit of using K-Means to create the "natural breaks" is that the SciPy package already contains a K-Means function written in Python. However, it may require bringing a lot of baggage into Thuban.
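To make the K-Means idea concrete, here is a rough sketch of how single-variable K-Means could produce break points without pulling in SciPy. It is only an illustration under my own assumptions, not what ArcView or xlstat do and not Fisher's exact method: the function name kmeans_breaks, the choice of starting centres (evenly spread over the sorted distinct values), and the rule of placing each break halfway between adjacent cluster centres are all hypothetical choices of mine.

```python
def kmeans_breaks(values, k, max_iter=100):
    """Return k-1 break points for 1-D data using plain K-Means (Lloyd) iterations.

    Hypothetical helper for illustration only. K-Means finds a local
    optimum, so the result can differ from Fisher's optimal grouping.
    """
    data = [float(v) for v in values]
    distinct = sorted(set(data))
    if k < 1 or k > len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")

    # Assumption: start from k distinct data values spread evenly over the
    # sorted distinct values, so every cluster begins with at least one member.
    m = len(distinct)
    centres = [distinct[int((i + 0.5) * m / k)] for i in range(k)]

    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest centre.
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)

        # Update step: move each centre to the mean of its members.
        # An empty cluster simply keeps its old centre (a simplification).
        new_centres = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centres)]
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres

    centres.sort()
    # Each break is the midpoint between two adjacent cluster centres.
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


if __name__ == "__main__":
    print(kmeans_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3))
```

For the three obvious clumps in the sample data the sketch returns breaks at 6.5 and 31.0. Because the result depends on the starting centres, a real classifier would probably want several restarts or a smarter initialisation.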
Just some thoughts. As the level of detail suggests, I've been thinking about this. I wrote two classifiers for OpenEV and wanted to include a natural breaks classifier. However, I haven't had time to work on it, and don't think I will for a while.
*********** REPLY SEPARATOR ***********
On 12/4/2003 at 12:47 PM Daniel Calvelo wrote:
>> When you said:
>> > and a set of triangular markers indicating
>> > breaks. These markers can be deleted (double-click) and added
>> > (single-click on new location). They represent cut points from which a
>> > of ranges will be generated.
>> does that mean that these markers are set automatically at the
>> beginning ? How do you define these breaks ?
>Yes, they are set initially as evenly spaced. I looked hard at how you
>initialize them using the other classifiers, but the infrastructure ties
>closely the classifiers with the class definitions, and I couldn't find an
>elegant, unobtrusive solution.
>My idea was to have a permanent View of the classifier (in the cute world,
>would also have visual cues for the corresponding drawing style), in which
>markers could be gridded, reset to automatic settings (quantiles, even
>"natural" breaks,...), hand-tuned either visually or by input, and so on. A
>kind of classification studio if you wish. Generating the theme classes
>be either a side-effect or an extra action for this subsystem.
>I got stuck by wxWindows too. I couldn't produce something correct that
>me sliders that I could drag and drop. Could wx-savvy people give any hint
>I also explored natural breaks. I found very scarce literature on the
>From what I gathered, natural breaks are based on a kernel approximation
>the data distribution. From previous experience, programming these things
>needs tuning and lots of special casing. If you have ideas on the subject,
>they are most welcome.
Sauder School of Business
The University of British Columbia
Email: putler at sauder dot ubc dot ca