TEP 001: Fully switch to unicode (was: [Thuban-list] Problem with Thuban in Ubuntu)

Bernhard Reiter bernhard at intevation.de
Wed Apr 25 12:03:44 CEST 2007


On Wednesday 25 April 2007 11:57, Bram de Greve wrote:
> pyshapelib in my WIP branch is currently supporting Unicode at its
> interface: Unicode strings are used for filenames, fieldnames and string
> content.  So if you're going for all unicode internally, now might be a
> good time to merge the branch with the trunk?

We are still discussing it, but if you think the new pyshapelib is ready,
it must be considered.
I take it that it could still be decoupled, so that pyshapelib would still
use non-unicode strings the way Thuban uses them.

> However, I must emphasize that the transformation is not yet complete and
> that its interface is not entirely stable yet.  But it doesn't have to be a
> problem for this.  Here are the open issues:
> - pyshapelib does not support UTF-8 shapefiles (or rather DBF files) yet as
> specified by ESRI (see bottom of:
> http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&d=21106 ).
> - I don't have a UTF-8 shapefile made by ArcGIS to test pyshapelib.

This should be easy to remedy: you could ask for an example file on
thuban-list (to reach more users) or on
http://intevation.de/mailman/listinfo/freegis-list

> - To my understanding, to support UTF-8, ESRI does not use the Language
> Driver ID (LDID) specified in the header of DBF file, but rather an
> external CodePage file with the extension .cpg.  This is where the biggest
> changes in dbflib will happen.
> - Supporting that CodePage will probably alter the interface of DBFFile a
> bit, but this will mostly concern some read-only attributes that tell the
> used encoding.  These probably won't be needed by Thuban anyway since
> Thuban will blindly use Unicode strings.
> - There's one point in the interface that might change and have an
> influence on Thuban and that's on the creation of a DBFFile where you will
> have to specify the encoding.

If this is the only change, this does not look like a problem, because
you probably will give this encoding attribute a default value (of 
latin-1). ;)
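
To make the idea concrete, here is a minimal sketch of how the encoding
could be resolved: an explicit argument wins, then a .cpg file next to the
DBF file, then the latin-1 default. This is not pyshapelib's actual
interface; the names resolve_encoding, read_cpg_encoding and decode_field
are hypothetical, and it assumes the .cpg file holds a single encoding name
that Python's codecs module recognizes (such as "UTF-8").

```python
import codecs
import os

# Assumption: latin-1 as the default, as suggested in this thread.
DEFAULT_ENCODING = "latin-1"


def read_cpg_encoding(dbf_path):
    """Return the encoding named in the .cpg sidecar file, or None.

    Hypothetical helper: assumes the .cpg file sits next to the DBF file
    and contains one encoding name known to Python's codecs module.
    """
    cpg_path = os.path.splitext(dbf_path)[0] + ".cpg"
    try:
        with open(cpg_path, "r", encoding="ascii", errors="replace") as f:
            name = f.read().strip()
    except OSError:
        return None  # no sidecar file (or it is unreadable)
    try:
        codecs.lookup(name)  # reject names Python cannot decode with
    except LookupError:
        return None
    return name


def resolve_encoding(dbf_path, encoding=None):
    """Pick the encoding: explicit argument, then .cpg, then the default."""
    if encoding is not None:
        return encoding
    return read_cpg_encoding(dbf_path) or DEFAULT_ENCODING


def decode_field(raw, encoding):
    """Decode a raw DBF string field into a Unicode string."""
    # DBF string fields are fixed width and padded with spaces or NULs.
    return raw.rstrip(b" \x00").decode(encoding)
```

With a default like this, existing callers that never pass an encoding
would keep working unchanged, and only code that creates UTF-8 files would
have to say so explicitly.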

Bernhard

-- 
Managing Director - Owner: www.intevation.net       (Free Software Company)
Germany Coordinator: fsfeurope.org. Coordinator: www.Kolab-Konsortium.com.
Intevation GmbH, Osnabrück, DE; Amtsgericht Osnabrück, HRB 18998
Geschäftsführer Frank Koormann, Bernhard Reiter, Dr. Jan-Oliver Wagner