TEP 001: Fully switch to unicode (was: [Thuban-list] Problem with Thuban in Ubuntu)

Bram de Greve bram.degreve at gmail.com
Wed Apr 25 11:57:16 CEST 2007


On 4/25/07, Bernhard Reiter <bernhard at intevation.de> wrote:
>
> On Wednesday 25 April 2007 11:01, Didrik Pinte wrote:
> >
> > This sounds like a very good idea, especially now that we have correct
> > support for Unicode in pyshapelib thanks to Bram.
>
> Does this mean we need to couple this with using the new pyshapelib?
> Just curious. Decoupled steps are always good. ;)
>
> If you want to start doing this, check
> Doc/technotes/string_representation.txt,
> add some code to use "unicode" and try running it with a non-English
> locale.
>
> Bernhard



pyshapelib in my WIP branch currently supports Unicode at its
interface: Unicode strings are used for filenames, field names and string
content.  So if you're going for all Unicode internally, now might be a good
time to merge the branch into the trunk?

However, I must emphasize that the transformation is not yet complete and
that the interface is not entirely stable yet.  That doesn't have to be a
problem for this, though.  Here are the open issues:
- pyshapelib does not yet support UTF-8 shapefiles (or rather DBF files) as
specified by ESRI (see the bottom of
http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&d=21106).
- I don't have a UTF-8 shapefile made by ArcGIS to test pyshapelib with.
- To my understanding, to support UTF-8, ESRI does not use the Language
Driver ID (LDID) specified in the header of the DBF file, but rather an
external CodePage file with the extension .cpg.  This is where the biggest
changes in dbflib will happen.
- Supporting that CodePage file will probably alter the interface of DBFFile
a bit, but this will mostly concern some read-only attributes that report the
encoding in use.  Thuban probably won't need these anyway, since it will
blindly use Unicode strings.
- One point in the interface might change and affect Thuban: when creating
a DBFFile, you will have to specify the encoding.
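
To make the .cpg idea a bit more concrete, here is a rough sketch of how the
encoding lookup could work.  This is not pyshapelib code: guess_dbf_encoding
and decode_field are hypothetical helpers, the cp1252 fallback is only a
placeholder default, and I'm assuming the .cpg file holds a single encoding
label such as "UTF-8" or a bare code page number.  It is also written for
current Python 3 rather than the Python we target today.

```python
import codecs
import os


def guess_dbf_encoding(dbf_path, default="cp1252"):
    """Return a Python codec name for the DBF file at dbf_path.

    Hypothetical helper: looks for a sidecar .cpg file next to the DBF
    file and falls back to `default` when it is missing or unusable.
    """
    cpg_path = os.path.splitext(dbf_path)[0] + ".cpg"
    try:
        with open(cpg_path, "r", encoding="ascii", errors="replace") as f:
            label = f.readline().strip()
    except OSError:
        # No readable .cpg file: use the fallback encoding.
        return default
    if not label:
        return default

    # Assumption: the label is either a codec name Python knows
    # ("UTF-8") or a bare code page number ("1252").
    candidates = [label]
    if label.isdigit():
        candidates.append("cp" + label)
    for name in candidates:
        try:
            return codecs.lookup(name).name
        except (LookupError, ValueError):
            continue
    return default


def decode_field(raw_bytes, encoding):
    """Decode one fixed-width DBF string field to a Unicode string."""
    # DBF string fields are padded; strip trailing spaces and NULs first.
    return raw_bytes.rstrip(b" \x00").decode(encoding, errors="replace")
```

A caller would compute the encoding once per file, e.g.
guess_dbf_encoding("roads.dbf"), and pass it to decode_field for every
string field, so the rest of the application only ever sees Unicode.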

Bram

-- 
hi, i'm a signature viruz, plz set me as your signature and help me spread
:)