On 4/25/07, <b class="gmail_sendername">Bernhard Reiter</b> <<a href="mailto:bernhard@intevation.de">bernhard@intevation.de</a>> wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Wednesday 25 April 2007 11:01, Didrik Pinte wrote:<br>><br>> This sounds like a very good idea, especially now that we have correct<br>> Unicode support in pyshapelib thanks to Bram.<br><br>Does this mean we need to couple this with using the new pyshapelib?
<br>Just curious. Decoupled steps are always good. ;)<br><br>If you want to start doing this, check<br>Doc/technotes/string_representation.txt,<br>add some code to use "unicode" and try running it with a non-English
<br>locale.<br><br>Bernhard</blockquote><div><br><br>pyshapelib in my WIP branch currently supports Unicode at its interface: Unicode strings are used for filenames, field names and string content. So if you're going all-Unicode internally, might now be a good time to merge the branch into the trunk?
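<br><br>To illustrate what "Unicode at the interface" means in practice, here is a minimal sketch in modern Python 3 (where str is Unicode). Raw bytes read from a DBF file are decoded with the file's codec on the way in and encoded again on the way out, so a caller such as Thuban only ever sees Unicode strings. The helpers native_filename, decode_record and encode_record are hypothetical illustrations, not part of pyshapelib's API, and the sketch assumes the caller already knows the file's encoding.

```python
import os


# Hypothetical helpers for illustration only; they are not pyshapelib's API.
def native_filename(path):
    """Encode a Unicode filename for a C library that expects bytes."""
    return os.fsencode(path)


def decode_record(raw_record, encoding):
    """Decode byte field names and byte string values into Unicode.

    Non-bytes values (e.g. numbers) are passed through unchanged.
    """
    return {
        name.decode(encoding): (
            value.decode(encoding) if isinstance(value, bytes) else value
        )
        for name, value in raw_record.items()
    }


def encode_record(record, encoding):
    """Encode Unicode field names and string values back into bytes for writing."""
    return {
        name.encode(encoding): (
            value.encode(encoding) if isinstance(value, str) else value
        )
        for name, value in record.items()
    }
```

For example, decoding {b"NAME": b"K\xf6ln", b"POP": 1000000} with "cp1252" gives {"NAME": "Köln", "POP": 1000000}, and encode_record with the same codec turns it back into the original bytes. Keeping the codec in one place like this is what lets the rest of the application stay encoding-agnostic.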
<br><br>However, I must emphasize that the conversion is not yet complete and that the interface is not entirely stable yet. That need not be a problem for this work, though. Here are the open issues:<br>- pyshapelib does not yet support UTF-8 shapefiles (or rather, DBF files) as specified by ESRI (see the bottom of:
<a href="http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&d=21106">http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&d=21106</a>).<br>- I don't have an UTF-8 shapefile made by ArcGIS to test pyshapelib.
<br>- As I understand it, to support UTF-8 ESRI does not use the Language Driver ID (LDID) stored in the DBF file header, but rather an external code page file with the extension .cpg. This is where the biggest changes in dbflib will happen.
<br>- Supporting the code page file will probably alter the DBFFile interface a bit, but mostly through some read-only attributes that report the encoding in use. Thuban probably won't need these anyway, since it will simply use Unicode strings throughout.
<br>- The one interface change that might affect Thuban is the creation of a DBFFile, where you will have to specify the encoding.<br></div></div><br>Bram<br clear="all">
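<br>The encoding question raised in the open issues above could be handled roughly as follows, again sketched in Python 3: prefer the ESRI .cpg sidecar file if one exists, and otherwise fall back to the Language Driver ID, which dBase-style DBF headers store at byte offset 29. The names guess_dbf_encoding, codec_from_cpg and LDID_CODECS are hypothetical, the LDID table is only a small illustrative subset, and the cp1252 default is an assumption; none of this describes how dbflib actually does it.

```python
import codecs
import os

# Illustrative subset of dBase Language Driver IDs (DBF header byte 29).
# Assumption: not exhaustive and not ESRI's complete table.
LDID_CODECS = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
}


def codec_from_cpg(text):
    """Map the contents of a .cpg file to a Python codec name, or None."""
    name = text.strip()
    if not name:
        return None
    if name.isdigit():
        # Numeric Windows code page such as "1252"; 65001 means UTF-8.
        name = "utf-8" if name == "65001" else "cp" + name
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def guess_dbf_encoding(dbf_path, default="cp1252"):
    """Prefer a sidecar .cpg file; fall back to the LDID in the DBF header."""
    # Only the lowercase ".cpg" spelling is checked in this sketch.
    cpg_path = os.path.splitext(dbf_path)[0] + ".cpg"
    if os.path.exists(cpg_path):
        with open(cpg_path, encoding="ascii", errors="replace") as f:
            codec = codec_from_cpg(f.read())
        if codec:
            return codec
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) == 32:
        codec = LDID_CODECS.get(header[29])
        if codec:
            return codec
    # Assumed default when neither source identifies the encoding.
    return default
```

For instance, a roads.dbf next to a roads.cpg containing "UTF-8" would be reported as utf-8, while the same DBF without the .cpg and with LDID 0x02 would fall back to cp850. Whatever codec comes out of such a lookup is what a Unicode-only caller would then hand to the decoding layer.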