On 4/25/07, <b class="gmail_sendername">Bernhard Reiter</b> &lt;<a href="mailto:bernhard@intevation.de">bernhard@intevation.de</a>&gt; wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On Wednesday 25 April 2007 11:01, Didrik Pinte wrote:<br>&gt; This sounds like a very good idea, especially now that we have correct<br>&gt; support of Unicode in pyshapelib thanks to Bram.<br><br>Does this mean we need to couple this with using the new pyshapelib?

<br>Just curious. Decoupled steps are always good. ;)<br><br>If you want to start doing this, check<br>Doc/technotes/string_representation.txt,<br>add some code to use &quot;unicode&quot; and try running it with a non-English

locale.<br><br>Bernhard</blockquote><div>pyshapelib in my WIP branch currently supports Unicode at its interface: Unicode strings are used for filenames, field names and string content.&nbsp; So if you&#39;re going for all Unicode internally, might now be a good time to merge the branch into the trunk?

<br><br>However, I must emphasize that the transformation is not yet complete and that the interface is not entirely stable.&nbsp; That doesn&#39;t have to be a problem for this, though.&nbsp; Here are the open issues:<br>- pyshapelib does not yet support UTF-8 shapefiles (or rather DBF files) as specified by ESRI (see the bottom of

<a href="http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&amp;d=21106">http://support.esri.com/index.cfm?fa=knowledgebase.techArticles.articleShow&amp;d=21106</a>).<br>- I don&#39;t have a UTF-8 shapefile made by ArcGIS to test pyshapelib with.

<br>- To my understanding, ESRI does not support UTF-8 through the Language Driver ID (LDID) in the header of the DBF file, but through an external code page file with the extension .cpg.&nbsp; This is where the biggest changes in dbflib will happen.

<br>- Supporting that code page file will probably alter the interface of DBFFile a bit, but mostly in some read-only attributes that report the encoding in use.&nbsp; Thuban probably won&#39;t need these anyway, since it will blindly use Unicode strings.

<br>- There&#39;s one point in the interface that might change and affect Thuban: when creating a DBFFile, you will have to specify the encoding.<br></div></div><br>Bram<br clear="all">
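<br><br>As a rough illustration of the .cpg idea in the open issues above (a sketch only, not pyshapelib or dbflib code): the hypothetical helper read_cpg_encoding below assumes that the sidecar file shares the DBF file&#39;s base name, has a lowercase .cpg extension, and holds a single encoding label that Python&#39;s codec registry recognizes. The latin-1 fallback is likewise an assumption, not something ESRI or dbflib prescribes.
<pre>
```python
import codecs
from pathlib import Path


def read_cpg_encoding(dbf_path, default="latin-1"):
    """Guess the text encoding of a DBF file from a .cpg sidecar file.

    Hypothetical helper for illustration only. Returns `default`
    unchanged when no usable .cpg file is found.
    """
    # Assumption: the sidecar sits next to the DBF file, e.g.
    # roads.dbf and roads.cpg.
    cpg_path = Path(dbf_path).with_suffix(".cpg")
    try:
        label = cpg_path.read_text(encoding="ascii").strip()
    except (OSError, UnicodeError):
        # Missing, unreadable, or non-ASCII sidecar file.
        return default
    if not label:
        return default
    try:
        # Normalize the label to Python's canonical codec name.
        return codecs.lookup(label).name
    except LookupError:
        # The label is not a codec name Python knows about.
        return default
```
</pre>
The returned name can then be passed to bytes.decode when turning raw DBF field values into Unicode strings.<br>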

<br>-- <br>hi, i&#39;m a signature viruz, plz set me as your signature and help me spread :)