csv-import (was:Re: [Thuban-devel] Re: Lots of interesting code in precision farming application)

Mon Nov 29 11:44:32 CET 2004

Dear Jan and Bernhard,

On Sat, November 27, 2004 13:29, Jan-Oliver Wagner said:
> On Sat, Nov 27, 2004 at 10:40:03AM +0100, Moritz Lennert wrote:
>>
>> If no one else (with more experience) is planning on doing so, I am willing to
>> try to extract the csv-import functions as a separate extension.
>
> that'll be great. Note that this also involves some work on
> MemoryShapestores in case you want to load the data directly into Thuban.
> There is an alternative of loading it indirectly by first producing a
> shapefile-formatted file. See gns2shp for an example.
>

There is a form of MemoryShapestores in Ole's code if I am not mistaken. I
think this is the way to go. Having to create new shapefiles is not very
elegant in my opinion. Thuban should be able to load the new data directly.

But I guess we should agree on one implementation of a MemoryShapestore. This
is up to you developers to decide, as I don't know the code well enough to
judge this.

On Sat, November 27, 2004 17:58, Bernhard Herzog said:
> "Moritz Lennert" <mlennert at club.worldonline.be> writes:
>
>> If no one else (with more experience) is planning on doing so, I am willing to
>> try to extract the csv-import functions as a separate extension.
>
> If we add csv import, we should look at the various csv parsers already
> written for Python.  Apart from the one in Ole's code, there are a few
> others:
>
>  - csv.py in the Python 2.3 standard library
>    combination of C and Python
>
>  - http://python-dsv.sourceforge.net/
>
>    pure python, even has wxPython based import wizard
>
>  - http://www.object-craft.com.au/projects/csv/
>    written in C
>
> There are probably others.  I haven't really looked at any of them.  I'd
> probably prefer the one from the Python standard library because that's the
> one most likely to be maintained in the future.  To avoid requiring Python
> 2.3 we could probably ship a copy with Thuban.

I agree with you that csv.py looks like the best solution. However, this would
probably mean writing most of the extension from scratch and not reusing much
of Ole's code, wouldn't it?
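
For reference, here is a minimal sketch of what reading delimited text with the
standard library csv module looks like. It is written for current Python rather
than 2.3, and the function name read_rows and the sample data are made up for
illustration; none of this is taken from Ole's code or from Thuban.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Return all rows of delimited text as lists of strings."""
    # csv.reader accepts any iterable of lines; StringIO stands in for an
    # open file here (hypothetical input, not a real Thuban data source).
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# The quoted field contains the delimiter itself, which the module handles.
sample = 'id;name;yield\n1;"north; field";4.2\n'
rows = read_rows(sample, delimiter=";")
```

Every value comes back as a string, so quoting and separators are handled by
the module, but any type handling would still be up to the extension.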

>
>> If I have seen correctly this entails mostly removing the hard-coded PIROL
>> csv format in order to allow more generic files.
>
> The ones mentioned are probably more flexible than the PIROL one, though I
> haven't really looked at that one yet either.

Yes, I agree. I'll look into them a bit deeper in order to see how to approach
the entire question.

For now, I have just extracted the code and modules necessary in order to have
a very roughly functioning csv-import. You can download it at

http://moritz.homelinux.org/thuban/csvimport.tgz

I changed the code so that the csv file does not have to contain units anymore
(in my eyes this should be handled via the projection properties). At this
stage I have simply included all the modules that are pulled in through a
chain of dependencies. I am sure that much of that code is not needed for
the csv import itself.

I also think that ideally the user should not have to tell the module which
type each data column has. I will see whether some sort of automatic
recognition can be done with Python's type() function.
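
One caveat: values read from a text file are all strings, so type() alone
would report the same type for every column. A possible approach, sketched
here under that assumption, is to try converting each column and keep the
narrowest type that works. The function name guess_type is hypothetical and
not part of the current extension.

```python
def guess_type(values):
    """Return 'int', 'double' or 'string' for a column of text values."""
    # Try the narrowest type first; fall through when any value fails.
    for name, convert in (("int", int), ("double", float)):
        try:
            for value in values:
                convert(value)
        except ValueError:
            continue
        return name
    return "string"


column_types = [
    guess_type(["1", "2", "3"]),
    guess_type(["1", "2.5"]),
    guess_type(["1", "north"]),
]
```

Note that an empty column is reported as 'int' by this sketch, so a real
implementation would need to decide how to treat missing values.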

In the current form of the extension, the csv file must have the following
format:

line 1: column headers
line 2: column variable types (one of 'int', 'double' or 'string')
lines 3-EOF: data

The separator can be a tab, comma, semicolon or space.
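
To make the format concrete, here is a rough sketch of a reader for it. This
is not the code in the tarball: the names detect_separator and parse_typed_csv
are invented, and the separator is guessed simply by counting candidates in
the header line, which is an assumption of this sketch.

```python
import csv

# Map the type names from line 2 of the file to Python conversions.
CASTS = {"int": int, "double": float, "string": str}


def detect_separator(header_line, candidates="\t,; "):
    """Pick the candidate that occurs most often in the header line."""
    # On a tie, max() keeps the earliest candidate (tab, comma, ...).
    return max(candidates, key=header_line.count)


def parse_typed_csv(text):
    """Return (headers, types, rows) with values cast per column type."""
    lines = text.splitlines()
    sep = detect_separator(lines[0])
    records = list(csv.reader(lines, delimiter=sep))
    headers, types, data = records[0], records[1], records[2:]
    # Skip empty lines; cast each value with the type declared for its column.
    rows = [[CASTS[t](v) for t, v in zip(types, rec)] for rec in data if rec]
    return headers, types, rows


sample = "x\ty\tname\ndouble\tdouble\tstring\n4.35\t50.85\tBrussels\n"
headers, types, rows = parse_typed_csv(sample)
```

Counting separators in the header is fragile when column names themselves
contain spaces or commas, so a real version would probably let the user
override the guess.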

Moritz