[storm] This is a noob question: storing lists
chandramouli s
naruvimama at gmail.com
Fri May 8 11:35:53 BST 2009
Hi Gerdus,
Oh thanks, I didn't really think of the problem of querying using the
'column names'. Flat files would do a good job if the Java programmers
don't mind writing some code every time they get a new thought. I got
introduced to ORMs through Active Record (and the automagic in it), and
it was just my fixed idea to use SQLite instead of a flat file.
In my quest for automagic I quite forgot the difficulties on the Java
end ;-) Yes, but once a dummy schema is in place, using loops would be a
great way to work with it. What each column stands for is not
significant during processing, as long as we can finally link the
columns back to their origin. Also, my data takes up significant space
(a few hundred MB), so letting SQLite worry about caching would be a
significant gain in coding speed.
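
A minimal sketch of that dummy-schema-plus-loops idea with plain
sqlite3 (the table name "features" and the generic column names
c001..c400 are made up for illustration):

import sqlite3

N_FEATURES = 400
cols = ["c%03d" % i for i in range(1, N_FEATURES + 1)]

conn = sqlite3.connect("data.db")
# One wide table: id, name, then 400 generically named REAL columns.
conn.execute("CREATE TABLE IF NOT EXISTS features "
             "(id INTEGER PRIMARY KEY, name TEXT, %s)"
             % ", ".join("%s REAL" % c for c in cols))

def insert_row(conn, row_id, name, values):
    # values is the list of 400 feature values for one person.
    placeholders = ", ".join("?" * (len(values) + 2))
    conn.execute("INSERT INTO features VALUES (%s)" % placeholders,
                 [row_id, name] + list(values))

insert_row(conn, 1, "John", [0.0] * N_FEATURES)   # dummy data
conn.commit()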
I will definitely look at Mako templates, but if they need significant
learning time then maybe some other time.
Thank you
Chandramouli
On Fri, May 8, 2009 at 9:30 AM, Gerdus van Zyl <gerdusvanzyl at gmail.com> wrote:
> Well, for generating a schema automagically I normally just use Mako
> templates (http://www.makotemplates.org/) to generate the Python code.
> Then you can use for loops, etc.
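
Roughly, that template idea could look like this (a sketch only: Mako's
Template class is real, but the generated Storm class, its table name
and its column names are invented for illustration):

from mako.template import Template

model_template = Template("""\
from storm.locals import Int, Unicode, Float

class Features(object):
    __storm_table__ = "features"
    id = Int(primary=True)
    name = Unicode()
% for i in range(1, n + 1):
    c${"%03d" % i} = Float()
% endfor
""")

# Write the generated module to disk, then import it like any other module.
open("features_model.py", "w").write(model_template.render(n=400))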
>
> And again, querying a 400-column table is going to be difficult. What
> kind of values are the 400 data points? If they are 400 different
> measurements of different things then the 400 columns are unavoidable,
> otherwise not. Maybe go ask one of the Java users how they would
> prefer the data.
>
> ~G
>
> On Fri, May 8, 2009 at 11:13 AM, chandramouli s <naruvimama at gmail.com> wrote:
>> Yes, you are right that a relational DB is overkill. I am working on
>> scientific data and am more or less playing around with the data (so
>> no fixed schema). The right way to do this would be to use PyTables
>> (much better than flat files, super fast, and it handles large
>> amounts of data), but there remains the question of how easily the
>> Java people could browse and query my data, and SQLite seems to be
>> the best option for that. Does anyone know of a way I could generate
>> a schema on the fly (like the A, B, C... column names in Excel)? I do
>> not require relational features (no updates, no deletes), just quick
>> queries to find abnormalities in the data, and the redundancy in the
>> main part (id, name) is insignificant compared to the 400 or so fields.
>>
>> 1  John  400 ... features
>> 2  Mary  400 ... features
>> 3  Greg  400 ... features
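
To make the "quick queries to find abnormalities" part concrete, a
small sketch against a wide table like the one above (the column names
c001..c400 and the threshold are invented for illustration):

import sqlite3

conn = sqlite3.connect("data.db")
cols = ["c%03d" % i for i in range(1, 401)]

# Scan every feature column for missing or suspiciously large values.
for col in cols:
    rows = conn.execute(
        "SELECT id, name, %s FROM features WHERE %s IS NULL OR %s > ?"
        % (col, col, col), (1000.0,)).fetchall()
    if rows:
        print("abnormal values in %s: %r" % (col, rows))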
>>
>> In fact, even the names are insignificant; I could replace them with
>> a key, just to group all the features belonging to one person.
>> BTW, I am building this for a machine learning project ...
>>
>> Thank you
>> Chandramouli
>>
>> On Fri, May 8, 2009 at 6:52 AM, Gerdus van Zyl <gerdusvanzyl at gmail.com> wrote:
>>> Well, if you want to share the DB with Java people, Python pickling
>>> is out of the question, and a list would be problematic since it's
>>> not a native SQLite data type. I would suggest the two-table
>>> approach as suggested:
>>> main  -> pri_id, name
>>> marks -> pri_id, mark  (400 of these per main row)
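
A rough Storm sketch of that two-table layout on SQLite (the class
names, the "sqlite:data.db" URI and the hand-written DDL are
illustrative assumptions, not the only possible spelling):

from storm.locals import (create_database, Store, Int, Unicode, Float,
                          Reference, ReferenceSet)

class Main(object):
    __storm_table__ = "main"
    pri_id = Int(primary=True)
    name = Unicode()

class Mark(object):
    __storm_table__ = "marks"
    id = Int(primary=True)
    pri_id = Int()
    mark = Float()
    main = Reference(pri_id, Main.pri_id)

# One person has many marks.
Main.marks = ReferenceSet(Main.pri_id, Mark.pri_id)

store = Store(create_database("sqlite:data.db"))
# Storm does not create tables for you; issue the DDL once by hand.
store.execute("CREATE TABLE IF NOT EXISTS main "
              "(pri_id INTEGER PRIMARY KEY, name TEXT)")
store.execute("CREATE TABLE IF NOT EXISTS marks "
              "(id INTEGER PRIMARY KEY, pri_id INTEGER, mark REAL)")

person = Main()
person.name = u"John"
store.add(person)
store.flush()                      # assigns person.pri_id
for value in [0.1, 0.2, 0.3]:      # 400 feature values in practice
    m = Mark()
    m.pri_id = person.pri_id
    m.mark = value
    store.add(m)
store.commit()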
>>>
>>> Also, a relational database might be overkill in this case, but
>>> that depends on whether you want to query the data (no queries = no
>>> database), how large the data will be (large = DB), whether it will
>>> be updated (a DB is easier), etc.
>>>
>>> ~G
>