
PostgreSQL or tab delimited file?


ultrus


Hello,

I'm trying to figure out the best way to deal with lots of data in a portable way. I use a temporary test server on Amazon's EC2 service, and transfer all the bulk data the server needs when it starts up. This may not be the best way of doing things, but it's a start, and I'm open to other ideas. Maybe I should be using their data storage service, but I don't like being nickel-and-dimed to death on this personal project with data transfer fees.

 

I calculated and created a set of lookup tables in MySQL showing the distance from each of 5500 locations to each of the 5500 locations. The dump file, though, was about 1 GB! I'm sure it's smaller zipped, but I don't want to mess with that right now.
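
For scale, that table size is easy to sanity-check: 5500 × 5500 is 30,250,000 rows, so a ~1 GB dump works out to roughly 33 bytes per row. One possible saving (an observation, not something from the post): distance is symmetric, so storing only one direction per pair would roughly halve the table.

```python
# Back-of-the-envelope size check for a full pairwise distance table.
locations = 5500
pairs = locations * locations              # one row per ordered (from, to) pair
dump_bytes = 10**9                         # ~1 GB dump, per the post
print(pairs)                               # 30250000
print(round(dump_bytes / pairs))           # 33 bytes per row

# Distance is symmetric, so storing only rows with from_id < to_id
# halves the table:
unordered_pairs = locations * (locations - 1) // 2
print(unordered_pairs)                     # 15122250
```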

 

My second thought was to create a tab-delimited file for each location, listing each location ID and distance. At about 37 KB per file, it totals around 400 MB. Not bad! The downside, though, is that I would have to load and parse a file every time I needed to look up a distance. Is that all that bad?
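
Parsing one of those per-location files is pretty cheap. A minimal sketch, assuming a hypothetical layout of one file per location with `location_id<TAB>distance` on each line (the file name pattern here is made up):

```python
import csv

def load_distances(path):
    """Parse one per-location tab-delimited file into a dict
    mapping location_id -> distance."""
    distances = {}
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            loc_id, dist = row[0], float(row[1])
            distances[loc_id] = dist
    return distances

# Hypothetical usage:
# table = load_distances("dist_1042.tsv")
# print(table["2077"])
```

The whole-file parse is the cost you pay per lookup; if you look up many distances from the same origin, you only pay it once per origin.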

 

A third option would be to use PostgreSQL. I know that all the data is stored in cool portable files. Cool! I'd create a file for each location for lookup speed. I'm guessing that each file would be bigger than the tab-delimited one. However, would the lookup be faster than parsing a CSV file into an array and looking it up that way?

 

My secondary reason for doing all this is that I don't want MySQL to be a requirement.

 

What are your thoughts and ideas? Thanks much for the feedback! :)


I'm a bit confused by your post. What do you mean by "portable"?

 

Both Postgres and MySQL allow you to index your data. That means you can ask for particular locations and the database can give you the answer almost instantly. Tab-delimited files do not allow this, although you already have a form of indexing by splitting them into 37 KB files. That may well be good enough for what you need.
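
To illustrate what that index buys, here's a sketch using SQLite as a stand-in (not one of the databases discussed, just an easy way to demo the same SQL idea): a composite index on the two ID columns lets a single-pair lookup avoid scanning the whole table.

```python
import sqlite3

# In-memory database with a pairwise distance table and a composite index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distances (from_id INTEGER, to_id INTEGER, km REAL)")
conn.execute("CREATE INDEX idx_pair ON distances (from_id, to_id)")
conn.executemany(
    "INSERT INTO distances VALUES (?, ?, ?)",
    [(1, 2, 12.5), (1, 3, 40.0), (2, 3, 31.7)],
)

# The index makes this a direct seek rather than a full table scan.
row = conn.execute(
    "SELECT km FROM distances WHERE from_id = ? AND to_id = ?", (1, 3)
).fetchone()
print(row[0])  # 40.0
```

The `CREATE TABLE` / `CREATE INDEX` statements translate almost verbatim to MySQL or Postgres.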

 

I think Postgres and MySQL are much the same when it comes to what you need to do here; there's no significant difference I can think of. It's better, though, to have MySQL as a requirement than Postgres, as MySQL is more common. Better still if you use the tab-delimited files and have no database required, if you're thinking along those lines.


LOL :)

 

Anyway, if the amount of data is getting out of hand, you might want to try a spatial database like Postgres with PostGIS. I haven't got any experience with those, but the idea is that you just store the spatial location of each place, and the database will calculate the distance for you. Then you only have 5500 data items instead of 5500 × 5500.
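
The same idea works even without PostGIS: store just a latitude/longitude per location and compute any pairwise distance on demand, e.g. with the haversine great-circle formula. A sketch (the example coordinates are just London and Paris, not from the post):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points,
    using the haversine formula on a spherical Earth."""
    R = 6371.0  # mean Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * R * asin(sqrt(a))

# e.g. London to Paris, roughly 340-345 km:
# print(haversine_km(51.5074, -0.1278, 48.8566, 2.3522))
```

That trades a tiny bit of CPU per query for shrinking the stored data from 5500 × 5500 rows to 5500.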

