Koobi Posted July 25, 2006 Share Posted July 25, 2006 This isn't really PHP specific so i thought i'd post here.I'm reading a CSV file into my script (and this CSV may well contain upto 3 million records with upto 53 fields per record) and I do a number of things to the data like mathematical calculations and insertions into a database.The database i'm using is Postgres.I'm wondering if it would be better for me to:A. read the entire CSV into an array (via fgetcsv) and INSERT it into a temporary table in the database and then select based on a LIMIT clause and process around 500 records at a timeB. read each CSV record, line by line (via fgets or fscanf) and process each line as i come accross it.there are other options such as running a system command. what i'm looking for is the fastest method for process these records.thanks for your time. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/ Share on other sites More sharing options...
ober Posted July 25, 2006 Share Posted July 25, 2006 I have no data to back this up, but I'd imagine that option A would be faster. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63344 Share on other sites More sharing options...
effigy Posted July 25, 2006 Share Posted July 25, 2006 I think B would be faster and more efficient because you're going line by line, whereas loading the whole thing into an array immediately sucks up memory. Do you have the option to use Perl? Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63392 Share on other sites More sharing options...
ober Posted July 25, 2006 Share Posted July 25, 2006 Ahh... but if he's dumping it into a table within the database, eating up memory isn't an issue. Plus, you have all those reads from the file (slow), instead of queries on the table (faster). I think you're putting too much weight on the fact that the array only exists for a short amount of time before the data is moved to the database for processing. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63401 Share on other sites More sharing options...
Koobi Posted July 25, 2006 Author Share Posted July 25, 2006 the problem with method A is that it immediately takes up memory like effigy mentioned.i would have to change PHP's default memory allocation. Currenty it's 8 and 2MB. would 20 and 12MB be sufficient?the problem with method B is that i will be accessing the disk over 3 million times. thats a lot :/ Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63405 Share on other sites More sharing options...
Koobi Posted July 25, 2006 Author Share Posted July 25, 2006 [quote author=effigy link=topic=101767.msg403174#msg403174 date=1153836381]Do you have the option to use Perl?[/quote]Not at the moment...but it's running on a dedicated server which i have full access to so i could install what i want i suppose.I don't think it would be a problem...any ideas, effigy? unfortunately, i don't know Perl but i'm willing to learn what i need to (for now) to get this running efficiently.I'm also considering using the shell because I'm guaranteed that this application will only be run on a Linux/UNIX, Apache, Postgres, PHP environment.i could bypass PHP entirely and use awk and parse the CSV and write them to a text file as SQL inserts and use CLI for Postgres to refer to the text file for the queries.from thereon, i could access a few hundred records at a time from the database and process them.that would be more efficient, i imagine, since it's the OS's shell that will be handling everything.i just thought about this...sounds like a good idea right? except that i know very little awk but it's syntax seems relatively simple but i have no idea how to work Postgres via CLI. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63577 Share on other sites More sharing options...
effigy Posted July 25, 2006 Share Posted July 25, 2006 I know Perl has a reputation for being the champ at text processing. I asked some of the guys on www.perlmonks.net how they would accomplish this. One recommendation was to use [url=http://search.cpan.org/~jzucker/DBD-CSV-0.22/lib/DBD/CSV.pm]DBD::CSV[/url], while another mentioned that going line-by-line shouldn't be an issue due to input buffering by either readline or the OS itself; I know nothing about either.I think the shell would be nasty for this, but this could depend on the content and how much processing is needed.awk seems like a viable option; I know very little myself.Is the data using quotes as delimiters also, or just commas?You can always use Unix's [b]time[/b] command to see who gets the job done quicker. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63585 Share on other sites More sharing options...
Koobi Posted July 25, 2006 Author Share Posted July 25, 2006 [quote author=effigy link=topic=101767.msg403374#msg403374 date=1153855496]I know Perl has a reputation for being the champ at text processing. I asked some of the guys on www.perlmonks.net how they would accomplish this. One recommendation was to use [url=http://search.cpan.org/~jzucker/DBD-CSV-0.22/lib/DBD/CSV.pm]DBD::CSV[/url], while another mentioned that going line-by-line shouldn't be an issue due to input buffering by either readline or the OS itself; I know nothing about either.I think the shell would be nasty for this, but this could depend on the content and how much processing is needed.awk seems like a viable option; I know very little myself.Is the data using quotes as delimiters also, or just commas?You can always use Unix's [b]time[/b] command to see who gets the job done quicker.[/quote]thanks for asking at perlmonks.net :)i will have a look at that option if i go for that idea, thanks.it's a comma delimited CSV.yeah, actually i should have used time to do my research instead of posting here but unfortunately, time is a big problem. we are almost at the deadline and there's still some work to be done and i didn't want to have to try a bunch of options so i thought i would ask here first.thanks so far guys. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63592 Share on other sites More sharing options...
effigy Posted July 25, 2006 Share Posted July 25, 2006 Here's a basic example to start with. Simply run it with the csv file as the argument, e.g. [tt]./csv.pl in.csv[/tt].[code]#!/usr/local2/bin/perluse warnings;use strict;while (<>) { chomp; my @fields = split ',' => $_;}[/code]If the commas are surrounded by whitespace, replace ',' with /\s*,\s*/. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63596 Share on other sites More sharing options...
Koobi Posted July 26, 2006 Author Share Posted July 26, 2006 [quote author=effigy link=topic=101767.msg403385#msg403385 date=1153856324]Here's a basic example to start with. Simply run it with the csv file as the argument, e.g. [tt]./csv.pl in.csv[/tt].[code]#!/usr/local2/bin/perluse warnings;use strict;while (<>) { chomp; my @fields = split ',' => $_;}[/code]If the commas are surrounded by whitespace, replace ',' with /\s*,\s*/.[/quote]thanks for that.so on the first line you have the shebang line, followed by the inclusion of libraries? and what's the condition in that while loop? and what is chomp? i dont understand the next line either. what does $_ do?can you direct me to the official perl manual?is this it: http://www.cs.cmu.edu/cgi-bin/perl-man ?wouldn't mind some beginner toutorials too, if you have any handy links Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63808 Share on other sites More sharing options...
effigy Posted July 26, 2006 Share Posted July 26, 2006 One of the beauties of Perl are shortcuts, which make it fun/painful to learn :)[code]#!/usr/local2/bin/perl### The above line should point to your perl executable.### The following 2 lines are called "pragmata"--they### instruct the compiler what to do. In this case, they### instruct it to (1) report warnings; and (2) use### strict (good) programming practices. Always,### always use these 2 lines.use warnings;use strict;### The diamond operator (<>) has a lot of "magic" to### it. By default it reads from the filehandle ARGV,### which is either (1) a list of file names if they were### supplied, or (2) STDIN. This allows programs to### easily be included in pipelines.while (<>) { ### chomp removes the trailing new line character, ### if any, from the string. chomp; ### "my" is declaring a lexical (not global) variable; ### the "@" indicates that it is an array; ### and split is splitting the variable $_ by a comma. my @fields = split ',' => $_;}[/code][b]$_[/b] is Perl's default variable. In most cases it is used when a variable name is not specified. The [tt]while(<>)[/tt] construct is "magically" storing each line into $_, which chomp works off by default also.There should be some tutorials at www.perlmonks.net. Documentation can be found via the perldoc command, or @ http://perldoc.perl.org/, http://www.perl.com/pub/q/documentation, and http://www.perl.org/docs.html for starters.Let me know if you have any other questions. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-63816 Share on other sites More sharing options...
Koobi Posted July 26, 2006 Author Share Posted July 26, 2006 thanks for taking the time to explain :)are warnings in Perl similar to the ones in PHP? i.e., do they have similar error levels?so i could pass the csv file name to perl via terminal as an option and argv would grab it? or would i have to use ">" or "|" to direct the csv contents via cat?also, how is Perl with db connectivity? Postgres specifically. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-64002 Share on other sites More sharing options...
effigy Posted July 26, 2006 Share Posted July 26, 2006 Sort of. In older versions they were either on or off, but in newer versions the pragma can now be scoped and turned on or off based on the type; have a look at [url=http://perldoc.perl.org/perllexwarn.html]this[/url].Yes. Any arguments passed on the command line are in the @ARGV array which can be used like any other array. When the [tt]while (<>) { }[/tt] construct is used, it knows to automatically read each argument in @ARGV and treat it as a file. No redirection or cat-ing is needed.I haven't used Perl with any databases yet, but I know it can. I found [url=http://www.postgresql.org/files/documentation/books/aw_pgsql/node156.html]this[/url] example. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-64011 Share on other sites More sharing options...
Koobi Posted July 26, 2006 Author Share Posted July 26, 2006 ok great. i understand it so far.i have another questions with regards to argv.suppose i want to pass 3 csv files, they would be in argv[1], argv[2] and argv[3] right, assuming i used it like i have posted below?[code]$ perlExecutable -file=1.csv 2.csv 3.csv myScript.pl[/code]do you know of any issues Apache has with perl? I'd have to use mod_perl, right? Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-64028 Share on other sites More sharing options...
effigy Posted July 26, 2006 Share Posted July 26, 2006 I haven't used Perl with Apache yet either :) but yes, mod_perl is what I've heard of.[tt]$ perlExecutable -file=1.csv 2.csv 3.csv myScript.pl[/tt] would fail, unless I'm missing something. You'd really want [tt]$ myScript.pl 1.csv 2.csv 3.csv[/tt], or even [tt]$ myScript.pl *.csv[/tt]. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-64071 Share on other sites More sharing options...
effigy Posted July 28, 2006 Share Posted July 28, 2006 What's the status of this? I'm interested in your results. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-65358 Share on other sites More sharing options...
Koobi Posted July 30, 2006 Author Share Posted July 30, 2006 well i messed about with the basics of perl a bit. didn't really get much done because there are other aspects of the project that need to be completed as well.i was wondering, since i'm in a hurry and this needs to be done properly and securely, do you think i'd be able to outsource this module to you?i could give you the pseudo code as well as the PHP code. if you are interested, let me know, effigy.my contact details can be found in the [b]Staff Contact Info.[/b] sticky thread in the Admin Cafe. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-65918 Share on other sites More sharing options...
effigy Posted July 30, 2006 Share Posted July 30, 2006 I have e-mailed you. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-66006 Share on other sites More sharing options...
Koobi Posted July 30, 2006 Author Share Posted July 30, 2006 thanks, i just replied :) Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-66063 Share on other sites More sharing options...
Koobi Posted August 17, 2006 Author Share Posted August 17, 2006 any developments on this, effigy?if you've replied to my mail, i haven't recieved it yet. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-76270 Share on other sites More sharing options...
effigy Posted August 17, 2006 Share Posted August 17, 2006 Hey,I was waiting on the:[quote]I'll send you more details this week because I don't have all thefiles with me at this moment. I could send you the PHP we havecompleted for this process as well as the rough pseudo code plus asample CSV with about 10 records.[/quote]I should have sent you a follow-up e-mail, but I've been swamped.Overall, based on your requirements, it [i]sounds[/i] simple; it's the details and testing that need to be fine tuned. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-76330 Share on other sites More sharing options...
Koobi Posted August 17, 2006 Author Share Posted August 17, 2006 ah ok, i was waiting for your reply.i'll send you the necessary data this week.thanks. Quote Link to comment https://forums.phpfreaks.com/topic/15563-need-advice-access-times/#findComment-76391 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.