
Loop through each line of a text file, add code, and throttle



Here's what I'm trying to do, and I am having trouble getting started with this.  It's a very simple process, but I didn't want to spend the next 6 hours in frustration, so some help getting started would be great.

 

Here's the purpose of the script:

1. Allow user to add a text file to a form.

2. Take the text file and add HTML code to the beginning and end of each paragraph. (A paragraph is a single line of text; paragraphs are separated by line returns.)

3. Send the user an email with the HTML file attached and thank them or whatever.

4. Allow the system to throttle itself (one-at-a-time) so that many people using the site won't bog it down.  These files will probably be anywhere from 100 KB to 1,000 KB in size, usually falling in the 300-500 KB range.

 

Here's what I can do very easily:

1. Allow user to add a text file - very simple and straightforward.

2. Take the text file, add HTML... - this is what I need a little help figuring out.  Each paragraph needs <p> at the beginning and </p> at the end, and the script will also search for keywords on certain lines (section headers) and add an align="center" attribute to those, and so forth.  I can handle the formatting rules, but making sure the loop runs correctly could be a problem.

3. Send the user an email... - very easy, I can do that myself.

4. Allow the system to throttle itself... - this could be tricky.  I was thinking of a database with a TINYINT status field: 0 for not processed yet, 1 for processing, 2 for processed.  A cron job checks the next entry in the queue to see whether it needs to be sent to the processor, is already being processed, or can be moved to a separate table of completed entries and removed from the queue.  The cron job would also be responsible for triggering the "Your file is converted!" email with the attachment.
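To make the three-state queue idea concrete, here's a minimal sketch of the cron-side decision logic. The status constants and function name are my own invention, and the database lookup is left out; this only shows the state transitions described above.

```php
// Status codes for the conversion queue (names are illustrative).
const STATUS_PENDING    = 0; // uploaded, not processed yet
const STATUS_PROCESSING = 1; // a worker currently owns it
const STATUS_DONE       = 2; // converted; ready to archive and email

// Decide what the cron job should do with the oldest queue row.
// Returns 'process', 'skip', or 'archive'.
function nextAction($status)
{
    switch ($status) {
        case STATUS_PENDING:
            return 'process'; // claim it (set status to 1) and convert
        case STATUS_PROCESSING:
            return 'skip';    // another run owns it; check again next time
        case STATUS_DONE:
            return 'archive'; // move to the completed table, send the email
        default:
            throw new InvalidArgumentException("unknown status: $status");
    }
}
```

In a real cron script you'd wrap this around a `SELECT ... ORDER BY id LIMIT 1` against the queue table and an `UPDATE` to flip the status.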

 

Any/all help would be greatly appreciated on this.  I am going to work on the parts that I can do myself, and I'll be checking back for the discussion - in between Mountain Dew runs.

Well, I have no idea about the throttle issue. That would be a server configuration issue I would assume.

 

As for parsing the text file and adding html elements to it, I would look at using the explode() command (http://php.net/manual/en/function.explode.php). This is a powerful tool and should at least get you started. Look at the samples in the PHP manual, good info there.
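An explode()-based version might look something like this; note that it pulls the whole file into memory at once. The function and file names are just examples.

```php
// Wrap each non-empty line of $text in <p>...</p> tags.
// Normalizes Windows line endings first, then splits on "\n".
function wrapParagraphs($text)
{
    $html = '';
    $paragraphs = explode("\n", str_replace("\r\n", "\n", $text));
    foreach ($paragraphs as $p) {
        $p = trim($p);
        if ($p !== '') {
            $html .= '<p>' . $p . "</p>\n";
        }
    }
    return $html;
}

// Typical usage: read the uploaded file, write the converted one.
// file_get_contents() loads the entire file into memory.
// $html = wrapParagraphs(file_get_contents('input.txt'));
// file_put_contents('output.html', $html);
```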

 

I hope this gets you started. Others will no doubt have other ideas for you.

I was afraid you were going to say explode.  I will start working with the explode option to see how terrible it is at working with files of this size.  I suppose I can explode by using the line returns as the delimiter.

See fopen(), fgets(), fwrite(), and fclose(). It looks like fgets() will be better than fread() here.

 

Something along these lines:

$fIn  = fopen($filename, 'r');
$fOut = fopen($newFilename, 'w');
while (($line = fgets($fIn)) !== false) {
  $line = '<p>' . str_replace(array("\r", "\n"), '', $line) . "</p>\n";
  fwrite($fOut, $line);
}
fclose($fIn);
fclose($fOut);
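For the section-header rule mentioned earlier, the loop body could check each line against a keyword list before wrapping it. The keywords here are made up; substitute your own, and note that this sketch matches only lines that start with a keyword.

```php
// Wrap a raw line in <p> tags, centering it if it starts with a
// section-header keyword (case-insensitive). Keywords are examples.
function formatLine($line, array $keywords)
{
    $text = str_replace(array("\r", "\n"), '', $line);
    foreach ($keywords as $kw) {
        if (stripos($text, $kw) === 0) {
            return '<p align="center">' . $text . "</p>\n";
        }
    }
    return '<p>' . $text . "</p>\n";
}

// Inside the fgets() loop:
//   fwrite($fOut, formatLine($line, array('CHAPTER', 'SECTION')));
```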

Thank you very much, David.  That looks like it will do the trick.

 

Also, does anyone think that processing files as large as 2 MB would kill the server?  It would only be doing one at a time, of course.  Should I chop the files up into chunks or throttle it some other way?

Using David's proposal you won't even notice that there is a script running, so to speak; in contrast, using explode() and file() would pull the entire file contents into memory.

Yes, I tried it with a very large file on the server, about 5 MB in size.  It ran and output the new file in about 12 seconds, and that includes latency (the time from my browser showing the "finished" page to the file appearing in the FTP folder).  Hooking the processor to a cron job and a database will be a piece of cake.

 

I am thinking of running the cron job every 30 seconds to account for extremely large files, and to make sure people aren't waiting very long for their conversions.  On the other hand, I could run the cron every 10 minutes or so and process more files at a time.  I could also process each file at submission, if you think that running this script 50-100 times simultaneously (processing up to 100 MB at once) wouldn't be too much strain for an average server.  I know cron jobs can stress the server, but I'd rather the system be self-sufficient, so that on the off-chance 1,000 people submit their 1 MB files within the same 5 seconds, the server won't have problems.
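Whatever interval you pick, it's worth guarding against two cron runs overlapping on a slow file. One common approach is an flock()-based lock file at the top of the cron script; the lock path and function names below are arbitrary.

```php
// Try to take an exclusive, non-blocking lock on $path.
// Returns the open handle on success, or false if another
// run already holds the lock.
function acquireLock($path)
{
    $h = fopen($path, 'c');
    if ($h === false || !flock($h, LOCK_EX | LOCK_NB)) {
        return false;
    }
    return $h;
}

function releaseLock($h)
{
    flock($h, LOCK_UN);
    fclose($h);
}

// At the top of the cron script:
//   $lock = acquireLock('/tmp/converter.lock');
//   if ($lock === false) {
//       exit(0); // previous run still busy; try again next interval
//   }
//   // ... process the next queued file ...
//   releaseLock($lock);
```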

 

Of course, the service isn't popular yet - it's a start-up - so the system won't have huge numbers of submissions right off the bat (unless I get very lucky).  Any thoughts?

 

Very nice, I appreciate that.  This site always has such helpful people, and the people never cease to impress.
