fivestringsurf

Lengthy Processing & Informing Users

Recommended Posts

I built a "bulk importer" that takes a .zip file filled with images and a corresponding csv file that holds attributes.

I'm happily using some JavaScript to provide upload-progress feedback to the user. So if the .zip file is, say, 10 MB, they see its upload progress. (I'm using AJAX.)

This is all working nicely BUT...

 

Once the .zip hits the server I need to do A TON of processing.  Each image has to be converted into 10 different sizes, cropped, etc...

All entries must be entered into the Database and admin text logs created.

All of this actually works just fine for small files (<10 MB), and I'm sure it could work with bigger files by increasing the timeout, etc...

 

BUT the browser "locks up" during processing and there is no real way to inform the user about the progress of their files being processed.

 

I thought maybe I could be clever and create a "progress table" in the db... and use it like this:

  1. As soon as the .zip file is uploaded to the server I create a row and an id.
  2. Next I send that id back to the browser (AJAX) and immediately start the laborious processing.  The processing would continually update the DB with its progress.
  3. The js would receive the id and keep polling the DB to check on the processing progress and ultimately report this back to the user.
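The polling endpoint that the JS would hit can be a tiny script that just reads the row back out. A minimal sketch, assuming a hypothetical `progress` table with `id`, `percent`, and `status` columns (all names illustrative):

```php
<?php
// progress.php - hypothetical polling endpoint: returns the current
// state of one job as JSON. Table/column names are assumptions.
header('Content-Type: application/json');

$pdo = new PDO('mysql:host=localhost;dbname=importer', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$stmt = $pdo->prepare('SELECT percent, status FROM progress WHERE id = ?');
$stmt->execute([(int)($_GET['id'] ?? 0)]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

echo json_encode($row ?: ['status' => 'unknown']);
```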

 

Well my brilliant scheme doesn't seem to work and everything locks up regardless.  I think I was trying to fake multi-threading and I'm not sure how to solve this problem.  

 

My end goal is to crunch huge files and keep the user notified of the progress - does anyone have good advice?


You need to separate the processing from the browser request. Your progress table is one way to do it, but you need to take it a step further.

 

Your browser request needs to handle uploading the zip and CSV files and stashing them away somewhere. You then create an entry in the progress table that indicates where the files were stashed and that they are in a "need to be processed" state. After that your script is done and gives the user their response. The response could be a page with some JavaScript that periodically polls the server for the status of the job by checking back in with the progress table.

 

You'd then have a separate script do the actual processing of the files. The easiest way to do this is to set it up as a cron job and have it run every minute or so. Each time it runs it will check the progress table for work to do and, if any is found, do it. As it progresses through the job it can provide feedback by updating the progress table.
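A rough sketch of such a cron-driven worker, assuming a hypothetical `progress` table with `status`, `percent`, and `zip_path` columns (all names illustrative):

```php
<?php
// worker.php - run from cron, e.g.:  * * * * * php /path/to/worker.php
$pdo = new PDO('mysql:host=localhost;dbname=importer', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Cheap check first so an idle tick costs almost nothing.
$stmt = $pdo->query("SELECT id, zip_path FROM progress
                     WHERE status = 'pending' LIMIT 1");
$job = $stmt->fetch(PDO::FETCH_ASSOC);
if (!$job) {
    exit;
}

// Claim the job atomically; rowCount() is 0 if an overlapping run beat us.
$claim = $pdo->prepare("UPDATE progress SET status = 'running'
                        WHERE id = ? AND status = 'pending'");
$claim->execute([$job['id']]);
if ($claim->rowCount() === 0) {
    exit;
}

// ... unzip, resize images, write logs, etc., reporting progress as we go ...
$update = $pdo->prepare('UPDATE progress SET percent = ? WHERE id = ?');
for ($pct = 1; $pct <= 100; $pct++) {
    // processNextChunk($job['zip_path']);  // placeholder for the real work
    $update->execute([$pct, $job['id']]);
}

$pdo->prepare("UPDATE progress SET status = 'done' WHERE id = ?")
    ->execute([$job['id']]);
```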

 

Another more complicated way is to run some background tasks with the help of a job/message server such as Gearman, beanstalkd, or redis. In such a setup you'd have worker scripts connect to the server and wait for new tasks. Your upload script would then submit a task to the server after an upload is complete. You'd still use the processing table to handle sharing of status and other details. The advantage of this type of setup is you can kick off processing immediately rather than having to wait until the next cron tick.
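As a sketch of the job-server route, here's roughly what it looks like with PHP's pecl Gearman extension (the 'process_zip' task name and JSON payload are made up for illustration):

```php
<?php
// Upload script: after the upload lands, fire off a background task
// and return to the user immediately.
$client = new GearmanClient();
$client->addServer('127.0.0.1');
$client->doBackground('process_zip', json_encode(['job_id' => 123]));

// Worker script: a long-running process (e.g. kept alive by supervisord)
// that waits for tasks and updates the progress table as it works.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1');
$worker->addFunction('process_zip', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);
    // ... unzip, resize, update the progress row for $payload['job_id'] ...
});
while ($worker->work());
```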

Edited by kicken


If you don't need compatibility with Internet Explorer, you can use Server-sent Events.

 

Then you can take the long-running processing script you already have and simply make it send regular updates to the client (e. g. the percentage of completion).


@fivestringsurf: How long does this process take? I can't imagine why an AJAX process to report on the status would lock up the browser. I'm guessing you are unnecessarily passing extraneous data back and forth and the browser is failing at some limit. Or, perhaps this part of the process - "The js would receive the id and keep polling the DB to check" - is to blame. What does the query that polls the database look like, and how often are you polling? That query could be a resource hog and cause the system to hang. You should make sure you have relevant indexes on your tables and are using efficient queries (e.g. use a COUNT() in the SELECT statement as opposed to querying all of the rows and then counting how many were returned).

 

Kicken's and Jacques1's suggestions are viable alternatives, but I see no reason why your current approach would not work - if implemented appropriately.


@kicken, I think the only part I was missing is the cron job, because what you described is precisely what I built.  Running cron every minute - would that be intensive on the server? Or is this a routine kind of setup one can expect?

 

@Jacques1, server-sent events?  Hmmm, that seems enticing.  But would PHP be able to echo out progress (i.e. JSON) while in the middle of processing?  I thought once PHP is processing, nothing can be echoed out until it's complete?  Please clarify if I'm wrong, because that could be a game-changer indeed.  An exception of course would be monitoring the file upload progress.

 

@Psycho - I incorrectly described the situation, my fault.  The browser isn't locking up, of course, as it's an asynchronous call.  What is happening is that the response hangs until all the processing is completed.  Even if I do this:

uploadFiles();
echo 'success, hash=123';
processImages();

Even though the echo comes before the processing call, it never gets sent until the entire script has completed.   So I believe I have to separate the workflow into 2 scripts called separately.
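One common way to do that split, assuming exec() is available on the server, is to respond first and launch the processor as a detached background process (the file names and hash argument here are illustrative):

```php
<?php
// import.php - handle the upload, answer the browser, then hand off.
$hash = bin2hex(random_bytes(8));
// ... move the uploaded files somewhere safe, insert a progress row for $hash ...

echo json_encode(['status' => 'success', 'hash' => $hash]);

// Start the heavy script detached; the trailing '&' plus redirected output
// lets this request return right away instead of waiting for it.
exec('php ' . escapeshellarg(__DIR__ . '/process_images.php')
   . ' ' . escapeshellarg($hash) . ' > /dev/null 2>&1 &');
```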

 


@Jacques1, server-sent events?  Hmmm, that seems enticing.  But would PHP be able to echo out progress (i.e. JSON) while in the middle of processing?  I thought once PHP is processing, nothing can be echoed out until it's complete?  Please clarify if I'm wrong, because that could be a game-changer indeed.

 

The response doesn't have to be one fixed block of data like in classical web applications. It can also be a stream, which allows the server to send updates to the client without having to wait for the next request.

 

Compared to polling, this is simpler, quicker (there are no unnecessary delays) and more efficient in terms of traffic.

 

Client-side JavaScript code

var evtSource = new EventSource("process.php");
evtSource.addEventListener("progress", function (event) {
    var eventData = JSON.parse(event.data);
    console.log("Processing: " + eventData.percentage + "%");
});

Server-side PHP code

<?php

header("Content-Type: text/event-stream");

function send_event($event, $data)
{
    if (!preg_match('/\\A\\w+\\z/', $event))
    {
        throw new InvalidArgumentException('Invalid event name: '.$event);
    }

    echo
        "event: $event\n",
        "data: ".json_encode($data)."\n\n";
    flush();
}

for ($i = 0; $i <= 100; $i++)
{
    send_event('progress', ['percentage' => $i]);
    sleep(1);
}
Edited by Jacques1


running cron every minute? would that be intensive on the server?  or is this a routine kind of normalcy one can expect?

It's not ideal, but it's not that uncommon of a setup since cron is the best option generally in a shared hosting environment and it's relatively easy. If you have control over the server and want to put in the extra effort there are more efficient ways like I mentioned.

 

However wasteful it might seem, if there is no work to be done your cron job won't take much processing power or resources. Just make sure to code it in such a way that the script does very little until it has determined whether there is work to do. That way it can quickly start, check, and exit when there is nothing to do. You can also reduce the frequency, but that would reduce how quickly it notices new work.
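The check-then-exit pattern can be as simple as this (table and column names are assumptions):

```php
<?php
// First thing in the cron script: one light query, then bail if idle.
$pdo = new PDO('mysql:host=localhost;dbname=importer', 'user', 'pass');
$pending = $pdo->query("SELECT COUNT(*) FROM progress
                        WHERE status = 'pending'")->fetchColumn();
if ($pending == 0) {
    exit; // idle tick: no image libraries loaded, no work started
}

// Only now pull in the heavy processing code and start the job.
// require __DIR__ . '/process_images.php';
```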

 

 

I thought once php is processing nothing can be echoed out until it's complete?  Please clarify if I'm wrong because that could be a game-changer indeed.

Being able to stream a response has been possible for a long time, but in the past it's been unreliable due to potential buffering beyond your control. I'm not sure how well it works these days. The EventSource API is something I've not heard of before.

 

Years ago I created a progress-bar type thing by streaming out a nearly full page, then slowly outputting some JavaScript to update the display. For example:

... nearly complete page ...
<?php
for ($i = 0; $i < 100; $i++) {
    echo '<script type="text/javascript">updateProgress(' . ($i / 100) . ');</script>' . PHP_EOL;
    sleep(1);
}
From what I recall it worked in some browsers, but others would not process the scripts as they came in. Since it was just a personal thing I didn't care about compatibility; it worked fine for my general use.

 

 

I think the key thing, though, is that you need to separate the processing into a background task. Whether you use polling or the newer events API for reporting the progress doesn't matter. If you try to do the processing at the same time as the upload, the request won't end until the processing is complete, meaning the user's browser will sit there acting like it's still loading the page the entire time. The exception would be if you're using PHP-FPM: you could use fastcgi_finish_request to end the request but let the script keep working. Separating it out as a background task also means you're not tying up web server threads with work, which would otherwise reduce your ability to handle requests.
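For the PHP-FPM case, the fastcgi_finish_request approach looks roughly like this:

```php
<?php
// Send the user their answer immediately...
echo json_encode(['status' => 'accepted']);

// ...then close the request while the script keeps running server-side.
// fastcgi_finish_request() only exists under PHP-FPM, so guard the call.
if (function_exists('fastcgi_finish_request')) {
    fastcgi_finish_request();
}

// Heavy lifting happens here, invisible to the (already answered) browser.
// processImages();
```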


@Jacques1, I set up a test environment and ran your code.  Interesting idea, but here's what happens: it works (kinda) but it throws all the results back at once.

 

For instance, after loading the page there is no response from the server, and then after 100 seconds it all shows up in the console.  Then after 100 seconds it does the same thing again.  I can confirm this is the behavior in both Firefox and Chrome.  Not sure if this is a limitation of my server environment; I'm running PHP 7 on OS X (my local testing rig).


@kicken, so I tried some code with fastcgi_finish_request() and unfortunately I got this:

Fatal error: Uncaught Error: Call to undefined function fastcgi_finish_request()

So I'm sure it's some Apache mod I'm missing. I looked into it and I think getting that going is above my pay grade... it looks complicated, and the more I read, the more I discovered there can be issues with logging.   Hmmm
It's late, but I think what I might try tomorrow is a 3-prong approach to keep all 3 phases of the script separate.  Here's what I'm thinking:

 

  1. Upload the file and report back progress  (using AJAX or this EventSource thing)
  2. Once complete, send a second call to the server to start the processing, and don't bother listening for a return message
  3. Now start polling the server to "listen" in on its progress  (the processing will update the DB with its progress)

It's what I have in my head anyway... I'll try it tomorrow.


If you see all the output at once, then PHP or your webserver is buffering. Turn it off (or flush the buffer) and you'll get the output immediately. Put ob_flush() before the flush() in case PHP is holding the output back, then look up the buffering directive for your specific Apache/PHP module if necessary.
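Put together, the unbuffered streaming setup might look like this (the X-Accel-Buffering header is a non-standard hint for fronting nginx proxies; harmless elsewhere):

```php
<?php
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
header('X-Accel-Buffering: no'); // ask a fronting nginx not to buffer

echo "data: starting\n\n";

// Flush PHP's own output buffer (if one is active) first, then the server's.
if (ob_get_level() > 0) {
    ob_flush();
}
flush();
```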

 

The workflow is wrong, though:

  • You do a plain upload (no magic here) and get back an ID.
  • Then you either pass the ID to the processing script and get the progress with server-sent events. Or you do your polling thing, where the file is processed by a cron job which stores the status in a database, and the client keeps asking for the current progress.
Edited by Jacques1


So I did get it working, thanks to all the helpful minds here in this forum. @Jacques1, ob_flush() was key!
It was really difficult to wrap my mind around the solution because EventSource wasn't as easy to work with as AJAX.  EventSource expects a very specific (sort of bizarre) response structure, and if even one line ending is off it doesn't work.
 
I also couldn't grasp how to upload the file and then listen for it, because you can't send files with EventSource, so I couldn't get EventSource to listen in on the file upload progress. But that wasn't the biggest deal... I just used my normal AJAX-style upload function with the XMLHttpRequest progress handler thingee to do the work.
 
Here's what I did:

  • Upload the file to import.php using AJAX and display the progress (pretty straightforward stuff)
  • As soon as the file is done uploading to import.php, I log it to the database and generate a hash
  • I send the hash back in the JSON returned to the AJAX script that started it all
  • I immediately call EventSource to start listening in on a separate script that lives at import_progress.php  (I used ?url_hash=123abc in the URL to pass the hash).  I don't think EventSource is meant to pass vars... I was trying to be clever
  • import_progress.php checks the db based on the hash and starts processing.
  • Each time the processing gets through a loop, it increments an (int) progress field in the database and immediately echoes the progress out, followed by ob_flush(); flush();
  • Meanwhile, back on the client side, we're listening to the echoes and manipulating a progress bar

Maybe it's just me, but I really felt like I stretched the technologies, PHP in particular, to the limit here, forcing it to behave in a way it was never designed to.  Passing the $_GET variable in step 4 felt a bit janky, but I didn't know any other way to do it.  Once EventSource is called it has no knowledge of what has been uploaded, so this was the only way I found to do it, and it can't monitor the AJAX upload as far as I know.

 

EventSource is kind of dangerous; it keeps calling the script.  One time I wasn't paying attention and images kept on getting created... I can only imagine if I'd decided to go to bed and not fix that, or at least close the browser - yikes.

 

I'm going to have to go through my image processing classes and craft some very clever fail-safes so EventSource doesn't get hung up.  Maybe I can even time it out on the client side if no progress is being made after a certain period...  We'll see.  I've won this battle but there's much to do.


 Passing the $_GET variable in step 4 felt a bit janky but I didn't know any other way to do it.  

The URL parameter to the EventSource can be any URL. Passing a GET parameter is fine.

 

 

EventSource is kind of dangerous; it keeps calling the script.  One time I wasn't paying attention and images kept on getting created... I can only imagine if I'd decided to go to bed and not fix that, or at least close the browser - yikes.

Your processing script needs to guard against being executed multiple times, especially if it's web-accessible. You can do this by checking/setting a flag in the database before processing begins.
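One way to implement that guard is an atomic status flip, so that a reconnecting EventSource can't start the same job twice (table and column names assumed):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=importer', 'user', 'pass');

// Flip pending -> running in one statement; only one caller can win.
$claim = $pdo->prepare("UPDATE progress SET status = 'running'
                        WHERE url_hash = ? AND status = 'pending'");
$claim->execute([$_GET['url_hash'] ?? '']);

if ($claim->rowCount() === 0) {
    // Already claimed (or unknown hash): report status, don't reprocess.
    exit;
}

// Safe to start the real processing here.
```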

 

You'll also need to guard against user disconnects if your script is going to be sending data directly to the browser. Use ignore_user_abort to let the script keep going even if the browser disconnects.
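A minimal sketch of that:

```php
<?php
// Keep working even if the user closes the tab mid-stream, and don't
// let a long job die at max_execution_time.
ignore_user_abort(true);
set_time_limit(0);

// If you *do* want to stop emitting output (while still finishing the
// work) after a disconnect, check connection_aborted() at safe points.
```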

