Multiple instances of PHP - running a PHP file n times concurrently


I have a scenario in PHP and need a smart way to handle it:

1. I run one process in the background forever, scheduler.php - done

2. Users on the web submit their tasks to the DB to be processed by the above process (scheduler.php) - done

3. scheduler.php reads the DB every 1 sec, looking for users' tasks.

 

Problem:

Ideally, if it finds 10 or more tasks, it should process them concurrently (in parallel), i.e. it should not wait for one task to be fully executed before running the next one, as one task might take a very long time to complete.

I could do something like exec("nohup php <path>/file_name.php >> /dev/null 2>&1 &"), but if I have a lot of tasks this will create zombie processes.

 

Any smart way to do this will be highly appreciated!

Have you considered cron jobs instead of having a script looping indefinitely?  And I think what you're wanting to do with concurrent processes would be better executed with C# or a similar language.  I may be wrong though.

PHP doesn't have that ability as far as I know. It executes top-to-bottom (with the exception of function/class definitions).

 

Would your script have any way of determining longer tasks from shorter tasks?

You are looking for pcntl_fork. This is advanced material, here. If you do it correctly, it can work nicely. If you do it wrong, your host may ask you to leave.

 

You need to keep a count of fork'ed children, so you can limit how many run concurrently; collect them as they exit, to avoid zombies; and not overload the system. If there are no queued tasks, you can sleep, which I think does not use any resources. Or you can exit and have a cronjob scheduled every 5 minutes (or whatever) to start a new scheduler if there is not already one running. I personally prefer this method to avoid memory leaks.

 

Thanks for the reply @chrisdburns. It's a web application, hosted somewhere, and I can't run C in there. I have no trouble with PHP that runs forever because I use sleep() in my do/while(true) loop if there is no task; if there is a task, it reads it. Thanks.

Thanks DavidAM, I think this is the best way to go. I have also been advised by a colleague to use that PHP function. Let me dig.

I would appreciate it if you would share your implementation with pcntl_fork() so I avoid being kicked off by my web host.

Here is a sample of how to use pcntl_fork properly.  This is based on a script I wrote that launched multiple crawlers.  The MAX_PROCESSES define limits the number of concurrent processes.  It will launch up to that many processes and then wait for them to finish.  When one finishes, it cleans it up and then launches the next one.

 

It also checks for a .stoplauncher file which I can create to terminate the program if necessary.  You could just do an endless loop and kill it instead if you wanted.

 

<?php

if (file_exists('.stoplauncher')){
        die('.stoplauncher file exists.');
}

define('MAX_PROCESSES', 3);

$runningPIDs=array();
while (!file_exists('.stoplauncher')){
        while (count($runningPIDs) < MAX_PROCESSES){
                echo "Launching worker process.";
                if (($pid=pcntl_fork()) == 0){
                        RunProcess();
                }
                else {
                        if ($pid == -1){
                                die('Failed to launch child process');
                        }
                        else {
                                echo "PID=", $pid, "\r\n";
                                $runningPIDs[] = $pid;
                        }
                }
        }


        echo "Waiting for a child to finish.\r\n";
        cleanupChild($runningPIDs);
}

while (count($runningPIDs) > 0){
        echo "Waiting for a child to finish.\r\n";
        cleanupChild($runningPIDs);
}


function cleanupChild(&$runningPIDs){
        // Block until any child exits; collecting it here is what prevents zombies.
        $pid = pcntl_wait($status, 0);
        if ($pid == -1){
                echo "Failed to wait\r\n";
        }
        else {
                echo "Cleaned up process ", $pid, "\r\n";
                // Drop the finished PID from the list and reindex it.
                $runningPIDs = array_values(array_diff($runningPIDs, array($pid)));
        }
}

function RunProcess(/* You can pass in whatever parameters you need from the parent */){
//Do whatever you need to do here
//exit when you are finished.
        exit;
}

 

The example from kicken is pretty much the way I would do it. In your case, I think you would add your scheduler check here:

 

        while (count($runningPIDs) < MAX_PROCESSES){
                /* Check the schedule table here and sleep if nothing to do.
                   The checkSchedule() function should return the database ID of the
                   task to be executed or FALSE if there are no tasks waiting */
                while (($IDtoRun = checkSchedule()) === false) sleep(1); // seconds; 1 matches the polling interval in the question, adjust as needed

                echo "Launching worker process.";
                if (($pid=pcntl_fork()) == 0){
                        RunProcess($IDtoRun);
                }

 

You have to make sure the checkSchedule() marks the record as being sent to the scheduler somehow. Otherwise, you could launch several processes for the same ID before the first one gets to the point of marking it.
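
One way to sketch that marking step is a conditional UPDATE, so a task only counts as claimed if this launcher was the one that flipped the flag. The function name checkSchedule() comes from the post above; the PDO parameter, the tasks table and its id/started columns are assumptions made for illustration, not the poster's actual schema.

```php
<?php
// Assumed schema (hypothetical): tasks(id INTEGER PRIMARY KEY, started INTEGER)
// Returns the ID of a task this process has claimed, or false if none is waiting.
function checkSchedule(PDO $db)
{
    // Pick the oldest task that has not been handed out yet.
    $id = $db->query('SELECT id FROM tasks WHERE started = 0 ORDER BY id LIMIT 1')
             ->fetchColumn();
    if ($id === false) {
        return false; // nothing waiting
    }

    // Claim it. The "AND started = 0" guard means that if another launcher
    // claimed the same row in between, this UPDATE affects zero rows.
    $claim = $db->prepare('UPDATE tasks SET started = 1 WHERE id = ? AND started = 0');
    $claim->execute([$id]);

    return $claim->rowCount() === 1 ? (int) $id : false;
}
```

If the claim is lost to another process, the function returns false and the caller simply sleeps and tries again on the next pass.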

 

While the child process "differs from the parent process only in its PID and PPID", it seems to me that your database connection (and other resources) do NOT get properly replicated. In fact, it seems I had to close the connection before calling fork() and then re-open the connection in both the child and the parent.
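
A rough sketch of that close-then-reopen pattern follows. The helper name forkWithFreshConnection() and the $connect callable are hypothetical; $connect stands for whatever code opens your database connection. Setting $db to null only closes the connection if nothing else still holds a reference to it.

```php
<?php
// Hypothetical helper: closes the shared connection, forks, and gives the
// parent and the child a fresh connection each.
// Returns the child's PID in the parent, or -1 if the fork failed.
function forkWithFreshConnection(callable $connect, &$db, callable $childWork)
{
    $db = null;              // drop the shared connection before forking

    $pid = pcntl_fork();
    if ($pid === 0) {
        // CHILD: open its own connection, do the work, then leave.
        $childWork($connect());
        exit(0);
    }

    // PARENT (also reached when the fork failed): reconnect and carry on.
    $db = $connect();
    return $pid;
}
```

With PDO, $connect could be something like function () { return new PDO($dsn, $user, $pass); } using your own credentials.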

 

When you fork(), the process is replicated and the new process starts running in the same spot. This is the reason for the if test in kicken's code. The fork() returns the child's PID to the parent and 0 to the child. So the if(($pid=pcntl_fork()) == 0) statement will be TRUE in the CHILD and will be FALSE in the PARENT. I think this was the most confusing point when I first used fork() (in a C utility). -- The CHILD does NOT start at the beginning.
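
A stripped-down illustration of the two return values may help. The function name forkOnce() and the exit status 7 are arbitrary choices for this sketch.

```php
<?php
// Forks once and returns the child's exit status as seen by the parent.
function forkOnce()
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }

    if ($pid === 0) {
        // CHILD: execution continues here, right after the fork call,
        // not at the top of the script.
        exit(7); // arbitrary status so the parent can recognise it
    }

    // PARENT: $pid holds the child's PID. Wait for it so it does not
    // linger as a zombie, then read its exit status.
    pcntl_waitpid($pid, $status);
    return pcntl_wexitstatus($status);
}
```

Both branches run, each in its own process; only the value of $pid tells them apart.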

 

kicken's example also includes a check for a file to stop the process. This is a good idea, as just killing the task can leave things in a questionable state. You also need to consider, what would happen if multiple launchers are started. Most *nix services like this, check for a "pid" file at start-up. If this file already exists, the service will print an error message and exit. If it does not exist, the file is created and the PID is placed in it. When the service exits, it will remove the file. Since the file contains the PID (process ID) of the running service, if you need to kill it from the command line, you can look at the contents of this file to get the PID and issue your KILL.
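
The PID-file check could look roughly like this. The function names acquirePidFile() and releasePidFile() are made up for this sketch, and it does not try to detect a stale file left behind by a crashed launcher.

```php
<?php
// Returns true if this process now owns the PID file,
// false if the file already exists (another launcher is probably running).
function acquirePidFile($path)
{
    // Mode 'x' creates the file and fails if it already exists, so the
    // existence check and the creation happen in a single step.
    $fh = @fopen($path, 'x');
    if ($fh === false) {
        return false;
    }
    fwrite($fh, (string) getmypid());
    fclose($fh);
    return true;
}

// Removes the PID file, but only if it still contains our own PID.
function releasePidFile($path)
{
    if (is_file($path) && (int) file_get_contents($path) === getmypid()) {
        unlink($path);
    }
}
```

A launcher would call acquirePidFile() first thing, print an error and exit if it returns false, and call releasePidFile() on its way out.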

 

You should consider using pcntl_signal to install a signal handler that will intercept SIGHUP and SIGTERM and exit cleanly. Otherwise, the PID file will not be removed and the child tasks will be left running as orphans when the parent exits.

 

I would implement SIGHUP as a command to not start any new tasks. This allows all running children to exit normally and when there are no more children, the parent would exit normally. (This is how kicken's code behaves when it finds the stop file.)

 

I would implement SIGTERM to have the child tasks exit immediately but still have the parent wait and do the clean up. I believe that *nix servers will send SIGHUP to all processes when a shutdown is issued (i.e. they are rebooting the server). Then a few seconds later a SIGTERM will be sent to all processes that have not yet exited. Eventually, it sends a SIGKILL, which you cannot intercept so you just die.
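
Here is a minimal sketch of the two handlers, assuming PHP 7.1+ for pcntl_async_signals() and the posix extension for posix_kill(). The $acceptNewTasks flag is a hypothetical name; the launcher loop would have to check it before forking. $runningPIDs is the same list kicken's launcher maintains.

```php
<?php
// Deliver signals as they arrive (PHP 7.1+). On older versions, call
// pcntl_signal_dispatch() inside the launcher loop instead.
pcntl_async_signals(true);

$acceptNewTasks = true; // hypothetical flag checked by the launcher loop
$runningPIDs = [];      // PIDs of the children currently running

// SIGHUP: stop launching new tasks and let running children finish.
pcntl_signal(SIGHUP, function ($signo) use (&$acceptNewTasks) {
    $acceptNewTasks = false;
});

// SIGTERM: stop launching and ask every child to exit now. The parent
// still goes on to collect them with pcntl_wait() afterwards.
pcntl_signal(SIGTERM, function ($signo) use (&$acceptNewTasks, &$runningPIDs) {
    $acceptNewTasks = false;
    foreach ($runningPIDs as $pid) {
        posix_kill($pid, SIGTERM);
    }
});
```

Children inherit these handlers when they are forked, so a child that should simply die on SIGTERM would need to restore the default with pcntl_signal(SIGTERM, SIG_DFL) as its first step.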

 

For your scheduled tasks, you need to consider a couple of flags in the database. One for "Started" which is set just before forking the process to run it. And one for "Completed" which is updated by the child process just before it exits. When the main process starts, it can update any tasks that are "Started" and NOT "Completed" as NOT Started, so they will be run again. This state will exist if the server dies during a process, or someone kills the child that is running it, or the child ab-ends for some reason.
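
The start-up reset described above could be as small as one UPDATE. The tasks table, its started/completed columns (stored as 0/1) and the function name requeueInterruptedTasks() are assumptions for this sketch.

```php
<?php
// Assumed schema (hypothetical): tasks(id, started, completed), flags as 0/1.
// Puts tasks that were started but never completed back in the queue and
// returns how many were reset.
function requeueInterruptedTasks(PDO $db)
{
    // PDO::exec() returns the number of affected rows.
    return $db->exec('UPDATE tasks SET started = 0 WHERE started = 1 AND completed = 0');
}
```

This should run once when the launcher starts, before any child has been forked; otherwise it would also reset tasks that are legitimately still running.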

 

I'm sure there were other lessons learned, but I think that covers the high points.

 

I believe that *nix servers will send SIGHUP to all processes when a shutdown is issued (i.e. they are rebooting the server). Then a few seconds later a SIGTERM will be sent to all processes that have not yet exited. Eventually, it sends a SIGKILL, which you cannot intercept so you just die.

 

I mis-spoke.  :o The shutdown process does NOT send a SIGHUP. It sends SIGTERM to all running processes. Then it will kill any that don't exit. I would still implement the two signals the way I described in my previous post. I just wanted to correct my mistake for the record. (It doesn't count as a real mistake if you catch it yourself, right?  ;)
