Multi-Threaded PHP Script With Very High Usage


JustinK101

So basically we wrote a PHP script that runs every minute via cron. The first thing the script does is check whether an instance of itself is already running; if one is, the new instance kills itself off.
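
(For context, the "already running" check is essentially a non-blocking lock; a minimal sketch of that pattern, with a hypothetical lock file path:)

 

<?php
//Sketch of an "already running" guard via a non-blocking advisory lock.
//The lock file path is a placeholder.
$lock = fopen('/tmp/queue_worker.lock', 'w');
if (!$lock || !flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0); //another instance holds the lock, so bail out
}
//... do the work; the lock is released when the process exits.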

 

The script selects all records that are in a queue table at the moment it starts, does some spatial and mathematical work on each row, updates each row in the database with the new results, and finally deletes the row from the queue. It loops until it has finished every row from the original select, then exits and waits until cron fires it again.

 

Our script can process about 10,000 rows an hour.

 

The problem is that rows are inserted into the queue table at a pace of about 15,000 rows an hour, and growing quickly, so the script cannot keep up. The server running the PHP script is a dual-core system, but looking at the resource manager while the script is running, only one CPU is utilized at close to 100%; the other hovers around 0 - 15%.

 

RAM is not an issue: mysqld is using about 300 MB, PHP about 20 MB, with 2 GB free.

 

Also, the disks are very fast: 15,000 RPM SAS drives in RAID.

 

So basically the problem is purely CPU bound, a numbers game. I was thinking of simply running two copies of the script, one that selects the even rows from the queue and one that takes the odd rows, and firing them both off from cron.
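
(Concretely, assuming id is an auto-increment integer, each copy of the script would just take its half of the queue, something like:)

 

$parity = 0; //0 in the "even" copy of the script, 1 in the "odd" copy
$sql = "SELECT id, latitude, longitude, date_inserted FROM queue WHERE id % 2 = $parity ORDER BY date_inserted ASC";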

 

But, even better, can I have PHP manage this? In the script itself, can PHP spawn off another thread (or several) and hand each a block of the work, to utilize the other, idle CPU?

 

Thanks.


Have you tried reducing the number of times that the script is run?

 

Setup for the script takes some time (database connection and query for example), and perhaps eliminating that overhead every 60 seconds will enable the server to devote more cycles to actually processing the data.

 

Currently, every 60 seconds the script does...

 

check if self is running, if so, kill self
connect to db
query db
loop through 250 rows (15000 rows per hour / 60 minutes) doing functions
exit

 

Perhaps changing that to 7500 rows every 30 minutes will help.
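
For example, the crontab entry for a 30-minute interval would look like this (the script path is just a placeholder):

 

*/30 * * * * /usr/bin/php /path/to/process_queue.php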

 

Try to run the script once every 15 - 30 minutes (on a test data set, not live) and see if it makes a difference.

 

Also, what "calculations" are you doing on the rows such that it takes longer than 60 seconds to process ~250 of them?


Looking at it, pcntl_fork() might do the trick, but I don't have a clue how to use it.

 

Here is basically the code I am running; if anybody could take the code below and wire in pcntl_fork() to spawn multiple processes, that would be great.

 

NOTE: This code runs after the script has already checked that it is not already running and has set up the database connection.

 

//Get all records from queue, oldest first
$sql = "SELECT id, latitude, longitude, date_inserted FROM queue ORDER BY date_inserted ASC";
$result = mysql_query($sql) or die(mysql_error());

while ($row = mysql_fetch_array($result)) {
    //spacialMath() is a custom function which does a whole bunch of work, including connecting to the 20 GB database and doing work. This is the biggest bottleneck right here.
    $address_object = spacialMath($row[1], $row[2]);

    //Don't reassign $result here, or the next mysql_fetch_array() call on the SELECT breaks
    $sql = "UPDATE spacial_data SET address = '" . $address_object->address . "', polygon_min = " . $address_object->polygon_min . ", polygon_max = " . $address_object->polygon_max . " WHERE id = " . $row[0];
    mysql_query($sql) or die(mysql_error());

    $sql = "DELETE FROM queue WHERE id = " . $row[0];
    mysql_query($sql) or die(mysql_error());
}
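
Here is roughly what I imagine it turning into, pieced together from the pcntl examples in the manual (completely untested, and the connection details are placeholders):

 

<?php
//Untested sketch: fork one worker per core; each worker takes the rows
//where id % $workers equals its worker number. Each child opens its OWN
//database connection, since a forked child can't safely reuse the parent's link.
$workers = 2; //one per CPU core

for ($i = 0; $i < $workers; $i++) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("could not fork\n");
    } elseif ($pid == 0) {
        //Child process: fresh connection, then its share of the queue.
        $link = mysql_connect('localhost', 'user', 'pass') or die(mysql_error());
        mysql_select_db('mydb', $link) or die(mysql_error());

        $sql = "SELECT id, latitude, longitude FROM queue WHERE id % $workers = $i ORDER BY date_inserted ASC";
        $result = mysql_query($sql, $link) or die(mysql_error());

        while ($row = mysql_fetch_array($result)) {
            $address_object = spacialMath($row[1], $row[2]);
            mysql_query("UPDATE spacial_data SET address = '" . $address_object->address . "', polygon_min = " . $address_object->polygon_min . ", polygon_max = " . $address_object->polygon_max . " WHERE id = " . $row[0], $link) or die(mysql_error());
            mysql_query("DELETE FROM queue WHERE id = " . $row[0], $link) or die(mysql_error());
        }
        exit(0); //child is done
    }
    //Parent falls through and forks the next worker.
}

//Parent waits for all children to finish.
while (pcntl_wait($status) > 0);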


hitman6003:

 

"Setup for the script takes some time (database connection and query for example), and perhaps eliminating that overhead every 60 seconds will enable the server to devote more cycles to actually processing the data."

 

There isn't much overhead, because the very first thing the script does is make sure there is not an instance of itself already running, before it even sets up the database connection and begins doing work. Sure, there is a little overhead, but not a big CPU hit.


You know what, guys, I may have left out a VERY important detail. It isn't the PHP process that is using one CPU at 100% utilization but in fact mysqld, which is of course called from PHP. So if there is a way to multithread MySQL queries from PHP, that would work as well.


MySQL is a multi-threaded application...so it will automatically spawn a thread for each connection.

 

Have you actually timed the different parts to determine where the bottlenecks are? I saw you noted one in your code, but have you used microtime() to actually time each of the queries and the spacialMath() function?

 

Also, buffer the deletes so that you can do only one query for all of them...

 

<?php
//Get all records from queue, oldest first
$start = microtime(true);
$sql = "SELECT id, latitude, longitude, date_inserted FROM queue ORDER BY date_inserted ASC";
$result = mysql_query($sql) or die(mysql_error());

echo "The initial query took " . (microtime(true) - $start) . " seconds\n";
echo "There are " . mysql_num_rows($result) . " rows in the result set\n";

$delete_ids = array();

while ($row = mysql_fetch_array($result)) {
    $start = microtime(true);
    $address_object = spacialMath($row[1], $row[2]);
    echo "It took " . (microtime(true) - $start) . " seconds for spacialMath on row with id " . $row[0] . "\n";

    $start = microtime(true);
    $sql = "UPDATE spacial_data SET address = '" . $address_object->address . "', polygon_min = " . $address_object->polygon_min . ", polygon_max = " . $address_object->polygon_max . " WHERE id = " . $row[0];
    // use an unbuffered query here...no need to return a result
    mysql_unbuffered_query($sql) or die(mysql_error());
    echo "It took " . (microtime(true) - $start) . " seconds to update the row\n";

    $delete_ids[] = $row[0];
}

// one DELETE for the whole batch (skip it if the queue was empty)
if (!empty($delete_ids)) {
    $sql = "DELETE FROM queue WHERE id IN (" . implode(", ", $delete_ids) . ")";
    mysql_unbuffered_query($sql) or die(mysql_error());
}


"MySQL is a multi-threaded application...so it will automatically spawn a thread for each connection."

 

Yeah, but the problem is that I need the PHP application to spawn the multiple queries in parallel. I would love to do the following:

 

1.) Select everything out of the queue.

2.) Split the rows in half: assign half of them to one process and the other half to another process.

 

Each process does exactly the same work, just in parallel; roughly like the sketch below.
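
Something like this is what I am picturing (untested sketch; the connection details are placeholders, and the per-row work is the same UPDATE/DELETE as before):

 

<?php
//Untested sketch: fetch the whole queue once, split the rows in half,
//and fork a child for each half. Each child opens its OWN connection,
//since a forked child can't reuse the parent's mysql link.
$result = mysql_query("SELECT id, latitude, longitude FROM queue ORDER BY date_inserted ASC") or die(mysql_error());

$rows = array();
while ($row = mysql_fetch_assoc($result)) {
    $rows[] = $row;
}

//Split into (at most) two halves.
$halves = array_chunk($rows, max(1, (int) ceil(count($rows) / 2)));

foreach ($halves as $half) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("could not fork\n");
    } elseif ($pid == 0) {
        //Child: fresh connection, then the same per-row work as the single-process version.
        mysql_connect('localhost', 'user', 'pass') or die(mysql_error());
        mysql_select_db('mydb') or die(mysql_error());
        foreach ($half as $r) {
            $address_object = spacialMath($r['latitude'], $r['longitude']);
            //... same UPDATE of spacial_data and DELETE from queue as before ...
        }
        exit(0);
    }
}

while (pcntl_wait($status) > 0); //parent waits for both children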

