JustinK101 Posted June 25, 2008

We wrote a PHP script that cron runs every minute. The first thing the script does is make sure another instance is not already running; if one is, the new instance kills itself off. The script selects all records present in a queue table at the moment it started, does some spatial and mathematical work on each row, updates the row in the database with the new results, and finally deletes the row from the queue. It loops until it has finished every row from the original SELECT, then exits and waits for cron to fire it again.

Our script can process about 10,000 rows an hour. The problem is that rows are inserted into the queue table at about 15,000 rows an hour and growing quickly, so the script cannot keep up.

The server running the PHP script is a dual-core system, but the resource manager shows that while the script runs, only one CPU is utilized close to 100%; the other hovers around 0 - 15%. RAM is not an issue: mysqld uses about 300 MB, PHP about 20 MB, with 2 GB free. The disks are also very fast, 15,000 RPM SAS in RAID. So the problem is purely CPU-bound, a numbers game.

I was thinking of simply running two of these scripts, one selecting the even rows from the queue and the other the odd rows, and firing both from cron. But even better, can PHP manage this itself? Within the script, can PHP spawn off another thread (or threads) to do a block of the work and utilize the other, idle CPU? Thanks.
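The odd/even split described above can be sketched without threads at all: run the same worker twice from cron and let each instance claim half of the queue by id parity. This is only a sketch; the query uses the table and column names from the post, but the command-line argument convention and helper name are assumptions.

```php
<?php
// Hypothetical two-worker split: cron starts this script twice, once
// with argument 0 (even ids) and once with argument 1 (odd ids), so
// the two processes never touch the same queue row.

function buildQueueQuery($parity)
{
    $parity = (int) $parity; // cast so only a number reaches the SQL
    return "SELECT id, latitude, longitude, date_inserted "
         . "FROM queue "
         . "WHERE MOD(id, 2) = $parity "
         . "ORDER BY date_inserted ASC";
}

// The real worker would run the query and process rows exactly as before:
//   $result = mysql_query(buildQueueQuery($argv[1])) or die(mysql_error());
echo buildQueueQuery(0), "\n";
```

The two crontab entries would then differ only in the argument (paths hypothetical):

```
* * * * * php /path/to/worker.php 0
* * * * * php /path/to/worker.php 1
```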
hitman6003 Posted June 25, 2008

Have you tried reducing the number of times the script is run? Setup takes some time (the database connection and initial query, for example), and eliminating that overhead every 60 seconds may free up cycles for actually processing the data. Currently, every 60 seconds the script does:

- check if an instance is already running; if so, kill self
- connect to the db
- query the db
- loop through ~250 rows (15,000 rows per hour / 60 minutes), running the functions
- exit

Perhaps changing that to 7,500 rows every 30 minutes will help. Try running the script once every 15 - 30 minutes (on a test data set, not live) and see if it makes a difference.

Also, what "calculations" are you doing on the rows that take longer than 60 seconds for ~250 rows?
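On the "make sure it is not already running" step: a non-blocking flock() on a lock file is a cheap way to do that check, and it lets a longer processing interval coexist safely with the every-minute cron entry. This is a sketch; the lock-file path is an assumption.

```php
<?php
// Sketch of an "already running?" guard using an exclusive,
// non-blocking file lock. If a previous run still holds the lock,
// this instance exits immediately before doing any setup work.

$lockPath = sys_get_temp_dir() . '/queue_worker.lock'; // assumed path
$lockFile = fopen($lockPath, 'c');                     // create if missing

if (!flock($lockFile, LOCK_EX | LOCK_NB)) {
    exit(0); // a previous run is still draining the queue
}

// ... connect to the database and process the queue here ...
echo "lock acquired\n";

flock($lockFile, LOCK_UN); // also released automatically on exit
fclose($lockFile);
```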
JustinK101 Posted June 25, 2008 Author

Looking at pcntl_fork(). It might do the trick, but I don't have a clue how to use it. Here is basically the code I am running; if anybody could take the code below and use pcntl_fork() to spawn multiple processes, that would be great. NOTE: this code runs after the script has already checked that it is not already running, and has set up the database connection.

//Get all records from the queue, oldest first
$sql = "SELECT id, latitude, longitude, date_inserted FROM queue ORDER BY date_inserted ASC";
$result = mysql_query($sql) or die(mysql_error());

while ($row = mysql_fetch_array($result)) {
    //spacialMath() is a custom function which does a whole bunch of work,
    //including connecting to the 20 GB database and doing work.
    //This is the biggest bottleneck right here.
    $address_object = spacialMath($row[1], $row[2]);

    $sql = "UPDATE spacial_data SET address = '" . $address_object->address . "', polygon_min = " . $address_object->polygon_min . ", polygon_max = " . $address_object->polygon_max . " WHERE id = " . $row[0];
    //NOTE: don't assign these back into $result, or the next
    //mysql_fetch_array() call on the SELECT result will break
    $ok = mysql_query($sql) or die(mysql_error());

    $sql = "DELETE FROM queue WHERE id = " . $row[0];
    $ok = mysql_query($sql) or die(mysql_error());
}
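A minimal pcntl_fork() skeleton for the loop above might look like the following. This is a sketch under two assumptions: the pcntl extension is available on the CLI, and each child opens its OWN MySQL connection, because a connection inherited across fork() is shared by both processes and gets corrupted as soon as either one uses it. The id list here is a stand-in for the real result set, and the actual per-row work is stubbed out.

```php
<?php
// Fork N workers; worker $i handles every row whose position in the
// list satisfies ($pos % $workers == $i), so the slices are disjoint.

$rows = range(1, 10); // stand-in for ids fetched from the queue
$workers = 2;
$pids = array();

for ($i = 0; $i < $workers; $i++) {
    $inline = false;
    $pid = function_exists('pcntl_fork') ? pcntl_fork() : -1;
    if ($pid == -1) {
        $inline = true; // fork unavailable/failed: run this slice in-process
    }
    if ($pid <= 0) { // child process, or the inline fallback
        foreach ($rows as $pos => $id) {
            if ($pos % $workers == $i) {
                // Child: open a NEW db connection, then run spacialMath(),
                // the UPDATE, and the DELETE for $id (stubbed in this sketch).
            }
        }
        if (!$inline) {
            exit(0); // a real child terminates here
        }
    } else {
        $pids[] = $pid; // parent records the child and keeps forking
    }
}

foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status); // wait for every worker to finish
}
echo "all workers done\n";
```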
JustinK101 Posted June 25, 2008 Author

hitman6003: "Setup for the script takes some time (database connection and query for example), and perhaps eliminating that overhead every 60 seconds will enable the server to devote more cycles to actually processing the data."

There isn't much overhead, because the first thing the script does is make sure there is not already an instance of itself running, before it even sets up the database connection and begins doing work. Sure, there is a little overhead, but not a big CPU hit.
JustinK101 Posted June 25, 2008 Author

You know what guys, I may have left out a VERY important detail. It isn't the PHP process that is using one CPU at 100% utilization but in fact mysqld, which is of course driven from PHP. So if there is a way to run MySQL queries from PHP in parallel, that would work as well.
hitman6003 Posted June 25, 2008

MySQL is a multi-threaded application...so it will automatically spawn a thread for each connection.

Have you actually timed the different parts to determine where the bottlenecks are? I saw you noted one in your code, but have you used microtime to actually time each of the queries and the spacialMath function?

Also, buffer the deletes so that you can do only one query for all of them...

<?php
//Get all records from queue, oldest first
$start = microtime(true);
$sql = "SELECT id, latitude, longitude, date_inserted FROM queue ORDER BY date_inserted ASC";
$result = mysql_query($sql) or die(mysql_error());
echo "The initial query took " . (microtime(true) - $start) . " seconds\n";
echo "There are " . mysql_num_rows($result) . " rows in the result set\n";

$delete_ids = array();

while ($row = mysql_fetch_array($result)) {
    $start = microtime(true);
    $address_object = spacialMath($row[1], $row[2]);
    echo "It took " . (microtime(true) - $start) . " seconds for spacialMath on row with id " . $row[0] . "\n";

    $start = microtime(true);
    $sql = "UPDATE spacial_data SET address = '" . $address_object->address . "', polygon_min = " . $address_object->polygon_min . ", polygon_max = " . $address_object->polygon_max . " WHERE id = " . $row[0];
    // use an unbuffered query here...no need to return a result
    mysql_unbuffered_query($sql) or die(mysql_error());
    echo "It took " . (microtime(true) - $start) . " seconds to update the row\n";

    $delete_ids[] = $row[0];
}

if (count($delete_ids) > 0) { // skip the DELETE if the queue was empty
    $sql = "DELETE FROM queue WHERE id IN(" . implode(", ", $delete_ids) . ")";
    mysql_unbuffered_query($sql) or die(mysql_error());
}
JustinK101 Posted June 25, 2008 Author

"MySQL is a multi-threaded application...so it will automatically spawn a thread for each connection."

Yeah, but the problem is that I need the PHP application to issue the queries in parallel. I would love to do the following:

1.) Select everything out of the queue.
2.) Split the number of rows in half, assign half the rows to one process and the other half to another process.

Each process does exactly the same work, just in parallel.
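The "split the rows in half" step itself is simple; the interesting part is dispatching the halves to two processes. A sketch of just the split, assuming the ids have already been fetched into an array (splitRows is a hypothetical helper name; the dispatch via pcntl_fork or two cron scripts happens elsewhere):

```php
<?php
// Split a fetched id list into $workers contiguous chunks, one per
// worker process. array_chunk reindexes each chunk from zero.

function splitRows(array $rows, $workers)
{
    if (count($rows) == 0) {
        return array();
    }
    return array_chunk($rows, (int) ceil(count($rows) / $workers));
}

$halves = splitRows(range(1, 10), 2);
echo count($halves[0]) . " rows for worker A, "
   . count($halves[1]) . " rows for worker B\n";
// prints "5 rows for worker A, 5 rows for worker B"
```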