

We are toying with the idea of using PHP to do sorting instead of our database, to reduce load and scale. Basically, what we would have is multidimensional arrays returned from the SQL engine, and we need to sort them on a specific key in PHP. We have written this functionality, but are looking to make it faster any way possible. Do you guys see any way to speed up this code, without throwing more hardware at the problem yet? Currently on our test server this code takes approximately 2.4 - 2.6 seconds to execute. We would love others to try this code and report how long it takes to execute. We are shooting for a sub-1-second time.

 

// Build 100,000 rows of random test data to sort.
$array = array();
$temp = array();
for ($i = 0; $i < 100000; $i++) {
    $temp['id'] = rand(10000, 99999);
    $temp['history_id'] = rand(100000000, 999999999);
    $array[] = $temp;
}


////
// Start timer
////
$starttime = microtime(true);

////
// Actual sort
////
usort($array, function($a, $b) {
    // Compare the numeric history_id values
    if ($a['history_id'] == $b['history_id']) { return 0; }
    return ($a['history_id'] < $b['history_id']) ? -1 : 1;
});

////
// End timer
////
$endtime = microtime(true);
$totaltime = ($endtime - $starttime);

////
// Output execution time
////
echo "Executed in: <strong>". $totaltime ."</strong> seconds.";

It just dawned on me that, since the servers we are running this on are multicore, we could try the following technique to make it faster:

 

1.) Split the array in half

2.) Fork another process, so we have two processes

3.) Sort each half using usort() above in their own process

4.) Combine the two sorted halves

5.) Sort the combined array a final time

 

This might be faster, though the extra work of splitting in half, merging, and sorting at the end might negate any speed increase.
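A rough sketch of what that fork-and-sort idea could look like, using pcntl_fork() (CLI only; requires the pcntl extension, error handling omitted, and the temp-file handoff is just one simple way to get the child's result back):

// Assumes $array is the 100,000-row test array from the benchmark above.
$half   = (int) (count($array) / 2);
$first  = array_slice($array, 0, $half);
$second = array_slice($array, $half);

$cmp = function($a, $b) {
    if ($a['history_id'] == $b['history_id']) { return 0; }
    return ($a['history_id'] < $b['history_id']) ? -1 : 1;
};

$tmp = tempnam(sys_get_temp_dir(), 'sort');
$pid = pcntl_fork();
if ($pid == 0) {
    // Child process: sort the second half, hand it back, and exit.
    usort($second, $cmp);
    file_put_contents($tmp, serialize($second));
    exit(0);
}

// Parent process: sort the first half while the child runs in parallel.
usort($first, $cmp);
pcntl_waitpid($pid, $status);
$second = unserialize(file_get_contents($tmp));
unlink($tmp);

// $first and $second are now each sorted; they still need to be combined.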

Congratulations. You almost discovered merge sort. If you have two sorted lists, you can combine them into one sorted list in linear time, so there is no need for the sort in step 5. With that change you now have merge sort, or more specifically parallel merge sort, which gives an almost linear speedup.
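For reference, the linear-time merge for step 4 could look something like this (merge_sorted() is just an illustrative name; it assumes both inputs are already sorted by history_id):

function merge_sorted(array $left, array $right) {
    $result = array();
    $i = 0;
    $j = 0;
    $nl = count($left);
    $nr = count($right);
    // Repeatedly take the smaller head element; each element is visited once.
    while ($i < $nl && $j < $nr) {
        if ($left[$i]['history_id'] <= $right[$j]['history_id']) {
            $result[] = $left[$i++];
        } else {
            $result[] = $right[$j++];
        }
    }
    // One side is exhausted; append the remainder of the other.
    while ($i < $nl) { $result[] = $left[$i++]; }
    while ($j < $nr) { $result[] = $right[$j++]; }
    return $result;
}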

 

With that said, it will be slower and more memory intensive to fetch all the values and then sort them in PHP. Your DBMS already has highly optimized sorting algorithms implemented, and seeing as it has to touch all the resulting rows anyway, it will be faster just doing the sort while it's at it.
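In other words, something like this pushes the whole job to the database (the table name and connection details here are placeholders matching the benchmark's keys):

// Let the database sort while it reads the rows; an index on
// history_id can make the sort essentially free.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // placeholder credentials
$stmt = $pdo->query('SELECT id, history_id FROM history ORDER BY history_id ASC');
$array = $stmt->fetchAll(PDO::FETCH_ASSOC);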

Daniel,

 

Actually, our research has suggested that a cluster of application servers is much easier, cheaper, and far less of a headache to scale than database replication and master-slave setups. In fact, this is what Facebook, Twitter, etc. do. They do as little processing at the DB level as possible and push that work off to a cluster of application servers. I know it seems counter-intuitive at first, but if you think about it, it actually makes sense.

 

Ohhh, and wasn't my distributed merge sort actually a distributed merge sort using quicksort, since usort() uses quicksort?
