Pawn Posted March 14, 2009 Share Posted March 14, 2009 Hi. New to this. What is the best way to... Sort a sample of numbers into two groups with the most equal totals? Typically I would need to take ten integers from a DB and display them in two 'balanced' groups of five. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/ Share on other sites More sharing options...
Daniel0 Posted March 14, 2009 Share Posted March 14, 2009 Like this? $numbers = array(/* [...] */); $groups = array(); foreach ($numbers as $num) { $group = floor($num / 5); $groups[$group][] = $num; } (not tested) That should group the numbers in intervals of five. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784670 Share on other sites More sharing options...
corbin Posted March 14, 2009 Share Posted March 14, 2009 Hrmmm I think he means he wants two sets of numbers with the closest totals possible. Or in math terms: 1 set of numbers broken into two sets such that the difference of the sum of the two sets is the minimal possible. The problem is, I've no idea how to do that in PHP x.x. *Starts trying to code something* Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784679 Share on other sites More sharing options...
Pawn Posted March 14, 2009 Author Share Posted March 14, 2009 1 set of numbers broken into two sets such that the difference of the sum of the two sets is the minimal possible. Exactly! Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784684 Share on other sites More sharing options...
Daniel0 Posted March 14, 2009 Share Posted March 14, 2009 Hmm... try this: <?php $numbers = array(1,2,3,4,5,6,7,8,9,10,11,12); $group1 = $group2 = array(); sort($numbers); // i know THIS array is already sorted... while (count($numbers)) { $topVal = array_pop($numbers); $delta = array_sum($group2) - array_sum($group1); if ($delta <= 0) { $group2[] = $topVal; } else { $group1[] = $topVal; } } ?> Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784693 Share on other sites More sharing options...
corbin Posted March 14, 2009 Share Posted March 14, 2009 Daniel, that will break the set into two non-uniformly sized sets in certain circumstances. I guess one could just find the number(s) that would least affect the sum of each set and move it from one set to the other though. Hrmmmmm...... Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784699 Share on other sites More sharing options...
Daniel0 Posted March 14, 2009 Share Posted March 14, 2009 Also, if you have a really large data set then it might be faster to use SplStack than an array. Daniel, that will break the set into two non-uniformly sized sets in certain circumstances. I guess one could just find the number(s) that would least affect the sum of each set and move it from one set to the other though. You cannot ensure minimal difference otherwise. If you let x={7,20,4,4,3,5}, then you need y1={7,5,4,4} and y2={20,3} for minimal difference. 7+5+4+4=20, 20+3=23, 23-20=3. There is no other way you could group that data set such that you both have minimal difference between their sums while still making the lists contain an equal amount of elements. Edit: Well, I guess you could. You could move one of the 4s from y1 to y2. That would get a difference of 11 instead of 3 though. Edit 2: If you do want that then you can do this: <?php $numbers = array(7,20,4,4,3,5); $group1 = $group2 = array(); sort($numbers); // i know THIS array is already sorted... while (count($numbers)) { $topVal = array_pop($numbers); $delta = array_sum($group2) - array_sum($group1); if ($delta <= 0) { $group2[] = $topVal; } else { $group1[] = $topVal; } } if (($g2s = sizeof($group2)) != ($g1s = sizeof($group1))) { if ($g2s > $g1s) { $from = &$group2; $to = &$group1; } else { $from = &$group1; $to = &$group2; } while (sizeof($from) > sizeof($to)) { $to[] = array_pop($from); } } ?> However, if you have an odd number of elements then you'll still be stuck though. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784703 Share on other sites More sharing options...
corbin Posted March 14, 2009 Share Posted March 14, 2009 Hrmmm I think he means he wants two sets of numbers with the closest totals possible. Or in math terms: 1 set of numbers broken into two sets such that the difference of the sum of the two sets is the minimal possible. The problem is, I've no idea how to do that in PHP x.x. *Starts trying to code something* Sorry, earlier I forgot to mention that I think he wants two sets of the same size. "1 set of numbers broken into two sets such that the difference of the sum of the two sets is the minimal possible." -> 1 set of numbers broken into two sets such that the two sets have the same number of elements and the difference of the sum of the two sets is the minimal possible. Edit: This is the logic that I came up with before I got in the shower. I was going to put it into code now, but now I'm leaving lol. $numbers = array(0, 1, 5, 9, 27, 42, 43, 54, 103, 108); //sum 392 //each set should ideally have 196 as a total //108, 0, 103, 1, 54 = 266 //5, 9, 27, 42, 43 = 126 //266-196 = 70. closest element to 70 is 54. //70-54 = 16. closest elem to 16 is 1 //16-1 is 15. closest elem to 15 is 0. //next closest to 15 is 103. //a = {108} //0, 103, 1, 54, 5, 9, 27, 42, 43 = 284 //ordered: //b = {0, 1, 5, 9, 27, 42, 43, 54, 103} 284 //ok, so we need to get rid of 4 elements in the second set. //392 - 284 = 108 //so we need the four elements that add to 108 the closest. //108/4 = 27, so we want the elements closest to 27 //27, 9, 42, are obviously the nearest. //9+27+42 = 78, so we're 30 short. so now we want the elem closest to 30 from the set {0, 1, 5, 43, 54, 103} //5 is 25 from 30, and 43 is 13 from 30, so, 43 it is. //a = {0, 1, 5, 54, 103} //b = {9, 27, 42, 43, 108} //sum(a) = 163 //sum(b) = 229 //both are quite far from 196, so perhaps some adjusting is in order. //196-163 = 33 //closest elem in b to 33 is 27, so: //a = {0, 1, 5, 27, 54, 103} sum(a) = 190 //b = {9, 42, 43, 108) sum(b) = 202 //196-202 = -6. //closest elem in a to -6 is 0, so: //a = {1, 5, 27, 54, 103} sum(a) = 190 //b = {0, 9, 42, 43, 108} sum(b) = 202 //Repeat the process: //a = {1, 9, 27, 54, 103} sum(a) = 194 //b = {0, 5, 42, 43, 108} sum(b) = 198 //The only feasible things to switch now are 103 and 108, and that would make the sets both 3 from the goal, instead of the current 2 from the goal //so, a and b are final. Also, I never checked if that works with another set or if I just got lucky with my numbers. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784714 Share on other sites More sharing options...
Daniel0 Posted March 14, 2009 Share Posted March 14, 2009 Hmm... my algorithm can split 10000 random integers between 1 and 1000 in about 1.27 seconds on my computer. That's fairly decent I suppose. If we assume you're not going to have more than 100 elements then it has a running time at about 0.3 ms. That should be sufficiently fast. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784755 Share on other sites More sharing options...
Pawn Posted March 14, 2009 Author Share Posted March 14, 2009 Thanks for your time guys. Daniel - I will push some sample data through that and see how it performs. Is there any chance you could comment it a little so that a remedial specimen like myself might be able to have an inkling of what's going on? Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-784894 Share on other sites More sharing options...
corbin Posted March 15, 2009 Share Posted March 15, 2009 Hrmm didn't see your edit2 earlier Daniel. Your way will most definitely be faster than the way I would do. Hehe. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-785036 Share on other sites More sharing options...
Daniel0 Posted March 15, 2009 Share Posted March 15, 2009 Daniel - I will push some sample data through that and see how it performs. Is there any chance you could comment it a little so that a remedial specimen like myself might be able to have an inkling of what's going on? Sure: <?php class Timer { private $start; public function __construct() { $this->start = (float) microtime(true); } public function getTime() { return microtime(true) - $this->start; } } $numbers = array(); for ($i = 0; $i < 10000; $i++) { $numbers[] = rand(1,1000); } $timer = new Timer(); // START ALGORITHM $group1 = $group2 = array(); // create two empty arrays to store the numbers in sort($numbers); // sort numbers ascendingly. this will put the highest numbers as the end. // we need that for maximum precision so we can do the more fine grained // grouping at the end while (count($numbers)) { // loop through the $numbers array as long as it contains elements $topVal = array_pop($numbers); // remove the last element from $numbers and store it in $topVal // doing this each iteration will mean we iterate over each // element exactly once because the while loop will end once // $numbers has been depleted $delta = array_sum($group2) - array_sum($group1); // we need the current difference of the lists' sums // to determine where to put the current number if ($delta <= 0) { // if the $delta <= 0, then $group1 has the largest sum and as such we need to put $group2[] = $topVal; // the new value in $group2 to even it out... } else { // ...otherwise we do the opposite and place it in $group1 $group1[] = $topVal; } } // we have now split the $numbers list into two lists $group1 and $group2. they are, however, not necessarily // of same size, but the following will take care of if (($g2s = sizeof($group2)) != ($g1s = sizeof($group1))) { // check if their same size and store their size in $g1s and $g2s if ($g2s > $g1s) { // if $group2 is largest then we make $from store a *reference* to $group2, and $to store a reference $from = &$group2; // to $group1. we are going to move elements from $group2 to $group1... $to = &$group1; } else { // ... and we'll do the opposite if $group1 is largest $from = &$group1; $to = &$group2; } while (sizeof($from) > sizeof($to)) { // this loop runs as long as $from is larger than $to $to[] = array_pop($from); // because we grouped the smallest elements in the first part of the algorithm, // this means that $group1 and $group2 will have the smallest elements in the end // as well. we need to move the smallest element because those will have the least // significant change on the diff sum of the lists, so we remove the smallest from // the largest list and move to the smallest list until the lists are evenly sized } } // END ALGORITHM echo $timer->getTime() . PHP_EOL; echo array_sum($group2) - array_sum($group1); //var_dump($group1, $group2); ?> I left over some debugging and benchmarking code in case you're interested. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-785056 Share on other sites More sharing options...
GingerRobot Posted March 15, 2009 Share Posted March 15, 2009 Also, if you have a really large data set then it might be faster to use SplStack than an array. Hmm, i'd never seen that there were some data structures already implemented in PHP. Perhaps i'm being naive here - why would you implement a stack with a doubly-linked list? Being FILO store, why would you want the ability to iterate over the list? I would have thought you'd implement a stack with a linear linked list and pointer to the top of the stack. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-785147 Share on other sites More sharing options...
Daniel0 Posted March 15, 2009 Share Posted March 15, 2009 I'm not sure. Some benchmarks seem to indicate that using the built-in data structures are faster than using arrays though: http://blueparabola.com/blog/spl-deserves-some-reiteration You'll have to wait until PHP 5.3 though. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-785160 Share on other sites More sharing options...
GingerRobot Posted March 15, 2009 Share Posted March 15, 2009 Interesting. Still, it seems like an odd choice. I mean, a doubly-linked list inherently has twice the memory overhead as as linear one (twice the number of pointers) and the next pointer of the list seems completely redundant in a stack. It looks almost like a lazy implementation - "well, we've got a doubly-linked list - let's just stick a couple restrictions on access and use and we'll have a stack". Of course, i've probably got the wrong end of stick/missed something completely. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-785168 Share on other sites More sharing options...
Daniel0 Posted March 15, 2009 Share Posted March 15, 2009 Dunno. Guess you can ask on php.internals. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-785172 Share on other sites More sharing options...
GingerRobot Posted March 15, 2009 Share Posted March 15, 2009 Dunno. Guess you can ask on php.internals. Thanks for the pointer. Seems i was right that it was basically a case of an easy change from the code for the existing doubly-linked list. If you're interested, my question: Firstly, apologies if this is not the correct place to ask such a question - I was directed here from a forum. I recently discovered the datastructures found in the Standard PHP Library. I noticed that SplStack is implemented using a doubly-linked list and wondered why. Given that a stack is LIFO store and the only operations really needed are push and pop, why would next and previous pointers be required on each item? There's no requirement to allow iteration over a stack. It would seem to me that a stack should require a linear linked list and a pointer to the top of the stack - which would have roughly half the memory overhead of a doubly-linked list (half the number of pointers). Do i just have no idea what i'm talking about or have i missed something obvious? Thanks in advance, Ben And the response: You're right, a stack can be implemented in a simpler way that would reduce slightly the memory usage. However, storing one additional pointer per doubly linkedlist (DLL) node and managing it is totally irrelevant when placed in context of PHP code. If you look at the code, you'll see that the additionnal code to make a DLL work as a stack is very small, and that's why it was implemented like that. One could implement a stack and a queue using a single linked list, but that would mean code duplication which ends up to be not worth the irrelevant speed/memory improvement. If SPLinkedList is implemented once, we might consider move SplStack and SplQueue to use that new implementation, but I believe there are other priorities for SPL right now, such as documentation. It still seems slightly odd that one would bother to implement data structures without really optimizing them. Sure they're more efficient than arrays anyway, but if efficiency isn't the goal then why bother implement them at all. Oh well. p.s. Apologies for hijacking the thread. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-785343 Share on other sites More sharing options...
Daniel0 Posted March 15, 2009 Share Posted March 15, 2009 Hmm... interesting. I agree with whomever responded to you that documentation should have higher priority though. The current documentation for SPL sucks. Quote Link to comment https://forums.phpfreaks.com/topic/149397-simple-problem/#findComment-785347 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.