Jump to content

Working out separate numbers from combination and total


Kriptonic

Recommended Posts

Probably not a good subject but I'll get to the point.

 

I am creating a webpage parser where the user copies an entire page and dumps it to a text box, my PHP script will run through and extract all data it needs. One problem is 4 numbers I need.

 

These numbers can be anything from 0 up to 50,000 each (could be higher). For this example I am going to say the 4 numbers in this instance are:

2889 6073 1597 3661, their total is 14220. (In all cases there will always be 4 numbers, 0 is always represented as 0 and I will always have the total of the 4 numbers)

 

When pasted into the text box, the 4 numbers get combined, the above example would yield 2889607315973661 as a number.

 

By using the number '2889607315973661' and the known total of '14220', how can I work out what the 4 separate numbers are that made it up?

 

Just to point out, the numbers 10 16024 0 21 (1016024021) could come out with a total of '16055', the odds of the 4 numbers being equal in length are slim.

 

 

I've thought about it, I've messed about with it, I am just completely stumped on where to start and the best way to go about it.

 

Big thanks to anyone who can shed some light onto this matter and point me in the right direction, I'm just confused and it hurts my brain to think about  :P

Link to comment
Share on other sites

I can think of no mathematical way to pinpoint the exact values of 4 numbers given what you are given, however since you are given an integer with the values concatenated...perhaps you can try to add each value of the integer split up into 4 integers with a given number count...I will show you what I am talking about...

 

$string = '2889607315973661';
$known_total = '14220';
$arr = str_split($string, 4); //produces an array of integers grouped by 4 numbers..
$total = array_sum($arr); //sum of values
print  $total; 
if($total == $know_total){ //if the total of the 4 numbers equal the known total, we have our numbers..
//do something
}else{
//split the numbers up unevenly until a match is found
}

Link to comment
Share on other sites

Any sequence of numbers consisting of identical digits (except where there is a single digit in each number) is impossible to reverse unambiguously, so there will be some cases where you can't find the original numbers.  For example:

 

1, 1, 1, 11 -> Total 14, concatenates to 11111, the 11 could be in any position

1, 11, 11, 1 -> Total 24, concatenates to 111111, the 11s could be in any position.

 

Breaking the concatenated digits into every possible combination might work.  If you do check every combination then you can also detect when there is an ambiguous case.

Link to comment
Share on other sites

Thanks for the comments.

 

@AyKay47 - Thanks, the system I had in place worked like that, though it was a little more long winded, that will help :)

 

@btherl - I agree with the 1, 11, 11, 1 cases, however the likely hood that the numbers are that small is small it's self, so things like that don't worry me. (numbers have a much higher chance to be in the thousands then the tens, though that chance still remains)

 

 

I suppose my main challenge is figuring out how to break the big number up into almost every combination of 4 numbers that we can make, that way we can cycle through until we get a match... This tool is part of a bigger system, once I'm done with some of my current tasks I'll be back to work on this tool. I'll post my findings as others may find it useful.

Link to comment
Share on other sites

Thanks for the comments.

 

@AyKay47 - Thanks, the system I had in place worked like that, though it was a little more long winded, that will help :)

 

@btherl - I agree with the 1, 11, 11, 1 cases, however the likely hood that the numbers are that small is small it's self, so things like that don't worry me. (numbers have a much higher chance to be in the thousands then the tens, though that chance still remains)

 

 

I suppose my main challenge is figuring out how to break the big number up into almost every combination of 4 numbers that we can make, that way we can cycle through until we get a match... This tool is part of a bigger system, once I'm done with some of my current tasks I'll be back to work on this tool. I'll post my findings as others may find it useful.

this will take some work I agree, you will definitely want to create a function to do the work for you, and btherl is right, while the chances of an ambiguous integer occurring are slim, there is still that chance

Link to comment
Share on other sites

So I sat down and had a think. I came to the conclusion that what I really needed was to generate a list of possible number combinations. I've written the following code:

 

if($_POST['submit'] == "Split"){
$split = '2889607315973661';//$_POST['res'];
$sum = '14220';//$_POST['gat'];

checkForMatch($split,$sum);
}

function checkForMatch($num,$tot){
//work out the length of the main number
$numLen = strlen($num);
//we know we HAVE to have 4 numbers, so after we take a random split, we need to be
//100% sure we can make the remaining numbers that total 4
$numbersAssigned = 0;
for($i=3;$i>=0;$i--){
	//the next number will be:
	//	numbers length - numbers we've assigned already - numbers left to make
	//the above ensures that after we've made a number, there are enough digits left to make the remaining numbers
	$randomSlice = rand(1,$numLen-$numbersAssigned-$i);
	//if this is the last number, just assign the unasigned numbers to it as there is no need for a random
	if($i==0) $randomSlice = $numLen-$numbersAssigned;
	//if we've taken our last number length as 4, increment the total so we can take it into consideration next time
	$numbersAssigned += $randomSlice;

	echo $randomSlice . "<br/>";
}
echo "<b>$numLen</b>";
}

 

The output will be 4 numbers, these are the lengths of of the internal 4 numbers and the final is the length of our string. See below an example of the output:

 

1
8
4
3
16

 

The problem to note here is that there is no structure to the number splitting, so should there be a large number and the random number generator sucks, we could theoretically go on forever.

 

I've tested with different lengths of number and it's all working perfectly so far, but as I said, there is no structure at the moment. I'm going to keep messing around as well as expanding the function. I will make it get the numbers using the splits as well as checking that they match the total. I may also implement a while loop to keep checking until a match was found.

 

I will post more results here when I'm done.

Link to comment
Share on other sites

I've conducted the changes listed in my previous post. Across a sample of 10 generations, I was it go as low as 200 loops to match the numbers or as high as 2500 loops to make a match. That was with a base of 4 numbers x 4 digits to get a total length of 16.

 

I think this is acceptable. It might take around instant to 3 seconds max but I feel this will be acceptable as a result is better then no result.

 

Thanks again to all who've provided input. Should I improve the function, I will post here those findings.

 

 

I don't think I will mark it as complete yet because it's not that efficient and I'm interested to see if anyone comes up with a system that works better then my own.

Link to comment
Share on other sites

So your approach is to split randomly until you find a match?  That sounds reasonable.  In your example you can make some extra assumptions:

 

1.  The sum has 5 digits, therefore each individual number has between 1 and 5 digits (assuming these numbers are all positive and don't have leading 0s)

2.  The sum is < 20,000 therefore no more than 1 individual number has 5 digits, the rest have 4 or fewer digits.  Similar rules apply for other sums where the first digit is 1, 2 or 3.  This may be too complex to be worth implementing.

3.  If numbers don't have leading 0s then any split beginning with a 0 digit must be of length 1, so that limits your possibilities a bit.

 

Rule #1 seems like the most useful, rules 2 and 3 might help a little.

Link to comment
Share on other sites

Random slices probably isn't the best way of doing this. I'd much rather do it systematically.

 

Just to confirm -

These numbers can be anything from 0 up to 50,000 each (could be higher)
is confusing. Is it possible to have a 6 digit value for these numbers?
Link to comment
Share on other sites

Here's my solution. With a max a 5 digits per number, you'll get your results using 80 to 130 loops.

 

<?php 

$numbers = '2889607315973661';
$total = '14220';

print_r( splitNumbers($numbers, $total) );

echo '<br><br>';

// Try with randoms

$num = array(
mt_rand(0,99999),mt_rand(0,99999),
mt_rand(0,99999),mt_rand(0,99999)
);

echo 'Found results: ';
print_r( splitNumbers(implode($num),array_sum($num)) );
echo '<br>Actual input:';
print_r( $num );

//assumes no numbers will be more than 99999
function splitNumbers( $numbers, $total ) {

$length = array(); // The current length of the first 3 numbers
$values = array(); // Will hold the actual numbers for cleaner code
$numbersCount = strlen( $numbers ); // Total amount of numbers in the group
$lengthMax = 5; // Max digits allowed per number

// We know the first number HAS to be 2, 28, 288, 2889 or 28896
// We then need to find out the lengths of the next two numbers, then
// use the total length to find the 4th number

// Start with first number starting at 1 digit
for( $length[0] = 1; $length[0] <= $lengthMax; $length[0]++ ) {
	// Then start with second number starting at 1 digit
	for( $length[1] = 1; $length[1] <= $lengthMax; $length[1]++ ) {
		// Then start with the 3rd number starting at 1 digit
		for( $length[2] = 1; $length[2] <= $lengthMax; $length[2]++ ) {
			// Find the length of the last number
			$lengthLast = $numbersCount - array_sum( $length );
			// If the last number isn't at least 1 digit long, the combination
			// is impossible, so we can skip it
			if( $lengthLast < 1 ) continue;
			// Otherwise, split the string up
			$values[0] = substr( $numbers,0,$length[0] );
			$values[1] = substr( $numbers,$length[0],$length[1] );
			$values[2] = substr( $numbers,$length[0]+$length[1],$length[2]);
			$values[3] = substr( $numbers,$numbersCount-$lengthLast,$lengthLast );			
			// If we find a match, return it
			if( array_sum($values) == $total ) return $values;
			// Take out the double forward slash to see what the function is doing
			// print_r( $values ); echo "<br>\n";
			// If the 3rd number starts with 0, no point in checking beyond
			if( $numbers[ $length[0]+$length[1] ] == '0' ) break;
		}
		// If the second number starts with 0, no point in checking beyond
		if( $numbers[ $length[0] ] == '0' ) break;
	}
	// If the first number starts with 0, no point in checking beyond.
	if( $numbers[0] == '0' ) break;
}

return FALSE;

}

?>

Link to comment
Share on other sites

Random slices probably isn't the best way of doing this. I'd much rather do it systematically.

 

"probably" is the word here :)  If the inputs are skewed but the bias is unknown, a random approach which tracks which possiblities have been tried before is usually going to give you average performance, regardless of what the bias is.  A systematic approach will either give you consistently better or consistently worse than the random approach, depending on what order the possibilities are tried in and how the data is skewed.

 

So a systematic method which matches the data is better then random, but a systematic method which doesn't suit the data is worse than random.

 

If it's truly random data then it doesn't matter which approach is used.

 

In splitNumbers(), $lengthMax = strlen($total) is an upper bound on the number of digits in each number being searched for.

Link to comment
Share on other sites

Well, after doing some testing, even this won't produce 100% accurate results. You're going to get around 3% of your results as incorrect. Slightly less if order isn't important. Here's the script I'm using to test.

 

I changed the function up slightly, as I didn't quite understand the importance of skipping number with leading 0's even at the last grouping

 

<?php 

$numbers = array();

// Make equal chances to get 1,2,3,4,5 digit numbers
for( $i = 0; $i < 1000; $i++ ) {
for( $x = 0; $x < 4; $x++ ) {
	$rand = mt_rand( 1,6 );
	$numbers[$i][$x] = mt_rand(
		(int) '1'.str_repeat('0',$rand-1),
		(int) str_repeat('9',$rand)
	);
}
}

$errors = 0;
foreach( $numbers as $key => $number ) {
$error = FALSE;
echo '<p><h3>Test '.$key.'</h3>';
echo 'Using: '. implode( ', ', $number ) .'<br>';
$return = splitNumbers( implode($number), array_sum($number), 6 );
if( $return ) {
	echo 'Match found using '.implode($number).' with a sum of '.array_sum($number).'<br>';
	foreach( $return as $k => $n ) {
		if( $n != $number[$k] ) {
			echo '<span style="color:red">MISMATCH('.$n.','.$number[$k].')</span>, ';
			$error = TRUE;
		} else {
			echo $n . ', ';
		}
	}
} else {
	echo '<span style="color:red">Matching numbers could not be found</span>';
	$error = TRUE;
}
if( $error == TRUE ) $errors++;
}
echo '<h2>Out of 1,000 loops, '.$errors.' were found to be incorrect ('.($errors / 10).'%)</h2>';




function splitNumbers( $numbers, $total, $lengthMax = 5) {

$length = array();
$values = array();
$numbersCount = strlen( $numbers );

for( $length[0] = 1; $length[0] <= $lengthMax; $length[0]++ ) {
	for( $length[1] = 1; $length[1] <= $lengthMax; $length[1]++ ) {
		for( $length[2] = 1; $length[2] <= $lengthMax; $length[2]++ ) {
			$lengthLast = $numbersCount - array_sum( $length );
			if( $lengthLast < 1 ) continue;
			if( $numbers[ $numbersCount-$lengthLast ] == '0' ) continue;
			$values[0] = substr( $numbers,0,$length[0] );
			$values[1] = substr( $numbers,$length[0],$length[1] );
			$values[2] = substr( $numbers,$length[0]+$length[1],$length[2]);
			$values[3] = substr( $numbers,$numbersCount-$lengthLast,$lengthLast );			
			if( array_sum($values) == $total ) return $values;
			if( $numbers[ $length[0]+$length[1] ] == '0' ) break;
		}
		if( $numbers[ $length[0] ] == '0' ) break;
	}
	if( $numbers[0] == '0' ) break;
}

return FALSE;

}

?>

 

Pretty much, the existence of a group of numbers with a single digit number increases the chance of failure VASTLY. The smaller the numbers used, the greater chance the function will return bad values. With minimum 2 digit numbers, you should be expecting less than 1% failure. With minimum 3 digit numbers, less than 0.1%

 

Hope this helps.

 

@btherl, In this case, I'm quite sure the systematic approach is more efficient in almost every aspect. I always say thing like 'probably' though, because I'm always learning, and I love to be proven wrong. From what I gather your response is more of a theoretical discussion than what I actually said in context, and from what I gather you agree with me?

 

$lengthMax = strlen($total) is potentially true, and probably the best solution without knowing exactly. My guess is there's some sort of max bound on this, but it doesn't matter as my function will take any $lengthMax. The point of $lengthMax is to speed up the script, the lower the value, the less values have to be tested.

 

 

Link to comment
Share on other sites

In this case I think a systematic approach is better, and that's because it's a very small search space, and the worst case of having to check every single possibility before finding the solution is not particularly bad.  So I'm talking more about larger search spaces, where a random approach can have a generally a better average case, depending on the distribution of inputs.  In that sense it's theoretical.

 

The non-theoretical part is that I'm sure these inputs are skewed, and so a systematic approach has a chance of usually testing many wrong solutions before it finds the right one.  But it becomes theoretical again because the potential speed up is too small to be worth the effort.

 

The reason I didn't immediately discount Kriptonic's random approach is that it really isn't that bad with such a small search space.  I didn't think it was worth trying to switch him to a systematic method when the efficiency gain wasn't going to be that much.

 

I agree with what you're saying about $lengthMax.  I think Kriptonic would just derive $lengthMax from the length of the total anyway, so I thought it would save a step if splitNumbers() did that itself.

Link to comment
Share on other sites

In a larger search space, wouldn't storing and recalling previous attempts slow down the process, becoming more and more dramatic the longer it takes to find a given result? I've never actually seen a random 'guess and check'-style search engine in my years of coding and administration. It's always systematic to some degree.

Link to comment
Share on other sites

If you store the previous attempts in a large enough hash table then it's almost always constant time to check if a possibility has been checked, with collisions taking slightly longer.  Or you can use a large array and an addressing scheme that puts each possibility in one array slot.  I've done a lot of research in reducing recursive boolean equations to non-recursive boolean equations for program analysis, and it turns out random approaches do quite well.  Once you limit yourself to a particular input set that you know the characteristics of and can develop a systematic approach that works well for those kinds of input, then you can do better than random.  But here I still don't know of any patterns in Kriptonic's input, so he might as well use a random approach.

 

I would agree that the odds are in favour of a systematic approach being faster, even without knowledge of patterns in the input.

Link to comment
Share on other sites

So the only dramatic loss, assuming you have a proper storing/recalling method involved would purely be memory consumption.

 

Anyways, you've made your point pretty clear, and have a lot more bench-marking done on it then I have. Hard not to agree with you.

 

Back to the thread.... /hijack

Link to comment
Share on other sites

try

<?php
$test = '1016024021';
$goal = 16055;
$no_num = 4;
$ansver = array();

function solve($in, $goal, $no, &$ansver, $tot){
if ($no == 1) {
	if (implode($in) == $goal) {
		return array(implode($in)); 
	}else return false;
}
$a = array_shift($in); 
while ($a < $goal and count($in)) {
	if ($b = solve($in, $goal - $a, $no - 1, &$ansver, $tot)){
		array_unshift($b, $a);
		if ($no == $tot){
			array_push($ansver, $b);
			array_shift($b);
		} else return $b;
	}$a .= array_shift($in);
}
}
solve(str_split($test), $goal, $no_num, &$ansver, $no_num);
print_r($ansver);
?>

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.