Jump to content

Generating Random 12 Digit Number


redbrad0

Recommended Posts

In our software, each item has its own 12 digit unique barcode. We are getting close to having 1 million records which out of 500 million, you would think we would not have any problems. I ran a test with our script and generated and logged 990,319 barcodes which are listed below. You can see that calling 1 million barcodes I only generated 162 unique. Because I check if the barcode is unique by searching the database I am having to run a TON of querys.

 


 

Questions

1. Is there a better way to generate a random 12 digit number (has to start with a 1 - 9)

2. Can you think of a way to improve finding a unique barcode even if I have 400 million barcodes in our database?

 


 

	function generateBarcode() {
   		$tmpBarcodeID = "";
   		$tmpCountTrys = 0;
	  while ($tmpBarcodeID == "")	{
  			srand ((double) microtime( )*1000000);
  			$random_1  = rand(1,9);
  			$random_2  = rand(0,9);
  			$random_3  = rand(0,9);
  			$random_4  = rand(0,9);
  			$random_5  = rand(0,9);
  			$random_6  = rand(0,9);
  			$random_7  = rand(0,9);
  			$random_8  = rand(0,9);
  			$random_9  = rand(0,9);
  			$random_10 = rand(0,9);
  			$random_11 = rand(0,9);
  			$random_12 = rand(0,9);

  			$tmpBarcodeID = $random_1 . $random_2 . $random_3 . $random_4 . $random_5 . $random_6 . $random_7 . $random_8 . $random_9 . $random_10 . $random_11 . $random_12;

  			// LETS CHECK TO SEE IF THIS NUMBER HAS EVER BEEN USED
        $query = "select count(Barcode_ID) as tot from Barcode where Barcode_Barcode='" . $tmpBarcodeID . "'";
        $res =& $this->db->query($query);
      if (PEAR::isError($res))	{
        throw new TixException("Barcode.class:", TixExceptionCodes::DATABASE_ERROR);
      }	else	{
        if ($res->numRows() == 1) {
          $res->fetchInto($row);
          if ($row['tot'] == 0 && !empty($tmpBarcodeID)) {
            $this->barcode = $tmpBarcodeID;
            
            return $tmpBarcodeID;
          }
        }
        $tmpBarcodeID = "";
        $tmpCountTrys++;
      	}
  		}
    }

 


 

TotBarcodes Barcode_TimesCalled

162 1

907 2

3354 3

9112 4

20493 5

37695 6

59876 7

83655 8

103426 9

115105 10

117007 11

108835 12

92765 13

74812 14

55155 15

38704 16

25616 17

16299 18

9841 19

5818 20

3529 21

2275 22

1436 23

1088 24

800 25

658 26

497 27

422 28

291 29

210 30

150 31

127 32

64 33

55 34

35 35

19 36

11 37

8 38

3 40

2 41

1 42

1 47

 

Link to comment
Share on other sites

You could create an array of every possible barcode, remove from that array every barcode that's currently being used, and save the array. Then, use array_rand to choose a barcode and remove it from the array when it's chosen.

 

I thought about creating a array of all the barcodes in use, and then running a search on the array on each barcode to find a empty one, but thought it could be a bad idea to store that many items in a array.

 

Basically when I start to generate barcodes, I am not just generating one but I generate approx 500 - 5,000 at a time as a product can have anywhere between 500 - 5,000 barcodes that need to be associated with it.

 

Anyone else?

Link to comment
Share on other sites

You misinterpreted my response.

 

There is going to be size issues (especially with my method) but there isn't going to be search issues.

 

I'm NOT suggesting you create an array with the barcodes in use. I AM suggesting you create an array with barcodes NOT in use. You could use a DB table instead. Eitherway, the solution isn't to look around for what barcodes are in use, but rather to make a list of the barcodes not in use, and delete them once they are.

 

The volume of barcodes you use isn't really relevant when you use my solution. You can search for 10,000 barcodes at a time if you want. You just need to use methods that select barcodes from a list of available barcodes, and never duplicate the choice. Then just delete the selections from that list.

Link to comment
Share on other sites

Errrr....  No offence intended, but that's not exactly the best way to go about it.

 

 

12 digits: 10^12 possibilities.  1000000000000 options at 4 bytes each: 37.35GB.

 

 

 

Anyway, the reason why you're only getting 162 unique numbers in that many iterations is because computers cannot technically come up with random numbers.  (Which makes sense if you think about what a computer does....)

 

Random numbers in computers must be based off of time.  Typically the time is run through an algorithm to come up with a number.  (Technically any seed can be used, but for any seed x, the same set of random numbers will always be generated.)

 

 

Mathematically:

 

 

If r(x) generates a set sized y, if r(x) is run more than y times before x changes, the set will begin to repeat.

 

 

 

So, one option would be to seed your self:

 

 

 

Something like this could work:

 

 

do {

  $random = mt_rand();  //mt_rand is much more random than rand() since it's based off of more than seconds

  //you would need to use something besides just mt_rand() to generate a code, but you get the point.

  $codeExists = mysql_query("SELECT 1 FROM items WHERE code = {$random};");

}

while(mysql_num_rows($codeExists));

 

 

 

 

As for the query, you will want to make sure you have an index on what ever column you're checking.

Link to comment
Share on other sites

I'm not so sure that will work either. If he needs to generate thousands of unique ID's in a single request, you're going to time out long before you come up with a fraction of the solution.

 

As I said before, my approach does create size issues, but provides a computationally possible solution.

 

The scale of your problem demands minimal processing time. The only way to minimize the processing time is to guarantee a correct result every time you generate a barcode. The only way to guarantee a correct barcode upon request is to draw the barcodes from a list that only contains valid barcodes.

 

If you try to generate thousands of unique barcodes randomly that have never been used, there's no way you're ever going to generate them before PHP times out.

 

I think it's also worth pointing out that my solution becomes FASTER the more bar codes you have. Other solutions will become SLOWER the more bar codes you have.

 

Further, if you're only going to need 5M barcodes total, you can reduce the size by only including a list of 5M unique barcodes and then removing from it those barcodes already in use. Other solutions don't take advantage of this fact and utilize the complete list of barcodes while trying to quickly generate thousands of both random and unique barcodes - a task PHP is never going to be able to complete.

 

Take this a step further, my solution focuses on limiting the pool of random barcodes you generate to a set that only contains valid barcodes. You can use this to your advantage in multiple ways to reduce the size of the pool.

 

For example, only store, say, 10,000 unique barcodes (that aren't in use). Whenever a barcode is used, you can let the server do background tasks to generate a new unique random barcode and add it to the list. This way, you have immediate access to unique barcodes that are random (and you're not waiting for them to be generated), you can reduce the size of the table containing those unique values, and you can utilize the servers downtime to populate that list instead of trying to do it right when the user requests.

Link to comment
Share on other sites

This would be my suggestion for a solution.

 

Generate ~50000 unique barcodes and put them into a database table.

 

Have MySQL remove from that table every barcode already in use.

 

Have a CRON run a script every few seconds that checks how many unique barcodes are in the table and, if it is lower than 50,000, attempts to create 1 new barcode once. If the script ran once a second you would have 86,400 attempts to generate a new barcode per day. If you need more attempts, you could have the CRON run, say, once a minute, and have the PHP script try to generate 1 new bar code until it times out.

 

Any kind of combination there in that works for your system.

Link to comment
Share on other sites

He said they have 500 million records...

 

 

10^12 is 1000000000000...

 

 

1000000000000-500000000 = 999500000000 free codes.

 

 

Actually, I just realized that my math is wrong since the number can't start with 0.

 

 

It's actually:

 

 

9*10^11: 900000000000

 

So 900000000000 - 500000000 = 899500000000 codes.

 

 

Anyway, if they need to generate 499 million more codes, there's obviously room in the set for that, but it comes down to a question of how many wasted generations there will be.  I don't actually feel like thinking about the math on that, but assuming a decent randomness algorithm, it would be quite rare.

 

 

 

Anyway, I could ramble all day, but that doesn't really accomplish anything.

 

 

 

<?php

ini_set('memory_limit', '1G');

$iters = 10000000; //1 million iterations

$codes = array();

$clashes = 0;

$START = microtime(true);

for($i = 0; $i < $iters; ++$i) {
$c = (string) GenerateCode();
//If it's not casted to a string, then PHP might mess with it when setting it as the array key.
//(It appears that PHP casts numbers to ints when using them as an array key....)
//The reason it's as the array key instead of the value is for speed.
//It's faster to search an array by a key than to call in_array over and over again
//because of the way hash maps work.
if(!isset($codes[$c])) {
	$codes[$c] = true;
}
else {
	++$clashes;
}
}

$END = microtime(true);

echo "There were $clashes clashes out of $iters codes generated.\n";
echo "It took " . ($END - $START) . " seconds.\n";


function GenerateCode() {
//just for kicks let's break it into 4 3 digit chunks
$c1 = mt_rand(100, 999) * pow(10, 9);
$c2 = mt_rand(0, 999) * pow(10, 6);
$c3 = mt_rand(0, 999) * pow(10, 3);
$c4 = mt_rand(0, 999);
return $c1 + $c2 + $c3 + $c4;
}

 

 

 

There's typically 0-1 clashes per 1,000,000 codes generated.

 

 

I ran it with 10m iterations a couple times in hopes of getting a more accurate number of clashes, and it seemed to be about 55 clashes.

Link to comment
Share on other sites

Here's my solution.

 

First, you fetch ALL your barcodes from the table, with mysql_fetch_assoc_array() and then when generating your numbers, you can just check if the number already exists in the array you fetched with if(in_array()). That way you only need one query, and you can keep generating barcodes that are unique.

Link to comment
Share on other sites

Instead of relying on randomness, why not create an algorithm that is deterministic. If you develop a good hashing algorithm you could easily use that.

 

Just for reflecting on randomness, there is a sorting algorithm called bogosort. It's extremely simple (and stupid): check if your dataset is sorted, if it isn't, shuffle it and start the algorithm again, otherwise stop. Because shuffling is reliant on randomness, this algorithm has a worst case running time of O(∞); you might never get it sorted.

 

It's the same thing here. There is a chance that finding a unique random number will take unreasonably long time, and there is even the (small) chance that you will never find one. Of course as your collection of bar codes increases in size, this probability becomes larger and larger.

Link to comment
Share on other sites

Instead of relying on randomness, why not create an algorithm that is deterministic. If you develop a good hashing algorithm you could easily use that.

 

Just for reflecting on randomness, there is a sorting algorithm called bogosort. It's extremely simple (and stupid): check if your dataset is sorted, if it isn't, shuffle it and start the algorithm again, otherwise stop. Because shuffling is reliant on randomness, this algorithm has a worst case running time of O(∞); you might never get it sorted.

 

It's the same thing here. There is a chance that finding a unique random number will take unreasonably long time, and there is even the (small) chance that you will never find one. Of course as your collection of bar codes increases in size, this probability becomes larger and larger.

 

I agree, I also stand by the idea that there's absolutely no reason to generate these barcodes when they're needed. Everything will run faster and smoother if they're generated before hand.

 

Unlike your typical PHP request, it doesn't matter who, when, or where the request is coming from. The need for a unique barcode is exactly the same. Do the work BEFORE the request comes in and you save the end user time.

Link to comment
Share on other sites

Wanted to edit this in but couldn't:

http://en.wikipedia.org/wiki/Birthday_attack

 

With 4.3B possibilities (yes, yours has more, but this gives you an idea) it only takes 110,000 entries before a truly random generator has a 75% chance of producing duplicates.

 

That is, once you've used a mere 0.26% of the possibilities (out of 4.3 billion), 3/4 of all randomly generated numbers will be duplicates.

 

Attempting to generate the barcodes randomly is going to be a time consuming process. On the scale you need them, it's unreasonable to do in real time.

Link to comment
Share on other sites

Hi

 

One way to do it on the fly, although with a hideous size of intermediate table:-

 

SELECT AllNums
FROM (SELECT a.i+b.i*10+c.1*100+d.i*1000+e.i*10000*f.i*100000+g.i*1000000+h.i*10000000 AS AllNums
FROM integers a,
integers b,
integers c,
integers d,
integers e,
integers f,
integers g,
integers h) z
LEFT OUTER JOIN SomeTable y
ON z.AllNums = y.Id
WHERE y.Id IS NULL
AND AllNums < 500000000
ORDER BY RAND()
LIMIT 1

 

Integers is just a table with one integer column and 10 rows, from 0 to 9. This is repeatedly joining that table against itself to come up with every possible number between 0 and 1000000000. Then joins that against your currently used table and brings back those without a matching record. Then just uses LIMIT to only bring back one. You could order it randomly to bring a random record back.

 

With the hideous size of the generated table this will be DOG SLOW so not practical, but short of storing all the unused numbers (or having a table set up containing a small subset generated when it is quiet) it is the only way I can think of doing it that will always work without having to check each individual one.

 

Faster but far cruder (and almost certain to fail to come up with an unused number occasionally):-

 

SELECT AllNums
FROM (SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums
UNION
SELECT FLOOR(0 + (RAND() * 500000000)) AS AllNums) z
LEFT OUTER JOIN SomeTable y
ON z.AllNums = y.Id
WHERE y.Id IS NULL
LIMIT 1

 

Just comes up with 20 random numbers and returns one of those if it is unused.

 

Another rough idea, again with the probability it will fail eventually:-

 

SELECT AllNums
FROM (SELECT m.RandNum + n.AllNums AS AllNums
FROM (SELECT FLOOR(0 + (RAND() * 500000000)) AS RandNum) m
CROSS JOIN (SELECT a.i+b.i*10+c.i*100 AS AllNums
FROM integers a,
integers b,
integers c
) n) z
LEFT OUTER JOIN Comments y
ON z.AllNums = y.Id
WHERE y.Id IS NULL
LIMIT 1

 

Generates a random number, adds every number from 0 to 99 to it to get 100 possible numbers, and then joins that with you table of already used numbers to only bring back those that are not already used.

 

All the best

 

Keith

Link to comment
Share on other sites

Wanted to edit this in but couldn't:

http://en.wikipedia.org/wiki/Birthday_attack

 

With 4.3B possibilities (yes, yours has more, but this gives you an idea) it only takes 110,000 entries before a truly random generator has a 75% chance of producing duplicates.

 

That is, once you've used a mere 0.26% of the possibilities (out of 4.3 billion), 3/4 of all randomly generated numbers will be duplicates.

 

Attempting to generate the barcodes randomly is going to be a time consuming process. On the scale you need them, it's unreasonable to do in real time.

 

 

Well, that's with a set of numbers that's relatively large in comparison to the set it comes from.

 

 

For example, the chances of having a duplicate when generating 10 random numbers out of 100 possible is about 50%.  5 numbers out of 100 is only 10% oddly enough.  25 numbers is 300% (so you would theoretically have 3 dupes).

 

 

 

 

Think about this:

 

 

For simplicity, assume that valid numbers are 00-99.

 

That's 100 possibilities.

 

 

Now, let's assume you generate a set of random numbers from that range:

 

{a, b, c, d, ...}

 

Now, what is the probability of a repeat the first time?  0/100.  Second? 1/100.  Third?  2/100.

 

 

Basically, the chance of a repeat is:

 

p(n, m) = (n-1)/m

 

Where n is the number of the term being generated and m is the max number of options.  (For the 3rd number, n = 3 for example.)

 

 

So, to find out how many collisions you would have when doing an entire set:

 

What are the chances of having a collision with 5 numbers out of 100?

 

Well, it's 0/100 + 1/100 + 2/100 + 3/100 + 4/100

 

A 10% chance that you'll have a collision (which isn't very surprising).

 

 

So, the probability of having a collision by generating the Nth term is:

 

p(n, m) = sum (x-1)/m

              as x -> n

 

So, going back to earlier, to find the number of terms you must have to have a certain chance of a collision, it would go like this:

 

chance of collision = (n-1)/m

 

m(chance of collision) = n - 1

n = m(chance of collision) + 1

 

So, if you had 0-99 and wanted to know how many numbers you would have to have before a 50% chance:

 

n = 100(50%) + 1

n = 50 + 1

n = 51

 

 

So, to extend that to the actual numbers...  To even have a 10% chance of collision with 100000000000-999999999999:

 

n = (9*10^11)(10%) + 1

  = (9*10^10) + 1

  = 90000000001

 

 

You would have to have 90000000001 terms before having even a 10% chance of making a collision!

 

 

 

 

 

But really what it all comes down to is how much time is wasted when a collision does happen.

 

 

My GenerateCode() function from earlier on my quite crappy computer averages about 543.866 nano seconds per run.

 

 

So figuring out how much time is wasted generating 500 mil item ids:

 

138889 clashes (well, that's based on probability so it would of course fluctuate, potentially highly sometimes).

 

 

138889 * 544 * 1/10^9 = 0.075555616 seconds.

 

 

 

 

That does not factor in database calls and what not of course though.  Also, it assumes a perfectly random generator.  In reality, the random number generating in computers is far from perfectly random in most cases.

 

 

 

 

 

 

 

I feel like my math is wrong (if it is, I'm sure someone will math pwn me :)), so I shall now test it.  I expect about 200k duplicates in 500000000 loops.  :).

Link to comment
Share on other sites

I agree that trying to generate the random numbers in real time as they are needed is problematic. I'd just create a process to geneerate the random numbers and add them to a temporary table if the number in that table is below a predefined threshold. Then pull from that table when you need them. The process to add the numbers to the temp table could be run on a scheduled basis.

 

Granted, creating an algorithym at the start would have been the best case scenario.

Link to comment
Share on other sites

Yeah an algorithm would definitely be the best solution.

 

 

I really just don't see the issue with a random number generation scheme.

 

 

A 12 digit number!  I would go with a different approach if he were generating like 1 million numbers and only had a 8 digit limit or something, but 500 million is nothing out of hat of 12 digits.

 

 

 

Hmmm, I guess a behind the scenes generation could be good though, although I do see a problem.

 

 

In the long run it would be better if he ever did approach even like 25% of the maximum options.

 

 

The problem though is that the chances of having locking problems is much higher.

 

 

If you have 1,000 numbers generated, a script is limited to those random numbers.

 

 

Otherwise, yeah, a script may have collisions generating random numbers, but what are the chances it will have a collision with a script running simultaneously?

Link to comment
Share on other sites

chance of collision = (n-1)/m

 

m(chance of collision) = n - 1

n = m(chance of collision) + 1

 

So, if you had 0-99 and wanted to know how many numbers you would have to have before a 50% chance:

 

n = 100(50%) + 1

n = 50 + 1

n = 51

 

I believe that is incorrect. Say you have a set of distinct elements [imath]X[/imath] with [imath]|X|=k[/imath]. How many elements do you need to pick before you have a 50% chance of picking the same number again?

 

Let [imath]p_k(n)[/imath] be the function determining the probability of picking a duplicate when selecting exactly [imath]n[/imath] elements. By the pigeonhole principle, you are guaranteed to pick duplicates if you pick [imath]n > k[/imath] elements.

 

When determining the probability, it's easier looking at the opposite in this case: what is the probability that all the elements are distinct (i.e., there are no duplicates)? We'll denote this by [imath]p_k'(n)[/imath] and then [imath]\forall n > k : p_k'(n) = 0[/imath].

 

Now, when selecting the first element we have all [imath]x \in X[/imath] available, so [imath]p_k'(1) = \frac{k}{k} = 1[/imath]. Next time we have one less, so by the multiplication principle [imath]p_k'(2) = \frac{k}{k} \cdot \frac{k-1}{k}[/imath], [imath]p_k'(3) = \frac{k}{k} \cdot \frac{k-1}{k} \cdot \frac{k-2}{k}[/imath]. Indeed for [imath]n \leq k[/imath]:

 

[math]\begin{split}

p_k'(n) &= \frac{k}{k} \cdot \frac{k-1}{k} \cdot \frac{k-2}{k} \cdots \frac{k-n+1}{k} \\

&= \frac{k \cdot (k-1) \cdot (k-2) \cdots (k-n+1)}{k^n} \\

&= \frac{k!}{(k-n)! \cdot k^n}

\end{split}[/math]

 

Thus our probability [imath]p_k(n)[/imath] must be given by:

 

[math]

p_k(n) = \begin{cases}

1 &\text{if } n > k \\

1 - \frac{k!}{(k-n)! \cdot k^n} & \text{if } 0 < n \leq k

\end{cases}

[/math]

 

Then we'll see that [imath]p_{100}(12) = 0.4968[/imath] and [imath]p_{100}(13) = 0.5572[/imath].

 

This is of course assuming that selecting any number is equally likely to selecting any other number.

 

By the way, what's up with all the line breaks?

Link to comment
Share on other sites

By the way, what's up with all the line breaks?

 

 

I get a little line break crazy sometimes.  For some reason I mentally separate chunks of text in little blocks.

 

 

I'm not sure if I misspoke or if my math was wrong.  I meant to have the probability of having a duplicate on the term n and only on the term n, not in the process of going from 1 to n.  My logic behind checking that was that if there are 50 terms in the result set, 100 terms in the possible set and the 51st term is being chosen, 50 terms are taken, thus 50 are open, meaning a 50/100 chance or 50%.

 

Based on "How many elements do you need to pick before you have a 50% chance of picking the same number again?" we were talking about different things.  I guess it was confusion how I lacked structure in my post and kind of rambled back in forth between selecting a term and the process of getting to a term.  Or is my math still wrong?

 

 

I just realized that based on your math, my equation involving the summation must've been wrong.  (Well, it was definitely notated incorrectly since I had x->n and should have said something like x = 0 until x = n.)  I don't think you're wrong since I suck at probability, but I don't see the error in my math.

 

10th element in 100 possible:

sum (n-1)/100

x = 0 until x = n

 

So, the probabilities:  0/100, 1/100, 2/100....9/100.  Once again, the first time, 0 terms are taken hence 0/100 chance.  The second time 1 term is taken, hence a 1/100 chance.  The 10th term 9 terms are taken hence a 9/100 chance.  Basically I don't see why assuming r(a, b) is a function that returns a random number c such that a <= c <= b with the number d such that a <= d <= b would not have the probability of being selected of 1/(b-a+1).  Is that where my logic is going astray, or is it in thinking that that concept continues for subsequent numbers?

Link to comment
Share on other sites

Corbin, the problem is that you're comparing 1 number to the list, and not every number in the list to every other number in the list. So in a group of 50 random numbers, we're not looking at the probability that a single given number will match the other numbers in the set, we're looking at the probability that any number will be the same as any other number in the set.

 

If we talk about it as pairings, you're creating 49 pairs (1 number compared to each of the 49 others). But in reality, there's 1225 pairs (50*49/2). The pairs aren't statistically independent, but it gives you some perspective as to the fact that you can't simply compare 1 number to the entire set.

Link to comment
Share on other sites

Yeah, I've been thinking about it for like 15 minutes now, and I just had a breakthrough.  I thought about it backwards like Daniel suggested and realized that the first term has a x/x probability, then the second a x-1/x, then x-2/x and so on....  Thus the probability of having unique terms:  1-(rangeSize-0)(rangeSize-1)(rangeSize-2)....(rangeSize-setSize).  I kept wondering why Daniel's equation had to use factorials, then once I thought about it backwards it made sense.  (I would assume Daniel's and my equation are the same [especially considering I stared at his for 10 minutes and that's what made me realize that haha] although I don't know how to prove that.)

 

 

I still don't fully understand why doing it forwards instead of backwards does not work the way I thought it did, but I do get the backwards way now. Well, I do understand why since what you said with the two way pair thing would make mathematical sense, but I guess I don't understand why that's the way it is.  I guess I'm trying to look at it as statistically independent pairs or something.  Or maybe I'm just thinking of it how I would do it in my head.  I would go through and compare x to every term in Y which would only be |Y| possibilities, but, as you two have both pointed out, that wouldn't make sense math wise.

Link to comment
Share on other sites

That's why e.g. the "birthday paradox" is called like it is. Using the above function, [imath]p_{365}(23) \approx 0.5[/imath], so there is just above 50% chance that at least two people will have the same birthday when there are 23 people if all days are equally likely as birthdays and you ignore leap years. It's not a mathematical paradox, but it's the underlying probability theory that is not intuitively clear until you go through the math.

 

The reason why it uses factorials is to get rid of the [imath]\cdots[/imath]. If you have n!, but want to get rid of the last k terms (counting downwards), then you just divide by (n-k)! because of the way you can cancel out factors in a fraction.

Link to comment
Share on other sites

I first encountered the birthday paradox whilst reading a book about fermats last theorem by Simon Singh, took me awhile to wrap my head around it. The example they used was a football match (soccer to the Americans), if you have 11 players a side plus the referee, equalling 23 people (check out my maths skills, Daniel0 has nothing on me) then there is evens chance that two players share a birthday. Though I must admit I found some of the other seemingly illogical, logic puzzles to be even more confusing (such as the three way duel).

 

Sorry, little off topic.

Link to comment
Share on other sites

Hrmmm....  Yeah, guess it's one of those things that just doesn't make sense to me at the moment.  (I plan on reading more later :).)  As for factorials, I figured it was something like that, but don't remember what.  I remember a couple years (probably like 5 lol) back when we first did factorials we had to write them out the longway for like a week just so we would see that x.x.  (Apparently I didn't remember though hehe.)

Link to comment
Share on other sites

Wow what a great topic and great info. Thanks everyone for your posts.

 

I am starting to think that building a table of available barcodes is the way to go and setting the max number of barcodes available in the database to a number that I should not need more than that in any given day.

 

Because barcodes are released everyday as they were not used, when I release the barcodes because they are no longer needed I will add them back into the available barcodes which should help finding new barcodes easier.

 

Trying to add a salt pattern, I dont see that mt_rand supports this. Do you have a example?

mt_rand  ( int $min  , int $max  )

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.