Hi,

I am a bit confused about when and how to use crc32(). I am looking for a way to hash email addresses. Security is not really important, but the hashes do have to be unique and as short as possible. Originally, I thought that crc32() returned a hash containing both letters and numbers, but it only seems to return numbers.

For instance:

echo crc32('example@example.com'); // outputs 875998594
echo crc32('exampleexampleexampleexampleexampleexampleexample@example.com'); // outputs 1225065599

 

How is it possible that it always outputs a 9- or 10-digit number, no matter the length of the string?

What are the chances of a collision? As long as the input string is unique, will it output a unique 9- or 10-digit number?

Will it ever output letters as well?

I am working on a 64-bit system. Will the output change on a 32-bit system?

I have read a few articles and the PHP manual entry on it, but I am still somewhat clueless as to whether this will in fact fit my needs.


Assuming the algorithm is well built, the shorter the digest, the greater the chance of collision.

The data in a digest is always binary; it is just usually displayed in hex form. In this case, crc32() returns a 32-bit integer. The manual entry has quite a bit written in the big, red warning box, including how to convert it to hex.
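
To make the hex conversion concrete, here is a minimal sketch. The helper name crc32_hex() is made up for illustration; the calls inside it (crc32(), sprintf(), hash()) are standard PHP. It assumes the concern is the one the manual warns about: on 32-bit builds the integer can come back negative, so it should be formatted as unsigned.

```php
<?php
// Hypothetical helper: render the CRC-32 of $input as 8 lowercase hex digits.
function crc32_hex(string $input): string
{
    // crc32() may be negative on 32-bit builds; %08x prints the underlying
    // 32 bits either way, zero-padded to 8 digits.
    return sprintf('%08x', crc32($input));
}

echo crc32_hex('123456789'), "\n";             // cbf43926 (the standard CRC-32 check value)
echo sprintf('%u', crc32('123456789')), "\n";  // 3421780262 (unsigned decimal form)
echo hash('crc32b', '123456789'), "\n";        // cbf43926 (same digits via the hash extension)
```

Formatting with %u or %08x should give the same text on 32-bit and 64-bit systems, which is the portable way to compare or store the value.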

 

http://php.net/manua...ction.crc32.php

Keep in mind that a 10-digit integer can be 'shorter' than a 5-character string in a storage sense. Hex is generally an inefficient way to store binary data (assuming it's stored as a string).
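
As a rough illustration of the storage point, the sketch below compares three encodings of the same checksum. The function name storage_sizes() is hypothetical; pack('N', ...) is used here as one assumed way to get the raw 4 bytes, which is what an integer column would hold.

```php
<?php
// Hypothetical helper: byte lengths of three encodings of one CRC-32 value.
function storage_sizes(string $input): array
{
    $crc = crc32($input);
    return [
        'decimal_string' => strlen(sprintf('%u', $crc)),   // up to 10 characters
        'hex_string'     => strlen(sprintf('%08x', $crc)), // always 8 characters
        'raw_bytes'      => strlen(pack('N', $crc)),       // always 4 bytes (32 bits, big-endian)
    ];
}

print_r(storage_sizes('example@example.com'));
```

Stored as a real integer, the value takes 4 bytes no matter how many decimal digits it prints as.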

Edited by xyph

I'm quite interested in knowing the reasons for the two requirements: why do the hashes have to be unique, and why do they have to be as short as possible? Those two requirements are usually mutually exclusive, at least to some degree.

The reason is that I am working on an application that will dynamically generate new email addresses based on existing ones. For instance, example@example.com will generate exam_Ne3u7Ir@domain.com, with "Ne3u7Ir" being a hash of the original address "example@example.com".
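
One possible way to build such an address from crc32() is sketched below. Both function names, the choice of base 36, and the "first four characters of the local part" rule are assumptions read off the example above, not a description of the actual application. Base 36 gives lowercase letters and digits only, so the token will not look exactly like "Ne3u7Ir", and it inherits every crc32() collision.

```php
<?php
// Hypothetical helper: a short lowercase alphanumeric token from the CRC-32.
function short_token(string $email): string
{
    // Format as unsigned first so 32-bit builds give the same result.
    $unsigned = sprintf('%u', crc32($email));
    // A 32-bit value needs at most 7 base-36 digits.
    return base_convert($unsigned, 10, 36);
}

// Hypothetical helper: prefix of the local part + "_" + token + "@" + domain.
function derived_address(string $email, string $domain): string
{
    $local = explode('@', $email, 2)[0];
    return substr($local, 0, 4) . '_' . short_token($email) . '@' . $domain;
}

echo derived_address('example@example.com', 'domain.com'), "\n";
```

If uniqueness really is required, the generated address would still need to be checked against the ones already issued.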

 

 


Understood. I'm not really interested in converting it to hex, as I can shorten the output further if it stays numeric. So to clarify, what are the odds of crc32() producing the same 32-bit integer for two different emails? 1 in 2^32?
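
For a sense of scale: 1 in 2^32 is the chance for one specific pair, assuming the outputs are spread uniformly (an idealisation, since CRC-32 is a checksum rather than a hash designed for this). Across a whole set of n addresses the usual birthday approximation applies, p ≈ 1 - exp(-n(n-1) / 2^33). A small sketch, with a made-up function name:

```php
<?php
// Hypothetical helper: approximate chance that at least two of $n inputs
// share the same $bits-bit value, assuming uniformly distributed outputs.
function collision_probability(int $n, int $bits = 32): float
{
    $space = 2.0 ** $bits;
    $pairs = $n * ($n - 1) / 2.0;
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -expm1(-$pairs / $space);
}

foreach ([1000, 10000, 77163, 1000000] as $n) {
    printf("%8d addresses: %.4f%%\n", $n, 100 * collision_probability($n));
}
```

Under that assumption the chance of at least one collision reaches roughly 50% at about 77,000 addresses, so uniqueness cannot be taken for granted on a large list.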

Edited by marcbraulio