marcbraulio, posted October 4, 2012

Hi,

I am a bit confused about when and how to use crc32(). I am looking for a way to hash email addresses. Security is not really important, but the hashes do have to be unique and as short as possible.

Originally, I thought that the crc32() function returned a hash with both letters and numbers, but it only seems to return numbers. For instance:

crc32('example@example.com'); // output 875998594
crc32('exampleexampleexampleexampleexampleexampleexample@example.com'); // output 1225065599

How is it possible that it always outputs a 9- or 10-digit number no matter the length of the string? What are the chances of a collision? As long as the string is unique, will it output a unique 9- or 10-digit number? Will it ever output letters as well? I am working on a 64-bit system; will this output change on a 32-bit system?

I have read a few articles and the PHP manual entry on it, but I am still somewhat unsure whether this will in fact fit my needs.
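A minimal sketch of why the output length is bounded, assuming PHP 7+ on a 64-bit build (the helper name crc32_unsigned is hypothetical, not part of PHP). crc32() always produces a 32-bit checksum, so on a 64-bit build the result is an integer between 0 and 4294967295 whatever the input length, and it only contains letters if you re-encode it (for example as hex). On a 32-bit build the same 32 bits can come back as a negative integer, which is why the manual suggests formatting with sprintf('%u', ...). The last two lines use the standard CRC-32 check input "123456789" to show that crc32() and hash('crc32b', ...) compute the same checksum.

```php
<?php
// Hypothetical helper: return the CRC32 of $s as an unsigned decimal string.
// sprintf('%u') keeps the value unsigned on 32-bit builds, where crc32()
// may return a negative int; on 64-bit builds it just formats the int.
function crc32_unsigned(string $s): string
{
    return sprintf('%u', crc32($s));
}

// Inputs of very different lengths all land in the same fixed 32-bit range.
$inputs = [
    'example@example.com',
    str_repeat('example', 7) . '@example.com',
    str_repeat('x', 100000),
];

foreach ($inputs as $input) {
    // Digits only, never more than 10 of them (4294967295 is the maximum).
    printf("%6d bytes -> %s\n", strlen($input), crc32_unsigned($input));
}

// Standard CRC-32 check value: the checksum of "123456789" is 0xCBF43926.
echo crc32_unsigned('123456789'), "\n"; // 3421780262
echo hash('crc32b', '123456789'), "\n"; // cbf43926
```

Note that nothing forces 9 or 10 digits: small checksums print with fewer digits unless you zero-pad them.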
Christian F., posted October 4, 2012

I'm quite interested in knowing the reasons for the two requirements: why do the hashes have to be unique, and why do they have to be as short as possible? Those two requirements are usually mutually exclusive, at least to some degree.
xyph, posted October 4, 2012

Assuming the algorithm is well built, the shorter the digest, the greater the chance of a collision. The data returned in a digest is always binary; it is just generally presented in hex form. In this case, a 32-bit integer is returned. The manual has quite a bit written in the big red warning box on the crc32() entry, including how to convert the result to hex: http://php.net/manua...ction.crc32.php

Keep in mind that a 10-digit integer can be 'shorter' than a 5-character string, in a storage sense. Hex is generally an inefficient way to store binary data (assuming it's stored as a string).

Edited October 4, 2012 by xyph
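To illustrate the storage point, here is a sketch assuming a 64-bit PHP 7+ build (crc32_forms is a hypothetical helper name). It renders one CRC32 value three ways: zero-padded hex (8 characters, identical to hash('crc32b', ...)), unsigned decimal (up to 10 characters), and the raw 4-byte big-endian binary produced by pack('N', ...), which is the compact form an unsigned 32-bit integer column would hold. sprintf('%08x') is used instead of dechex() because dechex() does not zero-pad.

```php
<?php
// Hypothetical helper: the same CRC32 value in three encodings.
function crc32_forms(string $s): array
{
    $crc = crc32($s); // 0..4294967295 on a 64-bit build

    return [
        // Zero-padded hex, always 8 chars; same as hash('crc32b', $s).
        'hex'     => sprintf('%08x', $crc),
        // Unsigned decimal, 1 to 10 chars.
        'decimal' => sprintf('%u', $crc),
        // Raw 4 bytes, big-endian ("N" format): the most compact form.
        'binary'  => pack('N', $crc),
    ];
}

$forms = crc32_forms('example@example.com');
foreach ($forms as $name => $value) {
    printf("%-7s %2d bytes\n", $name, strlen($value));
}

// The 4-byte form round-trips back to the same integer.
$back = unpack('N', $forms['binary'])[1];
var_dump($back === crc32('example@example.com')); // bool(true)
```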
marcbraulio (author), posted October 5, 2012

Quoting Christian F.:
"I'm quite interested in knowing the reasons for the two requirements, namely why they have to be unique and why they have to be as short as possible? Those two requirements are usually mutually exclusive, at least to some degree."

The reason is that I am working on an application that will dynamically generate new email addresses based on existing ones. For instance, example@example.com will generate exam_Ne3u7Ir@domain.com, "Ne3u7Ir" being a hash of the original address "example@example.com".

Quoting xyph:
"Assuming the algorithm is well built - the shorter the digest, the greater chance of collision. The data returned in a digest is always binary, just generally in hex form. In this case, a 32-bit integer is returned. They have quite a bit written in the big, red warning box in the manual entry - including how to convert it to hex. http://php.net/manua...ction.crc32.php Keep in mind, a 10-digit integer can be 'shorter' than a 5 character string, in a storage sense. Hex is generally an inefficient way to store binary data (assuming it's a string)"

Understood. I'm not really interested in converting it to hex, as I can shorten this output further if it is numeric. So, to clarify: what are the odds of crc32() producing the same 32-bit integer for two different email addresses? 1 in 2^32?

Edited October 5, 2012 by marcbraulio
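Two sketches for this follow-up, assuming a 64-bit PHP 7+ build; both function names are hypothetical. First, base_convert() can re-encode the numeric checksum in base 36 (digits plus lowercase letters), which needs at most 7 characters because 4294967295 is "1z141z3" in base 36; a mixed-case token like "Ne3u7Ir" would need a custom base-62 encoder, not shown here. Second, "1 in 2^32" is roughly right for one specific pair of different addresses if CRC32 output is treated as uniformly distributed (an approximation, since CRC32 is not a cryptographic hash). Across n stored addresses, though, the birthday approximation p ≈ 1 − exp(−n(n−1)/2^33) applies, which reaches about 50% near n ≈ 77,000. Since a 32-bit value has only 2^32 possible values, no re-encoding of it can guarantee uniqueness.

```php
<?php
// Hypothetical helper: CRC32 re-encoded in base 36 (0-9, a-z), max 7 chars.
function crc32_base36(string $s): string
{
    // sprintf('%u') keeps the value unsigned even on 32-bit builds.
    return base_convert(sprintf('%u', crc32($s)), 10, 36);
}

// Hypothetical helper: birthday-bound estimate of the chance that at least
// two of $n distinct inputs share a 32-bit checksum, treating CRC32 output
// as uniformly random (an approximation, not a property CRC32 guarantees).
function crc32_collision_probability(int $n): float
{
    $space = 2 ** 32; // number of possible CRC32 values
    return 1.0 - exp(-($n * ($n - 1)) / (2 * $space));
}

echo crc32_base36('example@example.com'), "\n";

foreach ([2, 1000, 10000, 77163, 200000] as $n) {
    printf("%7d addresses -> %.6f\n", $n, crc32_collision_probability($n));
}
```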