JonnoTheDev Posted August 27, 2009

As an experiment (honestly), I am looking at how difficult it is to break captchas by creating a recognition tool. There are a few considerations before getting on to character recognition from the image, such as obtaining the image from the screen (most captchas use a server-side script to generate the image rather than saving an image in a directory, and that script also sets the solution in a session). I'm thinking that the tasks required for the whole process may be out of the scope of PHP (or at least parts of it).

Process
1. Request URL of form
2. Capture screen as graphic
3. Crop image to just the captcha graphic
4. Character recognition process on graphic
5. Return result
6. Complete required form input fields
7. Submit
8. Obtain response

Thoughts?

Link to comment: https://forums.phpfreaks.com/topic/172135-breaking-captchas/
alexdemers Posted August 27, 2009

Well, you built the logic that almost everybody would figure out in 30 seconds. Obviously it's step 4 that is the hardest part. I don't know of any image recognition library in PHP (I would be surprised if there were any). As for steps 2 and 3, they should be replaced with viewing the source code instead:

$page = file_get_contents('http://urlofform.com/form.php');

... and do some regex to get the image file path, and then run the image recognition software on that single image. I cannot help you further because I never had an interest in breaking/spamming/hacking web sites.
ignace Posted August 27, 2009

Quote (alexdemers): As for steps 2 and 3, they should be replaced with viewing the source code instead: $page = file_get_contents('http://urlofform.com/form.php');

Right, and then how would you apply OCR on something like:

<img src="qmdhstqst3q3d5hq37g5hq45347tj3q56j4y78sqwrd74jy6+q84ky35s734eyk7s 78ylu4s88l4rs78ul869sr9ur9+§lu/7rs3lu874d73§lu6s346s4ku486sr">
kratsg Posted August 27, 2009

IMHO, there are good captchas and bad captchas.

A bad captcha does many of the following:
- does not randomize the text in any way whatsoever
- puts all the text in a straight line
- does not randomize the lines that strike through, or any other images used to obfuscate the text further

A good captcha:
- randomizes text size, text font, text position/rotation (mainly vertical position and the fact that it tilts at random angles, think italicizing), and perhaps text color
- randomizes the lines and images that obfuscate the text

Now, in any case, the text to copy does not have to be an actual random string of alphanumeric characters; it can be words if your captcha is built correctly. Case in point: reCaptcha (google it). I truly believe all websites should just standardize on this... they've got a hella easy way to embed it using PHP and AJAX, and their APIs make it easy to work with.
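The randomization kratsg describes can be sketched in PHP roughly as follows. This is only an illustration under assumptions: the function names `randomCaptchaLayout` and `randomNoiseLines`, the array keys, and all the value ranges are made up here and do not come from any particular captcha library. It only computes the random parameters; actually drawing them is left to whatever image library is in use.

```php
<?php
// Hypothetical helper: build per-character rendering parameters for a captcha.
// The ranges below are arbitrary illustrative choices, not recommendations.
function randomCaptchaLayout(string $text, int $imageHeight = 60): array
{
    $layout = [];
    $x = random_int(5, 15); // random left margin

    foreach (str_split($text) as $char) {
        $size = random_int(18, 28); // font size varies per character
        $layout[] = [
            'char'  => $char,
            'size'  => $size,
            'angle' => random_int(-25, 25), // tilt in degrees
            'x'     => $x,
            'y'     => random_int($size + 2, $imageHeight - 5), // baseline moves vertically
            'color' => [random_int(0, 120), random_int(0, 120), random_int(0, 120)],
        ];
        $x += $size + random_int(-4, 6); // uneven horizontal spacing
    }

    return $layout;
}

// Hypothetical helper: random strike-through lines as [x1, y1, x2, y2].
function randomNoiseLines(int $count, int $width, int $height): array
{
    $lines = [];
    for ($i = 0; $i < $count; $i++) {
        $lines[] = [
            random_int(0, $width - 1), random_int(0, $height - 1),
            random_int(0, $width - 1), random_int(0, $height - 1),
        ];
    }
    return $lines;
}
```

With the GD extension, each layout entry could then be passed to `imagettftext()` and each noise line to `imageline()`; picking a random font file per character would cover the "text font" point as well.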
JonnoTheDev (Author) Posted August 27, 2009

Quote (alexdemers): Well, you built the logic that almost everybody would figure out in 30 seconds. Obviously it's step 4 that is the hardest part. I don't know of any image recognition library in PHP (I would be surprised if there were any). As for steps 2 and 3, they should be replaced with viewing the source code instead: $page = file_get_contents('http://urlofform.com/form.php'); ... and do some regex to get the image file path, and then run the image recognition software on that single image. I cannot help you further because I never had an interest in breaking/spamming/hacking web sites.

You obviously think I'm stupid. If it was as simple as using file_get_contents I wouldn't be posting. Captchas require sessions. How can file_get_contents set a session? Curl would be making the requests (that's if I did use PHP). I've not asked you to build this for me, so the process I laid out is obvious to everyone. Don't be so condescending in your reply. If you have no valid contribution to the topic then don't post.
ignace Posted August 27, 2009

Quote (kratsg): IMHO, there are good captchas and bad captchas. A bad captcha does many of the following: does not randomize the text in any way whatsoever; puts all the text in a straight line; does not randomize the lines that strike through, or any other images used to obfuscate the text further. A good captcha: randomizes text size, text font, text position/rotation (mainly vertical position and the fact that it tilts at random angles, think italicizing), and perhaps text color; randomizes the lines and images that obfuscate the text. Now, in any case, the text to copy does not have to be an actual random string of alphanumeric characters; it can be words if your captcha is built correctly. Case in point: reCaptcha (google it). I truly believe all websites should just standardize on this... they've got a hella easy way to embed it using PHP and AJAX, and their APIs make it easy to work with.

His question was not which captcha he should be using, or what good and bad captchas are. He wants to try breaking captchas, that is, reverse engineer them by going from the image back to the text, and he wants opinions or suggestions on how to achieve that goal.
alexdemers Posted August 27, 2009

Quote (alexdemers): As for steps 2 and 3, they should be replaced with viewing the source code instead: $page = file_get_contents('http://urlofform.com/form.php');

Quote (ignace): Right, and then how would you apply OCR on something like: <img src="qmdhstqst3q3d5hq37g5hq45347tj3q56j4y78sqwrd74jy6+q84ky35s734eyk7s 78ylu4s88l4rs78ul869sr9ur9+§lu/7rs3lu874d73§lu6s346s4ku486sr">

Same way a browser does.
Folder: qmdhstqst3q3d5hq37g5hq45347tj3q56j4y78sqwrd74jy6+q84ky35s734eyk7s 78ylu4s88l4rs78ul869sr9ur9+§lu
File: 7rs3lu874d73§lu6s346s4ku486sr

It is weird, though, for a folder/file name. Even if it doesn't end with jpg or whatever the format is, if it is served with the right Content-Type header it will show up properly in the browser.
ignace Posted August 27, 2009

Quote (JonnoTheDev): If you have no valid contribution to the topic then don't post.

Or in other words: engage brain before opening mouth.
JonnoTheDev (Author) Posted August 27, 2009

For your perusal, guys: http://network-security-research.blogspot.com/
ignace Posted August 27, 2009

Quote (alexdemers): It is weird, though, for a folder/file name. Even if it doesn't end with jpg or whatever the format is, if it is served with the right Content-Type header it will show up properly in the browser.

It is not a folder or file name. It's the source of the image itself, placed directly into the src attribute of the image.
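To make ignace's point concrete: when image data is embedded in the src attribute, there is no path to fetch. A minimal sketch follows, assuming the page uses the standard base64 `data:` URI form (RFC 2397). The string in ignace's example is not in that form, so this function would simply return null for it. The function name `decodeDataUri` and the returned array keys are hypothetical.

```php
<?php
// Hypothetical helper: decode a base64 "data:" URI into MIME type and raw bytes.
// Returns null when the value is not a well-formed base64 data URI.
function decodeDataUri(string $src): ?array
{
    // data:[<mime>][;attr=value]*;base64,<payload>
    $pattern = '#^data:([\w.+-]+/[\w.+-]+)?(?:;[\w-]+=[^;,]*)*;base64,(.*)$#s';
    if (!preg_match($pattern, $src, $m)) {
        return null;
    }

    // Strict mode makes base64_decode() return false on characters outside the alphabet.
    $bytes = base64_decode($m[2], true);
    if ($bytes === false) {
        return null;
    }

    // RFC 2397 defaults the media type to text/plain when it is omitted.
    return ['mime' => $m[1] !== '' ? $m[1] : 'text/plain', 'bytes' => $bytes];
}
```

The decoded bytes are ordinary image data at that point; for example, they could be written to disk with `file_put_contents()` or loaded with GD's `imagecreatefromstring()`.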
JonnoTheDev (Author) Posted August 27, 2009

Quote (ignace): It is not a folder or file name. It's the source of the image itself, placed directly into the src attribute of the image.

Yes. If you were to follow this target you would end up with a different generated image from the one on the initial form, and another session value would be set. That is why, in the process in my initial post, I mentioned capturing the screen when the form page is requested.
JonnoTheDev (Author) Posted August 27, 2009

For images: http://www.mathworks.com/access/helpdesk/help/toolbox/images/index.html?/access/helpdesk/help/toolbox/images/index.html
ignace Posted August 27, 2009

Quote (ignace): It is not a folder or file name. It's the source of the image itself, placed directly into the src attribute of the image.

Quote (JonnoTheDev): Yes. If you were to follow this target you would end up with a different generated image from the one on the initial form, and another session value would be set. That is why, in the process in my initial post, I mentioned capturing the screen when the form page is requested.

Reading the source won't work because you can't apply OCR to source code (something alex didn't think about before posting), so the only valid option is capturing the screen at a certain URL. This can be done using the GD library. Afterwards, I guess you'll need the algorithm explained on NSR. I read something similar a while ago. I can't remember where, though, but the idea was similar to that of NSR and it provided C source code.
JonnoTheDev (Author) Posted August 27, 2009

The screen would have to be captured at the point the page is requested, so that the session containing the captcha answer is set and the graphic is displayed. Multiple requests cannot be made, as the captcha graphic will change. Once the captcha image is recognised, the results must be submitted into the form with the session still active.
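The "session still active" requirement comes down to sending back the same session cookie on the follow-up request. A rough sketch of that mechanism is below, assuming the server identifies the session with an ordinary cookie (for example PHP's default `PHPSESSID`). The function names `extractCookies` and `buildCookieHeader` are hypothetical, and cookie attributes such as Path or expiry are deliberately ignored.

```php
<?php
// Hypothetical helper: collect name => value pairs from raw response header lines.
function extractCookies(array $headerLines): array
{
    $cookies = [];
    foreach ($headerLines as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the leading "name=value" part; drop attributes after the first ';'.
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') === false) {
            continue;
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

// Hypothetical helper: build the request header that carries the cookies back.
function buildCookieHeader(array $cookies): string
{
    $parts = [];
    foreach ($cookies as $name => $value) {
        $parts[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $parts);
}
```

In practice cURL can handle this bookkeeping itself through its `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` options, which is presumably why cURL rather than `file_get_contents` was suggested earlier in the thread.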
ignace Posted August 27, 2009

Don't know if you've heard of it: http://ajaxian.com/archives/captcha-cracking-in-javascript-with-canvas-and-neural-nets

This is what my search on Google Reader turned up.
JonnoTheDev (Author) Posted August 27, 2009

Interesting.
ignace Posted August 27, 2009

Can't find the other article, sorry.