
Breaking CAPTCHAs


JonnoTheDev


As an experiment (honestly), I am looking at how difficult it is to break captchas by creating a recognition tool. There are a few considerations before getting on to character recognition, such as obtaining the image from the screen: most captchas use a server-side script that generates the image on the fly, rather than saving it to a directory, and that script also stores the solution in a session. I'm thinking that the tasks required for the whole process may be out of the scope of PHP (or at least parts of them are).

Process

1. Request url of form

2. Capture screen as graphic

3. Crop image to just captcha graphic

4. Character recognition process on graphic

5. Return result

6. Complete required form input fields

7. Submit

8. Obtain response

 

Thoughts?


Well, you've laid out the logic that almost everybody would figure out in 30 seconds. Obviously step 4 is the hardest part. I don't know of any image recognition library in PHP (I would be surprised if there were any).

 

As for steps 2 and 3, they should be replaced with reading the page source instead:

 

$page = file_get_contents('http://urlofform.com/form.php');

 

... then use a regex to get the image file path and run the image recognition software on that single image.
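The regex step could look something like the minimal PHP sketch below. It assumes the `src` values are quoted and skips unquoted ones; `extractImageSources` is a hypothetical helper name, and messier real-world markup may call for a proper HTML parser such as `DOMDocument` instead.

```php
<?php
// Sketch: collect the src of every <img> tag in an HTML page using a regex.
// Assumption: src values are quoted ('...' or "..."); unquoted ones are skipped.
function extractImageSources(string $html): array
{
    // \s before "src" avoids matching attributes such as data-src.
    preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is', $html, $matches);

    // Attribute values can contain entities, e.g. &amp; in a query string.
    return array_map(
        fn (string $src): string => html_entity_decode($src, ENT_QUOTES | ENT_HTML5),
        $matches[2]
    );
}

// Example input; normally this would be the $page fetched with file_get_contents().
$html = '<form><img src="captcha.php?id=1&amp;t=2" alt="code"></form>';
print_r(extractImageSources($html));
```

Relative paths in the result still need to be resolved against the form's URL before they can be requested.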

 

I cannot help you further because I have never had any interest in breaking/spamming/hacking websites.


As for step 2 and 3, they should be replace with viewing the source code instead:

 

$page = file_get_contents('http://urlofform.com/form.php');

 

Right, and then how would you apply OCR to something like:

 

<img src="qmdhstqst3q3d5hq37g5hq45347tj3q56j4y78sqwrd74jy6+q84ky35s734eyk7s
78ylu4s88l4rs78ul869sr9ur9+§lu/7rs3lu874d73§lu6s346s4ku486sr">


IMHO, there are good captchas and bad captchas.

 

A bad captcha does many of the following:

- not randomize the text in any way whatsoever

- put all the text in a straight line

- not randomize the lines that strike through or any other images to obfuscate the text further

 

A good captcha:

- randomizes text size, font, position and rotation (mainly vertical position and tilting at random angles, think italics), and perhaps text color

- randomizes the lines and images that obfuscate the text

 

In any case, the text to copy does not have to be a random string of alphanumeric characters; it can be real words if your captcha is built correctly.

 

Case in point: reCaptcha (google it). I truly believe all websites should just standardize on it; they've got a hella easy way to embed it using PHP or AJAX, and their APIs make it easy to work with.


You obviously think I'm stupid. If it were as simple as using file_get_contents I wouldn't be posting. Captchas require sessions; how can file_get_contents maintain a session? cURL would be making the requests (that's if I did use PHP). I haven't asked you to build this for me, so of course the process I laid out is obvious to everyone. Don't be so condescending in your reply. If you have no valid contribution to the topic then don't post.
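For reference, the session in question is just a cookie (on PHP sites typically `PHPSESSID`), so it can be carried from one request to the next. With cURL that is usually done with the `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` options; the sketch below shows the same idea with plain `file_get_contents()` and a stream context. `cookieHeaderFrom` and `contextWithCookies` are hypothetical helper names, and the parsing deliberately ignores cookie attributes such as `path` and `expires`.

```php
<?php
// Sketch: carry a session cookie between requests with plain file_get_contents().
// A PHP session is just a cookie, e.g. "Set-Cookie: PHPSESSID=abc123; path=/".

// Turn raw response header lines (like the $http_response_header array that
// file_get_contents() fills in) into a "Cookie:" request header value.
function cookieHeaderFrom(array $responseHeaders): string
{
    $cookies = [];
    foreach ($responseHeaders as $line) {
        // Keep only name=value; attributes such as path or expires are ignored.
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=([^;]*)/i', $line, $m)) {
            $cookies[$m[1]] = $m[2]; // a later cookie with the same name wins
        }
    }

    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Build a stream context that sends those cookies with the next HTTP request.
function contextWithCookies(string $cookieHeader)
{
    return stream_context_create([
        'http' => ['header' => 'Cookie: ' . $cookieHeader . "\r\n"],
    ]);
}
```

After a first `file_get_contents($url)` call, PHP fills the local `$http_response_header` array, so passing `contextWithCookies(cookieHeaderFrom($http_response_header))` as the third argument of a later `file_get_contents()` call would send the same session cookie again.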



His question was not which captcha he should be using or what makes a good or bad captcha. He wants to try breaking captchas, i.e. reverse engineer them by going from the image back to the text, and wants opinions or suggestions on how to achieve that goal.


Right, and then how would you apply OCR to something like:

 

<img src="qmdhstqst3q3d5hq37g5hq45347tj3q56j4y78sqwrd74jy6+q84ky35s734eyk7s
78ylu4s88l4rs78ul869sr9ur9+§lu/7rs3lu874d73§lu6s346s4ku486sr">

 

 

Same way a browser does.

 

Folder: qmdhstqst3q3d5hq37g5hq45347tj3q56j4y78sqwrd74jy6+q84ky35s734eyk7s

78ylu4s88l4rs78ul869sr9ur9+§lu

 

File: 7rs3lu874d73§lu6s346s4ku486sr

 

It is weird for a folder/file name, though. Even if it doesn't end with .jpg or whatever the format is, it will show up properly in the browser as long as it is served with the right Content-Type header.



It is not a folder or file name; it's the raw source of the image placed directly into the image's src attribute.
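If the image data really is embedded in the page, it would normally be a `data:` URI as defined in RFC 2397 (for example `data:image/png;base64,...`). The string quoted earlier in the thread doesn't start with `data:`, so treat that as an assumption. A minimal PHP sketch, with a hypothetical `decodeDataUri` helper, that turns such a URI back into raw bytes without making another request:

```php
<?php
// Hypothetical helper: decode an inline data: URI (RFC 2397) into its bytes.
// Returns null if the string is not a well-formed data: URI.
function decodeDataUri(string $uri): ?array
{
    // Shape: data:[<mime type>][;param=value...][;base64],<payload>
    if (!preg_match('#^data:([^;,]*)((?:;[^;,]*)*),(.*)$#s', $uri, $m)) {
        return null;
    }
    // RFC 2397 defaults the media type to text/plain when it is omitted.
    $mime = $m[1] !== '' ? $m[1] : 'text/plain';

    // If present, ";base64" must be the last parameter.
    $isBase64 = substr($m[2], -7) === ';base64';
    $bytes = $isBase64 ? base64_decode($m[3], true) : rawurldecode($m[3]);
    if ($bytes === false) {
        return null; // payload was not valid base64
    }
    return ['mime' => $mime, 'bytes' => $bytes];
}
```

The returned bytes could then be loaded into GD with `imagecreatefromstring()`, so the image on the page is the one processed rather than a freshly generated one.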


Yes. If you were to follow that src target, you would end up with a different image from the one generated for the initial form, and another session value would be set.

That is why, in the process from my initial post, I mentioned capturing the screen when the form page is requested.



Reading the source won't work because you can't apply OCR to source code (something Alex didn't think about before posting), so the only valid option is capturing the screen at a given URL. This can be done using the GD library; after that, I guess you'll need the algorithm explained on NSR. I read something similar a while ago (I can't remember where), but the idea was similar to that of NSR and it provided C source code.


The screen would have to be captured at the moment the page is requested, when the session containing the captcha answer is set and the graphic is displayed. Multiple requests cannot be made, as the captcha graphic will change. Once the captcha image is recognised, the result must be submitted with the form while the session is still active.
