
Reading text from images...


tissue


Hello PHP freaks,

 

I am looking for a script that can read information out of an image, like the following:

http://www.wietze.net/images/entropia/GLOBAL25.JPG

 

The information I need from these images/screenshots is the text in grey. As you can see, the font is always the same (though it may differ between screen resolutions), and so is the colour (there is also one other shade of grey). I just need the text out of the images so my scripts can process it further. I need the information from those screenshots to fill a database (as you can see here).

Instead of users having to do everything manually, I would like to make it more automatic.

 

This should be possible; what other reason would sites have for using those scrambled images (CAPTCHAs) to prevent message spamming? :D

Does anyone know of an existing script that can do this kind of magic? I tried googling, but it's all text-to-image instead of the other way around. ;D

 

grtz Tissue

 


This seems like an epic task.

 

I'm not even sure it is possible to pull text out of an image in that way.

Perhaps you should have a look under the bonnet of the program/game to see how it posts the text; you may be able to grab it straight from the code as it is generated.


Getting that info out of the game is impossible. It is an MMORPG with a lot of real cash involved, so it is quite locked down.

 

I too think it is an epic task (or you might say: a challenge).

That is why I was hoping someone had already done it, so I could use that :D

 

grtz Tissue

 


Well, it's an image: a bunch of pixels that define shapes and colours. It's not a string, nor an array, so you can't just read the text out of it with PHP. The only computer programs that can "read" images (meaning look at the image, interpret the text and turn it into a string) are OCR (optical character recognition) tools...

 

I'm afraid your users will have to keep adding things manually.


It could be done, since you can use the GD library to find the colour of each individual pixel with imagecolorat().

 

It wouldn't be pretty, but I would imagine that you could, at least theoretically, run through all the pixels looking for that grey colour, and try to find patterns which you could associate with letters.
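Here is a minimal sketch of that pixel scan. It assumes the GD extension is available and that the font greys are known as exact RGB values. The scan turns the screenshot into a grid of 1s (text colour) and 0s (everything else). The names loadPixels() and binarize() are hypothetical helpers, not part of GD. Only imagesx(), imagesy(), imagecolorat(), imageistruecolor() and imagepalettetotruecolor() are real GD calls.

```php
<?php
/**
 * Hypothetical helper: read every pixel of a GD image into rows of 0xRRGGBB ints.
 * Requires the GD extension.
 */
function loadPixels($img): array
{
    // On palette images imagecolorat() returns palette indices, so convert first.
    if (!imageistruecolor($img)) {
        imagepalettetotruecolor($img);
    }
    $w = imagesx($img);
    $h = imagesy($img);
    $pixels = [];
    for ($y = 0; $y < $h; $y++) {
        $row = [];
        for ($x = 0; $x < $w; $x++) {
            // Mask off the alpha bits, keeping only 0xRRGGBB.
            $row[] = imagecolorat($img, $x, $y) & 0xFFFFFF;
        }
        $pixels[] = $row;
    }
    return $pixels;
}

/**
 * Hypothetical helper: 1 where a pixel is exactly one of the text colours, else 0.
 *
 * @param int[][] $pixels      rows of 0xRRGGBB values
 * @param int[]   $textColours exact colours assumed to be used by the font
 */
function binarize(array $pixels, array $textColours): array
{
    $lookup = array_flip($textColours); // colour => index, for fast membership tests
    $grid = [];
    foreach ($pixels as $row) {
        $out = [];
        foreach ($row as $rgb) {
            $out[] = isset($lookup[$rgb]) ? 1 : 0;
        }
        $grid[] = $out;
    }
    return $grid;
}
```

A call would look something like binarize(loadPixels(imagecreatefrompng('screenshot.png')), [0xA0A0A0, 0x808080]). The file name and both grey values are placeholders, not values measured from the game. Exact matching only works on lossless screenshots; a lossy format would need a colour tolerance.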

 

It would indeed take a fair bit of effort, and I'm not sure whether the results would be entirely satisfactory. The script would probably also take a while to run.

 

In your favour is the fact that the font is always the same. If you could identify the font, you could create some images with the same font and colour to analyse before you start work.
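Following that idea, here is a sketch of turning a known font into reference templates. It assumes GD was built with FreeType support and that you have a TrueType file matching the game's font. You supply the font path and size; nothing here identifies the actual font. renderGlyph() and trimGrid() are hypothetical helper names. The negative colour index passed to imagettftext() is GD's documented way of switching antialiasing off, which suits a single-colour game font.

```php
<?php
/**
 * Hypothetical helper: render one character with a TrueType font (path supplied
 * by you) and return it as a 0/1 grid trimmed to its bounding box.
 * Requires GD with FreeType support.
 */
function renderGlyph(string $fontFile, float $size, string $char): array
{
    // Generously sized canvas; enlarge it if glyphs get clipped.
    $w = (int) ceil($size * 3);
    $h = (int) ceil($size * 3);
    $img = imagecreatetruecolor($w, $h); // starts out all black
    $white = imagecolorallocate($img, 255, 255, 255);
    // Negative colour index = no antialiasing, so pixels are either on or off.
    imagettftext($img, $size, 0, (int) $size, (int) ($size * 2), -$white, $fontFile, $char);

    $grid = [];
    for ($y = 0; $y < $h; $y++) {
        $row = [];
        for ($x = 0; $x < $w; $x++) {
            $row[] = (imagecolorat($img, $x, $y) & 0xFFFFFF) === 0xFFFFFF ? 1 : 0;
        }
        $grid[] = $row;
    }
    return trimGrid($grid);
}

/**
 * Hypothetical helper: crop a 0/1 grid to the smallest rectangle holding all 1s.
 * Returns [] when there are no 1s (e.g. a space).
 */
function trimGrid(array $grid): array
{
    $rows = [];
    $cols = [];
    foreach ($grid as $y => $row) {
        foreach ($row as $x => $v) {
            if ($v === 1) {
                $rows[] = $y;
                $cols[] = $x;
            }
        }
    }
    if ($rows === []) {
        return [];
    }
    [$top, $bottom, $left, $right] = [min($rows), max($rows), min($cols), max($cols)];
    $out = [];
    for ($y = $top; $y <= $bottom; $y++) {
        $out[] = array_slice($grid[$y], $left, $right - $left + 1);
    }
    return $out;
}
```

Templates could then be collected as $templates['A'] = renderGlyph('/path/to/font.ttf', 10, 'A'), with a placeholder path. Trimming to the bounding box lets templates line up with characters cut out of a screenshot. It also throws away the baseline position, so pairs such as a comma and an apostrophe could end up looking alike. Keeping the full line height for each template avoids that.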



How about using the image functions in PHP? PHP can read pixel colours, right?

So maybe you can let the script search a certain area of the picture for grey (font) pixels, then compare those pixel patterns to known character patterns and return the most likely match (the one with the most matching pixels). All letters are surrounded by empty space, so finding the area of each character should be possible too.
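The two steps described above can be sketched on a 0/1 grid of a single text line (1 = grey font pixel). The sketch assumes characters never touch each other and that templates are stored as grids keyed by character. splitCharacters() cuts at fully blank columns. bestMatch() scores each template by the fraction of pixels that agree and returns the best one. Both names are hypothetical.

```php
<?php
/**
 * Hypothetical helper: split one text line (0/1 grid) into characters,
 * using columns that contain no 1s as separators.
 */
function splitCharacters(array $line): array
{
    $height = count($line);
    $width = $height > 0 ? count($line[0]) : 0;
    $chars = [];
    $start = null;
    // Run one step past the last column so a character touching the edge is flushed.
    for ($x = 0; $x <= $width; $x++) {
        $blank = true;
        if ($x < $width) {
            for ($y = 0; $y < $height; $y++) {
                if ($line[$y][$x] === 1) {
                    $blank = false;
                    break;
                }
            }
        }
        if (!$blank && $start === null) {
            $start = $x; // first inked column of a character
        } elseif ($blank && $start !== null) {
            // Copy columns $start .. $x - 1 from every row.
            $chars[] = array_map(fn(array $row) => array_slice($row, $start, $x - $start), $line);
            $start = null;
        }
    }
    return $chars;
}

/**
 * Hypothetical helper: return the key of the template whose pixels agree with
 * $glyph most often. Grids are compared over the larger size, treating
 * missing pixels as 0, so a size mismatch lowers the score.
 */
function bestMatch(array $glyph, array $templates): ?string
{
    $best = null;
    $bestScore = -1.0;
    foreach ($templates as $char => $tpl) {
        $h = max(count($glyph), count($tpl));
        $w = max(count($glyph[0] ?? []), count($tpl[0] ?? []));
        if ($h === 0 || $w === 0) {
            continue;
        }
        $agree = 0;
        for ($y = 0; $y < $h; $y++) {
            for ($x = 0; $x < $w; $x++) {
                if (($glyph[$y][$x] ?? 0) === ($tpl[$y][$x] ?? 0)) {
                    $agree++;
                }
            }
        }
        $score = $agree / ($h * $w); // fraction of agreeing pixels
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = (string) $char;
        }
    }
    return $best;
}
```

Cutting on blank columns is the simplest kind of segmentation. A character made of separate strokes, such as a double quote, will be split into two pieces, so shapes like that need their own templates or a merge step.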

Once again, writing such a script is a giant task, but I still think it is in fact possible (but I am not going to prove it to you, because my PHP is not that 1337) :D

 

grtz Tissue

 

EDIT: I was typing this reply at the same time as GingerRobot was replying. We think alike :D


As I said before, it may be easier to look 'under the bonnet' of the game to see if you can intercept the incoming text.

 

This is definitely how I would go about it. Trying to get PHP to interpret the image would be a nightmare; it would take you months to get it bang on. The text is coming in from somewhere and gets echo'd (or output by a similar function) onto the screen, so your time is better spent searching for that point in the code!

 

Good luck! You're gonna need it! ;D


An image analyser could do that but the algorithms for such aren't readily available and those that have them aren't likely to give them away free of charge.

 

I know there is an extension on PECL called Imagick (a PHP wrapper for ImageMagick), but it's not well documented. There are various methods within Imagick that can analyse images, and I think it may be able to help pick out the text on it.

 

Sam



I spoke with someone who had screenshot-analysing software made that extracts text from screenshots.

He told me that the fonts in this game always give a perfect match when he analyses a screenshot.

The font is also single-coloured. So what he does is turn all the pixels in the text colour black (1) and the rest white (0). Then he reads through the grid of pixels (1s and 0s) until he finds text, and analyses it!
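The "read the grid until you find text" step might look like the sketch below. It assumes the screenshot has already been reduced to 1s and 0s. Consecutive rows that contain at least one 1 are grouped into bands, and each band is taken to be one line of text. findTextRows() is a hypothetical name; it is not taken from the software described above.

```php
<?php
/**
 * Hypothetical helper: find horizontal bands of a 0/1 grid that contain text,
 * i.e. runs of consecutive rows with at least one 1.
 * Returns a list of [firstRow, lastRow] pairs.
 */
function findTextRows(array $grid): array
{
    $bands = [];
    $start = null;
    $h = count($grid);
    // Run one step past the last row so a band touching the bottom is closed.
    for ($y = 0; $y <= $h; $y++) {
        $hasInk = $y < $h && in_array(1, $grid[$y], true);
        if ($hasInk && $start === null) {
            $start = $y;
        } elseif (!$hasInk && $start !== null) {
            $bands[] = [$start, $y - 1];
            $start = null;
        }
    }
    return $bands;
}
```

Each band can be cut out with array_slice($grid, $first, $last - $first + 1) and then split into characters. A line may contain no tall letters. In that case the gap between the dot of an i and its stem can be a completely blank row, which splits the line into two bands. Very thin bands may therefore need merging with their neighbour.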

 

So I think it is possible to make. Maybe even for me, but not that fast :D

But it would take so much of my time that I'd rather spend it on other features for my website, or on a lot of other, more important RL stuff.

 

grtz Tissue


As GingerRobot has already said, you'd have to write quite a hefty script to scan the entire image looking for patterns where the font appears and building up a comprehensive database for each picture.

 

When I was helping to moderate a purely browser-based MMORPG, there were CAPTCHA scripts implemented to stop players cheating by refreshing a page at certain intervals. A script to read those images was available, and since the images were only a few pixels wide and high, it didn't take that long. What you're looking at is reading an entire screenshot, and screenshot sizes will vary with people's computer setups. I play most games at 1280x1024 (which is very large), and there are players out there who use larger screens.

 

If you've got loads of players submitting screenshots themselves, you'll be putting such a workload on the server that the scripts will more than likely time out while reading the images.

 

If I had to attempt this I'd get a stand-alone EXE file the users could install to read these images and generate a data file which the users could then upload and PHP would parse that quickly instead. The data files could contain some sort of checksum to prevent the users modifying the information themselves but how long would it take for someone to crack that?
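Here is a sketch of that checksum idea, using an HMAC from PHP's built-in hash functions. The key is a hypothetical shared secret. As the post says, anyone who extracts the key from the stand-alone program can sign their own files, so this only stops casual editing. signData() and verifyData() are hypothetical names.

```php
<?php
/**
 * Hypothetical helper: sign an uploaded data file's contents with HMAC-SHA256.
 * $key is an assumed shared secret between the client program and the server.
 */
function signData(string $data, string $key): string
{
    return hash_hmac('sha256', $data, $key); // 64 lowercase hex characters
}

/**
 * Hypothetical helper: check a submitted signature against the data.
 */
function verifyData(string $data, string $signature, string $key): bool
{
    // hash_equals() compares in constant time, so timing doesn't leak the signature.
    return hash_equals(signData($data, $key), $signature);
}
```

The client would send the data together with signData($data, $key), and the server would reject any upload for which verifyData() returns false.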

 

EDIT: Just noticed that the screenshot is saved as a JPEG. Did you copy and paste it into a paint program of sorts and save it manually, or can the game save it itself? The problem with JPEG is that it's a lossy format, and there are various quality levels to choose from when saving. If a low-quality image is loaded by the scanner, you'll get loads of compression artefacts around the text, and even the text might not be the exact grey that is needed; the scanner would probably fall over at this point and fail.
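If JPEG screenshots have to be accepted, one option is to match the text colour within a tolerance instead of exactly. The sketch assumes colours are 0xRRGGBB integers, such as those imagecolorat() returns on a truecolour image once the alpha bits are masked off. colourClose() and the tolerance value are hypothetical.

```php
<?php
/**
 * Hypothetical helper: true if two 0xRRGGBB colours differ by at most
 * $tolerance in each of the red, green and blue channels.
 */
function colourClose(int $a, int $b, int $tolerance): bool
{
    foreach ([16, 8, 0] as $shift) { // red, green, blue
        $ca = ($a >> $shift) & 0xFF;
        $cb = ($b >> $shift) & 0xFF;
        if (abs($ca - $cb) > $tolerance) {
            return false;
        }
    }
    return true;
}
```

A larger tolerance catches more text pixels whose colour was shifted by compression. It also starts picking up background pixels of a similar shade, so a lossless format such as PNG remains the safer choice.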



It's just converted to JPEG. You can also get screenshots as BMP or PNG.

 

BTW, the one I was referring to uses a stand-alone application that captures the screen and then reads the content, so you might be right. Although I think the patterns are not that difficult, because it's a fixed font.

 

Thanks all for thinking along with me!!!

 

grtz Tissue

 

