
[SOLVED] The evil character that ruins my day "�"


Andarian


Hey everyone

 

I have a web crawler that inserts data into MySQL.

 

I keep coming across annoying web pages like http://news.yahoo.com/s/space/20080915/sc_space/possiblefirstphotoofplanetaroundsunlikestar whose content should be ISO-8859-1 but which are declared as UTF-8.
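
A minimal sketch of one way to handle a page whose declared charset cannot be trusted: check whether the bytes are valid UTF-8 and, only if they are not, convert them. The helper name `to_utf8` is hypothetical, the fallback to ISO-8859-1 is an assumption taken from the example above, and the code needs PHP's mbstring extension.

```php
<?php
// Hypothetical helper: return $html as valid UTF-8.
// Assumption: anything that is not valid UTF-8 is really ISO-8859-1.
function to_utf8(string $html): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html; // already valid UTF-8, leave it untouched
    }
    // Reinterpret every byte as an ISO-8859-1 character and re-encode it.
    return mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9"; // "café" in ISO-8859-1; a lone \xE9 is invalid UTF-8
echo bin2hex(to_utf8($latin1)), "\n"; // 636166c3a9
```

Converting before the INSERT means MySQL only ever sees valid UTF-8, so nothing has to be stripped afterwards. If some pages turn out to be Windows-1252 rather than ISO-8859-1, the source encoding passed to `mb_convert_encoding` would need to change.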

 

That's just one example of how I run into the evil character "�". In Internet Explorer it looks like a square; in Firefox it looks like a square with FF and FD written in it. It seems to be some kind of invalid character.

Try to insert that into MySQL and it will scream...

 

I'm trying to write a function that will strip that character out, but to no avail so far.

 

Any help will be DEEPLY appreciated

Cheers.

You have probably tried this, and it probably doesn't work, but I thought I'd try to help.

Navigate manually to one of those web pages containing the 'evil' character, then COPY and PASTE that character into a str_replace call in your PHP document. Something like:

$my_data = str_replace("�", "", $my_data);
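
A small sketch of the same idea without copy and paste, assuming the character in the data really is U+FFFD (the Unicode replacement character, which is the three bytes EF BF BD in UTF-8). Writing it as byte escapes keeps the result independent of the encoding the PHP file itself is saved in. The sample string is made up for illustration.

```php
<?php
// U+FFFD REPLACEMENT CHARACTER, spelled out as its UTF-8 bytes.
$replacement_char = "\xEF\xBF\xBD";

// Made-up sample input containing one replacement character.
$my_data = "Sun" . $replacement_char . "like star";

// Same call as above, but with the character given as byte escapes.
$my_data = str_replace($replacement_char, "", $my_data);

echo $my_data, "\n"; // Sunlike star
```

This only removes bytes that are exactly EF BF BD. Raw invalid bytes that a browser merely displays as "�" are a different sequence and will not match.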

At first I tried it and thought it didn't work.

 

But it actually does work. It's just that there is more than one character that looks like "�" but is a different character to PHP. As long as I make sure to find all of them, I'm fine.

 

It's funny, the code looks like this:

$search = array('/�/','/�/','/�/','/루/','/에/','/한/','/알/','/로/','/달/','/라/','/지/','/는/','/나/','/의/','/모/','/습/');

$html = preg_replace($search,'',$html);

 

Notice that the first 3 characters look the same but are actually different characters to PHP.
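
One way to see why look-alike characters differ is to dump their bytes. The sketch below is an illustration only: the three samples are hypothetical stand-ins, not the actual bytes from the crawled pages, and `mb_check_encoding` needs the mbstring extension.

```php
<?php
// Hypothetical samples; each may render as a box or "�" depending on the viewer.
$samples = [
    "\xEF\xBF\xBD", // U+FFFD encoded as UTF-8
    "\x92",         // lone byte, invalid as UTF-8 (a curly apostrophe in Windows-1252)
    "\xC2\x92",     // U+0092, a C1 control character, encoded as UTF-8
];

foreach ($samples as $s) {
    // bin2hex shows the raw bytes PHP actually compares against.
    $valid = mb_check_encoding($s, 'UTF-8') ? 'yes' : 'no';
    echo bin2hex($s), ' valid UTF-8: ', $valid, "\n";
}
```

Running the same `bin2hex` call on each suspicious character from a crawled page shows which byte sequences need to go into the search list.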

 

