Hey everyone

 

I have a web crawler that inserts data into MySQL.

 

I keep coming across annoying web pages like http://news.yahoo.com/s/space/20080915/sc_space/possiblefirstphotoofplanetaroundsunlikestar that are declared as UTF-8 but should really be ISO-8859-1.

 

That's just one example of where I run into the evil character "�". In Internet Explorer it looks like a square; in Firefox it looks like a square with FF, FD written in it. It seems to be some kind of invalid character.

Try to insert that into MySQL and it screams...

 

I'm trying to write a function that will strip that character out, but to no avail so far.

 

Any help will be DEEPLY appreciated

Cheers.
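
One possible way to deal with pages like the Yahoo example above, which declare UTF-8 but are really ISO-8859-1, is to validate the bytes and re-encode them before they reach MySQL. This is only a minimal sketch: the function name to_utf8 is made up for illustration, it assumes the mbstring extension is loaded, and it assumes that anything failing UTF-8 validation is ISO-8859-1, which will not hold for every page.

```php
<?php
// Hypothetical helper (not a built-in): return valid UTF-8 for a fetched page body.
// Assumption: content that fails UTF-8 validation is really ISO-8859-1.
function to_utf8(string $html): string
{
    // Already valid UTF-8: leave it untouched.
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }

    // Otherwise reinterpret the raw bytes as ISO-8859-1 and re-encode as UTF-8.
    return mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1');
}
```

Calling it on the raw page body before the insert, for example $html = to_utf8($html);, converts the characters instead of throwing them away, so accented letters survive.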

You have probably tried this already, and it probably doesn't work, but I thought I'd try to help.

Navigate manually to one of the web pages containing the 'evil' character, then copy and paste that character into a str_replace call in your PHP document. Something like:

$my_data = str_replace("�", "", $my_data);

At first I tried it and thought it didn't work.

 

But it actually does work. It's just that there is more than one character that looks like � but is a different character to PHP. As long as I make sure to find all of them, I'm fine.

 

It's funny, the code looks like this:

$search = array('/�/','/�/','/�/','/루/','/에/','/한/','/알/','/로/','/달/','/라/','/지/','/는/','/나/','/의/','/모/','/습/');

$html = preg_replace($search,'',$html);

 

Notice that the first three characters look the same but are actually different characters to PHP.
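
Listing every look-alike character by hand is easy to get wrong, so a more general approach might be to drop every byte sequence that is not valid UTF-8, plus any literal replacement character U+FFFD (which would explain the FF, FD that Firefox shows in the box). A rough sketch, assuming the mbstring extension is available; strip_invalid_utf8 is a hypothetical name, not a built-in:

```php
<?php
// Hypothetical helper (not a built-in): remove bytes that are not valid UTF-8,
// then remove any U+FFFD replacement characters already present in the text.
function strip_invalid_utf8(string $text): string
{
    $previous = mb_substitute_character();  // remember the current setting
    mb_substitute_character('none');        // drop invalid bytes instead of writing '?'

    // Converting UTF-8 to UTF-8 passes valid sequences through and discards the rest.
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

    mb_substitute_character($previous);     // restore the earlier setting

    // "\xEF\xBF\xBD" is the UTF-8 encoding of U+FFFD.
    return str_replace("\xEF\xBF\xBD", '', $clean);
}
```

Note that this throws the offending characters away rather than repairing them, so it fits best as a last step after any encoding conversion has been attempted.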

 
