Andarian Posted September 16, 2008
Hey everyone,
I have a web crawler that inserts data into MySQL. I keep coming across annoying web pages like http://news.yahoo.com/s/space/20080915/sc_space/possiblefirstphotoofplanetaroundsunlikestar that should be ISO-8859-1 but are declared as UTF-8. That's just one example of how I run into the evil character "�". In Internet Explorer it looks like a square; in Firefox it looks like a square with FF,FD written in it. It seems to be some kind of invalid character. Try to insert that into MySQL and it will scream...
I'm trying to write a function that strips that character out, but to no avail so far. Any help will be DEEPLY appreciated.
Cheers.
jamesbrauman Posted September 16, 2008
You have probably tried this already, and it probably doesn't work, but I thought I'd try to help. Navigate manually to one of the web pages containing the 'evil' character, then copy and paste that character into a str_replace call in your PHP document. Something like:
$my_data = str_replace("�", "", $my_data);
Andarian (Author) Posted September 16, 2008
At first I tried it and thought it didn't work. But it actually does work; it's just that there is more than one character that looks like � but is actually a different character to PHP. As long as I make sure to find all of them, I'm fine. It's funny, the code looks like this:
$search = array('/�/','/�/','/�/','/루/','/에/','/한/','/알/','/로/','/달/','/라/','/지/','/는/','/나/','/의/','/모/','/습/');
$html = preg_replace($search,'',$html);
Notice that the first three characters look the same but are actually different characters to PHP.
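One plausible reason for the look-alike characters: a browser or editor draws � for any byte sequence it cannot decode, so the same glyph can stand for different underlying bytes. That is an assumption about these pages, not something confirmed in the thread. A minimal sketch of how to inspect the bytes, using made-up sample strings and a hypothetical helper named strip_replacement_char:

```php
<?php
// Illustrative only: $pasted and $raw are made-up sample strings.
$pasted = "\xEF\xBF\xBD";   // a literal U+FFFD as stored in a UTF-8 source file
$raw    = "\xE9";           // a lone ISO-8859-1 byte ("é"), which is not valid UTF-8

// bin2hex() exposes the bytes that a browser hides behind the same glyph.
$pastedHex = bin2hex($pasted);
$rawHex    = bin2hex($raw);

// Hypothetical helper: remove literal U+FFFD characters by their byte sequence,
// which avoids pasting an ambiguous character into the source file.
function strip_replacement_char(string $text): string
{
    return str_replace("\xEF\xBF\xBD", '', $text);
}

$stripped = strip_replacement_char("a\xEF\xBF\xBDb");
```

Writing the bytes as escape sequences keeps the source file unambiguous, but it only removes characters that are already U+FFFD; stray bytes from another encoding need a conversion step instead.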
Mchl Posted September 16, 2008
Perhaps use multibyte-safe string functions?
Andarian (Author) Posted September 16, 2008
Now that sounds like the right direction... I've tried those MB functions for different things in the past, but how can they help me here, and which one should I use?
Mchl Posted September 16, 2008
mb_convert_encoding() maybe? Or mb_substitute_character()
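A minimal sketch of how those two functions might be applied to this problem, assuming the mbstring extension is loaded. The helper names to_valid_utf8 and strip_invalid_utf8, the ISO-8859-1 fallback, and the sample strings are illustrative assumptions, not code from the thread:

```php
<?php
// Hypothetical helper: re-encode text that claims to be UTF-8 but is not.
// The ISO-8859-1 fallback is an assumption based on the page in the first post.
function to_valid_utf8(string $html, string $fallback = 'ISO-8859-1'): string
{
    // Bytes that already form valid UTF-8 are returned unchanged.
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }
    // Otherwise treat the bytes as the fallback encoding and convert them.
    return mb_convert_encoding($html, 'UTF-8', $fallback);
}

// Hypothetical helper: drop invalid bytes instead of converting them.
function strip_invalid_utf8(string $text): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character('none');         // invalid bytes are removed, not replaced
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);      // restore the setting
    return $clean;
}

$latin1    = "caf\xE9";                      // "café" in ISO-8859-1
$converted = to_valid_utf8($latin1);         // é becomes the two bytes C3 A9

$broken   = "a\xFFb";                        // 0xFF can never occur in UTF-8
$stripped = strip_invalid_utf8($broken);
```

Converting keeps the accented characters, while stripping silently loses them, so the conversion route is probably the better first choice when the real encoding can be guessed.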
Andarian (Author) Posted September 16, 2008
Gonna try that, thanks!
effigy Posted September 16, 2008
What encoding are the MySQL tables?
Andarian (Author) Posted September 16, 2008
UTF-8
effigy Posted September 16, 2008
The data is in UTF-8 and so is MySQL? Are you specifically telling MySQL that you're using UTF-8?
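For reference, one way to tell MySQL explicitly which encoding the connection uses is mysqli's set_charset(). This is only a sketch: the thread does not say which database extension the crawler uses, the helper name connect_utf8 and its parameters are placeholders, and 'utf8mb4' assumes a reasonably recent MySQL server (older servers would use 'utf8').

```php
<?php
// Hypothetical helper; host, credentials and database name are placeholders.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }
    // Tell the server which encoding the client sends and expects back.
    if (!$link->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }
    return $link;
}
```

Setting the charset on the connection matters even when the tables are UTF-8, because the server otherwise interprets incoming bytes using its default connection encoding.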
Andarian (Author) Posted September 16, 2008
effigy, yeah, I am.
But hey, mb_convert_encoding worked perfectly. I still need to check how much processing time it added to the program, though.