Andarian Posted September 16, 2008
Hey everyone. I have a web crawler that inserts data into MySQL. I keep coming across annoying web pages like http://news.yahoo.com/s/space/20080915/sc_space/possiblefirstphotoofplanetaroundsunlikestar that should be ISO-8859-1 but are declared as UTF-8. That's just one example of how I run into the evil character "�": in Internet Explorer it looks like a square, and in Firefox it looks like a square with FF,FD written in it. It seems to be some kind of invalid character. Try to insert that into MySQL and it'll scream... I'm trying to write a function that strips that character out, but to no avail so far. Any help will be DEEPLY appreciated. Cheers.
jamesbrauman Posted September 16, 2008
You have probably tried this, and it probably doesn't work, but I thought I'd try to help. Navigate manually to one of the web pages containing the 'evil' character, then copy and paste that character into a str_replace call in your PHP document. Something like: $my_data = str_replace("�", "", $my_data);
Andarian Posted September 16, 2008
At first I tried it and thought it didn't work. But it actually does work; it's just that there is more than one character that looks like � but is a different character to PHP. As long as I make sure to find all of them, I'm fine. It's funny, the code looks like this:
$search = array('/�/','/�/','/�/','/루/','/에/','/한/','/알/','/로/','/달/','/라/','/지/','/는/','/나/','/의/','/모/','/습/');
$html = preg_replace($search,'',$html);
Notice that the first three characters look the same but are actually different characters to PHP.
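A minimal sketch of why several different characters can all render as "�": browsers and editors draw the same placeholder for any byte sequence that is not valid UTF-8, so lookalike glyphs can hide different bytes. The sample bytes and the helper name describe_bytes below are assumptions made up for illustration; they are not taken from the crawled pages.

```php
<?php
// Hypothetical helper: show the raw bytes of a string and whether they are valid UTF-8.
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'yes' : 'no';
    return bin2hex($s) . ' valid_utf8=' . $valid;
}

// Made-up samples that a viewer may all display as the same placeholder glyph.
$samples = [
    "\xEF\xBF\xBD", // U+FFFD, the genuine replacement character (valid UTF-8)
    "\xE9",         // a lone ISO-8859-1 byte, not valid as UTF-8
    "\x92",         // a lone Windows-1252 byte, not valid as UTF-8
];

foreach ($samples as $s) {
    echo describe_bytes($s), "\n";
}
```

Dumping the bytes this way makes it possible to see which sequences are actually in the page before deciding what to strip or convert.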
Mchl Posted September 16, 2008
Perhaps use multibyte-safe string functions?
Andarian Posted September 16, 2008
Now that sounds like the right direction... I've tried the mb_* functions for different things in the past, but how can they help me here? And which one of them?
Mchl Posted September 16, 2008
mb_convert_encoding() maybe? Or mb_substitute_character()
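A rough sketch of how those two functions might be applied here, not necessarily what was used in the end. The helper names to_utf8 and strip_invalid_utf8 are hypothetical, and ISO-8859-1 as the fallback encoding is an assumption based on the pages described at the start of the thread; pages mislabeled this way may turn out to be in another single-byte encoding.

```php
<?php
// Hypothetical helper: return valid UTF-8 for a page whose declared charset may be wrong.
function to_utf8(string $html, string $fallback = 'ISO-8859-1'): string
{
    // Bytes that already form valid UTF-8 are returned unchanged.
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }
    // Otherwise assume the page is really in the fallback encoding and convert it.
    return mb_convert_encoding($html, 'UTF-8', $fallback);
}

// Hypothetical alternative: treat the text as UTF-8 and drop invalid byte sequences.
function strip_invalid_utf8(string $html): string
{
    $previous = mb_substitute_character(); // remember the current setting
    mb_substitute_character('none');       // drop invalid sequences instead of substituting
    $clean = mb_convert_encoding($html, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);    // restore the earlier setting
    return $clean;
}

echo to_utf8("caf\xE9"), "\n";
echo strip_invalid_utf8("caf\xE9"), "\n";
```

Converting keeps the accented characters, while stripping throws them away, so the first variant is usually the gentler choice when the real encoding can be guessed.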
Andarian Posted September 16, 2008
Gonna try that, thanks!
effigy Posted September 16, 2008
What encoding do the MySQL tables use?
Andarian Posted September 16, 2008
UTF-8
effigy Posted September 16, 2008
The data is in UTF-8 and so is MySQL? Are you specifically telling MySQL that you're using UTF-8?
Andarian Posted September 16, 2008
effigy, yeah, I am. But hey, mb_convert_encoding worked perfectly. I still need to check how much processing time it adds to the program, though.
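One way to get a rough idea of that cost is to time the call on a page-sized string. This is only a sketch: the input below is synthetic, made-up data rather than a real crawled page, and the result will vary with the machine and the PHP version.

```php
<?php
// Synthetic ISO-8859-1 input (made-up data): 13 bytes repeated 100000 times.
$page = str_repeat("caf\xE9 au lait ", 100000);

$start = microtime(true);
$converted = mb_convert_encoding($page, 'UTF-8', 'ISO-8859-1');
$elapsed = microtime(true) - $start;

printf("converted %d bytes in %.4f s\n", strlen($page), $elapsed);
```

Running the same measurement over a batch of real pages from the crawler would give a more representative figure than a single synthetic string.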
This topic is now archived and is closed to further replies.