coreysnyder04 Posted December 8, 2009
I'm trying to write a screen scraper. When I pull out the lines of the HTML file that I'm interested in and put them in a variable, I end up with a few extra weird characters in my string that I wasn't expecting, and I'm having a hell of a time stripping them out. It's some type of encoding issue, I think, because the character shows up differently in IE versus FF (where it shows up as a black diamond with a question mark in it). Anyone have any ideas?
Link to comment: https://forums.phpfreaks.com/topic/184395-help-with-page-encoding-issue-and-weird-characters/
cags Posted December 8, 2009
Probably multibyte characters from something like the UTF-8 character set. Many of the string functions in PHP aren't multibyte compatible. Without more details it will be difficult to actually offer any solutions.
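To illustrate what "not multibyte compatible" means in practice, here is a minimal sketch. It assumes the mbstring extension is loaded, and the sample string is made up for the example rather than taken from the scraped page.

```php
<?php
// "café" with the é written as explicit UTF-8 bytes (0xC3 0xA9),
// so the encoding of this source file doesn't matter.
$s = "caf\xC3\xA9";

$byteLength = strlen($s);             // counts bytes
$charLength = mb_strlen($s, 'UTF-8'); // counts characters

// substr() cuts by byte and can split a multibyte character in half;
// mb_substr() cuts by character.
$byteCut = substr($s, 0, 4);
$charCut = mb_substr($s, 0, 4, 'UTF-8');

echo $byteLength, ' bytes, ', $charLength, " characters\n";
echo bin2hex($byteCut), ' vs ', bin2hex($charCut), "\n";
```

The byte-based cut ends in a lone 0xC3, which is not valid UTF-8 on its own. A browser reading the page as UTF-8 typically shows such a stray byte as a replacement character.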
oni-kun Posted December 8, 2009
You can use trim or rtrim to strip out any NULL characters (most likely the characters you're running into) at the beginning or end of your string, such as if there's padding involved. Use \0 for null if you want to use str_replace. It depends where exactly they are, but you should also convert the result to UTF-8 with utf8_encode so the stream won't get messed up.
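A rough sketch of the two approaches mentioned above, using made-up sample strings (whether the stray characters in the scraped page are actually NUL bytes is not established in this thread):

```php
<?php
// Hypothetical input with NUL padding at both ends.
$padded = "\0\0Score: 6\0";

// trim() with no second argument strips whitespace and NUL bytes
// from the beginning and end of the string.
$trimmed = trim($padded);

// NUL bytes in the middle of a string are untouched by trim();
// str_replace() removes them wherever they occur.
// Note the double quotes: '\0' in single quotes is a literal backslash and zero.
$embedded = "Sco\0re: 6";
$stripped = str_replace("\0", '', $embedded);

var_dump($trimmed === 'Score: 6', $stripped === 'Score: 6');
```

As for utf8_encode: it treats its input as ISO-8859-1, so it only helps when that is really the source encoding, and it is deprecated as of PHP 8.2. `mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')` does the same conversion with the source encoding stated explicitly.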
coreysnyder04 Posted December 9, 2009
Here, check out this page: http://www.centralohiohockey.com/screenScraper.php Look at the 3rd line, after the 6. You'll see a weird char there. If you view the source in IE for that page (listed on line 1), it will appear as a " ". In the PHP source I'm using html_entity_decode() on the HTML lines that I pull from the page.
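One possible explanation, not confirmed anywhere in this thread, is that html_entity_decode() is turning an entity into a single-byte character that isn't valid UTF-8. The sketch below assumes the entity is `&nbsp;` and that the decoding charset is ISO-8859-1 (the function's default before PHP 5.4); the sample line is hypothetical.

```php
<?php
// Hypothetical scraped line; the actual entity on the page is an assumption.
$line = '6&nbsp;goals';

// Decoding to ISO-8859-1 yields the single byte 0xA0 for the non-breaking space.
$latin1 = html_entity_decode($line, ENT_QUOTES, 'ISO-8859-1');

// Decoding to UTF-8 yields the two-byte sequence 0xC2 0xA0 instead.
$utf8 = html_entity_decode($line, ENT_QUOTES, 'UTF-8');

echo bin2hex($latin1), "\n"; // 36a0676f616c73
echo bin2hex($utf8), "\n";   // 36c2a0676f616c73

// A lone 0xA0 is not valid UTF-8, which would explain a replacement
// character in a browser that reads the page as UTF-8.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// If a plain space is good enough, swap the non-breaking space out.
$plain = str_replace("\xC2\xA0", ' ', $utf8);
```

Passing the charset explicitly keeps the result independent of the PHP version's default.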
cags Posted December 9, 2009
Try putting... header("Content-Type: text/html; charset=UTF-8"); ... at the top of the page.
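A minimal sketch of where that call goes. Declaring a charset only tells the browser how to read the bytes, so the example also converts the text to UTF-8 first. The Latin-1 source encoding and the sample string are assumptions made for illustration, and mbstring is assumed to be available.

```php
<?php
// header() must run before any output is sent (including whitespace
// before the opening PHP tag), or PHP cannot modify the headers.
header('Content-Type: text/html; charset=UTF-8');

// Hypothetical scraped text, assumed to be ISO-8859-1 encoded;
// 0xA0 is a non-breaking space in that encoding.
$scraped = "6\xA0goals";

// Make the bytes match the declared charset.
$utf8 = mb_convert_encoding($scraped, 'UTF-8', 'ISO-8859-1');

// Escape for HTML output, telling htmlspecialchars() the encoding as well.
$safe = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
```

If the scraped bytes are already UTF-8, the conversion step should be skipped, since converting twice produces garbled output.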
coreysnyder04 Posted December 10, 2009
Fixed. Thanks, guys.
cags Posted December 10, 2009
Excellent. Don't forget to click 'Topic Solved' (bottom left corner of threads you start).