
Help with page encoding issue and weird characters


coreysnyder04


I'm trying to write a screen scraper. When I pull out the lines of the HTML file that I'm interested in and put them in a variable, I end up with a few extra weird characters in my string that I wasn't expecting, and I'm having a hell of a time stripping them out. I think it's some type of encoding issue, because the character shows up differently in IE versus FF (where it shows up as a black diamond with a question mark in it). Anyone have any ideas?

Probably multibyte characters from an encoding such as UTF-8. Many of the string functions in PHP aren't multibyte-safe. Without more details it will be difficult to offer any actual solutions.
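To illustrate the point about multibyte safety, here is a minimal sketch. It assumes the mbstring extension is loaded, and the sample word is made up for the example rather than taken from the scraped page.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9). It is written with escapes so the
// example does not depend on how this file itself is saved.
$word = "caf\xC3\xA9";

// strlen() counts bytes; mb_strlen() counts characters.
$byteLength = strlen($word);
$charLength = mb_strlen($word, 'UTF-8');

// substr() works on bytes and can cut a character in half;
// mb_substr() works on whole characters.
$brokenCut = substr($word, 0, 4);
$cleanCut  = mb_substr($word, 0, 4, 'UTF-8');

// mb_check_encoding() reports whether a string is valid in the given encoding.
$brokenIsValid = mb_check_encoding($brokenCut, 'UTF-8');
$cleanIsValid  = mb_check_encoding($cleanCut, 'UTF-8');

var_dump($byteLength, $charLength, $brokenIsValid, $cleanIsValid);
```

A half-cut character like the one in `$brokenCut` is the kind of thing a browser tends to render as a replacement glyph.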

You can use trim() or rtrim() to strip NULL characters (most likely the characters you're running into) from the beginning or end of your string, for example if there's padding involved. Use "\0" for null if you want to use str_replace() instead. It depends on where exactly they are, but you should also convert the result to UTF-8 with utf8_encode() (which assumes the input is ISO-8859-1) so the output doesn't get mangled.
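Roughly, the suggestion above could look like the sketch below. The input string is hypothetical, and it assumes the scraped text is ISO-8859-1, which may not be true for the page in question. It uses mb_convert_encoding() (mbstring extension) for the conversion because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical scraped line: padded with NUL bytes and containing an
// ISO-8859-1 "é" (the single byte 0xE9).
$raw = "\0\0caf\xE9 \0";

// trim() strips whitespace and NUL bytes from both ends by default.
$trimmed = trim($raw);

// str_replace() removes NUL bytes anywhere in the string, not just the ends.
$noNulls = str_replace("\0", '', $raw);

// Convert from the assumed source encoding (ISO-8859-1) to UTF-8.
$utf8 = mb_convert_encoding($trimmed, 'UTF-8', 'ISO-8859-1');

// bin2hex() shows the actual bytes, which helps when the character
// itself does not display properly.
echo bin2hex($utf8), "\n";
```

Dumping the bytes with bin2hex() before and after each step is a cheap way to find out what the stray character really is.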

Here check out this page:

http://www.centralohiohockey.com/screenScraper.php

 

Look at the 3rd line, after the 6. You'll see a weird char there. If you view the source in IE for that page (listed on line 1), it appears as "&nbsp;". In the PHP source I'm running html_entity_decode() on the HTML lines that I pull from the page.
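One possible explanation, assuming the stray character is a decoded non-breaking space entity: before PHP 5.4, html_entity_decode() defaulted to ISO-8859-1, so the entity becomes a lone 0xA0 byte, which is not valid UTF-8 on its own and can show up as a replacement glyph if the page is served as UTF-8. The sketch below passes the charset explicitly; the markup fragment is hypothetical.

```php
<?php
// Hypothetical fragment of scraped markup: a score followed by a
// non-breaking space entity.
$html = '6&nbsp;Team A';

// Decoding to ISO-8859-1 yields a single 0xA0 byte for the entity.
$latin1 = html_entity_decode($html, ENT_QUOTES, 'ISO-8859-1');

// Decoding to UTF-8 yields the two-byte sequence 0xC2 0xA0 instead.
$decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// Replace the UTF-8 non-breaking space with a plain space.
$clean = str_replace("\xC2\xA0", ' ', $decoded);

echo bin2hex($latin1), "\n";
echo $clean, "\n";
```

If the character does turn out to be something else, the bin2hex() output of the decoded string should show which bytes to target.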

 


 

 

