
Help with page encoding issue and weird characters


coreysnyder04


I'm trying to write a screen scraper. When I pull the lines I'm interested in out of the HTML file and put them in a variable, I end up with a few weird extra characters in the string that I wasn't expecting, and I'm having a hell of a time stripping them out. I think it's some kind of encoding issue, because the character shows up differently in IE than in FF (where it appears as a black diamond with a question mark in it). Anyone have any ideas?


Probably multibyte characters, for example from a page encoded as UTF-8. Many of PHP's string functions aren't multibyte-aware. Without more details it will be difficult to offer an actual solution.
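To illustrate the multibyte point, here is a minimal sketch. The sample string is made up (a "6" followed by a UTF-8 non-breaking space), and it assumes the mbstring extension is loaded. It only shows how byte-oriented functions can disagree with multibyte-aware ones; it is not a diagnosis of the actual page.

```php
<?php
// Hypothetical sample: "6" followed by a UTF-8 encoded non-breaking space (bytes C2 A0).
$s = "6\xC2\xA0";

$byteLength  = strlen($s);                     // counts bytes
$charLength  = mb_strlen($s, 'UTF-8');         // counts characters (mbstring extension)
$isValidUtf8 = mb_check_encoding($s, 'UTF-8'); // whole string is well-formed UTF-8

// substr() works on bytes, so cutting after 2 bytes splits the two-byte
// character and leaves an invalid sequence behind.
$broken        = substr($s, 0, 2);
$brokenIsValid = mb_check_encoding($broken, 'UTF-8');

// Dumping the raw bytes is often the quickest way to see what is really there.
echo bin2hex($s), "\n";
var_dump($byteLength, $charLength, $isValidUtf8, $brokenIsValid);
```

Running bin2hex() on the scraped string itself would show which bytes the mystery character actually consists of.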


You can use trim or rtrim to strip any NULL characters (most likely what you're running into) from the beginning or end of your string, for example if there's padding involved. \0 is the NULL character if you want to use str_replace instead. It depends on where exactly they are. You should also convert the result to UTF-8 with utf8_encode so the stream doesn't get mangled.
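A rough sketch of those suggestions, using invented input values. Whether the stray characters really are NULL bytes is an assumption here. Note that utf8_encode only converts from ISO-8859-1 and is deprecated as of PHP 8.2, so the sketch uses mb_convert_encoding (mbstring extension) for the conversion step instead.

```php
<?php
// Hypothetical scraped value padded with NUL bytes at both ends, plus one in the middle.
$raw = "\0\0Score:\0 6\0";

// trim()'s default character list already includes "\0", but it only touches the ends.
$trimmed = trim($raw);

// str_replace() removes NUL bytes wherever they occur in the string.
$cleaned = str_replace("\0", '', $raw);

// Hypothetical Latin-1 input: "6" followed by byte A0 (non-breaking space in ISO-8859-1).
$latin1 = "6\xA0";

// Convert ISO-8859-1 bytes to UTF-8; this assumes the source really is Latin-1.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($trimmed), "\n";
echo $cleaned, "\n";
echo bin2hex($utf8), "\n";
```

If the conversion is applied to text that is already UTF-8, it will double-encode it, so it is worth checking the source encoding first.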


Here, check out this page:

http://www.centralohiohockey.com/screenScraper.php

 

Look at the 3rd line, after the 6. You'll see a weird character there. If you view the source in IE for that page (listed on line 1), it will appear as a " ". In the PHP source I'm calling html_entity_decode() on the HTML lines that I pull from the page.
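For reference, one possible way to handle this, assuming the stray character is a non-breaking space that html_entity_decode() produced from an "&nbsp;" entity and that the output page is served as UTF-8. Both are guesses, and the sample line below is invented.

```php
<?php
// Hypothetical scraped line; the "&nbsp;" entity is an assumption about the source page.
$line = '6&nbsp;Blue Jackets';

// Pass the target charset explicitly so the decoded bytes match a UTF-8 page.
$decoded = html_entity_decode($line, ENT_QUOTES, 'UTF-8');

// In UTF-8 a non-breaking space (U+00A0) is the two bytes C2 A0.
// Swap it for an ordinary space so later trimming and comparisons behave.
$clean = str_replace("\xC2\xA0", ' ', $decoded);

echo bin2hex($decoded), "\n";
echo $clean, "\n";
```

If the decoded bytes and the charset the browser assumes don't agree, a lone A0 byte on a UTF-8 page is the kind of thing that renders as the black diamond with a question mark.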

 


 

 
