jRiest Posted January 19, 2010

Hello,

I am relatively new to PHP, and while I have enjoyed it so far, I seem to have hit a wall and need help. I am making a small web application for a personal site, and in it I am trying to parse a web page to extract some data that will be stored in a MySQL database. To do this, I am using this basic setup:

$site = "http://www.usatoday.com/sports/gaming/sheridan.htm";
$content = file_get_contents($site);
$content = str_replace("½", ".5", $content, $count);
echo 'Replacements: ' . $count . '<br />';
$doc = new DOMDocument();
$doc->loadHTML($content);

However, the website uses the ½ (one-half) character. I am trying to replace every instance of that character with ".5" so that I can store it as a decimal in the database, but str_replace() doesn't seem to be working. I'm fairly sure it has to do with encoding, because when I print the textContent of the DOMNode that contains that character, it shows up as Â½. However, if I change my browser's text encoding to UTF-8, it prints correctly.

So, any suggestions on how I can replace all instances of the ½ character with .5?

Thanks in advance!
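One way to test the encoding hypothesis is to look at the raw bytes rather than at what the browser renders. The same ½ can reach str_replace() as the UTF-8 bytes C2 BD, as the single ISO-8859-1 byte BD, or as an HTML character reference such as &frac12; or &#189;, and the '½' typed into the script is itself stored in whatever encoding the .php file was saved in. The function below, half_forms(), is a hypothetical diagnostic (not something posted in the thread) that counts the multi-byte and entity spellings; a lone BD byte is deliberately not counted because in UTF-8 text it can also be the tail of other characters.

```php
<?php
// Hypothetical diagnostic (not from the thread): count how the one-half sign
// is spelled in a raw string. Needles use escapes so the result does not
// depend on the encoding this .php file is saved in.
function half_forms(string $raw): array
{
    // Hex character references may use upper- or lower-case digits (&#xBD; / &#xbd;).
    $lower = strtolower($raw);

    return [
        'utf8_bytes'     => substr_count($raw, "\xC2\xBD"), // U+00BD encoded as UTF-8
        'named_entity'   => substr_count($raw, '&frac12;'),
        'numeric_entity' => substr_count($raw, '&#189;') + substr_count($lower, '&#xbd;'),
    ];
}

// How the literal '½' is stored in *this* script: "c2bd" if the file is saved
// as UTF-8, "bd" if it is saved as ISO-8859-1 / Windows-1252.
echo bin2hex('½'), "\n";

// Against the real page this would be:
//   print_r(half_forms(file_get_contents("http://www.usatoday.com/sports/gaming/sheridan.htm")));
print_r(half_forms("3&frac12; and 7\xC2\xBD"));
```

If the counts and the bin2hex() output disagree about which byte sequence is in play, str_replace() has nothing to match, whatever the browser shows.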
oni-kun Posted January 19, 2010

Try using utf8_encode() with that function:

$site = "http://www.usatoday.com/sports/gaming/sheridan.htm";
$content = utf8_encode(file_get_contents($site));
$replacement_char = utf8_encode('½');
$content = str_replace($replacement_char, ".5", $content, $count);
echo 'Replacements: ' . $count . '<br />';
$doc = new DOMDocument();
$doc->loadHTML($content);

This works fine for me, and it removed the 'Â.5' problem.
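For context on what utf8_encode() does: it converts ISO-8859-1 to UTF-8, treating every input byte as one ISO-8859-1 character. If file_get_contents() already returned UTF-8, the two bytes of ½ (C2 BD) are each encoded again, giving four bytes (C3 82 C2 BD), which a browser reading the page as Windows-1252 shows as Ã‚Â½. Because the reply above runs both the page and the needle through the same conversion, it cannot create a match that was not already present in the unconverted strings. The helper latin1_to_utf8() below is a hypothetical byte-level sketch of that conversion, written out so the arithmetic is visible; it is not PHP's own implementation. Note that utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, with mb_convert_encoding() as the usual replacement.

```php
<?php
// Hypothetical helper mirroring what utf8_encode() does: treat every input
// byte as an ISO-8859-1 character and emit its UTF-8 encoding.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                  // ASCII bytes are unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))     // lead byte (C2 or C3)
                  . chr(0x80 | ($b & 0x3F));  // continuation byte
        }
    }
    return $out;
}

$latin1Half = "\xBD";      // ½ in ISO-8859-1
$utf8Half   = "\xC2\xBD";  // ½ in UTF-8

echo bin2hex(latin1_to_utf8($latin1Half)), "\n"; // c2bd: correct conversion
echo bin2hex(latin1_to_utf8($utf8Half)), "\n";   // c382c2bd: double-encoded
```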
jRiest Posted January 19, 2010

That didn't seem to work for me. Any idea what I might be doing wrong? When I copied that code, it now prints out "½" for the "½" character.
crabfinger Posted January 19, 2010

Obviously he should have said utf8_decode(): http://php.net/manual/en/function.utf8-decode.php
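utf8_decode() goes the other direction, UTF-8 to ISO-8859-1, so after it runs ½ is the single byte BD. The needle given to str_replace() then has to be that single byte too; a '½' typed into a script saved as UTF-8 is still the two bytes C2 BD and will not match. The minimal sketch below uses made-up sample strings written with escapes, and avoids calling the now-deprecated function, to show that the needle must be in the same encoding as the text it searches.

```php
<?php
// Made-up sample text, written with escape sequences so the file's own encoding is irrelevant.
$utf8Text   = "Total: 3\xC2\xBD"; // ½ as UTF-8 (two bytes)
$latin1Text = "Total: 3\xBD";     // the same text after a UTF-8 to ISO-8859-1 conversion such as utf8_decode()

// UTF-8 needle against UTF-8 text: matches.
$fromUtf8 = str_replace("\xC2\xBD", '.5', $utf8Text, $utf8OnUtf8);
// UTF-8 needle against ISO-8859-1 text: no match, text unchanged.
str_replace("\xC2\xBD", '.5', $latin1Text, $utf8OnLatin1);
// Single-byte ISO-8859-1 needle against ISO-8859-1 text: matches.
$fromLatin1 = str_replace("\xBD", '.5', $latin1Text, $latin1OnLatin1);

echo "$utf8OnUtf8 $utf8OnLatin1 $latin1OnLatin1\n"; // 1 0 1
echo $fromUtf8, ' | ', $fromLatin1, "\n";           // Total: 3.5 | Total: 3.5
```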
jRiest Posted January 19, 2010

That didn't seem to work for me either. To check whether it was working, I am using this:

$site = "http://www.usatoday.com/sports/gaming/sheridan.htm";
$content = utf8_decode(file_get_contents($site));
$replacement_char = utf8_encode('½');
$content = str_replace($replacement_char, ".5", $content, $count);
echo 'Replacements: ' . $count . '<br />';

I also tried this, and it doesn't work either:

$site = "http://www.usatoday.com/sports/gaming/sheridan.htm";
$content = utf8_decode(file_get_contents($site));
$content = str_replace('½', ".5", $content, $count);
echo 'Replacements: ' . $count . '<br />';

It never makes any replacements.
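One possibility that would explain zero replacements under every conversion tried in this thread, offered as an assumption rather than something confirmed here: the raw HTML may not contain the ½ character at all, but a character reference such as &frac12; or &#189;. DOMDocument decodes references into the real character, so textContent would show ½ even though str_replace() on the unparsed HTML never finds it; viewing the page source would confirm which spelling is used. The hypothetical helper replace_one_half() below replaces the common spellings in the raw HTML before parsing. It targets only these few needles rather than running html_entity_decode() over the whole page, because decoding every entity would also turn &lt; into < and could change the markup that loadHTML() sees.

```php
<?php
/**
 * Hypothetical helper (not from the thread): replace common spellings of ½ in
 * raw HTML with ".5". Assumes the HTML is UTF-8 or plain ASCII with entities;
 * the entity list is not exhaustive (zero-padded references such as &#0189;
 * are not handled, for example).
 */
function replace_one_half(string $html, &$count = null): string
{
    $needles = [
        "\u{00BD}", // the character itself; this escape (PHP 7+) always yields UTF-8 bytes C2 BD
        '&frac12;', // named character reference
        '&#189;',   // decimal numeric reference
        '&#xBD;',   // hexadecimal numeric reference, both digit cases
        '&#xbd;',
    ];

    return str_replace($needles, '.5', $html, $count);
}

// Made-up sample mixing the different spellings.
$html = '<td>-3&frac12;</td><td>201&#189;</td><td>+7' . "\u{00BD}" . '</td>';
echo replace_one_half($html, $n), "\n"; // <td>-3.5</td><td>201.5</td><td>+7.5</td>
echo 'Replacements: ', $n, "\n";        // Replacements: 3
```

In the thread's setup this would slot in as $content = replace_one_half(file_get_contents($site), $count); before creating the DOMDocument, with no utf8_encode() or utf8_decode() call needed for the match itself.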