
printing retrieved xml, hyphen and quotes turning into special characters


edfialk


Hi all, got an unusual problem I'm really hoping I can get some help with.

 

I'm using: PHP 5.2.8 (cli) (built: Dec 13 2008 18:38:00) and Apache

 

After writing a big, long script to retrieve an RSS feed and do some things with it, I noticed some special characters appearing that broke my XML display.

 

I made a small php file to test:

$xmlobj = simplexml_load_file("http://pipes.yahoo.com/pipes/pipe.run?_id=d4e3810bb02299384206deb47536ac5c&_render=rss");
header("Content-type: text/plain");
print $xmlobj->channel->item[0]->description;

 

which, when viewed in Apache, results in:

DES MOINES, Iowa —Adopting California-style vehicle emissions standards might be Iowa’s best...

 

but when viewed in php from command line results in:

DES MOINES, Iowa âAdopting California-style vehicle emissions standards might be Iowaâs best...

 

:o Where did those come from?!

 

If I copy that exact PHP file to another location (even the same directory) and then immediately run it from Apache, I get the results with special characters.

 

One script works correctly in Apache, but not on the command line. The exact same script, renamed, doesn't work in either. To be honest, I'm pretty amazed I got it to print out correctly the first time, since I can't seem to do it again in a fresh file or in my long script. I'm too scared to modify the script that works correctly because I might break it.

 

I want the hyphen and single quotes to come through, reliably, every time.  Anyone have any ideas?

 

Thanks for any and all suggestions!

-Ed


The command line is not interpreting the source encoding, and that is why the special characters are being displayed. Can you post the code that grabs and outputs the XML from both of the documents so we can compare them? Then we may be able to give a more in-depth explanation and some possible solutions.
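
To illustrate what is probably going on (a rough sketch, not a diagnosis of this exact setup): the dash in the feed is a single character that UTF-8 stores as three bytes. A viewer that assumes ISO-8859-1 treats each byte as its own character, so the first byte shows up as "â" and the other two as invisible control characters. The variable names below are made up for the example, and it assumes the mbstring extension is available.

```php
<?php
// The em dash (U+2014) written as the three bytes UTF-8 uses for it.
$dash = "\xE2\x80\x94";

echo strlen($dash), " bytes: ", bin2hex($dash), "\n";

// Simulate a viewer that wrongly assumes each byte is one ISO-8859-1
// character, then re-encode the result as UTF-8 so it can be inspected.
$misread = mb_convert_encoding($dash, 'UTF-8', 'ISO-8859-1');

// The first misread character is "â" (U+00E2); the next two are C1 controls.
echo bin2hex($misread), "\n";
```

If that matches what you see, the data itself is likely fine and only the declared or assumed charset of whatever displays it is off.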


Well, I edited the one file that occasionally worked, so I have nothing successful to compare it to.  Either way, the code was the same in both places, now they're both officially acting the same way.  Unfortunately, not the way I'd like. 

 

So:

<?php

$file = file_get_contents("http://pipes.yahoo.com/pipes/pipe.run?_id=d4e3810bb02299384206deb47536ac5c&_render=rss");

$xmlobj = simplexml_load_string($file);
header("Content-type: text/plain");

print $xmlobj->channel->item[0]->description;

?>

 

is hosted at http://pocus.wustl.edu/delicious/simplexml.php

 

and prints out:

(Washington, D.C. – March 10, 2009) The U.S. Environmental Protection Agency

both in command line and in Apache.

 

I need it to print out:

(Washington, D.C. – March 10, 2009) The U.S. Environmental Protection Agency

 

which is description[0] (until something new is posted) directly from the source:

http://pipes.yahoo.com/pipes/pipe.run?_id=d4e3810bb02299384206deb47536ac5c&_render=rss

 

The problem, again, is I'm grabbing items from the feed, inserting into database, reading later with php from the database, printing out xml, reading from javascript, and doing some displaying.  When I read items from the database and print out xml, the xml breaks as soon as a single one of these characters appears, breaking my javascript display.

 

So I need to either allow the xml to not break if it sees a character, which I can't figure out how to do, or stop the characters from showing up.

 

Any other questions I'll try to answer asap.

Thanks for any suggestions!

-Ed
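
For the "don't let the XML break" option above, one possible shape is sketched here. It assumes the text is UTF-8 all the way through (feed, database, output), which this thread has not confirmed. The helper name render_item_xml and the element names are hypothetical. The idea is to escape only the characters XML treats as markup and to state the charset both in the XML declaration and in the HTTP header, so nothing downstream has to guess.

```php
<?php
// Hypothetical helper: wraps one description string in a small XML document.
function render_item_xml($description)
{
    // Escape only XML markup characters; leave the UTF-8 text itself alone.
    $escaped = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<item><description>' . $escaped . '</description></item>';
}

// Sample text with an en dash (U+2013) written as raw UTF-8 bytes.
$description = "(Washington, D.C. \xE2\x80\x93 March 10, 2009) "
             . "The U.S. Environmental Protection Agency";
$xml = render_item_xml($description);

// Declare the charset in the HTTP header as well.
// Skipped on the command line, where there are no HTTP headers.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/xml; charset=utf-8');
}
echo $xml, "\n";
```

On the command line the output only looks right if the terminal itself is set to UTF-8, so a garbled dash there does not necessarily mean the XML is wrong.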


Hey WolfRage, thanks for the suggestion.  I am actually using htmlentities() when printing the xml.  Here's one example:

 

here's normal:

CLEVELAND — Cliffs Natural Resources announced Tuesday

 

htmlspecialchars():

CLEVELAND — Cliffs Natural Resources announced Tuesday

 

htmlentities():

CLEVELAND — Cliffs Natural Resources announced Tuesday

 

still has that one weird character.  :(

 

edit: here's what it's supposed to be:

CLEVELAND — Cliffs Natural Resources announced Tuesday

 

retrieved through rss from the site: http://www.virginiamn.com/articles/2009/03/12/news/doc49b723040ada7276745703.txt
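
A possible reason htmlentities() leaves the weird character in place: on PHP versions before 5.4 (including the 5.2.8 mentioned at the top of the thread) its default charset is ISO-8859-1, so a three-byte UTF-8 dash is converted byte by byte. The sketch below passes the charset explicitly to show the difference. The sample string is the one from this post with the dash written out as bytes, and the variable names are made up.

```php
<?php
// Em dash (U+2014) as raw UTF-8 bytes, as it would arrive from a UTF-8 feed.
$text = "CLEVELAND \xE2\x80\x94 Cliffs Natural Resources announced Tuesday";

// With an explicit UTF-8 charset the three bytes are read as one character
// and converted to a named entity.
$asUtf8 = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Forcing ISO-8859-1 mimics the old default: each byte is its own
// character, so the first byte of the dash turns into &acirc;.
$asLatin1 = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

echo $asUtf8, "\n";
echo (strpos($asLatin1, '&acirc;') !== false ? 'misread' : 'ok'), "\n";
```

One caveat for XML output: named HTML entities such as &mdash; are not defined in plain XML, so they can break a parser on their own. For XML, htmlspecialchars() with 'UTF-8' plus a UTF-8 declaration is probably the safer combination.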
