
[SOLVED] Spanish characters in RSS cause validation failure


kristen


I know this is not strictly a PHP question, but I hope someone can help me nonetheless.

 

I have an RSS feed at http://www.childcareaware.org/feeds/aya_sp.rss. The content comes from a database, so the .rss file has some PHP in it to pull that in. It sounds a little weird, but it works fine. The problem is that the incoming content has Spanish characters (e.g. é) that XML parsers reject, which causes the feed to fail validation. I think I need to declare a DTD, but after extensive googling and experimentation, I can't find anything that works.

 

Code is below... (minus the function cca_areyouaware_db_query, for security purposes)

 

<? header('Content-type: text/xml'); ?>
<? echo "<?"; ?>xml version="1.0" encoding="ISO-8859-1"<? echo "?>"; ?>

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

<channel>
<title>Are You Aware? En Español</title>
<description>A Bi-weekly Feature, Presented By Child Care Aware</description>
<link>http://www.childcareaware.org/sp/subscriptions/areyouaware/</link>
<atom:link href="http://www.childcareaware.org/feeds/aya_sp.rss" rel="self" type="application/rss+xml" />

<?
$query = "SELECT * FROM sp_articles ORDER BY id DESC LIMIT 10"; 
$result = cca_areyouaware_db_query($query);
while ($row = mysql_fetch_array($result)){ 

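// Remove every character outside the range \x20-\x7F; the body is run through htmlentities() first.
$title_output = preg_replace('/[^\x20-\x7F]+/', '', $row['title']);
$body_output = preg_replace('/[^\x20-\x7F]+/', '', htmlentities($row['body']));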

?>
<item>
<title><?= $title_output; ?></title>
<description><?= $body_output; ?></description>
<link>http://www.childcareaware.org/sp/subscriptions/areyouaware/article.php?id=<?= $row['id']; ?></link>
<guid>http://www.childcareaware.org/sp/subscriptions/areyouaware/article.php?id=<?= $row['id']; ?></guid>
</item>
<? } ?>

</channel>
</rss>

 

Thanks for any help you can give!
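To narrow down what the parser is actually objecting to, it may help to test the different ways an "é" can appear in XML. The following is a minimal sketch, assuming the SimpleXML extension is available; the sample strings are made up for illustration and are not taken from the feed. XML predefines only five named entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so an HTML name like `&eacute;` is undefined unless a DTD declares it, while a numeric reference or the raw character in the declared encoding is fine.

```php
<?php
// Sketch: which spellings of "é" does an XML parser accept?
// The sample documents are illustrative only.
libxml_use_internal_errors(true); // collect parse errors instead of printing warnings

$samples = array(
    'named HTML entity' => '<?xml version="1.0" encoding="UTF-8"?><t>caf&eacute;</t>',
    'numeric reference' => '<?xml version="1.0" encoding="UTF-8"?><t>caf&#233;</t>',
    'raw UTF-8 bytes'   => "<?xml version=\"1.0\" encoding=\"UTF-8\"?><t>caf\xC3\xA9</t>",
);

$results = array();
foreach ($samples as $label => $xml) {
    // simplexml_load_string() returns false when the document is not well-formed
    $results[$label] = simplexml_load_string($xml) !== false;
    libxml_clear_errors();
    echo $label, ': ', $results[$label] ? 'parses' : 'rejected', "\n";
}
```

If the named-entity case is the one that matches the stored content, a DTD is not strictly required: converting the entities to real characters (or numeric references) before output should be enough.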

Can you be a bit more specific? I changed the encoding to UTF-8 and added utf8_encode, and it just changes the set of errors. I think that's because the Spanish characters in my code are in the format "&Eacute;", not "É". Thank you for the help, I really appreciate it... this has been on my list of to-dos for over a year, and I just keep getting frustrated and giving up. Hopefully this time I'll get it!

 

New code:

<? header('Content-type: text/xml'); ?>
<? echo "<?"; ?>xml version="1.0" encoding="UTF-8"<? echo "?>"; ?>

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

<channel>
<title><? echo utf8_encode($feedtitle); ?></title>
<description>A Bi-weekly Feature, Presented By Child Care Aware</description>
<link>http://www.childcareaware.org/sp/subscriptions/areyouaware/</link>
<atom:link href="http://www.childcareaware.org/feeds/aya_sp.rss" rel="self" type="application/rss+xml" />

<?

$query = "SELECT * FROM sp_articles ORDER BY id DESC LIMIT 10"; 
$result = cca_areyouaware_db_query($query);
while ($row = mysql_fetch_array($result)){ 

// utf8_encode() treats its input as ISO-8859-1 and leaves HTML entities untouched.
$title_output = utf8_encode($row['title']);
$body_output = utf8_encode($row['body']);

?>
<item>
<title><?= $title_output; ?></title>
<description><?= $body_output; ?></description>
<link>http://www.childcareaware.org/sp/subscriptions/areyouaware/article.php?id=<?= $row['id']; ?></link>
<guid>http://www.childcareaware.org/sp/subscriptions/areyouaware/article.php?id=<?= $row['id']; ?></guid>
</item>
<? } ?>

</channel>
</rss>

 

Again, validation is here: http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fwww.childcareaware.org%2Ffeeds%2Faya_sp.rss
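If the stored text really does contain HTML entities such as `&Eacute;`, then utf8_encode() alone would not help, because it converts ISO-8859-1 bytes to UTF-8 and never looks at entities. One possible approach, sketched below, is to decode the entities into real UTF-8 characters and then re-escape only the characters XML reserves. The helper name `xml_safe_text` is hypothetical, and the sketch assumes the input is ASCII or already valid UTF-8; if the database returns ISO-8859-1 bytes, it would need converting first (for example with `mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')`).

```php
<?php
// Hypothetical helper (not part of the original feed): make stored HTML text
// safe to place inside an element of a UTF-8 XML document.
// Assumes $stored is ASCII or valid UTF-8.
function xml_safe_text($stored)
{
    // 1. Expand named entities such as &Eacute; into real UTF-8 characters.
    $decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Re-escape only what XML itself reserves (&, <, >, and quotes).
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Made-up sample text for illustration.
echo xml_safe_text('&Eacute;xito &amp; m&aacute;s &ldquo;ni&ntilde;os&rdquo;'), "\n";
```

With this in place, the loop could call `xml_safe_text($row['title'])` and `xml_safe_text($row['body'])` while the XML declaration stays at `encoding="UTF-8"`.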

For anyone interested, I did end up figuring it out (kind of). It works now, I'm just not sure it is the best way to do it. Here is my final code:

 

<? header('Content-type: application/xml'); ?>
<? echo "<?";?>xml version="1.0" encoding="iso-8859-1"<? echo "?>";?>

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

<channel>
<title>Are You Aware En Español</title>
<description>A Bi-weekly Feature, Presented By Child Care Aware</description>
<link>http://www.childcareaware.org/sp/subscriptions/areyouaware/</link>
<atom:link href="http://www.childcareaware.org/feeds/aya_sp.rss" rel="self" type="application/rss+xml" />

<?

$query = "SELECT * FROM sp_articles ORDER BY id DESC LIMIT 10"; 
$result = cca_areyouaware_db_query($query);
while ($row = mysql_fetch_array($result)){ 

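// Quote the array keys; a bare key like $row[title] is an error in PHP 8.
$title_output = $row['title'];

// Escape XML-special characters and flatten typographic quotes and dashes to ASCII.
// "&" is replaced first so the "&" in "&lt;" and "&gt;" is not escaped a second time.
$badchars  = array("&", "<", ">", "“", "’", "”", "–", "—");
$goodchars = array("&amp;", "&lt;", "&gt;", "'", "'", "'", "-", "-");
$body_output = str_replace($badchars, $goodchars, $row['body']);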

?>
<item>
<title><?= $title_output; ?></title>
<description><?= $body_output; ?></description>
<link>http://www.childcareaware.org/sp/subscriptions/areyouaware/article.php?id=<?= $row['id']; ?></link>
<guid>http://www.childcareaware.org/sp/subscriptions/areyouaware/article.php?id=<?= $row['id']; ?></guid>
</item>

<? } ?>

</channel>
</rss>

 

 

Thanks for the help!
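For anyone landing here later, a different route is to let an XML library do the escaping rather than maintaining a replacement list by hand. The following is a rough sketch using PHP's XMLWriter extension, not the method used above. `$rows` stands in for the database result and holds made-up values, the strings are assumed to be UTF-8 already, and the `atom:link` element is left out for brevity.

```php
<?php
// Sketch only: $rows is a stand-in for the query result, with invented values.
// XMLWriter expects UTF-8 input strings.
$rows = array(
    array('id' => 1, 'title' => 'Niños & familias', 'body' => '<p>¿Qué es “calidad”?</p>'),
);
$base = 'http://www.childcareaware.org/sp/subscriptions/areyouaware/';

$w = new XMLWriter();
$w->openMemory();
$w->setIndent(true);
$w->startDocument('1.0', 'UTF-8');
$w->startElement('rss');
$w->writeAttribute('version', '2.0');
$w->startElement('channel');
$w->writeElement('title', 'Are You Aware? En Español');
$w->writeElement('description', 'A Bi-weekly Feature, Presented By Child Care Aware');
$w->writeElement('link', $base);

foreach ($rows as $row) {
    $url = $base . 'article.php?id=' . $row['id'];
    $w->startElement('item');
    $w->writeElement('title', $row['title']);      // "&" is written as "&amp;"
    $w->writeElement('description', $row['body']); // "<p>" is written as "&lt;p&gt;"
    $w->writeElement('link', $url);
    $w->writeElement('guid', $url);
    $w->endElement(); // item
}

$w->endElement(); // channel
$w->endElement(); // rss
$w->endDocument();
$feed = $w->outputMemory();

// When served over HTTP, a header such as
// header('Content-Type: application/rss+xml; charset=UTF-8'); would go before the echo.
echo $feed;
```

Because the writer escapes text content itself, accented letters and curly quotes pass through as ordinary UTF-8 characters and no character needs to be downgraded to ASCII.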

Archived

This topic is now archived and is closed to further replies.
