
RSS and uncommon characters


kennumen


I did a search first, but it seems either nobody's had this problem, or they had a similar problem and fixed it by doing something I'm already doing. Of course, with my luck, there's a good chance I simply overlooked the solution :-\

 

This is how I store the titles used in RSS (I also store other text this way, but none of it is used in the RSS):

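function htmlprocess($s){
    // Undo magic-quotes escaping if the server applies it
    if(get_magic_quotes_gpc()) $s = stripslashes($s);
    // Trim, then convert special characters (including both quote types) to HTML entities
    return (htmlentities(trim($s),ENT_QUOTES,'UTF-8'));
}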

 

Here's the RSS code:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel>
  <title>GASP Comic.com</title>
  <link>http://www.gaspcomic.com/</link>
  <atom:link href="http://www.gaspcomic.com/rss.php" rel="self" type="application/rss+xml" />
  <category>Webcomic</category>
  <copyright>Copyright 2009 GASPcomic.com, Gamesphere.com, Kirsten Vandaele.</copyright>
  <language>en</language>
  <generator>GSC CMS</generator>
  <description>RSS page of GASPcomic.com (listing the last 5 comics published).</description>
  <item>
    <title>The first grind</title>
    <link>http://www.gaspcomic.com/comic.php?id=60</link>
    <guid>http://www.gaspcomic.com/comic.php?id=60</guid>
    <description>GASPcomic.com 60, published 19 hours 50 minutes ago.</description>
  </item>
  <item>
    <title>Everyone loves a good grindfest</title>
    <link>http://www.gaspcomic.com/comic.php?id=59</link>
    <guid>http://www.gaspcomic.com/comic.php?id=59</guid>
    <description>GASPcomic.com 59, published 2 days 20 hours 38 minutes ago.</description>
  </item>
  <item>
    <title>I&#039;m listening</title>
    <link>http://www.gaspcomic.com/comic.php?id=58</link>
    <guid>http://www.gaspcomic.com/comic.php?id=58</guid>
    <description>GASPcomic.com 58, published 7 days 23 hours 48 minutes ago.</description>
  </item>
</channel></rss>

 

That part actually works just fine. A while back, though, I had a comic titled "Ninja pinata", with a squiggly Spanish N. htmlentities turned this into "pi&ntilde;ata". Apparently, &ntilde; is an undefined entity as far as the feed is concerned. Go figure. As you can see above, (some) numeric entities like &#039; are no problem whatsoever. I've looked through the PHP manual, though, and found no function that works like htmlentities but converts to purely numeric entities.
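For illustration, here is a minimal sketch of one way to get purely numeric entities from text that was already stored with htmlentities. It is not a confirmed fix for this feed. The function name numeric_entities is made up for this example, and it assumes PHP 7 or later with the mbstring extension enabled. It decodes the stored entities back to UTF-8, re-escapes only the characters XML reserves, and then turns every non-ASCII character into a decimal numeric entity.

```php
<?php
// Hypothetical helper (the name is illustrative, not part of PHP or the CMS).
// Assumes $stored was produced by htmlentities($s, ENT_QUOTES, 'UTF-8').
function numeric_entities(string $stored): string
{
    // Turn named and numeric entities (e.g. &ntilde;, &#039;) back into UTF-8 characters.
    $raw = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

    // Escape only the characters XML reserves: & < > " '
    $escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // Convert every character from U+0080 upward into a decimal numeric entity.
    // Conversion map format: [first code point, last code point, offset, mask].
    return mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}
```

Called on the stored title, numeric_entities('Ninja pi&ntilde;ata') should give 'Ninja pi&#241;ata', while a title such as 'I&#039;m listening' comes back unchanged.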

A different topic here prompted me to change the first line's encoding from ISO-8859-1 to UTF-8, with no effect.

 

The MySQL code is straightforward, but it might (I don't know) interest you that I use the utf8_unicode_ci collation to store the data. Do note that I run htmlentities on my text before putting it into the database, so I'm storing the 8 characters "&ntilde;", not one squiggly "n" character.

 

I have no way to predict what I (or others, as I might end up opening up this CMS) might input in the future, so I'd like a "catchall" solution.

 

Thanks,

Kirsten
