cparekh Posted February 6, 2008

Hi,

I've created an RSS feed for a blog (the blog sits behind mod_auth) so that the feed is available without logging in. Articles/posts are usually added directly from Word, bringing with them a number of invalid characters. "Smart quotes" have been turned off in MS Word, but some other characters still get through.

Posted below is the code I've used to pull the feed data directly from the db. Hopefully someone can point me in the right direction to create some sort of filter so the feed is valid...

I've used the html_entity_decode() and stripslashes() functions to get some control over the output, but it needs more...

    <?php
    header('Content-type: text/xml');
    //db connection stuff here...
    ?>
    <rss version="2.0">
    <channel>
    <title>Title of RSS Feed</title>
    <description>Description of RSS Feed</description>
    <link>http://www.link_to_blog</link>
    <copyright>Copyright 2008.</copyright>
    <?php
    $query = "SELECT * FROM posts WHERE post_type='post' AND post_status='publish' ORDER BY post_date_gmt DESC LIMIT 10";
    $result=mysql_query($query);
    $num=mysql_num_rows($result);

    function limit_text($text, $limit) {
        $text = strip_tags($text);
        $words = str_word_count($text, 2);
        $pos = array_keys($words);
        if (count($words) > $limit) {
            $text = substr($text, 0, $pos[$limit]) . ' ...';
        }
        return $text;
    }

    $i=0;
    while ($i < $num) {
        $post_id=mysql_result($result,$i,"id");
        $post_title=mysql_result($result,$i,"post_title");
        $post_content=mysql_result($result,$i,"post_content");
        $post_title = html_entity_decode($post_title);
        $post_content= html_entity_decode($post_content);
        $post_content = strip_tags($post_content);
        $text = $post_content;
        $limit = 50;
        $description = limit_text($text, $limit);
        echo "
        <item>
        <title>$post_title</title>
        <description>$description</description>
        <link>http://www.link_to_post</link>
        <guid isPermaLink=\"true\">http://www.link_to_post</guid>
        </item>
        ";
        $i++;
    }
    ?>
    </channel>
    </rss>

Any help will be greatly appreciated.

Thanks in advance.

Mik.
Cep Posted February 6, 2008

This is a character set issue. Word uses a Microsoft character set (typically Windows-1252), so the bytes it produces are not what an XML parser expecting UTF-8 will accept. You want to normalise your data going in and coming out. Read this, it explains it better: http://www.joelonsoftware.com/articles/Unicode.html
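A minimal sketch of what "normalising on the way out" could look like for this feed. The helper names `to_utf8()` and `xml_text()` are hypothetical, and the code assumes that any stored text which is not already valid UTF-8 came from Word as Windows-1252; if the database holds some other encoding, the source charset passed to `iconv()` would need to change.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252.
function to_utf8($text)
{
    // preg_match() with the /u modifier only succeeds on valid UTF-8 input.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    // iconv() returns false if a byte has no meaning in Windows-1252.
    $converted = iconv('Windows-1252', 'UTF-8', $text);
    if ($converted === false) {
        // Last resort: keep only tab, LF, CR and printable ASCII.
        return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $text);
    }
    return $converted;
}

// Hypothetical helper: make $text safe to place inside an XML element.
function xml_text($text)
{
    $text = to_utf8($text);
    // Drop control characters that XML 1.0 forbids (tab, LF and CR are allowed).
    $text = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
    // Escape &, <, > and quotes so the text cannot break the markup.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```

In the loop from the first post, this would be applied to `$post_title` and `$description` just before the `echo`, for example `$post_title = xml_text($post_title);`.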
cparekh Posted February 6, 2008

Cep, very informative article. I'm using utf8_encode() now and it seems to have solved the problem.

Thanks very much for your help.
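One caveat worth knowing about this fix: `utf8_encode()` converts from ISO-8859-1, not from Windows-1252, and it is deprecated as of PHP 8.2. The output is well-formed UTF-8, which is likely why the feed validates, but bytes in the 0x80-0x9F range (where Windows-1252 keeps curly quotes and dashes) end up as invisible control characters instead of the intended punctuation. The sketch below illustrates the difference using `iconv()` with an ISO-8859-1 source to stand in for `utf8_encode()`; the sample string is made up for illustration.

```php
<?php
// "\x93" and "\x94" are the Windows-1252 bytes for curly double quotes,
// "\x96" is the en dash. The sample text is hypothetical.
$word_text = "\x93Hello\x94 \x96 world";

// Reading the bytes as ISO-8859-1 (the assumption utf8_encode() makes)
// turns 0x93 into the control character U+0093.
$as_latin1 = iconv('ISO-8859-1', 'UTF-8', $word_text);

// Reading them as Windows-1252 turns 0x93 into U+201C, the opening curly quote.
$as_cp1252 = iconv('Windows-1252', 'UTF-8', $word_text);

echo bin2hex(substr($as_latin1, 0, 2)), "\n"; // c293
echo bin2hex(substr($as_cp1252, 0, 3)), "\n"; // e2809c
```

If the posts really do arrive as Windows-1252, converting with that source charset keeps the quotes and dashes readable in feed readers.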
Archived
This topic is now archived and is closed to further replies.