jørgenj Posted September 13, 2009

I am trying to debug a problem with SimplePie (RSS/Atom feed parser) as used in Joomla! 1.5.14 (latest). With identical (out of the box) installations on different hosting providers, I notice a strange problem with the XML parser that SimplePie uses. I have absolutely no experience with PHP's XML parser, by the way.

On some installations xml_parse removes the '&lt;' and '&gt;' entities found in cdata (which is not good when sending the news feed description to the browser). On other installations '&lt;' and '&gt;' are translated to '<' and '>' as expected.

As far as I can see, the installations are identical except for the libXML and PHP version numbers: libXML version 2.6.27 (PHP Version 5.2.10) on the installations that work, and libXML version 2.7.3 (PHP Version 5.2.8) on the installations that have problems.

xml_parser_create_ns is used to create the parser (encoding = UTF-8, separator = ' '), with XML_OPTION_SKIP_WHITE=1 and XML_OPTION_CASE_FOLDING=0.

Here is a detailed example. The input to xml_parse is always the same (extract):

    <description>&lt;p&gt;&lt;a href="http://www.packtpub.com/nominate-best-open-source-php-cms"&gt; ....
On systems that work correctly, the "character data handler" function (as configured by xml_set_character_data_handler) receives the following cdata fragments in its second parameter ("string $data"):

    (SimplePie_Parser::tag_open tag: description - attributes: a:0:{})
    SimplePie_Parser::cdata: '<'
    SimplePie_Parser::cdata: 'p'
    SimplePie_Parser::cdata: '>'
    SimplePie_Parser::cdata: '<'
    SimplePie_Parser::cdata: 'a href="http://www.packtpub.com/nominate-best-open-source-php-cms"'
    SimplePie_Parser::cdata: '>'

This yields valid HTML:

    <p><a href="http://www.packtpub.com/nominate-best-open-source-php-cms">

On the installations that have problems it looks like this:

    (SimplePie_Parser::tag_open tag: description - attributes: a:0:{})
    SimplePie_Parser::cdata: 'p'
    SimplePie_Parser::cdata: 'a href="http://www.packtpub.com/nominate-best-open-source-php-cms"'

As can be seen, there are fewer calls and the '<' and '>' are just gone! Everything else (Joomla! etc.) works OK, by the way.

Any idea why this happens?
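The handler trace above can be reproduced outside Joomla! with a small standalone script. This is only a minimal sketch, not SimplePie's actual code: the class name CdataCollector, the function collect_cdata_fragments and the example.com URL are made up for illustration, while the parser setup mirrors the options listed in the post. It is written to also run on PHP 5.2.

```php
<?php
// Hypothetical collector: records every fragment passed to the
// character data handler, in the order the parser delivers them.
class CdataCollector
{
    public $fragments = array();

    public function cdata($parser, $data)
    {
        $this->fragments[] = $data;
    }
}

// Hypothetical helper: parses $xml with the same parser options as in
// the post and returns the list of cdata fragments.
function collect_cdata_fragments($xml)
{
    $collector = new CdataCollector();

    $parser = xml_parser_create_ns('UTF-8', ' ');
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_character_data_handler($parser, array($collector, 'cdata'));

    // Third argument true: this is the final (and only) chunk of data.
    xml_parse($parser, $xml, true);

    // The parser is released when $parser goes out of scope.
    return $collector->fragments;
}

// Made-up input in the same shape as the feed extract.
$xml = '<description>&lt;p&gt;&lt;a href="http://example.com/"&gt;</description>';
$fragments = collect_cdata_fragments($xml);

foreach ($fragments as $fragment) {
    echo "cdata: '", $fragment, "'\n";
}
echo 'joined: ', implode('', $fragments), "\n";
```

How the text is split into fragments can differ between parser builds, so the joined string is the thing to compare. On a healthy installation it should contain the '<' and '>' characters; on an affected one they are missing.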
jørgenj Posted September 14, 2009

After some further investigation, this turns out to be a PHP / libxml bug. It affects only some installations:

- libxml 2.7.x on PHP < 5.2.9
- libxml 2.7.0 to 2.7.2 on any PHP version

References:

- http://bugs.php.net/bug.php?id=45996
- http://bugs.gentoo.org/show_bug.cgi?id=249703
- http://blog.code-head.com/fixing-libxml-php-bug-and-issues-with-html-entities-downgrading-libxml
- http://blog.code-head.com/fixing-libxml-php-bug-and-issues-with-html-entities-libexpat
- https://glowhost.com/forums/general-support/php5-libxml2-xml-parse-bug-1574.html
- https://bugzilla.redhat.com/show_bug.cgi?id=467314

Newer versions of SimplePie (version 1.2) have code to get around this bug. Unfortunately, Joomla! is still using the old SimplePie version 1.0.1.

Here is a simple test that can be used to check for this problem. Save the following code to a file called "xmltest.php", upload it to the server holding your Joomla! installation and point your browser at it:

    <?php
    // Parse a tiny document whose only content is an escaped ampersand.
    $parser_check = xml_parser_create();
    xml_parse_into_struct($parser_check, '<foo>&amp;</foo>', $values);
    xml_parser_free($parser_check);

    // A working parser reports '&' as the value of <foo>;
    // an affected one drops the entity, so no value is set.
    $xml_is_sane = isset($values[0]['value']);

    if (!$xml_is_sane) {
        echo "XML is broken!";
    } else {
        echo "XML is OK!";
    }
    ?>
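The version rule of thumb from this post can also be written down as a quick check. This is only a sketch under the assumption that the two conditions listed above are the whole story; the function name looks_affected is made up for illustration. LIBXML_DOTTED_VERSION and PHP_VERSION are PHP's own constants, but the libxml constant may not reflect the library actually loaded at run time, so the functional test above remains the more reliable check.

```php
<?php
// Hypothetical helper encoding the rule of thumb from the post:
//   libxml 2.7.x with PHP < 5.2.9, or
//   libxml 2.7.0 to 2.7.2 with any PHP version.
function looks_affected($libxmlVersion, $phpVersion)
{
    $is27x = version_compare($libxmlVersion, '2.7.0', '>=')
          && version_compare($libxmlVersion, '2.8.0', '<');
    $isEarly27 = $is27x && version_compare($libxmlVersion, '2.7.3', '<');
    $isOldPhp = version_compare($phpVersion, '5.2.9', '<');

    return $isEarly27 || ($is27x && $isOldPhp);
}

// The two installations described in the first post.
var_dump(looks_affected('2.6.27', '5.2.10')); // bool(false)
var_dump(looks_affected('2.7.3', '5.2.8'));   // bool(true)

// The current host, if libxml support is compiled in.
if (defined('LIBXML_DOTTED_VERSION')) {
    var_dump(looks_affected(LIBXML_DOTTED_VERSION, PHP_VERSION));
}
```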