gladtobegrey Posted March 2, 2010

All the HTML, PHP and XML files on my website are encoded as 'UTF-8 without BOM' using Notepad++.

All the HTML and PHP pages contain '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'

All the XML files contain '<?xml version="1.0" encoding="utf-8"?>'

One of the webpages ('offers.php') contains a mix of HTML plus some PHP code that reads an XML file ('offers.xml') and generates a list of special offers with prices.

The first element of 'offers.xml' contains:

    <offers>
      <offer>
        <title>'Special Lunchtime Menu' Offer[^]Only £4.99</title>
        <image>board.png</image>
        <text>
          [p]Choose any of the following:[/p]
          [p][b]PIZZA and SALAD[/b][^] - Margherita[^] - Ham and Mushroom[^] - Pepperoni[^] - Vegetarian[/p]
          [p][i][b]OR[/b][/i][/p]
          [p][b]PASTA and GARLIC BREAD[/b][^] - Spaghetti Bolognaise[^] - Penne Arrabiata[^] - Spaghetti Carbonara[^] - Risotto[/p]
          [p][i][b]OR[/b][/i][/p]
          [p][b]GRILLED CHICKEN SALAD[/b][/p]
        </text>
      </offer>

The PHP code parses the file and generates HTML output. The characters between square brackets are converted to HTML tags by a preg_replace() regex as part of the process, e.g. '[^]' becomes '<br />' ... don't get hung up on this, I have my reasons.

The relevant chunk of the parser code is here:

    function tag_contents($parser, $data)
    {
        global $source, $current_tag;

        $patterns = array ("/\[\^\]/u","/\[\~\]/u","/\[/u","/\]/u","/\t/u");
        $replaces = array ("<br />"," ","<",">","");

        $result = htmlentities($data, ENT_COMPAT, 'UTF-8');
        $newres = preg_replace($patterns, $replaces, $result);

        //echo '$data="'.$data.'"('.strlen($data).'), $result="'.$result.'"('.strlen($result).'"), $newres="'.$newres.'"('.strlen($newres).')'."\n\r";

        switch ($current_tag) {
            case "IMAGE":
                echo '<div class="offerimg"><img src="images/'.$newres.'" alt="" /></div>';
                break;
            case "TITLE":
                echo '<div class="offertitle">'.$newres.'</div>'."\n\r";
                break;
            case "TEXT":
                echo '<div class="offertext">'.$newres.'</div>'."\n\r";
                break;
        }
    }

My problem is that the output HTML always contains a spurious line break immediately before the '£' character. Previously it was outputting an A-umlaut before the '£'. That went away once I added the htmlentities() call, but I cannot work out how to get rid of the unwanted line break.

So, where I'd expect to see:

    'Special Lunchtime Menu Offer'
    Only £4.99

... I'm getting:

    'Special Lunchtime Menu Offer'
    Only
    £4.99

I've done quite a bit of browsing and tried various solutions, but am now beginning to tear my hair out. I'm probably missing something stupidly obvious, but I cannot see the problem.

I'm testing on XAMPP under WinXP SP3, with PHP 5.3.1 and Apache 2.2.14. Unfortunately I am constrained to running live under PHP 4.4.2. However, I'm seeing the same issue in that environment as in my test environment.

Any help would be very gratefully received.
gladtobegrey (Author) Posted March 2, 2010

Sorted :-[

It was the usual XML line-splitting problem. I fixed the code to handle concatenation, and hey presto ... it works fine.

Ooops!
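For anyone landing here later: the fixed code was never posted, so the following is only a guess at what "handle concatenation" might look like. PHP's expat-based parser may call the character-data handler more than once for a single element's text, and a handler that echoes a <div> on every call would then split the text across several blocks. The sketch buffers the pieces and emits once, at the closing tag. The names start_tag(), end_tag(), render_markup(), $buffer and $output are all made up for illustration, the [~] pattern is left out, and the XML is fed in two chunks purely to simulate a split just before the '£'.

```php
<?php
// Illustrative sketch only: none of these names come from the original post.

$buffer = '';   // text collected for the element currently being read
$output = '';   // generated HTML (collected here instead of echoed directly)

// Same idea as the original conversion, minus the [~] pattern.
function render_markup($text)
{
    $patterns = array('/\[\^\]/u', '/\[/u', '/\]/u', '/\t/u');
    $replaces = array('<br />', '<', '>', '');
    return preg_replace($patterns, $replaces, htmlentities($text, ENT_COMPAT, 'UTF-8'));
}

function start_tag($parser, $name, $attrs)
{
    global $buffer;
    $buffer = '';               // a new element starts with an empty buffer
}

function tag_contents($parser, $data)
{
    global $buffer;
    $buffer .= $data;           // may be called several times: only accumulate
}

function end_tag($parser, $name)
{
    global $buffer, $output;
    $html = render_markup($buffer);
    switch ($name) {            // names arrive upper-cased (case folding is on by default)
        case 'IMAGE':
            $output .= '<div class="offerimg"><img src="images/'.$html.'" alt="" /></div>';
            break;
        case 'TITLE':
            $output .= '<div class="offertitle">'.$html.'</div>'."\n";
            break;
        case 'TEXT':
            $output .= '<div class="offertext">'.$html.'</div>'."\n";
            break;
    }
    $buffer = '';
}

// "\xC2\xA3" is the UTF-8 byte sequence for the pound sign.
$xml = '<?xml version="1.0" encoding="utf-8"?>'
     . "<offers><offer><title>'Special Lunchtime Menu' Offer[^]Only \xC2\xA3"
     . "4.99</title></offer></offers>";

$parser = xml_parser_create('UTF-8');
xml_set_element_handler($parser, 'start_tag', 'end_tag');
xml_set_character_data_handler($parser, 'tag_contents');

// Feed the document in two pieces, cut just before the pound sign,
// so the title text cannot reach the handler in a single call.
$cut = strpos($xml, "\xC2\xA3");
xml_parse($parser, substr($xml, 0, $cut), false);
xml_parse($parser, substr($xml, $cut), true);

echo $output;
```

Because the <div> is only written in end_tag(), it no longer matters how many pieces the parser chops the title into. The whole title ends up in a single 'offertitle' block.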