I've done a lot of looking into this. Wikipedia serves UTF-8 text and can embed Unicode characters in a page (such as a Greek delta symbol). However, when I try to do the same thing in PHP and submit a delta symbol Δ through a <textarea> in an HTML form with method="post", it gets stored in an XML file as Î”. Basically, non-English characters show up as garbage. The code that stores the <textarea>'s content is as follows:
$body = $doc->createElement('body');
$bodytext = $doc->createTextNode(
    utf8_encode(
        str_replace(' ', ' ',
            str_replace("\n", '<br />',
                str_replace("\r", '<br />',
                    str_replace("\r\n", '<br />',
                        htmlentities(stripslashes($_POST['body']))
                    )
                )
            )
        )
    )
);
$body->appendChild($bodytext);
$post->appendChild($body);
$doc->documentElement->insertBefore($post, $doc->documentElement->firstChild);
$doc->formatOutput = false;
$doc->save($fPath);
I'm using the utf8_encode() function because without it there, PHP throws an exception that the submitted character is not a valid XML character. The XML file's encoding is UTF-8, declared as follows:
$doc = new DOMDocument('1.0', 'UTF-8');
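In case it helps narrow things down, here is a minimal sketch that isolates the encoding step from the rest of my code. It rests on an assumption I have not verified: that the browser posts Δ as the UTF-8 bytes CE 94. The variable $submitted is a hypothetical stand-in for $_POST['body'], and the root element is created only for this test. If the assumption holds, the output suggests utf8_encode() may be encoding the already-UTF-8 bytes a second time.

```php
<?php
// Assumption: the form page is served as UTF-8, so Δ (U+0394) arrives as bytes CE 94.
$submitted = "\xCE\x94"; // hypothetical stand-in for $_POST['body']

// utf8_encode() treats its input as ISO-8859-1 and re-encodes every byte.
// (It is deprecated as of PHP 8.2, so newer versions emit a deprecation notice here.)
$double = utf8_encode($submitted);

echo bin2hex($submitted), "\n"; // ce94
echo bin2hex($double), "\n";    // c38ec294: "Î" (U+00CE) followed by U+0094

// For comparison, hand the untouched string to the DOM and serialize it.
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('body'));
$root->appendChild($doc->createTextNode($submitted));
$xml = $doc->saveXML();
echo $xml;
```

This only exercises utf8_encode() and createTextNode(); it leaves out htmlentities() and stripslashes(), so it does not explain the "not a valid XML character" error I get when utf8_encode() is removed from the real code.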
Can anyone steer me in the right direction?