Mike521 Posted August 18, 2008

I am in character encoding hell, and I hope someone can get me out!

I have a web form encoded in ISO-8859-1. It posts to another ISO-8859-1 page. That page takes the POST data and sends it to a script that runs in the background. The script's job is to convert the POST data into XML and then post it to yet another script that will process it.

The problem shows up when there are Spanish characters in the input. No matter how I try to encode them, the final receiving script always either ignores all the incoming data or ignores the fields with Spanish characters.

It seems to me that the problem is happening in the last post. For example, here is what my XML data might look like right before I send it:

<?xml version="1.0" encoding="utf-8"?>
<data>
<spanishStuff>here+are+some+span+chars+%26Ntilde%3B+%26ntilde%3B%26euml%3B%26oacute%3B</spanishStuff>
</data>

The very first thing I do in the final script is email the POST data to myself. Here's what it looks like:

<?xml version="1.0" encoding="utf-8"?>
<data>
<spanishStuff>here are some span chars &Ntilde; &ntilde;&euml;&oacute;</spanishStuff>
</data>

See how the %26's have been replaced with &? When I then call simplexml_load_string, it gives me warnings such as "parser error : Entity 'Ntilde' not defined". After that, either all the input is ignored or the fields with Spanish chars are ignored, depending on which variation of encoding I've tried this time around.

I don't know what to do at this point. I've spent a lot of time trying TONS of ways to encode the data, either before I send it or after I receive it, and nothing seems to help. For what it's worth, one of the first things I do is utf8_encode the incoming POST data, since the web form is in ISO-8859-1.

Here is a step-by-step of the process if you want further clarification:

1. User enters data on an ISO-8859-1 page.
2. Data is posted to a receiving ISO-8859-1 page.
3. Receiving page spawns a background process (using http_build_query on the POST data, and fsockopen / fwrite to send it).
   -- background process ignores user disconnect
4. Background process takes the POST data and forms it into XML.
   -- as it does so, it encodes the data with UTF-8, htmlentities, and urlencode
5. Background process uses cURL to post the XML string to the final, receiving script.
6. Receiving script grabs the data and does whatever it needs to do.

The background process can technically be skipped, but we don't want the user waiting around while all this other stuff happens, so I simply tell them thank you and let the system do the rest.

Hope someone can help, thanks.

Link to comment: https://forums.phpfreaks.com/topic/120259-xml-and-character-encoding-hell/
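For anyone following along, the round trip in steps 4 to 6 can be reproduced in a few lines of PHP. This is only a sketch under assumptions: the value is taken to be UTF-8 already, the variable names are made up, and urldecode stands in for the automatic decoding PHP applies to incoming POST fields.

```php
<?php
// Sketch of steps 4-6, assuming the value is already UTF-8.
// "\xC3\x91" is the UTF-8 byte sequence for N-tilde.
$value = "span chars \xC3\x91";

// Step 4: htmlentities turns the character into a named HTML entity,
// then urlencode escapes the & and ; of that entity.
$entityEncoded = htmlentities($value, ENT_QUOTES, 'UTF-8');
$sent = urlencode($entityEncoded);

// Step 6: the receiving side URL-decodes the field; urldecode stands in for that.
$received = urldecode($sent);

echo $sent, "\n";
echo $received, "\n";
```

The first line printed has the same %26Ntilde%3B form as the data shown above, and the second contains a literal named HTML entity, which is what the XML parser later complains about.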
DeanWhitehouse Posted August 18, 2008

Try making all the pages UTF-8. If that doesn't help, try re-creating the files as UTF-8 if you're using Dreamweaver or created them with Dreamweaver.

Link to comment: https://forums.phpfreaks.com/topic/120259-xml-and-character-encoding-hell/#findComment-619528
Mike521 (Author) Posted August 19, 2008

It seems like the problem is the entity references, though. %26Ntilde%3B becomes &Ntilde; when it's received. Then simplexml gives the error "Entity: line 23: parser error : Entity 'Ntilde' not defined".

Is there a way to tell simplexml to expect those types of entities, perhaps?

Link to comment: https://forums.phpfreaks.com/topic/120259-xml-and-character-encoding-hell/#findComment-620098
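One possible workaround, offered as a sketch rather than a confirmed fix: XML itself only predefines amp, lt, gt, quot and apos, so named HTML entities can be converted to real UTF-8 characters before the string reaches simplexml_load_string. The helper name html_entities_to_utf8 and the sample string are hypothetical and not from the thread.

```php
<?php
// Hypothetical helper: turn named HTML entities into UTF-8 characters,
// leaving XML's five predefined entities untouched.
function html_entities_to_utf8(string $xml): string
{
    $xmlPredefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0]; // still needed as an XML escape
            }
            // Names that html_entity_decode does not know are returned unchanged.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
        },
        $xml
    );
}

// Illustrative input resembling what the receiving script sees.
$received = '<data><spanishStuff>chars &Ntilde; &ntilde; &amp; more</spanishStuff></data>';
$clean = html_entities_to_utf8($received);

// Parse only if the SimpleXML extension is available.
if (function_exists('simplexml_load_string')) {
    $doc = simplexml_load_string($clean);
    echo $doc->spanishStuff, "\n";
}
```

The predefined entities are skipped on purpose: decoding &amp; or &lt; at this stage would put raw markup characters into the document and break parsing in a different way.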
Mike521 (Author) Posted August 19, 2008

I think I finally figured it out. Instead of converting miscellaneous characters to entities, I figured, what the hell, if I can get them into UTF-8 then why should I entity-encode them at all? So I just UTF-8 encode the incoming data, replace only the worst characters (& < >) with their entities, urlencode, and send. Seems to work fine so far.

Link to comment: https://forums.phpfreaks.com/topic/120259-xml-and-character-encoding-hell/#findComment-620115
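A minimal sketch of that approach, for reference. The function name build_xml_payload, the single-field XML layout and the "xml" POST field name are assumptions for illustration. iconv is swapped in for the utf8_encode call mentioned in the thread because utf8_encode is deprecated as of PHP 8.2; the cURL call itself is left out.

```php
<?php
// Hypothetical helper: the name and the single-field layout are illustrative.
function build_xml_payload(string $latin1Value): string
{
    // Convert ISO-8859-1 input to UTF-8 (the thread uses utf8_encode here).
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1Value);

    // Escape only & < > so the accented characters stay as raw UTF-8.
    $escaped = htmlspecialchars($utf8, ENT_NOQUOTES | ENT_XML1, 'UTF-8');

    return '<?xml version="1.0" encoding="utf-8"?>' . "\n"
        . "<data>\n<spanishStuff>" . $escaped . "</spanishStuff>\n</data>";
}

// "\xF1" is n-tilde in ISO-8859-1.
$xml = build_xml_payload("a\xF1o & <b>");

// http_build_query URL-encodes the value; "xml" is an assumed field name.
// This string is what would be handed to cURL as the POST body.
$postBody = http_build_query(['xml' => $xml]);

// On the receiving side PHP decodes the body into $_POST; parse_str stands in here.
parse_str($postBody, $fields);
```

After the round trip, $fields['xml'] holds the same XML string that was built, with the accented characters as UTF-8 bytes and only & < > escaped, so the parser has no HTML-only entities to reject.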