lindylex Posted December 6, 2007

I am trying to do a string find-and-replace on some XML that I get from a server. The character I am trying to find is É; if you cannot see it, it's chr(201). This is what I have tried, with no success.

Attempt #1:
$patern[0] = '/É/';
$patern[1] = '/W/';
$replacements[0] = 'XXX';
$replacements[1] = 'YYY';
echo preg_replace($patern, $replacements, $finaldata);

Attempt #2:
strtr($finaldata, chr(201), "VVV");

Attempt #3:
strtr($finaldata, 'É', "VVV");

The encoding is encoding="UTF-8".

Thanks,
Lex
effigy Posted December 6, 2007

How is the file encoded?
lindylex Posted December 6, 2007

effigy, it is encoding="UTF-8". Thanks.
effigy Posted December 6, 2007

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<pre>
<?php
### Create string with "LATIN CAPITAL LETTER E WITH ACUTE."
echo $str = 'abc' . pack('C*', 0xc3, 0x89) . '123';
echo '<br>';
### Remove.
echo preg_replace('/\xc9/u', '', $str);
?>
</pre>
lindylex Posted December 6, 2007

Effigy, it gives me this string instead of the É:

abcÉ123

How do I find the equivalent of '/\xc9/u' for the letter I want? I have no idea how to write a regular expression for the E with the accent.

Lex
effigy Posted December 6, 2007

"É" is how the UTF-8 encoding of "É" looks when its two bytes are displayed in a single-byte encoding such as Windows-1252. You can see this by going to this letter database, scrolling down to "Search in Unicode character names," entering "e with acute," and clicking "Submit Query." The "U00C9" that appears below the graphic of the character is its code point, and the UTF-8 is below that.

What browser are you using for the example? I see:

abcÉ123
abc123
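One way to check this for yourself is to look at the raw bytes. The following is only a minimal sketch, assuming the mbstring extension is available; the variable names are made up for illustration.

```php
<?php
// "É" (U+00C9) written as its two UTF-8 bytes, so the example does not
// depend on how this source file itself is saved.
$utf8 = "\xC3\x89";

// In UTF-8 the character is two bytes, not the single byte chr(201).
$hex = bin2hex($utf8);        // "c389"
$byteLength = strlen($utf8);  // 2

// Reading those same two bytes as Windows-1252 and re-encoding them as
// UTF-8 produces the two characters "Ã" and "‰" that a mislabelled page shows.
$mojibake = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');

// chr(201) is the Latin-1 byte for "É"; it never occurs in the UTF-8 string,
// which is why searching for it finds nothing.
$containsLatin1Byte = strpos($utf8, chr(201)) !== false;  // false
```

If the output still shows two odd characters, the likelier culprit is the charset the page is served or viewed with, not the string itself.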
lindylex Posted December 6, 2007

effigy, I am not seeing what you are seeing. I tried this on Linux using Iceweasel and on Windows using I.E. 7 and Firefox.

I go to this website, http://www.eki.ee/letter/, and enter the É, and I do not see the same output you see. I searched here:

"For my own convenience - this input form accepts (hex) utf8 encodings. It does not accept ranges and returns ? if the input is not valid. Enter Unicode number in UTF-8 (e2 82 ac for Euro):"

And I got nothing.
lindylex Posted December 6, 2007

effigy, how can I get this value '/\xc9/u' from that website?

Thanks,
Lex
effigy Posted December 6, 2007

"For my own convenience - this input form accepts (hex) utf8 encodings. It does not accept ranges and returns ? if the input is not valid. Enter Unicode number in UTF-8 (e2 82 ac for Euro):"

That's under "Search by Unicode number." What I suggested was scrolling down to "Search in Unicode character names."
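If the lookup form is awkward, PHP can report the code point itself. This is a rough sketch rather than a definitive answer: it assumes PHP 7.2 or later with the mbstring extension (for mb_ord), and the helper name pattern_for is hypothetical.

```php
<?php
// Hypothetical helper: build a /u pattern that matches one UTF-8 character.
function pattern_for(string $char): string
{
    // mb_ord returns the Unicode code point, e.g. 201 (0xC9) for "É".
    $codePoint = mb_ord($char, 'UTF-8');
    // \x{...} is PCRE's hex escape; it also accepts code points above 0xFF.
    return sprintf('/\x{%x}/u', $codePoint);
}

$eAcute  = "\xC3\x89";              // "É" as UTF-8 bytes
$pattern = pattern_for($eAcute);    // '/\x{c9}/u'
$result  = preg_replace($pattern, '', 'abc' . $eAcute . '123');  // 'abc123'
```

With the /u modifier the escape names a code point, not a byte, so the pattern matches the whole two-byte character in a UTF-8 subject.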
lindylex Posted December 7, 2007

The Solution:

Effigy, thanks for your help. I came up with a solution using some valuable information you provided.

The process I used was to go to this site, http://www.w3schools.com/tags/ref_urlencode.asp, and then search for the character I need. The xc8 in '/\xc8/u' is the hex value for the character, I suppose; the two characters after the x match the last two of the URL-encoded value. Using this chart I got the information I needed to make the find and replace work.

function look_for_extended_characters($the_string_to_convert) {
    $transition = preg_replace('/\xc8/u', 'È', $the_string_to_convert); //È È YES
    $transition = preg_replace('/\xc9/u', 'É', $transition); //É É
    $transition = preg_replace('/\xc3/u', 'Ã', $transition); //Ã Ã
    $transition = preg_replace('/\xcf/u', 'Ï', $transition); //Ï Ï
    $transition = preg_replace('/\xef/u', 'ï', $transition); //ï ï
    $transition = preg_replace('/\xbf/u', '¿', $transition); //¿ ¿
    $transition = preg_replace('/\xbd/u', '½', $transition); //½ ½
    $transition = preg_replace('/\xd4/u', 'Ô', $transition); //Ô Ô
    $transition = preg_replace('/\xd6/u', 'Ö', $transition); //Ö Ö
    $transition = preg_replace('/\xc1/u', 'Á', $transition); //Á Á
    $transition = preg_replace('/\xc4/u', 'Ä', $transition); //Ä Ä
    $transition = preg_replace('/\xc7/u', 'Ç', $transition); //Ç Ç
    $transition = preg_replace('/\xcc/u', 'Ì', $transition); //Ì Ì
    return $transition;
}

Thanks
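The same chain of replacements can also be written as one lookup table. The sketch below rests on an assumption the post does not confirm: that the replacements are meant to be HTML numeric entities. The function name replace_extended_characters and the three sample entries are hypothetical, so swap in whatever strings you actually need.

```php
<?php
// Assumption: replacements are HTML numeric entities (not confirmed above).
function replace_extended_characters(string $utf8Text): string
{
    // Keys are UTF-8 byte sequences, written as escapes so the result
    // does not depend on how this source file is saved.
    $map = [
        "\xC3\x88" => '&#200;',  // È U+00C8
        "\xC3\x89" => '&#201;',  // É U+00C9
        "\xC3\x96" => '&#214;',  // Ö U+00D6
    ];
    // strtr with an array replaces whole substrings in a single pass.
    return strtr($utf8Text, $map);
}

$converted = replace_extended_characters("CAF\xC3\x89");  // 'CAF&#201;'
```

A table keeps each character next to its replacement, and the single pass means one replacement's output is never scanned again by a later one.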