LordLanky Posted August 25, 2010 Share Posted August 25, 2010 Hi All, I am trying to convert HTMLentities style encoding, to UTF decimal encoding. I.e. int the text: U+001D 1d <control><br /> U+007D } 7d RIGHT CURLY BRACKET<br /> U+00A2 ¢ c2 a2 CENT SIGN<br /> I want the ¢ to be its UTF decimal equivalent. that make sense?? Basically, i'm parsing text for an epub and it wont accept special characters in the html entities form - only the decimal form. Rubbish eh? The entire test text i am using is this (there is some HTML in there you see that i DO NOT want converting - basically only things between the &.. THANKS! <html xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <title>Testing images</title> <link href="stylesheet.css" charset="UTF-8" type="text/css" rel="stylesheet" /> </head> <body><h2>Character Testing</h2><p align='justify'>U+0000 00 <control><br /> U+0001 01 <control><br /> U+0002 02 <control><br /> U+0003 03 <control><br /> U+0004 04 <control><br /> U+0005 05 <control><br /> U+0006 06 <control><br /> U+0007 07 <control><br /> U+0008 08 <control><br /> U+0009 09 <control><br /> U+000A 0a <control><br /> U+000B 0b <control><br /> U+000C 0c <control><br /> U+000D 0d <control><br /> U+000E 0e <control><br /> U+000F 0f <control><br /> U+0010 10 <control><br /> U+0011 11 <control><br /> U+0012 12 <control><br /> U+0013 13 <control><br /> U+0014 14 <control><br /> U+0015 15 <control><br /> U+0016 16 <control><br /> U+0017 17 <control><br /> U+0018 18 <control><br /> U+0019 19 <control><br /> U+001A 1a <control><br /> U+001B 1b <control><br /> U+001C 1c <control><br /> U+001D 1d <control><br /> U+001E 1e <control><br /> U+001F 1f <control><br /> U+0020 20 SPACE<br /> U+0021 ! 21 EXCLAMATION MARK<br /> U+0022 " 22 QUOTATION MARK<br /> U+0023 # 23 NUMBER SIGN<br /> U+0024 $ 24 DOLLAR SIGN<br /> U+0025 % 25 PERCENT SIGN<br /> U+0026 & 26 AMPERSAND<br /> U+0027 ' 27 APOSTROPHE<br /> U+0028 ( 28 LEFT PARENTHESIS<br /> U+0029 ) 29 RIGHT PARENTHESIS<br /> U+002A * 2a ASTERISK<br /> U+002B + 2b PLUS SIGN<br /> U+002C , 2c COMMA<br /> U+002D - 2d HYPHEN-MINUS<br /> U+002E . 2e FULL STOP<br /> U+002F / 2f SOLIDUS<br /> U+0030 0 30 DIGIT ZERO<br /> U+0031 1 31 DIGIT ONE<br /> U+0032 2 32 DIGIT TWO<br /> U+0033 3 33 DIGIT THREE<br /> U+0034 4 34 DIGIT FOUR<br /> U+0035 5 35 DIGIT FIVE<br /> U+0036 6 36 DIGIT SIX<br /> U+0037 7 37 DIGIT SEVEN<br /> U+0038 8 38 DIGIT EIGHT<br /> U+0039 9 39 DIGIT NINE<br /> U+003A : 3a COLON<br /> U+003B ; 3b SEMICOLON<br /> U+003C < 3c LESS-THAN SIGN<br /> U+003D = 3d EQUALS SIGN<br /> U+003E > 3e GREATER-THAN SIGN<br /> U+003F ? 3f QUESTION MARK<br /> U+0040 @ 40 COMMERCIAL AT<br /> U+0041 A 41 LATIN CAPITAL LETTER A<br /> U+0042 B 42 LATIN CAPITAL LETTER B<br /> U+0043 C 43 LATIN CAPITAL LETTER C<br /> U+0044 D 44 LATIN CAPITAL LETTER D<br /> U+0045 E 45 LATIN CAPITAL LETTER E<br /> U+0046 F 46 LATIN CAPITAL LETTER F<br /> U+0047 G 47 LATIN CAPITAL LETTER G<br /> U+0048 H 48 LATIN CAPITAL LETTER H<br /> U+0049 I 49 LATIN CAPITAL LETTER I<br /> U+004A J 4a LATIN CAPITAL LETTER J<br /> U+004B K 4b LATIN CAPITAL LETTER K<br /> U+004C L 4c LATIN CAPITAL LETTER L<br /> U+004D M 4d LATIN CAPITAL LETTER M<br /> U+004E N 4e LATIN CAPITAL LETTER N<br /> U+004F O 4f LATIN CAPITAL LETTER O<br /> U+0050 P 50 LATIN CAPITAL LETTER P<br /> U+0051 Q 51 LATIN CAPITAL LETTER Q<br /> U+0052 R 52 LATIN CAPITAL LETTER R<br /> U+0053 S 53 LATIN CAPITAL LETTER S<br /> U+0054 T 54 LATIN CAPITAL LETTER T<br /> U+0055 U 55 LATIN CAPITAL LETTER U<br /> U+0056 V 56 LATIN CAPITAL LETTER V<br /> U+0057 W 57 LATIN CAPITAL LETTER W<br /> U+0058 X 58 LATIN CAPITAL LETTER X<br /> U+0059 Y 59 LATIN CAPITAL LETTER Y<br /> U+005A Z 5a LATIN CAPITAL LETTER Z<br /> U+005B [ 5b LEFT SQUARE BRACKET<br /> U+005C \ 5c REVERSE SOLIDUS<br /> U+005D ] 5d RIGHT SQUARE BRACKET<br /> U+005E ^ 5e CIRCUMFLEX ACCENT<br /> U+005F _ 5f LOW LINE<br /> U+0060 ` 60 GRAVE ACCENT<br /> U+0061 a 61 LATIN SMALL LETTER A<br /> U+0062 b 62 LATIN SMALL LETTER B<br /> U+0063 c 63 LATIN SMALL LETTER C<br /> U+0064 d 64 LATIN SMALL LETTER D<br /> U+0065 e 65 LATIN SMALL LETTER E<br /> U+0066 f 66 LATIN SMALL LETTER F<br /> U+0067 g 67 LATIN SMALL LETTER G<br /> U+0068 h 68 LATIN SMALL LETTER H<br /> U+0069 i 69 LATIN SMALL LETTER I<br /> U+006A j 6a LATIN SMALL LETTER J<br /> U+006B k 6b LATIN SMALL LETTER K<br /> U+006C l 6c LATIN SMALL LETTER L<br /> U+006D m 6d LATIN SMALL LETTER M<br /> U+006E n 6e LATIN SMALL LETTER N<br /> U+006F o 6f LATIN SMALL LETTER O<br /> U+0070 p 70 LATIN SMALL LETTER P<br /> U+0071 q 71 LATIN SMALL LETTER Q<br /> U+0072 r 72 LATIN SMALL LETTER R<br /> U+0073 s 73 LATIN SMALL LETTER S<br /> U+0074 t 74 LATIN SMALL LETTER T<br /> U+0075 u 75 LATIN SMALL LETTER U<br /> U+0076 v 76 LATIN SMALL LETTER V<br /> U+0077 w 77 LATIN SMALL LETTER W<br /> U+0078 x 78 LATIN SMALL LETTER X<br /> U+0079 y 79 LATIN SMALL LETTER Y<br /> U+007A z 7a LATIN SMALL LETTER Z<br /> U+007B { 7b LEFT CURLY BRACKET<br /> U+007C | 7c VERTICAL LINE<br /> U+007D } 7d RIGHT CURLY BRACKET<br /> U+007E ~ 7e TILDE<br /> U+007F 7f <control><br /> U+0080 c2 80 <control><br /> U+0081 c2 81 <control><br /> U+0082 c2 82 <control><br /> U+0083 c2 83 <control><br /> U+0084 c2 84 <control><br /> U+0085 c2 85 <control><br /> U+0086 c2 86 <control><br /> U+0087 c2 87 <control><br /> U+0088 c2 88 <control><br /> U+0089 c2 89 <control><br /> U+008A c2 8a <control><br /> U+008B c2 8b <control><br /> U+008C c2 8c <control><br /> U+008D c2 8d <control><br /> U+008E c2 8e <control><br /> U+008F c2 8f <control><br /> U+0090 c2 90 <control><br /> U+0091 c2 91 <control><br /> U+0092 c2 92 <control><br /> U+0093 c2 93 <control><br /> U+0094 c2 94 <control><br /> U+0095 c2 95 <control><br /> U+0096 c2 96 <control><br /> U+0097 c2 97 <control><br /> U+0098 c2 98 <control><br /> U+0099 c2 99 <control><br /> U+009A c2 9a <control><br /> U+009B c2 9b <control><br /> U+009C c2 9c <control><br /> U+009D c2 9d <control><br /> U+009E c2 9e <control><br /> U+009F c2 9f <control><br /> U+00A0 c2 a0 NO-BREAK SPACE<br /> U+00A1 ¡ c2 a1 INVERTED EXCLAMATION MARK<br /> U+00A2 ¢ c2 a2 CENT SIGN<br /> U+00A3 £ c2 a3 POUND SIGN<br /> U+00A4 ¤ c2 a4 CURRENCY SIGN<br /> U+00A5 ¥ c2 a5 YEN SIGN<br /> U+00A6 ¦ c2 a6 BROKEN BAR<br /> U+00A7 § c2 a7 SECTION SIGN<br /> U+00A8 ¨ c2 a8 DIAERESIS<br /> U+00A9 © c2 a9 COPYRIGHT SIGN<br /> U+00AA ª c2 aa FEMININE ORDINAL INDICATOR<br /> U+00AB « c2 ab LEFT-POINTING DOUBLE ANGLE QUOTATION MARK<br /> U+00AC ¬ c2 ac NOT SIGN<br /> U+00AD c2 ad SOFT HYPHEN<br /> U+00AE ® c2 ae REGISTERED SIGN<br /> U+00AF ¯ c2 af MACRON<br /> U+00B0 ° c2 b0 DEGREE SIGN<br /> U+00B1 ± c2 b1 PLUS-MINUS SIGN<br /> U+00B2 ² c2 b2 SUPERSCRIPT TWO<br /> U+00B3 ³ c2 b3 SUPERSCRIPT THREE<br /> U+00B4 ´ c2 b4 ACUTE ACCENT<br /> U+00B5 µ c2 b5 MICRO SIGN<br /> U+00B6 ¶ c2 b6 PILCROW SIGN<br /> U+00B7 · c2 b7 MIDDLE DOT<br /> U+00B8 ¸ c2 b8 CEDILLA<br /> U+00B9 ¹ c2 b9 SUPERSCRIPT ONE<br /> U+00BA º c2 ba MASCULINE ORDINAL INDICATOR<br /> U+00BB » c2 bb RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK<br /> U+00BC ¼ c2 bc VULGAR FRACTION ONE QUARTER<br /> U+00BD ½ c2 bd VULGAR FRACTION ONE HALF<br /> U+00BE ¾ c2 be VULGAR FRACTION THREE QUARTERS<br /> U+00BF ¿ c2 bf INVERTED QUESTION MARK<br /> U+00C0 À c3 80 LATIN CAPITAL LETTER A WITH GRAVE<br /> U+00C1 Á c3 81 LATIN CAPITAL LETTER A WITH ACUTE<br /> U+00C2 Â c3 82 LATIN CAPITAL LETTER A WITH CIRCUMFLEX<br /> U+00C3 Ã c3 83 LATIN CAPITAL LETTER A WITH TILDE<br /> U+00C4 Ä c3 84 LATIN CAPITAL LETTER A WITH DIAERESIS<br /> U+00C5 Å c3 85 LATIN CAPITAL LETTER A WITH RING ABOVE<br /> U+00C6 Æ c3 86 LATIN CAPITAL LETTER AE<br /> U+00C7 Ç c3 87 LATIN CAPITAL LETTER C WITH CEDILLA<br /> U+00C8 È c3 88 LATIN CAPITAL LETTER E WITH GRAVE<br /> U+00C9 É c3 89 LATIN CAPITAL LETTER E WITH ACUTE<br /> U+00CA Ê c3 8a LATIN CAPITAL LETTER E WITH CIRCUMFLEX<br /> U+00CB Ë c3 8b LATIN CAPITAL LETTER E WITH DIAERESIS<br /> U+00CC Ì c3 8c LATIN CAPITAL LETTER I WITH GRAVE<br /> U+00CD Í c3 8d LATIN CAPITAL LETTER I WITH ACUTE<br /> U+00CE Î c3 8e LATIN CAPITAL LETTER I WITH CIRCUMFLEX<br /> U+00CF Ï c3 8f LATIN CAPITAL LETTER I WITH DIAERESIS<br /> U+00D0 Ð c3 90 LATIN CAPITAL LETTER ETH<br /> U+00D1 Ñ c3 91 LATIN CAPITAL LETTER N WITH TILDE<br /> U+00D2 Ò c3 92 LATIN CAPITAL LETTER O WITH GRAVE<br /> U+00D3 Ó c3 93 LATIN CAPITAL LETTER O WITH ACUTE<br /> U+00D4 Ô c3 94 LATIN CAPITAL LETTER O WITH CIRCUMFLEX<br /> U+00D5 Õ c3 95 LATIN CAPITAL LETTER O WITH TILDE<br /> U+00D6 Ö c3 96 LATIN CAPITAL LETTER O WITH DIAERESIS<br /> U+00D7 × c3 97 MULTIPLICATION SIGN<br /> U+00D8 Ø c3 98 LATIN CAPITAL LETTER O WITH STROKE<br /> U+00D9 Ù c3 99 LATIN CAPITAL LETTER U WITH GRAVE<br /> U+00DA Ú c3 9a LATIN CAPITAL LETTER U WITH ACUTE<br /> U+00DB Û c3 9b LATIN CAPITAL LETTER U WITH CIRCUMFLEX<br /> U+00DC Ü c3 9c LATIN CAPITAL LETTER U WITH DIAERESIS<br /> U+00DD Ý c3 9d LATIN CAPITAL LETTER Y WITH ACUTE<br /> U+00DE Þ c3 9e LATIN CAPITAL LETTER THORN<br /> U+00DF ß c3 9f LATIN SMALL LETTER SHARP S<br /> U+00E0 à c3 a0 LATIN SMALL LETTER A WITH GRAVE<br /> U+00E1 á c3 a1 LATIN SMALL LETTER A WITH ACUTE<br /> U+00E2 â c3 a2 LATIN SMALL LETTER A WITH CIRCUMFLEX<br /> U+00E3 ã c3 a3 LATIN SMALL LETTER A WITH TILDE<br /> U+00E4 ä c3 a4 LATIN SMALL LETTER A WITH DIAERESIS<br /> U+00E5 å c3 a5 LATIN SMALL LETTER A WITH RING ABOVE<br /> U+00E6 æ c3 a6 LATIN SMALL LETTER AE<br /> U+00E7 ç c3 a7 LATIN SMALL LETTER C WITH CEDILLA<br /> U+00E8 è c3 a8 LATIN SMALL LETTER E WITH GRAVE<br /> U+00E9 é c3 a9 LATIN SMALL LETTER E WITH ACUTE<br /> U+00EA ê c3 aa LATIN SMALL LETTER E WITH CIRCUMFLEX<br /> U+00EB ë c3 ab LATIN SMALL LETTER E WITH DIAERESIS<br /> U+00EC ì c3 ac LATIN SMALL LETTER I WITH GRAVE<br /> U+00ED í c3 ad LATIN SMALL LETTER I WITH ACUTE<br /> U+00EE î c3 ae LATIN SMALL LETTER I WITH CIRCUMFLEX<br /> U+00EF ï c3 af LATIN SMALL LETTER I WITH DIAERESIS<br /> U+00F0 ð c3 b0 LATIN SMALL LETTER ETH<br /> U+00F1 ñ c3 b1 LATIN SMALL LETTER N WITH TILDE<br /> U+00F2 ò c3 b2 LATIN SMALL LETTER O WITH GRAVE<br /> U+00F3 ó c3 b3 LATIN SMALL LETTER O WITH ACUTE<br /> U+00F4 ô c3 b4 LATIN SMALL LETTER O WITH CIRCUMFLEX<br /> U+00F5 õ c3 b5 LATIN SMALL LETTER O WITH TILDE<br /> U+00F6 ö c3 b6 LATIN SMALL LETTER O WITH DIAERESIS<br /> U+00F7 ÷ c3 b7 DIVISION SIGN<br /> U+00F8 ø c3 b8 LATIN SMALL LETTER O WITH STROKE<br /> U+00F9 ù c3 b9 LATIN SMALL LETTER U WITH GRAVE<br /> U+00FA ú c3 ba LATIN SMALL LETTER U WITH ACUTE<br /> U+00FB û c3 bb LATIN SMALL LETTER U WITH CIRCUMFLEX<br /> U+00FC ü c3 bc LATIN SMALL LETTER U WITH DIAERESIS<br /> U+00FD ý c3 bd LATIN SMALL LETTER Y WITH ACUTE<br /> U+00FE þ c3 be LATIN SMALL LETTER THORN<br /> U+00FF ÿ c3 bf LATIN SMALL LETTER Y WITH DIAERESIS</p> </body> </html> Link to comment https://forums.phpfreaks.com/topic/211748-man-i-have-such-a-headache-converting-utf-someone-help/ Share on other sites More sharing options...
Recommended Posts
Archived
This topic is now archived and is closed to further replies.