decode html entity (decimal)


dsaba

i have another question related to my main goal of doing this string manipulation

-------------------------------------------------------------------------------

TERMS: number-thingy = &#5555

number-unit = &#5555&#5555&#5555

--------------------------------------------------------------------------------

there may be an easier way to do this, but I had no success doing it my way. I'll tell you what I think the solution is, what I tried, and how it didn't work

 

&#1497&#1510&#1493&#1512 &#1502&#1493&#1513&#1500&#1501 Perfect Creature

 

this was originally Hebrew and English (the English remains intact, while each Hebrew word has become a number-unit)

 

I want to write this string onto an image with the imagettftext(); function

I have successfully done that when the string looks like this:

יצור מושלם Perfect Creature

 

as you can see now the string has become:

&#1497&#1510&#1493&#1512 &#1502&#1493&#1513&#1500&#1501 Perfect Creature

 

so I'm thinking I need to "decode" the number-units back into their Hebrew characters, as that's the only way the imagettftext() function will take it

 

so i research this encoding format and I find this out:

Name HEBREW LETTER FINAL MEM

Block Hebrew

Category Letter, Other [Lo]

Combine 0

BIDI Right-to-Left [R]

Mirror N

Version Unicode 1.1.0 (June, 1993)

Encodings

HTML Entity (decimal) &#1501;

HTML Entity (hex) &#x5DD;

How to type in Microsoft Windows Alt +05DD

 

UTF-8 (hex) 0xD7 0x9D (d79d)

UTF-8 (binary) 11010111:10011101

UTF-16 (hex) 0x05DD (05dd)

UTF-16 (decimal) 1,501

UTF-32 (hex) 0x000005DD (05dd)

UTF-32 (decimal) 1,501

C/C++/Java source code "\u05DD"

Python source code u"\u05DD"
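
To make the link between the decimal entity and the UTF-8 bytes in the table concrete, here is a minimal sketch of the arithmetic. The function name `codepoint_to_utf8` is made up for illustration and is not part of PHP; it assumes PHP 7 or later for the type hints.

```php
<?php
// Hypothetical helper (not a PHP built-in): encode one Unicode code point as UTF-8.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx (the Hebrew block falls in this range)
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// 1501 is the decimal entity value for HEBREW LETTER FINAL MEM (U+05DD).
echo bin2hex(codepoint_to_utf8(1501)), "\n"; // d79d
```

The result matches the "UTF-8 (hex)" row above: the entity number is just the code point in decimal, and UTF-8 spreads its bits over two bytes.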

 

 

so now I know that the number-thingy is in fact encoded in html entity (decimal)

so now I try to do this:

html_entity_decode($string); - does not work

htmlspecialchars_decode($string) - does not work

 

 

so my question is: how do I successfully convert the encoded Hebrew words back into the UTF-8 Hebrew characters they were in the first place?

 

if you're going to suggest that a browser will interpret and decode these characters for me and then display them, I am well aware of that; however, it is not in the browser where I need the Hebrew characters to display

 

-It is in the actual PHP script, because I need to feed the imagettftext() function with it; it is not a browser and does not interpret encoded Hebrew characters

 

-ANY THOUGHTS??? ----------thanks a bunch

 

**EDIT

NOTE: If you're testing this out, you need to view the source of the .php page. If you see the HTML entities in the source, it DID NOT WORK; if you see some kind of gibberish or Hebrew characters in the SOURCE, it did work
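
The same check can be done inside the script instead of by viewing the page source. This is only a sketch; `has_numeric_entities` is a hypothetical name, and the pattern assumes the entities look like the ones in this thread (decimal or hex, with or without the closing `;`).

```php
<?php
// Hypothetical check: true if the string still contains numeric entities,
// i.e. the decode step did not work.
function has_numeric_entities(string $text): bool
{
    return preg_match('/&#(x[0-9a-fA-F]+|[0-9]+);?/', $text) === 1;
}

var_dump(has_numeric_entities('&#1501 Perfect Creature')); // bool(true)
```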


Found this on php.net in the user contrib. Maybe this will work?

 

 

<?php
// also try with get_html_translation_table(HTML_ENTITIES)  instead see if that works.
function my_htmlspecialchars_decode($text) {
       return strtr($text, array_flip(get_html_translation_table(HTML_SPECIALCHARS)));
}

?>

 

Worth a shot.

 

 


i'm completely and utterly confused

from reading on php.net various notes and comments

I do understand that decoding UTF-8 from html entities cannot be done simply by using the html_entity_decode function like I did earlier

 

It requires many more complications and exceptions. I've tried at least three different custom functions people posted on php.net and none of them work for me; the string is still in HTML entities after I send it through the function.

 

in order for me to begin to understand HOW to properly decode HTML entities into UTF-8, I need to understand WHY you can't simply use the html_entity_decode function by itself

and WHY such custom functions (which do not work for me!) are necessary

 

can anyone shed some light?

if it matters, the language that was encoded in html entities is HEBREW and was UTF-8 originally
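
For what it's worth, a minimal sketch of one way this could be handled. As far as I know, before PHP 5.4 html_entity_decode() defaulted to ISO-8859-1 when no charset was passed, and that charset has no Hebrew letters, so the entities had nothing to decode into. It also appears to skip numeric references that lack the closing `;`, which is how the strings in this thread are written. The helper name `decode_numeric_entities` is made up for illustration, and the semicolon repair is an assumption about the input rather than something the manual prescribes.

```php
<?php
// Hypothetical helper: decode numeric entities to UTF-8, tolerating a
// missing trailing semicolon (assumed to be the case for the input here).
function decode_numeric_entities(string $text): string
{
    // Add the terminating ';' to numeric references that lack it.
    // The possessive quantifiers (++) stop the regex from backtracking
    // into the middle of a number that is already terminated correctly.
    $text = preg_replace('/&#(x[0-9a-fA-F]++|[0-9]++)(?!;)/', '&#${1};', $text);

    // Ask for UTF-8 output explicitly instead of relying on the default.
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

// HEBREW LETTER FINAL MEM, written without the semicolon as in this thread.
echo bin2hex(decode_numeric_entities('&#1501')), "\n"; // d79d
```

The decoded string can then be passed on to imagettftext(). Whether that alone is enough will depend on the font and on how the installed GD build handles right-to-left text.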


I have had success with this one (from the html_entity_decode() manual page comments) for decoding greek characters:

 

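<?php
    // Convert one matched decimal entity (e.g. &#1501;) to its UTF-8 bytes.
    function utf8_replaceEntity($result){
        $value = (int)$result[1];

        // Plain ASCII is a single byte and needs no marker bits.
        if ($value < 0x80) return chr($value);

        // Number of bytes the code point needs in UTF-8.
        if ($value < 0x800) $len = 2;
        elseif ($value < 0x10000) $len = 3;
        elseif ($value <= 0x10FFFF) $len = 4;
        else return $result[0]; // not a valid code point, leave the entity as it is

        $string = '';

        for($i=$len;$i>0;$i--){
            if ( $i == 1 ) {
                // Lead byte: $len high bits set, then the remaining value bits.
                $part = ((0xFF << (8-$len)) & 0xFF) | $value;
            } else {
                // Continuation byte: 10xxxxxx carrying the low 6 bits.
                $part = 0x80 | ($value & 0x3F);
            }

            $string = chr($part) . $string;

            $value >>= 6;
        }

        return $string;
    }

    // Replace every decimal entity that ends in ';' with its UTF-8 character.
    function utf8_html_entity_decode($string){
        return preg_replace_callback(
            '/&#([0-9]+);/',
            'utf8_replaceEntity',
            $string
        );
    }

    $string = '&#8217;&#8216; &#8211; &#8220; &#8221;'
        .'&#61607; &#263; &#324; &#345;'
    ;
    $string = utf8_html_entity_decode($string);

    header('Content-Type: text/html; charset=UTF-8');
    echo '<li>'.$string;
?>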
