array utf8 display problem


Solved by requinix

Hi,

 
So I have this code...
The problem is that $data is correctly UTF-8: when I echo it, I see the correct string. However, when I pass it to $name, the encoding breaks.
If I then display this new array, it shows question marks inside black diamonds.
 
Could you please help me debug this code?
The PHP file has this in the header:
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">
<?php
$data = array();
$inc = 0;
$handle = @fopen("content_realisations.php", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        $data[$inc] = ($buffer);
        $inc = $inc+1;
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

?>



<?php

//for ($i=0; $i<$inc; $i++){
$name = get_data($data, $inc);
//echo utf8_encode($name[5][2]);
echo $name[5][2];
// echo $values[1] . "<br>";
// echo $values[2] . "<br>";
  echo '<pre>'; print_r($name); echo '</pre>';
//}

//echo $inc; $length = strlen(utf8_decode($data[22])); echo $length . "<br>"; echo $data[22][$length-3];

function get_data($data, $inc){
    for ($row=0; $row<$inc; $row++){
        if ($mode == 0){
            $z=0; $y=0; $w=0;
            $data2 = array();
            for($i=0 ; $i< utf8_decode(strlen($data[$row])) ; $i++){
                if (($data[$row][$i] == '>') and ($z < 3)){
                    $z++;
                    $data_start = $i;
                    //echo $i . "<br>";
                }
                if (($data[$row][$i] == '<') and ($y < 4)){
                    $y++;
                    $data_end = $i;
                    //echo $i . "<br>";
                }
                if ($data[$row][$i] == '"'){
                    $data2[$w] = $i;
                    $w++;
                }
            }
            $file = substr($data[$row], $data2[2]+1, $data2[3]-$data2[2]-1);
            $thumb = substr($data[$row], $data2[6]+1, $data2[7]-$data2[6]-1);
            $id = substr($data[$row], $data2[0]+1, $data2[1]-$data2[0]-1);

            //echo $id . "<br>";echo $file . "<br>";echo $thumb . "<br>";echo '<pre>'; print_r($data2); echo '</pre>';
            $s = $data_start+1;
            //echo $z . "<br>"; echo $y . "<br>"; echo $row . "<br>";
            if ($s < $data_end and $z!=0 and $y!=0 and $id == "im")  {
                //$name[$row][0] = $row; //must change the index !!!!
                while($s != $data_end){
                    $name[$row][$s-$data_start] = $data[$row][$s];
                    echo $data[$row][$s];
                    $s++;
                }
            }
        }

        $length = strlen(utf8_decode($data[$row]));
        $a1=$data[$row][0] . $data[$row][1] . $data[$row][2];
        //echo '<pre>'; print_r($a1); echo '</pre>';
        //echo gettype($a1[0]), "\n";
        $is_match = (similar_text($a1, "<!-") == 3) ;
        if ($is_match == 1){
            //echo "1";
            $mode = 1;
        }else{
            if (similar_text($a1, "-->") == 3 or similar_text($data[$row][$length-3] . $data[$row][$length-2], "->") == 2){
                $mode = 0;
            }
            //echo "0";
        }
        //echo $file . "<br>";
        //echo $thumb . "<br>";
    }
    return $name;
}


?>

 


$i< utf8_decode(strlen($data[$row]))

That does not make sense. strlen() gives you the number of bytes in the string, and then you utf8_decode() that number?

 

With multibyte strings you cannot use functions like strlen() or even use offsets, like [$i], because they work on bytes, not characters. You also should not be utf8_decode()ing the string: what actually happens is that PHP converts it from UTF-8 to ISO 8859-1, and you'll lose any characters that encoding cannot represent.
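
To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the mbstring extension is available (it is not used anywhere in the original code), and it spells "Hôtel" with explicit byte escapes so the result does not depend on how the file is saved.

```php
<?php
// "Hôtel" written with explicit UTF-8 bytes: "ô" is the two bytes C3 B4.
$word = "H\xC3\xB4tel";

$byteLength = strlen($word);              // counts bytes
$charLength = mb_strlen($word, 'UTF-8');  // counts characters (needs mbstring)

$byteSlice = substr($word, 0, 2);             // "H" plus only the first half of "ô"
$charSlice = mb_substr($word, 0, 2, 'UTF-8'); // "H" plus the whole "ô"

echo $byteLength, " bytes, ", $charLength, " characters\n"; // 6 bytes, 5 characters
echo bin2hex($byteSlice), " vs ", bin2hex($charSlice), "\n"; // 48c3 vs 48c3b4
```

The byte-based slice ends in the middle of "ô", which is the kind of broken output a browser renders as a question mark in a black diamond.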

 

The whole function needs to be rewritten. Can't use strlen, substr, offsets, utf8_decode... Think you can handle that?
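
One possible direction for such a rewrite, as a rough sketch only: let PCRE with the "u" modifier do the scanning and the splitting, so nothing touches individual bytes. The helper names utf8_chars() and text_between_tags() and the sample line are made up for illustration; the real format of content_realisations.php is not shown in this thread, and this does not reproduce the original function's quote and comment handling.

```php
<?php
/**
 * Split a UTF-8 string into an array of whole characters.
 * The "u" modifier makes PCRE treat the subject as UTF-8, so the empty
 * pattern matches between characters rather than between bytes.
 */
function utf8_chars($text)
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $chars === false ? array() : $chars; // false means invalid UTF-8
}

/**
 * Return the first non-empty run of text between ">" and "<" on a line,
 * or null when there is none. Hypothetical stand-in for the scanning loop.
 */
function text_between_tags($line)
{
    if (preg_match('/>([^<>]+)</u', $line, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Hypothetical input line, assumed for this example only.
$line  = "<a id=\"im\" href=\"a.jpg\">H\xC3\xB4tel N'vy Gen\xC3\xA8ve</a>\n";
$name  = text_between_tags($line);
$chars = utf8_chars($name);

echo $chars[1], "\n"; // a whole "ô", not half of one
```

If you only need the name as a string, you can skip utf8_chars() entirely and keep the captured text as it is; splitting into an array is shown only because the original code builds one.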

Thanks for the feedback. I am quite lost then... Why do all the manipulations on the strings work, but when I get to that particular array $name everything gets ruined? Below you can see a copy-paste of what I am seeing. The first string is echoed from $data and is shown correctly; the last item, with the question mark, is $name[5][2].

$name is the array that is printed below.

 

I agree about the utf8_decode(strlen($data[$row]))... it was a leftover from some tests.

 

C WhitePure Club Med Gym BastilleCitadines Louvre SuiteHôtel N'vy GenèveSofitel Casablanca Tour BlancheRestaurant L'instant d'Or, ParisClub Med ValmorelThalazur Cabourg�

 

Array
(
[2] => Array
(
[1] => C
[2] =>
[3] => W
[4] => h
[5] => i
[6] => t
[7] => e
)

[3] => Array
(
[1] => P
[2] => u
[3] => r
[4] => e
[5] =>
[6] => C
[7] => l
[8] => u
[9] => b
[10] =>
[11] => M
[12] => e
[13] => d
[14] =>
[15] => G
[16] => y
[17] => m
[18] =>
[19] => B
[20] => a
[21] => s
[22] => t
[23] => i
[24] => l
[25] => l
[26] => e
)

[4] => Array
(
[1] => C
[2] => i
[3] => t
[4] => a
[5] => d
[6] => i
[7] => n
[8] => e
[9] => s
[10] =>
[11] => L
[12] => o
[13] => u
[14] => v
[15] => r
[16] => e
[17] =>
[18] => S
[19] => u
[20] => i
[21] => t
[22] => e
)

[5] => Array
(
[1] => H
[2] => �
[3] => �
[4] => t
[5] => e
[6] => l
[7] =>
[8] => N
[9] => '
[10] => v
[11] => y
[12] =>
[13] => G
[14] => e
[15] => n
[16] => �
[17] => �
[18] => v
[19] => e
)

[6] => Array
(
[1] => S
[2] => o
[3] => f
[4] => i
[5] => t
[6] => e
[7] => l
[8] =>
[9] => C
[10] => a
[11] => s
[12] => a
[13] => b
[14] => l
[15] => a
[16] => n
[17] => c
[18] => a
[19] =>
[20] => T
[21] => o
[22] => u
[23] => r
[24] =>
[25] => B
[26] => l
[27] => a
[28] => n
[29] => c
[30] => h
[31] => e
)

[7] => Array
(
[1] => R
[2] => e
[3] => s
[4] => t
[5] => a
[6] => u
[7] => r
[8] => a
[9] => n
[10] => t
[11] =>
[12] => L
[13] => '
[14] => i
[15] => n
[16] => s
[17] => t
[18] => a
[19] => n
[20] => t
[21] =>
[22] => d
[23] => '
[24] => O
[25] => r
[26] => ,
[27] =>
[28] => P
[29] => a
[30] => r
[31] => i
[32] => s
)

[8] => Array
(
[1] => C
[2] => l
[3] => u
[4] => b
[5] =>
[6] => M
[7] => e
[8] => d
[9] =>
[10] => V
[11] => a
[12] => l
[13] => m
[14] => o
[15] => r
[16] => e
[17] => l
)

[9] => Array
(
[1] => T
[2] => h
[3] => a
[4] => l
[5] => a
[6] => z
[7] => u
[8] => r
[9] =>
[10] => C
[11] => a
[12] => b
[13] => o
[14] => u
[15] => r
[16] => g
)

)

  • Solution

You're asking why everything seems to be working up until the point that it doesn't work?

 

As I said, offsets and those functions work on individual bytes. Characters in UTF-8 strings can be one byte (like in most of those names) but they could be up to four bytes. The code will break on strings that have any of those.
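
A small sketch of why the echoed line looks fine while the array does not, again assuming mbstring is available for the validity check. Echoing the bytes one after another puts the two halves of "ô" next to each other again, but print_r() puts "[2] => " and "[3] => " between them.

```php
<?php
$word = "H\xC3\xB4tel";   // "Hôtel"; "ô" is the bytes C3 B4

$first  = $word[1];       // byte C3, the first half of "ô"
$second = $word[2];       // byte B4, the second half of "ô"

// Neither byte is valid UTF-8 on its own, so each is displayed as a
// replacement character when printed separately.
var_dump(mb_check_encoding($first, 'UTF-8'));   // bool(false)
var_dump(mb_check_encoding($second, 'UTF-8'));  // bool(false)

// Printed back to back, the bytes are adjacent again and form a valid "ô".
var_dump(mb_check_encoding($first . $second, 'UTF-8')); // bool(true)
```

That matches the output above: $name[5][2] and $name[5][3] each hold one byte of the "ô" in "Hôtel".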
