I seem to be having a character encoding problem. The basic situation is this: I'm using cURL to retrieve data from an external web page and then parse the result, which is a Chinese string. So far, so good.
When I echo the result, though, it comes out as � characters. Confused, I compared the output with a hard-coded string that is known to match the result exactly. See below, assuming $definition has already been set by the cURL code mentioned above.
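```php
// Print a string followed by its raw bytes in hex, separated by colons
function OrdOut($str)
{
    $out = array();
    for ($i = 0; $i < strlen($str); $i++)
    {
        // %02x pads to two hex digits so bytes below 0x10 stay unambiguous
        $out[] = sprintf("%02x", ord($str[$i]));
    }
    echo $str . "=>" . implode(":", $out) . "<br>";
}

$correct = "歌曲";
OrdOut($correct);
OrdOut($definition);
```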
This outputs the following:

```
歌曲=>e6:ad:8c:e6:9b:b2
����=>b8:e8:c7:fa
```
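The byte counts in the dump hint at a cause: 6 bytes is two 3-byte UTF-8 sequences, while 4 bytes fits a legacy encoding with two bytes per character, such as GBK/GB2312. That is an assumption, not something the page itself reported. A minimal sketch, assuming the mbstring extension is available, that tests it by re-encoding the known-good string and decoding the fetched bytes. Here $definition is rebuilt from the hex dump with hex2bin() so the sketch runs without any network access:

```php
<?php
// Hypothesis (assumption): the remote page is encoded as GBK, not UTF-8.
$correct = "歌曲";                   // this source file is saved as UTF-8
$definition = hex2bin('b8e8c7fa');  // bytes copied from the dump above

// Encode the known-good UTF-8 string as GBK and compare the raw bytes.
$correctAsGbk = mb_convert_encoding($correct, 'GBK', 'UTF-8');
var_dump(bin2hex($correctAsGbk) === bin2hex($definition));

// Go the other way: decode the fetched bytes as GBK into UTF-8.
$decoded = mb_convert_encoding($definition, 'UTF-8', 'GBK');
var_dump($decoded === $correct);
```

If both checks print bool(true), converting the cURL result with mb_convert_encoding($definition, 'UTF-8', 'GBK') before echoing it would likely make the � characters go away; iconv('GBK', 'UTF-8', $definition) is an equivalent alternative.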
$definition SHOULD match $correct exactly, but it doesn't. I suspect this has something to do with cURL and Chinese text, possibly because the page being retrieved sends incorrect headers. What strikes me as curious is the difference in length between the two strings (6 bytes versus 4), which seems to point to different encodings, but I could be completely off.
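To check whether the headers are the problem, one option is to ask cURL which Content-Type the server actually sent. A sketch under stated assumptions: fetchWithCharset() and charsetFromContentType() are hypothetical helper names, $url stands for whatever page is being scraped, and the page may instead (or also) declare its charset in an HTML meta tag, which this does not inspect:

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
function charsetFromContentType($contentType): ?string
{
    // curl_getinfo() yields null (or false) when no valid header was sent
    if (!is_string($contentType)) {
        return null;
    }
    if (preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: fetch a page, return [body, declared charset or null].
function fetchWithCharset(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    $body = curl_exec($ch);
    // Content-Type response header as reported by cURL
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return [$body === false ? '' : $body, charsetFromContentType($type)];
}
```

For example, [$html, $charset] = fetchWithCharset($url); followed by converting $html with mb_convert_encoding($html, 'UTF-8', $charset) when $charset is set and not UTF-8 would normalize the body before parsing, provided mbstring recognizes the declared charset name.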
I appreciate any help.