
Transform UTF-8 (Latin: ISO-8859-1, ISO-8859-2, ISO-8859-16) to plain ASCII


gls2ro


Hello guys,

 

I want to convert special Latin characters with diacritics (see the Diacritics section at http://www.utexas.edu/learn/html/spchar.html) to their corresponding letters without diacritics, for example:

Ă - should be A

ă - should be a

Ş - should be S

ş - should be s

...

 

I'm doing this after reading some HTML pages using cURL.

 

What I've done so far is:

1. Make sure the HTML is in UTF-8, converting it with

mb_convert_encoding($html, "UTF-8", mb_detect_encoding($html, "UTF-8, ISO-8859-1, ISO-8859-2, ISO-8859-15, ISO-8859-16", true));

2. Then strip the diacritics from the UTF-8 string using the following function:

 
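function clearUTF($s)
{
    $r = '';
    // Transliterate the whole string to ASCII; unmapped characters may come back as '?'.
    $s1 = iconv('UTF-8', 'ASCII//TRANSLIT', $s);
    for ($i = 0; $i < strlen($s1); $i++) {
        $ch1 = $s1[$i];              // byte $i of the transliterated string
        $ch2 = mb_substr($s, $i, 1); // character $i of the original string

        // Keep the original character where transliteration produced '?'.
        $r .= ($ch1 == '?') ? $ch2 : $ch1;
    }
    return $r;
}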
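
One possible weak spot in the function above: `$i` is a byte offset into the transliterated string but a character offset into the original, so the two drift apart as soon as `iconv` turns one character into several ASCII characters. Also, `mb_substr` without an explicit encoding relies on the configured internal encoding. A minimal sketch that sidesteps both by transliterating one character at a time is shown here. The name `clearUtfPerChar` and the `$overrides` parameter are illustrative assumptions, not part of the original code, and the output of `//TRANSLIT` can still depend on the iconv implementation and, on glibc, on the current `LC_CTYPE` locale.

```php
<?php
// Hypothetical helper (not from the original post): per-character transliteration.
function clearUtfPerChar(string $s, array $overrides = []): string
{
    // Split on UTF-8 character boundaries instead of bytes.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $s; // input is not valid UTF-8; leave it untouched
    }

    $out = '';
    foreach ($chars as $ch) {
        if (strlen($ch) === 1) {          // single byte means plain ASCII already
            $out .= $ch;
            continue;
        }
        if (isset($overrides[$ch])) {     // caller-supplied mapping takes priority
            $out .= $overrides[$ch];
            continue;
        }
        $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $ch);
        // Keep the original character when iconv fails or gives up with '?'.
        $out .= ($ascii === false || $ascii === '' || $ascii === '?') ? $ch : $ascii;
    }
    return $out;
}

// Usage: an explicit map gives the same result regardless of iconv or locale.
echo clearUtfPerChar("\u{015E}\u{0103}", ["\u{015E}" => 'S', "\u{0103}" => 'a']), "\n";
```

Because each character is handled on its own, a multi-character transliteration of one letter cannot shift the positions of the letters after it.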

 

But for one particular HTML page it does not work as expected.

 

For this HTML page the encoding returned by mb_detect_encoding is ISO-8859-1.
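
A detected value of ISO-8859-1 may not mean much: nearly any byte sequence is valid in the single-byte ISO-8859 encodings, so telling them apart by detection is largely guesswork, and a page that is really ISO-8859-2 would then be decoded with the wrong letters. A minimal sketch of an alternative, assuming the page declares its charset in the `Content-Type` header, is below. The helper names `charsetFromContentType` and `toUtf8` and the ISO-8859-1 fallback are assumptions for illustration only.

```php
<?php
// Hypothetical helpers; the names and the fallback charset are assumptions.

// Extract the charset from a value such as "text/html; charset=ISO-8859-2".
function charsetFromContentType(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Convert to UTF-8 using the declared charset instead of detection.
function toUtf8(string $html, ?string $declared = null, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html; // already valid UTF-8, nothing to convert
    }
    return mb_convert_encoding($html, 'UTF-8', $declared ?? $fallback);
}

// The same byte decodes to a different letter depending on the charset:
$byte = "\xAA";
echo toUtf8($byte, 'ISO-8859-1'), "\n"; // ª (U+00AA)
echo toUtf8($byte, 'ISO-8859-2'), "\n"; // Ş (U+015E)
```

With cURL, the header value can be read after the request with `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)` and passed through `charsetFromContentType`. A `<meta>` charset declaration inside the HTML would need separate handling, which this sketch does not cover.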

 

Do you have any ideas about what I should try?

 

Thank you

Archived

This topic is now archived and is closed to further replies.
