Site in 5 languages: problem converting special characters to ASCII entities


I have developed a website in 5 languages and now have to build a content management system to manage the content.

I wanted to build a function that replaces special characters with their ASCII numeric codes (&#...;), so that everything in my DB is clean and I get no surprises when the text is published. Below is part of the list of characters I need to replace. I have tried various functions for this, but none of them worked for all 5 languages I use (en/de/it/hu/ro):

[ è ]    &#232;     e grave
[ é ]    &#233;     e acute
[ Á ]    &#193;     A acute
[ À ]    &#192;     A grave
[ á ]    &#225;     a acute
[ à ]    &#224;     a grave
[ ì ]    &#236;     i grave
[ í ]    &#237;     i acute
[ ò ]    &#242;     o grave
[ ó ]    &#243;     o acute
[ ő ]    &#337;     o double acute (Hungarian)
[ Ó ]    &#211;     O acute
[ ù ]    &#249;     u grave
[ ú ]    &#250;     u acute
[ ű ]    &#369;     u double acute (Hungarian)
[ ü ]    &#252;     u uml
[ ë ]    &#235;     e uml
[ ö ]    &#246;     o uml
[ Ö ]    &#214;     O uml
[ Ü ]    &#220;     U uml
[ ä ]    &#228;     a uml
[ Ä ]    &#196;     A uml
[ ß ]    &#223;     sz ligature (sharp s)
[ - ]    &#8722;    minus sign
[ ~ ]    &#8764;    tilde sign
[ " ]    &#34;      quot, quotation mark
[ < ]    &#60;      less than
[ > ]    &#62;      greater than
[ ´ ]    &#180;     acute
[ ' ]    &#180;     acute

Does anybody know a method that solves this reliably?

 

Thanks in advance. I have been searching for an answer to this problem for quite a while. It was easy with 3 languages (it/de/en), but when I added the East European languages (hu/ro) it all became a real headache.

I tried these functions:

function htmlnumericentities($str){
  return preg_replace('/[^!-%\x27-;=?-~ ]/e', '"&#".ord("$0").chr(59)', $str);
}

function numericentitieshtml($str){
  return utf8_encode(preg_replace('/&#(\d+);/e', 'chr(str_replace(";","",str_replace("&#","","$0")))', $str));
}

but they still don't recognize the char

&#238;

î
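
A likely reason the two functions above fail, assuming the input is UTF-8 as the page's meta tag suggests: without the /u modifier the pattern matches single bytes, and ord() returns one byte value, so a two-byte character such as ő comes out as two entities instead of &#337;. The /e modifier was also removed in PHP 7. Below is a minimal sketch of an alternative based on the mbstring functions mb_encode_numericentity() and mb_decode_numericentity(). The helper names are hypothetical, and the sketch assumes the mbstring extension is loaded and PHP 7 or later.

```php
<?php
// Hypothetical helpers; assume valid UTF-8 input and the mbstring extension.

// Convert map: [first code point, last code point, offset, mask].
// Only code points from U+0080 upward are touched, so plain ASCII
// (including any HTML markup in the text) is left alone.
const NON_ASCII_MAP = [0x80, 0x10FFFF, 0, 0x1FFFFF];

function to_numeric_entities(string $text): string
{
    return mb_encode_numericentity($text, NON_ASCII_MAP, 'UTF-8');
}

function from_numeric_entities(string $text): string
{
    // Entities below 128, such as &#38; or &#60;, stay encoded.
    return mb_decode_numericentity($text, NON_ASCII_MAP, 'UTF-8');
}

echo to_numeric_entities("kezd\u{0151}dik"), "\n"; // kezd&#337;dik
```

Because the map starts at U+0080, this should work the same way for all five languages without a hand-written list of characters.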

You could store them in the database as the code and convert them back when retrieving them, e.g.:

 

<?php
// Map each placeholder to the character it stands for
$charArr = array("char" => "î");

$input = "char and then there was char";

foreach ($charArr as $char => $val) {
    $input = str_replace($char, $val, $input);
}

// place input into DB

// now grab input from DB
$output = "î and then there was î";

foreach ($charArr as $char => $val) {
    // convert each stored value back to its placeholder
    $output = str_replace($val, $char, $output);
}

print $output;
?>

 

I may be missing the point, but I think it would work?

The problem is that I already have all the chars in the DB converted to ASCII.

So I want something that converts every special character (letter) to an ASCII entity &#...;

I've made this function, which replaces only what I need:

function formatare($text) {
    $text = str_replace("è", "&#232;", $text); // e grave
    $text = str_replace('é', '&#233;', $text); // e acute
    $text = str_replace('Á', '&#193;', $text); // A acute
    $text = str_replace('À', '&#192;', $text); // A grave
    $text = str_replace('á', '&#225;', $text); // a acute
    $text = str_replace('à', '&#224;', $text); // a grave
    $text = str_replace('ì', '&#236;', $text); // i grave
    $text = str_replace('í', '&#237;', $text); // i acute
    $text = str_replace('ò', '&#242;', $text); // o grave
    $text = str_replace('ó', '&#243;', $text); // o acute
    $text = str_replace('Ó', '&#211;', $text); // O acute
    $text = str_replace('ù', '&#249;', $text); // u grave
    $text = str_replace('ú', '&#250;', $text); // u acute
    $text = str_replace('ë', '&#235;', $text); // e uml
    $text = str_replace('ö', '&#246;', $text); // o uml
    $text = str_replace('Ö', '&#214;', $text); // O uml
    $text = str_replace('ü', '&#252;', $text); // u uml
    $text = str_replace('Ü', '&#220;', $text); // U uml
    $text = str_replace('ä', '&#228;', $text); // a uml
    $text = str_replace('Ä', '&#196;', $text); // A uml
    $text = str_replace('ß', '&#223;', $text); // sz ligature (sharp s)
    //$text = str_replace('-', '&#8722;', $text); // minus sign
    $text = str_replace('~', '&#8764;', $text); // tilde sign
    $text = str_replace("´", "&#180;", $text); // acute
    $text = str_replace("'", "&#180;", $text); // apostrophe, stored as acute
    return $text;
}
			   

But there are still some chars, like &#337; (ő), that it does not see. If I add this char to my function like this:

$text = str_replace("ő", "&#337;", $text); // o double acute

the function does not recognize the char and does not replace it. Instead it replaces every "a" char, and that is something I do not want.
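
One possible cause, stated here as an assumption rather than a diagnosis: if the PHP source file is saved in an encoding that cannot represent ő (ISO-8859-1, for example), the editor may silently substitute another character in the string literal, so the search string is no longer ő at all. A minimal sketch that sidesteps the file encoding by writing the characters as "\u{...}" escapes (PHP 7 or later) and doing all replacements in one strtr() pass. The function name formatare_utf8 is hypothetical, and the map is shortened to a few entries for illustration.

```php
<?php
// Hypothetical variant of formatare(); assumes PHP 7+ and UTF-8 input.
function formatare_utf8(string $text): string
{
    // "\u{...}" escapes always yield UTF-8 bytes, whatever encoding
    // this source file itself is saved in.
    $map = [
        "\u{00E1}" => '&#225;', // a acute
        "\u{00E9}" => '&#233;', // e acute
        "\u{00F6}" => '&#246;', // o uml
        "\u{0151}" => '&#337;', // o double acute (Hungarian)
        "\u{0171}" => '&#369;', // u double acute (Hungarian)
    ];
    // strtr() with an array applies every replacement in a single pass.
    return strtr($text, $map);
}

echo formatare_utf8("kezd\u{0151}dik k\u{00F6}rny\u{00E9}k\u{00E9}n"), "\n";
// kezd&#337;dik k&#246;rny&#233;k&#233;n
```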

I thought the problem might be my charset declaration:

<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8"/>

I tried different declarations, such as the East European iso-8859-2, but nothing changed. I don't really understand it; maybe PHP doesn't recognize this type of char.
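
Changing the meta tag only tells the browser how to read the page; it does not change the bytes PHP receives or stores. A minimal sketch of normalizing incoming text to UTF-8 before any replacement is done. The function name and the ISO-8859-2 fallback are assumptions made for illustration; the real source encoding depends on how the form data is submitted.

```php
<?php
// Hypothetical helper; assumes the mbstring extension is loaded and that
// non-UTF-8 input is ISO-8859-2 (an assumption, adjust as needed).
function normalize_to_utf8(string $raw, string $fallback = 'ISO-8859-2'): string
{
    // Valid UTF-8 passes through unchanged.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise reinterpret the bytes using the fallback charset.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// 0xF5 is "ő" in ISO-8859-2 and is not a valid byte in UTF-8.
$utf8 = normalize_to_utf8("kezd\xF5dik");
var_dump($utf8 === "kezd\u{0151}dik");
```

Once every string is known to be UTF-8, a single conversion routine can serve all five languages.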

This is the DB text in Hungarian:

Show fesztiv&#225;l szilveszter &#233;jszak&#225;n Jesol&#243;ban
Rendezv&#233;nyek Jesol&#243;ban
Szilveszter &#233;jszak&#225;n zen&#233;s, var&#225;zslatos mix v&#225;rja a l&#225;togat&#243;kat Jesol&#243;ban, a Show&#8722;fesztiv&#225;l alkalm&#225;b&#243;l, a rendezv&#233;nyt Jesol&#243; v&#225;ros polg&#225;rmesteri hivatala szponzor&#225;lja &#233;s a R&#225;di&#243; Birikina, valamint a R&#225;di&#243; Bella&#38;Monella szervez&#233;se. A fesztiv&#225;l december 2006/12/31&#8722;&#233;n  22,00 &#243;rakor kezd&#337;dik a Milan&#243;&#8722;i piac k&#246;rny&#233;k&#233;n Jesol&#243; Lid&#243;ban.A bel&#233;p&#233;s d&#237;jtalan.

And this is the output:

Show fesztivál szilveszter éjszakán Jesolóban Rendezvények Jesolóban Szilveszter éjszakán zenés, varázslatos mix várja a látogatókat Jesolóban, a Show−fesztivál alkalmából, a rendezvényt Jesoló város polgármesteri hivatala szponzorálja és a Rádió Birikina, valamint a Rádió Bella&Monella szervezése. A fesztivál december 2006/12/31−én 22,00 órakor kezdődik a Milanó−i piac környékén Jesoló Lidóban.A belépés díjtalan.

This is the DB text in Italian:

La notte di San Silvestro a Jesolo si festeggia con la musica e la magia del Festival Show, manifestazione promossa dal Comune di Jesolo ed organizzata da Radio Birikina e Radio Bella &#38; Monella. L&#180;evento si terr&#224; dalle ore 22.00 in Piazza Mazzini sabato 31 dicembre. L&#180;ingresso sar&#224; gratuito. 

La notte di San Silvestro a Jesolo si festeggia con la musica e la magia del Festival Show, manifestazione promossa dal Comune di Jesolo ed organizzata da Radio Birikina e Radio Bella & Monella. L´evento si terrà dalle ore 22.00 in Piazza Mazzini sabato 31 dicembre. L´ingresso sarà gratuito.

 

All these texts are available in 5 languages. To be safe, I convert all the chars to ASCII.

 

As you can see, the form here in the forum does exactly this kind of character conversion, and I want something similar. Some pages contain HTML, so I don't need something that converts the HTML markup to ASCII, only the chars (letters). I see that the charset here is iso-8859-1, also called latin1.

 

The text I posted above is shown like this:

<div class="quote">Show fesztivál szilveszter éjszakán Jesolóban Rendezvények Jesolóban Szilveszter éjszakán zenés, varázslatos mix várja a látogatókat Jesolóban, a Show&#8722;fesztivál alkalmából, a rendezvényt Jesoló város polgármesteri hivatala szponzorálja és a Rádió Birikina, valamint a Rádió Bella&Monella szervezése. A fesztivál december 2006/12/31&#8722;én 22,00 órakor kezd&#337;dik a Milanó&#8722;i piac környékén Jesoló Lidóban.A belépés díjtalan.</div>

As you can see, when I submit unusual text they replace only a few chars: only chars like &#337; (ő) that are not in the iso-8859-1 charset but are in iso-8859-2, which is why those get replaced with ASCII entities.
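
The behaviour described above (keep everything that fits in iso-8859-1, encode the rest) can be sketched with the same mbstring call by starting the convert map at U+0100, the first code point beyond Latin-1. This is an illustration of the idea, not the forum software's actual code; the function name is hypothetical and UTF-8 input is assumed.

```php
<?php
// Hypothetical helper; assumes UTF-8 input and the mbstring extension.
function encode_beyond_latin1(string $text): string
{
    // Code points U+0000..U+00FF exist in ISO-8859-1 and are kept as-is;
    // everything from U+0100 upward becomes a decimal entity.
    $convmap = [0x100, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

// ő (U+0151) is encoded, ó (U+00F3) is left alone.
echo encode_beyond_latin1("kezd\u{0151}dik Jesol\u{00F3}"), "\n";
```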
