I'm using the following function to convert accented characters to their non-accented equivalents. However, the replacement is failing, and the accented characters are simply being stripped by the catch-all preg_replace at the end of the script. Any ideas?

function cleanFileName($str)
{	$accent_array = array(
    'e' => array('é','è','ê','ë'),
    'E' => array('É','È','Ê','Ë'),
    'a' => array('á','à','â','ä','å','ª'),
    'A' => array('Á','À','Â','Ä','Å'),
    'i' => array('ì','í','î','ï'),
    'o' => array('ò','ó','õ','ô','ö'),
    'u' => array('ù','ú','û','ü'),
    'n' => array('ñ'),
    'c' => array('ç'),
    'ae' => array('æ'),
    'oe' => array('œ'),
    'y' => array('ÿ')
    );
    
foreach($accent_array as $acc_key => $acc_val_array)
{   $reg_exp_accent = '';
    for($m=0;$m<count($acc_val_array);$m++)
    {
        $reg_exp_accent .= $acc_val_array[$m].'|';
    }
    $reg_exp_accent = substr_replace($reg_exp_accent,"",-1);
    $str = ereg_replace($reg_exp_accent,$acc_key,$str);
}
return str_replace(" ","_",preg_replace("/[^A-Za-z0-9.\-\ ]/",'',$str));
}


Hello,

It would be simpler to do the replacement directly in your foreach loop. Collecting the characters just to build a regular expression seems roundabout. For the record, preg is preferable to ereg (the ereg functions are deprecated as of PHP 5.3). Also, read up on str_replace: it can take an array as its search argument.

 

function cleanFileName($str) {
  $accent_array = array(
    'e' => array('é','è','ê','ë'),
    'E' => array('É','È','Ê','Ë'),
    'a' => array('á','à','â','ä','å','ª'),
    'A' => array('Á','À','Â','Ä','Å'),
    'i' => array('ì','í','î','ï'),
    'o' => array('ò','ó','õ','ô','ö'),
    'u' => array('ù','ú','û','ü'),
    'n' => array('ñ'),
    'c' => array('ç'),
    'ae' => array('æ'),
    'oe' => array('œ'),
    'y' => array('ÿ')
    );
   
   foreach($accent_array as $acc_key => $acc_val_array) {
      $str = str_replace($acc_val_array, $key, $str);
   }
   return $str;
}

 

Does that work?

Hmmmf.. no.

It might be something I'm mucking up somewhere else, but the file uploader I'm using gives me the following SUPPLIED NAME (original file name) and NAME (after cleaning):

 

    [name] => Grossstadtgeflster - Ich muss gar nix.mp3

    [type] => application/octet-stream

    [tmp_name] => /tmp/phpa7yhqo

    [error] => 0

    [size] => 5705353

    [supplied_name] => Grossstadtgeflüster - Ich muss gar nix.mp3

This script fails too in the file uploader:

function cleanFileName($str)
{	$cleaner = array();
$cleaner[] = array('expression'=>"/[àáäãâª]/",'replace'=>"a");
$cleaner[] = array('expression'=>"/[èéêë]/",'replace'=>"e");
$cleaner[] = array('expression'=>"/[ìíîï]/",'replace'=>"i");
$cleaner[] = array('expression'=>"/[òóõôö]/",'replace'=>"o");
$cleaner[] = array('expression'=>"/[ùúûü]/",'replace'=>"u");
$cleaner[] = array('expression'=>"/[ñ]/",'replace'=>"n");
$cleaner[] = array('expression'=>"/[ç]/",'replace'=>"c");

$str = strtolower($str);  
$ext_point = strpos($str,".");
if ($ext_point===false) return false;

$ext = substr($str,$ext_point,strlen($str));
$str = substr($str,0,$ext_point); 
foreach($cleaner as $cv)
{ 	$str = preg_replace($cv["expression"],$cv["replace"],$str); 
}
return preg_replace("/[^a-z0-9-]/","_",$str).$ext;
}

Oh wait, my bad. I used $key instead of $acc_key in the loop. Corrected version:

 

function cleanFileName($str) {
  $accent_array = array(
    'e' => array('é','è','ê','ë'),
    'E' => array('É','È','Ê','Ë'),
    'a' => array('á','à','â','ä','å','ª'),
    'A' => array('Á','À','Â','Ä','Å'),
    'i' => array('ì','í','î','ï'),
    'o' => array('ò','ó','õ','ô','ö'),
    'u' => array('ù','ú','û','ü'),
    'n' => array('ñ'),
    'c' => array('ç'),
    'ae' => array('æ'),
    'oe' => array('œ'),
    'y' => array('ÿ')
    );
   
   foreach($accent_array as $acc_key => $acc_val_array) {
      $str = str_replace($acc_val_array, $acc_key, $str);
   }
   return $str;
}
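
For what it's worth, the function above only swaps the accented characters; the cleanup step from the first post (strip whatever is left, turn spaces into underscores) is gone. A minimal sketch that puts the two together follows. The name cleanFileNameFull is made up for illustration, and it assumes the script file and the incoming file name are both UTF-8. If the two encodings differ, str_replace compares different bytes and the lookups silently miss.

```php
<?php
// Hypothetical name: the str_replace loop plus the cleanup step from the first post.
// Assumption: this file and $str are both UTF-8 encoded.
function cleanFileNameFull($str) {
    $accent_array = array(
        'e'  => array('é','è','ê','ë'),
        'E'  => array('É','È','Ê','Ë'),
        'a'  => array('á','à','â','ä','å','ª'),
        'A'  => array('Á','À','Â','Ä','Å'),
        'i'  => array('ì','í','î','ï'),
        'o'  => array('ò','ó','õ','ô','ö'),
        'u'  => array('ù','ú','û','ü'),
        'n'  => array('ñ'),
        'c'  => array('ç'),
        'ae' => array('æ'),
        'oe' => array('œ'),
        'y'  => array('ÿ')
    );

    // str_replace accepts an array of search strings, so no regex is needed here.
    foreach ($accent_array as $plain => $accented) {
        $str = str_replace($accented, $plain, $str);
    }

    // Drop anything that is not an ASCII letter, digit, dot, hyphen, or space.
    $str = preg_replace('/[^A-Za-z0-9.\- ]/', '', $str);

    // Finally, replace spaces with underscores.
    return str_replace(' ', '_', $str);
}

echo cleanFileNameFull('Grossstadtgeflüster - Ich muss gar nix.mp3');
// prints Grossstadtgefluster_-_Ich_muss_gar_nix.mp3
```

Doing the accent replacement before the catch-all matters: run the other way round, the accented characters are deleted before they can be mapped.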

Å should be AA, å should be aa, Ä should be AE and ä should be ae. You might also want Ø to become OE, ø to become oe, and Æ to become AE. You'll run into trouble with the double characters, though: you can't know whether Æ should be AE or Ae without checking the surrounding characters.
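
One way to handle the AE versus Ae question, as a rough sketch only. The helper name expandDigraphs is made up, the mapping just mirrors the suggestions in the post above, and it is a simplification in that it only looks at the character after the one being replaced. It assumes UTF-8 input and a PCRE build with Unicode support (needed for the /u modifier and \p{Ll}).

```php
<?php
// Hypothetical helper: expands single letters to two ASCII letters and picks
// "AE" or "Ae" from the character that follows.
// Assumption: $str and this file are UTF-8 encoded.
function expandDigraphs($str) {
    $map = array(
        'Æ' => 'AE', 'Ø' => 'OE', 'Å' => 'AA', 'Ä' => 'AE',
        'æ' => 'ae', 'ø' => 'oe', 'å' => 'aa', 'ä' => 'ae'
    );

    // Split into whole UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $str; // not valid UTF-8, leave the input untouched
    }

    $out = '';
    foreach ($chars as $i => $ch) {
        if (!isset($map[$ch])) {
            $out .= $ch;
            continue;
        }
        $rep  = $map[$ch];
        $next = isset($chars[$i + 1]) ? $chars[$i + 1] : '';
        // If the next character is a lowercase letter, lowercase the second
        // letter of the replacement ("Ae" instead of "AE").
        if (preg_match('/^\p{Ll}/u', $next) === 1) {
            $rep = $rep[0] . strtolower($rep[1]);
        }
        $out .= $rep;
    }
    return $out;
}

echo expandDigraphs('Ærø'), "\n"; // prints Aeroe
echo expandDigraphs('ÆRØ'), "\n"; // prints AEROE
```

An uppercase letter at the end of the string, or followed by a space or digit, stays fully uppercase here; looking at the preceding character as well would be needed to treat those cases differently.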
