
[SOLVED] clean file name script


bcoffin


I'm using the following function to convert accented characters to their non-accented equivalents. However, the replacement is failing: the accented characters are simply being stripped by the catch-all preg_replace at the end of the script. Any ideas?

function cleanFileName($str)
{
    $accent_array = array(
        'e' => array('é','è','ê','ë'),
        'E' => array('É','È','Ê','Ë'),
        'a' => array('á','à','â','ä','å','ª'),
        'A' => array('Á','À','Â','Ä','Å'),
        'i' => array('ì','í','î','ï'),
        'o' => array('ò','ó','õ','ô','ö'),
        'u' => array('ù','ú','û','ü'),
        'n' => array('ñ'),
        'c' => array('ç'),
        'ae' => array('æ'),
        'oe' => array('œ'),
        'y' => array('ÿ')
    );

    foreach($accent_array as $acc_key => $acc_val_array)
    {
        $reg_exp_accent = '';
        for($m=0;$m<count($acc_val_array);$m++)
        {
            $reg_exp_accent .= $acc_val_array[$m].'|';
        }
        $reg_exp_accent = substr_replace($reg_exp_accent,"",-1);
        $str = ereg_replace($reg_exp_accent,$acc_key,$str);
    }
    return str_replace(" ","_",preg_replace("/[^A-Za-z0-9.\-\ ]/",'',$str));
}

https://forums.phpfreaks.com/topic/156952-solved-clean-file-name-script/

Hello,

Wouldn't it be simpler to just make the replacement inside your loop? Storing pieces in a variable only to assemble a regular expression seems a bit roundabout. Also, for the record, the preg functions are better than ereg (ereg was deprecated in PHP 5.3 and removed in PHP 7.0). And read up on str_replace: its search argument can be an array.
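For comparison, here is a minimal sketch of the original loop ported from `ereg_replace` to `preg_replace`. It assumes the script file and the incoming file name are both UTF-8; the `u` modifier makes PCRE treat pattern and subject as UTF-8 characters rather than raw bytes. The name `cleanFileNamePreg` is only illustrative.

```php
<?php
// Sketch: the original approach ported from ereg_replace to preg_replace.
// Assumes a UTF-8 source file and UTF-8 input (preg_replace returns null on invalid UTF-8 with /u).
function cleanFileNamePreg(string $str): string
{
    $accent_array = array(
        'e'  => array('é', 'è', 'ê', 'ë'),
        'E'  => array('É', 'È', 'Ê', 'Ë'),
        'a'  => array('á', 'à', 'â', 'ä', 'å', 'ª'),
        'A'  => array('Á', 'À', 'Â', 'Ä', 'Å'),
        'i'  => array('ì', 'í', 'î', 'ï'),
        'o'  => array('ò', 'ó', 'õ', 'ô', 'ö'),
        'u'  => array('ù', 'ú', 'û', 'ü'),
        'n'  => array('ñ'),
        'c'  => array('ç'),
        'ae' => array('æ'),
        'oe' => array('œ'),
        'y'  => array('ÿ'),
    );

    foreach ($accent_array as $acc_key => $acc_val_array) {
        // implode() builds "é|è|ê|ë" with no trailing pipe to trim off;
        // preg_quote() escapes any regex metacharacters.
        $quoted = array_map(fn($c) => preg_quote($c, '/'), $acc_val_array);
        $str = preg_replace('/' . implode('|', $quoted) . '/u', $acc_key, $str);
    }

    // Same catch-all as before: drop anything that is not a letter, digit,
    // dot, hyphen or space, then turn spaces into underscores.
    return str_replace(' ', '_', preg_replace('/[^A-Za-z0-9.\-\ ]/', '', $str));
}

echo cleanFileNamePreg('Grossstadtgeflüster - Ich muss gar nix.mp3'), "\n";
// Grossstadtgefluster_-_Ich_muss_gar_nix.mp3
```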

 

function cleanFileName($str) {
  $accent_array = array(
    'e' => array('é','è','ê','ë'),
    'E' => array('É','È','Ê','Ë'),
    'a' => array('á','à','â','ä','å','ª'),
    'A' => array('Á','À','Â','Ä','Å'),
    'i' => array('ì','í','î','ï'),
    'o' => array('ò','ó','õ','ô','ö'),
    'u' => array('ù','ú','û','ü'),
    'n' => array('ñ'),
    'c' => array('ç'),
    'ae' => array('æ'),
    'oe' => array('œ'),
    'y' => array('ÿ')
    );
   
   foreach($accent_array as $acc_key => $acc_val_array) {
      $str = str_replace($acc_val_array, $key, $str);
   }
   return $str;
}

 

Does that work?

Hmmmf... no.

It might be something I'm mucking up somewhere else, but the file uploader I'm using gives me the following SUPPLIED NAME (original file name) and NAME (after cleaning):

 

    [name] => Grossstadtgeflster - Ich muss gar nix.mp3
    [type] => application/octet-stream
    [tmp_name] => /tmp/phpa7yhqo
    [error] => 0
    => 5705353
    [supplied_name] => Grossstadtgeflüster - Ich muss gar nix.mp3
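The two names above are consistent with the accent step never matching `ü`, so that the final catch-all removes it. One plausible cause, which is an assumption and not confirmed in this thread, is an encoding mismatch: for example the uploaded name arriving as ISO-8859-1 (`ü` is the single byte `FC`) while the literals in the script are UTF-8 (`ü` is the two bytes `C3 BC`). Either way, none of those bytes is in `[A-Za-z0-9.\-\ ]`, so they are all deleted. This sketch reproduces that and uses `preg_match('//u', ...)`, which returns `false` for a subject that is not valid UTF-8, as a quick check.

```php
<?php
// Sketch: what the catch-all does to "ü" when the accent replacement misses it.
$catchAll = '/[^A-Za-z0-9.\-\ ]/';

$utf8   = "Grossstadtgefl\u{00FC}ster"; // ü encoded as UTF-8: bytes C3 BC
$latin1 = "Grossstadtgefl\xFCster";     // ü encoded as ISO-8859-1: byte FC

echo bin2hex("\u{00FC}"), "\n";                  // c3bc
echo preg_replace($catchAll, '', $utf8), "\n";   // Grossstadtgeflster
echo preg_replace($catchAll, '', $latin1), "\n"; // Grossstadtgeflster

// With the u modifier, preg_match() returns false if the subject is not
// valid UTF-8, which tells the two encodings apart cheaply.
var_dump(preg_match('//u', $utf8));   // int(1)
var_dump(preg_match('//u', $latin1)); // bool(false)
```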

This script fails too in the file uploader:

function cleanFileName($str)
{
    $cleaner = array();
    $cleaner[] = array('expression'=>"/[àáäãâª]/",'replace'=>"a");
    $cleaner[] = array('expression'=>"/[èéêë]/",'replace'=>"e");
    $cleaner[] = array('expression'=>"/[ìíîï]/",'replace'=>"i");
    $cleaner[] = array('expression'=>"/[òóõôö]/",'replace'=>"o");
    $cleaner[] = array('expression'=>"/[ùúûü]/",'replace'=>"u");
    $cleaner[] = array('expression'=>"/[ñ]/",'replace'=>"n");
    $cleaner[] = array('expression'=>"/[ç]/",'replace'=>"c");

    $str = strtolower($str);
    $ext_point = strpos($str,".");
    if ($ext_point===false) return false;

    $ext = substr($str,$ext_point,strlen($str));
    $str = substr($str,0,$ext_point);
    foreach($cleaner as $cv)
    {
        $str = preg_replace($cv["expression"],$cv["replace"],$str);
    }
    return preg_replace("/[^a-z0-9-]/","_",$str).$ext;
}
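A likely reason this version misbehaves, assuming the script is saved as UTF-8: without the `u` modifier, a character class such as `[ùúûü]` is a set of bytes, not characters. `ü` is the bytes `C3 BC`, and `C3` is the lead byte of most of the accented letters in these classes, so each byte gets replaced on its own. The sketch below compares the two modes; adding `u` to each `expression` should make the classes match whole characters. Separately, `strpos($str, ".")` splits at the first dot, so `my.song.mp3` would get the "extension" `.song.mp3`; `strrpos` finds the last one.

```php
<?php
// Sketch: byte-oriented vs. UTF-8-aware character classes (source saved as UTF-8).
$word = 'grossstadtgeflüster';

// Without /u the class is the byte set {C3, B9, BA, BB, BC}; both bytes of ü match.
echo preg_replace('/[ùúûü]/', 'u', $word), "\n";  // grossstadtgefluuster
// With /u the class holds four characters, so ü is replaced exactly once.
echo preg_replace('/[ùúûü]/u', 'u', $word), "\n"; // grossstadtgefluster

// In the script above the "a" rule runs first; its class also contains the
// byte C3, so the lead byte of ü becomes "a" before the "u" rule sees the rest.
$step1 = preg_replace('/[àáäãâª]/', 'a', $word);
echo preg_replace('/[ùúûü]/', 'u', $step1), "\n"; // grossstadtgeflauster

// strrpos() finds the last dot, so only the real extension is split off.
$name = 'my.song.mp3';
echo substr($name, strrpos($name, '.')), "\n";    // .mp3
```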

Oh wait, my bad. I used $key instead of $acc_key. Here it is fixed:

 

function cleanFileName($str) {
  $accent_array = array(
    'e' => array('é','è','ê','ë'),
    'E' => array('É','È','Ê','Ë'),
    'a' => array('á','à','â','ä','å','ª'),
    'A' => array('Á','À','Â','Ä','Å'),
    'i' => array('ì','í','î','ï'),
    'o' => array('ò','ó','õ','ô','ö'),
    'u' => array('ù','ú','û','ü'),
    'n' => array('ñ'),
    'c' => array('ç'),
    'ae' => array('æ'),
    'oe' => array('œ'),
    'y' => array('ÿ')
    );
   
   foreach($accent_array as $acc_key => $acc_val_array) {
      $str = str_replace($acc_val_array, $acc_key, $str);
   }
   return $str;
}

Å should be AA, å should be aa, Ä should be AE and ä should be ae. You might also want Ø to become OE, ø to become oe, and Æ to become AE. You'll run into trouble with the double characters, though: you can't know whether Æ should be AE or Ae without checking the surrounding characters.
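A minimal sketch of those two-letter transliterations, assuming UTF-8 input. `strtr()` with an array tries the longest keys first and never re-scans text it has already replaced, which suits one-to-two-character expansions. The AE/Ae choice uses a simple heuristic that is purely an assumption: if the next character is a lowercase letter, the capital is treated as the start of a capitalised word (Ærø becomes Aeroe); otherwise both letters stay upper case (ÆRØ becomes AEROE). `expandDoubleLetters` is a hypothetical name.

```php
<?php
// Sketch: expand letters that transliterate to two characters.
// Assumes UTF-8 input and a UTF-8 source file; the case heuristic is an assumption.
function expandDoubleLetters(string $str): string
{
    $upper = ['Å' => 'AA', 'Ä' => 'AE', 'Ø' => 'OE', 'Æ' => 'AE'];
    $lower = ['å' => 'aa', 'ä' => 'ae', 'ø' => 'oe', 'æ' => 'ae'];

    // Capital followed by a lowercase letter: probably a capitalised word, so "Ae".
    $str = preg_replace_callback(
        '/[ÅÄØÆ](?=\p{Ll})/u',
        fn(array $m): string => ucfirst(strtolower($upper[$m[0]])),
        $str
    );

    // Everything left: all-caps expansion for capitals, lowercase for the rest.
    return strtr($str, $upper + $lower);
}

echo expandDoubleLetters('Ærø Åse ÆRØ'), "\n"; // Aeroe Aase AEROE
```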

Archived

This topic is now archived and is closed to further replies.
