
Getting the directory listing of files with some accented characters



PHP Version:  7.0.10
Windows

I have this function that grabs all the files within a directory, recursing into subdirectories.

function getDirectoryListing($folder) {

	$aryListing = array();

	$dir = new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS);

	// Flatten the recursive iterator, folders come before their files
	$it = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);

	foreach ($it as $fileinfo) {
		if ($fileinfo->isFile()) {
			$f = array();
			$f['file'] = $fileinfo->getFilename();
			$f['dir'] = "\\" . $it->getSubPath();
			$f['pathfile'] = $it->getSubPathName();
			$f['size'] = $fileinfo->getSize();
			$f['size_human'] = bytesToHuman($fileinfo->getSize());
			$f['time_mod'] = $fileinfo->getMTime();
			$f['time_mod_full'] = date('F j, Y, g:i a', $fileinfo->getMTime());
			$aryListing[] = $f;
		} elseif ($fileinfo->isDir()) {
			//print($fileinfo->__toString() . PHP_EOL);  // directory
		} else {
			// echo $fileinfo->getFilename(); // not file or directory?
		}
	}

	return $aryListing;

}
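The function above calls bytesToHuman(), which is not shown in the post. To run the listing as-is, something like the following stand-in would do. This is only a sketch: the original helper may round or label units differently.

```php
<?php
// Hypothetical stand-in for the bytesToHuman() helper used above;
// the original implementation is not shown in the thread.
function bytesToHuman(int $bytes): string
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB'];
    $i = 0;
    $value = (float) $bytes;

    // Divide by 1024 until the value fits the current unit
    // or we run out of units.
    while ($value >= 1024 && $i < count($units) - 1) {
        $value /= 1024;
        $i++;
    }

    return round($value, 2) . ' ' . $units[$i];
}
```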

But with certain accented characters such as ğ or ě (https://en.wikipedia.org/wiki/Ğ and https://en.wikipedia.org/wiki/Ě respectively), the filename comes back with the plain letters g and e instead of the accented characters.

So a file like Dağ_Piě.txt comes back as Dag_Pie.txt.

As a result, the code above never adds it to the array: no file named Dag_Pie.txt exists, so the entry is neither a file nor a directory and gets skipped.

This doesn't happen with all files with accented characters.  Files such as café.txt are fine. 

Sure, I could manually rename every such file I find on the server, but I would prefer a solution that reads the filenames correctly (and then lets the script rename them if I choose to).  I don't want to go through every single filename every so often.

scandir returns the same thing. 
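Since scandir shows the same behaviour, the iterator itself is probably not at fault. One possible explanation, not confirmed anywhere in this thread, is that PHP builds older than 7.1 on Windows reach the filesystem through non-Unicode APIs, so characters outside the system code page get approximated (ğ becomes g) while characters inside it (é) survive. A minimal check for that situation might look like this; the function name is made up for illustration.

```php
<?php
// Hypothetical helper: returns true when the given PHP version / OS pair
// is a Windows build older than 7.1, which is the combination this sketch
// ASSUMES to be affected. It does not prove the cause of the problem.
function mayLackUnicodeFilenames(int $versionId, string $os): bool
{
    $isWindows = stripos($os, 'WIN') === 0;

    return $isWindows && $versionId < 70100;
}

// Check the running interpreter.
var_dump(mayLackUnicodeFilenames(PHP_VERSION_ID, PHP_OS));
```

PHP 7.0.10 has a PHP_VERSION_ID of 70010, so on the setup described at the top of this post the check would report true.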

Any help would be appreciated

14 hours ago, ginerjm said:

I'm gonna guess that this is a character set issue.  What are you using when you display the output?

I am just displaying it in the browser to test it.  I have also tried writing the directory listing to a file, but it still comes back as g/e.  Other filenames such as café show up fine in the file (I can change the encoding while viewing it in Notepad++ and see the é).  But the g/e really are just the plain letters g and e, as if the iterator/scandir functions returned them to me that way.

I do believe that if you are not declaring a proper charset, your output will display as you describe, which is not what you want.  As you noted with Notepad++, it shows up fine once you change the encoding.  You have to do the same with your browser and probably your script.

1 hour ago, ginerjm said:

I do believe that if you are not declaring a proper charset, your output will display as you describe, which is not what you want.  As you noted with Notepad++, it shows up fine once you change the encoding.  You have to do the same with your browser and probably your script.

When I said that Notepad++ shows it fine after changing the encoding, I was referring to café, not the characters ğ/ě.  Notepad++ shows ğ/ě as g/e, as if PHP or the iterator had replaced them with the plain letters.  Even when I switch to the correct encoding, they still show as g/e.
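To take display encoding out of the picture entirely, the raw bytes of each name can be dumped as hex. The sketch below uses a hypothetical inspectDirectory() function (not from my original code) built on scandir, bin2hex, is_file, is_dir and file_exists. If a name really was returned with a plain g, its hex will contain 67 rather than the multi-byte sequence an encoded ğ would produce.

```php
<?php
// Hypothetical diagnostic helper: reports the raw bytes of every entry
// name and whether PHP can resolve that name back to something on disk.
function inspectDirectory(string $folder): array
{
    $report = [];

    foreach (scandir($folder) as $name) {
        if ($name === '.' || $name === '..') {
            continue; // skip the dot entries
        }

        $path = $folder . DIRECTORY_SEPARATOR . $name;

        $report[] = [
            'name'    => $name,
            'hex'     => bin2hex($name), // bytes as PHP received them
            'is_file' => is_file($path),
            'is_dir'  => is_dir($path),
            'exists'  => file_exists($path),
        ];
    }

    return $report;
}
```

An entry whose 'exists' value is false even though scandir listed it would match the behaviour described here.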

And I am not worried about the display, since in the end I am not using this for display (it is for backend processing).  The bigger problem for me is that I cannot reference the file at all, or even tell that it is in the directory.

When I run the function on a directory containing 5 files, one of which has ğ/ě in its name, it returns an array of 4 files.  For that one entry, $fileinfo->isFile() returns false and $fileinfo->isDir() returns false.  So the rest of the code after the function call behaves as if the file did not exist at all.

So for a directory with café.txt, test1.txt, Dağ_Piě.txt ... the function returns array(café.txt, test1.txt).

The iterator that walks the directory does pick up the entry; it just returns false when checking whether it is a file or a directory.

If I uncomment the echo in the else branch, it displays the filename, but with ğ/ě replaced by g/e, and $fileinfo->isFile() and $fileinfo->isDir() both return false regardless.  That leads me to believe that changing the display encoding won't make any difference.
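As a stopgap, entries that are neither a file nor a directory could at least be collected instead of silently dropped, so the calling code knows something is there. This is a rough sketch with a hypothetical function name (listWithUnresolved); it does not fix the filename itself, it only stops the entry from vanishing.

```php
<?php
// Sketch only: listWithUnresolved() is a hypothetical name, not part of
// the original function. It separates entries PHP can resolve as files
// from entries that are neither a file nor a directory.
function listWithUnresolved(string $folder): array
{
    $files = [];
    $unresolved = [];

    $dir = new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS);
    $it = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);

    foreach ($it as $fileinfo) {
        if ($fileinfo->isFile()) {
            $files[] = $fileinfo->getPathname();
        } elseif (!$fileinfo->isDir()) {
            // Listed by the iterator, but PHP cannot resolve the name.
            $unresolved[] = $fileinfo->getPathname();
        }
    }

    return ['files' => $files, 'unresolved' => $unresolved];
}
```

With the three-file example above, the expectation would be two paths under 'files' and the Dag_Pie.txt entry under 'unresolved'.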

 
