Hi all

I hope I have come to the right place.

My system reads files stored on a drive and lists them to users through plain HTML. I made it 11 years ago and have to refresh my memory.

My problem is that filenames seem to come in different formats, so how to decode/encode them is an issue...

My users use Scandinavian letters (æøåäöüõ) and it seems like one filename is in one format and another in another format. There is no logic to what format the filenames come in. All files are OK and download as they should, they just don't list well.
I tried downloading one with Æ and uploading it again as -2 and it lists differently.

Any idea how I can handle this issue?

 

 


11 years ago, it was unusual for people to use UTF-8. This was also before HTML5 became the de facto standard. Before that, people often used a particular character set, so check your app to see what meta charset, if any, it's setting.

 

<meta charset="UTF-8">

//or possibly
<meta charset="ISO-8859-1">

In the past it was not uncommon for people in the west to use ISO-8859-1, as it covers English and a lot of the European languages, including Finnish and Swedish.  There is also ISO-8859-4, which supports "Scandinavia/Baltic".  They overlap to a fair degree, but obviously there are some characters that are different.
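
To make the difference between charsets concrete, here is a small sketch of how one letter from your list is stored in ISO-8859-1 versus UTF-8. It assumes the mbstring extension is loaded, and the variable names are just for illustration.

```php
<?php
// "Æ" written as explicit byte escapes, so the encoding of this source file does not matter.
$utf8   = "\xC3\x86"; // Æ (U+00C6) encoded as UTF-8: two bytes
$latin1 = "\xC6";     // Æ encoded as ISO-8859-1: one byte

echo bin2hex($utf8), "\n";   // c386
echo bin2hex($latin1), "\n"; // c6

// Converting the ISO-8859-1 byte to UTF-8 yields the same two bytes.
var_dump(mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') === $utf8); // bool(true)

// A lone 0xC6 byte is not valid UTF-8, so a UTF-8 page cannot display it as a letter.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```

So two files that look identical in a file manager can have different bytes in their names, and only one of them will match the charset the page declares.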

As requinix stated, we really need more info on the OS of the server.  Again, going back 11 years, Windows servers were still possibly using a codepage rather than Unicode.  There are also some issues with different OSes as to the support or lack thereof for case-sensitive filenames.

It would also be helpful if you could provide a specific example of a file that has one name on the filesystem and displays as garbage or something else in your app.
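
If it helps with collecting such an example, something along these lines would dump the raw bytes of every name in a directory, so you can post the hex for one file that lists fine and one that doesn't. This is only a sketch: `describe_filenames()` is a made-up helper name, and it assumes mbstring is available.

```php
<?php
// Hypothetical helper: describe each directory entry by its raw bytes.
function describe_filenames(string $dir): array
{
    $report = [];
    $entries = scandir($dir);
    if ($entries === false) {
        return $report; // directory could not be read
    }
    foreach ($entries as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $report[] = [
            'hex'        => bin2hex($name),                       // exact bytes as stored
            'valid_utf8' => mb_check_encoding($name, 'UTF-8'),    // false suggests a legacy charset
            'ascii_only' => preg_match('/^[\x00-\x7F]*$/', $name) === 1,
        ];
    }
    return $report;
}

// Usage: list the directory this script lives in.
foreach (describe_filenames(__DIR__) as $row) {
    printf(
        "%s utf8=%s ascii=%s\n",
        $row['hex'],
        $row['valid_utf8'] ? 'yes' : 'no',
        $row['ascii_only'] ? 'yes' : 'no'
    );
}
```

The hex is the useful part, because it survives being pasted into a forum post no matter what charset the browser or editor is using.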

  • 2 weeks later...

Thanks for the answer. The system is actually 12 years old and I last touched it in 2016, so I cannot remember much. I found my system is in use on 3 servers, with 3 different results.

Questions: how should I output things?

I followed your ideas and read this about UTF-8. I have now changed from 8859 to UTF-8.

https://www.w3schools.com/charsets/

When uploading (using <input type=file>) I can upload, and it saves the file as is (when I check the uploaded and saved file with WS_FTP). So, say, Ærøskøbing stays the way it should.

Next, I read stuff, using readdir. Looks like readdir reads things from the disk as they are.

What is then the right way to output things?
echo $filename;
echo utf8_encode($filename);
I guess that htmlentities is not the right thing when working with UTF-8.

Or is the string itself UTF8, or just plain ASCII?
I am new to all this, so sorry for asking. I tried to read but got more confused than smarter in the subject.

 

3 minutes ago, Sonnich said:

Next, I read stuff, using readdir. Looks like readdir reads things from the disk as they are.

What is then the right way to output things?
echo $filename;
echo utf8_encode($filename);
I guess that htmlentities is not the right thing when working with UTF-8.

Or is the string itself UTF8, or just plain ASCII?

I don't know the answer to this one. Create a file with a non-ASCII character, make sure that you can access it through the web server (without PHP), then see what readdir() shows you.
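
Once you know what readdir() actually hands back, one possible way to output the names on a UTF-8 page is sketched below. It is not a definitive answer: `filename_for_html()` is a made-up helper, and it assumes that any name which is not valid UTF-8 is ISO-8859-1, which you would need to confirm for each of your servers. It also assumes mbstring is loaded.

```php
<?php
// Hypothetical helper: make a raw filename safe to echo into a UTF-8 HTML page.
// Assumption: anything that is not valid UTF-8 is in $legacy (ISO-8859-1 by default).
function filename_for_html(string $raw, string $legacy = 'ISO-8859-1'): string
{
    $utf8 = mb_check_encoding($raw, 'UTF-8')
        ? $raw                                          // already UTF-8, leave the bytes alone
        : mb_convert_encoding($raw, 'UTF-8', $legacy);  // legacy bytes, convert once

    // Optional: fold decomposed letters (a + combining ring) into precomposed ones (å).
    // This needs the intl extension, so it is skipped when Normalizer is not available.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($utf8, Normalizer::FORM_C);
        if ($normalized !== false) {
            $utf8 = $normalized;
        }
    }

    // Escape for HTML without touching the non-ASCII letters.
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Both calls print the same UTF-8 name.
echo filename_for_html("\xC6r\xF8sk\xF8bing.txt"), "\n";             // ISO-8859-1 bytes in
echo filename_for_html("\xC3\x86r\xC3\xB8sk\xC3\xB8bing.txt"), "\n"; // UTF-8 bytes in
```

As for the functions you listed: utf8_encode() converts ISO-8859-1 to UTF-8, so calling it on a name that is already UTF-8 encodes it twice and produces garbage, and it is deprecated as of PHP 8.2. The validity check is a heuristic, since a short ISO-8859-1 name can occasionally happen to be valid UTF-8, so treat the result as a starting point and compare it against the raw bytes.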
