Sonnich Posted May 21, 2020

Hi all,

I hope I have come to the right place. My system reads files stored on a drive and lists them to users as plain HTML. I wrote it 11 years ago and have to refresh my memory.

My problem is that filenames seem to come in different formats, so how to decode/encode them is an issue. My users use Scandinavian letters (æøåäöüõ), and it seems that one filename is in one format and another filename in a different one. There is no apparent logic to which format a filename comes in. All files are OK and download as they should; they just don't list well. I tried downloading one with Æ in its name and uploading it again as -2, and it lists differently.

Any idea how I can handle this issue?

https://forums.phpfreaks.com/topic/310830-readdir-scandir-formats-of-filenames/
requinix Posted May 21, 2020

Make sure you're doing everything with the same encoding, preferably UTF-8.

What operating system? IIRC Windows and Linux deal with it differently.
gizmola Posted May 22, 2020

11 years ago, it was unusual for people to use UTF-8. This was also before HTML5 became the de facto standard. Before that, people often used a particular character set, so check your app to see what meta charset, if any, it's setting.

<meta charset="UTF-8">
//or possibly
<meta charset="ISO-8859-1">

In the past it was not uncommon for people in the West to use ISO-8859-1, as it covers English and a lot of the European languages, including Finnish and Swedish. There is also ISO-8859-4, which supports "Scandinavia/Baltic". They overlap to a fair degree, but obviously some characters differ.

As requinix stated, we really need more info on the OS of the server. Again, going back 11 years, Windows servers were still possibly using a codepage rather than Unicode. There are also some issues with different OSes regarding support, or lack thereof, for case-sensitive filenames.

It would also be helpful if you could provide a specific example of a file that has one name on the filesystem and displays as garbage or something else in your app.
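To make the difference between the two charsets concrete, here is a minimal sketch showing the letter Æ as raw bytes in each encoding. It assumes the mbstring extension is available; the variable names are only illustrative.

```php
<?php
// The letter Æ (U+00C6) written as raw bytes in each encoding.
$latin1 = "\xC6";      // ISO-8859-1: a single byte
$utf8   = "\xC3\x86";  // UTF-8: two bytes

// Converting the ISO-8859-1 byte to UTF-8 yields the two-byte form.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($latin1), "\n";     // c6
echo bin2hex($utf8), "\n";       // c386
echo bin2hex($converted), "\n";  // c386

// A lone 0xC6 byte is not valid UTF-8, so a page declared as UTF-8
// cannot render it as Æ.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```

If the page declares one charset while the filename bytes are in the other, the browser decodes them with the wrong table, which would explain names that list as garbage even though the files themselves are fine.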
Sonnich Posted May 31, 2020 (edited)

Thanks for the answer.

The system is actually 12 years old and I last touched it in 2016, so I cannot remember much. I found that my system is in use on 3 servers, with 3 different results.

Question: how should I output things?

I followed your ideas and read about UTF-8 here: https://www.w3schools.com/charsets/ I have now changed from 8859 to UTF-8.

When uploading (using <input type=file>), the file is saved as is (I checked the uploaded and saved file with WS_FTP). So, say, Ærøskøbing stays the way it should.

Next, I read the names using readdir. It looks like readdir reads things from the disk as they are. What is then the right way to output things?

echo $filename;
echo utf8_encode($filename);

I guess that htmlentities is not the right thing when working with UTF-8. Or is the string itself UTF-8, or just plain ASCII?

I am new to all this, so sorry for asking. I tried to read up on it but ended up more confused than enlightened.
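One possible way to handle the output question is sketched below. It is not a confirmed fix for these servers: `filenameForHtml` is a hypothetical helper, and it assumes that any name which is not valid UTF-8 is ISO-8859-1, which may not hold on every machine. It also assumes the mbstring extension is installed and that the page itself is served as UTF-8.

```php
<?php
// Hypothetical helper: return a file name that is safe to echo into a UTF-8 page.
// Assumption: a name that is not valid UTF-8 is encoded as $fallback (ISO-8859-1).
function filenameForHtml(string $name, string $fallback = 'ISO-8859-1'): string
{
    if (!mb_check_encoding($name, 'UTF-8')) {
        // Only convert names that are not already UTF-8, to avoid double encoding.
        $name = mb_convert_encoding($name, 'UTF-8', $fallback);
    }
    // Escape HTML special characters; UTF-8 letters such as Æ pass through unchanged.
    return htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Placeholder directory; replace with the folder that holds the uploads.
$dir = __DIR__;

$entries = scandir($dir);
if ($entries === false) {
    exit("Cannot read directory\n");
}

foreach ($entries as $entry) {
    if ($entry === '.' || $entry === '..') {
        continue;
    }
    echo '<li>', filenameForHtml($entry), "</li>\n";
}
```

Calling `utf8_encode()` unconditionally would mangle names that are already UTF-8, because it always treats its input as ISO-8859-1; it is also deprecated as of PHP 8.2. Checking validity first avoids that, although a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so the check is a heuristic rather than a guarantee.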
requinix Posted May 31, 2020

3 minutes ago, Sonnich said:
Next, I read stuff, using readdir. Looks like readdir reads things from the disk as they are. What is then the right way to output things? echo $filename; echo utf8_encode($filename); I guess that htmlentities is not the right thing when working with UTF-8. Or is the string itself UTF8, or just plain ASCII?

I don't know the answer to this one. Create a file with a non-ASCII character in its name, make sure that you can access it through the web server (without PHP), then see what readdir() shows you.
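The experiment suggested above could be run with a small diagnostic script along these lines. This is only a sketch: `describeName` is a hypothetical helper, the directory is a placeholder, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: show a name as hex bytes plus a UTF-8 validity flag,
// so the result does not depend on how the browser or terminal decodes it.
function describeName(string $name): string
{
    $flag = mb_check_encoding($name, 'UTF-8') ? 'valid UTF-8' : 'not UTF-8';
    return sprintf('%s  [%s]', bin2hex($name), $flag);
}

// Placeholder directory; point this at the folder containing the test file.
$dir = __DIR__;

$handle = opendir($dir);
if ($handle === false) {
    exit("Cannot open directory\n");
}

while (($entry = readdir($handle)) !== false) {
    echo describeName($entry), "\n";
}
closedir($handle);
```

For a test file whose name contains Æ, the hex output would contain `c386` if the name is stored as UTF-8 and a lone `c6` if it is stored as ISO-8859-1. Running the same script on each of the three servers should show whether they really differ.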
Sonnich Posted May 31, 2020

Just now, requinix said:
I don't know the answer to this one. Create a file with a non-ASCII character, make sure that you can access it through the web server (without PHP), then see what readdir() shows you.

So every server has its own needs?
requinix Posted May 31, 2020

Just now, Sonnich said:
So every server has its own needs?

Not necessarily. I'm saying I don't know, so experiment with it and find out.