fourrings Posted March 19, 2009
Is it possible, and if so, how? I need to get a recursive filename listing of a directory that contains Unicode filenames. I've tried piping scandir() output through PHP's iconv() with no luck. This is for my desktop use, so the platform is WinXP. I've tried every trick I could find and searched dozens of forums with no luck. The farthest I've gotten is reading the contents of one UTF-8 encoded file and saving them to another. My fallback option is to generate the list with DIR and then process it with PHP, but sticking with PHP alone would be ideal.
corbin Posted March 19, 2009
Are you running the script on the web or CLI? I don't think cmd can display Unicode.
fourrings (Author) Posted March 19, 2009
CLI. Web would not be a problem, as my personal web site picks up Unicode filenames without issues. CMD will list Unicode filenames by default, but it requires the "/u" parameter when piping Unicode content to a file.
corbin Posted March 19, 2009
Hrmmmm are we talking UTF8 or UTF16? 餵 does not appear in cmd, but I'm assuming it's because the CMD font cannot display that lol. What is an example of a character you're trying to display?
fourrings (Author) Posted March 19, 2009
Whatever is native to XP's filesystem (NTFS), which I believe is indeed UTF16. But I've not had OS-specific issues using UTF8. Try this: Едем На Море. It will print in a default CMD window but will choke if you pipe it to a file, unless you start CMD with /u.
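For the fallback mentioned above (generate the list with DIR, then process it in PHP), here is a minimal sketch. It assumes the listing was produced with something like `cmd /u /c dir /s /b > listing.txt`, where /u makes cmd write its output as UTF-16LE. The function name decode_dir_listing and the file name listing.txt are made up for illustration; they do not come from the thread or from PHP itself.

```php
<?php
// Hypothetical helper: turn the raw bytes of a UTF-16LE directory listing
// (for example, one written by "cmd /u /c dir /s /b > listing.txt")
// into an array of UTF-8 path strings.
function decode_dir_listing($raw)
{
    // Drop a UTF-16LE byte order mark if one happens to be present.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);
    }

    // Convert the whole listing to UTF-8 in one pass.
    $utf8 = iconv('UTF-16LE', 'UTF-8', $raw);
    if ($utf8 === false) {
        return array(); // input was not valid UTF-16LE
    }

    // Split on CRLF or LF and discard empty lines.
    return preg_split('/\r\n|\n/', $utf8, -1, PREG_SPLIT_NO_EMPTY);
}
```

Usage would be along the lines of `$paths = decode_dir_listing(file_get_contents('listing.txt'));`. The resulting strings are UTF-8, so they are fine for display or for writing to another file, but passing them back to PHP's filesystem functions on Windows may still fail for the same reason scandir() does.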
Mchl Posted March 19, 2009
What function do you use to write to the file? And who's going to the sea?
fourrings (Author) Posted March 19, 2009
fwrite()
Mchl Posted March 19, 2009
Did you fopen the file with the binary flag?
fourrings (Author) Posted March 19, 2009
Yup. Are you speaking theoretically or have you done this successfully in the past?
corbin Posted March 19, 2009
Hrmmm.... I don't think readdir, glob, scandir and so on support Unicode x.x.
Edit: readdir, glob, scandir and the like do not support Unicode.
fourrings (Author) Posted March 19, 2009
That's the conclusion I'm coming to. However, this must be specific to Windows, as readdir() works fine with these filenames on my Unix-based web server.
corbin Posted March 19, 2009
Excerpts from win32\readdir.h and win32\readdir.c from the PHP 5.2.9 src released on php.net:

In the following example, dp is a DIR type, which is defined as:

    typedef struct {
        HANDLE handle;              /* _findfirst/_findnext handle */
        short offset;               /* offset into directory */
        short finished;             /* 1 if there are not more files */
        WIN32_FIND_DATA fileinfo;   /* from _findfirst/_findnext */
        char *dir;                  /* the dir we are reading */
        struct dirent dent;         /* the dirent to return */
    } DIR;

If you look up the WIN32_FIND_DATA struct, you'll see that its cFileName member is of type TCHAR, so it is possible for that structure to hold Unicode names, but _UNICODE and so on are never defined in the PHP source. Also, the following code would not work with Unicode:

    strlcpy(dp->dent.d_name, dp->fileinfo.cFileName, _MAX_FNAME+1);

DIR.dent.d_name is a char array. So, there is definitely no Unicode support for readdir on Windows as of PHP 5.2.9.
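One practical consequence of the char-based code path, sketched in PHP. This rests on an assumption that is not stated in the excerpt above: when a name cannot be represented in the system's ANSI code page, Windows typically substitutes '?' for the offending characters, and '?' is never legal in a real Windows filename. Under that assumption, a '?' in a name returned by scandir() or readdir() hints that the original name was lost. The function name find_lossy_names is hypothetical.

```php
<?php
// Hypothetical heuristic: return the entries that contain '?', which on
// Windows suggests the real name had characters outside the ANSI code page.
// Assumption: unmappable characters are replaced with '?'. Names altered by
// best-fit mapping (without any '?') will not be caught by this check.
function find_lossy_names($names)
{
    $lossy = array();
    foreach ($names as $name) {
        if (strpos($name, '?') !== false) {
            $lossy[] = $name;
        }
    }
    return $lossy;
}
```

For example, `find_lossy_names(scandir('.'))` would list the entries that likely need the DIR-based fallback instead. On a machine whose code page already covers the characters in question (Cyrillic names on a Russian-locale XP install, say), nothing would be flagged.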
fourrings (Author) Posted March 19, 2009
You da man! Thank you for going so far as to review the source code. I should've thought of that before spending so much time on this.
corbin Posted March 20, 2009
Eh, I did it because I was curious how PHP implemented opendir/readdir on Windows (the POSIX opendir/readdir don't exist on Windows, so FindFirstFile/FindNextFile are usually used instead, which is what PHP does).