So, I really want VCMI to support unicode file paths ASAP, because Windows users with a non-ascii username currently can’t run VCMI (and I’m one of them). I’ve implemented a solution using shortpaths (github.com/vcmi/vcmi/pull/151), but as I was told, it is not a very good solution.
So I thought about a better solution. While MSVC supports a non-standard wchar_t overload of std::fstream::open(), it does not exist in MinGW, so that’s not portable. I could, however, re-write CFileInputStream to use C-style FILE* internally instead of an ifstream. Then supporting unicode file paths would be easy and portable, with very little platform-specific code. The interface of CFileInputStream wouldn’t need to change at all - the beauty of hiding implementation details.
Do the VCMI developers think this would be a good solution? If so, I’m willing to implement it.
Right solution is to use boost::filesystem everywhere - it works well with unicode - in most places it is used already, but this work has not been done yet.
Internally, the boost::filesystem::basic_ifstream::open() method (and the constructor) that takes a boost::filesystem::path as an argument, converts the path to a c-string and passes it to std::basic_ifstream::open(). For non-windows systems, it gets passed as an UTF-8 encoded char*, which works perfectly fine. However, windows does not support opening files through an UTF-8 name. So on MSVC, it gets passed as an UTF-16 encoded wchar_t*, to a non-standard overload of std::basic_ifstream::open(), which takes a wchar_t* as an argument instead of a char*. But that overload does not exist on MinGW. So there, it ends up passing an UTF-8 encoded char*, even though windows does not support it. The code compiles just fine, but you get an error every time you try to open a file with a non-ascii name.
rewrite boost::filesystem::basic_ifstream to make explicit UTF8->UTF16 conversion on windows, or wrap boost::filesystem::ifstream with some template to use wchar_t based versions on windows.
That wouldn’t work either, because boost::filesystem::path already stores the path in UTF-16 on windows, conversion is not the issue. boost::filesystem::basic_ifstream relies on calling std::basic_ifstream to open files, and there is no standardized way to open unicode files as a std::basic_ifstream on windows. I’ve seen an ugly hack for accomplishing it with MinGW, but it is very ugly and relies on implementation details of libstdc++. However, opening a FILE* with a unicode path is easy - there’s a function _wfopen that takes a wchar_t*, and it exists on both MSVC and MinGW. That’s why I proposed that solution.
Also non-standard I guess. So we need wrapper class with c-style (wchar_t-based) IO on windows and streams or standard c-style IO
on other systems, right?
Yes, that’s what I think is needed. Since CFileInputStream currently doesn’t use any iostream functionality, it would be easy to switch it to c-style IO. I’m not sure of how to handle zip files, because minizip doesn’t seem to support unicode file names.
I have updated my github with some changes for better unicode support. Most notably, I made a class called FileStream that is a subtype of std::iostream and uses a FILE* internally. I also merged Ivan’s serialization branch so I wouldn’t need to duplicate some work. Some things probably don’t work as intended right now; in particular, I’m not sure how to do things in CClient::loadGame().
I’ve decided to try a different approach instead. I’ll revert back all my changes, then only implement changes to CFileInputStream. It seems I’ve misunderstood how parts of VCMI work. I’ll also use your MinizipExtensions.cpp/h files to make the functions in the ZipArchive namespace support unicode filenames.
I’ve decided to somewhat narrow the scope of my task. I’m only aiming to add support for unicode characters in the path in front of the VCMI-specific folders. For example, “C:\Users<non-ascii name>\My Games\vcmi” is supported, but “C:\Program Files\vcmi\config<non-ascii file>” is not. Implementing the latter would mean major changes to how VCMI handles resources. And it would be a whole lot harder to support unicode names inside zip archives.
This way, users with non-ascii usernames are not “discriminated” any more (meaning everything works just as fine for them as for everyone else). But for example, savegames still can’t have non-ascii filenames.
What I have left to do now is to test things some more, and to make some of the new code cleaner.
I think this is sufficient to just allow players to launch the game, which is major issue for now. VCMI is officially English only and savegame names are far from critical.
[s]So, I’ve finished unicode support in most places, but as very often, 20% of the task requires 80% of the work…
ISimpleResourceLoader has a method load() returning std::unique_ptr, but there is no corresponding method for writing. I’d like to add a method write() which returns std::unique_ptr (or similar). Currently, a lot of code looks like “std::ofstream(ISimpleResourceLodader::GetResourceName())”, which is a bit messy, and I don’t think those call sites needs to actually know the file name, they only need an output stream.[/s]
EDIT: I don’t think the above is needed right now; maybe it’s something to be done at a later time.