Supporting unicode file paths

So, I really want VCMI to support unicode file paths ASAP, because Windows users with a non-ascii username currently can’t run VCMI (and I’m one of them). I’ve implemented a solution using shortpaths (github.com/vcmi/vcmi/pull/151), but as I was told, it is not a very good solution.

So I thought about a better solution. While MSVC supports a non-standard wchar_t overload of std::fstream::open(), it does not exist in MinGW, so that’s not portable. I could, however, re-write CFileInputStream to use C-style FILE* internally instead of an ifstream. Then supporting unicode file paths would be easy and portable, with very little platform-specific code. The interface of CFileInputStream wouldn’t need to change at all - the beauty of hiding implementation details.

Do the VCMI developers think this would be a good solution? If so, I’m willing to implement it.

Right solution is to use boost::filesystem everywhere - it works well with unicode - in most places it is used already, but this work has not been done yet.

It does not work, because on Windows it relies on the non-standard overload provided by MSVC…

@King_Crusher Details please. I`m not using MSVS

Internally, the boost::filesystem::basic_ifstream::open() method (and the constructor) that takes a boost::filesystem::path as an argument, converts the path to a c-string and passes it to std::basic_ifstream::open(). For non-windows systems, it gets passed as an UTF-8 encoded char*, which works perfectly fine. However, windows does not support opening files through an UTF-8 name. So on MSVC, it gets passed as an UTF-16 encoded wchar_t*, to a non-standard overload of std::basic_ifstream::open(), which takes a wchar_t* as an argument instead of a char*. But that overload does not exist on MinGW. So there, it ends up passing an UTF-8 encoded char*, even though windows does not support it. The code compiles just fine, but you get an error every time you try to open a file with a non-ascii name.

I tested wrong build (public release one - but it works at least for UTF8 mod path) :frowning: MinGW build definitely crash in any non ASCII filename.

Possible solution:

rewrite boost::filesystem::basic_ifstream to make explicit UTF8->UTF16 conversion on windows, or wrap boost::filesystem::ifstream with some template to use wchar_t based versions on windows.

That wouldn’t work either, because boost::filesystem::path already stores the path in UTF-16 on windows, conversion is not the issue. boost::filesystem::basic_ifstream relies on calling std::basic_ifstream to open files, and there is no standardized way to open unicode files as a std::basic_ifstream on windows. I’ve seen an ugly hack for accomplishing it with MinGW, but it is very ugly and relies on implementation details of libstdc++. However, opening a FILE* with a unicode path is easy - there’s a function _wfopen that takes a wchar_t*, and it exists on both MSVC and MinGW. That’s why I proposed that solution.

Also non-standard I guess. So we need wrapper class with c-style (wchar_t-based) IO on windows and streams or standard c-style IO
on other systems, right?

Yes, that’s what I think is needed. Since CFileInputStream currently doesn’t use any iostream functionality, it would be easy to switch it to c-style IO. I’m not sure of how to handle zip files, because minizip doesn’t seem to support unicode file names.

Actually I`ve already started implementing the solution for this problem using minizip IOAPI, see map format branch (MinizipExtensions.h)

I have updated my github with some changes for better unicode support. Most notably, I made a class called FileStream that is a subtype of std::iostream and uses a FILE* internally. I also merged Ivan’s serialization branch so I wouldn’t need to duplicate some work. Some things probably don’t work as intended right now; in particular, I’m not sure how to do things in CClient::loadGame().

You screwed up everything, merging Ivan`s branch was a bad idea. Also you deleted many files with no reason.

Also if you really want to use Ivan`s branch as a base make a pullrequest to that branch but not develop

I’ve decided to try a different approach instead. I’ll revert back all my changes, then only implement changes to CFileInputStream. It seems I’ve misunderstood how parts of VCMI work. I’ll also use your MinizipExtensions.cpp/h files to make the functions in the ZipArchive namespace support unicode filenames.

Progress has been made: github.com/vcmi/vcmi/compare/develop…Zyx-2000:develop

I’ve decided to somewhat narrow the scope of my task. I’m only aiming to add support for unicode characters in the path in front of the VCMI-specific folders. For example, “C:\Users<non-ascii name>\My Games\vcmi” is supported, but “C:\Program Files\vcmi\config<non-ascii file>” is not. Implementing the latter would mean major changes to how VCMI handles resources. And it would be a whole lot harder to support unicode names inside zip archives.

This way, users with non-ascii usernames are not “discriminated” any more (meaning everything works just as fine for them as for everyone else). But for example, savegames still can’t have non-ascii filenames.

What I have left to do now is to test things some more, and to make some of the new code cleaner.

I think this is sufficient to just allow players to launch the game, which is major issue for now. VCMI is officially English only and savegame names are far from critical.

[s]So, I’ve finished unicode support in most places, but as very often, 20% of the task requires 80% of the work…

ISimpleResourceLoader has a method load() returning std::unique_ptr, but there is no corresponding method for writing. I’d like to add a method write() which returns std::unique_ptr (or similar). Currently, a lot of code looks like “std::ofstream(ISimpleResourceLodader::GetResourceName())”, which is a bit messy, and I don’t think those call sites needs to actually know the file name, they only need an output stream.[/s]

EDIT: I don’t think the above is needed right now; maybe it’s something to be done at a later time.

I have done some testing, and I think everything works as it should. I have created a pull request.