0.94e issues and plans

#1

In my opinion we should try to find a way how we can get rid of that threading issues (mostly race conditions) in the long term. Hard crashes are something I really don’t like and can stop you from working. I think we have two ways to simply reduce the possibility for threading problems, either reduce the amount of threads or reduce the points where synchronization is required. For the coming release we shouldn’t change much, perhaps we can fix some of those crashes (I will look into them).

Sometimes I really have the feeling to get rid of all old things and rewrite everything from the ground up. I thought of rewriting parts of VCMI in Scala, just for fun, to learn Scala (has a lot of nice programming constructs) and to see how it would work. Just to name a few advantages, no manual memory management, actor-based communication, automatic serialization, hardware-accelerated graphics, write once - run everywhere (Java principle), better IDE support than C++. But in the end I think it’s better to get VCMI finished, the result is more important, than the code or the programming constructs you can use.

Current tasks
#2

Probably the right way to start would be to write down synchronization scheme of VCMI. Then we could seek for weak spots.

#3

I don’t think threading is a particularly buggy part of the project. Every release we fix dozens of crashbugs. Very few of them are caused by threads or race conditions. We usually get tripped over some trivialities like dereferencing null pointer.
Well, I guess the real issue with race conditions are debugging difficulties. I guess there are a few last bugs left that simply needs to be fixed once and for all.

If you say that you encounter the end-mission crash so often that it makes your development impossible, I believe you should be able to debug it — or at least give me some clues, what is happening and how to fix it. So far I only hear about the crash but I haven’t received such basic information as full logs of both client and server.

As for reducing possible thread issues in the long term…

  1. Document current behaviour. What threads we have. What are the shared resources. Who affects what, and who is affected by what. What is allowed and what is not.
  2. Some parts of code (like ending the game) are particularly shabby and should be rewritten in the cleaner way. Perhaps we should also formalize means of inter-thread communication. That is: what messages thread sends and to whom.
  3. Check if some shared resources could feasibly replaced with messages.

Not exactly a threading issue but we have also a problem with aliasing. Tons of crashes were like “AI stores somewhere pointer to owned hero, loses the hero, forgets to clear the pointer”. Generally speaking the game state structures should be encapsulated and player interfaces (GUI/AI) should not see pointers to them.
I have partially solved this by introducing HeroPtr smart pointer class that wraps hero id and throws exception when dereferencing lost hero… But it is only for heroes and AI interface still uses CGHeroInstance* in many places.
The whole player interface (CGameInterface and CCallback) needs rewrite to:
— eliminate aliasing over gamestate
— clearly describe relations between functions and threads
— cover all missing events and I guess all events should have their “before” and “after” versions
— move all bookkeeping (tracking set, engagement in battles, mutexes, etc) responsibilities to the core engine (from the AI/player)

We need to introduce smart pointers. They are not a silver-bullet for resource freeing but they should vastly improve current situation.

That’s hardly a language feature.
There was ongoing effort to make VCMi hardware-accelerated ( sourceforge.net/apps/trac/vcmi/b … hes/OpenGL ), unfortunately it was never finished.

There is something better in the world than Visual Studio (with plugins)?
[Unless we talk about Linux-supporting IDE’s… I haven’t seen comparably good C++ IDE for Linux.]

#4

Any reasons for HeroPtr class to be hero-specific? Shouldn’t more generic template work just as well here? Something like this:

template <typename TObject>
class CMapPtr
{ ... }

CMapPtr<CGHeroInstance> hero(heroObjectID) // throws if not found or on type mismatch
CMapPtr<CGTownInstance> town(townObjectID)

hero.get() // returns CGHeroInstance, throws if not found or wrong type
town.get() // returns CGTownInstance

And even working bits (main menu) were extremely unstable on my side. So unless we’ll get somebody familiar with OpenGL any progress here is quite unlikely. At this point SDL2 looks much more feasible opportunity (and likely more portable).

#5

Just stepping by to show you that Visual can be crazy sometimes

1>------ Build started: Project: VCMI_client, Configuration: RD Win32 ------
1>  VCMI_client.vcxproj -> C:\Temp\RD\VCMI_client.exe
========== Build: 1 succeeded, 0 failed, 1 up-to-date, 0 skipped ==========
#6

That’s right, only a few bugs are caused by threads.

I will create a mantis ticket if I encounter the end-mission crash or some else crashes (whatever sort of type they are) with all information I can provide.

I agree with all of your points and I want to add a few more:

  • After the 0.95 release we can centralize event handling (merge CPlayerInterface and CGuiHandler).
  • Merge the CConsoleHandler thread with the main GUI thread if possible.
  • Keep the thread for loading the game or a map and receiving net packages, possibly also sending packages.
  • Do not switch from synchron to asynchron networking. Question and answer game of client, server communication when initializing a game can stay the same.
  • I thought of a synchronized queue where one thread (networking thread) adds a net package at one end and the other thread (GUI main thread) reads and executes the net package at the other end. If user input or a certain amount of time is required to complete the execution of the net package, we could register a callback handler and set a flag on the net package signalizing the end of the execution. The central event handling engine can then take the next net package in the queue.

I think above points should be relatively simple and quick to implement.

#7

I should have done it as you say.
The reason is basically that heroes are the only object type that AI tends to lose during the turn. There was simply very little need to have other specific-object wrappers.
I wanted to eventually generalize this… but:
a) I didn’t want to develop very smart wrappers on the AI side when I beleive it should be solved on the engine side
b) there are always tons of more urgent issues

There were some issues that were needed to be solved using shaders. Basically the palette-based effects/animations but also effects like Bloodlust or sepia-colored grail map. Reliance on GLSL (and, generally, assuming OpenGL) is probably the biggest factor limiting portability here. [Eg. WinRT doesn’t provide OpenGL support, perhaps some library like ANGLE can save the day here.]

I don’t think it’s wrth effort. It would require writing more platform-specific code (that I hate, particularly when it involves winapi) and the effect wouldn’t be significant.
The processing console commands might not be fully safe but:
— basically the console commands are not part of official feature set, it is only to serve us as some debugging help
— it is very difficult to issue the command in console and in the same time trigger an event in the game

If you mean common parts of update method — yes.

I have mixed feelings about this.
Basically I like that currently responsibilities are split — one part of client handles GUI and another one handles connection and updates gamestate. I am afraid of creating a single blob-segment that is meant to handle everything, from GUI input to updating gamestate.
On the other hand… mutexes already make sure that GUI doesn’t run when net package is applied, so there is little reason to keep applying out of the GUI thread.

#8

I agree that merging CConsoleHandler thread with the main GUI thread is not possible. I thought it would be but it’s not possible with standard C++. Anyway that’s fine. You’re right only possibility that console thread could cause problems is when you issue commands which aren’t official feature set and are only for debugging or testing.

The good news is that I finally found out where the heavy threading problems are coming from: broken console under Linux / Fedora. Somehow logged messages (random!) serve as input for processing commands. Unknown commands are sent to the server and back to all clients. Especially when campaign scenario ends and client - server communication gets unstable, it crashes random at different locations.

@Ivan: Do you know of any console related threading issues? You have written this part of console handler long time ago:

//disabling sync to make in_avail() work (othervice always returns 0)
	std::ios::sync_with_stdio(false);
	std::string buffer;

while ( std::cin.good() )
	{
#ifndef _WIN32
		//check if we have some unreaded symbols
		if (std::cin.rdbuf()->in_avail())
		{
			if ( getline(std::cin, buffer).good() )
				if ( cb && *cb )
					(*cb)(buffer);
		}
		else
			boost::this_thread::sleep(boost::posix_time::millisec(100));

		boost::this_thread::interruption_point();
#else
		std::getline(std::cin, buffer);
		if ( cb && *cb )
			(*cb)(buffer);
#endif
	}

Is it possible to rewrite this code without using std::ios::sync_with_stdio ? I don’t know but why does a simple std::getline doesn’t work plattform independently?

When the above bug is fixed and threading issues are gone, we can keep current GUI / network handling separation.

#9

Not sure. Problem here is how to close vcmi properly. In order to terminate console thread correctly we need to call boost::thread::interruption_point(). So correct logic here should be like this:

1.while console available do
2.    check if we have unprocessed input
3.    if yes - process command
4.    check if thread was interrupted
5.    wait for a bit

Notes:

  1. This may evaluate to false when user starts vcmi without console as normal GUI application (system console is optional on Linux),
  2. In order for this function to actually work sync_with_stdio must be turned off.
    As it turned out - something in my code does not works as expected on Win. As result - we’ve got Linux version and Win version.

Maybe simply disable sending messages to server via console? Not the best solution but console is useless in pregame due to lack of server and after map is started in-game console is available anyway.

UPD: maybe boost::asio may help? We’re using it for networking anyway.
stackoverflow.com/questions/2831 … with-stdin

#10

There is a command “onlyai” which configures only AI-controlled game with no GUI at all.

#11

still be made ​​at this saves. was used very well …

#12

Warmonger, sorry, bad wording. I was talking specifically about “sending messages to server” part. I think that it can be removed without any loss of (debugging) functionality. Right now there is tricky check to detect status of server which seems to fail during endgame.

Of course this won’t solve root of the problem (console output acting as console input) but this should stop crashes from happening.

#13

Oo
I didn’t think it is possible.

Sounds reasonable as a workaround.

Is that really that important to close VCMI properly? I have never observed any negative consequences of ending VCMI with having a console thread vlocked upon the getline call.
Yes, it is not a proper way of closing (eg. destructors won’t be called for stack objects in such thread) but does it matter here?

Were there any issues caused by blocked thread on Linux? I guess you didn’t put that in_avail without a reason. :slight_smile:

#14

Before that change there were ~3 second long shutdown times - probably system waits for all threads to shut down normally. I think there were something more but I can’t remember at the moment.

Current way to close that thread on Win (forced shutdown) seems to unavailable on Linux - I haven’t found in API anything that can be used for that.

#15

Me too:) Yesterday I investigated the problem in a more detail and I found out that it seems to be that the problem doesn’t depend on our linux specific command processing. I commented that part out and used windows specific console code. I can even lock/unlock mutex when reading from std::cin or writing to std::cout, but it doesn’t make any difference.

Ivan, do you have any problems on your side? I looked into Bjarne stroustrup C++ book, but I couldn’t find anything that we could have missed. So the problem has either to be a Fedora, Clang or Std libs problem on my machine. I’ll look into it further and will write a simple test application. I have the feeling that std::cin doesn’t correctly update / synchronize read pointer after something has been written to std::cout.

That’s fine as a workaround.

#16

I have no such crashes in campaigns since Tow fixed that desync crash. However I haven’t tested campaigns a lot lately - just as much as I need to fix campaign-related bugs. So it is possible that I’m lucky to evade this bug.

#17

Just want to say that I found out where the console issues came from: QtCreator. Qtcreator has a integrated console output window which can’t be used for input and apparently as we saw has some bugs. It is possible to use an extra console window like on Windows and now everything is working fine: No more crashes! :mrgreen:

I commited Xeron handling to SVN, so AB campaign should be playable now. Will fix further campaign bugs this weekend.

@Ivan: I implemented Xeron handling hardcoded in the code. Perhaps we can later integrate that special handling in mapOverrides.json. The rule is that if the hero can’t be found (killed or in prison, etc…), he will be created newly.