Memory fragmentation

lezebulon · October 19, 2014, 11:36am

Hi Guys

I’ve been looking at the source code over last week and it seems that the implementation that was chosen for certain things generates ton of memory fragmentation, which I think make every slower esp on Android. It also means that it’s really slow to execute VCMI on windows when it’s compiled in debug mode.

for instance :

JSON parser keeps allocating / deleting memory when we change the type being processed, why not use an existing C++ JSON parser?
in h3mMapLoader, during the “load object” phase, we are creating tons of new objects, just to put them in a vector that is not supposed to be edited during the game (I think), and we remove everything at once when the map is unloaded. I think it would be better to handle this by using memory pages so we only call “new” 2 or 3 times to get enough memory (probably 10ko per page is enough), then deleting everything when the map is unloaded.
in CMap, the Terrain structure is handled by a 3D array. It was made in such a way that caching is extremely inefficient, because the data that is allocated in a block is just the tile (x,y,z) and if underground (x,y,z+1). However almost never look in the map that way, we usually are on a fixed z level and we scan along the x or y axis. This means that the caching could be improved a lot by having the 3rd dimension be x or y, but not z. However I don’t know how much memory would 200 contiguous tiles data would take, but I’m sure that even Android platforms can allocate that much data in one go. Right now this is extremely bad as for instance if we create a 100x100 map with no underground we dynamically create 1000 tiles randomly in memory, whereas we want it to be grouped as much as possible.

What do you guys think?
I have a simple fix for 2) if you want, for 3) as well but it’s a bit less simple

thanks!

Warmonger · October 19, 2014, 4:30pm

You are absolutely right about #3 - it’s always better to have x dimension in adjacent tiles, followed by y in adjacent rows in C++.

This can be probably fixed quite easily, as long as all tiles are accesed by defined functions - can’t be sure about that.

Ivan · October 19, 2014, 5:08pm

Hmm… Do you have any ideas on how much this can effect speed? AFAIK memory fragmentation should not have significant effect on speed. But it may have effect on used memory. I suggest profiling before jumping to conclusions. For example it turned out that AI slowness have nothing to do with slow code but due to TCP latency we were not aware of.

As for your suggestions:

IIRC parser shouldn’t create allocations that may cause fragmentation. There could be some issues from std::map (perhaps we should switch to unordered/hash tables & reserve 10-20 entries?) but parser should be OK. We do have a lot of small allocations but they should be sequential so it shouldn’t cause any big issues. Yes, parser changes type of node causing new allocations but it should not cause deallocations since starting type of node is null/empty node with no data.

Deallocations would happen all at once since all Json tree structure get destroyed at once. So it shouldn’t cause any signifincant speed loss. Extensive editing of Json in runtime may cause this but this happens extremely rarely in game.

There are two problems:

We can’t just put them in pointer-less vector (if this is what you want) since objects may have multiple different types of different size
There are multiple ways to create new objects. For example new mosters generated on new month. Not sure but newly recruited heroes may also cause new entries in objects vector

Switching to one huge block with custom accessors would be perfect solution here. e.g something like this:
template

class vector3d
{
    std::vector<T> storage;
    int3 dimensions;
public:
    vector3d(int3 dimensions); // size is constant during game so just allocate one chunk at once

    T & at(const int3 & dimensions);
    T & at(int x, int y, int z);
}

And internally map x-y-z to single & cache-friendly index ID in internal storage.

It is possible that we should use 2D vector here instead of 3D to decrease likehood of issues like Magic Plains having effect on both levels due to missing z index check and to (possibly) - allow differently-sized levels. IMO even with H3 mechanics surface/underground should be separated as much as possible internally

lezebulon · October 19, 2014, 5:46pm

hi
for 1) I need to double-check the implementation, but it seems that in debug at least the JSON Parser is slow

no it would handle derived objects fine
I can’t tell for sure in what cases this vector is used, as it’s public. However adding / removing objects on the fly works. The only thing that would not be possible would be to de-allocate the memory of the objects removed from the vector, because the memory could only be released all at once for the whole vector. See here for an implementation I made that I think would be ok for this use case : codereview.stackexchange.com/que … mory-pages . The only way this would be very inefficient would be if there are tons of objects that are generated and removed after the game started, but the objects added after the start of the game could be managed by the usual new / delete and that would be fine.

Yeah this would work as well, the only thing I’m concerned is about how much memory could be allocated in one go on Android / iPhones etc . On my machine TerrainTile is 56 bytes, so that would be 100x100x56 = 560kb for a Large Map… really easy for a PC but I have no idea about this on portable device. Anyway I have a solution for this as well but I’ll post it a bit later

Ivan · October 19, 2014, 6:18pm

Can this be caused by large number of small allocations? (should not cause fragmentation but may create overhead due to OS memory management).

UPD: actually it may be caused by large number of reallocations from std::map. Switching to unordered_map + reserve() call in json object parsing should help here. I remember comparing unordered/rbtree container performace in Json but I haven’t noticed any real difference in CPU usage on my system (Linux/Ubuntu) so I kept everything as it.

Ah, got it. Well, we’re not AAA to fight for every percent of speed so rewriting OS mechanisms may not be the best idea - for us this usually means more code to maintain. More thoughts:

Actually deallocating object is relatively frequent operation - picking up resources, fighting monsters, artifacts, scholars. Defeating a hero causes removal from object list but does not deletes object entirely.
It is possible that largest number of allocations comes not from objects themselves but from internal containers - vectors, maps, strings. So throwing objects into a single/several memory blocks won’t have much effect on memory layout.

Not sure about Android but native code should have less restrictions than Java virtual machine. We’re allocating huge images up to several MB in size and so far everything was OK. So unless SDL/Android uses some magic to handle images we should be fine.

Oh - and on your math:
144x144x56 = 1134 kB -> max size of H3 map level (twice for underground)
252x252x56 = 3472 kB -> max size that H3 engine can handle (check HotA if you wish to see this).

VCMI should be able to handle even larger maps but this was never tested due to H3 map format limitations.

lezebulon · October 19, 2014, 6:59pm

for this I was only talking about CMapLoaderH3M::readObjects() function, which create tons of objects and push them in the “object” member. Are you saying that this vector is resized / modified every time a ressource is picked up for instance?
My point was, if you have allocated 600 objects at the start of the map, then when an object is picked up we don’t really need to free the memory for this specific object, because the game was working fine before with all objects on the map (unless you are now desperate for 40 bytes of memory). This is working fine I think as long as you don’t generate too many objects after the map was loaded. I think that for these objects it would be fine to use the usual new / delete if we want to reclaim memory when they are removed later.
I didn’t get you second point sorry.

Yeah I was thinking the same at first too, but I wasn’t really sure mobile device could handle such a memory allocation request. If you think that you can allocate 4Mb in one go on Android, then it’s clearly the best way to proceed.

Warmonger · October 19, 2014, 7:30pm

Before you bend the reality, check this ready solution:

boost.org/doc/libs/1_55_0/li … /user.html

This class is already optimized for the access and dimensions can be easily rotated.

Allocating shouldn’t be a problem, but access could be faster. Keep in mind that the map is accessed both by pathfinder and RMG, which are most demanding pieces of our code.

Ivan · October 20, 2014, 7:09am

Yes. Whenever player removes object from map, e.g. picks resource game completely deletes this object from game - removes it from objects vector and deallocates memory.

So I think that during course of the game something like 1/3 of objects may be removed. Other 2/3 include statics and visitable objects. So by the end of the game memory would still have quite a lot of unused fragments (although still allocated by that memory pages class)

You want to allocate one or several large chunks of memory to decrease memory fragmentation from allocation of CGObjetInstance-derived classes. But it is very likely that members of classes like CGHeroInstance generate much more fragmentation. For example army is represented as std::map<slot, creature>. So changing armies may cause large number of allocations and in the end would still create heavily fragmented memory since your approach won’t help with this case.

Macron1 · October 20, 2014, 7:37am

Looks like map sizes are hardcoded in sources:
github.com/vcmi/vcmi/blob/66c4d469f832583f855ea4db195d6cefb5f22b7a/lib/mapping/CMap.cpp
Maybe it will be good to move these defines to config, like

"mapSize":

{name":"s","size":36},
{"name":"m","size":72},
{"name":"l","size":108}, 
{"name":"xl","size":144}
],

Then user can add new map sizes and RMG templates for them.
I want to play 1008x1008 map, for example.

PS Yes, i know, that i am a bad guy, and i haven’t program anything from summer. I still not found motivation, and i have very lot of work for myself with modding stuff. I also have some problems with health. Don’t know if i can be back to business in november.

lezebulon · October 20, 2014, 6:56pm

ok I double checked, the memory is indeed deleted but the genral “objects” vector still has an empty pointer for the given ID. As I said, I don’t think it’s that bad that 1/3 of the objects allocated at startup are not removed from memory. I mean, the game is supposed to run fine at startup with all the objects, it’s not like you need the initial memory to decrease as the game goes on. The only problematic thing would be when you dynamically create objet, then delete them, and keep doing that lots of time. In that case you’d allocate memory for new pages but never free the old ones that are empty. However for the objects that are generated after the game started, we could just use new / delete as is done now. Also, if right now you are doing this kind of thing, you’d still get the “objects” vector growing, being filled with 0s when the objects are removed, and still growing as you’d add new objects.
But anyway I see your point, this is probably not a very important improvement all things considered.