Serializer improvements

Hi,
I have committed changes to the serializer. Their purpose is to fix numerous long-standing issues and missing features that were hampering other parts of the engine.

After this commit all things should work as before. If they do not — please let me know.

There might be some compile errors on gcc/clang — code is heavy on templates and Visual Studio can be very lax when it comes to enforcing standard conformance [well, basically the compiler itself doesn’t provide two-phase lookup.]. Fix them if you can or report them here, so I can try to figure them out.

What’s new? The serializer now:

  1. properly supports classes with multiple inheritance.
  2. properly supports shared_ptr
  3. allows polymorphic serialization using pointer to the abstract base class

At the moment weak_ptr still isn’t properly supported but I hope to fix this within several days. There are a few minor improvements I want to make but they’ll be hardly visible outside the serializer.

Example:
The following code will work

struct BaseA
{
    int a;

    virtual ~BaseA() {};

    template <typename Handler> void serialize(Handler &h, const int version)
    {
        h & a;
    }
};

struct BaseB
{
    int b;

    virtual void foo() = 0; //abstract classes work as well
    virtual ~BaseB() {};

    template <typename Handler> void serialize(Handler &h, const int version)
    {
        h & b;
    }
};

struct Der : BaseA, BaseB
{
    int d;

    virtual void foo() override {};

    template <typename Handler> void serialize(Handler &h, const int version)
    {
        h & static_cast<BaseA&>(*this);
        h & static_cast<BaseB&>(*this);
        h & d;
    }

    bool operator==(const Der &rhs) const
    {
        return d == rhs.d && a == rhs.a && b == rhs.b;
    }
};

void testSerializer()
{
    Der *d = new Der();
    d->a = 40;
    d->b = 45;
    d->d = 50;
    BaseA *a = d;
    BaseB *b = d;

    auto ds = make_shared<Der>();
    ds->a = 40;
    ds->b = 45;
    ds->d = 50;
    shared_ptr<BaseA> as = ds;
    shared_ptr<const BaseB> bs = ds;


    {
        CSaveFile save("test.txt");
        save.registerType<BaseA, Der>();
        save.registerType<BaseB, Der>();

        save << b << d << a;
        save << bs << ds << as;
    }

    {
        Der *dd;
        BaseA *aa;
        BaseB *bb;

        shared_ptr<Der> dds;
        shared_ptr<BaseA> aas;
        shared_ptr<const BaseB> bbs;

        {
            CLoadFile load("test.txt");
            load.registerType<BaseA, Der>();
            load.registerType<BaseB, Der>();

            load >> bb >> dd >> aa;
            assert(dd == aa);
            assert(dd == bb);
            assert(*dd == *d);

            load >> bbs >> dds >> aas;
            assert(dds == aas);
            assert(dds == bbs);
            assert(dds.get() == aas.get());
            assert(dds.get() == bbs.get());
            assert(*dds == *ds);
            assert(dds.use_count() == 4); //three explicit ptrs plus one copy hold by the CLoadFile
        }
        assert(dds.use_count() == 3);
    }
}

There’s one significant change from the prevoius implementation. The registerType template naw takes two parameters — base class and its derived class. That means we are not registering the types to be handled polymorphically — we register the inheritance relations. Only direct relations should registered.

As a consequence it should be finally possible:
a) to properly serialzie polymorphic shared_ptr’s in VCAI (they’ll just need to be registered before serialization)
b) to properly store bonus tree hierarchy
c) to introduce smart pointers into some parts of gs/vlc

Working on fixing compilation with gcc.

  1. We need to do something with RegisterTypes.cpp - it has now peak usage of ~3.5 Gb of RAM.

Update:
both gcc and clang should work now. But that 3.5 Gb’s will be a problem for those who don’t have that much RAM

especially for me as my 64 bit build is still not work (now It craches insode SDL_Init() ), and there is no 64bit host >32bit target mingw gcc.

Has the memory memory usage increased significantly after this update or it was always that bad?

I don’t have much ideas as for reducing memory usage. I don’t know how to simplify serializer.
Well, perhaps we could get rid of the CRTP trick. The actual serializar classes don’t overload the template member functions, so we can use usual inheritance here and make read/write virtual function. Nevertheless I have no idea if (how much) this will help and most likely won’t find time for such rewrite.

Does splitting the file helps? What is the memory usage when registerTypes instantiation in present (and the rest is commented out)?

That file was always quite heavy, both compile time and mem usage but not that big.

That file consist solely from template instantiations so removing them decreases mem usage to 0.

Removing all instantiations but TypeList reduces mem usage to ~1.6 Gb
Removing these two lines from TypeList::registerType decreases to ~0.4 Gb

		casters[std::make_pair(bti, dti)] = make_unique<const PointerCaster<Base, Derived>>();
		casters[std::make_pair(dti, bti)] = make_unique<const PointerCaster<Derived, Base>>();

So it seems to be coming from huge amounts of instantiations of of template methods/classes

Yes, but where? RegisterTypes.h is a part of lib, which is independent of VCAI.

I suppose, separate “RegisterTypes” inside VCAI should work.

I’m somewhat surprised that this part takes that much memory. So simply creating about ~600 instantations of PointerCaster requires 1.2 GB? That’s 2 MB per instantiation. And the class is really simple. [just dynamic_casts the pointer wrapped with boost::any] Huh… but what we can do?

If you have some spare time for this, you may try replacing IPointerCaster with three std::function<boost::any(const boost::any &)> objects and assign to them lambdas created within CTypeList::registerType that do the same thing.
There’ll be no class template instantiations here, std::function is type-erased, so it should be lighter on compiler.

If you have even more time, you can try different thing: actually dynamic cast is really rarely needed (only when virtual inheritance is involved). In most cases casting does nothing (we convert between the first base and derived object) because pointer offset is zero. In the several cases (non-first base) there is a constant offset involved.
We mgiht create PointerCaster only for virtual inheritance cases, otherwise use a simple class that instead of dynamic cast just moves pointer by given offset (that is easily known in the CTypeList::registerType). The differentation between the three cases should be relatively easily achievable with type traits.

As seen in the example, you just need to call the registerType on the serializer object. The only places where AI has access to serializer object are the methods save/load and that’s where it has to be done. In the save/load (or in serializeInternal) first register the types, then simply proceed to serialization as usual.
(registration, as discussed, isn’t cheap for compilation time, to it’s best to define the function for registration in a single compilation unit, not header.)

Done.

One little issue so far: serializer needs public default constructors, which should be banned (private) for some Goals. Could probably try some friend class acrobatics, not sure if it’s worth it.

Does splitting the file helps? What is memory usage if only the “template void registerTypes<CISer>(CISer& s);” line is present?

I’d say it is worth. If the default-constructed goal is invalid, the default constructor should be private.
Serializer uses a helper class to create objects (with a default constructor) before deserialzizing their data, so beffriending this should suffice:

template <typename T, typename Enable>
friend struct ClassObjectCreator

or sth like this.

Wow. 2.6 Gb with just that line… So splitting won’t help much for 32-bit systems - this is too close to OS limit (3.0 Gb IIRC)

Interesting that compiled file (*.o) here is also notably bigger than everything else - 80 Mb when everything is enabled while most of files here are below 10 Mb.

Since I can’t look (easily) into compiler internals to find out what takes such amounts of memory I tried to demangle this object file. Biggest symbols in it are these (duplicated for each registered type ofc):

  1. savePointer / loadPointer - Huge number of these, probably responsible for majority of current size
  2. PointerCaster<…>::castSmartPtr<shared_ptr>(boost::any) - Specifically this method, everything else in PointerCaster seems to be less significant.
  3. vector<…>_default_append - Not as numerous as #1 & #2, but still notable
  4. serialize() methods from biggest classes (e.g. CGHeroInstance, CQuest)
  5. CTypeList::registerType<…>()

UPD: false lead. Removing these methods have quite insignificant effect on allocated memory.

As for splitting… the lines itself in the file rather won’t work, its the registered types that create that much instantiations.
What if the RegisterTypes.cpp instead of registerTypes function called just registerTypes4? Does this reduce the memory usage significantly? If so, we could just split registerTypes onto as many smaller methods as necessary to keep VCMI compilable with GCC.

Seems to be a way to go:

// registerTypes1:  1.9 Gb <- don't like it, but we may split it later if needed
// registerTypes2:  2.2 Gb <- split into two templates below
//     registerTypesClientPacks1 1.6 Gb
//     registerTypesClientPacks2 1.6 Gb
// registerTypes34: 1.6 Gb <- 3+4 can be merged into one
//     registerTypes3:  1.3 Gb
//     registerTypes4:  1.3 Gb

Will do proper splitting tomorrow

On win64 there is no 3Gb limit for 32 bit applications, limit is 2 GB or 4 Gb depending on linking options. (so TypesMapObjects should be split - i’ve got “out of memory”)

Right now maximum mem usage comes from that file. And I have 1.9 Gb memory usage - far below 4 Gb limit. (gcc 4.8.1, 64 bit system)

Can you do any necessary splitting yourself? Out of memory on this file means that you have notably bigger memory usage than I do so I can’t say for sure what and how should be split to make compilation work on your system.

The problem is that on win64 limit is actually 2Gb in case of gcc and 1.9 is too much too. (I`ll do splitting myself)

I have another question about serialization:
Why we are serializing handlers? Why not do instead one of following:

  1. dont save any handler related data, load handlers from config always. Check mod checksums on game load, fail if mismatch.

  2. same as 1 but check only version - this will allow small config-related changes in started game.

  3. save processed json configuration (from modHandler) to savegames. Initialize handlers from saved json.

  • This (case #3 and is some point #2) will add much more backward compatibility to saves, even for several releases: json loading code is quite flexible and network protocol changes does not affect save compatibility.
  • Less template magic will be also useful.

3 seems to me to be simply another form of what we have now. I don’t think it adds backwards compatibility — if you add new data to the savegame or change schema you need to handle this out anyway with if’s.

There is no necessity to store handlers, 1) and 2) are maybe not trivial to implement but certainly possible. [It would require a number of changes to properly support pointers to handler structures the gamestate wants to serialize, bonus system relations and so on.]
Still, I prefer the current approach. For example, when we get a bugreport with a saved game, after the load we have the handlers of the reporter. That’s quite a neat feature. Otherwise reporters would have to package their whole config, mods and so on.
I’d say that breaking save compatibility by config changes is worse than breaking them by the changes to the handlers source code.

Moreover, the handlers should not be assumed to be a static set of data loaded from config that never changes. Well, that is pretty much the state for now (except for some parts, like bonuses) but it will be most likely idfferent when scripts support will be implemented. I would certainly expect that scripts API will allow to dynamically modify the handlers state — and therefore there’ll be need to keep it serialized. So while I understand your desire to get rid of some serialziation templates, I wouldn’t want to remove them even if handlers were not a part of the save.

It seems that another huge bottleneck is netpack serializer. Not sure if something can be improved here, but won’t hurt to ask.


Show full method names please.