Optimizing VCMI_client

Hello!

I’ve been following your project for a while now, and I must say I’m impressed with your progress so far. There is still a lot to be done, of course :slight_smile:
Being a Heroes3 fan and also a game engine programmer myself (as a hobby), I decided to check out the VCMI sources and do some profiling. What struck me immediately is the amount of time spent in CSDL_Ext::blit8bppAlphaTo24bppT, blitting individual colors. Why not try using SDL_BlitSurface here instead? Replacing the whole ColorPutter loop with an SDL_SetAlpha + CSDL_Ext::blitSurface pair almost completely eliminated the aforementioned method from the profiling log, with 90% of images looking correct. Granted, it doesn’t work for all surfaces that way (for example, the battlefield surface seems to have its alpha value inverted?), but it’s a good start in my opinion.

Keep up the good work!

Ah, the black battlefield effect seems to come from the hex overlay; I guess that should be easy to fix…

Interesting. We wrote CSDL_Ext::blit8bppAlphaTo24bppT because we discovered that SDL can’t blit 8bpp surfaces with an alpha channel onto 24bpp surfaces. Those 90% are probably the images without an alpha channel… SDL_SetAlpha sets a per-surface alpha value, but sometimes we need per-pixel alpha in 8bpp surfaces (e.g. the shadows of adventure map objects), and this is what CSDL_Ext::blit8bppAlphaTo24bppT does.
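
To make the difference concrete, here is a minimal C++ sketch of what a per-pixel alpha blit from an 8bpp palettized surface to a 24bpp one involves. It is not VCMI’s actual CSDL_Ext::blit8bppAlphaTo24bppT: the PaletteEntry struct, the function name and the R, G, B byte order of the destination are assumptions for illustration, and SDL surfaces are replaced by raw buffers with pitches so the example compiles on its own. A per-surface SDL_SetAlpha value would correspond to the same formula with one constant alpha for every pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical palette entry: colour plus a per-entry alpha (0 = transparent, 255 = opaque).
struct PaletteEntry {
    std::uint8_t r, g, b, a;
};

// Blend one channel: result = (src * a + dst * (255 - a)) / 255.
inline std::uint8_t blendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t a) {
    return static_cast<std::uint8_t>((src * a + dst * (255 - a)) / 255);
}

// Reference (unoptimized) blit of an 8bpp palettized image with per-pixel alpha
// onto a 24bpp destination, assumed to be packed as R, G, B bytes.
// srcPitch/dstPitch are row strides in bytes, as in SDL surfaces.
void blit8bppAlphaTo24bppSketch(const std::uint8_t* src, int srcPitch,
                                const PaletteEntry palette[256],
                                std::uint8_t* dst, int dstPitch,
                                int width, int height) {
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* srcRow = src + static_cast<std::ptrdiff_t>(y) * srcPitch;
        std::uint8_t* dstRow = dst + static_cast<std::ptrdiff_t>(y) * dstPitch;
        for (int x = 0; x < width; ++x) {
            const PaletteEntry& c = palette[srcRow[x]];  // look up colour and alpha
            std::uint8_t* p = dstRow + 3 * x;             // 3 bytes per destination pixel
            p[0] = blendChannel(c.r, p[0], c.a);
            p[1] = blendChannel(c.g, p[1], c.a);
            p[2] = blendChannel(c.b, p[2], c.a);
        }
    }
}
```

Each destination channel becomes `(src * a + dst * (255 - a)) / 255`, so an entry with a = 0 leaves the background alone, a = 255 overwrites it, and intermediate values (such as a shadow entry) darken what is underneath.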

Yeah, I guess this is simply a case of blit8bppAlphaTo24bppT being overused, that is, being used for images that don’t need per-pixel alpha blending.
SDL_SetAlpha should work fine for the battlefield hex overlays, though.

I think the major performance hog is the fog of war (FoW), which covers huge areas of the map at the beginning. Does it really need per-pixel alpha blending?

I agree that this slows the game down terribly. Simply uncovering the map in the early game can hurt, and even on a decent machine the Cartographer needs a few seconds to activate.

On the other hand, it might be nice to implement smooth FoW blending when tiles are revealed, just for the visual effect found in most strategy games I can think of.

Well, even in that case, alpha blending would only be needed for the border tiles, wouldn’t it?

What did you profile: the adventure map view or the battlefield (or something else)? Did you use a fully optimized build of VCMI?

Unfortunately it does.

Each time I used the blit8bppAlphaTo24bpp function, I had my reasons, and I hope the same is true for the other programmers. Of course, you may find places where SDL blit functions can be used (with or without additional workarounds), but that will always mean making additional assumptions about the game graphics and imposing limitations.

The best solution would be to optimize blit8bppAlphaTo24bppT - if it can be fast in SDL, it can be fast in VCMI.

That is totally unrelated to blitting FoW on the adventure map.

I profiled the adventure map. And yes, I used a fully optimized build of VCMI.
The only way to optimize blit8bppAlphaTo24bpp is to give up the inlined C++ templates and other clusterfuckages and just go with MMX/SSE intrinsics. Obviously, one can’t optimize individual methods like putColor in isolation…

I’m not sure that 24-bit blits are really that optimizable, though… Another option is to convert all images/surfaces to a full-blown 32-bit format and let SDL do the conversion to the screen pixel format when the actual blit to the screen happens.
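
If images were expanded to 32 bits per pixel, the blend maps cleanly onto SSE2, because four 4-byte pixels fill one 128-bit register; packed 24bpp pixels do not line up with register boundaries, which is one reason 24-bit blits are awkward to vectorize. The sketch below is an illustration under assumptions, not VCMI or SDL code: the names are hypothetical, it assumes R, G, B, A byte order, blends each source pixel by its own alpha, and falls back to plain C++ when SSE2 is unavailable. Keep in mind that 32bpp storage takes four times the memory of 8bpp indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Rounded x / 255 for x in [0, 255 * 255], using only adds and shifts.
inline std::uint8_t div255(unsigned x) {
    return static_cast<std::uint8_t>((x + 128 + ((x + 128) >> 8)) >> 8);
}

// Blends a row of 32bpp source pixels (bytes R, G, B, A) over 32bpp destination
// pixels, weighting each source pixel by its own A byte. Hypothetical helper.
void blendRow32(const std::uint8_t* src, std::uint8_t* dst, std::size_t pixels) {
    assert((src && dst) || pixels == 0);
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    const __m128i c255 = _mm_set1_epi16(255);
    const __m128i c128 = _mm_set1_epi16(128);
    for (; i + 4 <= pixels; i += 4) {  // 4 pixels = 16 bytes per iteration
        const __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 4 * i));
        const __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + 4 * i));
        __m128i half[2];
        for (int h = 0; h < 2; ++h) {
            // Widen 2 pixels to 16-bit lanes: R0 G0 B0 A0 R1 G1 B1 A1.
            const __m128i s16 = h == 0 ? _mm_unpacklo_epi8(s, zero) : _mm_unpackhi_epi8(s, zero);
            const __m128i d16 = h == 0 ? _mm_unpacklo_epi8(d, zero) : _mm_unpackhi_epi8(d, zero);
            // Copy each pixel's alpha (lanes 3 and 7) into all four of its lanes.
            const __m128i a = _mm_shufflehi_epi16(
                _mm_shufflelo_epi16(s16, _MM_SHUFFLE(3, 3, 3, 3)), _MM_SHUFFLE(3, 3, 3, 3));
            // s * a + d * (255 - a) is at most 65025, so it fits in an unsigned 16-bit lane.
            __m128i x = _mm_add_epi16(_mm_mullo_epi16(s16, a),
                                      _mm_mullo_epi16(d16, _mm_sub_epi16(c255, a)));
            x = _mm_add_epi16(x, c128);  // same rounding as div255
            half[h] = _mm_srli_epi16(_mm_add_epi16(x, _mm_srli_epi16(x, 8)), 8);
        }
        // Results are 0..255, so signed-saturating pack back to bytes is lossless.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 4 * i), _mm_packus_epi16(half[0], half[1]));
    }
#endif
    for (; i < pixels; ++i) {  // scalar tail, or the whole row without SSE2
        const std::uint8_t* s = src + 4 * i;
        std::uint8_t* d = dst + 4 * i;
        const unsigned a = s[3];
        for (int c = 0; c < 4; ++c)
            d[c] = div255(s[c] * a + d[c] * (255u - a));
    }
}
```

The SIMD path and the scalar tail use the same rounded division by 255, so the result is identical whichever path handles a pixel, which also makes the vectorized loop easy to test against the scalar one.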

Actually, adventure map performance could be improved by not blitting objects hidden by full FoW, so that only the FoW borders have to be blitted with blit8bppAlphaTo24bpp… terrainRect is an old, ugly function that should be redesigned anyway (the way the hero animation is selected from CGDefInfo and the mechanism for showing the hero’s flag are terrible).
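
The FoW idea can be sketched as a per-tile classification pass that runs before drawing. This is a hypothetical C++ illustration: FowTile and classifyFow are made-up names, and treating a hidden tile with at least one revealed 8-neighbour as a border tile is an assumption about how the fog edges are drawn. Fully hidden tiles can be filled with an opaque colour and their objects skipped; only border tiles go through the expensive per-pixel alpha path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of adventure-map tiles for fog-of-war drawing.
enum class FowTile {
    Visible,      // no fog: draw terrain and objects normally
    FullyHidden,  // hidden and surrounded by hidden tiles: fill opaque, skip objects
    Border        // hidden but next to a visible tile: needs an alpha-blended fog edge
};

// visible[y * width + x] is true if the tile has been revealed.
std::vector<FowTile> classifyFow(const std::vector<bool>& visible, int width, int height) {
    assert(width >= 0 && height >= 0 &&
           visible.size() == static_cast<std::size_t>(width) * height);
    std::vector<FowTile> out(visible.size(), FowTile::FullyHidden);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            if (visible[idx]) {
                out[idx] = FowTile::Visible;
                continue;
            }
            // Check the 8 neighbours; tiles outside the map count as hidden.
            bool nearVisible = false;
            for (int dy = -1; dy <= 1 && !nearVisible; ++dy)
                for (int dx = -1; dx <= 1 && !nearVisible; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if ((dx != 0 || dy != 0) && nx >= 0 && ny >= 0 && nx < width && ny < height)
                        nearVisible = visible[static_cast<std::size_t>(ny) * width + nx];
                }
            if (nearVisible)
                out[idx] = FowTile::Border;
        }
    }
    return out;
}
```

At the start of a map almost every tile is FullyHidden, so under this scheme the number of tiles needing alpha-blended edges grows with the length of the explored boundary rather than with the area of the map.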

Well, this is one of the things I was saying above… Will try to implement that later.

I’ve just implemented it. I’ll commit my change soon.

Revision 1778, you mean?

No, actually not. I’m working on support for the 800x480 resolution right now; you’ll have to wait one more hour or so.

EDIT:
Optimization committed in revision 1780.

Well, even with this optimization in place, the profiling sampler still ends up with 88% of hits in blit8bppAlphaTo24bpp :slight_smile:

Believe me, it’s better than an army of duplicated mega-clusters. Code duplication is an even bigger evil than premature optimization :stuck_out_tongue:

If you know how to do that (while keeping the code as portable as possible), go for it and replace that colorPutterWithAlphaSwitch loop with inline asm that does it faster.
However, I’m afraid that blitting 8bpp surfaces is hard to optimize: I looked into the SDL sources, and apparently the only optimization they do there is loop unrolling…
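
For reference, loop unrolling on a colour-keyed 8bpp to 24bpp copy looks roughly like this. SDL 1.2 does it with its DUFFS_LOOP macros; the version below is a plain hand-unrolled C++ sketch with hypothetical names (Rgb, putPixel, blitRow8To24Unrolled) and an assumed R, G, B destination byte order, not SDL’s actual code.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

// Copy one pixel unless its palette index equals the colour key.
inline void putPixel(std::uint8_t index, std::uint8_t colorKey, const Rgb* palette, std::uint8_t* dst) {
    if (index != colorKey) {
        const Rgb& c = palette[index];
        dst[0] = c.r;
        dst[1] = c.g;
        dst[2] = c.b;
    }
}

// Colour-keyed 8bpp -> 24bpp row copy, unrolled 4x.
void blitRow8To24Unrolled(const std::uint8_t* src, std::uint8_t* dst, int width,
                          const Rgb palette[256], std::uint8_t colorKey) {
    assert(width >= 0);
    int x = 0;
    for (; x + 4 <= width; x += 4) {  // main body: 4 pixels per iteration
        putPixel(src[x],     colorKey, palette, dst + 3 * x);
        putPixel(src[x + 1], colorKey, palette, dst + 3 * (x + 1));
        putPixel(src[x + 2], colorKey, palette, dst + 3 * (x + 2));
        putPixel(src[x + 3], colorKey, palette, dst + 3 * (x + 3));
    }
    for (; x < width; ++x)  // leftover pixels
        putPixel(src[x], colorKey, palette, dst + 3 * x);
}
```

Unrolling only removes loop-counter overhead; the per-pixel palette lookup and colour-key branch remain, which fits the observation that there is not much else to squeeze out of 8bpp blits.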

That would take too much memory.

Quite a lot. I’m getting about 51% (adventure map just after starting the standard Arrogance map), with the vast majority of calls coming from blitting objects, not FoW.

Yup :frowning: The 1toN blit function does exactly that… Another possible micro-optimization is to check whether the current palette index is the same as the previous one, so we don’t have to re-read the same RGBA value (a sort of RLE compression), which in turn reduces cache misses. There will be a branch misprediction penalty, though…
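
The palette-caching micro-optimization could look like the following C++ sketch (hypothetical names; the packed 32-bit RGBA output layout is an assumption). It keeps the last index and its colour, and only reads the palette when the index changes; the returned lookup count makes the saving visible on rows with long runs.

```cpp
#include <cassert>
#include <cstdint>

struct PaletteEntry {
    std::uint8_t r, g, b, a;
};

// Expands a row of palette indices to packed RGBA (R in the low byte), reusing the
// previous lookup when consecutive pixels share an index.
// Returns the number of palette lookups actually performed.
int expandRowCached(const std::uint8_t* src, int width,
                    const PaletteEntry palette[256], std::uint32_t* out) {
    assert(width >= 0);
    int lookups = 0;
    int prevIndex = -1;  // no previous pixel yet
    std::uint32_t prevColor = 0;
    for (int x = 0; x < width; ++x) {
        const int index = src[x];
        if (index != prevIndex) {  // this branch can mispredict on noisy rows
            const PaletteEntry& c = palette[index];
            prevColor = std::uint32_t(c.r) | (std::uint32_t(c.g) << 8) |
                        (std::uint32_t(c.b) << 16) | (std::uint32_t(c.a) << 24);
            prevIndex = index;
            ++lookups;
        }
        out[x] = prevColor;
    }
    return lookups;
}
```

Whether this pays off would need measuring: a 256-entry palette of 4-byte entries is only 1 KB and is likely to stay in L1 cache anyway, while the extra compare adds a branch that can mispredict on rows where the index changes often.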

Yup, revealing the whole map doesn’t make things much faster, sadly

Well… I wonder why the SDL function is SO MUCH faster. You said that replacing the color putter loop with an SDL blit almost completely eliminated the function from the profiling log. [BTW, what are the exact numbers before and after?]

If I haven’t missed something, there are two differences: the SDL implementation uses loop unrolling and doesn’t have that alpha switch. However, it has an IF for checking the color key… Very few pixels need alpha blending, so it should be possible to make our implementation not much slower than SDL’s.
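
The point that few pixels need real blending can be illustrated with a row loop that handles the common cases first. In this hypothetical C++ sketch (names and the R, G, B destination layout are assumptions), alpha 0 behaves like SDL’s colour key, alpha 255 is a plain copy, and only partially transparent pixels pay for the multiply-and-divide blend.

```cpp
#include <cassert>
#include <cstdint>

struct PaletteEntry {
    std::uint8_t r, g, b, a;
};

// One row of an 8bpp -> 24bpp (R, G, B bytes) blit with per-pixel alpha,
// with fast paths for fully transparent and fully opaque pixels.
void blitRowAlphaFastPaths(const std::uint8_t* src, std::uint8_t* dst, int width,
                           const PaletteEntry palette[256]) {
    assert(width >= 0);
    for (int x = 0; x < width; ++x, dst += 3) {
        const PaletteEntry& c = palette[src[x]];
        if (c.a == 0)
            continue;  // transparent: leave the destination untouched (like a colour key)
        if (c.a == 255) {  // opaque: plain copy
            dst[0] = c.r;
            dst[1] = c.g;
            dst[2] = c.b;
            continue;
        }
        // Partial alpha (e.g. shadows): real blend.
        const unsigned a = c.a, ia = 255u - c.a;
        dst[0] = static_cast<std::uint8_t>((c.r * a + dst[0] * ia) / 255);
        dst[1] = static_cast<std::uint8_t>((c.g * a + dst[1] * ia) / 255);
        dst[2] = static_cast<std::uint8_t>((c.b * a + dst[2] * ia) / 255);
    }
}
```

If most sprite pixels are either fully transparent or fully opaque and come in runs, the two early exits are taken almost every time and the branches predict well, leaving the blend arithmetic for shadow pixels only.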

Um… this page: sdl.beuc.net/sdl.wiki/SDL-1.3_Notes reports that SDL 1.3 uses SSE/MMX for blitting… but that doesn’t solve the problem for SDL 1.2. Maybe the compiler can replace the unrolled loops with MMX instructions?

Not for blitting 8bpp graphics AFAIK.