Optimizing VCMI_client

Vicious · August 24, 2010, 9:58pm

Hello!

I’ve been following your project for a while now and I must say I’m impressed with your progress so far. There are still a lot of things to be done, of course.
Being a Heroes3 fan and also a game engine programmer myself (which is my hobby), I’ve decided to check out the VCMI sources and do some profiling work. What struck me immediately is the amount of time spent in CSDL_Ext::blit8bppAlphaTo24bppT, blitting individual colors. Why not try using SDL_BlitSurface here instead? Replacing the whole ColorPutter loop with an SDL_SetAlpha+CSDL_Ext::blitSurface pair eliminated the aforementioned method from the profiling log almost completely, with 90% of images looking correct. Granted, it doesn’t fully work for all surfaces that way (for example, the battlefield surface seems to have an inverted alpha value?) but it’s a good start in my opinion.

Keep up the good work!

Vicious · August 24, 2010, 10:17pm

Ah, the black battlefield effect seems to come from hex overlay, guess that should be easily fixable…

Tow_dragon · August 25, 2010, 9:17am

Interesting. We wrote CSDL_Ext::blit8bppAlphaTo24bppT because we discovered that SDL isn’t capable of blitting 8bpp surfaces with an alpha channel onto 24bpp surfaces. 90% of images seems to be the share of images without an alpha channel… SDL_SetAlpha sets the alpha value per surface, but sometimes we need per-pixel alpha in 8bpp surfaces (e.g. adventure map objects’ shadows), and this is what CSDL_Ext::blit8bppAlphaTo24bppT does.
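To illustrate what per-pixel alpha on a palettized surface means, here is a minimal sketch. It is not the VCMI code: PaletteEntry, blendChannel and blitIndexedRowTo24bpp are hypothetical names, the destination is assumed to be packed R,G,B bytes, and SDL is left out entirely so the example compiles on its own. Each palette entry carries its own alpha, so a shadow colour can be half-transparent while the rest of the image stays opaque.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical palette entry: a colour plus its own alpha
// (0 = fully transparent, 255 = fully opaque).
struct PaletteEntry {
    std::uint8_t r, g, b, a;
};

// Blend one 8-bit channel: (src*a + dst*(255-a)) / 255, rounded to nearest.
inline std::uint8_t blendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t a) {
    return static_cast<std::uint8_t>((src * a + dst * (255 - a) + 127) / 255);
}

// Blit one row of 8bpp palette indices onto a packed 24bpp (R,G,B) row.
// Assumption: dstRgb holds at least 3 bytes per source pixel.
void blitIndexedRowTo24bpp(const std::vector<std::uint8_t>& srcIndices,
                           const std::array<PaletteEntry, 256>& palette,
                           std::vector<std::uint8_t>& dstRgb) {
    assert(dstRgb.size() >= srcIndices.size() * 3);
    for (std::size_t i = 0; i < srcIndices.size(); ++i) {
        const PaletteEntry& c = palette[srcIndices[i]];
        std::uint8_t* px = &dstRgb[i * 3];
        if (c.a == 0) {
            continue;                      // transparent: keep destination
        }
        if (c.a == 255) {                  // opaque: plain copy, no blending
            px[0] = c.r;
            px[1] = c.g;
            px[2] = c.b;
            continue;
        }
        px[0] = blendChannel(c.r, px[0], c.a);   // e.g. a shadow pixel
        px[1] = blendChannel(c.g, px[1], c.a);
        px[2] = blendChannel(c.b, px[2], c.a);
    }
}
```

A single per-surface alpha cannot express this, because the transparent, opaque and half-transparent pixels all live in the same image.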

Vicious · August 25, 2010, 10:02am

Yeah, I guess this is simply the case of blit8bppAlphaTo24bppT being overused, that is, being used for images that don’t need per-pixel alpha blending.
SDL_SetAlpha should work fine for hex overlays of battlefields though.

Vicious · August 25, 2010, 5:02pm

I think the major performance hog is FoW, which covers huge areas of the map at the beginning. Does it really need per-pixel alpha blending?

Warmonger · August 25, 2010, 5:06pm

I agree that this one slows the game terribly. Simply discovering the map in the early game can hurt, and even on a decent machine the Cartographer needs a few seconds to activate.

On the other hand, it may be nice to implement smooth FoW blending when it’s revealed, just for the nice graphical effect present in most strategy games I can think of.

Vicious · August 25, 2010, 5:13pm

Well, even in this case, alpha-blending would be only needed for border tiles, wouldn’t it?

Tow · August 25, 2010, 5:58pm

What have you profiled: the adventure map view or the battlefield (or something else)? Have you used a fully optimized build of VCMI?

Unfortunately it does.

Each time I used the blit8bppAlphaTo24bpp function I had my reasons, and I hope this is also true for the other programmers. Of course you may find places where SDL blit functions can be used (with or without additional workarounds), but that will always mean making additional assumptions about the game graphics and will impose limitations.

The best solution would be to optimize blit8bppAlphaTo24bppT - if it can be fast in SDL, it can be fast in VCMI.

That is totally unrelated to the blitting FoW on the adventure map.

Vicious · August 25, 2010, 6:29pm

I profiled adventure map. And yes, I used a fully optimized build of VCMI.
The only way to optimize blit8bppAlphaTo24bpp is to give up the inlined C++ templates and other clusterfuckages and just go with MMX/SSE intrinsics. Obviously one can’t optimize individual methods like putColor…

I’m not sure that 24-bit blits are really that optimizable though… Another possible option is to convert all images/surfaces to a full-blown 32-bit format and let SDL do the conversion to the screen pixel format when the actual blitting to the screen happens.

Tow_dragon · August 26, 2010, 10:11am

Actually adventure map performance could be improved by not blitting objects hidden by full FoW and only borders of FoW must be blitted by blit8bppAlphaTo24bpp… terrainRect is an old, ugly function that should be redesigned anyway (selecting animation of hero by info from CGDefInfo and mechanism of showing hero’s flag are terrible).
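The idea above can be sketched as a per-tile classification. This is only an illustration under assumptions: FogKind and classifyTile are hypothetical names, the visibility grid is a plain vector of rows, and a hidden tile counts as a "border" when any of its eight neighbours is revealed. The actual rule used in terrainRect may well differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of a tile with respect to fog of war.
enum class FogKind {
    Visible,  // revealed: draw terrain and objects normally
    Border,   // hidden, but next to a revealed tile: needs the alpha blit
    Full      // hidden and surrounded by hidden tiles: objects can be skipped
};

// visible[y][x] == true means the tile has been revealed to the player.
// Assumption: the grid is rectangular and non-empty.
FogKind classifyTile(const std::vector<std::vector<bool>>& visible, int x, int y) {
    const int h = static_cast<int>(visible.size());
    assert(h > 0);
    const int w = static_cast<int>(visible[0].size());
    assert(x >= 0 && x < w && y >= 0 && y < h);

    if (visible[y][x]) {
        return FogKind::Visible;
    }
    // Look at the 8 neighbours; tiles outside the map are ignored.
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int nx = x + dx;
            const int ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) {
                continue;
            }
            if (visible[ny][nx]) {
                return FogKind::Border;
            }
        }
    }
    return FogKind::Full;
}
```

With something like this, the drawing loop could skip object blits for Full tiles and reserve the per-pixel alpha blit for Border tiles only.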

Vicious · August 26, 2010, 11:48am

Well, this is one of the things I was saying above… Will try to implement that later.

Tow_dragon · August 26, 2010, 12:19pm

I’ve just implemented it. I’ll commit my change soon.

Vicious · August 26, 2010, 12:50pm

revision 1778, you mean?

Tow_dragon · August 26, 2010, 1:36pm

No, actually not. I’m working now on support for 800x480 resolution, you have to wait one more hour or so.

EDIT:
Optimization committed in revision 1780.

Vicious · August 26, 2010, 4:44pm

Well, even with this optimization in place, the profiling sampler still ends up with 88% of hits in blit8bppAlphaTo24bpp.

Tow · August 26, 2010, 5:24pm

Believe me, it’s better than an army of duplicated mega-clusters. Code duplication is an even bigger evil than premature optimization.

If you know how to do that (while keeping the code as portable as possible), go for it and replace that colorPutterWithAlphaSwitch loop with inline asm that does it faster.
However, I’m afraid that blitting 8bpp surfaces is hard to optimize. I looked into the SDL sources and apparently the only optimization they do there is loop unrolling…

That would take too much memory.

Quite a lot. I’m getting about 51% (adventure map just after starting standard Arrogance), with the vast majority of calls coming from blitting objects, not FoW.

Vicious · August 26, 2010, 5:37pm

Yup. The 1toN blit function does exactly that… Another possible micro-optimization is checking whether the current palette index is the same as the previous one, so we don’t have to re-read the same RGBA value again (a sort of RLE compression), which in turn will reduce the number of cache misses. Although there will be a branch misprediction penalty…

Yup, revealing the whole map doesn’t make things much faster, sadly.
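A rough sketch of that palette-index check, under assumptions: the palette is a table of 256 packed 32-bit colours, expandWithIndexCache is a hypothetical name, and the return value counts palette reads only so the effect can be observed. Whether this actually beats an unconditional lookup depends on the image data and the branch predictor, so it would need profiling.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand 8bpp palette indices into packed 32-bit colours, re-reading the
// palette only when the index differs from the previous pixel's index.
// Returns the number of palette reads that were actually performed.
// Assumption: out is pre-sized to at least indices.size() elements.
std::size_t expandWithIndexCache(const std::vector<std::uint8_t>& indices,
                                 const std::array<std::uint32_t, 256>& palette,
                                 std::vector<std::uint32_t>& out) {
    assert(out.size() >= indices.size());
    std::size_t paletteReads = 0;
    int lastIndex = -1;            // no index seen yet
    std::uint32_t lastColor = 0;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (indices[i] != lastIndex) {
            lastIndex = indices[i];
            lastColor = palette[static_cast<std::size_t>(lastIndex)];
            ++paletteReads;        // only runs at the start of a new run
        }
        out[i] = lastColor;
    }
    return paletteReads;
}
```

For a row such as 5,5,5,7,7,5 this performs three palette reads instead of six; on noisy images with short runs the extra branch may cost more than it saves.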

Tow · August 26, 2010, 8:27pm

Well… I wonder why the SDL function is SO MUCH faster. You said that replacing the color putter loop with the SDL blit eliminated the function from the profiling log almost completely. [BTW what are the exact numbers before and after?]

If I haven’t missed anything, there are two differences: the SDL implementation uses loop unrolling and doesn’t have that alpha switch. However, it has an IF that checks for the color key… Only a few pixels need alpha blending, so it should be possible to make our implementation not much slower than SDL.

Tow_dragon · August 27, 2010, 9:09am

Um… this site: sdl.beuc.net/sdl.wiki/SDL-1.3_Notes reports that SDL 1.3 uses SSE/MMX while blitting… but this doesn’t solve the problem for SDL 1.2. Maybe the compiler can replace unrolled loops with MMX instructions?

Tow · August 27, 2010, 9:56am

Not for blitting 8bpp graphics AFAIK.