[ros-dev] [ros-diffs] [tkreuzer] 42353: asm version of DIB_32BPP_ColorFill: - Add frame pointer - Get rid of algin_draw, 32bpp surfaces must be DWORD aligned - Optimize the loop - Add comments

Wed Aug 5 09:43:13 CEST 2009

> On most processors, less than 8 iterations will be faster with a move than
> with a rep.

I'd say more like 4 (separate moves), and that is not feasible when the number
of iterations is variable, as in our case. What would be possible is a loop
with many moves inside, or even better SSE stores, followed by a rep stosd for
the remainder; that would indeed be faster for large cx counts. Does any
compiler currently generate that automatically? None of the ones I know, but it
can be done to some extent by writing it that way in C. Possible in asm? Of
course. DMA fill? No joy. GPU accelerated fill? Perhaps in the future.
I keep thinking that this is not important enough to justify asm, or even
splitting the loop in two in C. At least not before ROS is complete and
stable and we want to optimize every bit. And by then we may very well be
thinking about GPU accelerated GDI too.
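
To make the "loop with many moves inside, then a remainder" idea concrete, here is a rough sketch in plain C. It is only an illustration, not the ReactOS code: the name fill32_unrolled, its parameters, and the unroll factor of 8 are all assumptions picked for the example, and whether it actually beats a plain rep stosd would have to be measured on the target CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the ReactOS tree): store `color` into
 * `count` consecutive DWORDs starting at `dst`. */
void fill32_unrolled(uint32_t *dst, uint32_t color, size_t count)
{
    /* Main loop: 8 separate stores per iteration (unroll factor is an
     * arbitrary choice for this sketch). */
    while (count >= 8) {
        dst[0] = color;
        dst[1] = color;
        dst[2] = color;
        dst[3] = color;
        dst[4] = color;
        dst[5] = color;
        dst[6] = color;
        dst[7] = color;
        dst += 8;
        count -= 8;
    }

    /* Remainder: at most 7 DWORDs are left. In asm this is the part
     * that a rep stosd would handle. */
    while (count > 0) {
        *dst++ = color;
        count--;
    }
}
```

For a scanline this would be called as something like fill32_unrolled(row + left, color, right - left), where row, left and right are again made-up names. A compiler may or may not turn the inner stores into SSE stores on its own.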

Jose Catena
DIGIWAVES S.L.