[ros-dev] [ros-diffs] [tkreuzer] 42353: asm version of DIB_32BPP_ColorFill: - Add frame pointer - Get rid of algin_draw, 32bpp surfaces must be DWORD aligned - Optimize the loop - Add comments

Jose Catena jc1 at diwaves.com
Wed Aug 5 02:37:34 CEST 2009


> - A builtin / intrinsic != inline asm

I never said that. I used "instead". I apologize for not being clear enough.

> but how would you want to optimize "rep stosd" anyway?

There is no way. That's what I said, with the possible exception of using a 64 bit
equivalent if we could assume that the CPU is 64 bit capable.
But Alex knows better, he is calling me ignorant. He says that

L1:	mov [edi], eax
	add edi, 4
	dec ecx
	jnz L1

is faster than

	rep stosd

Both do exactly the same thing, the latter much smaller AND FASTER on
any CPU from the 386 to the i7.
And he shows an irrelevant portion of code that proves nothing regarding what I
said; BTW we don't know what his compiler generated for the loop.
In other cases he changes the meaning of what I wrote, corrects something I
didn't say at all, or makes unfounded assumptions.
I'm not going to answer him, LOL! This would be an endless loop. Anyway, I
always agreed with him that asm is not helpful in this case and in most others.
This discussion is a waste of time.
I thought from previous posts that he had better knowledge, and perhaps he
has, but he certainly does not know much about assembly and CPU architectures, yet
he pretends to and doesn't like to be corrected... bad for him.

> none of the compilers I tested was able to generate a rep stosd from
> either a loop or memset

LOL, are we really in 2009? Try the C source I posted, it should be compiled
as rep stosd. MSVC and Intel certainly do that regardless of the target CPU, and
not only in recent versions. Let me know if yours doesn't; I wouldn't
like a compiler that misses such a basic and obvious optimization.
Most often I know pretty well what a compiler will generate without looking
at the generated asm. The way the C code is written matters in some cases.
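
For reference, a minimal sketch of the kind of C fill loop under discussion.
This is not the source posted earlier in the thread, which is not reproduced
in this message; the name fill32 and its parameters are made up for
illustration. Whether a given compiler turns the loop into rep stosd depends
on the compiler, its version and the optimization flags, so the generated
asm should be checked rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): store `count` copies of a
   32-bit value starting at `dest`. A plain counted loop like this is the
   shape a compiler can recognize as a fill; which instructions it actually
   emits is compiler and flag dependent. */
static void fill32(uint32_t *dest, uint32_t value, size_t count)
{
    while (count--)
        *dest++ = value;
}
```

Calling fill32(line, color, width) on a DWORD-aligned scanline would cover the
32bpp case discussed here.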

As for memset, MSVC's inline memset will generate rep stosd, possibly followed by a
stosw and/or stosb if the byte count is not a multiple of the max size or is
not constant, which is OK. The library version uses the same, plus the
call overhead. Anyway, memset is not suitable here: it is for 8 bit values and
wmemset is for 16 bit values, while we want to store 32 bit values.
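
To illustrate why memset does not fit a 32 bit fill, here is a small sketch.
Both helper names (memset_pixel, fill_pixels) are hypothetical and exist only
for this example. Standard memset converts its value argument to unsigned
char, so only the low byte of a color survives and is repeated in every byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: what one pixel looks like after memset is handed a
   32-bit color. memset keeps only the low 8 bits of its value argument
   and writes that byte into every byte of the pixel. */
static uint32_t memset_pixel(uint32_t color)
{
    uint32_t pixel;
    memset(&pixel, (int)color, sizeof pixel);
    return pixel;
}

/* Hypothetical helper: a 32-bit fill has to store whole dwords instead. */
static void fill_pixels(uint32_t *dest, uint32_t color, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dest[i] = color;
}
```

With the color 0x00FF8040, memset_pixel yields 0x40404040, while fill_pixels
stores the color unchanged. memset only gives the right result for colors
whose four bytes are all equal.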

Jose Catena
DIGIWAVES S.L.




