[ros-dev] Speed Tests (was: ping Alex regarding log2() for scheduler)

Alex Ionescu ionucu at videotron.ca
Wed Mar 23 21:57:51 CET 2005


Ash wrote:

> Hello,
>
> I'd like to provide a small VS.NET project file with 5 different tests.
>
> - 46ffffe9 tells everything is alright with the function calculation
> - times are measured with QueryPerformanceCounter
> - loop run cnt: 0x2ffffff
>
> result orig function            46ffffe9
> it took         1491052
> result orig function inlined    46ffffe9
> it took         1035547
> result second proposal inlined  46ffffe9
> it took         1244434
> result optimized asm            46ffffe9
> it took         1338367
> result debug asm                46ffffe9
> it took         8774815
>
> The second proposal is the original proposal but more shifts - still 
> slower tho.
>
> Interesting is the inlined version generated by MSVC, shaving off 
> almost 1/3 of the overall time.
> Also stared as "optimized asm" - no gurantee on register safety tho ;)
>
> For portability and performance sake it should be considered to create 
> a compiler macro.
> This function is terribly small, any optimisations inside are 
> outweighted by the calling overhead in this case.
> The most impressive one is the original function inlined, althought 
> the ASM would only work on x86.
>
> Please do not think about using 64k tables, thats what, 1/2 of a 
> Sempron L2 cache?
> It would really really trash performance.

Hi Ash,

Thanks a lot for your tests. I don't have much time tonight, but if 
you'd like/can, can you add two more tests?

One using "bsr", the intel opcode. It does all the work for you and 
returns the index.

i think it's as simple as "bsr ecx, eax" where "ecx" is the mask and eax 
is the returned index. Or it might be backwards.

The second test is using a 256-byte log2 table:

const char LogTable256[] = 
{
  0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
  4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
  5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
  5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
  6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
  6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
  6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
  6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
};


and a lookup code like this:

Addon = 16;
if ((IntMask = Mask >> 16)) {
     Addon = 0;
     IntMask = Mask;
}
if (IntMask && 0xFFFFFF00) {
     Addon += 8;
}
HighBit = LogTable256[(Mask >> Addon)] + Addon;



More information about the Ros-dev mailing list