[ros-dev] Speed Tests (was: ping Alex regarding log2() for
scheduler)
Alex Ionescu
ionucu at videotron.ca
Wed Mar 23 21:57:51 CET 2005
Ash wrote:
> Hello,
>
> I'd like to provide a small VS.NET project file with 5 different tests.
>
> - 46ffffe9 tells everything is alright with the function calculation
> - times are measured with QueryPerformanceCounter
> - loop run cnt: 0x2ffffff
>
> result orig function 46ffffe9
> it took 1491052
> result orig function inlined 46ffffe9
> it took 1035547
> result second proposal inlined 46ffffe9
> it took 1244434
> result optimized asm 46ffffe9
> it took 1338367
> result debug asm 46ffffe9
> it took 8774815
>
> The second proposal is the original proposal but more shifts - still
> slower tho.
>
> Interesting is the inlined version generated by MSVC, shaving off
> almost 1/3 of the overall time.
> Also stared as "optimized asm" - no gurantee on register safety tho ;)
>
> For portability and performance sake it should be considered to create
> a compiler macro.
> This function is terribly small, any optimisations inside are
> outweighted by the calling overhead in this case.
> The most impressive one is the original function inlined, althought
> the ASM would only work on x86.
>
> Please do not think about using 64k tables, thats what, 1/2 of a
> Sempron L2 cache?
> It would really really trash performance.
Hi Ash,
Thanks a lot for your tests. I don't have much time tonight, but if
you'd like/can, can you add two more tests?
One using "bsr", the intel opcode. It does all the work for you and
returns the index.
i think it's as simple as "bsr ecx, eax" where "ecx" is the mask and eax
is the returned index. Or it might be backwards.
The second test is using a 256-byte log2 table:
const char LogTable256[] =
{
0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
};
and a lookup code like this:
Addon = 16;
if ((IntMask = Mask >> 16)) {
Addon = 0;
IntMask = Mask;
}
if (IntMask && 0xFFFFFF00) {
Addon += 8;
}
HighBit = LogTable256[(Mask >> Addon)] + Addon;
More information about the Ros-dev
mailing list