[ros-dev] Speed Tests (was: ping Alex regarding log2() for scheduler)

Mark Junker mjscod at gmx.de
Thu Mar 24 11:25:48 CET 2005


Ash wrote:

> BSR has a latency of 8-12 Cycles on Athlon/P3 but can be pipelined. 
> Worse (up to ~80 cycles) on Pentium and other older CPUs.
> http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_3748,00.html 
>

My tests have shown that you're right and BSR is much too slow.

> Don't know about A64 - maybe someone can test BSR with A64?

I have an AMD64 here, but it doesn't run in 64-bit mode.

> It doesn't make much sense to put the optimized ASM in there, nor 
> is there much hope of GCC having a good day and doing a lot of optimisation.
> So far the best option would be the macro with a lookup table (only 
> one global kernel table though).

I've converted your sources so that they compile with GCC (MinGW). The 
converted sources are attached.
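
For readers without the attachment, here is a minimal sketch of what a
"macro with a lookup table" for an integer log2 could look like. It is not
taken from SpeedTest.zip: the names log2_table, init_log2_table, LOG2_U32
and log2_checked are hypothetical, and the 256-entry byte table is an
assumption about the table size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; this is an illustration, not the code from the thread. */

/* One global table: log2_table[i] == floor(log2(i)) for 1 <= i <= 255. */
static uint8_t log2_table[256];

/* Fill the table once at startup. Entry 0 is a placeholder, since log2(0)
 * is undefined. */
static void init_log2_table(void)
{
    int i;

    log2_table[0] = 0;
    log2_table[1] = 0;
    for (i = 2; i < 256; i++)
        log2_table[i] = (uint8_t)(log2_table[i >> 1] + 1);
}

/* Find the highest non-zero byte of a 32-bit value, then look it up.
 * The argument is evaluated several times, so it must not have side effects. */
#define LOG2_U32(x)                                              \
    (((x) >> 16)                                                 \
        ? (((x) >> 24) ? 24 + log2_table[(x) >> 24]              \
                       : 16 + log2_table[((x) >> 16) & 0xFF])    \
        : (((x) >> 8)  ? 8 + log2_table[((x) >> 8) & 0xFF]       \
                       : log2_table[(x) & 0xFF]))

/* Function wrapper that rejects 0, for callers that want a check. */
static unsigned log2_checked(uint32_t x)
{
    assert(x != 0);
    return (unsigned)LOG2_U32(x);
}
```

init_log2_table() has to run once before the macro is used. The macro needs
at most two comparisons and one table read per call.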

> Here are the updated STATS
> also available at http://hackersquest.org/kerneltest.html
>
> result orig function            46ffffe9
> it took         1526862         18%
> result orig function inlined    46ffffe9
> it took         1041460         12%
> result second proposal inlined  46ffffe9
> it took         1248990         15%
> result optimized asm            46ffffe9
> it took         1321532         16%
> result lookup inlined           46ffffe9
> it took         682264          8%
> result bsr inlined              46ffffe9
> it took         1751088         21%
> result macro                    46ffffe9
> it took         653692          7%

These are my results on the AMD64 using your Release EXE:

STATS
result orig function            46ffffe9
it took         1272638         18%
result orig function inlined    46ffffe9
it took         875751          12%
result second proposal inlined  46ffffe9
it took         1051861         15%
result optimized asm            46ffffe9
it took         1225282         17%
result lookup inlined           46ffffe9
it took         549861          7%
result bsr inlined              46ffffe9
it took         1410179         20%
result macro                    46ffffe9
it took         607638          8%

These are my results using the GCC EXE (-O2):

STATS
result orig function            46ffffe9
it took         1321663         24%
result orig function inlined    46ffffe9
it took         879318          16%
result second proposal inlined  46ffffe9
it took         940285          17%
result lookup inlined           46ffffe9
it took         615267          11%
result bsr inlined              46ffffe9
it took         1103432         20%
result macro                    46ffffe9
it took         484450          9%
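
The percentage column is not explained in the output. The figures in all
three tables are consistent with each variant's share of the summed times,
truncated to a whole number. The sketch below reproduces that calculation
for the GCC run; the function names are hypothetical, and the unit of the
"it took" values is not stated in the output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; they only illustrate how the percentage column
 * can be reproduced from the "it took" values. */

/* Sum of all measured times in one run. */
static uint64_t sum_times(const uint64_t *times, int count)
{
    uint64_t total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += times[i];
    return total;
}

/* Share of one measurement in the total, truncated to whole percent. */
static unsigned share_percent(uint64_t time, uint64_t total)
{
    assert(total != 0);
    return (unsigned)(time * 100u / total);
}

/* "it took" values of the GCC (-O2) run, in table order. */
static const uint64_t gcc_times[6] = {
    1321663, 879318, 940285, 615267, 1103432, 484450
};
```

With these inputs the total is 5344415, and share_percent() gives 24 for the
original function and 9 for the macro, matching the table.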

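BTW: I had to remove all functions that use the __asm() statement, which is 
why the "optimized asm" row is missing from the GCC run. The "result bsr 
inlined" row uses my GCC BSR macro. Even so, BSR takes more than twice as 
long as the lookup macro (1103432 vs. 484450), so it is much too slow ...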
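
The BSR macro itself is in the attachment and is not reproduced here. As a
rough idea of how such a macro can be written with GCC extended inline
assembly, here is a sketch. The names BSR_LOG2 and bsr_log2 are
hypothetical, and the fallback to __builtin_clz on non-x86 targets is an
addition so that the example builds elsewhere; it is not part of the test
program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch; not the macro from SpeedTest.zip. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* BSR stores the index of the highest set bit of the source operand.
 * The result is undefined when the source is 0. Uses a GNU statement
 * expression, so it needs GCC or a compatible compiler. */
#define BSR_LOG2(x) __extension__ ({                                      \
    uint32_t bsr_in_ = (x);                                               \
    uint32_t bsr_out_;                                                    \
    __asm__("bsrl %1, %0" : "=r"(bsr_out_) : "rm"(bsr_in_) : "cc");       \
    bsr_out_; })
#else
/* Assumed fallback for other targets: count leading zeros instead. */
#define BSR_LOG2(x) (31u - (uint32_t)__builtin_clz(x))
#endif

/* Function wrapper that rejects 0, where neither variant is defined. */
static uint32_t bsr_log2(uint32_t x)
{
    assert(x != 0);
    return BSR_LOG2(x);
}
```

Both variants return floor(log2(x)) for non-zero x, so they can be swapped
against the table-based version in a timing loop.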

Regards,
Mark

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SpeedTest.zip
Type: application/x-zip-compressed
Size: 3062 bytes
Desc: not available
Url : http://reactos.com:8080/pipermail/ros-dev/attachments/20050324/1e8955f9/SpeedTest.bin

