[ros-diffs] [gvg] 19843: Document (failed) attempt to optimize memcpy()

gvg at svn.reactos.com
Sat Dec 3 20:40:55 CET 2005


Document (failed) attempt to optimize memcpy()
Modified: trunk/reactos/lib/string/i386/memcpy_asm.s
Added: trunk/reactos/media/doc/memcpy_optimize.txt
  _____  

Modified: trunk/reactos/lib/string/i386/memcpy_asm.s
--- trunk/reactos/lib/string/i386/memcpy_asm.s	2005-12-03 18:16:02 UTC (rev 19842)
+++ trunk/reactos/lib/string/i386/memcpy_asm.s	2005-12-03 19:40:52 UTC (rev 19843)
@@ -1,9 +1,7 @@

-/* 
- * $Id$
- */
-
 /*
  * void *memcpy (void *to, const void *from, size_t count)
+ *
+ * Some optimization research can be found in media/doc/memcpy_optimize.txt
  */
 
 .globl	_memcpy
  _____  

Added: trunk/reactos/media/doc/memcpy_optimize.txt
--- trunk/reactos/media/doc/memcpy_optimize.txt	2005-12-03 18:16:02 UTC (rev 19842)
+++ trunk/reactos/media/doc/memcpy_optimize.txt	2005-12-03 19:40:52 UTC (rev 19843)
@@ -0,0 +1,55 @@

+Surfing the Internet, I stumbled upon http://www.sciencemark.org where you
+can download a benchmark program that (among other things) can benchmark different
+x86 memcpy implementations. Running that benchmark on my machine revealed that
+the fastest implementation was roughly twice as fast as the "rep movsl"
+implementation (lib/string/i386/memcpy_asm.s) that ReactOS uses.
+To test the alternate implementations in a ReactOS setting, I first
+instrumented the existing memcpy implementation to log the arguments
+it was called with. I then booted ReactOS, started a background compile in it
+(to generate some I/O) and played a game of Solitaire (to generate graphics
+operations). After losing the game, I shut down ReactOS. I then extracted
+the memcpy calls roughly between the start of Explorer (to get rid of one-time
+startup effects) and shutdown. The resulting call profile is attached below.
+I then used that profile to make calls to the existing memcpy and an alternate
+implementation (I selected the "MMX register copy with SSE prefetching"),
+taking care to use different source and destination regions to remove caching
+effects. The profile consisted of roughly 250000 calls to memcpy; I found
+that I had to execute the profile 10000 times to get "reasonable" time values.
+To compensate for the overhead of the test program, I also ran a test where
+the whole memcpy routine consisted of a single instruction: "ret". The test
+results, after applying a correction for the overhead:
+
+rep movsl      70.5 sec
+mmx registers  58.3 sec
+Speed increase: 17%
+
+(Test machine: AMD Athlon MP 2800+ running Linux).
+Although the relative speed increase is nice (17%), we also have to look at the
+absolute speed increase. Remember that the 70.5 sec for the "rep movsl" case
+was obtained by running the whole profile 10000 times. This means that all the
+memcpy calls executed during the profiling run of ReactOS together took only
+0.00705 seconds. So the conclusion has to be that we're simply not spending
+a significant amount of time in memcpy (BTW, our memcpy implementation is
+shared between kernel and user mode; of the total of 250000 memcpy calls about
+90% were made from kernel mode and 10% from user mode), so optimizing memcpy
+(although possible) will not result in significantly better performance of
+ReactOS as a whole.
+Just for fun, I then used only the part of the profile where the memory area
+was larger than 128 bytes. The MMX implementation actually only runs for sizes
+over 128 bytes; for smaller sizes it defers to the "rep movsl" implementation.
+According to the profile, the vast majority of memcpy calls are made with a
+size smaller than 128 bytes (96.8%).
+
+rep movsl      52.9 sec
+mmx registers  27.1 sec
+Speed increase: 48%
+
+This is more or less in line with the results I got from the membench benchmark
+from http://www.sciencemark.org.
+
+Final conclusion: Although optimizing memcpy is useful (and feasible) for
+transfer of large blocks, the usage pattern in ReactOS consists mostly of
+small blocks. The resulting absolute speed increase doesn't justify the
+increased code complexity.
+
+2005/12/03 GvG
Property changes on: trunk/reactos/media/doc/memcpy_optimize.txt
___________________________________________________________________
Name: svn:eol-style
   + native
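
The measurement procedure described in memcpy_optimize.txt (replay a recorded
call profile many times, time a "ret"-only routine to measure loop overhead,
subtract it, then compare) could be sketched in C roughly as follows. This is
not the test program used for the commit, which is not included here. All
names (copy_fn, copy_noop, replay_profile, corrected_seconds,
speed_increase_percent, seconds_per_pass) are hypothetical, and the sketch
assumes a single source/destination buffer pair, whereas the original test
took care to use different regions to remove caching effects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by every memcpy candidate under test (hypothetical name). */
typedef void *(*copy_fn)(void *to, const void *from, size_t count);

/* Stand-in for the "ret"-only routine: copies nothing, so timing it
 * measures only the overhead of the replay loop itself. */
void *copy_noop(void *to, const void *from, size_t count)
{
    (void)from;
    (void)count;
    return to;
}

/* Replay a profile of copy sizes `repeats` times and return the elapsed
 * CPU time in seconds. dst and src must each be at least as large as the
 * biggest size in the profile. Simplification: one buffer pair is reused
 * for every call. */
double replay_profile(copy_fn fn, const size_t *sizes, size_t n_calls,
                      unsigned repeats, void *dst, const void *src)
{
    clock_t start = clock();
    for (unsigned r = 0; r < repeats; r++)
        for (size_t i = 0; i < n_calls; i++)
            fn(dst, src, sizes[i]);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time attributable to the copy routine: raw time minus loop overhead. */
double corrected_seconds(double raw, double overhead)
{
    return raw - overhead;
}

/* Relative reduction in run time of `alt` versus `base`, in percent. */
double speed_increase_percent(double base, double alt)
{
    assert(base > 0.0);
    return (base - alt) / base * 100.0;
}

/* Time for one pass over the profile, given the total over all repeats. */
double seconds_per_pass(double total, unsigned repeats)
{
    assert(repeats > 0);
    return total / repeats;
}
```

A caller would time replay_profile() once with the candidate (for example the
C library memcpy), once with copy_noop, and pass both results to
corrected_seconds(). Applied to the figures reported in the commit,
speed_increase_percent(70.5, 58.3) gives about 17.3 and
seconds_per_pass(70.5, 10000) gives 0.00705, which matches the numbers in
the text.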