[ros-dev] Optimization Proposal

Wed Jan 4 01:49:28 CET 2006

Hi,

I have come to the conclusion that using -O2 is beneficial even for 
DBG = 1 builds, and that it should be on by default for all builds. The 
reason typically given for not using optimizations on a "Debug" build is 
that they apparently make assembly code harder to read. I have found the 
opposite to be true, and I think the example I include below will 
convince most of you as well. I see the following advantages in using 
-O2 on a DBG = 1 build:

- -O2 makes the compiler do additional checks. For example, gcc will NOT 
warn about uninitialized variables unless optimization is enabled, even 
though they are a very important class of programming bug. Apart from 
finding more bugs, it also keeps trunk compilable with -O2. Right now, I 
see at least two commits every week, by Thomas or others, made only to 
fix code which used uninitialized variables (I myself have been guilty 
of this). This means that some of us, like Thomas, have to constantly 
fix other people's mistakes.
- -O2 means fewer last-minute blockers. Because we release with -O2 but 
almost never build that way ourselves, this creates a big problem for 
people like Andrew or Brandon, who handle the release process and do 
testing. Because the -O2 build gets less testing coverage, it is very 
possible for a critical bug to sit in ROS for a month before anyone 
notices it at release time, in which case we all have to scramble to 
find a fix for it.
- -O2 will not undefine DBG or change anything else in the code. All the 
advantages, extra error checking and assertions of the DBG = 1 build 
would remain.
- -O2 builds run much faster, greatly helping testing speed.
- -O2 builds are much more likely to bring up race conditions and other 
important timing bugs we need to watch out for.
- -O2 means easier debugging. This point is really important: until I 
realized how true it was, I didn't want to bring this proposal up. Here 
is a pseudo (but real) disassembly of something I've seen in my DBG = 1 
kernel binary while debugging:

0x40b845:
push ebp
mov ebp, esp
sub esp, 4
mov eax, fs:18h
mov [ebp-4], eax
mov eax, [ebp-4]
leave
retn

0x4bc8a5:
push ebp
mov ebp, esp
sub esp, 4
call 0x40b845
mov ecx, [eax+1ch]
mov [ebp-4], ecx
mov eax, [ebp-4]
leave
retn

0x42b845:
push ebp
mov ebp, esp
sub esp, 4
call 0x4bc8a5
mov ecx, [eax+124h]
mov [ebp-4], ecx
mov eax, [ebp-4]
leave
retn

KeFooBar:
push ebp
mov ebp, esp
sub esp, 4c
call 0x42b845
mov [ebp-0xc], eax
mov eax, [ebp-0xc]
<..>
leave
retn

This is how it looks with -O2:

KeFooBar:
push ebp
mov ebp, esp
sub esp, 4c
mov eax, fs:124h
<..>
leave
retn

I hope we can all agree on which one of these is readable. The -O2 build 
clearly shows you that eax is fs:124h, which you ought to know is 
Prcb->CurrentThread; even if you don't, you can easily check in a 
header. The non-O2 build calls 3 other functions, 2 of which merely call 
other functions themselves (and due to the lack of symbols you have no 
way of knowing what these functions are doing), until you finally get to 
a function which reads fs:18h, which you then realize is the PCR. You 
then walk back and realize that Pcr->0x1c is the PRCB, and that 
Prcb->0x124 is the current thread.
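
To show where a call chain like that comes from, here is a minimal C 
sketch of the kind of source that produces it. Everything in it is an 
assumption made for illustration: the FAKE_* types, the Fake* function 
names and the globals are hypothetical, the structure layouts are not 
the real ones, and a plain global stands in for the fs-based PCR access 
so that it compiles anywhere. Without optimization each small accessor 
is normally emitted as its own function with its own stack frame; with 
-O2 gcc can typically inline all three into the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. Names and layouts
   are illustrative only, not the ReactOS definitions. */
typedef struct FAKE_THREAD { int Id; } FAKE_THREAD;
typedef struct FAKE_PRCB { FAKE_THREAD *CurrentThread; } FAKE_PRCB;
typedef struct FAKE_PCR { FAKE_PRCB *Prcb; } FAKE_PCR;

/* In the kernel the PCR is reached through the fs segment. A global
   stands in for it here so the sketch is portable. */
static FAKE_THREAD g_Thread = { 1 };
static FAKE_PRCB g_Prcb = { &g_Thread };
static FAKE_PCR g_Pcr = { &g_Prcb };

/* Innermost accessor: plays the role of the function reading fs:18h. */
static FAKE_PCR *FakeGetPcr(void)
{
    return &g_Pcr;
}

/* Middle accessor: one call plus one field load. */
static FAKE_PRCB *FakeGetPrcb(void)
{
    return FakeGetPcr()->Prcb;
}

/* Outer accessor: again one call plus one field load. */
static FAKE_THREAD *FakeGetCurrentThread(void)
{
    return FakeGetPrcb()->CurrentThread;
}

/* Plays the role of KeFooBar: all it wants is the current thread. */
int FakeFooBar(void)
{
    FAKE_THREAD *Thread = FakeGetCurrentThread();

    assert(Thread != NULL); /* a DBG-style check, unaffected by -O2 */
    return Thread->Id;
}
```

Compiling this once with -O0 and once with -O2 and comparing the 
disassembly of FakeFooBar is an easy way to see the difference for 
yourself; the exact output depends on the compiler version.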

Yes, this example could easily be destroyed by saying "use a #define 
with inline assembly", but I can bring many more; we can't start using 
inline assembly everywhere... msvc does an amazing job at optimizing 
these things, and even gcc isn't that bad, if only you let it. Code 
built without -O2 makes horrible use of the stack, which forces you to 
memorize a lot more addresses than code which simply stores values in 
registers. The loops generated by -O2 are also much closer to what a 
human would write by hand, and so to what someone who understands 
assembly is used to (for example, the loop will use ecx, and not a stack 
variable that you need to memorize). I consider myself an expert on 
assembly coding, and I still have great trouble reading non-O2 kernels, 
so how exactly does building without it help debugging?

In the end, I am convinced that the only disadvantage of using -O2 by 
default is that it will slightly increase build times. I don't think 
this increase is more than a minute or two, at most, for a complete 
build. If this issue is really critical to some people, then perhaps 
only core system files should use -O2 (kernel32, ntdll, ntoskrnl, csr, 
win32k, drivers, etc).
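
For a concrete picture of what that extra build time buys, here is a 
minimal sketch of the uninitialized-variable bug from the first point in 
the list above. The function names are hypothetical and the code is not 
taken from the tree. gcc's uninitialized-variable warning depends on 
data-flow analysis that is only done when optimizing, so a build such as 
"gcc -O2 -Wall" can flag BuggyLength, while the same file built without 
optimization usually compiles silently.

```c
#include <assert.h>

/* Hypothetical helper, not ReactOS code. Length is only assigned when
   Mode is 0 or 1; for any other Mode the function returns an
   indeterminate value. */
int BuggyLength(int Mode)
{
    int Length;             /* BUG: no initial value */

    if (Mode == 0)
        Length = 16;
    else if (Mode == 1)
        Length = 32;

    return Length;          /* uninitialized read when Mode is not 0 or 1 */
}

/* Fixed variant: every path gives Length a defined value. */
int FixedLength(int Mode)
{
    int Length = -1;        /* -1 marks an unknown Mode in this sketch */

    if (Mode == 0)
        Length = 16;
    else if (Mode == 1)
        Length = 32;

    return Length;
}
```

Only the Mode 0 and Mode 1 paths of BuggyLength are safe to call; the 
whole point is that nothing tells you so unless the compiler is allowed 
to do the analysis.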

I know some of the developers on IRC are strongly for this, but I want 
to make sure I get a broader opinion.

Best regards,
Alex Ionescu