[ros-dev] [ros-diffs] [sir_richard] 45052: Patch that fixes VMWare boot (and
Ros Arm
ros.arm at reactos.org
Tue Jan 12 15:08:51 CET 2010
- Previous message: [ros-dev] [ros-diffs] [sir_richard] 45052: Patch that fixes VMWare boot (and should fix QEMu/KVM boot on the testbot): [NTOS]: A trap can get us into a state where DS/ES are invalid, making any pointer dereference (on DS/ES segmented memory, not SS, the stack) crash (and probably double-fault). Therefore, we have to be careful to switch to a good DS/ES before touching the TrapFrame pointer, which we don't have in ESP like the ASM code, but in a DS/ES-segmented register. For V8086 traps we can switch to the good DS/ES immediately, but for other kinds of traps, we actually need to save the current (bad) segments first. So we save them on the stack now, then switch to the good ones, then store the stack values into the trap frame. This is what happens on a non-optimized (-O0) build. On an optimized build, the segments will end up in registers instead, which is fine too (they'll be direct values). The order of instructions is guaranteed since the segment macros are volatile. [NTOS]: The GPF and Invalid Opcode handlers are performance critical when talking about V8086 traps, because they control the main flow of execution during that mode (GPFs will be issued for any privileged instruction we need to emulate, and invalid opcode might be generated for BOPs). Because of this, we employ a fast entry/exit macro into V8086 mode since we can make certain assumptions. We detect, and use, such scenarios when the V8086 flag is enabled in EFLAGS. However, because we can land in a GPF handler with an invalid DS/ES, as some V8086 code could trample this during BIOS calls for example, we must make sure that we are on a valid DS/ES before dereferencing any pointer. We fixup DS/ES either in KiEnterTrap (for normal entry/exit) or, for V86, in KiEnterV86Trap. Notice the problem: we need to detect which of these to use early on but we can't touch the EFLAGS in the frame because DS/ES could be invalid. Thankfully SS is always guaranteed valid, so stack dereferences are game! We therefore read the EFLAGS here, in assembly, where we can touch ESP as we please. We save this in EDX, which will be used as the second argument for the FASTCALL C trap entry. When we make the fast V86 check, we use the parameter instead of the trap frame, leading us to using the correct trap entry function, which fixes up DS/ES and lets us go on our merry way... [NTOS]: Make appropriate changes to GENERATE_TRAP_HANDLERS macro. [NTOS]: Switch to using well-known NT trap handler names (hex-based, double-zeroed) instead of decimal-based trap handler names which are confusing. [NTOS]: Clean up some debug spew.
- Next message: [ros-dev] [ros-diffs] [sir_richard] 45052: Patch that fixes VMWare boot (and
- Messages sorted by:
[ date ]
[ thread ]
[ subject ]
[ author ]
Hi Timo,
Regarding #1, we actually have this working in our tree. It was required for the new SYSENTRY C handler. So thank you for the advice and for pointing that out. Unfortunately I will not credit you for the idea since the code was already written, I hope this is understandable. We had to do this not for performance, but for compatibility with the Eoi helper function which is exported. The performance difference between pushad and doing the instructions manually (and later fixing up) was actually not that big, since pushad has certain pipeline advantages. However, when dealing with interrupts, it starts becoming more of a problem (although that code is not in mainline yet). Also your sample code has some bugs: the last integer register save is duplicated, and it would be better to use a symbolic name instead of 10 * 4. That value is also wrong, since it depends on whether or not the trap generates an error code or not. x86 is not easy, have to be careful!
Regarding #2, I unfortunately do not see the need to add more assembly when it is not needed. There are no subtle compiler breakages involved, since it is impossible for the compiler to add a DS/ES access for no reason at all (can you come up with an example?), this is part of the contract the compiler has to make (if compilers did this, it would break a whole class of embedded and real-mode software). Because the stack is safe, I think it's worth avoiding the extra lines of assembly. The C inlines will produce exactly the same code anyway. Also, it's not as easy as you make it seem, since for a FAST-V86 trap, we don't want to save the registers, just change them. So we need to insert a check for EFLAGS to see if the V8086 flag is there or not, then conditionally do the registers, then jump to the C code that will do another conditional check, and pick the right trap entry C code... it gets quite complicated. Since there are no "real" issue with doing it in C, other than personal choice and style, I think we'll stick with what we have.
The patch for #1 will also start introducing some other performance enhancing features, such as branch prediction hints for the CPU and compiler, and other little boosts. With the pusha conversion code gone, the C code is actually more efficient than previous assembly implementations.
Thanks for your comments and suggestions, please keep them coming. Do not be discouraged by this particular decision regarding #2 (it's also slightly political) as a sign or pattern that any suggestion will be ignored (you were not the first to bring up #1, otherwise we'd have done it thanks to you).
Also, if anyone has any experience with MSC++ and could port the GCC-centric macros, that would be much appreciated. I've given up on MSC++ after the compiler and linker have tried, year after year, to add more and more shit to what used to be a simple portable codebase. Buffer this, buffer that, nx this, dep that, safe seh here, hotpatch there, manifest files to even get it to run... ARGH!
Finally, the recent fixes have solved the double fault problem, but it seems certain flavours of VMWare use a different Video BIOS than my test machines and that V8086 C emulation code must be hitting a bug. It could also be we're seeing the BOP and not handling it. I'll look into this today. Note that the ASM implementation of PUSHF (or was it POPF) was completely broken. Someone must've made some serious typos when *ahem* writing the code. Hopefully there are no dependencies on that bug.
-r
> Hi,
>
> I suggest the following changes to the current implementation:
>
> 1.) Replace the PUSHA in the trap entry code, by a series of MOVs to the
> correct stack positions. The same for the trap exit code.
> KiTrapFrameFromPushaStack can be removed then. This would result in more
> clear, less complex and faster code, with only a few additional
> assembly instructions in the trap entry macro. The exit could be done
> either by normal call/return or by a jmp to an exit handler.
>
> 2.) Segements should be fixed up before entering C code, as not doing so
> may introduce possible compiler dependend breakages. It's also much
> cleaner and there's no reason to do the same stuff later in inline
> assembly instead of direcly in the asm entry point.
>
> The resulting code might looks something like this:
>
> /* Allocate KTRAP_FRAME */
> sub esp, KTRAP_FRAME_LENGTH - 10 * 4
>
> /* Save integer registers */
> mov [esp + KTRAP_FRAME_EBP], ebp
> mov [esp + KTRAP_FRAME_EBX], ebx
> mov [esp + KTRAP_FRAME_ESI], esi
> mov [esp + KTRAP_FRAME_EDI], edi
> mov [esp + KTRAP_FRAME_EAX], eax
> mov [esp + KTRAP_FRAME_ECX], ecx
> mov [esp + KTRAP_FRAME_EDX], edx
> mov [esp + KTRAP_FRAME_EBX], ebx
>
> /* Save segment regs */
> mov [esp + KTRAP_FRAME_SEGDS], ds
> mov [esp + KTRAP_FRAME_SEGES], es
>
> /* Fixup segment regs */
> mov ax, KGDT_R3_DATA | RPL_MASK
> mov ds, ax
> mov es, ax
>
>
> Timo
>
>
> _______________________________________________
> Ros-dev mailing list
> Ros-dev at reactos.org
> http://www.reactos.org/mailman/listinfo/ros-dev
- Previous message: [ros-dev] [ros-diffs] [sir_richard] 45052: Patch that fixes VMWare boot (and should fix QEMu/KVM boot on the testbot): [NTOS]: A trap can get us into a state where DS/ES are invalid, making any pointer dereference (on DS/ES segmented memory, not SS, the stack) crash (and probably double-fault). Therefore, we have to be careful to switch to a good DS/ES before touching the TrapFrame pointer, which we don't have in ESP like the ASM code, but in a DS/ES-segmented register. For V8086 traps we can switch to the good DS/ES immediately, but for other kinds of traps, we actually need to save the current (bad) segments first. So we save them on the stack now, then switch to the good ones, then store the stack values into the trap frame. This is what happens on a non-optimized (-O0) build. On an optimized build, the segments will end up in registers instead, which is fine too (they'll be direct values). The order of instructions is guaranteed since the segment macros are volatile. [NTOS]: The GPF and Invalid Opcode handlers are performance critical when talking about V8086 traps, because they control the main flow of execution during that mode (GPFs will be issued for any privileged instruction we need to emulate, and invalid opcode might be generated for BOPs). Because of this, we employ a fast entry/exit macro into V8086 mode since we can make certain assumptions. We detect, and use, such scenarios when the V8086 flag is enabled in EFLAGS. However, because we can land in a GPF handler with an invalid DS/ES, as some V8086 code could trample this during BIOS calls for example, we must make sure that we are on a valid DS/ES before dereferencing any pointer. We fixup DS/ES either in KiEnterTrap (for normal entry/exit) or, for V86, in KiEnterV86Trap. Notice the problem: we need to detect which of these to use early on but we can't touch the EFLAGS in the frame because DS/ES could be invalid. Thankfully SS is always guaranteed valid, so stack dereferences are game! We therefore read the EFLAGS here, in assembly, where we can touch ESP as we please. We save this in EDX, which will be used as the second argument for the FASTCALL C trap entry. When we make the fast V86 check, we use the parameter instead of the trap frame, leading us to using the correct trap entry function, which fixes up DS/ES and lets us go on our merry way... [NTOS]: Make appropriate changes to GENERATE_TRAP_HANDLERS macro. [NTOS]: Switch to using well-known NT trap handler names (hex-based, double-zeroed) instead of decimal-based trap handler names which are confusing. [NTOS]: Clean up some debug spew.
- Next message: [ros-dev] [ros-diffs] [sir_richard] 45052: Patch that fixes VMWare boot (and
- Messages sorted by:
[ date ]
[ thread ]
[ subject ]
[ author ]
More information about the Ros-dev
mailing list