Dynamic Application Translation

BlackRabbit
Posts: 128
Joined: Sat Dec 22, 2012 7:36 am

Dynamic Application Translation

Post by BlackRabbit »

Hi All,

As several have mentioned over the last few years, it would be beneficial if we could port ReactOS to ARM without cannibalizing the human resources that are dedicated to x86-32/x86-64/PPC development. Thinking about the nature of Win32 applications, I was wondering what all of you might think of the following idea. It takes a somewhat different approach from pure run-time emulators and, assuming that it worked, would allow x86-32 binaries to run on ARM devices at native ARM speed, without having to manually recompile the x86-32 applications: :idea:

First, some observations:
  • The vast majority of Win32 applications consists of one or more EXE modules and one or more DLL modules.
  • The format of these modules is well-documented, and obviously understood by the members of the ReactOS team that created the ReactOS ring-3 loader.
    See: http://en.wikipedia.org/wiki/Portable_Executable
  • An EXE or DLL essentially runs in an execution "bubble". Anything beyond register/RAM state-manipulation goes through a documented interface: Win32.
  • The calls that are made to the Win32 API are readily observable from examination of the invoking module's import table. Calls invoked via the LoadLibrary/GetProcAddress sequence are special, of course.
Given these facts, it seems that it should be possible to perform dynamic application translation from, say, x86-32 to ARMv7 by doing the following:
  • Install the x86-32 application onto the ARM device as usual. The layout of the x86-32 application on the ARM file system would be essentially identical to what it would have been had the application been ARM-targeted. [This is an inductive step. See explanation below.]
  • Execute the x86-32 application using the regular mechanism: CreateProcess will be invoked against the initial binary for the application, which, at this point, still contains x86-32 machine code.
  • The ReactOS ring-3 loader would need to be modified so that it can detect whether the executable image is x86-32-based or ARM-based.
  • If the image is ARM-based, fine. There is nothing to do. If the image is x86-32-based, then the loader would perform dynamic translation as follows:
    • Reverse-compile the x86-32 image using a machine-code-to-C decompiler.
      See: http://en.wikibooks.org/wiki/X86_Disass ... ecompilers
    • Compile the generated C code back to ARM machine code.
    • Take advantage of the fact that it is highly likely that the code-sequence for invocation of functions of the Win32 API will follow a regular, recognizable pattern. For example, when the function CreateFile is called, there would be a code sequence where the DLL import table is consulted in preparation of invocation of the CreateFile call. This is where the loader (or, equivalently, a user-mode agent that acts on behalf of the loader) performs any necessary modification so that the ARM machine code performs an equivalent operation as would the corresponding x86-32 code.
  • The conversion of an EXE, and of each DLL upon which it depends, would be a one-time operation. After the translation occurs, the generated ARM EXE and ARM DLLs would be stored in a cache on disk. Each cache file might be named after the EXE or DLL, suffixed with the string-encoded SHA-256 hash of the file's contents as they were before translation. The contents of the cache file would be the machine code of the EXE or DLL after translation. A small bit of fancy footwork would be necessary to temporarily put the ARM image in place of the x86-32 image so that GetCurrentDirectory/SetCurrentDirectory/etc. work as expected.
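As a rough illustration of the loader check and the cache naming described above, here is a minimal C sketch. The function names (pe_machine, needs_translation, cache_name) are hypothetical and not part of ReactOS; the header offsets and Machine values come from the Portable Executable format. The SHA-256 digest is assumed to be computed elsewhere and passed in as a hex string, which is only one possible string encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Machine values from the PE/COFF file header. */
#define MACHINE_I386  0x014c  /* IMAGE_FILE_MACHINE_I386 */
#define MACHINE_ARMNT 0x01c4  /* IMAGE_FILE_MACHINE_ARMNT (ARMv7 Thumb-2) */

/* Read little-endian values without assuming host byte order or alignment. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return the COFF Machine field of a PE image held in
 * memory, or 0 if the buffer does not look like a PE file. */
uint16_t pe_machine(const uint8_t *img, size_t len)
{
    uint32_t pe_off;

    assert(img != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;                       /* no DOS header */
    pe_off = rd32(img + 0x3c);          /* e_lfanew: offset of "PE\0\0" */
    if (pe_off > len || len - pe_off < 6)
        return 0;                       /* header would run past the buffer */
    if (memcmp(img + pe_off, "PE\0\0", 4) != 0)
        return 0;                       /* no PE signature */
    return rd16(img + pe_off + 4);      /* Machine follows the signature */
}

/* Hypothetical policy: only x86-32 images are sent to the translator. */
int needs_translation(const uint8_t *img, size_t len)
{
    return pe_machine(img, len) == MACHINE_I386;
}

/* Hypothetical cache naming: "<module>.<sha256-hex>".
 * Returns 0 on success, -1 if the name does not fit in the buffer. */
int cache_name(char *out, size_t out_len,
               const char *module, const char *sha256_hex)
{
    int n = snprintf(out, out_len, "%s.%s", module, sha256_hex);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

A real loader would of course work on the mapped section rather than a flat buffer, and would also have to decide what to do with images for other machine types.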
Explanation of Inductive Step: One might ask:

How is it possible to install an x86-32 application onto an OS that is purely ARM?

If the installation is primarily driven by an EXE, then the same mechanism described above would work. If the installation is driven by an MSI database, then something equivalent to msiexec.exe would need to be used.

Naturally, there are some problems with this technique. It will only work for well-behaved applications. Well-behaved applications consistently delegate OS-specific functions to the OS itself. There is also the matter of thread-local storage, where, I believe, the FS or GS register on x86-32 is accessed directly. Then there are specialized applications, like those that go snooping in their own thread-execution block (TEB).

One other benefit of this approach is that hordes of programmers who are comfortable using Visual Studio Express, for example, to write native C/C++ applications under Windows, would be able to target ARM without targeting ARM, by continuing to write well-behaved applications for x86-32.

Any thoughts appreciated. :mrgreen:
binsys
Posts: 19
Joined: Thu May 03, 2007 12:13 pm

Re: Dynamic Application Translation

Post by binsys »

ARM native instructions can already be run on Intel x86 CPUs using dynamic binary translation technology.
For example: BlueStacks, http://www.buildroid.org/blog/?p=198, libhoudini.so.
So it should not be difficult; it would only need some changes based on QEMU.
BlackRabbit
Posts: 128
Joined: Sat Dec 22, 2012 7:36 am

Re: Dynamic Application Translation

Post by BlackRabbit »

Looking at the description of BlueStacks here:
http://www.zdnet.com/blog/open-source/r ... acks/11122
...
It appears that BlueStacks is different from what I am proposing.

It appears that BlueStacks emulates the Android Dalvik Virtual Machine. This makes sense. After all, so-called "native" C++ development on Android is not really native. The native code is invoked from Java through the JNI interface. The same appears to be true on Windows Phone 8, where Microsoft's so-called "native" C++ coding is not really native, but a just-in-time invocation of your blob of native code from a managed environment. [I could be wrong. Please correct me if I am.]

By contrast, under my proposal, there would be 0% run-time emulation. The strategy that I propose would result in an executable, running under ReactOS, on ARM, that, for all practical purposes, might as well have been developed and compiled for an ARM CPU, even though the original target CPU was x86-32. There would be no emulation, no virtual machine, no thunking, and no run-time layer.
hto
Developer
Posts: 2193
Joined: Sun Oct 01, 2006 3:43 pm

Post by hto »

BlackRabbit wrote: By contrast, under my proposal, there would be 0% run-time emulation.
Utopian idea… :)
binsys
Posts: 19
Joined: Thu May 03, 2007 12:13 pm

Re: Dynamic Application Translation

Post by binsys »

BlackRabbit wrote: There would be no emulation, no virtual machine, no thunking, and no run-time layer.
Good idea, but that is not possible, because the CPU instruction sets are not the same. A dynamic translation process is still required; please refer to QEMU's user-mode emulation.
BlackRabbit
Posts: 128
Joined: Sat Dec 22, 2012 7:36 am

Re: Dynamic Application Translation

Post by BlackRabbit »

binsys wrote: Good idea, but that is not possible, because the CPU instruction sets are not the same. A dynamic translation process is still required; please refer to QEMU's user-mode emulation.
That is the reason for using a C compiler: to re-compile the code after it has been de-compiled, to accommodate the change in instruction set:

[FOO.EXE on x86-32] ---> (de-compile) ---> [FOO.C] ---> (re-compile) ---> [FOO.EXE on ARMv7]
SomeGuy
Posts: 586
Joined: Mon Nov 29, 2004 9:48 am
Location: Marietta, GA

Re: Dynamic Application Translation

Post by SomeGuy »

This is basically the same approach that FX!32 took on the Digital Alpha CPUs under NT 4.

Unfortunately, that didn't fly too well back then. And the Alpha had the advantage of being a FASTER processor than the x86 for a time. So it is probably technically possible, but would still be an inferior experience (slower, less compatible) compared to using native compiled applications.
BlackRabbit
Posts: 128
Joined: Sat Dec 22, 2012 7:36 am

Re: Dynamic Application Translation

Post by BlackRabbit »

FX!32: http://en.wikipedia.org/wiki/FX!32

(Could someone please tell me how to use the ReactOS HTML composer to superpose URLs and their labels?)

The paper for FX!32:

http://static.usenix.org/publications/l ... ernoff.pdf

FX!32 is a hybrid system that consists of both an emulator and a translator. While the emulator runs code, it profiles the (interpreted) source machine code for opportunities to convert portions of the source machine code to the target machine code. Then, on a later run of the application under the emulator, at strategic execution points while executing the source machine code, the emulator invokes synthesized DLLs that embody target machine code that was previously and opportunistically translated.

That is different from what I am proposing.

What I propose is to eliminate completely the emulator so that, when a program is run, there is no emulation or translation whatsoever. The application will execute natively, entirely on its own, on the target CPU.

To achieve this, the source machine code would have to be translated entirely to the target machine code before the application is executed.

In the paper above, it was noted:
"The resulting translated application runs up to ten times faster than the same application running under the emulator"
What they mean by this is that their hybrid sometimes-interpreted-sometimes-directly-executed application was 10 times faster than the interpreted-only version.

I am proposing that the emulator should be eliminated entirely.
Z98
Release Engineer
Posts: 3379
Joined: Tue May 02, 2006 8:16 pm
Contact:

Re: Dynamic Application Translation

Post by Z98 »

I wonder if you really understand how hard what you're proposing really would be. Not all applications are portable at the instruction level.
BlackRabbit
Posts: 128
Joined: Sat Dec 22, 2012 7:36 am

Re: Dynamic Application Translation

Post by BlackRabbit »

Z98 wrote: I wonder if you really understand how hard what you're proposing really would be. Not all applications are portable at the instruction level.
Well, of course they are not. One should not try to translate machine code to machine code. :) If I tried, for example, to translate a program that has the x86_32 instruction SYSENTER in it, there would be trouble, because even though many CPUs have an equivalent instruction for fast user-to-kernel-mode transitions, it is still an instruction that comes with certain environment-specific implications that are not readily translatable.

As it turns out, this would not be an issue, because it would hardly be necessary to translate "weird" instructions. In fact, it would never be necessary to translate instructions at all for most applications, because instruction translation is not the same as de-compilation/re-compilation.

Let me show an example with an actual Windows program that puts "Hello, World!" on the screen:

Code:

#include <windows.h>

int __stdcall WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
{
	MessageBox (HWND_DESKTOP, TEXT("Hello, World!"), TEXT("Title"), MB_OK);
	return 0;
}
Now we see the assembly language corresponding to this program:

Code:

_WinMain@16 PROC					; COMDAT

; 4    : {

	push	ebp
	mov	ebp, esp
	sub	esp, 192				; 000000c0H
	push	ebx
	push	esi
	push	edi
	lea	edi, DWORD PTR [ebp-192]
	mov	ecx, 48					; 00000030H
	mov	eax, -858993460				; ccccccccH
	rep stosd

; 5    : 	MessageBox (HWND_DESKTOP, TEXT("Hello, World!"), TEXT("Title"), MB_OK);

	mov	esi, esp
	push	0
	push	OFFSET ??_C@_1M@MNHBCACD@?$AAT?$AAi?$AAt?$AAl?$AAe?$AA?$AA@
	push	OFFSET ??_C@_1BM@LOODKPFG@?$AAH?$AAe?$AAl?$AAl?$AAo?$AA?0?$AA?5?$AAW?$AAo?$AAr?$AAl?$AAd?$AA?$CB?$AA?$AA@
	push	0
	call	DWORD PTR __imp__MessageBoxW@16
	cmp	esi, esp
	call	__RTC_CheckEsp

; 6    : 	return 0;

	xor	eax, eax

; 7    : }

	pop	edi
	pop	esi
	pop	ebx
	add	esp, 192				; 000000c0H
	cmp	ebp, esp
	call	__RTC_CheckEsp
	mov	esp, ebp
	pop	ebp
	ret	16					; 00000010H
_WinMain@16 ENDP
_TEXT	ENDS
END
Look at the code at the entry to the WinMain function:

Code:

	push	ebp
	mov	ebp, esp
	sub	esp, 192				; 000000c0H
Now look at the code at the exit of the WinMain function:

Code:

	cmp	ebp, esp
	call	__RTC_CheckEsp
	mov	esp, ebp
	pop	ebp
	ret	16					; 00000010H
An experienced C programmer who also knows x86 assembly language can look at these two chunks of code and immediately recognize what they are: the stack frame is being set up and torn down.

That is the job of a de-compiler. It looks at an .EXE, disassembles the machine instructions, and generates C code from them. Looking at the code, there are several things that I could determine, if I were a de-compiler: 8-)
  • The stack frame reserves 192 bytes for (local) automatic variables.
  • The function WinMain has __stdcall in its prototype. I can see this from the ret 16 at the end of the function, which says that the called function is responsible for cleaning its arguments off the stack.
  • I know that, likely, there were 4 arguments passed to WinMain, not only because of the ret 16, but because the entry point of the .EXE (the C run-time start-up code) leads to my WinMain function. I know that this is not a console app, because the Subsystem field in the Portable Executable header of the .EXE marks it as a Windows GUI application.
  • I can see that my function is making a call to MessageBox. I can see this because the Portable Executable, again, tells me that I need to pull in the DLL that contains MessageBoxW, and place the pointer to MessageBoxW at the location where the argument to the call instruction should be.
Anyhow, you get the idea. By the time the de-compiler is finished, you have C code that is re-compilable to the target CPU.
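To make the pattern matching concrete, here is a small C sketch that recognizes the prologue and the ret shown in the listing above directly from their machine-code bytes. The function names are made up for illustration, and a real de-compiler would use a full disassembler rather than fixed byte patterns; the opcode bytes themselves are the standard x86-32 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: if code starts with the usual EBP-based prologue
 * (push ebp / mov ebp, esp / optional sub esp, N), return N, the number of
 * bytes reserved for the frame. Return -1 if the pattern is not found. */
int64_t frame_size(const uint8_t *code, size_t len)
{
    assert(code != NULL);
    if (len < 3 || code[0] != 0x55)                 /* push ebp */
        return -1;
    /* mov ebp, esp has two valid encodings: 8B EC and 89 E5 */
    if (!((code[1] == 0x8b && code[2] == 0xec) ||
          (code[1] == 0x89 && code[2] == 0xe5)))
        return -1;
    if (len >= 9 && code[3] == 0x81 && code[4] == 0xec)   /* sub esp, imm32 */
        return (int64_t)((uint32_t)code[5] | (uint32_t)code[6] << 8 |
                         (uint32_t)code[7] << 16 | (uint32_t)code[8] << 24);
    if (len >= 6 && code[3] == 0x83 && code[4] == 0xec && code[5] < 0x80)
        return (int64_t)code[5];                          /* sub esp, imm8 */
    return 0;                        /* frame set up, nothing reserved */
}

/* Hypothetical helper: given the bytes of the final return instruction,
 * report how many argument bytes the callee pops (the __stdcall signature).
 * Returns 0 for a plain ret and -1 if the bytes are not a near return. */
int stdcall_arg_bytes(const uint8_t *ret_insn, size_t len)
{
    assert(ret_insn != NULL);
    if (len >= 3 && ret_insn[0] == 0xc2)            /* ret imm16 */
        return ret_insn[1] | (ret_insn[2] << 8);
    if (len >= 1 && ret_insn[0] == 0xc3)            /* ret */
        return 0;
    return -1;
}
```

Fed the bytes for the WinMain above, the first helper reports the 192-byte frame and the second reports 16 argument bytes, that is, four 4-byte arguments.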

As it turns out, most applications that people write follow this model, even some that are complex. This is a testament to the portability of the C language, and indeed, of stable languages in general. The programmer is able to generate an arbitrarily-complex sequence of instructions, but the compiler is not. For a given code sequence in a high-level language, a compiler will generate a somewhat predictable machine code sequence. Loading the ECX register with an object pointer, for example, just before a call, is a dead giveaway that a member function of a C++ object is about to be called. The code sequence for hunting down a virtual function from the v-table is always readily recognizable.

The role of a de-compiler/re-compiler chain is to automate this whole process so that one starts with an x86_32 .EXE, and ends with an ARM .EXE.
Z98
Release Engineer
Posts: 3379
Joined: Tue May 02, 2006 8:16 pm
Contact:

Re: Dynamic Application Translation

Post by Z98 »

Decompilation as you've described it also constitutes copyright infringement in the US and I'm pretty sure the EU. Then there's the question of what happens if someone is using SSE (or more generally, assembly) intrinsics. Or if they have shader or GPU code which happens to rely on a graphics driver to exist on the target platform to bootstrap the code. You are proposing what might be an interesting research project. You however have a lot of challenges that you would need to overcome, both legally and technically, to achieve your goal.
BlackRabbit
Posts: 128
Joined: Sat Dec 22, 2012 7:36 am

Re: Dynamic Application Translation

Post by BlackRabbit »

Z98 wrote: Decompilation as you've described it also constitutes copyright infringement in the US and I'm pretty sure the EU. Then there's the question of what happens if someone is using SSE (or more generally, assembly) intrinsics. Or if they have shader or GPU code which happens to rely on a graphics driver to exist on the target platform to bootstrap the code. You are proposing what might be an interesting research project. You however have a lot of challenges that you would need to overcome, both legally and technically, to achieve your goal.
Well, my goal would not be to translate all applications, as you can see in my original post:
BlackRabbit wrote: Naturally, there are some problems with this technique. It will only work for well-behaved applications. Well-behaved applications consistently delegate OS-specific functions to the OS itself. There is also the matter of thread-local storage, where, I believe, the FS or GS register on x86-32 is accessed directly. Then there are specialized applications, like those that go snooping in their own thread-execution block (TEB).
I guess I should have been clearer about what I meant by "well-behaved" applications. They are applications that are blandly insular, meaning that, by definition, the problems with non-translatable instructions, violating API bubbles, etc. are not present. I have no idea what percentage such applications represent relative to the total number of Windows applications.

I did consider the legal implications. For example, someone might go to the store, buy an extra copy of their favorite, well-behaved software, translate it to run on ARM, and abstain from using it on x86_32. This would likely be illegal, as you point out. However, there are many companies that write "well-behaved" applications for x86_32, but regard porting to ARM not worth their trouble (financially/technically/etc.). In this case, the tool that I proposed might have some benefit to the company, as well as their customers. It would be left to the company to decide whether their customers are allowed to translate the company's applications to ARM.
ThePhysicist
Developer
Posts: 509
Joined: Mon Apr 25, 2005 12:46 pm

Re: Dynamic Application Translation

Post by ThePhysicist »

There is in fact something that claims to do exactly this: Winulator.

http://www.geek.com/articles/mobile/win ... -20121128/

It claims to make old Windows games run natively on Android.

But I fear it is a hoax :D
BlackRabbit
Posts: 128
Joined: Sat Dec 22, 2012 7:36 am

Re: Dynamic Application Translation

Post by BlackRabbit »

ThePhysicist wrote: But I fear it is a hoax
My guess is that it is real indeed.

I took a quick look at the bottom of the Winulator FAQ page, and it seems that the developer, Dan Aloni, is doing pretty much what I proposed, which makes me happy that I did not spend any time working on this myself. 8-)

Several people have said that converting an x86_32 .EXE to an ARM .EXE is hard, but it seems not hard at all. An .EXE is quite structured. The key is how it interacts with the OS, and that is through the Win32 API. I suspect this is what Dan is doing:
  • de-compile x86_32 .EXE to .C source code
  • re-compile .C source code to ARM .EXE
  • patch the DLL import table of the ARM .EXE so that, when a call is made to, say, GetMessage, the call actually goes to a GetMessage shim that provides the same operation on Linux. This would be something like XNextEvent.
You will notice that Dan mentions that only a subset of games will work. He prefers games, and early games at that, like the ones that run on Windows 95. The reason is likely that it limits the scope of his workload. The older the application, the fewer and less advanced its API calls, and the fewer and easier the shims that he must write. A game from the mid-1990s will surely use a relatively small, very well understood set of APIs. DirectX, however, would be an entirely different matter.
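The import-table patching step can be pictured as a simple name lookup. The following C sketch is only a guess at the general shape, not Winulator's actual code: the table, the shim functions, and resolve_import are all hypothetical, and the shims are empty stubs where real ones would carry the Win32 prototypes and call the host's equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*shim_fn)(void);   /* generic function-pointer type */

/* Hypothetical shims: real ones would match the Win32 prototypes and
 * forward to the host equivalent (for example, XNextEvent for GetMessage). */
static void shim_GetMessageA(void) { }
static void shim_MessageBoxW(void) { }

/* One entry per import the translator knows how to satisfy. */
static const struct {
    const char *dll;
    const char *name;
    shim_fn     fn;
} shims[] = {
    { "user32.dll", "GetMessageA", shim_GetMessageA },
    { "user32.dll", "MessageBoxW", shim_MessageBoxW },
};

/* Look up the shim for an imported function. Returns NULL when no shim
 * exists, in which case the application cannot be run as translated.
 * Note: this sketch compares names exactly, whereas Windows treats DLL
 * names case-insensitively. */
shim_fn resolve_import(const char *dll, const char *name)
{
    size_t i;

    assert(dll != NULL && name != NULL);
    for (i = 0; i < sizeof shims / sizeof shims[0]; i++) {
        if (strcmp(shims[i].dll, dll) == 0 &&
            strcmp(shims[i].name, name) == 0)
            return shims[i].fn;
    }
    return NULL;
}
```

The size of that table is the work-load: an old game that imports a few dozen functions needs a few dozen shims, while anything that reaches into DirectX would need far more.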

If he were to take the part of his tool that re-compiles an x86_32 .EXE into an ARM .EXE, and stop there, without writing any Win32 shims, it would be possible to run a very large number of Windows applications, natively, on ReactOS on ARM.
BigChimp
Posts: 6
Joined: Sun Nov 25, 2012 4:16 pm

Re: Dynamic Application Translation

Post by BigChimp »

Z98 wrote: Decompilation as you've described it also constitutes copyright infringement in the US and I'm pretty sure the EU.
Well, perhaps.

AFAIK, the EU has explicit copyright exemptions for interoperability:
https://en.wikipedia.org/wiki/Reverse_e ... pean_Union

EU Computer Program Directive:
(15) The unauthorised reproduction, translation, adaptation or transformation of the form of the code in which a copy of a computer program has been made available constitutes an infringement of the exclusive rights of the author. Nevertheless, circumstances may exist when such a reproduction of the code and translation of its form are indispensable to obtain the necessary information to achieve the interoperability of an independently created program with other programs. It has therefore to be considered that, in these limited circumstances only, performance of the acts of reproduction and translation by or on behalf of a person having a right to use a copy of the program is legitimate and compatible with fair practice and must therefore be deemed not to require the authorisation of the rightholder. An objective of this exception is to make it possible to connect all components of a computer system, including those of different manufacturers, so that they can work together. Such an exception to the author's exclusive rights may not be used in a way which prejudices the legitimate interests of the rightholder or which conflicts with a normal exploitation of the program.
... though reading that text it appears to be aimed at decompiling the unchanged original program, finding out how interfaces/API/file handling works, then writing your own programs to use that info together with the unchanged original program.
However, if you interpret "independently created program" as ReactOS, it might fit.

The section on the DMCA in the US has similar terms to the EU Computer Program Directive mentioned above... though the Wikipedia article indicates EULAs may complicate things.

Finally, as mentioned, binary translation is being done already, e.g. in the x86=>Alpha translation on NT, as well as in virtual machine hypervisors, so that's why I'm having trouble believing it would be illegal (otherwise we'd have seen a lot of lawsuits against VirtualBox/Oracle etc).