win32 on macOS

Ken Thomases ken at codeweavers.com
Wed Dec 11 18:19:37 CST 2019


Hi Fabian,

On Dec 11, 2019, at 12:51 PM, Fabian Maurer <dark.shadow4 at web.de> wrote:
> 
> is there any documentation on how the CodeWeavers solution for 32bit
> Applications on Catalina works?
> I'm interested in the technical details of this miracle, but there don't seem
> to be many details out there. What I found was either on the superficial
> level, or a bunch of speculation.

Our solution involves a custom version of the Clang compiler as well as modifications to Wine.  It also relies on a new feature of macOS Catalina to allow the creation of 32-bit code segments in a 64-bit process.  Our modified versions of Wine and Clang/LLVM are included in our source tarball at <https://www.codeweavers.com/products/more-information/source>.  We hope to have a more convenient approach to sharing and collaborating on this in the future.

The custom compiler has a number of features that support building 32-on-64-bit Wine.  These features are enabled when you compile with the "-mwine32" architecture option (as opposed to -m32 or -m64).  "-mwine32" is a variant of -m64 with additional functionality:

* It knows about both 32- and 64-bit pointers.

* It knows about 32-bit Microsoft calling conventions (cdecl32, stdcall32, thiscall32, fastcall32).

* It will automatically generate 32-to-64-bit thunks for functions with such 32-bit calling conventions.

* A function pointer whose pointee type has a 32-bit calling convention is assumed to point to 32-bit code.  When calling through such a pointer, the compiler automatically generates the appropriate 64-to-32-bit thunk.  Conversely, taking the address of a function with a 32-bit calling convention yields a pointer to the 32-to-64-bit thunk the compiler generated.  As an optimization, the compiler-generated 32-to-64-bit thunks have a recognizable signature, so call-site code can check for it and skip the 64-to-32-to-64-bit round trip that would otherwise occur.  (There's a sketch of this after the list.)

* It has a notion of "local" include paths and thus local (Wine) headers vs. external/system headers.  It uses this to apply certain defaults to types from external/system headers.

* It defines the macro __i386_on_x86_64__.
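
To make this concrete, here's a minimal sketch of what such code looks like to the programmer when built with the custom compiler and -mwine32.  The type and function names are made up for illustration; only the attribute spelling and the thunking behavior are as described above.

typedef void (__attribute__((stdcall32)) *CALLBACK32)(void *param);

/* A function with a 32-bit calling convention.  Besides the 64-bit body,
   the compiler emits a 32-to-64-bit thunk that 32-bit code can call. */
void __attribute__((stdcall32)) my_callback(void *param)
{
    /* ordinary 64-bit code */
}

void run_callback(CALLBACK32 cb)
{
    /* Calling through a pointer with a 32-bit calling convention: the
       compiler inserts a 64-to-32-bit thunk at the call site (and can
       skip it when the target is a recognizable 32-to-64-bit thunk). */
    cb(0);

    /* Taking the address yields a pointer to the generated 32-to-64-bit
       thunk, suitable for handing out to 32-bit code. */
    CALLBACK32 p = my_callback;
    (void)p;
}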


Now for some of the nitty-gritty details:

The compiler has a concept of "address spaces": the normal 64-bit one (called, somewhat confusingly, "default") and a 32-bit one (called "ptr32").  It maintains a current implicit stack address space, a current implicit storage address space, and a current implicit pointer address space.

The implicit stack address space tells the compiler where a stack variable lives.  Applying the address-of operator to a stack variable yields a pointer of the appropriate size.  If the variable is in the 32-bit address space, the resulting pointer is 32 bits in size; if it's in the 64-bit address space, the address is a 64-bit pointer.  The same thing happens when an array on the stack decays to a pointer.

By default, when compiling with -mwine32, the implicit stack address space is the 32-bit address space.  There's a command-line option, -mstack64, to override that.

The compiler does not control where the stack actually is at runtime.  So, it's Wine's responsibility to make sure that threads which run code compiled with the 32-bit stack address space actually have their stack in the low 4GB of the process's virtual memory.
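
Here's a small sketch of what that means in practice (the function is made up; the behavior follows the defaults just described):

/* Built with -mwine32: the implicit stack address space is ptr32. */
void stack_example(void)
{
    int local;
    int buffer[16];

    /* Both &local and buffer (after array decay) are 32-bit pointers,
       because the stack lives in the 32-bit address space.  Assigning
       them to plain "int *" variables (64-bit by default) merely widens
       them, which is always allowed. */
    int *p = &local;
    int *b = buffer;
    (void)p; (void)b;
}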


The implicit storage address space tells the compiler where static data and code live.  So, if you take the address of a static variable or a function, you get a pointer type of the appropriate size.  Similarly for string literals.

By default, when compiling with -mwine32, the implicit storage address space is the 32-bit address space.  That can be overridden with the command-line option -mstorage-address-space={default | ptr32}.  Furthermore, it can be altered in code using:

#pragma clang storage_addr_space({default | ptr32})
#pragma clang storage_addr_space(push, {default | ptr32})
#pragma clang storage_addr_space(pop)

When an external/system header is #include'd, it's as though the content were surrounded by

#pragma clang storage_addr_space(push, default)
…
#pragma clang storage_addr_space(pop)

So, all of the declarations and definitions in such headers are in the default (64-bit) address space.

Again, the compiler does not actually control where in the process's virtual memory a module gets loaded.  It's Wine's responsibility to ensure that modules whose code was compiled with the 32-bit storage address space are actually loaded in the low 4GB.
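
A sketch of how that plays out in code (names invented; the pragma is the one shown above):

/* Built with -mwine32: the implicit storage address space is ptr32, so
   this data and code are 32-bit addressable (and Wine makes sure the
   module really is loaded below 4GB). */
static int counter;

/* The pragma can move specific definitions to the 64-bit (default)
   address space: */
#pragma clang storage_addr_space(push, default)
static int side_table[1024];
#pragma clang storage_addr_space(pop)

void storage_example(void)
{
    int *p = &counter;          /* 32-bit address, widened into "int *" */
    const char *s = "hello";    /* string literals are 32-bit addressable too */
    int *q = side_table;        /* 64-bit address */
    (void)p; (void)s; (void)q;
}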


The implicit pointer address space governs the size of declared pointers.

When a typedef, struct, or union is defined, the pointer address space that's current at the time is remembered.  Later, if a pointer to such a type is declared, the pointer is in that remembered address space.  So, for example, a Win32 type such as CREATESTRUCTW will be defined in the ptr32 pointer address space, therefore "CREATESTRUCTW *cs" will be a 32-bit pointer.  A system type such as struct stat will be defined in the default (64-bit) pointer address space, so "struct stat *st" will be a 64-bit pointer.

A pointer type can be decorated with the __ptr32 or __ptr64 keyword to explicitly declare its size, overriding the logic above.  For example, "CREATESTRUCTW * __ptr64 cs" will be a 64-bit pointer.

If neither of the above apply, the pointer's size is dictated by the current implicit pointer address space.  So, for "int *foo", foo is a 32-bit pointer if the implicit pointer address space is ptr32, or a 64-bit pointer if it's default.
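
In code, the cases above look roughly like this (CREATESTRUCTW stands in for any type defined in Wine's headers, struct stat for any type from a system header):

#include <sys/stat.h>    /* system header: its types remember "default" */
#include "winuser.h"     /* Wine header: its types remember "ptr32" */

void pointer_example(CREATESTRUCTW *cs,             /* 32-bit pointer */
                     struct stat *st,               /* 64-bit pointer */
                     CREATESTRUCTW * __ptr64 cs64,  /* explicit override: 64-bit */
                     int *plain)                    /* implicit space: 64-bit */
{
    (void)cs; (void)st; (void)cs64; (void)plain;
}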

The implicit pointer address space is "default" (64-bit), by default.  This can be overridden with the command-line option "-mdefault-address-space={default | ptr32}".  Furthermore, it can be altered in code using:

#pragma clang default_addr_space({default | ptr32})
#pragma clang default_addr_space(push, {default | ptr32})
#pragma clang default_addr_space(pop)

And, again, external/system headers are processed as though their content were surrounded by:

#pragma clang default_addr_space(push, default)
…
#pragma clang default_addr_space(pop)
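
For example, at file scope (a sketch; the pragma is the one shown above):

extern int *p64;          /* implicit pointer address space: 64-bit */

#pragma clang default_addr_space(push, ptr32)
extern int *p32;          /* now a 32-bit pointer */
extern char **pp32;       /* 32-bit pointer to a 32-bit pointer */
#pragma clang default_addr_space(pop)

extern int *q64;          /* back to 64-bit */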


A big potential source of problems with 32- and 64-bit pointers is accidental truncation by assigning a 64-bit pointer to a 32-bit pointer.  So, such assignment is prohibited.  Even a normal cast is not enough to allow it.  Only a special cast syntax can enable shortening a pointer: foo = (__addrspace Type)bar.
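
A sketch of what that looks like (the exact placement of the keyword within the cast is my reading of the syntax above):

void shorten_example(int * __ptr64 wide)
{
    int * __ptr32 narrow;

    /* narrow = wide;                    error: would truncate the pointer */
    /* narrow = (int * __ptr32)wide;     a plain cast is still an error    */
    narrow = (__addrspace int * __ptr32)wide;   /* explicit opt-in         */
    (void)narrow;
}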

Another potential source of problems is when casting a pointer type to a smaller integer type or casting an integer type to a smaller pointer type.  There's a new set of warnings for that (which 32-on-64-bit Wine promotes to errors).  There's another special cast syntax to suppress the warning for a specific conversion: foo = (__truncate Type)bar.
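
For example (again a sketch, using the cast form described above):

unsigned int truncate_example(void *p)
{
    /* Converting a 64-bit pointer to a 32-bit integer triggers the new
       warning (an error in Wine) unless the conversion is marked: */
    return (__truncate unsigned int)p;
}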


The 32-bit calling conventions can be applied to a function type using __attribute__(({cdecl32 | stdcall32 | thiscall32 | fastcall32})).  These calling conventions are slightly tweaked for 64-bit code.  None of them are callee-pop; they assume the caller will clean up the arguments pushed to the stack.  Since 32-bit callers of stdcall32, thiscall32, or fastcall32 functions will have assumed the callee cleans up, that job falls to the 32-to-64-bit thunks the compiler generates.  Also, the generated 64-bit code expects 12 extra bytes on the stack between the return address and the stack arguments.  (This makes the thunks simpler and more efficient.)

The name of the 32-to-64-bit thunk generated for such a function is <prefix>_thunk_<function name>.  The prefix defaults to "__i386_on_x86_64" but that can be overridden with the command-line option "-minterop64-32-thunk-prefix=<whatever>".  Wine sets that to "wine", so the thunk name is wine_thunk_<function_name>.

If the code already supplies a definition of a symbol with that name, the compiler treats that as a custom thunk (or alternative 32-bit implementation) and doesn't auto-generate a thunk of its own.
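
In other words, with Wine's prefix setting (FooBar is a made-up name):

/* Compiling this normally also produces a 32-to-64-bit thunk named
   wine_thunk_FooBar: */
unsigned int __attribute__((stdcall32)) FooBar(unsigned int arg);

/* But if the sources already define a symbol called wine_thunk_FooBar
   (say, hand-written in assembly, or an alternative 32-bit-facing
   implementation), the compiler leaves it alone and generates nothing. */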

The generated thunks (both 32-to-64-bit and 64-to-32-bit) need to know the code segment selector to use.  The compiler assumes two unsigned short variables are defined whose values are those selectors.  By default, the variable names are __i386_on_x86_64_cs32 and __i386_on_x86_64_cs64, but those can be overridden with "-minterop64-32-cs32-name=<name>" and "-minterop64-32-cs64-name=<name>".  Wine uses "wine_32on64_cs32" and "wine_32on64_cs64".  Those variables are defined and initialized in libwine.
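
Concretely, with Wine's option settings, libwine contains something equivalent to:

/* Defined and initialized in libwine; the generated thunks reference
   them to know which code segment to switch to. */
unsigned short wine_32on64_cs32;   /* selector for the 32-bit code segment */
unsigned short wine_32on64_cs64;   /* selector for the 64-bit code segment */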


Speaking of code segments, the big thing Catalina provides that enables this all to work is the ability to create 32-bit code segments in a 64-bit process.  For that, Apple enabled the use of i386_set_ldt() in 64-bit processes.  The big caveat, though, is that this functionality is restricted by System Integrity Protection (SIP).  For now, your best bet to get this working for yourself is to disable SIP.  (CrossOver doesn't require that, but the mechanism by which we accomplish that is in flux internally to Apple.  When it settles down, I'll update this thread.)
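
For the curious, the mechanism itself looks roughly like the sketch below.  This is illustrative only (the descriptor encoding and LDT index are mine, not Wine's actual code), and as noted it will simply fail under SIP:

#include <architecture/i386/table.h>   /* union ldt_entry, i386_set_ldt() */
#include <stdio.h>
#include <string.h>

/* Install a ring-3, 32-bit code segment covering the low 4GB
   (base 0, limit 0xFFFFF pages, D=1 for 32-bit operand size)
   and return the corresponding LDT selector. */
static int install_cs32(int index)
{
    static const unsigned char desc_bytes[8] = {
        0xff, 0xff,             /* limit 15:0 */
        0x00, 0x00, 0x00,       /* base 23:0 */
        0xfa,                   /* present, DPL 3, code, readable */
        0xcf,                   /* G=1, D=1 (32-bit), limit 19:16 */
        0x00                    /* base 31:24 */
    };
    union ldt_entry entry;

    memcpy(&entry, desc_bytes, sizeof(entry));
    if (i386_set_ldt(index, &entry, 1) < 0)
    {
        perror("i386_set_ldt");    /* e.g. blocked by SIP */
        return -1;
    }
    return (index << 3) | 7;       /* LDT selector, RPL 3 */
}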


With all of that in place, the rest of the work was modifying Wine to use this compiler and OS functionality.  That largely consisted of fixing the compilation errors that arise where system libraries (using 64-bit pointers) interface with Win32 APIs (using 32-bit pointers).  Also, every place with an architecture dependency had to be reviewed and possibly altered.  That, of course, includes all assembly language code.


As long as this email is, I'm sure I've forgotten or glossed over some stuff.  Feel free to ask questions.

Cheers,
Ken



