[ros-dev] Broken RtlMultiByteToUnicodeSize

Myria myriachan at cox.net
Thu Jul 20 07:43:03 CEST 2006


A Taiwanese user of ReactOS came into the IRC channel to report a bug.  If 
you set ReactOS's code page to 936, the function RtlMultiByteToUnicodeSize 
will crash during startup.

I can't code a fix for it myself, but I can describe how the algorithm 
should work:

- If the code page is not DBCS, don't bother, and just set *UnicodeSize to 
MbSize * sizeof(WCHAR).  This is already done.
- Begin counting with a length of 0.
- While MbSize is not zero:
-- Grab a byte and decrement MbSize.
-- Determine whether it is a DBCS lead byte for the code page.
-- If it is a lead byte:
--- If MbSize is now zero, increment length, set *UnicodeSize to your length 
* sizeof(WCHAR) and return STATUS_SUCCESS.  The broken half-character is 
counted.
--- Otherwise, decrement MbSize and increment your length.  Two DBCS bytes 
just became a single Unicode character.  We ignore the value of the second 
byte.
-- If it is not:
--- Increment length.
- Set *UnicodeSize to length * sizeof(WCHAR) and return STATUS_SUCCESS.
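
The steps above can be sketched in C roughly as follows.  This is only an 
illustration, not the actual ReactOS code: is_lead_byte_cp936 and 
dbcs_unicode_size are hypothetical names, uint16_t stands in for WCHAR, and 
the lead-byte test assumes code page 936, whose lead bytes are 0x81 to 0xFE.  
A real fix would consult the lead-byte table of the active code page instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the code page's lead-byte lookup.
 * Assumes code page 936, where lead bytes are 0x81..0xFE. */
static int is_lead_byte_cp936(uint8_t b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Sketch of the DBCS branch only: returns the size in bytes of the
 * UTF-16 text that mb[0..mb_size) would convert to.
 * uint16_t is used here in place of WCHAR. */
size_t dbcs_unicode_size(const uint8_t *mb, size_t mb_size)
{
    size_t length = 0;

    assert(mb != NULL || mb_size == 0);

    while (mb_size != 0) {
        uint8_t b = *mb++;   /* grab a byte */
        mb_size--;

        /* A lead byte swallows its trail byte, whatever its value.
         * If the input ends right after the lead byte, there is
         * nothing to swallow and the half-character is still counted. */
        if (is_lead_byte_cp936(b) && mb_size != 0) {
            mb++;
            mb_size--;
        }

        length++;            /* one Unicode character either way */
    }

    return length * sizeof(uint16_t);
}
```

With this sketch, the three bytes C4 E3 61 count as two characters (4 bytes), 
and a buffer that ends on a lone lead byte such as 61 C4 also counts as two 
characters.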

Is it possible for a DBCS character's mapping to be a UTF-16 surrogate?  If 
so, the routine becomes more complicated.

I personally think ReactOS should support UTF-8 as a default code page, but 
I doubt that others agree.  This function is one of the many that would have 
to change...

Melissa


