Translation: Different file formats, why?

Posted: Mon Nov 07, 2011 6:34 pm
by Witch
When looking at the "first stage setup" code, where you work with rc-files and sometimes C header files, why do the different countries use different file formats?
When there are uncertainties I'm slowed down and not able to move forward. :cry:

I thought the standard for all translation files would be UTF-8, but the file format check below leaves me confused and uncertain about the facts :(
:~/svn/reactos/base/setup/usetup/lang$ file sv-SE.h
sv-SE.h: ISO-8859 C program text

:~/svn/reactos/base/setup/usetup/lang$ file en-US.h
en-US.h: ASCII English text

:~/svn/reactos/base/setup/usetup/lang$ file nn-NO.h
nn-NO.h: ASCII text

:~/svn/reactos/base/setup/usetup/lang$ file ru-RU.h
ru-RU.h: Non-ISO extended-ASCII English text, with LF, NEL line terminators

:~/svn/reactos/base/setup/usetup/lang$ file it-IT.h
it-IT.h: Non-ISO extended-ASCII text, with LF, NEL line terminators

:~/svn/reactos/base/setup/usetup/lang$

Re: Translation: Different file formats, why?

Posted: Mon Nov 07, 2011 7:08 pm
by EmuandCo
AFAIK it has to be the DOS codepage for your language... no UTF-8.

Re: Translation: Different file formats, why?

Posted: Mon Nov 07, 2011 7:15 pm
by Witch
So I'm using Linux (English-US), gedit and a Bash terminal when I download and edit the ReactOS svn files. Do I have to convert my (Swedish) rc-files manually on Linux to be compatible with the Windows DOS standard?
I'm still confused.

Re: Translation: Different file formats, why?

Posted: Mon Nov 07, 2011 9:12 pm
by EmuandCo
The usetup files are NOT resource files. RC files are .rc files, not headers. They are normally UTF-8 or the region's own CPXXX codepage.

Posted: Mon Nov 07, 2011 9:43 pm
by hto
The sv-SE.h file uses CP-850; this is how USetup is implemented. GEdit probably can work with different encodings, but if not, then convert the file to UTF-8 before editing.

Something like this:

iconv -f CP850 -t UTF8 < sv-SE.h > sv-SE.txt
or this:

recode CP850/..UTF8 < sv-SE.h > sv-SE.txt
Then back to CP-850:

iconv -t CP850 -f UTF8 < sv-SE.txt > sv-SE.h
or:

recode UTF8..CP850/ < sv-SE.txt > sv-SE.h
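
Before converting an edited file back, it may be worth checking that every character you typed actually exists in CP-850. A minimal sketch, assuming iconv, cmp and mktemp are available; survives_cp850 and the demo strings are made up for illustration and are not part of ReactOS:

```shell
#!/bin/sh
# survives_cp850 is a hypothetical helper name, not a ReactOS tool.
# It succeeds only if the UTF-8 file $1 is byte-identical after a trip
# through CP850 and back, i.e. every character in it exists in CP850.
survives_cp850() {
    tmp=$(mktemp) || return 1
    iconv -f UTF-8 -t CP850 < "$1" 2>/dev/null |
        iconv -f CP850 -t UTF-8 > "$tmp" 2>/dev/null
    cmp -s "$1" "$tmp"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Throwaway demo files, written as raw UTF-8 bytes (octal escapes).
good=$(mktemp)
bad=$(mktemp)
printf 'Forts\303\244tt\n' > "$good"       # a-umlaut: exists in CP850
printf 'Pris: 5 \342\202\254\n' > "$bad"   # euro sign: not in CP850

if survives_cp850 "$good"; then good_result=ok; else good_result=lossy; fi
if survives_cp850 "$bad"; then bad_result=ok; else bad_result=lossy; fi
echo "good: $good_result, bad: $bad_result"
rm -f "$good" "$bad"
```

Comparing the round-tripped bytes, rather than trusting iconv's exit status alone, is a deliberate choice here: iconv implementations differ in how they report characters they cannot convert.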

Re: Translation: Different file formats, why?

Posted: Tue Nov 15, 2011 3:17 pm
by igorko
Easy explanation:

usetup/lang/xx-XX.h should be in your language's DOS (OEM) codepage, for example CP850 for Swedish. Just look up that codepage on the Internet and add it to gedit's encoding list.

All the remaining files (xx-XX.rc) are resource files. They should use UTF-8 (without a BOM, if your editor supports that). .rc files can also use your local ANSI codepage, but UTF-8 is better (fewer cross-platform problems, for both the OS and the IDE), and just because UTF-8 ROCKS. :)

So if some rc files already exist for your locale and use ANSI, you can convert them to UTF-8 or leave them as is. When converting, don't forget to move the file's #include line in rsrc.rc below the #pragma code_page(65001) line.
If you want to translate new rc files, take the English version, translate it and save the file in UTF-8 (without a BOM, if supported).

As for the "DOS/Linux" standard (maybe you didn't mean to ask about this, but I will answer anyway ;)), there is also a difference in EOLs (end-of-line characters). It doesn't matter which EOL style you use, so you can save files on both Windows and Linux without any additional steps.

Re: Translation: Different file formats, why?

Posted: Tue Nov 15, 2011 11:55 pm
by Witch
The most annoying thing about using gedit on Linux is that when I explicitly tell it to "save as" in UTF8 format, it doesn't do what I tell it to do. :x

I've read a post on the Internet from somebody who had a similar question years ago. And it seems that if the text file doesn't contain any UTF8 characters then it will automatically save my files in ASCII format even though I tell it to explicitly "save as" UTF8.

Only when the text file does contain UTF8 characters will gedit save my file in UTF8 format. That is a little bit annoying since I want consistency even when no UTF8 characters are present in my files.

Converting 100 ANSI or ASCII files to UTF8 through some Linux scripting will probably be a breeze if I want to go for that solution. But I'm just saying I don't like it when a program doesn't do what I tell it to do from the get-go. :)
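
A rough sketch of what such a batch conversion could look like for .rc files, assuming iconv is available. The helper name convert_dir, the directory layout and the CP1252 source encoding are assumptions for illustration only; the real source codepage depends on the language, and the usetup headers are not meant to be converted this way:

```shell
#!/bin/sh
# convert_dir is a hypothetical helper, not a ReactOS tool.
# $1: source directory, $2: output directory,
# $3: source encoding (an assumption you must supply, e.g. CP1252)
convert_dir() {
    mkdir -p "$2" || return 1
    for f in "$1"/*.rc; do
        [ -e "$f" ] || continue        # the glob matched nothing
        out="$2/$(basename "$f")"
        if iconv -f "$3" -t UTF-8 < "$f" > "$out"; then
            echo "converted: $f"
        else
            echo "failed: $f" >&2
            rm -f "$out"               # do not keep a partial file
        fi
    done
}

# Demo on throwaway files; 0xE4 (octal 344) is a-umlaut in CP1252.
work=$(mktemp -d)
mkdir "$work/in"
printf 'CAPTION "Forts\344tt"\n' > "$work/in/sv-SE.rc"
printf 'CAPTION "Continue"\n' > "$work/in/en-US.rc"
convert_dir "$work/in" "$work/out" CP1252
# rm -rf "$work"   # clean up when done
```

Writing into a separate output directory keeps the originals untouched, so a wrong guess about the source encoding can be retried.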

Posted: Wed Nov 16, 2011 2:22 pm
by hto
Witch wrote: And it seems that if the text file doesn't contain any UTF8 characters then it will automatically save my files in ASCII format even though I tell it to explicitly "save as" UTF8.
That's how it should be. All ASCII texts are also UTF-8 texts, by virtue of the fact that ASCII is a subset of UTF-8. :)
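
The subset relationship can be checked by hand. A small sketch, assuming iconv, cmp and mktemp are available; the sample text is made up:

```shell
#!/bin/sh
# Convert a pure-ASCII file to UTF-8 and compare the bytes.
ascii_file=$(mktemp)
utf8_file=$(mktemp)
printf 'Plain ASCII text, no accents\n' > "$ascii_file"
iconv -f ASCII -t UTF-8 < "$ascii_file" > "$utf8_file"

# cmp -s is silent and succeeds only if both files are byte-identical.
if cmp -s "$ascii_file" "$utf8_file"; then same=yes; else same=no; fi
echo "identical bytes: $same"
rm -f "$ascii_file" "$utf8_file"
```

Since the converted file has exactly the same bytes, a tool like file has nothing to tell the two apart and reports plain ASCII.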

Re: Translation: Different file formats, why?

Posted: Fri Nov 18, 2011 11:34 pm
by Witch
I see, thanks for the confirmation!