Translation: Different file formats, why?

Postby Witch » Mon Nov 07, 2011 6:34 pm

When looking at the "first stage setup" code, where you work with .rc files and sometimes C header files, why do the files for different countries use different file formats?
When there are uncertainties I'm slowed down and not able to move forward. :cry:

I thought the standard for all translation files would be UTF-8, but the file format check below confuses me and makes me unsure of the facts :(

:~/svn/reactos/base/setup/usetup/lang$ file sv-SE.h
sv-SE.h: ISO-8859 C program text

:~/svn/reactos/base/setup/usetup/lang$ file en-US.h
en-US.h: ASCII English text

:~/svn/reactos/base/setup/usetup/lang$ file nn-NO.h
nn-NO.h: ASCII text

:~/svn/reactos/base/setup/usetup/lang$ file ru-RU.h
ru-RU.h: Non-ISO extended-ASCII English text, with LF, NEL line terminators

:~/svn/reactos/base/setup/usetup/lang$ file it-IT.h
it-IT.h: Non-ISO extended-ASCII text, with LF, NEL line terminators

:~/svn/reactos/base/setup/usetup/lang$
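`file` only guesses an encoding from the byte values it sees; it cannot name a DOS codepage. The Python sketch below is a simplified heuristic (not file(1)'s actual algorithm) that sorts bytes into pure ASCII, valid UTF-8, or "some legacy 8-bit codepage", using made-up sample strings rather than the real headers. It also suggests why ru-RU.h and it-IT.h are reported with "NEL line terminators": byte 0x85 is "à" in CP850 and "Е" in CP866, but the NEL control character in ISO-8859-1, so that message most likely just reflects ordinary letters in those codepages.

```python
# Simplified heuristic (NOT file(1)'s real algorithm) for bucketing the lang/*.h files.

def classify(data: bytes) -> str:
    """Return "ASCII", "UTF-8" or "legacy 8-bit" for a blob of raw bytes."""
    if all(b < 0x80 for b in data):
        return "ASCII"          # 7-bit only: the same bytes in ASCII and UTF-8
    try:
        data.decode("utf-8")    # strict decoding rejects invalid byte sequences
        return "UTF-8"
    except UnicodeDecodeError:
        return "legacy 8-bit"   # e.g. CP850 or CP866; the exact codepage cannot be inferred


if __name__ == "__main__":
    # Hypothetical sample strings, not the real contents of the ReactOS headers.
    samples = {
        "en-US.h": "Welcome to ReactOS Setup".encode("ascii"),
        "sv-SE.h": "Välkommen".encode("cp850"),
        "ru-RU.h": "Установка".encode("cp866"),
    }
    for name, blob in samples.items():
        print(name, classify(blob))

    # Byte 0x85 is a letter in both DOS codepages, but the C1 control NEL in ISO-8859-1.
    print("\u00e0".encode("cp850"), "\u0415".encode("cp866"))
```

Run as a script, this prints ASCII for the English sample, "legacy 8-bit" for the CP850 and CP866 samples, and b'\x85' twice.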

Witch
 
Posts: 293
Joined: Thu Jul 24, 2008 12:30 pm
Location: Stockholm, Sweden

Postby EmuandCo » Mon Nov 07, 2011 7:08 pm

AFAIK it has to be the DOS codepage for your language... no UTF-8.
EmuandCo
Developer
 
Posts: 2877
Joined: Sun Nov 28, 2004 7:52 pm
Location: Germany, Bavaria, Steinfeld

Postby Witch » Mon Nov 07, 2011 7:15 pm

So, I'm using Linux (English-US), gedit and a Bash terminal when I download and edit the ReactOS SVN files. Do I have to convert my (Swedish) rc files manually on Linux to make them compatible with the Windows/DOS standard?
I'm still confused.

Postby EmuandCo » Mon Nov 07, 2011 9:12 pm

The USetup files are NOT resource files. Resource files are the .rc files, not headers. Those are normally UTF-8 or the region's own CPxxx codepage.

Postby hto » Mon Nov 07, 2011 9:43 pm

The sv-SE.h file uses CP-850; this is how USetup is implemented. gedit can probably work with different encodings, but if not, convert the file to UTF-8 before editing.

Something like this:
iconv -f CP850 -t UTF8 < sv-SE.h > sv-SE.txt

or this:
recode CP850/..UTF8 < sv-SE.h > sv-SE.txt


Then convert back to CP-850:
iconv -f UTF8 -t CP850 < sv-SE.txt > sv-SE.h

or:
recode UTF8..CP850/ < sv-SE.txt > sv-SE.h
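For anyone without iconv or recode at hand, the same round trip can be sketched in Python. This is only an illustration, not part of the ReactOS tooling, and the helper names are hypothetical. Encoding is strict, so a character that CP850 cannot represent (such as "€") raises an error instead of being silently dropped, much as iconv stops with an error by default.

```python
from pathlib import Path


def recode_bytes(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode raw bytes from codec `src` to codec `dst` (strict: bad input raises)."""
    return data.decode(src).encode(dst)


def recode_file(src_path: str, dst_path: str, src: str, dst: str) -> None:
    """Rough equivalent of `iconv -f SRC -t DST < src_path > dst_path`."""
    # Working on bytes (not text mode) copies line endings unchanged.
    Path(dst_path).write_bytes(recode_bytes(Path(src_path).read_bytes(), src, dst))
```

Usage, mirroring the commands above: recode_file("sv-SE.h", "sv-SE.txt", "cp850", "utf-8"), edit sv-SE.txt in gedit, then recode_file("sv-SE.txt", "sv-SE.h", "utf-8", "cp850").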
hto
 
Posts: 2185
Joined: Sun Oct 01, 2006 3:43 pm

Postby igorko » Tue Nov 15, 2011 3:17 pm

Easy explanation:

usetup/lang/xx-XX.h should be in your local DOS (OEM) codepage, e.g. CP850 for Swedish. Just look up your codepage on the Internet and add it to gedit's list of character encodings.

All the rest of the files (xx-XX.rc) are resource files. They should use UTF-8 (without a BOM if your editor supports that). .rc files can also use your local ANSI codepage, but it is better to use UTF-8 (there will be fewer problems when working across different operating systems and IDEs), and just because UTF-8 ROCKS. :)

So if some rc files already exist for your locale and use ANSI, you can convert them to UTF-8 or leave them as they are. When converting, don't forget to move the #include for the file in rsrc.rc below the #pragma code_page(65001) line.
If you want to translate new rc files, take the English version, translate it and save the file in UTF-8 (without a BOM if supported).
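The ANSI-to-UTF-8 conversion described above could be scripted roughly like this. It is a hypothetical helper, assuming you know your Windows ANSI codepage (e.g. cp1252 for Swedish, cp1251 for Russian). It writes UTF-8 without a BOM, and leaves input that already decodes as UTF-8 unchanged apart from removing a BOM. It does not touch rsrc.rc, so moving the #include below #pragma code_page(65001) is still a manual step.

```python
import codecs
from pathlib import Path


def to_utf8_no_bom(data: bytes, ansi_codepage: str) -> bytes:
    """Return `data` as UTF-8 without a BOM.

    Assumption: input that already decodes as UTF-8 is UTF-8 (pure ASCII
    included); anything else is decoded with `ansi_codepage`, e.g. "cp1252".
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]   # already UTF-8: just drop the BOM
    try:
        data.decode("utf-8")
        return data                          # valid UTF-8 (or pure ASCII): keep as is
    except UnicodeDecodeError:
        return data.decode(ansi_codepage).encode("utf-8")


def convert_rc(path: str, ansi_codepage: str) -> None:
    """Rewrite one .rc file in place as UTF-8 without a BOM (hypothetical helper)."""
    p = Path(path)
    p.write_bytes(to_utf8_no_bom(p.read_bytes(), ansi_codepage))
```

The "already valid UTF-8" check is a heuristic: an ANSI file could in rare cases also happen to be valid UTF-8, so it is worth checking the result in an editor.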

As for the "DOS/Linux" standard (maybe you didn't mean to ask about this, but I'll answer anyway ;)), there is also a difference in EOLs (end-of-line characters). It doesn't matter which EOL you use, so you can save files on both Windows and Linux without any additional steps.
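If you want to see which line endings a file actually has (`file` also reports this, e.g. "with CRLF line terminators"), a small check like the one below counts them. It is only a sketch with a hypothetical helper name; a lone CR is counted separately because classic Mac OS files used it.

```python
from pathlib import Path


def count_eols(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                      # Windows/DOS style
        "LF": data.count(b"\n") - crlf,    # Unix/Linux style
        "CR": data.count(b"\r") - crlf,    # classic Mac style
    }


def eol_report(path: str) -> dict:
    """Read a file as bytes (no newline translation) and count its line endings."""
    return count_eols(Path(path).read_bytes())
```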
igorko
 
Posts: 145
Joined: Thu Jun 18, 2009 3:12 pm

Postby Witch » Tue Nov 15, 2011 11:55 pm

The most annoying thing about using gedit on Linux is that when I explicitly tell it to "save as" UTF8, it doesn't do what I tell it to do. :x

I've read a post on the Internet from somebody who had a similar question years ago. And it seems that if the text file doesn't contain any UTF8 characters then it will automatically save my files in ASCII format even though I tell it to explicitly "save as" UTF8.

Only when the text file does contain UTF8 characters will gedit save my file in UTF8 format. That is a little bit annoying, since I want consistency even when no UTF8 characters are present in my files.

Converting 100 ANSI or ASCII files to UTF8 with some Linux scripting would probably be a breeze if I wanted to go that way. But I'm just saying I don't like it when programs don't do what I tell them to do from the get-go. :)

Postby hto » Wed Nov 16, 2011 2:22 pm

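Witch wrote:
And it seems that if the text file doesn't contain any UTF8 characters then it will automatically save my files in ASCII format even though I tell it to explicitly "save as" UTF8.

That's how it should be. All ASCII texts are also UTF-8 texts, because ASCII is a subset of UTF-8: a file containing only ASCII characters, saved as UTF-8 without a BOM, is byte-for-byte identical to the same file saved as ASCII, so there is nothing in it that could mark it as UTF-8. :)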
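This can be checked directly. In the Python sketch below (the sample strings are made up), pure-ASCII text encodes to exactly the same bytes in ASCII and UTF-8, while non-ASCII letters become multi-byte sequences. The only thing that could make a pure-ASCII file "look" like UTF-8 is a byte order mark, which is exactly what the advice above says to leave out of .rc files.

```python
ascii_text = "STRINGTABLE\nBEGIN\nEND\n"   # hypothetical pure-ASCII content
swedish_text = "Välj språk\n"              # hypothetical Swedish string

# Pure ASCII: the UTF-8 bytes are exactly the ASCII bytes, so nothing in the
# file marks it as UTF-8 and tools like file(1) report it as ASCII.
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))

# Non-ASCII characters become multi-byte UTF-8 sequences (ä -> C3 A4, å -> C3 A5).
print(swedish_text.encode("utf-8"))

# Only a byte order mark (EF BB BF) would make a pure-ASCII file differ from ASCII.
print(ascii_text.encode("utf-8-sig")[:3])
```

This prints True, b'V\xc3\xa4lj spr\xc3\xa5k\n' and b'\xef\xbb\xbf'.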

Postby Witch » Fri Nov 18, 2011 11:34 pm

I see, thanks for the confirmation!