UTF-8 encoding problems

Fraizeraust
Posts: 228
Joined: Thu Jan 05, 2017 11:46 am
Location: Italy

UTF-8 encoding problems

Post by Fraizeraust » Sat Dec 09, 2017 10:10 pm

I've been struggling to find the root cause of this problem, but I gave up...

Basically, I want to create a translation patch to update the current Italian translation RC file for RAPPS. I already know the Git basics, and I normally use git diff commitid1 commitid2 > mypatch.patch to create patches based on the latest commit, so that part shouldn't be a problem.

As per the ReactOS translation rules, the .RC file must be saved as UTF-8, so I did that. Creating the patch from the commits went fine, BUT when I open the patch file to see what's in it, some Italian characters are wrongly encoded.

Code:

diff --git a/base/applications/rapps/lang/it-IT.rc b/base/applications/rapps/lang/it-IT.rc
index 8832b1938a..6a4b7a5aa1 100644
--- a/base/applications/rapps/lang/it-IT.rc
+++ b/base/applications/rapps/lang/it-IT.rc
@@ -152,13 +152,13 @@ END
 STRINGTABLE
 BEGIN
     IDS_AINFO_VERSION "\nVersione: "
-    IDS_AINFO_AVAILABLEVERSION "\nAvailable Version: "
+    IDS_AINFO_AVAILABLEVERSION "\nVersione Disponibile: "
     IDS_AINFO_DESCRIPTION "\nDescrizione: "
     IDS_AINFO_SIZE "\nDimensione: "
     IDS_AINFO_URLSITE "\nHome Page: "
     IDS_AINFO_LICENSE "\nLicenza: "
     IDS_AINFO_URLDOWNLOAD "\nScaricare: "
-    IDS_AINFO_LANGUAGES "\nLanguages: "
+    IDS_AINFO_LANGUAGES "\nLingue: "
 END
 
 STRINGTABLE
@@ -187,7 +187,7 @@ BEGIN
     IDS_INSTALL "Installa"
     IDS_UNINSTALL "Rimuovi"
     IDS_MODIFY "Modifica"
-    IDS_APPS_COUNT "Numero applicazioni: %d; Selected: %d"
+    IDS_APPS_COUNT "Numero applicazioni: %d; Selezionate: %d"
     IDS_WELCOME_TITLE "Benvenuto!\n\n"
     IDS_WELCOME_TEXT "Scegliere una categoria a sinistra, poi scegliere una applicazione da installare o disinstallare.\nReactOS Web Site: "
     IDS_WELCOME_URL "http://www.reactos.org"
@@ -200,28 +200,28 @@ BEGIN
     IDS_APP_REG_REMOVE "Sicuro di voler cancellare dal registro i dati sui programmi installati?"
     IDS_INFORMATION "Informazioni"
     IDS_UNABLE_TO_DOWNLOAD "Impossibile scaricare il pacchetto! Indirizzo non trovato!"
-    IDS_UNABLE_TO_DOWNLOAD2 "Unable to download the package! Check Internet Connection!"
+    IDS_UNABLE_TO_DOWNLOAD2 "Impossibile scaricare il pacchetto! Controlla la connessione internet!"
     IDS_UNABLE_TO_REMOVE "Impossibile cancellare i dati dal registro!"
-    IDS_UNABLE_TO_INSTALL "Unable to open installer!"
+    IDS_UNABLE_TO_INSTALL "Impossibile aprire l'installer!"
     IDS_CERT_DOES_NOT_MATCH "Verifica del certificato SSL fallita."
-    IDS_INTEG_CHECK_TITLE "Verifica integrità pacchetto..."
-    IDS_INTEG_CHECK_FAIL "Il pacchetto non ha superato il controllo di integrità, potrebbe essere stato danneggiato o manomesso durante lo scaricamento. L'esecuzione del software non è raccomandata."
-    IDS_INTERRUPTED_DOWNLOAD "Lo scaricamento è stato interrotto. Verificare la connessione a Internet."
+    IDS_INTEG_CHECK_TITLE "Verifica integrità pacchetto..."
+    IDS_INTEG_CHECK_FAIL "Il pacchetto non ha superato il controllo di integrità, potrebbe essere stato danneggiato o manomesso durante lo scaricamento. L'esecuzione del software non Ú raccomandata."
+    IDS_INTERRUPTED_DOWNLOAD "Lo scaricamento Ú stato interrotto. Verificare la connessione a Internet."
     IDS_UNABLE_TO_WRITE "Impossibile scrivere su disco: lo spazio libero potrebbe essere esaurito."
-    IDS_SELECT_ALL "Select/Deselect All"
-    IDS_INSTALL_SELECTED "Install Selected"
+    IDS_SELECT_ALL "Seleziona/Deseleziona Tutte"
+    IDS_INSTALL_SELECTED "Installa le selezionate"
 END
 
 STRINGTABLE
 BEGIN
-    IDS_STATUS_INSTALLED "Installed"
-    IDS_STATUS_NOTINSTALLED "Not installed"
-    IDS_STATUS_DOWNLOADED "Downloaded"
-    IDS_STATUS_UPDATE_AVAILABLE "Update available"
-    IDS_STATUS_DOWNLOADING "Downloading…"
-    IDS_STATUS_INSTALLING "Installing…"
-    IDS_STATUS_WAITING "Waiting to install…"
-    IDS_STATUS_FINISHED "Finished"
+    IDS_STATUS_INSTALLED "Installato"
+    IDS_STATUS_NOTINSTALLED "Non Installato"
+    IDS_STATUS_DOWNLOADED "Scaricato"
+    IDS_STATUS_UPDATE_AVAILABLE "Aggiornamento Disponibile"
+    IDS_STATUS_DOWNLOADING "Scaricamento..."
+    IDS_STATUS_INSTALLING "Installazione..."
+    IDS_STATUS_WAITING "In attesa di installare..."
+    IDS_STATUS_FINISHED "Finito"
 END
 
 STRINGTABLE
@@ -233,16 +233,16 @@ END
 
 STRINGTABLE
 BEGIN
-    IDS_LANGUAGE_AVAILABLE_TRANSLATION "Supports your language"
-    IDS_LANGUAGE_NO_TRANSLATION "Supports other languages"
-    IDS_LANGUAGE_ENGLISH_TRANSLATION "Supports English"
-    IDS_LANGUAGE_SINGLE "Single language"
-    IDS_LANGUAGE_MORE_PLACEHOLDER " (+%d more)"
-    IDS_LANGUAGE_AVAILABLE_PLACEHOLDER " (+%d available)"
+    IDS_LANGUAGE_AVAILABLE_TRANSLATION "Supporta la tua lingua"
+    IDS_LANGUAGE_NO_TRANSLATION "Supporta le altre lingue"
+    IDS_LANGUAGE_ENGLISH_TRANSLATION "Supporta l'Inglese"
+    IDS_LANGUAGE_SINGLE "Lingua singola"
+    IDS_LANGUAGE_MORE_PLACEHOLDER " (+%d più)"
+    IDS_LANGUAGE_AVAILABLE_PLACEHOLDER " (+%d disponibile)"
 END
 
 STRINGTABLE
 BEGIN
-    IDS_DL_DIALOG_DB_DISP "Applications Database"
-    IDS_DL_DIALOG_DB_DOWNLOAD_DISP "Updating Database..."
+    IDS_DL_DIALOG_DB_DISP "Applicazioni Database"
+    IDS_DL_DIALOG_DB_DOWNLOAD_DISP "Aggiornamento Database..."
 END

If you look at the patch carefully, you can see that the special characters in some words have been mangled. For example, integrità became integritÃ (with a stray character after the Ã), and più was broken in the same way after creating the patch. This did not happen when committing to master.

The patch is really worthless right now because I cannot send it to JIRA without having this issue fixed. I really don't know how to tackle the problem, so I hope someone can help me... Thank you very much in advance!

val
Posts: 69
Joined: Fri Feb 10, 2017 5:22 am

Re: UTF-8 encoding problems

Post by val » Sat Dec 09, 2017 11:39 pm

The code point of the à symbol is U+00E0; in UTF-8 it is encoded as the two-byte sequence c3 a0.
The code point of the Ã symbol is U+00C3; in UTF-8 it is encoded as the two-byte sequence c3 83.

Looking at the hexadecimal representation of your example, you have c3 a0 in the right places and c3 83 20 in the wrong ones (where à should be), which are the lines added by you. The question is which text editor you used for this. Notepad, for example, does it right.
c3 83 20 is not a single 3-byte UTF-8 character, so you see two symbols instead: Ã (c3 83) and a space after it (20).

It may be your editor that produces this garbage, or some intermediate re-encoding step (git?). Try another editor.
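
To make the byte values above concrete, here is a small Python sketch. It assumes the damage comes from UTF-8 bytes being misread as Latin-1 and then saved as UTF-8 again; nothing in this thread confirms that this is what actually happened, it is just the usual way an Ã ends up in front of an accented letter. The variable names are only for illustration.

```python
# The correct character and the stray one seen in the patch.
a_grave = "\u00e0"   # à
a_tilde = "\u00c3"   # Ã

print(a_grave.encode("utf-8").hex(" "))  # c3 a0
print(a_tilde.encode("utf-8").hex(" "))  # c3 83

# Assumed failure mode: the UTF-8 bytes of à are misread as Latin-1,
# giving two characters (Ã and a no-break space, U+00A0) ...
misread = a_grave.encode("utf-8").decode("latin-1")

# ... which are then written out as UTF-8 a second time.
double_encoded = misread.encode("utf-8")
print(double_encoded.hex(" "))  # c3 83 c2 a0
```

Under this assumption the byte after c3 83 would be c2 a0 (a no-break space) rather than 20; the plain space in the pasted patch may simply be the forum or the clipboard replacing it.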

Fraizeraust
Posts: 228
Joined: Thu Jan 05, 2017 11:46 am
Location: Italy

Re: UTF-8 encoding problems

Post by Fraizeraust » Sun Dec 10, 2017 11:37 am

When I had Windows on my PC I used Sublime Text 3, MS Notepad and Notepad++. All of them broke the special Italian characters inside the patch file. At the moment I am running Ubuntu MATE and used the Pluma text editor to modify the file. Same problem. :(

Maybe vim (or Gvim) would be a great option? Does it fuck up the Italian chars?

hbelusca
Developer
Posts: 1128
Joined: Sat Dec 26, 2009 10:36 pm
Location: Zagreb, Croatia

Re: UTF-8 encoding problems

Post by hbelusca » Sun Dec 10, 2017 4:00 pm

Another question is whether, when applying this seemingly-broken patch onto the code, the Italian translations become ok again.
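
One rough way to check this without eyeballing the file is a few lines of Python. This is only a sketch, not part of any ReactOS tooling: the helper names undo_double_encoding and suspicious_lines are made up for illustration, and the check assumes the breakage is UTF-8 text that was misread as Latin-1 and saved as UTF-8 again.

```python
def undo_double_encoding(line: str) -> str:
    """Undo one layer of Latin-1 mojibake, or return the line unchanged."""
    try:
        # Correctly encoded accented text fails here, because its Latin-1
        # bytes are not valid UTF-8, so it is returned as-is.
        return line.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return line


def suspicious_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that look double-encoded."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on U+0085,
    # which can itself appear inside mojibake.
    for number, line in enumerate(text.split("\n"), start=1):
        if undo_double_encoding(line) != line:
            hits.append((number, line))
    return hits


# Line 1 is correct, line 2 contains "Ã" + U+00A0 where "à" should be.
sample = (
    'IDS_A "Verifica integrit\u00e0 pacchetto..."\n'
    'IDS_B "Verifica integrit\u00c3\u00a0 pacchetto..."\n'
)
for number, line in suspicious_lines(sample):
    print(number, repr(line), "->", repr(undo_double_encoding(line)))
```

To run it on the real file after applying the patch, read it with open(path, encoding="utf-8") and pass the text to suspicious_lines. A line that mixes correct and broken characters is not detected by this check, so an empty result is a good sign but not proof.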

Fraizeraust
Posts: 228
Joined: Thu Jan 05, 2017 11:46 am
Location: Italy

Re: UTF-8 encoding problems

Post by Fraizeraust » Sun Dec 10, 2017 4:07 pm

Out of curiosity I tried vim with the settings below, and everything seems fine! Thank you all for the help.

Code:

set encoding=utf-8
set fileencoding=utf-8

Finally!
