mersenneforum.org  

Old 2021-08-08, 17:01   #34
ixfd64
Bemusing Prompter
 
 
"Danny"
Dec 2002
California

11·13·17 Posts
Default

Updating Prime95's source code to support internationalization is probably the biggest challenge. However, this would make localization much easier in the long run.

I believe many programs use gettext for i18n and l10n. One potential issue is that gettext libraries use the GNU General Public License, which George is opposed to. However, NetBSD's implementation is said to be a good alternative: http://wiki.netbsd.org/projects/project/libintl
Old 2021-08-08, 20:25   #35
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

2·11·353 Posts
Default

Quote:
Originally Posted by ixfd64 View Post
Updating Prime95's source code to support internationalization is probably the biggest challenge. However, this would make localization much easier in the long run.

I believe many programs use gettext for i18n and l10n. One potential issue is that gettext libraries use the GNU General Public License, which George is opposed to. However, NetBSD's implementation is said to be a good alternative: http://wiki.netbsd.org/projects/project/libintl
Windows is a challenge. The menus and dialog boxes get their strings from a Windows resource file -- not very compatible with a Linux solution using gettext or a JSON file.

Linux using gettext libraries might be OK. The GNU license for libraries is OK as far as I can tell. We use the GMP library under the LGPL license.
Old 2021-08-08, 21:33   #36
M344587487
 
 
"Composite as Heck"
Oct 2017

2³×109 Posts
Default

Quote:
Originally Posted by kriesel View Post
...
"The ultimate introductory guide to software localization" advises to place each language's translation in a separate file, and to use a standard such as JSON for the file content's format, determined partly by support in the development environment, which IIRC is MS VS for prime95.
JSON makes sense for roll-your-own. Short of a library to magic away the complexity, the following seems about the simplest way it could be done that is also safe. I like it better than the function pointer method:
  • As per your localisation link have the variables generically defined in the JSON string with things like {count}, {iteration} etc
  • Define lprintf (and lsprintf etc) as variadic functions just like all the other printf functions, but instead of the format string you pass the id of the format string, to be found in whichever localisation file is loaded. Instead of just passing the variables by value, also pass the generic name used in the localisation file and the format string used to print that type
  • It's safe because the localisation file does not define how to print variables, the code does. Most gotchas remain out of the loc file, but validation is still needed (at least converting every % in the format string to %%; a % should only exist in the JSON when a literal % character should be printed)
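A minimal sketch of the % validation step from the last bullet, assuming the loaded format string is copied into a caller-supplied buffer before it goes anywhere near a printf-style function. The name escape_percent and its return convention are made up for illustration:

```c
#include <stddef.h>

/* Copy `in` to `out`, doubling every '%' so that a later printf-style call
 * prints it literally instead of treating it as a conversion.
 * Returns 0 on success, -1 if `out` (of `size` bytes) is too small. */
int escape_percent(const char *in, char *out, size_t size)
{
    size_t o = 0;

    for (; *in != '\0'; in++) {
        size_t need = (*in == '%') ? 2 : 1;

        /* Keep one byte free for the terminating NUL. */
        if (o + need >= size)
            return -1;
        if (*in == '%')
            out[o++] = '%';
        out[o++] = *in;
    }
    if (size == 0)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Feeding it "100% done" gives "100%% done", which a printf-style function then prints as "100% done".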
JSON example:
Code:
[
    {
        "id": "pet_stats",
        "format": "My pet is {name}, they're a {animal} and are {age} years old\n"
    },
    ...
]
Called like this:
Code:
int some_function(void)
{
    const char *name_of_pet = "Lassie";
    const char *type = "dog";
    int age = 42;

    //each triplet is generic name, format, value; the format has to come before
    //the value so lprintf knows which type to fetch with va_arg
    lprintf("pet_stats", "{name}", "%s", name_of_pet, "{age}", "%d", age, "{animal}", "%s", type);
    //triplets can be given to lprintf in any order, it's up to lprintf to get the ducks in a row
    //a real lprintf also needs a count or a sentinel to know where the triplets end
    //printf("My pet is %s, they're a %s and are %d years old\n", name_of_pet, type, age);//instead of this
    return 0;
}
Maybe it's also possible to make a macro that automatically generates the triplet for the lprintf functions given just the variable, so instead of having to explicitly write
Code:
lprintf("someid", "{age}", "%d", age, "{name}", "%s", name);
you could write

Code:
lprintf("someid", EXPAND(age), EXPAND(name));
You'd lose the ability to have the generic name be different than the variable name, might be a problem if the variable names are not descriptive.
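One way such a macro might be written, as a sketch only: C11's _Generic can pick the format from the variable's type and the preprocessor's # operator can build the generic name. This assumes a (name, format, value) triplet order, the same order the LI/LS macros later in this thread use. FMT_OF and describe_triplet are hypothetical names, and only a handful of types are covered:

```c
#include <stdarg.h>
#include <stdio.h>

/* Pick a printf conversion from the static type of the argument (C11). */
#define FMT_OF(v) _Generic((v), \
    int: "%d",                  \
    double: "%f",               \
    char *: "%s",               \
    const char *: "%s")

/* Expands to the triplet: generic name, format, value. */
#define EXPAND(v) "{" #v "}", FMT_OF(v), (v)

/* Hypothetical helper that consumes one triplet and writes "name=value".
 * It stands in for lprintf just to show what the macro expands to. */
static int describe_triplet(char *out, size_t size,
                            const char *name, const char *fmt, ...)
{
    va_list ap;
    int used = snprintf(out, size, "%s=", name);

    if (used < 0 || (size_t)used >= size)
        return -1;
    va_start(ap, fmt);
    int more = vsnprintf(out + used, size - (size_t)used, fmt, ap);
    va_end(ap);
    return more < 0 ? -1 : used + more;
}
```

With int age = 42, the call describe_triplet(buf, sizeof buf, EXPAND(age)) writes "{age}=42" into buf. A type missing from the _Generic list is a compile error, which is arguably the failure mode you want here.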
Old 2021-08-09, 15:30   #37
M344587487
 
 
"Composite as Heck"
Oct 2017

2³·109 Posts
Default

Here's a toy example of an external json + variadic localisation implementation pretty much as described in the previous post (sans json parsing and validation, this example manually generates the structs to match what the bundled json would produce because lazy). It has the following features:
  • The json is an array of objects, one json file per language
  • An object has a string "id" and a string "format"
  • A variable is defined in the json format string as a generic string starting with '{' and ending with '}'. This is what makes it safe: the code defines how a variable is printed, not the json
  • It is safe to use '{' and '}' just as characters, whenever a '{' is encountered it looks for a matching generic but if one doesn't exist it just prints out '{' as you'd expect
  • lprintf is used in place of printf in the code, similar functions could replace sprintf etc
  • Argument order agnostic in the json format string and lprintf, different languages can swap variables around if needed and the programmer doesn't need to worry about order just that they pass the correct variables
  • If a format string is not found in the preferred language it can look for it in an optional fallback language (aka english if english is not the preferred language)
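The fallback rule in the last bullet could look roughly like this. It is a sketch, not the code from the attachment: struct loc_entry, struct loc_table and loc_lookup are names invented here, and the tables are assumed to have already been filled from the parsed json:

```c
#include <stddef.h>
#include <string.h>

/* One parsed json object: {"id": ..., "format": ...}. */
struct loc_entry {
    const char *id;
    const char *format;
};

/* All entries loaded from one language file. */
struct loc_table {
    const struct loc_entry *entries;
    size_t count;
};

/* Linear search for `id` in one table; NULL table means "not loaded". */
static const char *find_in(const struct loc_table *t, const char *id)
{
    if (t == NULL)
        return NULL;
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].id, id) == 0)
            return t->entries[i].format;
    }
    return NULL;
}

/* Try the preferred language first, then the optional fallback.
 * Returns NULL for an unknown id, so the caller can print nothing. */
const char *loc_lookup(const struct loc_table *preferred,
                       const struct loc_table *fallback,
                       const char *id)
{
    const char *format = find_in(preferred, id);

    return format != NULL ? format : find_in(fallback, id);
}
```

Passing NULL as the fallback gives the "preferred with no fallback" behaviour.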
Code:
Example of localising via external json

English preferred with no fallback
Call lprintf with no optional variables to print the string from the json verbatim
The {animal} goes {noise}
Now called with variables properly set
The dog goes woof
The farmer could see 7 of their 9 sheep in the pen
order-agnostic variables in lprintf:
The farmer could see 7 of their 9 sheep in the pen
{pickle} is printed verbatim because there is no argument matching the generic string '{pickle}'
unknown id prints nothing

French preferred with english fallback
The french version is printed because it exists, also order-agnostic variables in format string
woof goes the french dog
The english version is printed because no french version exists
The farmer could see 7 of their 9 sheep in the pen
Supports advanced formatting by default as it just uses whatever the programmer supplies
 The farmer could see 00007 of their 9 sheep in the pen
No need for security sauce IMO (once properly implemented). The english translation could be baked into the executable with bin2c to ensure there's always a valid fallback if desired.
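For anyone who doesn't want to unpack the archive, here is a rough sketch of the substitution step. It is not the attached code. It assumes the format string has already been looked up, that arguments arrive as a count followed by (name, format, value) triplets as in the lprintf calls later in this thread, and that only int ("...d") and string ("...s") conversions are needed. lsnprintf_sketch, MAX_VARS and the 64-byte value limit are all made up for the example:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_VARS 8

struct rendered {
    const char *name;   /* generic name including braces, e.g. "{count}" */
    char text[64];      /* value already formatted by the code's own format */
};

/* Append `len` bytes of `s` to the NUL-terminated `out`, truncating if full. */
static void append(char *out, size_t size, const char *s, size_t len)
{
    size_t used = strlen(out);

    if (used + 1 >= size)
        return;
    if (len > size - 1 - used)
        len = size - 1 - used;
    memcpy(out + used, s, len);
    out[used + len] = '\0';
}

int lsnprintf_sketch(char *out, size_t size, const char *format, int nargs, ...)
{
    struct rendered vars[MAX_VARS];
    int nvars = 0;
    va_list ap;

    if (size == 0)
        return -1;
    out[0] = '\0';

    /* Pass 1: render every value using the format supplied by the caller. */
    va_start(ap, nargs);
    for (int i = 0; i + 2 < nargs && nvars < MAX_VARS; i += 3) {
        struct rendered *v = &vars[nvars];
        const char *name = va_arg(ap, const char *);
        const char *fmt = va_arg(ap, const char *);
        size_t flen = strlen(fmt);
        char conv = flen ? fmt[flen - 1] : '\0';

        v->name = name;
        if (conv == 'd')
            snprintf(v->text, sizeof v->text, fmt, va_arg(ap, int));
        else if (conv == 's')
            snprintf(v->text, sizeof v->text, fmt, va_arg(ap, const char *));
        else
            break;      /* unknown type: the remaining arguments are unreadable */
        nvars++;
    }
    va_end(ap);

    /* Pass 2: copy the format, swapping in values for known generics. */
    for (const char *p = format; *p != '\0'; ) {
        const char *close = (*p == '{') ? strchr(p, '}') : NULL;
        int matched = 0;

        if (close != NULL) {
            size_t len = (size_t)(close - p) + 1;
            for (int k = 0; k < nvars && !matched; k++) {
                if (strlen(vars[k].name) == len && memcmp(vars[k].name, p, len) == 0) {
                    append(out, size, vars[k].text, strlen(vars[k].text));
                    p = close + 1;
                    matched = 1;
                }
            }
        }
        if (!matched) {
            append(out, size, p, 1);    /* ordinary character or unmatched '{' */
            p++;
        }
    }
    return (int)strlen(out);
}
```

The json text is only ever copied, never handed to printf as a format, which is the whole safety argument. An unmatched generic like {pickle} falls through the ordinary-character path and comes out verbatim.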
Attached Files
File Type: xz lang.tar.xz (5.3 KB, 59 views)
Old 2021-08-09, 17:20   #38
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

2·3·1,697 Posts
Default

Quote:
Originally Posted by M344587487 View Post
Here's a toy example of an external json + variadic localisation implementation pretty much as described in the previous post (sans json parsing and validation, this example manually generates the structs to match what the bundled json would produce because lazy).
This is one of the reasons I love hanging out here. Nice work!
Old 2021-08-09, 19:26   #39
M344587487
 
 
"Composite as Heck"
Oct 2017

368₁₆ Posts
Default

Thanks, it's nothing really, just an excuse to brush up on variadics. If json is a good candidate for a localisation format (it's unicode, so there's no reason to think it isn't other than whatever the hell locale actually means), it shouldn't take much more than making the code more robust and wiring up json parsing to be an actual solution. Except for the tens of hours of braindeath that is manually and carefully sucking out the english into json and converting the code to lprintf's.

If you want a laugh look at these macros:
Code:
#define LI(A) "{"#A"}", "%d", A
#define LS(A) "{"#A"}", "%s", A
, they could be used to make these lprintf's equivalent
Code:
int count=7, total=9;
char *animal="sheep";

lprintf("animal_count", 3*3, LI(count), LI(total), LS(animal));
lprintf("animal_count", 3*3, "{count}", "%d", count, "{total}", "%d", total, "{animal}", "%s", animal);
, saving a chunk of time and typos if manually porting the code for well-named variables that don't need special formatting. Whether or not the macros crumble like tissue paper is another matter, there's normally a pole between me and macros so it's hard to tell how close to the head the gun is pointing ;)
Old 2021-08-10, 18:14   #40
M344587487
 
 
"Composite as Heck"
Oct 2017

2³×109 Posts
Default

I've refined it, cleaned up the code a bit and added JSON parsing so you can now edit the json to test it (feeding malformed json will definitely break things, valid input should work). Every language tested seems to work at least on my terminal, I doubt many if any modern terminals fail. cmd on windows probably fails but powershell probably works, it should compile pretty easily if anyone wants to test.

JSON is UTF-8 which means it can handle all unicode with only a few unicode control symbols having to be escaped (which we don't want anyway). "\uhhhh" is a valid escape sequence in JSON but it's unnecessary as utf-8 can handle any symbols we need directly, so \u handling has been omitted (all other backslash escaping should work).
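If \u handling were ever wanted, the missing piece is turning the four hex digits into UTF-8 bytes. A sketch for the basic multilingual plane only, assuming the hex digits have already been parsed into an integer (utf8_encode_bmp is a made-up name, and surrogate pairs are deliberately rejected rather than combined):

```c
#include <stddef.h>

/* Encode code point `cp` (U+0000..U+FFFF, excluding surrogates) as UTF-8.
 * Writes up to 3 bytes to `out` and returns how many, or 0 if `cp` is a
 * surrogate half or lies outside the BMP. */
size_t utf8_encode_bmp(unsigned cp, unsigned char out[3])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    out[0] = (unsigned char)(0xE0 | (cp >> 12));   /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

So "\u00e9" would become the two bytes C3 A9, the same bytes you get by typing é straight into a UTF-8 json file, which is why skipping \u loses nothing in practice.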
Attached Thumbnails
Name: langv2.png (238.4 KB)
Attached Files
File Type: xz langv2.tar.xz (12.6 KB, 62 views)
Old 2021-08-10, 18:22   #41
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

27C6₁₆ Posts
Default

Quote:
Originally Posted by M344587487 View Post
I've refined it...
Old 2021-08-11, 18:21   #42
M344587487
 
 
"Composite as Heck"
Oct 2017

368₁₆ Posts
Default

gettext looks good, with the drawback of not allowing for swapped arguments. It has the benefit of being a de-facto standard, and as it is the standard it makes me think that swapped arguments aren't such a big deal. It does require being compiled in (because you can't safely expose raw format strings to the user), which is a drawback given that prime95 isn't distributed repository-style (there is a repo mirror which is nice: https://github.com/shafferjohn/Prime95 ), but I don't think requiring compilation is a dealbreaker. What I'm not so keen on is the english format string being used as the id in all of the translation .po files. It's a bit of a maintenance nightmare (not that it matters too much with a stable codebase like Prime95), but there's probably tooling to help with that.
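To show what "the english format string is the id" means in code, a minimal sketch using the standard libintl calls (setlocale, bindtextdomain, textdomain, gettext). The text domain "prime95-demo", the catalog directory and the message itself are placeholders, not anything from Prime95:

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Conventional shorthand: the English string doubles as the lookup key. */
#define _(s) gettext(s)

/* Call once at startup. Domain name and directory are hypothetical. */
void init_i18n(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain("prime95-demo", "/usr/share/locale");
    textdomain("prime95-demo");
}

/* If no translation is found, gettext returns the English string itself. */
int format_progress(char *out, size_t size, int iteration, int total)
{
    return snprintf(out, size, _("Iteration %d of %d"), iteration, total);
}
```

Rewording the English text changes the key, so every .po file needs the matching entry updated, which is the maintenance worry above.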

From what I can tell .rc files can be used for compile-time localisation of the gui with some caveats (and potentially heavy refactorisation). Despite being ancient nonsense cooked up by the devil they can use UTF-8 encoding by using code page 65001 ( https://devblogs.microsoft.com/oldne...7-00/?p=102569 ) so there is potential, untested as I don't think the gui can be compiled without MSVC(?). Worth someone testing.

Unless there's some other way to localise the gui without .rc I think a split solution of multiple rc's and gettext/json is as good as it gets, short of doing something crazy.

edit: There is a way for gettext to have swapped arguments but it's complicated enough to be misused and may not be portable. IMO it should be added to the large pile of libc constructs not to be touched by anyone ever for any reason, but it's interesting that it exists:
Code:
       The arguments must correspond properly (after type promotion)
       with the conversion specifier. By default, the arguments are
       used in the order given, where each '*' (see Field width and
       Precision below) and each conversion specifier asks for the
       next argument (and it is an error if insufficiently many
       arguments are given). One can also specify explicitly which
       argument is taken, at each place where an argument is required,
       by writing "%m$" instead of '%' and "*m$" instead of '*', where
       the decimal integer m denotes the position in the argument list
       of the desired argument, indexed starting from 1. Thus,

           printf("%*d", width, num);

       and

           printf("%2$*1$d", width, num);

       are equivalent. The second style allows repeated references to
       the same argument. The C99 standard does not include the style
       using '$', which comes from the Single UNIX Specification. If
       the style using '$' is used, it must be used throughout for all
       conversions taking an argument and all width and precision
       arguments, but it may be mixed with "%%" formats, which do not
       consume an argument. There may be no gaps in the numbers of
       arguments specified using '$'; for example, if arguments 1 and
       3 are specified, argument 2 must also be specified somewhere in
       the format string.
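For illustration, this is how the '$' style would let a translation swap arguments. It is a sketch under the assumption of a POSIX libc (glibc for instance); as the man page says it is not ISO C, so it may not be available everywhere. FMT_EN, FMT_FR and format_noise are made-up names:

```c
#include <stdio.h>

/* Two hypothetical "translations" of the same message. Both refer to the
 * arguments by position, so the call site never changes. */
static const char *const FMT_EN = "The %1$s goes %2$s";
static const char *const FMT_FR = "%2$s goes the french %1$s";

/* Argument 1 is always the animal and argument 2 is always the noise. */
int format_noise(char *out, size_t size, const char *fmt,
                 const char *animal, const char *noise)
{
    return snprintf(out, size, fmt, animal, noise);
}
```

Calling format_noise with FMT_FR, "dog" and "woof" produces "woof goes the french dog". The catch is that the translator is now writing real printf format strings, so a bad translation file can crash the program, which is exactly what the {name} approach avoids.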

Last fiddled with by M344587487 on 2021-08-14 at 13:44
Old 2021-08-21, 03:35   #43
Happy5214
 
 
"Alexander"
Nov 2008
The Alamo City

1100001000₂ Posts
Default

You might want to see if you can get this on https://translatewiki.net/ or something similar and let them handle some of the translation work. I don't know if the "open-source but not free" nature of P95/mprime is an issue. They would get the format strings and translate them into any number of languages as a volunteer effort.

Edit: Don't let the home page fool you. It looks spammy, but the back-end pages (MediaWiki-style) are the same as from when I used to contribute years ago.

Last fiddled with by Happy5214 on 2021-08-21 at 03:37 Reason: Address possible concerns
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
