mersenneforum.org Language localization for GIMPS software.

2021-08-08, 17:01   #34
ixfd64
Bemusing Prompter

"Danny"
Dec 2002
California

11×13×17 Posts

Updating Prime95's source code to support internationalization is probably the biggest challenge. However, this would make localization much easier in the long run. I believe many programs use gettext for i18n and l10n. One potential issue is that gettext libraries use the GNU General Public License, which George is opposed to. However, NetBSD's implementation is said to be a good alternative: http://wiki.netbsd.org/projects/project/libintl

2021-08-08, 20:25   #35
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1111001011110₂ Posts

Quote:
 Originally Posted by ixfd64 Updating Prime95's source code to support internationalization is probably the biggest challenge. However, this would make localization much easier in the long run. I believe many programs use gettext for i18n and l10n. One potential issue is that gettext libraries use the GNU General Public License, which George is opposed to. However, NetBSD's implementation is said to be a good alternative: http://wiki.netbsd.org/projects/project/libintl
Windows is a challenge. The menus and dialog boxes get their strings from a Windows resource file -- not very compatible with a Linux solution using gettext or a JSON file.

Linux using gettext libraries might be OK. The GNU license for libraries is OK as far as I can tell. We use the GMP library under the LGPL license.

2021-08-08, 21:33   #36
M344587487

"Composite as Heck"
Oct 2017

2·19·23 Posts

Quote:
 Originally Posted by kriesel ... "The ultimate introductory guide to software localization" advises to place each language's translation in a separate file, and to use a standard such as JSON for the file content's format, determined partly by support in the development environment, which IIRC is MS VS for prime95.
JSON makes sense for roll-your-own. Short of a library to magic away the complexity the following seems about the simplest way it could be done that is also safe. I like it better than the function pointer method:
• As per your localisation link have the variables generically defined in the JSON string with things like {count}, {iteration} etc
• Define lprintf (and lsprintf etc) as variadic functions just like all the other printf functions, but instead of the format string you have an id of the format string to be found in whichever localisation file is loaded. Instead of just passing the variables by value also pass the generic name used in the localisation file and the string used to print that type
• It's safe because the code, not the localisation file, defines how variables are printed. Most gotchas stay out of the loc file, but validation is still needed (at least converting every % in the format string to %%; a % should only appear in the JSON when a literal % character is to be printed)
JSON example:
Code:
[
{
"id": "pet_stats",
"format": "My pet is {name}, they're a {animal} and are {age} years old\n"
},
...
]
Called like this:
Code:
int some_function(void)
{
    const char *name_of_pet = "Lassie";
    const char *type = "dog";
    int age = 42;
    /* each variable is passed as a triplet: value, generic name from the JSON, printf conversion */
    lprintf("pet_stats", name_of_pet, "{name}", "%s", age, "{age}", "%d", type, "{animal}", "%s");
    //triplets can be given to lprintf in any order, it's up to lprintf to get the ducks in a row
    //printf("My pet is %s, they're a %s and are %d years old\n", name_of_pet, type, age);//instead of this
    return 0;
}
Maybe it's possible to also make a macro that automatically generates the triplet for the lprintf functions given just the variable, so instead of having to explicitly write
Code:
lprintf("someid", age, "{age}", "%d", name, "{name}", "%s");
you could write

Code:
lprintf("someid", EXPAND(age), EXPAND(name));
You'd lose the ability to have the generic name be different than the variable name, might be a problem if the variable names are not descriptive.

2021-08-09, 15:30   #37
M344587487

"Composite as Heck"
Oct 2017

2×19×23 Posts

Here's a toy example of an external json + variadic localisation implementation pretty much as described in the previous post (sans json parsing and validation, this example manually generates the structs to match what the bundled json would produce because lazy). It has the following features:
• The json is an array of objects, one json file per language
• An object has a string "id" and a string "format"
• A variable is defined in the json format string as a generic string starting with '{' and ending with '}'. This is safe because the code, not the json, defines how a variable is printed
• It is safe to use '{' and '}' just as characters, whenever a '{' is encountered it looks for a matching generic but if one doesn't exist it just prints out '{' as you'd expect
• lprintf is used in place of printf in the code, similar functions could replace sprintf etc
• Argument order agnostic in the json format string and lprintf, different languages can swap variables around if needed and the programmer doesn't need to worry about order just that they pass the correct variables
• If a format string is not found in the preferred language it can look for it in an optional fallback language (e.g. English, if English is not the preferred language)
Code:
Example of localising via external json

English preferred with no fallback
Call lprintf with no optional variables to print the string from the json verbatim
The {animal} goes {noise}
Now called with variables properly set
The dog goes woof
The farmer could see 7 of their 9 sheep in the pen
order-agnostic variables in lprintf:
The farmer could see 7 of their 9 sheep in the pen
{pickle} is printed verbatim because there is no argument matching the generic string '{pickle}'
unknown id prints nothing

French preferred with english fallback
The french version is printed because it exists, also order-agnostic variables in format string
woof goes the french dog
The english version is printed because no french version exists
The farmer could see 7 of their 9 sheep in the pen
Supports advanced formatting by default as it just uses whatever the programmer supplies
The farmer could see 00007 of their 9 sheep in the pen
No need for security sauce IMO (once properly implemented). The english translation could be baked into the executable with bin2c to ensure there's always a valid fallback if desired.
Attached Files
 lang.tar.xz (5.3 KB, 59 views)

2021-08-09, 17:20   #38
chalsall
If I May

"Chris Halsall"
Sep 2002

61·167 Posts

Quote:
 Originally Posted by M344587487 Here's a toy example of an external json + variadic localisation implementation pretty much as described in the previous post (sans json parsing and validation, this example manually generates the structs to match what the bundled json would produce because lazy).
This is one of the reasons I love hanging out here. Nice work!

2021-08-09, 19:26   #39
M344587487

"Composite as Heck"
Oct 2017

36A₁₆ Posts

Thanks, it's nothing really just an excuse to brush up on variadics. If json is a good candidate for a localisation format (unicode so no reason other than whatever the hell locale actually means to think it isn't) it shouldn't take much more than making the code more robust and wiring up json parsing to be an actual solution. Except for the tens of hours of braindeath that is manually and carefully sucking out the english into json and converting the code to lprintf's. If you want a laugh look at these macros:
Code:
#define LI(A) "{"#A"}", "%d", A
#define LS(A) "{"#A"}", "%s", A
They could be used to make these lprintf's equivalent:
Code:
int count=7, total=9;
char *animal="sheep";
lprintf("animal_count", 3*3, LI(count), LI(total), LS(animal));
lprintf("animal_count", 3*3, "{count}", "%d", count, "{total}", "%d", total, "{animal}", "%s", animal);
That saves a chunk of time and typos if manually porting the code for well-named variables that don't need special formatting. Whether or not the macros crumble like tissue paper is another matter, there's normally a pole between me and macros so it's hard to tell how close to the head the gun is pointing ;)

2021-08-10, 18:14   #40
M344587487

"Composite as Heck"
Oct 2017

2·19·23 Posts

I've refined it, cleaned up the code a bit and added JSON parsing so you can now edit the json to test it (feeding malformed json will definitely break things, valid input should work). Every language tested seems to work, at least on my terminal, and I doubt many if any modern terminals fail. cmd on Windows probably fails but PowerShell probably works. It should compile pretty easily if anyone wants to test.

JSON is UTF-8 which means it can handle all unicode with only a few unicode control symbols having to be escaped (which we don't want anyway). "\uhhhh" is a valid escape sequence in JSON but it's unnecessary as utf-8 can handle any symbols we need directly, so \u handling has been omitted (all other backslash escaping should work).
Attached Files
 langv2.tar.xz (12.6 KB, 62 views)

2021-08-10, 18:22   #41
chalsall
If I May

"Chris Halsall"
Sep 2002

23713₈ Posts

Quote:
 Originally Posted by M344587487 I've refined it...

2021-08-11, 18:21   #42
M344587487

"Composite as Heck"
Oct 2017

1552₈ Posts

gettext looks good with the drawback of not allowing for swapped arguments. It has the benefit of being a de-facto standard, and as it is the standard it makes me think that swapped arguments aren't such a big deal. It does require being compiled in (because you can't safely expose raw format strings to the user) which is a drawback given that prime95 isn't distributed repository-style (there is a repo mirror which is nice: https://github.com/shafferjohn/Prime95 ), but I don't think requiring compilation is a dealbreaker. What I'm not so keen on is the english format string being used as the id in all of the translation .pot files, it's a bit of a maintenance nightmare (not that it matters too much with a stable codebase like Prime95) but there's probably tooling to help with that.

From what I can tell .rc files can be used for compile-time localisation of the gui with some caveats (and potentially heavy refactorisation). Despite being ancient nonsense cooked up by the devil they can use UTF-8 encoding by using code page 65001 ( https://devblogs.microsoft.com/oldne...7-00/?p=102569 ) so there is potential, untested as I don't think the gui can be compiled without MSVC(?). Worth someone testing. Unless there's some other way to localise the gui without .rc I think a split solution of multiple rc's and gettext/json is as good as it gets, short of doing something crazy.

edit: There is a way for gettext to have swapped arguments but it's complicated enough to be misused and may not be portable. IMO it should be added to the large pile of libc constructs not to be touched by anyone ever for any reason, but it's interesting that it exists:
Code:
The arguments must correspond properly (after type promotion) with the
conversion specifier. By default, the arguments are used in the order
given, where each '*' (see Field width and Precision below) and each
conversion specifier asks for the next argument (and it is an error if
insufficiently many arguments are given). One can also specify
explicitly which argument is taken, at each place where an argument is
required, by writing "%m$" instead of '%' and "*m$" instead of '*',
where the decimal integer m denotes the position in the argument list
of the desired argument, indexed starting from 1. Thus,

    printf("%*d", width, num);

and

    printf("%2$*1$d", width, num);

are equivalent. The second style allows repeated references to the same
argument. The C99 standard does not include the style using '$', which
comes from the Single UNIX Specification. If the style using '$' is
used, it must be used throughout for all conversions taking an argument
and all width and precision arguments, but it may be mixed with "%%"
formats, which do not consume an argument. There may be no gaps in the
numbers of arguments specified using '$'; for example, if arguments 1
and 3 are specified, argument 2 must also be specified somewhere in the
format string.
Last fiddled with by M344587487 on 2021-08-14 at 13:44

2021-08-21, 03:35   #43
Happy5214

"Alexander"
Nov 2008
The Alamo City

308₁₆ Posts

You might want to see if you can get this on https://translatewiki.net/ or something similar and let them handle some of the translation work. I don't know if the "open-source but not free" nature of P95/mprime is an issue. They would get the format strings and translate them into any number of languages as a volunteer effort.

Edit: Don't let the home page fool you. It looks spammy, but the back-end pages (MediaWiki-style) are the same as from when I used to contribute years ago.
Last fiddled with by Happy5214 on 2021-08-21 at 03:37
Reason: Address possible concerns


All times are UTC.