mersenneforum.org  

2021-08-07, 12:45   #23
kar_bon
Mar 2006
Germany
3²·5²·13 Posts

In my wiki I can offer every language by giving each one its own page.

For example, I've just created a page for the readme.txt file that simply contains the text in a pre block (not all of it is included yet).
The path of this page is simply
Quote:
Readme.txt
To add another language, a subpage named with the language code is used.
So the German page is found under
Quote:
Readme.txt/de
The first line of each of these pages is the template that shows all available languages and lets the reader switch easily between them.

2021-08-07, 16:23   #24
M344587487
"Composite as Heck"
Oct 2017
863 Posts

Personally I don't see the point of localising the program. I know that's an easy thing to say as a native English speaker, but adding technical debt for minimal benefit doesn't seem worthwhile. That said, like any good internet slacker, if it has to be done I have opinions on how to do it:

  • Enforce strict UTF-8 as the character encoding. It's the winner of the format war; all other encodings need not apply. Validation is simple, editing is natural, and at least in a Linux terminal display seems to just work after setting a UTF-8 locale (my terminal is UTF-8 by default, albeit en_GB; no idea what significance, if any, regional locales have for printing). Alternatively, ignore the locale and instead convert non-ASCII code points to "\uXXXX" format if that turns out to increase portability. Either way, whether it actually displays properly is largely out of your hands AFAIK; there's probably a cross-platform display library if it isn't trivially portable
  • +1 to ISO dates
  • It looks ugly in code, but as suggested, replacing individual words/phrases and static sentences could make sense. The one thing you do not want to do is expose entire printf format strings, replacement codes and all, to users. That would be a nightmare to sanitise and a trivial way to break the program
  • I'd be inclined to compile the translations into the binaries instead of exposing user-breakable ini files to users. Set up a git repo with the English reference, to which translators can submit pull requests adding translations; the validation could also be done at compile time
  • If the translations are compiled in, you could potentially allow translation of full printf format strings containing replacement codes, but the parameters would still need to be in the same order
  • Variable args in any order would require translators to define the entire printf command instead of just part or all of a format string. Doing that while allowing the language to be chosen at runtime probably requires function pointer wrappers for every printf command, unless there's magic to be done (macro or otherwise). Essentially the translation file creates a bank of output functions for a language, which can be switched to at runtime if that language is chosen
  • Alternatively, baking a single language in at compile time could be done easily by replacing all printfs with unique defines that are defined in the translation file. That is an easier format and implementation, but a binary per language is not ideal
  • Please no "British English" translation. I know American is a strange dialect, but I think we can muddle through
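
The first bullet can be made concrete with a minimal sketch, written in Python for brevity rather than in the program's own C. It shows strict validation of a translation file's bytes and the "\uXXXX" fallback; the function names are hypothetical, and the "\UXXXXXXXX" form for code points above U+FFFF is an assumption, since the bullet only mentions four hex digits.

```python
def is_strict_utf8(raw: bytes) -> bool:
    """Return True only if raw is well-formed UTF-8 (no invalid or overlong sequences)."""
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII code point with a backslash-u style escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # four hex digits, as in the bullet above
        else:
            out.append(f"\\U{cp:08X}")    # assumption: eight digits beyond U+FFFF
    return "".join(out)
```

A translation file that fails `is_strict_utf8` would simply be rejected, which is the whole validation step for the encoding.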

2021-08-07, 17:37   #25
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,807 Posts

Quote:
Originally Posted by M344587487
Personally I don't see the point of localising the program... That said like any good internet slacker if it has to be done I have opinions on how to do it

How someone else ought to do it, right? And you have plenty of company.
I think there are better uses of George's time, or that of other program authors, than generating x OS-specific times y localization-specific executables and zip files and uploading the lot, then redoing that at every version update or bug fix.
Suppose the localization were in a plain text file that included a line of special secret security sauce that was content-dependent.
If the file is called for by a localization line in local.txt and passes the security check, it gets loaded. If it fails the security check, the program falls back to the author's originally coded language, which is typically American English.
That would allow x OS-specific images to load any one of y-1 or y (depending on implementation) localization files at the user's choice, and would prevent end-user tampering.
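
A minimal sketch of that load-and-fall-back flow, in Python for brevity rather than the program's C. The "security sauce" is not specified above; here it is assumed to be an HMAC-SHA256 of the file body under a key built into the program. The key, the `check=` line, the key=value layout and the message names are all hypothetical.

```python
import hashlib
import hmac

SECRET = b"hypothetical-build-secret"           # assumption: a key compiled into the program
ENGLISH = {"greeting": "Hello", "done": "Done"}  # the author's built-in messages


def sign(body: str) -> str:
    """Content-dependent check value for the body of a localization file."""
    return hmac.new(SECRET, body.encode("utf-8"), hashlib.sha256).hexdigest()


def make_file(messages: dict) -> str:
    """What the maintainer would publish: a check line followed by key=value lines."""
    body = "".join(f"{key}={value}\n" for key, value in sorted(messages.items()))
    return f"check={sign(body)}\n{body}"


def load_messages(file_text: str) -> dict:
    """Return the translated messages, or the built-in English if the check fails."""
    header, _, body = file_text.partition("\n")
    if not header.startswith("check="):
        return dict(ENGLISH)
    claimed = header[len("check="):].encode("utf-8")
    if not hmac.compare_digest(claimed, sign(body).encode("ascii")):
        return dict(ENGLISH)
    messages = dict(ENGLISH)                     # untranslated keys stay English
    for line in body.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in ENGLISH:
            messages[key] = value
    return messages
```

A key that ships inside the executable can be extracted by a determined user, so a check like this guards against casual or accidental edits rather than a deliberate attack.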

Last fiddled with by kriesel on 2021-08-07 at 17:42

2021-08-07, 19:20   #26
M344587487
"Composite as Heck"
Oct 2017
863 Posts

Quote:
Originally Posted by kriesel
How someone else ought to do it, right? And you have plenty of company.

Of course. As I said, I don't think it's necessary, so I am unlikely to be inspired to implement it, but that doesn't stop me from reasoning out how it could be done.

Quote:
Originally Posted by kriesel
I think there are better uses of George's time, or that of other program authors, than generating x OS-specific times y localization-specific executables and zip files and uploading the lot, then redoing that at every version update or bug fix.
Suppose the localization were in a plain text file that included a line of special secret security sauce that was content-dependent.
If the file is called for by a localization line in local.txt and passes the security check, it gets loaded. If it fails the security check, the program falls back to the author's originally coded language, which is typically American English.
That would allow x OS-specific images to load any one of y-1 or y (depending on implementation) localization files at the user's choice, and would prevent end-user tampering.

Compiling localisations into the binary does not mean a binary per language; only one of the bullet points mentioned that, and not in a favourable way. The third-from-bottom bullet point is what I settled on as a good enough compromise between programmer-ease and user-ease. Once an English translation file is implemented, a translator would copy it and rename it from en to whatever (the functions/variables/array-pointers can be auto-mangled with __FILE__ if necessary), then edit only the s/printf lines within the functions to do the translation, and they are done. The wrapper function names and parameters remain untouched, but the translator can pass the parameters to the wrapped s/printf in any order as required. It would only take a few comments to teach a non-programmer how to edit the file; validation against translator error would be largely free thanks to the compiler, and a cursory glance can weed out anything malicious. Whenever a new language is ready, it just requires a few minor edits to the code to include the new translation file and register the language as an option (that could be automated too, but it's probably not worth the effort).

What you're suggesting could work, but it involves "special secret security sauce", special handling and a custom format for argument ordering. You end up with something far more complicated than just sucking all output into hot-swappable function pointers, which is all my suggestion boils down to.
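
A minimal sketch of the hot-swappable bank of output functions, in Python for brevity; in the real C program each entry would be a function pointer wrapping an s/printf call. The message text, the function names and the parameters are hypothetical.

```python
# One wrapper per message. The name and parameters are fixed; a translator
# edits only the format line and may use the parameters in any order.
def en_iteration(exponent: int, iteration: int, percent: float) -> str:
    return "Iteration %d of M%d (%.1f%% complete)" % (iteration, exponent, percent)


def de_iteration(exponent: int, iteration: int, percent: float) -> str:
    return "M%d: Iteration %d (%.1f%% fertig)" % (exponent, iteration, percent)


# Registering a new language is one more entry in this table.
LANGUAGES = {
    "en": {"iteration": en_iteration},
    "de": {"iteration": de_iteration},
}

_active = LANGUAGES["en"]


def set_language(code: str) -> None:
    """Switch the active bank at runtime; unknown codes fall back to English."""
    global _active
    _active = LANGUAGES.get(code, LANGUAGES["en"])


def msg(name: str, *args) -> str:
    """Call sites use this instead of formatting the message themselves."""
    return _active[name](*args)
```

After `set_language("de")`, the unchanged call `msg("iteration", 1234567, 500, 12.5)` produces the German text even though it uses the arguments in a different order from the English one.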

2021-08-07, 21:33   #27
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,807 Posts

So, if I understand you correctly, with function pointers the source code contains y languages times z small message-print functions, where z is the number of unique print formats in the non-localized program. Adding or correcting a translation for a language requires x recompiles, one for each OS. The program source gets modified to add or modify a language's localization, and it contains all message formats times all supported languages. Extending the program's features would require getting translations of any new phrases or messages for each of the y languages, then recompiling for x OSes with the y * delta z print functions added.

Conversely, the external localization data file approach does not require recompiling the executables to add or correct a language's translation. It amounts to pulling in at run time a few variable values, a validation string, and an associative array. The file probably should also have a human-readable indicator of which application version and build it is intended for. An end user could download, or place in a working directory, only the one or two localization data files wanted. Executables plus localization data files per user are likely to be smaller than with the function-pointers-and-all-languages-supported-in-executable approach. Translations could be updated independently, allowing program logic to be tested in beta while there are only placeholders or early machine-translated versions for newly added messages relating to new features.

If someone were to throw together a shell script or equivalent to output all occurrences of printf in the mprime source code files, count them, sort them, and eliminate duplicates, counting what remains, that could be eye-opening. I manually counted 27 "printf" (actually sprintf) in v30.5b2's 20KB cert.c alone, one of the smaller of 46 .c source files totaling nearly 10MB. There are also 24 .cpp files totaling another 190KB.

It will be interesting to see the choices made by George or whoever tackles adding localization to their programs. Thanks for your comments.

Something I don't recall being mentioned previously is multilanguage support on the mersenne.org web pages.
I do not plan to do any localization of the reference blog. It is quite large already in one language.

Last fiddled with by kriesel on 2021-08-07 at 21:57

2021-08-07, 22:19   #28
Uncwilly
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
10019₁₀ Posts

Localization does not need to require compilation for each new language. There aren't that many messages.
Have a single variable in local.ini, set at install time, that holds the preferred language. Have a single file that contains all the language-dependent items, in all the languages. On startup, Prime95 reads local.ini and then reads all the messages in that language from the language file. The language file can even list the language choices in its first section. As new languages are added, the language file can be updated with no need to change the code. There can be some minor bit shift or similar to prevent the average user from corrupting the language file by manual editing (because they won't understand what they are seeing).
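
A minimal sketch of that startup sequence, in Python for brevity. The ini-style layout of the language file, the `[local]` section and `language` variable in local.ini, and the message names are all assumptions for illustration; the bit-shift obfuscation is left out.

```python
import configparser

# Hypothetical language file: the first section lists the choices,
# then there is one section per language.
LANGUAGE_FILE = """\
[languages]
en = English
de = Deutsch

[en]
starting = Starting worker %d
stopped = Worker stopped

[de]
starting = Starte Arbeiter %d
"""


def load_messages(language_file_text: str, local_ini_text: str) -> dict:
    """Read the preferred language from local.ini, then that language's messages."""
    local = configparser.ConfigParser(interpolation=None)
    local.read_string(local_ini_text)
    wanted = local.get("local", "language", fallback="en").strip().lower()

    # interpolation=None keeps printf codes such as %d untouched
    catalog = configparser.ConfigParser(interpolation=None)
    catalog.read_string(language_file_text)
    if wanted not in catalog["languages"] or not catalog.has_section(wanted):
        wanted = "en"

    messages = dict(catalog["en"])    # English fills any untranslated message
    messages.update(catalog[wanted])
    return messages
```

With `language = de` in local.ini, `starting` comes back in German while the untranslated `stopped` stays English, so a partly finished translation is still usable.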

2021-08-07, 22:28   #29
M344587487
"Composite as Heck"
Oct 2017
863 Posts

There are at least a few hundred messages containing English, maybe up to 400 or so, spread over sprintf, fprintf and printf, excluding sqlite, json and three of the four architecture directories that have identical source. It's hard to tell precisely even with grep, as many calls are spread over multiple lines and many contain no English, so a simple grep won't tell the whole story. cout and snprintf don't appear to be a factor; I'm probably missing some others that are, though.

Code:
grep -r -P "[ \t][fs]{0,1}printf" ./

2021-08-08, 08:06   #30
S485122
"Jacob"
Sep 2006
Brussels, Belgium
5·349 Posts

In my opinion, the messages from the software do not need to be translated: most of the concepts are new to the users and need explanation even when translated. One would also need to translate the users' possible answers and their handling; mprime, for instance, has some dialogues that expect a text answer such as "y/n". The proposed localisation work would logically extend to the results as well. This in turn would mean the server routines should also be adapted.

On the other hand, and as some others have already said, the readme and undoc files would benefit from translation. This is a simple thing to do, even if it would take time and energy. Those "translations" could also be expanded to explain the concepts, the program messages and settings (even in English ;-)

As a side note, among the languages proposed I miss the USA's second language: Spanish. It has a user base equivalent to Hindi.

Last fiddled with by S485122 on 2021-08-08 at 08:10 Reason: fiddling after posting ;-(

2021-08-08, 08:15   #31
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
2²×3×11×83 Posts

Quote:
Originally Posted by S485122
In my opinion, the messages from the software do not need to be translated: most of the concepts are new to the users and need explanation even when translated. One would also need to translate the users' possible answers and their handling; mprime, for instance, has some dialogues that expect a text answer such as "y/n". The proposed localisation work would logically extend to the results as well. This in turn would mean the server routines should also be adapted.

On the other hand, and as some others have already said, the readme and undoc files would benefit from translation. This is a simple thing to do, even if it would take time and energy. Those "translations" could also be expanded to explain the concepts, the program messages and settings (even in English ;-)

As a side note, among the languages proposed I miss the USA's second language: Spanish. It has a user base equivalent to Hindi.

+1

2021-08-08, 14:48   #32
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
13257₈ Posts

Spanish is indeed widely used; #4 for total speakers, #2 for native speakers. And like English, there are regional differences in Spanish; fortunately less in written than spoken form. https://en.wikipedia.org/wiki/Spanis..._and_varieties Arabic is also common. https://www.k-international.com/blog/learn-a-language/

Number of people per language is not quite the metric we're looking for. (Total speakers is an available proxy for number of readers.)
Something like steepest increase of GIMPS participation or GHD/day gained per unit localization effort expended by the available volunteer pool might be closer to the mark, and very hard to estimate.

A mostly-computerized search I made in prime95 v30.5b2 source code could be summarized as follows.
Search all source files *.c or *.cpp for lines containing strings printf or cout, dump into file each occurrence.
Strip preceding indentation or conditionals or labels
Remove records that are only comments, or that produce JSON format results records, which won't be localized.
Sort the result of the above.
Note there was no cout found.
Remove duplicates to obtain a sorted list of unique printf statements.
Remove lines that appear to be irrelevant to localization considerations. This involves some judgment.
The result is ~482 distinct cases. That is probably a low figure, since the continuation lines of multiline printfs get missed by this method, yielding only "sprintf(buf," in several cases that probably differed on their following lines.
I think the actual number is ~490 ± ~10% unique print statements relevant to localization.
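
The mechanical part of the search above can be sketched as follows, in Python for brevity. This is not the procedure actually used, only an approximation of it: the regular expression, the comment test and the sample lines are assumptions, and the JSON and judgment-based filtering steps are left out.

```python
import re

# Any printf-family call (printf, sprintf, _tprintf, snprintf, ...) or a use of cout.
PRINT_CALL = re.compile(r"\b\w*printf\s*\(|\bcout\b")


def unique_print_statements(source_lines):
    """Strip indentation, skip comment-only lines, and return the sorted unique print lines."""
    found = set()
    for line in source_lines:
        stripped = line.strip()
        if stripped.startswith(("//", "/*", "*")):
            continue                  # crude test for comment-only lines
        if PRINT_CALL.search(stripped):
            found.add(stripped)       # the set removes duplicates
    return sorted(found)


# Hypothetical sample input; real use would read every *.c and *.cpp file.
SAMPLE = [
    '\tsprintf (buf, "Error %d\\n", rc);',
    '    sprintf (buf, "Error %d\\n", rc);',
    '// printf ("debug");',
    '_stprintf (buf, _T("Done"));',
    'snprintf (buf, sizeof (buf), "%s", name);',
    'x = y + 1;',
]
```

On the sample, the two identical sprintf lines collapse to one and the commented-out printf is skipped, leaving three statements. Like the manual method, this works line by line, so the continuation lines of multiline calls are still missed.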

Occurrence rate of a specific *printf line was seen to vary significantly; maximum was 12. Average is around 1.7, estimating by original file size to final file size comparison.

The grep line posted earlier, while very usefully concise, would have missed all the _tprintf (12) and _stprintf (1) occurring in the prime95 source if I read it correctly. Or in Gpuowl, would have missed snprintf, or vaprintf which IIRC was used in some versions.

"The ultimate introductory guide to software localization" advises to place each language's translation in a separate file, and to use a standard such as JSON for the file content's format, determined partly by support in the development environment, which IIRC is MS VS for prime95.

Keeping the executable size smaller will be an advantage for those who run code on small-ram devices such as Raspberry Pis, compute sticks, cell phones, ancient laptops, etc.

I assume localization would apply initially to some future version released. There may be some utility to also backfitting earlier versions later. Note the variety of version numbers versus OS on https://www.mersenne.org/download/, or the common use still of Gpuowl v6.11-380 or -364 or v7.2-53.

Last fiddled with by kriesel on 2021-08-08 at 15:00

2021-08-08, 15:21   #33
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
2²·3·11·83 Posts

Quote:
Originally Posted by kriesel
Spanish is indeed widely used; #4 for total speakers, #2 for native speakers. And like English, there are regional differences in Spanish; fortunately less in written than spoken form. https://en.wikipedia.org/wiki/Spanis..._and_varieties
Indeed. I am (slowly) learning Canario.

Last fiddled with by xilman on 2021-08-08 at 15:54

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.