 mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Programming (https://www.mersenneforum.org/forumdisplay.php?f=29)

 paulunderwood 2015-08-29 22:25

GWNUM program

This is another call for a simple 50ish line program to be written with GWNUM to implement Tony Reix's Digraph for 2^n-3:

[CODE]for(n=4, 10000, N=2^n-3; S0=8; x=S0; for(i=1, n-1, x=Mod(x^2-2,N)); if(x==S0, print1(n,", ")))[/CODE]

To reiterate, the code should look something like:
[LIST][*]include GWNUM[*]read some parameter(s) from a file or a decimal string[*]initialize a few things[*]loop over bits including mults, squarings, additions etc. including (special) mod reductions[*]test a condition[*]report result to screen and file[/LIST]
The reason is to write a small program to illustrate the use of GWNUM. I know we could whittle down the LLR source code but I though some one could just [I]write[/I] a program. :smile:

 henryzz 2015-08-31 08:39

There are two possibilities. A pfgw script could do what you want. Another option is talking to Jean about adding it to llr.
Both would be much easier than a new program.

 danaj 2015-08-31 12:44

[QUOTE=henryzz;409251]There are two possibilities. A pfgw script could do what you want. Another option is talking to Jean about adding it to llr.
Both would be much easier than a new program.[/QUOTE]

From my position, the specific program Paul asked for was an *example*. The point isn't to solve the Reix test, but to illustrate writing a very simple gwnum program. It's "Hello World" so someone familiar with GMP could get a good idea what to do.

I, for instance, am more interested in doing a BPSW test, then adding a nextprime / prevprime framework. I'm not interested in calling the PFGW executable for each value (Niceky's cglp4 does this), and I don't think a PFGW script will do this properly.

I admit there is a "Dear Lazyweb" prefix to this request.

 paulunderwood 2015-09-07 05:29

I have hacked it! To get it running I needed to "make" llr64 in order to get gwnum.a in place. The following script is put in the root directory of the LLR source tree and compiled with:

[CODE]gcc s2.c gwnum/gwnum.a gwnum/gwnum.ld -lm -lpthread -lstdc++[/CODE]

s2.c:
[CODE]#include <stdio.h>
#include <string.h>
#include "gwnum/giants.h"
#include "gwnum/gwnum.h"
#include "gwnum/gwcommon.h"

FILE *fd;
char string;
int i, a = 0; // We want jacobi(a^2-4,n)==-1
unsigned long int LEN;
unsigned long int para_a = {0}, para_b = {2}; // n == 3 (mod 4), b = a+2
unsigned long int init_s = {1}, init_t = {2};
giant n, p, s, t, z; // p = n+1
gwnum wn, wp, ws, wt, wx, wz; // wx, wz temps
gwhandle *gwdata;

int main ( int argc, char *argv[] ) {
fd = fopen ( argv, "r" );
fscanf ( fd, "%s", string );
fclose ( fd );

LEN = strlen ( string ); LEN = ( LEN >> 2 ) + 8;
n = newgiant ( LEN ); p = newgiant ( LEN ); s = newgiant ( LEN );
t = newgiant ( LEN ); z = newgiant ( LEN );

gwdata = (gwhandle*) malloc ( sizeof ( gwhandle ) );
gwinit (gwdata);
ctog ( string, n );
gwsetup_general_mod_giant( gwdata, n );
wn = gwalloc ( gwdata ); wp = gwalloc ( gwdata ); ws = gwalloc ( gwdata );
wt = gwalloc ( gwdata ); wx = gwalloc ( gwdata ); wz = gwalloc ( gwdata );

itog ( 1, p ); addg ( n, p );
LEN = bitlen ( p );
binary64togw ( gwdata, init_s, 1, ws );
binary64togw ( gwdata, init_t, 1, wt );
for ( i = LEN-2; i >= 0; i-- ) {
binary64togw ( gwdata, para_a, 1, wz );
gwsafemul ( gwdata, ws, wz );
gwaddquick ( gwdata, wt, wz );
gwaddquick ( gwdata, wt, wz );
gwsafemul ( gwdata, ws, wz );
gwsub3quick ( gwdata, wt, ws, wx );
gwaddquick ( gwdata, ws, wt );
gwsafemul ( gwdata, wx, wt );
gwcopy ( gwdata, wz, ws );
if ( bitval ( p, i ) ) {
binary64togw ( gwdata, para_b, 1, wz );
gwsafemul ( gwdata, ws, wz );
gwaddquick ( gwdata, wt, wz );
gwaddquick ( gwdata, wt, wt );
gwsubquick ( gwdata, ws, wt );
gwcopy ( gwdata, wz, ws );
}
}
itog ( 2*a+5, z );
gwtogiant ( gwdata, wt, t );
subg ( z, t );
if ( gwiszero ( gwdata, ws ) && isZero ( t ) )
printf ( "Likely prime!\n" );
else
printf ( "Not prime :(\n" );
}
//gwtogiant( gwdata, ws, s );
//gtoc( s, string, 100 );
//printf( "%s\n", string );
//gcc s2.c gwnum/gwnum.a gwnum/gwnum.ld -lm -lpthread -lstdc++
//http://www.mersenneforum.org/showpost.php?p=388342&postcount=1[/CODE]

Any ideas on how to speed up what is already a fast script -- thanks to GW and JP and the late RC -- will be welcome.

At the moment the program requires a number (==3 (mod 4)) to be in a file. Will I need to write my own jacobi() function?

 Prime95 2015-09-07 13:51

[QUOTE=paulunderwood;409763]
Any ideas on how to speed up what is already a fast script [/QUOTE]

gwset_general_mod is the slowest way to use gwnum. gwnum does better testing numbers of the form k*b^n+c. My understanding is you are testing numbers where k=1, b=2, c=-3.

The last in a series of gwadds or gwsubs should not be of the "quick" variety.

Pull the binarytogw out of the loop, only do it once. Consider pre-FFTing the value (with gwfft) then use gwfftmul instead of gwsafemul.

The last gwcopy to wz from ws seems pointless.

 paulunderwood 2015-09-07 14:04

[QUOTE=Prime95;409781]gwset_general_mod is the slowest way to use gwnum. gwnum does better testing numbers of the form k*b^n+c. My understanding is you are testing numbers where k=1, b=2, c=-3.
[/quote]

Sorry for the confusion; This program implements my 2 selfridge combined Fermat PRP and Lucas PRP test, not Tony's idea. It is meant to be illustrative of what can be done with gwnum.

[quote]
The last in a series of gwadds or gwsubs should not be of the "quick" variety.
[/quote]

As far as I can see, since the final bit of p=n+1 is zero, the if is not entered into and so the mod is handled by the multiplication functions.

[quote]
Pull the binarytogw out of the loop, only do it once. Consider pre-FFTing the value (with gwfft) then use gwfftmul instead of gwsafemul.
[/quote]

[quote]
The last gwcopy to wz from ws seems pointless.[/QUOTE]

I think it is needed because I am altering ws and wt in each loop. However, with some clever trickery I might be able to eliminate it. :smile:

 Prime95 2015-09-07 19:28

[QUOTE=paulunderwood;409784]As far as I can see, since the final bit of p=n+1 is zero, the if is not entered into and so the mod is handled by the multiplication functions.[/QUOTE]

If you do an addquick or subquick before a gwmul (or gwsafemul, gwsquare, etc.) then you may get a roundoff error in the multiplication.

 paulunderwood 2015-09-08 06:05

[QUOTE=Prime95;409812]If you do an addquick or subquick before a gwmul (or gwsafemul, gwsquare, etc.) then you may get a roundoff error in the multiplication.[/QUOTE]

I now see what you wrote. Thanks for being patient. I have not yet used gwmul() etc.

I found gwiszero() to be buggy.

The new code for s2.c is:
[CODE]#include <stdio.h>
#include <string.h>
#include "gwnum/giants.h"
#include "gwnum/gwnum.h"
#include "gwnum/gwcommon.h"

FILE *fd;
char string;
float a = 0.0, b = 2.0; // We want minimum a: jacobi(a^2-4,n)==-1, b = a+2
int i, LEN;
unsigned init_s = {1}, init_t = {2};
giant n, p, s, t, z; // p = n+1
gwnum ws, wt, wx, wz; // wx, wz temps
gwhandle *gwdata;

int main ( int argc, char *argv[] ) {
fd = fopen ( argv, "r" );
fscanf ( fd, "%s", string );
fclose ( fd );

LEN = strlen ( string ); LEN = ( LEN >> 2 ) + 8;
n = newgiant ( LEN ); p = newgiant ( LEN );
s = newgiant ( LEN ); t = newgiant ( LEN ); z = newgiant ( LEN );

gwdata = (gwhandle*) malloc (sizeof ( gwhandle ) );
gwinit ( gwdata );
ctog ( string, n );
gwsetup_general_mod_giant( gwdata, n );
ws = gwalloc ( gwdata ); wt = gwalloc ( gwdata );
wx = gwalloc ( gwdata ); wz = gwalloc ( gwdata );

binarytogw ( gwdata, init_s, 1, ws );
binarytogw ( gwdata, init_t, 1, wt );
itog ( 1, p ); addg ( n, p );
LEN = bitlen ( p );
for ( i = LEN-2; i >= 0; i-- ) {
gwcopy ( gwdata, ws, wz );
gwsmallmul ( gwdata, a, wz );
gwaddquick ( gwdata, wt, wz );
gwadd ( gwdata, wt, wz );
gwsafemul ( gwdata, ws, wz );
gwsub3 ( gwdata, wt, ws, wx );
gwadd ( gwdata, ws, wt );
gwsafemul ( gwdata, wx, wt );
if ( bitval ( p, i ) ) {
gwcopy ( gwdata, wz, wx );
gwsmallmul ( gwdata, b, wx );
gwadd ( gwdata, wt, wx );
gwaddquick ( gwdata, wt, wt );
gwsub ( gwdata, wz, wt );
gwcopy ( gwdata, wx, ws );
} else gwcopy ( gwdata, wz, ws );
}
gwtogiant ( gwdata, ws, s );
gwtogiant ( gwdata, wt, t );
itog ( 2*(int)a+5, z );
subg ( t, z );
if ( isZero ( s ) && isZero ( z ) )
printf ( "Likely prime!\n" );
else
printf ( "Not prime :(\n" );
}
//gwtogiant( gwdata, ws, z );
//gtoc( z, string, 100 );
//printf( "s%s\n", string );
//gcc s2.c gwnum/gwnum.a gwnum/gwnum.ld -lm -lpthread -lstdc++
//http://www.mersenneforum.org/showpost.php?p=388342&postcount=1[/CODE]

 paulunderwood 2015-09-08 18:45

I have changed the interface slightly; You now must enter the value of "a" -- minimal a: jacobi(a^2-4,n)==-1 -- on the command line. The interface is not important to this exercise. This is my final version.

I have re-written the main loop, getting rid of some gwcopy(). :smile:

[CODE]#include <stdio.h>
#include <string.h>
#include "gwnum/giants.h"
#include "gwnum/gwnum.h"
#include "gwnum/gwcommon.h"

FILE *fd;
char string;
float a , b;
int i, LEN;
unsigned init_s = {1}, init_t = {2};
giant n, p, s, t, z; // p = n+1
gwnum ws, wt, wx, wz; // wx, wz temps
gwhandle *gwdata;

int main ( int argc, char *argv[] ) {
if( argc != 3 ) {
printf( "Usage: s2 file_name a (where a is min: jacobi(a^2-4,n)==-1)\n" );
return;
}
fd = fopen ( argv, "r" );
fscanf ( fd, "%s", string );
fclose ( fd );
a = (float)atoi ( argv ); b = a + 2.0;

LEN = strlen ( string ); LEN = ( LEN >> 2 ) + 8;
n = newgiant ( LEN ); p = newgiant ( LEN );
s = newgiant ( LEN ); t = newgiant ( LEN ); z = newgiant ( LEN );

gwdata = (gwhandle*) malloc (sizeof ( gwhandle ) );
gwinit ( gwdata );
ctog ( string, n );
gwsetup_general_mod_giant( gwdata, n );
ws = gwalloc ( gwdata ); wt = gwalloc ( gwdata );
wx = gwalloc ( gwdata ); wz = gwalloc ( gwdata );

binarytogw ( gwdata, init_s, 1, ws );
binarytogw ( gwdata, init_t, 1, wt );
itog ( 1, p ); addg ( n, p );
LEN = bitlen ( p );
gwcopy ( gwdata, ws, wz );
for ( i = LEN-2; i >= 0; i-- ) {
gwcopy ( gwdata, wz, ws );
gwsmallmul ( gwdata, a, wz );
gwaddquick ( gwdata, wt, wz );
gwadd ( gwdata, wt, wz );
gwsafemul ( gwdata, ws, wz );
gwsub3 ( gwdata, wt, ws, wx );
gwadd ( gwdata, ws, wt );
gwsafemul ( gwdata, wx, wt );
if ( bitval ( p, i ) ) {
gwcopy ( gwdata, wz, ws );
gwsmallmul ( gwdata, b, wz );
gwadd ( gwdata, wt, wz );
gwaddquick ( gwdata, wt, wt );
gwsubquick ( gwdata, ws, wt );
}
}
gwtogiant ( gwdata, wz, s );
gwtogiant ( gwdata, wt, t );
itog ( 2*(int)a+5, z );
subg ( t, z );
if ( isZero ( s ) && isZero ( z ) )
printf ( "Likely prime!\n" );
else
printf ( "Not prime :(\n" );
}
//gwtogiant( gwdata, ws, z );
//gtoc( z, string, 100 );
//printf( "s%s\n", string );
//gcc -o s2 s2.c gwnum/gwnum.a gwnum/gwnum.ld -lm -lpthread -lstdc++
//http://www.mersenneforum.org/showpost.php?p=388342&postcount=1[/CODE]

 paulunderwood 2016-04-13 18:53

I added [c]gwset_num_threads ( gwdata, 4 );[/c] to the above program and ran it against [URL="http://www.mersenneforum.org/showpost.php?p=418996&postcount=12"]this 971486-bit number[/URL]. The percentage usage was heavily dependent on how many other tasks I was running. After killing a few, like Iceweasel, I noticed that the CPU usage went up to 255% of an idle 4770k (no HT). Can I expect more CPU usage from big mega-primes/PRPs? Here is the result (with some processes being killed throughout the run):

[CODE]Likely prime!

real 29m42.842s
user 72m11.292s
sys 2m36.888s
[/CODE]

 paulunderwood 2016-04-14 16:33

[QUOTE=paulunderwood;431487]I added [c]gwset_num_threads ( gwdata, 4 );[/c] to the above program and ran it against [URL="http://www.mersenneforum.org/showpost.php?p=418996&postcount=12"]this 971486-bit number[/URL]. The percentage usage was heavily dependent on how many other tasks I was running. After killing a few, like Iceweasel, I noticed that the CPU usage went up to 255% of an idle 4770k (no HT). Can I expect more CPU usage from big mega-primes/PRPs?[/QUOTE]

To answer my own question: I tested Anand Nair's mega-PRP (1042896 decimal digits) Mersenne co-factor ((2^3464473-1)/604874508299177) using [c]gwsetup()[/c] for special mod for which I change the interface and added [c]modg ( n, s );[/c] after the main loop to handle the fraction. The computation used 224% of CPU and resulted in:

[CODE]Likely prime!

real 123m28.272s
user 269m14.144s
sys 7m14.316s
[/CODE]

As this test is doing two multiplications per loop, would it be possible/advisable to thread them? if so, how? :smile:

All times are UTC. The time now is 21:07.