mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GMP-ECM (https://www.mersenneforum.org/forumdisplay.php?f=55)
-   -   64-bit GMP-ECM on Apple G5/OS X v10.4 (https://www.mersenneforum.org/showthread.php?t=4061)

 PBMcL 2005-04-30 23:28

64-bit GMP-ECM on Apple G5/OS X v10.4

1 Attachment(s)
Apple's just-released OS X v10.4 supports true 64-bit computing on its G5 (PowerPC970) models. The attached G5_64bit_GMP-ECM.zip archive contains all of the instructions, patches, and extra files you'll need to build and install 64-bit versions of GMP-4.1.4 and ECM-6.0.1 on a G5.

The improvement over 32-bit code is fairly dramatic: step 1 runs roughly 2.2 to 2.5 times faster and step 2 about 1.8 to 1.9 times faster. Here are timing comparisons for some Cunningham cofactors (6^329 - 1, 2^833 + 1, 5^421 + 1, 2^2018 + 1, and 10^386 + 1) of various lengths:

32-bit version of GMP-ECM on Apple G5, 2.5 GHz, OS X 10.3.8:

Input number is 65030090232295456717...09355134587611097719 (150 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=513591587
Step 1 took 70190ms
Step 2 took 29210ms
Input number is 11846804646723081354...15329989531005685163 (200 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=3742068876
Step 1 took 88700ms
Step 2 took 41500ms
Input number is 74153397868455467120...13316867740036509963 (250 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=1201248243
Step 1 took 155260ms
Step 2 took 50580ms
Input number is 45306169533352784567...10063050866528133509 (300 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=3821960013
Step 1 took 228200ms
Step 2 took 64230ms
Input number is 68760637088087042795...66376993609358157449 (348 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=3555916332
Step 1 took 293210ms
Step 2 took 81590ms

64-bit version of GMP-ECM on Apple G5, 2.5 GHz, OS X 10.4:

Input number is 65030090232295456717...09355134587611097719 (150 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=861374654
Step 1 took 30948ms
Step 2 took 16502ms
Input number is 11846804646723081354...15329989531005685163 (200 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=3093777225
Step 1 took 40605ms
Step 2 took 22657ms
Input number is 74153397868455467120...13316867740036509963 (250 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=2753020307
Step 1 took 63725ms
Step 2 took 27288ms
Input number is 45306169533352784567...10063050866528133509 (300 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=4290308355
Step 1 took 90626ms
Step 2 took 35623ms
Input number is 68760637088087042795...66376993609358157449 (348 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=366885523
Step 1 took 120730ms
Step 2 took 43114ms
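
To put numbers on the comparison, the short Python sketch below computes the per-step speedup from the timings listed above. The values are copied from the two sets of output; the variable and function names are only illustrative. Note that the sigma values (and the OS versions) differ between the two sets of runs, so treat the ratios as indicative rather than as a controlled benchmark.

```python
# Timings in milliseconds, copied from the GMP-ECM output above.
digits = [150, 200, 250, 300, 348]
step1_32 = [70190, 88700, 155260, 228200, 293210]   # 32-bit build, OS X 10.3.8
step2_32 = [29210, 41500, 50580, 64230, 81590]
step1_64 = [30948, 40605, 63725, 90626, 120730]     # 64-bit build, OS X 10.4
step2_64 = [16502, 22657, 27288, 35623, 43114]


def speedups(old_ms, new_ms):
    """Return the old/new time ratio for each input number."""
    return [old / new for old, new in zip(old_ms, new_ms)]


step1_ratio = speedups(step1_32, step1_64)
step2_ratio = speedups(step2_32, step2_64)

for d, r1, r2 in zip(digits, step1_ratio, step2_ratio):
    print(f"{d} digits: step 1 {r1:.2f}x, step 2 {r2:.2f}x")
```

With these figures, step 1 comes out roughly 2.2x to 2.5x faster and step 2 about 1.8x to 1.9x faster across the five inputs.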

Post here or contact me if you have any problems.

Phil McLaughlin

 rogue 2005-05-01 20:11

Thanks. I'll try it out. I haven't used 64-bit GMP on PPC before. Did you run some tests to verify that it finds the expected factors?

BTW, I think that mpn_addmul_1() can be dramatically improved to use a model similar to the 32-bit code. I haven't seen the GMP 4.2 code, so it might already have such an improvement.

 PBMcL 2005-05-01 21:41

GMP passes all 'make check' tests, and ECM-6.0.1 also passes its 'make check' tests, which include a fair amount of factor finding. I did test a couple of known factorizations, and it worked, but not much beyond that.

It may be worthwhile to improve the code if you can, since the release date of 4.2 and status of any improvements for 64-bit PPC are unknown.

Phil

 rogue 2005-05-01 23:16

I just installed Tiger and applied your changes. Everything appears to be working correctly. GCC 4.0 has some changes for socket.h that I needed to apply to ECMNet, but nothing significant.

One note is that you have the line "sudo gcc_select 4" in one of your readmes. I think that should be "sudo gcc_select -v 4".

I hope that GMP 4.2 is out soon. I would like to try improving the code myself, but I don't have much time, as I have my fingers in too many pies right now. I think all that you need to do is take powerpc32/mul_1.asm, copy it to powerpc64, and then change the code to use the appropriate 64-bit instructions.
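
For anyone not familiar with the routines being discussed, here is a minimal Python model of what GMP's mpn_addmul_1 computes: it multiplies an n-limb number by a single limb, adds the product into an n-limb destination, and returns the carry-out limb. This is only a sketch of the arithmetic, not GMP's code. The helper names (to_limbs, from_limbs, limb_count) and the limb_bits parameter are made up for illustration; the real routine is written in C or assembly and works on fixed-width machine words.

```python
def limb_count(x, limb_bits):
    """Number of limbs of the given width needed to hold x (illustrative helper)."""
    return -(-x.bit_length() // limb_bits)


def to_limbs(x, n, limb_bits=64):
    """Split x into n little-endian limbs (illustrative helper)."""
    mask = (1 << limb_bits) - 1
    return [(x >> (limb_bits * i)) & mask for i in range(n)]


def from_limbs(limbs, limb_bits=64):
    """Reassemble an integer from little-endian limbs (illustrative helper)."""
    return sum(limb << (limb_bits * i) for i, limb in enumerate(limbs))


def addmul_1(rp, up, v, limb_bits=64):
    """Model of mpn_addmul_1: rp[0..n) += up[0..n) * v, returning the carry limb."""
    mask = (1 << limb_bits) - 1
    carry = 0
    for i, u in enumerate(up):
        t = u * v + rp[i] + carry      # always fits in two limbs
        rp[i] = t & mask               # low limb goes back into the destination
        carry = t >> limb_bits         # high limb is carried into the next step
    return carry


# A 300-digit number needs half as many 64-bit limbs as 32-bit limbs,
# so the loop above runs half as many iterations per call.
x = 10**299
print(limb_count(x, 32), "limbs at 32 bits;", limb_count(x, 64), "limbs at 64 bits")
```

The inner loop is the part that the hand-written assembly replaces: one multiply producing a double-width result, plus two additions with carry, per limb.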

 PBMcL 2005-05-02 00:08

Glad to hear it works! I don't use ECMNet, but others may want to know what you had to do.

You are correct. The "gcc_select 4" command came from an older Apple PDF document (64bitporting.pdf) from last year. The current version of that document says "gcc_select 4.0". But I believe GCC 4.0 is the default in OS X 10.4/Xcode 2.0 anyway.

I'll try modifying the 32-bit mul_1.asm to 64-bit. If there is a significant speed boost, I'll post it here.

Phil

 PBMcL 2005-06-04 06:12

Improved assembly code

1 Attachment(s)
As Mark suggested, I've upgraded the GMP-4.1.4 addmul_1.c, mul_1.c, and submul_1.c files with improved assembly blocks. The full package of files and patches is attached.

Here are the new GMP-ECM timings for the test numbers given in the original post above:

Input number is 65030090232295...7611097719 (150 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=2091664324
Step 1 took 28605ms
Step 2 took 15620ms
Input number is 11846804646723...1005685163 (200 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=2433697906
Step 1 took 36737ms
Step 2 took 20987ms
Input number is 741533978684...40036509963 (250 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=232452003
Step 1 took 53508ms
Step 2 took 25151ms
Input number is 453061695333...528133509 (300 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=2709846699
Step 1 took 78669ms
Step 2 took 32485ms
Input number is 68760637088...09358157449 (348 digits)
Using B1=3000000, B2=4016636513, polynomial Dickson(6), sigma=2374092456
Step 1 took 104431ms
Step 2 took 39437ms
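
As a rough measure of what the new assembly blocks buy, the Python sketch below compares the total time (step 1 plus step 2) of these runs against the first 64-bit timings from the original post. All numbers are copied from the output in this thread; the names are only illustrative, and since sigma differs between runs the percentages should be read as approximate.

```python
# Timings in milliseconds, copied from the output in this thread.
digits = [150, 200, 250, 300, 348]
first_64 = [(30948, 16502), (40605, 22657), (63725, 27288),
            (90626, 35623), (120730, 43114)]      # original 64-bit build
new_asm = [(28605, 15620), (36737, 20987), (53508, 25151),
           (78669, 32485), (104431, 39437)]       # with improved assembly


def total_reduction(before, after):
    """Fractional reduction in step 1 + step 2 time for each input number."""
    return [1 - sum(a) / sum(b) for b, a in zip(before, after)]


reduction = total_reduction(first_64, new_asm)

for d, r in zip(digits, reduction):
    print(f"{d} digits: total time reduced by {r:.1%}")
```

On these figures the total time drops by roughly 7% for the 150-digit input and by about 12% to 14% for the 250-digit and larger inputs.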

 All times are UTC.