mersenneforum.org Next-gen Odroid announcement

2019-06-07, 13:39   #45
paulunderwood

Sep 2002
Database er0rr

3,413 Posts

Quote:
 Originally Posted by ewmayer Based on the dates you sent me I am planning to ship it on Monday the 10th ... no idea what transit time to expect, but it sounded like your main concern was that it not arrive earlier than yourself. Until then, it's continuing to crunch a first-time LL test on the 4xa73 core and a DC on the 2xa53 ... there have been no signs of throttling, the big passive heatsink appears to suffice for all reasonable ambient temperatures.
My N2 is here, but I am awaiting an SD card for it. Ernst, how long does it take to do a wavefront LL test on it? I will be looking for some number crunching for it. Is it quite easy to download your code and get it running on an N2? Instructions please!

Last fiddled with by paulunderwood on 2019-06-07 at 13:40

2019-06-07, 19:35   #46
ewmayer
2ω=0

Sep 2002
República de California

2×13×443 Posts

Quote:
 Originally Posted by paulunderwood My N2 is here, but I am awaiting a SDcard for it. Ernst, how long does it take to do a wavefront LL test on it? I will be looking for some number crunching for it. Is it quite easy to download your code and get it running on a N2? Instructions please!
[Edit: I just realized that the cfg-file data I copied below are from my "advance peek" v19 binary, so you might as well just use that one from the get-go - cf. the attachment at bottom.]

Yes, it's very easy to get up and running - download the ARMv8 binary linked at the readme page (https://www.mersenneforum.org/mayer/README.html#download), then set up a pair of rundirs, one each for the jobs which will run on the big and little CPUs, respectively. On the N2, the two a53 cores are numbered 0-1 in /proc/cpuinfo and the four a73 cores are 2-5 - I suggest you double-check that in your own copy of said file, because it's crucial to getting the most out of your Mlucas runs. So say you call the a73-rundir 'run0' and the a53 one 'run1'. To create the optimal-FFT-config files in each:

In run0: [path to exec] -s m -iters 100 -cpu 2:5

In run1: [path to exec] -s m -iters 100 -cpu 0:1
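For reference, the -cpu argument takes a lo:hi range of the core indices checked in /proc/cpuinfo above. A toy sketch of that mapping - my own illustration, not taken from the Mlucas sources, which handle more general formats such as comma-separated lists and strides:

```python
def parse_cpu_range(spec):
    """Parse a 'lo:hi' core-index spec (as passed to -cpu) into a core list.

    Illustrative re-implementation only, not Mlucas code.
    """
    lo, hi = (int(s) for s in spec.split(":"))
    if not 0 <= lo <= hi:
        raise ValueError(f"bad core range: {spec!r}")
    return list(range(lo, hi + 1))

# The two self-tests above pin to these core sets on the N2:
assert parse_cpu_range("2:5") == [2, 3, 4, 5]   # quad a73, big cores
assert parse_cpu_range("0:1") == [0, 1]         # dual a53, little cores
```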

I suggest doing these sequentially, to avoid the short timing subtests run by each self-test throwing each other off. By way of reference, here is the 4xa73 mlucas.cfg from my N2 self-tests:
Code:
18.0
2048  msec/iter =   37.48  ROE[avg,max] = [0.325223214, 0.375000000]  radices = 256 16 16 16  0  0  0  0  0  0
2304  msec/iter =   43.51  ROE[avg,max] = [0.287946429, 0.343750000]  radices = 288 16 16 16  0  0  0  0  0  0
2560  msec/iter =   48.17  ROE[avg,max] = [0.275669643, 0.312500000]  radices = 160 16 16 32  0  0  0  0  0  0
2816  msec/iter =   54.51  ROE[avg,max] = [0.259933036, 0.312500000]  radices = 352 16 16 16  0  0  0  0  0  0
3072  msec/iter =   60.39  ROE[avg,max] = [0.316294643, 0.400000000]  radices = 192 16 16 32  0  0  0  0  0  0
3328  msec/iter =   65.47  ROE[avg,max] = [0.280580357, 0.375000000]  radices = 208 16 16 32  0  0  0  0  0  0
3584  msec/iter =   69.10  ROE[avg,max] = [0.325000000, 0.375000000]  radices = 224 16 16 32  0  0  0  0  0  0
3840  msec/iter =   75.63  ROE[avg,max] = [0.275892857, 0.312500000]  radices = 240 16 16 32  0  0  0  0  0  0
4096  msec/iter =   79.60  ROE[avg,max] = [0.267633929, 0.343750000]  radices = 256 16 16 32  0  0  0  0  0  0
4608  msec/iter =   91.84  ROE[avg,max] = [0.284375000, 0.375000000]  radices = 288 16 16 32  0  0  0  0  0  0
5120  msec/iter =  104.83  ROE[avg,max] = [0.323437500, 0.406250000]  radices = 320 16 16 32  0  0  0  0  0  0
5632  msec/iter =  114.77  ROE[avg,max] = [0.228450230, 0.250000000]  radices = 352 16 16 32  0  0  0  0  0  0
6144  msec/iter =  134.42  ROE[avg,max] = [0.240848214, 0.281250000]  radices = 768 16 16 16  0  0  0  0  0  0
6656  msec/iter =  149.31  ROE[avg,max] = [0.266964286, 0.343750000]  radices = 208 32 32 16  0  0  0  0  0  0
7168  msec/iter =  159.99  ROE[avg,max] = [0.228906250, 0.281250000]  radices = 224 32 32 16  0  0  0  0  0  0
7680  msec/iter =  174.89  ROE[avg,max] = [0.252455357, 0.312500000]  radices = 240 32 32 16  0  0  0  0  0  0
Those timings reflect no crunching going on on the a53 CPU - with jobs running on both you can expect a 5-10% timing hit. On my N2 a first-time LL-test with p ~ 89M [5120K FFT] is getting ~110ms/iter 4-threaded on the a73, thus ~4 months per test; a DC with p ~50M [2816K FFT] is getting ~220ms/iter 2-threaded on the a53. The latter core is only half as strong as the quad-a53 on my Odroid C2, so even DCs run painfully slowly on it. The quad Snapdragon CPU in each of my dozen Galaxy S7 broke-o-phones is about the same speed as the N2's quad-a73 core, by way of comparison.
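The months-per-test arithmetic above is easy to sanity-check: an LL test of M_p needs p - 2 squaring iterations, so total time is just iterations times per-iteration cost. A quick back-of-envelope helper (my own sketch; it ignores checkpoint and restart overhead):

```python
def ll_test_days(p, msec_per_iter):
    """Rough wall-clock estimate, in days, for an LL test of M_p:
    p - 2 squaring iterations at a fixed per-iteration time."""
    return (p - 2) * msec_per_iter / 1000.0 / 86400.0

# ~89M first-time test at ~110 ms/iter on the quad a73:
print(round(ll_test_days(89_000_000, 110)))   # ~113 days, i.e. roughly 4 months
# ~50M DC at ~220 ms/iter on the dual a53:
print(round(ll_test_days(50_000_000, 220)))   # ~127 days -- hence "painfully slow"
```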

Once you're up and running and have a few checkpoints under your belt, I'll post an "advance peek" v19 binary - same build I recently switched all my ARMs to - which still lacks the PRP support that will go into the final v19 release, but has some speedups related to relaxing the floating-point accuracy requirements for exponents not close to the p_max for each FFT length. I'm getting 2-8% speedup (depending on FFT length, exponent and random run-to-run timing variations) from using the new code, for the ~90% of exponents at each FFT length which are eligible for the accuracy-for-speed tradeoff. From the user perspective, it's simply a drop-in binary replacement, though.
Attached Files
 mlucas_v19.tar.xz (1.52 MB, 47 views)

Last fiddled with by ewmayer on 2019-06-07 at 20:17

2019-06-07, 22:18   #47
paulunderwood

Sep 2002
Database er0rr

3413₁₀ Posts

Quote:
 Originally Posted by ewmayer [Edit: I just realized that the cfg-file data I copied below are from my "advance peek" v19 binary, so you might as well just use that one from the get-go - cf. the attachment at bottom.] Yes, it's very easy to get up and running - download the ARMv8 binary linked at the readme page (https://www.mersenneforum.org/mayer/README.html#download) [...rest of post #46 snipped...]
Thanks Ernst. I hope to get it up and running soon. I can wait 4 months for the PRP version. Nice website, btw.

2019-06-07, 23:05   #48
ewmayer
2ω=0

Sep 2002
República de California

2×13×443 Posts

Quote:
 Originally Posted by paulunderwood Thanks Ernst. I hope to get it up and running soon. I can wait for 4 months for the PRP version Nice website btw.
But there's really no reason to wait for the PRP version - all my various ARM-based crunching devices, including the ones you'd think might be exceedingly unreliable under 24/7 load, the broke-o-phones, have proven to be superbly reliable. The one tweak I made to the v18 release based on the multiple phone DCs was to add handling for a particular data-corruption error those appear more prone to than PC-style platforms; even the DCs interrupted by said error (which skipped to the next worktodo.ini entry before I added the error-handling logic) later completed with results matching the first test. PRP+Gerbicz is expected to reduce the rate of bad results, but said rate is very low to begin with.
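For the curious, the Gerbicz check behind that reliability point is easy to demo at toy scale. This is my own sketch with a tiny Mersenne modulus, not the Mlucas implementation: run the PRP squaring chain u -> u^2 mod N in blocks of L iterations, keep a running product D of the block-start residues, and after each block verify the identity D_new = 3 * D_old^(2^L) mod N, which fails with overwhelming probability if any of the L squarings went wrong.

```python
def prp_with_gerbicz(p=127, L=8, nblocks=4, corrupt_at=None):
    """PRP squaring chain for M_p = 2^p - 1 with a block-level Gerbicz check.

    Toy sketch only (tiny p, pure-Python bignums); real tests run at
    p ~ 9e7 with FFT-based multiplication.  Returns the PRP residue
    3^(2^(nblocks*L)) mod M_p, or None if the check catches an error.
    """
    N = (1 << p) - 1
    u = 3              # u_0: PRP base
    D = 3              # checksum D_1 = u_0
    i = 0
    for _ in range(nblocks):
        D_old = D
        for _ in range(L):
            u = u * u % N
            i += 1
            if i == corrupt_at:
                u ^= 1                     # simulate a single hardware bit-flip
        D = D * u % N                      # D_{t+1} = D_t * u_{t*L}
        # Gerbicz identity: D_{t+1} == u_0 * D_t^(2^L)  (mod N)
        if D != 3 * pow(D_old, 1 << L, N) % N:
            return None                    # error detected; real code rolls back
    return u

clean = prp_with_gerbicz()                 # matches pow(3, 1 << 32, 2**127 - 1)
bad = prp_with_gerbicz(corrupt_at=10)      # bit-flip mid-run -> check fires
assert bad is None
```

The check itself costs only L extra squarings (of D) per block, which is why it is essentially free at production block sizes.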

BTW, for my phones, I am requiring each one to produce 2 matching DC results prior to letting it start first-time-LL-test work. I do that via the primenet.py script - the first time I run it I use

./*py -d -t 0 -T DoubleCheck -u [uid] -p [pwd]

which creates a 2-entry worktodo.ini file. Then on subsequent invocations (whenever the device in question completes an LL-job of either kind) I use

./*py -d -t 0 -T SmallestAvail -u [uid] -p [pwd]

("-d" enables debug, causing the script to provide some basic informational printing of work-submit and assignment-fetch. "-t 0" means run in single-shot once-only mode, as opposed to the automated every-6-hours mode which is the default.)
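For context, the DoubleCheck invocation leaves a worktodo.ini whose entries follow the standard PrimeNet assignment format - something like the below, with invented assignment IDs and exponents (the fields are assignment ID, exponent, trial-factoring depth, and a P-1-done flag):

```
DoubleCheck=8E40E6E33F98E1D473003B0DD7A08004,50234567,74,1
DoubleCheck=3F98E1D48E40E6E373003B0DD7A08005,50234641,74,1
```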

Thanks for the thumbs-up on the Readme page - it's a continuing struggle to strike a balance between providing enough info and not overwhelming the new user; I rely on user feedback to help me maintain said balance.

2019-06-07, 23:30   #49
paulunderwood

Sep 2002
Database er0rr

110101010101₂ Posts

Quote:
 Originally Posted by ewmayer But there's really no reason to wait for the PRP version - all my various ARM-based crunching devices, including the ones you'd think might be exceedingly unreliable under 24/7 load, the broke-o-phones, have proven to be superbly reliable. [...rest of post #48 snipped...]
To remove any ambiguity in my terse remarks: I plan to run a first-time LL test straight off the bat on the a73 (maybe leaving the a53 free for day-to-day desktop use and for running some of my own code). By the time that test has finished, your Gerbicz code should hopefully be ready.

2019-06-07, 23:43   #50
ewmayer
2ω=0

Sep 2002
República de California

2×13×443 Posts

Quote:
 Originally Posted by paulunderwood To remove any ambiguity in my terse remarks; I plan to run a first time LL test straight off the bat on the a73 (maybe leaving the a53 free for day-to-day desktop use and running some of my own code). By the time that test has finished, your Gerbicz code should be ready, hopefully.
Ah, gotcha - looking forward to seeing your 4xa73 timings and error levels.

2019-06-08, 19:10   #51
paulunderwood

Sep 2002
Database er0rr

6525₈ Posts

Quote:
 Originally Posted by ewmayer Ah, gotcha - looking forward to seeing your 4xa73 timings and error levels.
Code:
18.0
2048  msec/iter =   39.39  ROE[avg,max] = [0.003125000, 0.375000000]  radices = 128 16 16 32  0  0  0  0  0  0
2304  msec/iter =   44.54  ROE[avg,max] = [0.002785714, 0.375000000]  radices = 144 16 16 32  0  0  0  0  0  0
2560  msec/iter =   48.91  ROE[avg,max] = [0.002387312, 0.281250000]  radices = 160 16 16 32  0  0  0  0  0  0
2816  msec/iter =   55.53  ROE[avg,max] = [0.002627232, 0.312500000]  radices = 176 16 16 32  0  0  0  0  0  0
3072  msec/iter =   61.21  ROE[avg,max] = [0.002651786, 0.375000000]  radices = 192 16 16 32  0  0  0  0  0  0
3328  msec/iter =   65.64  ROE[avg,max] = [0.002812500, 0.312500000]  radices = 208 16 16 32  0  0  0  0  0  0
3584  msec/iter =   70.97  ROE[avg,max] = [0.002535714, 0.281250000]  radices = 224 16 16 32  0  0  0  0  0  0
3840  msec/iter =   76.91  ROE[avg,max] = [0.002471819, 0.281250000]  radices = 240 16 16 32  0  0  0  0  0  0
4096  msec/iter =   81.47  ROE[avg,max] = [0.002280134, 0.281250000]  radices = 256 16 16 32  0  0  0  0  0  0
4608  msec/iter =   94.10  ROE[avg,max] = [0.002476144, 0.281250000]  radices = 288 16 16 32  0  0  0  0  0  0
5120  msec/iter =  107.07  ROE[avg,max] = [0.003209821, 0.375000000]  radices = 320 16 16 32  0  0  0  0  0  0
5632  msec/iter =  129.73  ROE[avg,max] = [0.002598214, 0.312500000]  radices = 176 32 32 16  0  0  0  0  0  0
6144  msec/iter =  141.46  ROE[avg,max] = [0.002475446, 0.281250000]  radices = 192 32 32 16  0  0  0  0  0  0
6656  msec/iter =  152.57  ROE[avg,max] = [0.002642857, 0.312500000]  radices = 208 32 32 16  0  0  0  0  0  0
7168  msec/iter =  164.32  ROE[avg,max] = [0.002260045, 0.250000000]  radices = 224 32 32 16  0  0  0  0  0  0
7680  msec/iter =  178.51  ROE[avg,max] = [0.002350551, 0.281250000]  radices = 240 32 32 16  0  0  0  0  0  0
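Each mlucas.cfg line above pairs an FFT length (in K) with its best measured msec/iter and the winning radix set, which Mlucas reads back at run time. A throwaway parser for eyeballing such files (my own sketch, not from the Mlucas sources):

```python
import re

def parse_cfg_line(line):
    """Pull (fft_K, msec_per_iter, radices) out of one mlucas.cfg timing line."""
    m = re.match(
        r"\s*(\d+)\s+msec/iter\s*=\s*([\d.]+).*?radices\s*=\s*([\d ]+)", line)
    if not m:
        return None     # version header or malformed line
    fft, msec, radices = m.groups()
    return int(fft), float(msec), [int(r) for r in radices.split()]

line = ("5120  msec/iter =  107.07  ROE[avg,max] = [0.003209821, 0.375000000]"
        "  radices = 320 16 16 32  0  0  0  0  0  0")
fft, msec, radices = parse_cfg_line(line)
assert (fft, msec) == (5120, 107.07)
assert radices[:4] == [320, 16, 16, 32]   # leading radix 320, three passes of 16/32
```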
Having installed Debian+Mate, I am now running a browser and ssh sessions alongside it (in particular to Skype on an Intel box); top shows 400-401% usage.

2019-06-08, 21:30   #52
ewmayer
2ω=0

Sep 2002
República de California

26376₈ Posts

Quote:
 Originally Posted by paulunderwood Code: [...v18 self-test cfg data snipped...] Having installed Debian+Mate and now running a browser and ssh sessions (in particular to Skype on Intel box), top shows 400-401% usage.
Thanks - are those from the official v18 release binary or the advance-peek v19 one I attached above? In particular the 5632K timing points to the former - it's why I added leading radix 352 = 11*32 to v19.

Are you running a full-blown GIMPS assignment now? I'd be interested in seeing a sample of the typical checkpoint timing line from the p*.stat file. (And, if you started said run using v18, what effect a ctrl-c and restart using the v19 binary has - you could just use the above cfg-file for that, unless you are testing at 2816K or 5632K, in which case radix 352 will likely help, timing-wise.)

Will you be using the N2 for development work of your own?

Last fiddled with by ewmayer on 2019-06-08 at 21:33

2019-06-08, 22:09   #53
paulunderwood

Sep 2002
Database er0rr

6525₈ Posts

Quote:
 Originally Posted by ewmayer Thanks - are those from the official v18 release binary or the advance-peek v19 one I attached above? In particular the 5632K timing points to the former - it's why I added leading radix 352 = 11*32 to v19. [...rest of post #52 snipped...]
I am running v18 with work fetched from PrimeNet -- a first-time LL test.

Code:
INFO: no restart file found...starting run from scratch.
M9141xxxxx: using FFT length 5120K = 5242880 8-byte floats, initial residue shift count = 6324947
this gives an average 17.435125541687011 bits per digit
Using complex FFT radices 320 16 16 32
[Jun 08 19:23:42] M914xxxxx Iter# = 10000 [ 0.01% complete] clocks = 00:18:13.727 [109.3727 msec/iter] Res64: 771472D5BD75657A. AvgMaxErr = 0.062755114. MaxErr = 0.085937500. Residue shift count = 24468007.
[Jun 08 19:41:24] M914xxxxx Iter# = 20000 [ 0.02% complete] clocks = 00:17:40.616 [106.0617 msec/iter] Res64: 6A3AB4D6D38D864F. AvgMaxErr = 0.062874775. MaxErr = 0.093750000. Residue shift count = 41145087.
[Jun 08 19:59:16] M914xxxxx Iter# = 30000 [ 0.03% complete] clocks = 00:17:50.354 [107.0355 msec/iter] Res64: 6A42564A06E2381C. AvgMaxErr = 0.062869088. MaxErr = 0.085937500. Residue shift count = 28935869.
[Jun 08 20:17:42] M914xxxxx Iter# = 40000 [ 0.04% complete] clocks = 00:18:21.247 [110.1247 msec/iter] Res64: 4F6CF208BAE55456. AvgMaxErr = 0.062931570. MaxErr = 0.085937500. Residue shift count = 80180192.
[Jun 08 20:35:48] M914xxxxx Iter# = 50000 [ 0.05% complete] clocks = 00:18:03.309 [108.3310 msec/iter] Res64: C9F7EABB3783A435. AvgMaxErr = 0.062861539. MaxErr = 0.085937500. Residue shift count = 84044778.
[Jun 08 20:55:01] M914xxxxx Iter# = 60000 [ 0.07% complete] clocks = 00:19:10.389 [115.0389 msec/iter] Res64: DB534D6782A1A68E. AvgMaxErr = 0.062953338. MaxErr = 0.093750000. Residue shift count = 15509133.
[Jun 08 21:12:44] M914xxxxx Iter# = 70000 [ 0.08% complete] clocks = 00:17:38.155 [105.8155 msec/iter] Res64: 82734EA25CAAE188. AvgMaxErr = 0.062877951. MaxErr = 0.085937500. Residue shift count = 59420150.
[Jun 08 21:30:15] M914xxxxx Iter# = 80000 [ 0.09% complete] clocks = 00:17:26.342 [104.6343 msec/iter] Res64: B3DDB30E8EA490B8. AvgMaxErr = 0.062913278. MaxErr = 0.093750000. Residue shift count = 36766103.
[Jun 08 21:47:43] M914xxxxx Iter# = 90000 [ 0.10% complete] clocks = 00:17:25.125 [104.5126 msec/iter] Res64: 9F0AAFBA656E82BC. AvgMaxErr = 0.062897896. MaxErr = 0.093750000. Residue shift count = 87580274.
I will be running my own code from time to time -- it is early days. I will install Pari/GP, GMP, etc. Why do you ask? I will aim not to interfere with Mlucas on the a73.

Last fiddled with by paulunderwood on 2019-06-08 at 22:10

2019-06-08, 22:15   #54
ewmayer
2ω=0

Sep 2002
República de California

2·13·443 Posts

Quote:
 Originally Posted by paulunderwood I will be running my own code from time to time -- it is early days. I will install Pari/GP, GMP etc. Why do you ask? I will aim not to interfere with Mlucas on the a73.
Mainly just interested to hear from other folks who may be doing ARM-oriented code development. Thanks for the data!

2019-06-08, 22:27   #55
paulunderwood

Sep 2002
Database er0rr

110101010101₂ Posts

Quote:
 Originally Posted by ewmayer Mainly just interested to hear from other folks who may be doing ARM-oriented code development. Thanks for the data!
Well, I am running some pure C code on an R-Pi 3B+ (under 64-bit Gentoo). All I had to do was recompile what I had developed on x86_64, and it runs perfectly. The Pi throttles at 80C. It runs and runs. But I have yet to delve into vector operations, nor have I looked at ARM assembly. My efforts with Intel YASM assembly were no better than C for timings, but I did realize that swapping the order of the Jacobi-symbol and Fermat-PRP (based on the Euler phi function) tests greatly improved throughput, by an amazing 300%. I think this was partially because the Jacobi-symbol test uses the % operator whereas my Fermat PRP test does not.
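Regarding that % dependence: the textbook Jacobi-symbol algorithm via quadratic reciprocity really is reduction-heavy - it does a full-width a mod n on every round. A reference Python version (my own sketch, not Paul's C code), handy for cross-checking an ARM port:

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via quadratic reciprocity.

    Textbook algorithm; note the full-width % reductions on every round,
    in contrast to a Fermat-PRP test built on modular squarings alone.
    """
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:                 # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):           # (2/n) = -1 for n = 3, 5 mod 8
                result = -result
        a, n = n, a                       # reciprocity swap
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n                            # the % that dominates the cost
    return result if n == 1 else 0

# For prime n the Jacobi symbol equals the Legendre symbol, so it must
# agree with Euler's criterion a^((n-1)/2) mod n:
n = 101
assert all(jacobi(a, n) == (1 if pow(a, (n - 1) // 2, n) == 1 else -1)
           for a in range(1, n))
```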

p.s. How often should the client report into PrimeNet?

Last fiddled with by paulunderwood on 2019-06-08 at 22:59
