mersenneforum.org  

2019-07-05, 18:11 #12
hansl
 
Apr 2019

5·41 Posts

I don't have one to play with, but I wonder how much difference disabling HDMI makes to idle power. From what I just read, on the earlier Pi 3 it saves about 30 mA. Maybe there are more savings available on the RPi 4, since it has two HDMI ports, or at least more powerful graphics processing?

Also, maybe the USB ports could be disabled too (assuming you just access the board via SSH) for some savings?

Is there a build of powertop or a similar program that breaks down which devices the power is going to?
2019-07-05, 19:03 #13
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

101001000₂ Posts

Quote:
Originally Posted by hansl
I don't have one to play with, but I wonder what sort of difference in idle power it makes if HDMI is disabled. From what I just read, on the earlier pi3 it saves about 30mA. Maybe more savings available for rpi4 since it has two HDMI ports, or at least more powerful graphics processing?
I can test this, but I left the thing on my desk at work... Monday at the earliest, then.

Also, I don't know what power-saving tricks the official Raspbian distribution applies by default; maybe it runs cooler? But yeah, Monday.

Quote:
Originally Posted by hansl
Also maybe USB ports could be disabled too(assuming you just access via SSH) for some savings?
Hmm... will have to look into it. Anyway, there is apparently a firmware update now available for the USB controller chip (the VL805) that reduces power consumption somewhat, by about 300 mW.
https://www.raspberrypi.org/forums/v...vl805#p1490467
But apparently this needs to be done under 32-bit Linux (plain old Raspbian, for example); trying to run the upgrade utility on my 64-bit install just gives an error message.

Quote:
Originally Posted by hansl
Is there a build of powertop or similar program which breaks down what devices power is going towards?
Not to my knowledge, no. I was under the impression that it can only show where the CPU's power consumption is going, not what the peripherals draw.
2019-07-05, 19:13 #14
ewmayer
2ω=0
 
Sep 2002
República de California

2×13×443 Posts

Nomead, thanks for the data. Re. idle power: on the Odroid-C2 there is a removable jumper that saves some power when pulled off; not sure if there's anything similar on your board.

Quote:
Originally Posted by nomead
The Pi3B (and 3A) likes to use radix-352 at 2816K FFT size, but the Pi4 for some reason is slower with it (not by much, 78.82 ms/iter for radix-352 vs. 76.70 ms for radix-176). By the way, the same thing happens on the Cortex-A57 on the Jetson Nano. Also, only 2 of 5 radix sets for 2304K passed, so it was skipped, no entry in mlucas.cfg .
Radix-352 is mainly geared toward 5632K, where it makes a significant difference on most of my ARM devices; whether it also helps at 2816K is hit or miss.

Do you still have the screen log from your self-tests? I'd like to look at the 2304K self-test output to see why only 2 of the various FFT-radix combos at that length passed. Thanks.
2019-07-05, 19:34 #15
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

328₁₀ Posts

Quote:
Originally Posted by ewmayer
Nomead, thanks for the data. Re. idle-power, on the Odroid-C2 there is a removable jumper whose pulling-off saves some power, not sure if anything similar on your board.

352 is mainly geared for 5632K where it makes a significant difference on most of my ARM devices, whether it also helps at 2816K is hit or miss.

Do you still have the screen log from your self-tests? I'd like to look at the 2304K self-tests outputs to see why only 2 of the various FFT-radix combos at that length passed. Thanks.
No jumpers on the Pi 4; if something can be disabled, it has to be done in software or firmware.

Radix-352 does help on the Pi 3 / BCM2837, so there it's a definite hit.

I'll attach the screenlog to this message.
Attached Files
File Type: zip screenlog_pi4.zip (6.9 KB, 47 views)
2019-07-05, 20:12 #16
ewmayer
2ω=0
 
Sep 2002
República de California

2·13·443 Posts

Quote:
Originally Posted by nomead
352 happens to help on Pi3 / BCM2837 so there it's a definite hit.

I'll attach the screenlog to this message.
Thanks. Looking at the 2304K self-tests in the log, 3 of the 5 runs have a maximum roundoff error (ROE) >= 0.4375; since that means fewer than half the tests passed, the code treats that FFT length as having failed the self-tests. Based on the good data points, here is a manually created cfg-file line for that length:
Code:
      2304  msec/iter =   61.16  ROE[avg,max] = [0.249911153, 0.343750000]  radices = 288 16 16 16  0  0  0  0  0  0
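A quick sanity check for a hand-made cfg line like the one above is to multiply out the radices. In every cfg line quoted in this thread, the product of the nonzero radices is exactly half the FFT length (taking 1K = 1024), e.g. 288·16·16·16 = 1179648 = 2304·1024/2. The Python sketch below assumes that relation, and the line layout shown here, hold in general; `parse_cfg_line` and `radix_product_ok` are hypothetical helper names, not part of Mlucas.

```python
import re
from math import prod

# Layout assumed from the cfg lines quoted in this thread:
#   <FFT length in K>  msec/iter = <t>  ROE[avg,max] = [<avg>, <max>]  radices = r1 r2 ... 0 0
CFG_LINE = re.compile(
    r"^\s*(?P<fft_k>\d+)\s+msec/iter\s*=\s*(?P<msec>[\d.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[(?P<roe_avg>[\d.]+),\s*(?P<roe_max>[\d.]+)\]\s+"
    r"radices\s*=\s*(?P<radices>[\d ]+?)\s*$"
)


def parse_cfg_line(line: str) -> dict:
    """Split one cfg line into its fields; unused (zero) radix slots are dropped."""
    m = CFG_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized cfg line: {line!r}")
    radices = [int(r) for r in m.group("radices").split() if r != "0"]
    return {
        "fft_k": int(m.group("fft_k")),
        "msec_per_iter": float(m.group("msec")),
        "roe_avg": float(m.group("roe_avg")),
        "roe_max": float(m.group("roe_max")),
        "radices": radices,
    }


def radix_product_ok(entry: dict) -> bool:
    """Check the (assumed) relation: product of radices == FFT length / 2."""
    return 2 * prod(entry["radices"]) == entry["fft_k"] * 1024


if __name__ == "__main__":
    line = ("      2304  msec/iter =   61.16  ROE[avg,max] = [0.249911153, 0.343750000]"
            "  radices = 288 16 16 16  0  0  0  0  0  0")
    entry = parse_cfg_line(line)
    print(entry["radices"], radix_product_ok(entry))  # [288, 16, 16, 16] True
```

A line that fails this check is a hint that a radix was mistyped when editing the cfg file by hand.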
I quick-checked that length on my Odroid-C2 just now, and 3 of 5 tests passed, so it wrote a cfg-file entry for me. That difference had me puzzled (same code, same CPU hardware) until I recalled that the random residue shift can cause such differences between runs that are otherwise identical in every way. If you try rerunning just that one FFT length in self-test mode via

./Mlucas -fftlen 2304 -iters 100 -cpu 0:3

you should see different residue shifts than in your Mlucas -s m run, and may get the one additional good data point needed for the cfg-file entry to be written. The self-test exponents are already set at the extreme high end of the range computed for each FFT length, so sometimes a little manual hackery of this kind is needed to get a complete set of cfg-file entries.
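The acceptance rule described in this post can be sketched in a few lines. Here each self-test run is one radix set; a run passes if its max ROE stays below 0.4375, and the FFT length is rejected when fewer than half the runs pass. Picking the fastest passing radix set for the cfg entry is an assumption inferred from this thread, not taken from the Mlucas source, and `RadixRun` and `pick_cfg_entry` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

ROE_LIMIT = 0.4375  # a run with max ROE at or above this counts as failed


@dataclass
class RadixRun:
    radices: tuple[int, ...]  # e.g. (288, 16, 16, 16)
    msec_per_iter: float
    roe_max: float


def run_passes(run: RadixRun) -> bool:
    return run.roe_max < ROE_LIMIT


def pick_cfg_entry(runs: list[RadixRun]) -> RadixRun | None:
    """Return the radix set to record, or None if the FFT length fails."""
    passing = [r for r in runs if run_passes(r)]
    if not passing or 2 * len(passing) < len(runs):  # fewer than half passed
        return None
    # Assumption: among passing radix sets, the fastest one is recorded.
    return min(passing, key=lambda r: r.msec_per_iter)
```

With five radix sets this means at least three must pass, which matches the 2-of-5 (rejected) and 3-of-5 (accepted) outcomes reported for 2304K.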
2019-07-06, 00:20 #17
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

510₈ Posts

Quote:
Originally Posted by ewmayer
T

./Mlucas -fftlen 2304 -iters 100 -cpu 0:3

you should see different residue shifts from your Mlucas -s m run, and perhaps will get the one more good data point that is needed for the cfg-file to get written. The self-test exponents are already set at the extreme high end of the range computed for each FFT length, so sometimes a little manual hackery of this kind is needed to get a complete set of cfg-file entries.
Yup, and it gives the same 288 16 16 16 radix set on three consecutive tries, with 3 of 5 sets passing.

Oh, and here are the rest of the self-test runs.
v18.0:
Code:
      4096  msec/iter =  116.38  ROE[avg,max] = [0.000227303, 0.312500000]  radices = 256 16 16 32  0  0  0  0  0  0
      4608  msec/iter =  129.56  ROE[avg,max] = [0.000248429, 0.312500000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =  181.85  ROE[avg,max] = [0.000234485, 0.281250000]  radices = 160 32 32 16  0  0  0  0  0  0
      5632  msec/iter =  204.47  ROE[avg,max] = [0.000257845, 0.343750000]  radices = 176 32 32 16  0  0  0  0  0  0
      6144  msec/iter =  225.43  ROE[avg,max] = [0.000247003, 0.312500000]  radices = 192 32 32 16  0  0  0  0  0  0
      6656  msec/iter =  242.89  ROE[avg,max] = [0.000266479, 0.375000000]  radices = 208 32 32 16  0  0  0  0  0  0
      7168  msec/iter =  262.44  ROE[avg,max] = [0.000226100, 0.281250000]  radices = 224 32 32 16  0  0  0  0  0  0
      7680  msec/iter =  290.09  ROE[avg,max] = [0.000236377, 0.312500000]  radices = 240 32 32 16  0  0  0  0  0  0
v19 preview version:
Code:
      4096  msec/iter =  124.18  ROE[avg,max] = [0.227270067, 0.281250000]  radices = 256 16 16 32  0  0  0  0  0  0
      4608  msec/iter =  130.03  ROE[avg,max] = [0.249110271, 0.312500000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =  154.51  ROE[avg,max] = [0.296955541, 0.375000000]  radices = 320 16 16 32  0  0  0  0  0  0
      5632  msec/iter =  166.74  ROE[avg,max] = [0.223459145, 0.281250000]  radices = 352 16 16 32  0  0  0  0  0  0
      6144  msec/iter =  226.16  ROE[avg,max] = [0.246091736, 0.343750000]  radices = 192 32 32 16  0  0  0  0  0  0
      6656  msec/iter =  243.35  ROE[avg,max] = [0.230394501, 0.312500000]  radices = 208 32 32 16  0  0  0  0  0  0
      7168  msec/iter =  265.73  ROE[avg,max] = [0.236601462, 0.312500000]  radices = 224 32 32 16  0  0  0  0  0  0
      7680  msec/iter =  283.72  ROE[avg,max] = [0.235477282, 0.343750000]  radices = 240 32 32 16  0  0  0  0  0  0
Indeed, there's a huge speed difference at the 5120K and 5632K FFT lengths, because of the radix sets with leading radix 320 and 352.
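For a rough size of that difference, the relative change in msec/iter can be computed from the two tables. The Python sketch below simply copies the timings quoted above (the dictionary and function names are illustrative); it gives a reduction of about 15% at 5120K and about 18% at 5632K, while the other lengths differ by less than 7% either way.

```python
# msec/iter from the v18.0 and v19 preview self-test tables above, keyed by FFT length in K
V18 = {4096: 116.38, 4608: 129.56, 5120: 181.85, 5632: 204.47,
       6144: 225.43, 6656: 242.89, 7168: 262.44, 7680: 290.09}
V19_PREVIEW = {4096: 124.18, 4608: 130.03, 5120: 154.51, 5632: 166.74,
               6144: 226.16, 6656: 243.35, 7168: 265.73, 7680: 283.72}


def time_saved_pct(old_ms: float, new_ms: float) -> float:
    """Percent reduction in msec/iter; negative means the newer build is slower."""
    return 100.0 * (old_ms - new_ms) / old_ms


if __name__ == "__main__":
    for fft_k in sorted(V18):
        pct = time_saved_pct(V18[fft_k], V19_PREVIEW[fft_k])
        print(f"{fft_k:5d}K  {V18[fft_k]:7.2f} -> {V19_PREVIEW[fft_k]:7.2f} ms  ({pct:+.1f}%)")
```

As the following posts show, the 5120K gain does not come from new code: v18 also has radix-320, but its self-test rejected those radix sets on roundoff grounds.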

Last fiddled with by nomead on 2019-07-06 at 00:25 Reason: added tables
2019-07-06, 02:42 #18
ewmayer
2ω=0
 
Sep 2002
República de California

2·13·443 Posts

Quote:
Originally Posted by nomead
Yup - and it gives the same 288 16 16 16 radix set on three consecutive tries, with 3 of 5 sets passed.
If you rerun the same single-FFT-length way, you will get the same initial residue shift each time, and thus identical data from run to run. You can, however, set the initial shift manually via the -shift flag, if you like.
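To illustrate why a shift changes the data but not the answer: with a residue shift s, the program carries 2^s·x mod M_p instead of x, and since 2^p ≡ 1 (mod M_p), each squaring simply doubles the shift mod p. The Python sketch below shows this for the Lucas-Lehmer test using exact integers on tiny exponents; it is a toy model of the general technique, not Mlucas's FFT-based code, and the function names are hypothetical.

```python
def lucas_lehmer(p: int) -> int:
    """Plain Lucas-Lehmer: return s_(p-2) mod M_p with s_0 = 4 (p an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def lucas_lehmer_shifted(p: int, shift: int) -> int:
    """Same test, but carrying t = s * 2^sh mod M_p instead of s itself."""
    m = (1 << p) - 1
    sh = shift % p
    t = (4 << sh) % m              # t = 4 * 2^sh
    for _ in range(p - 2):
        t = (t * t) % m            # (s * 2^sh)^2 = s^2 * 2^(2*sh)
        sh = (2 * sh) % p          # 2^p = 1 (mod M_p), so reduce the shift mod p
        t = (t - (2 << sh)) % m    # subtract 2 * 2^sh, giving (s^2 - 2) * 2^sh
    return (t << (p - sh)) % m     # undo the shift: 2^(p - sh) = 2^(-sh) mod M_p


if __name__ == "__main__":
    for p in (7, 11, 13):
        results = {lucas_lehmer_shifted(p, s) for s in range(p)}
        print(p, results, lucas_lehmer(p))
```

Every starting shift yields the same final residue, but the intermediate values being squared differ, which in a floating-point FFT means different roundoff errors from one shift to the next.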
[timings snipped]
Quote:
Indeed, there's a huge difference in 5120K and 5632K FFT sizes' speeds, because of those radix sets with 320 and 352.
Hmm... a healthy speedup at 5632K I can believe, because radix-352 is new in v19, but radix-320 was already available in v18. Maybe rerun the 5120K self-test once more using each of the v18 and v19 builds?
2019-07-06, 06:59 #19
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

2³×41 Posts

Quote:
Originally Posted by ewmayer
Hmm ... a healthy speedup at 5632K I can believe because radix-352 is new in v19, but radix-320 was already there in v18. Maybe rerun the 5120K self-test once more using each of the v18 and v19 builds?
Apparently v18 happened to give excessive roundoff on both 320 16 16 32 and 320 32 16 16, so that's why it wasn't using radix-320. So again, yes, hand-massaging the test would help here.
2019-07-06, 18:40 #20
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

11518₁₀ Posts

Quote:
Originally Posted by nomead
Apparently v18 happened to give excessive roundoff on both 320 16 16 32 and 320 32 16 16 so that's why it wasn't using it. So again, yes, hand-massaging the test would help here.
Sounds like I need to back off a bit on the self-test exponents in v19, to make sure faster but slightly more roundoff-prone FFT radix combos don't fall by the wayside like that.
2019-07-08, 08:10 #21
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

148₁₆ Posts

Okay, power-saving measurements:

force_turbo=1 was set in the configuration file for both Gentoo and Debian, to keep the CPU at 1.5 GHz even when idle.

Baseline (Gentoo 64-bit, nothing disabled yet):
0.69 A idle -> 1.29 A with Mlucas running a double-check at 2816K FFT

Raspbian (because the firmware updater only runs on 32-bit Linux):
0.65 A idle before the USB firmware update
0.59 A idle after the USB firmware update
So yes, Raspbian does something different and saves a bit more power at idle.

Gentoo after the USB firmware update, HDMI still on:
0.61 A idle -> 1.21 A with Mlucas running
Turning HDMI off with tvservice -o apparently saves a further 0.02 A.
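The current readings can be turned into rough power figures by multiplying by the supply voltage. The post does not give the voltage, so the Python sketch below assumes a nominal 5.0 V (the official Pi 4 supply is rated at 5.1 V); under that assumption the 0.06 A idle drop from the USB firmware update under Raspbian comes to about 300 mW, in line with the figure quoted earlier in the thread. The constant and function names are illustrative.

```python
NOMINAL_VOLTS = 5.0  # assumption: the supply voltage was not reported with the readings

# Supply current in amps, as posted above
GENTOO_IDLE_BEFORE_FW, GENTOO_LOAD_BEFORE_FW = 0.69, 1.29
RASPBIAN_IDLE_BEFORE_FW, RASPBIAN_IDLE_AFTER_FW = 0.65, 0.59
GENTOO_IDLE_AFTER_FW, GENTOO_LOAD_AFTER_FW = 0.61, 1.21
HDMI_OFF_SAVING_A = 0.02


def delta_mw(high_a: float, low_a: float, volts: float = NOMINAL_VOLTS) -> int:
    """Power difference in milliwatts between two current readings at a fixed voltage."""
    return round((high_a - low_a) * volts * 1000)


if __name__ == "__main__":
    print("USB fw update, Raspbian idle:", delta_mw(RASPBIAN_IDLE_BEFORE_FW, RASPBIAN_IDLE_AFTER_FW), "mW")
    print("USB fw update, Gentoo idle:", delta_mw(GENTOO_IDLE_BEFORE_FW, GENTOO_IDLE_AFTER_FW), "mW")
    print("Mlucas over idle, before update:", delta_mw(GENTOO_LOAD_BEFORE_FW, GENTOO_IDLE_BEFORE_FW), "mW")
    print("Mlucas over idle, after update:", delta_mw(GENTOO_LOAD_AFTER_FW, GENTOO_IDLE_AFTER_FW), "mW")
    print("HDMI off:", delta_mw(HDMI_OFF_SAVING_A, 0.0), "mW")
```

The Mlucas load adds the same 0.60 A (roughly 3 W at 5 V) before and after the firmware update, so the update's 0.08 A saving on Gentoo shows up both at idle and under load.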
2019-07-08, 08:56 #22
M344587487
 
"Composite as Heck"
Oct 2017

2·313 Posts

powertop is worth a shot if it works; on my laptop it can disable controllers for USB, Ethernet, SATA and other PCI devices. The older Pis' USB/Ethernet controller was a power hog, if I remember rightly.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.