mersenneforum.org Mlucas v19.1 available

 2021-02-12, 20:29 #2 ewmayer ∂²ω=0     Sep 2002 República de California 5·2,351 Posts README updated with a corrected wget command for tdulcet's mlucas.sh install script - the original version would download and run it immediately; now users can first read through it, comment out the primenet.py invocation block if they want to "try before you buy", etc. I'm also looking for an owner of a hybrid BIG/little-CPU system like the Odroid N2 to test tdulcet's Mlucas v19.1 install/autotune script. With my guidance he's made several changes in an effort to support such systems, which typically need separate run directories for each CPU, each with an mlucas.cfg file containing FFT params properly tuned for the CPU in question.
 2021-02-15, 20:35 #3 Dylan14     "Dylan" Mar 2017 1124₈ Posts I've updated the PKGBUILD for Arch Linux to v19.1, which follows the procedure as described in the readme document. The fp-link patch is no longer needed; however, the sysctl-missing patch is still needed.
2021-02-15, 21:58   #4
ewmayer
∂²ω=0

Sep 2002
República de California

5·2,351 Posts

Quote:
 Originally Posted by Dylan14 I've updated the PKGBUILD for Arch Linux to v19.1, which follows the procedure as described in the readme document. The fp-link patch is no longer needed; however, the sysctl-missing patch is still needed.
Thanks - the latter is the sysctl-deprecated warnings? Handling those is on my v20 to-do list - it's not as simple as blanket-removing the includes from platform.h, because I always try to support older platforms within reason, so that needs proper preprocessor #ifdef wrapping to retain the include on older distros of Linux and MacOS where that header is needed.

You're getting warnings, or your version of GCC is treating those "deprecated"s as errors?

 2021-02-15, 23:02 #5 Dylan14     "Dylan" Mar 2017 2²·149 Posts
When I try to build on Arch Linux (which is presently on kernel version 5.10.16), the error I get if I keep the include is:
Code:
platform.h:1307:12: fatal error: sys/sysctl.h: No such file or directory
compilation terminated.
This is using gcc 10.2.0. The patch would not be needed if I were using the linux-lts kernel, which is on version 5.4 and has the sysctl.h file - so doing my blanket patch is a bit risky. I should only run the patch if the kernel version is at least 5.5.
2021-02-15, 23:40   #6
ewmayer
∂²ω=0

Sep 2002
República de California

5×2,351 Posts

Quote:
 Originally Posted by Dylan14 When I try to build on Arch Linux (which is presently on kernel version 5.10.16) the error I would get if I kept the include is:
 Code:
 platform.h:1307:12: fatal error: sys/sysctl.h: No such file or directory
 compilation terminated.
 This is using gcc 10.2.0. This would not be needed, if I was using the linux-lts kernel which is on version 5.4 and has the sysctl.h file - so doing my blanket patch is a bit risky - I should only run the patch if the kernel version is at least 5.5.
What is needed is some way of conditionally including the file only on OS/kernel combinations which support it. I dumped all the compiler predefines for one of my Ubuntu v19 systems, 'uname -a ' indicates it's kernel 5.3:

Linux ewmayer-NUC8i3CYS 5.3.0-59-generic #53-Ubuntu SMP Wed Jun 3 15:52:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)

...but I don't see any specific-Linux-version info in the GCC predefines.

Do me a favor - for your Arch Linux distro, cd to mlucas_v19.1/src and run the following command there:

gcc -dM -E align.h < /dev/null > predefs.txt

(The align.h header is just so both the system and Mlucas predefs get dumped). Attach the resulting predefs.txt file to a post.

2021-02-15, 23:50   #7
Dylan14

"Dylan"
Mar 2017

2²·149 Posts

Quote:
 Originally Posted by ewmayer Do me a favor - for your Arch Linux distro, cd to mlucas_v19.1/src and run the following command there: gcc -dM -E align.h < /dev/null > predefs.txt (The align.h header is just so both the system and Mlucas predefs get dumped). Attach the resulting predefs.txt file to a post.
See attached file.
Attached Files
 predefs.txt (29.9 KB, 249 views)

 2021-02-16, 00:24 #8 Dylan14     "Dylan" Mar 2017 2²×149 Posts
Okay, I have figured out a way to determine when to patch within the PKGBUILD, as seen here in the prepare function:
Code:
prepare() {
  cd "${srcdir}"/"${pkgname}"_v"${pkgver}"
  # Only patch if the kernel version is at least 5.5.0
  kermajver=$(uname -r | cut -d. -f1)
  kerminver=$(uname -r | cut -d. -f2)
  if [ "$kermajver" -gt 5 ]; then
    patch -p1 < "../../sysctl-missing.patch"
  elif [ "$kermajver" -eq 5 ] && [ "$kerminver" -ge 5 ]; then
    patch -p1 < "../../sysctl-missing.patch"
  fi
}
Basically, if the kernel major version is greater than 5, run the patch; if the kernel major version is 5 and the minor version is at least 5, also run the patch. Otherwise do nothing.
Last fiddled with by Dylan14 on 2021-02-16 at 00:25
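The two-branch test in the prepare function can also be folded into a single helper. The following is only a sketch, not part of the actual PKGBUILD: the name `version_ge` and its variables are made up for illustration, and it assumes both arguments look like "MAJOR.MINOR[...]" (for example the output of `uname -r`).

```shell
# version_ge HAVE WANT   (hypothetical helper, not from the PKGBUILD)
# Succeeds (exit status 0) when HAVE >= WANT, comparing only major.minor.
# Assumes both arguments look like "MAJOR.MINOR[...]", e.g. "5.10.16-arch1-1".
version_ge() {
    have_major=${1%%.*}                 # text before the first dot
    have_rest=${1#*.}                   # text after the first dot
    have_minor=${have_rest%%[!0-9]*}    # leading digits of that remainder
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%[!0-9]*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}
```

Inside prepare() this would be used as `if version_ge "$(uname -r)" 5.5; then patch -p1 < "../../sysctl-missing.patch"; fi`. It uses only POSIX parameter expansion, so it needs neither cut nor sort.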
 2021-02-16, 20:02 #9 ewmayer ∂²ω=0     Sep 2002 República de California 5×2,351 Posts
Thanks for the predefs - looking through those triggered a recollection that I'd previously made a note-to-self for the v20 release about this, namely that the sysctl.h deprecation was tied not to a specific Linux kernel version, but rather to the GLIBC version. That's useful because the GCC predefines don't have info about the former, but do have the GLIBC version info. Per this GitHub discussion, "sysctl() is deprecated and may break build with glibc >= 2.30", so we wrap the #include like so:
Code:
#if (__GLIBC__ < 2) || (__GLIBC_MINOR__ < 30)
	#warning GLIBC either not defined or version < 2.30 ... including header.
	#include <sys/sysctl.h>
#endif
If you make that mod in your platform.h file, does that fix the problem without having to apply your patch?
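To see which side of the 2.30 boundary a given system falls on, the GLIBC macros can be pulled out of a predefs dump such as the predefs.txt produced earlier in the thread. This is a rough sketch: the function name `glibc_from_predefs` is made up, and it assumes the dump contains `#define __GLIBC__` and `#define __GLIBC_MINOR__` lines, as the post above indicates. It uses the full comparison (major < 2, or major == 2 and minor < 30), which gives the same answer as the posted #if for any glibc 2.x.

```shell
# glibc_from_predefs FILE   (hypothetical helper name)
# Prints "MAJOR.MINOR old" when glibc is older than 2.30, "MAJOR.MINOR new" otherwise.
# Returns non-zero when FILE has no __GLIBC__/__GLIBC_MINOR__ defines
# (as would be expected on a non-glibc system).
glibc_from_predefs() {
    awk '
        $1 == "#define" && $2 == "__GLIBC__"       { major = $3 }
        $1 == "#define" && $2 == "__GLIBC_MINOR__" { minor = $3 }
        END {
            if (major == "" || minor == "") exit 1
            age = (major < 2 || (major == 2 && minor < 30)) ? "old" : "new"
            print major "." minor, age
        }
    ' "$1"
}
```

Usage would be `glibc_from_predefs predefs.txt`; an "old" result corresponds to the branch of the #if that keeps the sys/sysctl.h include.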
2021-02-16, 20:13   #10
Dylan14

"Dylan"
Mar 2017

1001010100₂ Posts

Quote:
 Originally Posted by ewmayer ... If you make that mod in your platform.h file, does that fix the problem without having to apply your patch?
That fixed the issue.

 2021-02-17, 20:31 #11 Lorenzo     Aug 2010 Republic of Belarus 2·89 Posts
Hello!) I just want to share my experience with the Apple M1 CPU. It compiled smoothly without issues (I included the -DUSE_ARM_V8_SIMD flag in accordance with the README page).
Code:
CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 12.0.0 (clang-1200.0.32.29).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
CPU extensions:
Code:
m1@599160f8-fb7f-41df-adc2-2b7f4da1aac7 src % sysctl hw.optional
hw.optional.floatingpoint: 1
hw.optional.watchpoint: 4
hw.optional.breakpoint: 6
hw.optional.neon: 1
hw.optional.neon_hpfp: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.amx_version: 2
hw.optional.ucnormal_mem: 1
hw.optional.arm64: 1
./Mlucas -s m -cpu 0:7
Code:
19.1
2048 msec/iter = 3.32 ROE[avg,max] = [0.215347133, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 4.00 ROE[avg,max] = [0.193772149, 0.250000000] radices = 144 32 16 16 0 0 0 0 0 0
2560 msec/iter = 4.28 ROE[avg,max] = [0.178074945, 0.234375000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 4.98 ROE[avg,max] = [0.194841334, 0.281250000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 5.27 ROE[avg,max] = [0.208759866, 0.312500000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 5.94 ROE[avg,max] = [0.324307345, 0.406250000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 6.01 ROE[avg,max] = [0.198822084, 0.250000000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 6.54 ROE[avg,max] = [0.187369624, 0.250000000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 6.88 ROE[avg,max] = [0.176231022, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 7.91 ROE[avg,max] = [0.206297821, 0.281250000] radices = 288 32 16 16 0 0 0 0 0 0
5120 msec/iter = 8.42 ROE[avg,max] = [0.193601628, 0.250000000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 9.70 ROE[avg,max] = [0.221504510, 0.281250000] radices = 352 32 16 16 0 0 0 0 0 0
6144 msec/iter = 10.67 ROE[avg,max] = [0.183728153, 0.250000000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 11.75 ROE[avg,max] = [0.176554163, 0.218750000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 11.84 ROE[avg,max] = [0.213558111, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 13.14 ROE[avg,max] = [0.211455481, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0
8192 msec/iter = 13.50 ROE[avg,max] = [0.243920143, 0.312500000] radices = 256 16 32 32 0 0 0 0 0 0
9216 msec/iter = 15.44 ROE[avg,max] = [0.256431218, 0.343750000] radices = 288 16 32 32 0 0 0 0 0 0
10240 msec/iter = 16.75 ROE[avg,max] = [0.293991624, 0.375000000] radices = 160 32 32 32 0 0 0 0 0 0
11264 msec/iter = 18.68 ROE[avg,max] = [0.222417407, 0.281250000] radices = 352 16 32 32 0 0 0 0 0 0
12288 msec/iter = 21.39 ROE[avg,max] = [0.219849010, 0.281250000] radices = 192 32 32 32 0 0 0 0 0 0
13312 msec/iter = 23.73 ROE[avg,max] = [0.258116543, 0.312500000] radices = 208 32 32 32 0 0 0 0 0 0
14336 msec/iter = 23.98 ROE[avg,max] = [0.231325382, 0.281250000] radices = 224 32 32 32 0 0 0 0 0 0
15360 msec/iter = 26.76 ROE[avg,max] = [0.235138002, 0.281250000] radices = 240 32 32 32 0 0 0 0 0 0
16384 msec/iter = 26.98 ROE[avg,max] = [0.230396011, 0.312500000] radices = 256 32 32 32 0 0 0 0 0 0
18432 msec/iter = 31.07 ROE[avg,max] = [0.276530284, 0.375000000] radices = 288 32 32 32 0 0 0 0 0 0
20480 msec/iter = 35.91 ROE[avg,max] = [0.229381947, 0.312500000] radices = 320 32 32 32 0 0 0 0 0 0
22528 msec/iter = 37.85 ROE[avg,max] = [0.235262715, 0.296875000] radices = 352 32 32 32 0 0 0 0 0 0
24576 msec/iter = 42.70 ROE[avg,max] = [0.238062530, 0.375000000] radices = 768 16 32 32 0 0 0 0 0 0
26624 msec/iter = 60.50 ROE[avg,max] = [0.254043170, 0.312500000] radices = 208 16 16 16 16 0 0 0 0 0
./Mlucas -s m -cpu 0:3
It looks like the heavily loaded threads are automatically assigned to the faster cores.
Code:
19.1
2048 msec/iter = 3.88 ROE[avg,max] = [0.215133698, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 4.84 ROE[avg,max] = [0.194502305, 0.281250000] radices = 144 32 16 16 0 0 0 0 0 0
2560 msec/iter = 5.00 ROE[avg,max] = [0.184244498, 0.250000000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 6.03 ROE[avg,max] = [0.193770639, 0.250000000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 6.17 ROE[avg,max] = [0.209568299, 0.281250000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 7.15 ROE[avg,max] = [0.221850838, 0.281250000] radices = 52 32 32 32 0 0 0 0 0 0
3584 msec/iter = 7.12 ROE[avg,max] = [0.199199621, 0.281250000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 7.90 ROE[avg,max] = [0.187449630, 0.250000000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 8.21 ROE[avg,max] = [0.174905238, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 9.57 ROE[avg,max] = [0.205330823, 0.281250000] radices = 288 32 16 16 0 0 0 0 0 0
5120 msec/iter = 10.01 ROE[avg,max] = [0.193377434, 0.250000000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 11.74 ROE[avg,max] = [0.221915271, 0.281250000] radices = 352 32 16 16 0 0 0 0 0 0
6144 msec/iter = 12.89 ROE[avg,max] = [0.183260259, 0.250000000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 14.32 ROE[avg,max] = [0.176914974, 0.250000000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 14.40 ROE[avg,max] = [0.213720200, 0.281250000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 16.16 ROE[avg,max] = [0.211763551, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0
Performance looks awesome for a mobile CPU. Just to compare timings with AVX2 on an i3-8100 (4 cores): the M1 is much faster.
AVX2 on i3-8100:
Code:
19.1
2048 msec/iter = 4.75 ROE[avg,max] = [0.167383863, 0.218750000] radices = 128 16 16 32 0 0 0 0 0 0
2304 msec/iter = 5.44 ROE[avg,max] = [0.182823637, 0.218750000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 6.29 ROE[avg,max] = [0.224905364, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 6.63 ROE[avg,max] = [0.183906382, 0.230468750] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 7.42 ROE[avg,max] = [0.252202803, 0.312500000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 7.52 ROE[avg,max] = [0.225825548, 0.281250000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 8.12 ROE[avg,max] = [0.260567010, 0.375000000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 9.15 ROE[avg,max] = [0.200714048, 0.281250000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 10.92 ROE[avg,max] = [0.165220469, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 11.15 ROE[avg,max] = [0.192892739, 0.250000000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 12.18 ROE[avg,max] = [0.229244523, 0.312500000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 13.47 ROE[avg,max] = [0.187610146, 0.250000000] radices = 352 16 16 32 0 0 0 0 0 0
6144 msec/iter = 16.09 ROE[avg,max] = [0.209471649, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 16.86 ROE[avg,max] = [0.196862667, 0.250000000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 17.38 ROE[avg,max] = [0.196444104, 0.250000000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 23.23 ROE[avg,max] = [0.239954494, 0.343750000] radices = 240 32 32 16 0 0 0 0 0 0
8192 msec/iter = 19.79 ROE[avg,max] = [0.272732764, 0.375000000] radices = 256 32 32 16 0 0 0 0 0 0
9216 msec/iter = 23.01 ROE[avg,max] = [0.242732915, 0.281250000] radices = 288 32 32 16 0 0 0 0 0 0
10240 msec/iter = 27.24 ROE[avg,max] = [0.271287049, 0.375000000] radices = 320 32 32 16 0 0 0 0 0 0
11264 msec/iter = 28.87 ROE[avg,max] = [0.271818621, 0.375000000] radices = 352 32 32 16 0 0 0 0 0 0
12288 msec/iter = 32.04 ROE[avg,max] = [0.259570478, 0.312500000] radices = 768 16 16 32 0 0 0 0 0 0
13312 msec/iter = 37.85 ROE[avg,max] = [0.254703482, 0.312500000] radices = 208 32 32 32 0 0 0 0 0 0
14336 msec/iter = 40.34 ROE[avg,max] = [0.234003331, 0.296875000] radices = 224 32 32 32 0 0 0 0 0 0
15360 msec/iter = 43.84 ROE[avg,max] = [0.245504855, 0.312500000] radices = 960 16 16 32 0 0 0 0 0 0
16384 msec/iter = 45.62 ROE[avg,max] = [0.272600878, 0.375000000] radices = 256 32 32 32 0 0 0 0 0 0
18432 msec/iter = 53.16 ROE[avg,max] = [0.236424995, 0.281250000] radices = 288 32 32 32 0 0 0 0 0 0
20480 msec/iter = 62.92 ROE[avg,max] = [0.237479031, 0.312500000] radices = 320 32 32 32 0 0 0 0 0 0
22528 msec/iter = 66.03 ROE[avg,max] = [0.228240432, 0.312500000] radices = 352 32 32 32 0 0 0 0 0 0
24576 msec/iter = 69.49 ROE[avg,max] = [0.261424145, 0.343750000] radices = 768 16 32 32 0 0 0 0 0 0
Looking forward to their desktop M1X. Good job.
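For anyone wanting to put numbers on such a comparison, a small awk wrapper can line up two self-test outputs by FFT length. This is only a sketch: the function name `compare_cfg` and the file names in the usage note are made up, and it assumes both files use the "FFT msec/iter = T ..." line layout shown in the listings above.

```shell
# compare_cfg A.cfg B.cfg   (hypothetical helper name)
# For every FFT length present in both files, prints:
#   FFT  A_msec/iter  B_msec/iter  B/A
# Lines without "msec/iter" in the second field (e.g. the version line) are skipped.
compare_cfg() {
    awk '$2 == "msec/iter" {
             if (NR == FNR)
                 a[$1] = $4                       # first file: remember timing per FFT length
             else if (($1 in a) && a[$1] > 0)
                 printf "%d %.2f %.2f %.2f\n", $1, a[$1], $4, $4 / a[$1]
         }' "$1" "$2"
}
```

With the 8-thread M1 listing saved as, say, m1.cfg and the i3-8100 listing as i3.cfg, `compare_cfg m1.cfg i3.cfg` would print `4096 6.88 10.92 1.59` for the 4096K row, i.e. the i3-8100 takes about 1.59 times as long per iteration there. Note that the M1 run used `-cpu 0:7` while the i3-8100 has 4 cores, so this is a whole-chip comparison rather than a per-core one.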
