mersenneforum.org  

2016-02-08, 12:30   #34
Lorenzo
Aug 2010
Republic of Belarus
2^2×43 Posts

Quote:
Originally Posted by ET_ View Post
Can you tell us anything about its performance? I guess it's running with 2 threads...

Luigi
Not fast

Code:
       448  msec/iter =   12.80  ROE[avg,max] = [0.224609375, 0.250000000]  radices =  56 16 16
       480  msec/iter =   14.04  ROE[avg,max] = [0.210880824, 0.250000000]  radices =  60 16 16
       512  msec/iter =   14.23  ROE[avg,max] = [0.281250000, 0.281250000]  radices = 128  8 16
       576  msec/iter =   16.14  ROE[avg,max] = [0.208354841, 0.250000000]  radices = 144  8 16
       640  msec/iter =   19.46  ROE[avg,max] = [0.257421875, 0.312500000]  radices = 160  8 16
       704  msec/iter =   21.52  ROE[avg,max] = [0.274654715, 0.343750000]  radices = 176  8 16
       768  msec/iter =   21.99  ROE[avg,max] = [0.209895543, 0.250000000]  radices =  48 16 16
       832  msec/iter =   24.75  ROE[avg,max] = [0.239439174, 0.312500000]  radices = 208  8 16
       896  msec/iter =   25.59  ROE[avg,max] = [0.227832031, 0.312500000]  radices =  56 16 16
       960  msec/iter =   28.33  ROE[avg,max] = [0.212360491, 0.250000000]  radices =  60 16 16
      1024  msec/iter =   28.24  ROE[avg,max] = [0.312500000, 0.312500000]  radices = 128 16 16
      1152  msec/iter =   32.89  ROE[avg,max] = [0.208562687, 0.253906250]  radices = 144 16 16
      1280  msec/iter =   40.32  ROE[avg,max] = [0.235714286, 0.312500000]  radices =  20  8 16
      1408  msec/iter =   42.28  ROE[avg,max] = [0.273688616, 0.343750000]  radices = 176 16 16
      1536  msec/iter =   44.80  ROE[avg,max] = [0.223493304, 0.281250000]  radices = 192 16 16
      1664  msec/iter =   48.48  ROE[avg,max] = [0.246149554, 0.312500000]  radices = 208 16 16
      1792  msec/iter =   51.94  ROE[avg,max] = [0.220703125, 0.281250000]  radices = 224 16 16
      1920  msec/iter =   61.24  ROE[avg,max] = [0.212430246, 0.257812500]  radices =  60 16 32
      2048  msec/iter =   56.56  ROE[avg,max] = [0.312500000, 0.312500000]  radices = 128 16 16
      2304  msec/iter =   65.73  ROE[avg,max] = [0.208895438, 0.250000000]  radices = 144 16 16
      2560  msec/iter =   79.33  ROE[avg,max] = [0.245312500, 0.281250000]  radices =  20 16 16
      2816  msec/iter =   85.93  ROE[avg,max] = [0.272896903, 0.343750000]  radices = 176 16 16
      3072  msec/iter =   91.91  ROE[avg,max] = [0.225892857, 0.281250000]  radices = 192 16 16
      3328  msec/iter =   97.41  ROE[avg,max] = [0.241322545, 0.281250000]  radices = 208 16 16
      3584  msec/iter =  105.64  ROE[avg,max] = [0.220870536, 0.250000000]  radices = 224 16 16
      3840  msec/iter =  132.28  ROE[avg,max] = [0.213867188, 0.242187500]  radices =  60 32 32
      4096  msec/iter =  116.38  ROE[avg,max] = [0.224023438, 0.250000000]  radices =  16 16 16
      4608  msec/iter =  141.80  ROE[avg,max] = [0.201425498, 0.250000000]  radices = 144 16 32
      5120  msec/iter =  162.11  ROE[avg,max] = [0.236607143, 0.281250000]  radices =  20 16 16
      5632  msec/iter =  186.77  ROE[avg,max] = [0.277120536, 0.312500000]  radices =  44 16 16
      6144  msec/iter =  192.85  ROE[avg,max] = [0.214425223, 0.250000000]  radices =  48 16 16
      6656  msec/iter =  223.12  ROE[avg,max] = [0.242299107, 0.281250000]  radices = 208 16 32
      7168  msec/iter =  230.10  ROE[avg,max] = [0.223437500, 0.281250000]  radices =  56 16 16
      7680  msec/iter =  253.42  ROE[avg,max] = [0.219891357, 0.250000000]  radices =  60 16 16
      8192  msec/iter =  252.43  ROE[avg,max] = [0.282589286, 0.312500000]  radices = 1024 16 16
      9216  msec/iter =  306.68  ROE[avg,max] = [0.208818163, 0.265625000]  radices = 144 32 32
     10240  msec/iter =  371.75  ROE[avg,max] = [0.248660714, 0.312500000]  radices = 160 32 32
     11264  msec/iter =  409.54  ROE[avg,max] = [0.275306920, 0.328125000]  radices = 176 32 32
     12288  msec/iter =  423.42  ROE[avg,max] = [0.209234401, 0.234375000]  radices =  48 16 16
     13312  msec/iter =  493.18  ROE[avg,max] = [0.236830357, 0.281250000]  radices = 208 32 32
     14336  msec/iter =  476.82  ROE[avg,max] = [0.218526786, 0.250000000]  radices =  56 16 16
     15360  msec/iter =  535.51  ROE[avg,max] = [0.217006138, 0.250000000]  radices =  60 16 16
     16384  msec/iter =  530.52  ROE[avg,max] = [0.276339286, 0.281250000]  radices = 1024 16 16
     18432  msec/iter =  606.73  ROE[avg,max] = [0.212458147, 0.250000000]  radices = 144 16 16
     20480  msec/iter =  745.91  ROE[avg,max] = [0.251116071, 0.281250000]  radices = 160 16 16
     22528  msec/iter =  822.14  ROE[avg,max] = [0.283984375, 0.328125000]  radices = 176 16 16
     24576  msec/iter =  833.16  ROE[avg,max] = [0.225502232, 0.250000000]  radices = 192 16 16
     26624  msec/iter =  975.42  ROE[avg,max] = [0.251785714, 0.281250000]  radices = 208 16 16
     28672  msec/iter =  971.73  ROE[avg,max] = [0.219098772, 0.250000000]  radices = 224 16 16
     30720  msec/iter = 1162.44  ROE[avg,max] = [0.242522321, 0.281250000]  radices = 960 16 32
     32768  msec/iter = 1075.11  ROE[avg,max] = [0.281250000, 0.281250000]  radices = 1024 16 32

2016-02-08, 12:35   #35
Lorenzo
Aug 2010
Republic of Belarus
2^2×43 Posts

CPU Load
Code:
top - 07:33:43 up 4 days,  3:39,  2 users,  load average: 0,80, 0,58, 1,27
Tasks:  97 total,   1 running,  96 sleeping,   0 stopped,   0 zombie
%Cpu(s): 98,8 us,  0,2 sy,  0,0 ni,  0,8 id,  0,0 wa,  0,0 hi,  0,0 si,  0,2 st
KiB Mem :  2042848 total,  1076884 free,   111976 used,   853988 buff/cache
KiB Swap:   501740 total,   501740 free,        0 used.  1853956 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                    
 5580 linux1    20   0   89624  36140   1120 S 197,3  1,8   0:59.77 mlucas

Last fiddled with by Batalov on 2016-02-08 at 21:16 Reason: when in top; press 'i', then press 'W'

2016-02-08, 12:39   #36
Lorenzo
Aug 2010
Republic of Belarus
2^2×43 Posts

But it's great that Mlucas is working on a mainframe!

2016-02-08, 22:22   #37
ewmayer
2ω=0
Sep 2002
República de California
3^2×1,093 Posts

Quote:
Originally Posted by Lorenzo View Post
Not fast
Thanks, Lorenzo - it seems you truncated the rightmost columns of radices in posting your excerpt from the mlucas.cfg file (e.g. in the first line, 448 means 448 Kdoubles => complex FFT of length 224K = 56*16^3, i.e. there is a trailing 16 missing) - but those are easily inferred.
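
The inference described here can be sketched in a few lines of Python. This is only an illustration and not part of Mlucas: the function name `missing_radix_product` is made up for this example, and it assumes, following the 448K case, that an FFT length of N Kdoubles corresponds to a complex FFT of length N*1024/2 whose radices multiply out to that length.

```python
from math import prod

def missing_radix_product(fft_kdoubles, listed_radices):
    """Return the product of the radices that were cut off from a cfg line.

    Assumes (as in the 448K example) that the complex FFT length is half
    the real length in doubles, and that all radices multiply to it.
    """
    complex_length = fft_kdoubles * 1024 // 2
    listed = prod(listed_radices)
    if complex_length % listed != 0:
        raise ValueError("listed radices do not divide the complex FFT length")
    return complex_length // listed

# First line of the posted table: 448K with radices 56 16 16.
print(missing_radix_product(448, [56, 16, 16]))   # 16, i.e. 224K = 56*16^3
# The 1280K line lists 20 8 16, so more than one radix was cut off.
print(missing_radix_product(1280, [20, 8, 16]))   # 256, e.g. 16*16
```

The result is the product of whatever was truncated, so a value such as 256 does not by itself say how that product is split into individual radices.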

Just as a point of 'slow' reference, the 32768K timing is roughly what I get on my aged Core2Duo running 2-threaded (1 thread per core) using the SSE2 version of the x86_64 build. My Haswell quad (4-threaded AVX2 build) is 10x faster.

Aside from the overall slowness, the various non-powers-of-2 perform decently well with the notable exception of FFT lengths of form 15*2^n, which are uniformly dismal - the compiler really doesn't like my scalar-double radix-15 DFT macros, it seems. I guess the only positive thing I can say (as with politics and economics it's all about the optimistic PR spin, you know) is that the scaling to larger runlengths is quite good - compare the 32768K and 1024K timings, for instance, with what one expects based on the asymptotic O(n log n) FFT opcount scaling.
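
For anyone who wants to redo that comparison, here is a minimal Python sketch. The two timings are copied from Lorenzo's table; the helper `nlogn_ratio` is a made-up name, and taking n as the real vector length in doubles (rather than the complex length) is an assumption that changes the estimate only slightly.

```python
from math import log2

# Timings (msec/iter) copied from Lorenzo's table; FFT length in Kdoubles.
timings = {1024: 28.24, 32768: 1075.11}

def nlogn_ratio(n_small_k, n_large_k):
    """Expected cost ratio under an n*log2(n) operation count."""
    n_small = n_small_k * 1024
    n_large = n_large_k * 1024
    return (n_large * log2(n_large)) / (n_small * log2(n_small))

expected = nlogn_ratio(1024, 32768)        # 32 * 25/20 = 40.0
observed = timings[32768] / timings[1024]  # about 38.1

print(f"expected n log n ratio: {expected:.1f}")
print(f"observed timing ratio:  {observed:.1f}")
```

The observed ratio comes out a little below the n log n estimate, which matches the remark that the scaling to larger runlengths is quite good.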

-------------------

Also, to repeat my earlier question: Do we have any way of seeing what kind of hardware is running underneath things? IBM's version of PowerPC? It would be silly if it were actually x86_64 and the cloud setup were masking that from users.

Last fiddled with by ewmayer on 2016-02-09 at 00:14

2016-02-08, 22:37   #38
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2·31·47 Posts

cat /proc/cpuinfo should reveal some details.

2016-02-09, 07:57   #39
Lorenzo
Aug 2010
Republic of Belarus
2^2×43 Posts

Quote:
Originally Posted by Mark Rose View Post
cat /proc/cpuinfo should reveal some details.
Not much info ...
Code:
[linux1@lorenzoibm ~]$ cat /proc/cpuinfo
vendor_id       : IBM/S390
# processors    : 2
bogomips per cpu: 20325.00
features        : esan3 zarch stfle msa ldisp eimm dfp etf3eh highgprs 
cache0          : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1          : level=1 type=Instruction scope=Private size=96K line_size=256 associativity=6
cache2          : level=2 type=Data scope=Private size=2048K line_size=256 associativity=8
cache3          : level=2 type=Instruction scope=Private size=2048K line_size=256 associativity=8
cache4          : level=3 type=Unified scope=Shared size=65536K line_size=256 associativity=16
cache5          : level=4 type=Unified scope=Shared size=491520K line_size=256 associativity=30
processor 0: version = FF,  identification = 016A77,  machine = 2964
processor 1: version = FF,  identification = 016A77,  machine = 2964
Quote:
LinuxOne is a specialised Z13 IBM mainframe for Linux. You can run up to 8000 VMs simultaneously on it. It is a powerful beast like IBM does, the top stuff.
So it's an IBM Z13 CPU. You can find many more details in the Technical Guide. I'm no expert, but I think it's not the Power architecture. It's something special ...

Last fiddled with by Lorenzo on 2016-02-09 at 08:56

2016-02-09, 10:02   #40
alexvong1995
Dec 2014
37 Posts

Quote:
Originally Posted by Mark Rose View Post
cat /proc/cpuinfo should reveal some details.
Yes, this usually works very well, but the proc filesystem is Linux-specific, so it may not work on other kernels.
There is also the lscpu command, which I believe simply reads /proc/cpuinfo and displays it in a nicer way (using your locale setting).
For FreeBSD, I found this post.
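
As a rough illustration of the portability point, the Python sketch below parses /proc/cpuinfo when it exists and simply returns nothing otherwise. The function names `parse_cpuinfo` and `read_cpuinfo` are invented for this example, and the sample text is taken from Lorenzo's output earlier in the thread.

```python
from pathlib import Path

def parse_cpuinfo(text):
    """Parse 'key : value' lines of /proc/cpuinfo text into a dict.

    Later duplicates overwrite earlier ones, which is fine for a quick
    look but loses per-core detail on multi-core x86 output.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def read_cpuinfo(path="/proc/cpuinfo"):
    """Return parsed cpuinfo, or an empty dict where the file is absent."""
    p = Path(path)
    return parse_cpuinfo(p.read_text()) if p.is_file() else {}

sample = """vendor_id       : IBM/S390
# processors    : 2
bogomips per cpu: 20325.00
"""
print(parse_cpuinfo(sample)["vendor_id"])  # IBM/S390
print(read_cpuinfo().get("vendor_id", "unknown"))
```

On a kernel without a proc filesystem the last line just prints "unknown", so a real tool would need a platform-specific fallback there.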

Last fiddled with by alexvong1995 on 2016-02-09 at 10:27

2016-02-09, 13:00   #41
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands
2×587 Posts

Quote:
Originally Posted by Lorenzo View Post
Not much info ...
Code:
[linux1@lorenzoibm ~]$ cat /proc/cpuinfo
vendor_id       : IBM/S390
# processors    : 2
bogomips per cpu: 20325.00
features        : esan3 zarch stfle msa ldisp eimm dfp etf3eh highgprs 
cache0          : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1          : level=1 type=Instruction scope=Private size=96K line_size=256 associativity=6
cache2          : level=2 type=Data scope=Private size=2048K line_size=256 associativity=8
cache3          : level=2 type=Instruction scope=Private size=2048K line_size=256 associativity=8
cache4          : level=3 type=Unified scope=Shared size=65536K line_size=256 associativity=16
cache5          : level=4 type=Unified scope=Shared size=491520K line_size=256 associativity=30
processor 0: version = FF,  identification = 016A77,  machine = 2964
processor 1: version = FF,  identification = 016A77,  machine = 2964
So it's an IBM Z13 CPU. You can find many more details in the Technical Guide. I'm no expert, but I think it's not the Power architecture. It's something special ...
Wow! That is a lot of cache!

L1 (per core)
- 96 KB instruction
- 128 KB data
L2 (per core)
- 2 MB instruction
- 2 MB data
L3 (shared)
- 64 MB eDRAM
L4 (off die, on storage controller chip)
- 480 MB
Quote:
The processor chip has an eight-core design, with either six, seven, or eight active cores, and operates at 5.0 GHz. Depending on the CPC drawer version (39 PU or 42 PU), 39 - 168 PUs are available on 1 - 4 CPC drawers.
IBM calls it a PU; we would call it a CPU core.

2016-02-09, 20:17   #42
ewmayer
2ω=0
Sep 2002
República de California
3^2×1,093 Posts

Quote:
Originally Posted by VictordeHolland View Post
Wow! That is a lot of cache!

L1 (per core)
- 96 KB instruction
- 128 KB data
L2 (per core)
- 2 MB instruction
- 2 MB data
L3 (shared)
- 64 MB eDRAM
L4 (off die, on storage controller chip)
- 480 MB
IBM calls it a PU; we would call it a CPU core.
That explains the excellent timing-scaling in going to larger FFT lengths which we see in Lorenzo's cfg-file results.
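
A back-of-the-envelope check of that explanation, as a Python sketch. The cache sizes are the ones from the /proc/cpuinfo output in this thread; the function name `smallest_fitting_cache` is made up, and counting only the main array of 8-byte doubles (no twiddle tables or other auxiliary data, no sharing with other cores or guests) is a simplifying assumption.

```python
# Cache sizes in KiB from the /proc/cpuinfo output in this thread
# (data-side L1/L2, unified L3/L4).
CACHES_KIB = [("L1", 128), ("L2", 2048), ("L3", 65536), ("L4", 491520)]

def smallest_fitting_cache(fft_kdoubles, bytes_per_double=8):
    """Name the smallest cache level that could hold the main FFT array.

    Rough estimate only: ignores twiddle tables and other auxiliary data,
    and the fact that L3/L4 are shared with other cores and guests.
    """
    array_kib = fft_kdoubles * bytes_per_double  # Kdoubles * 8 B = KiB
    for name, size_kib in CACHES_KIB:
        if array_kib <= size_kib:
            return name
    return "RAM"

for k in (448, 1024, 4096, 32768):
    mib = k * 8 / 1024
    print(f"{k:>6}K doubles = {mib:6.1f} MiB -> {smallest_fitting_cache(k)}")
```

Under these assumptions even the largest tested length, 32768K doubles or 256 MiB, would still fit inside the 480 MB L4, which is at least consistent with the timings not falling off a cliff at large FFT lengths.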

If we had some relatively efficient way to map x86_64 SIMD code to this arch's SIMD, things could get rather interesting. I shall have a look at the PDF Lorenzo linked later today.

2016-02-10, 03:51   #43
ewmayer
2ω=0
Sep 2002
República de California
3^2×1,093 Posts

Quote:
Originally Posted by ewmayer View Post
If we had some relatively efficient way to map x86_64 SIMD code to this arch's SIMD, things could get rather interesting. I shall have a look at the PDF Lorenzo linked later today.
Had a look - see nothing actually resembling an instruction set reference in there. Could someone point me to one? With just 139 SIMD instructions it wouldn't have taken up more than a decent-sized chapter or appendix in such a document.

I did note this, however (Chapter 3. Central processor complex system design, p91), which mentions no floating-point among the SIMD - that would be a curious omission if indeed such are supported:

Here are some examples of SIMD instructions:
o Integer byte to quadword add, sub, and compare
o Integer byte to doubleword min, max, and average
o Integer byte to word multiply
o String find 8-bits, 16-bits, and 32-bits
o String range compare
o String find any equal
o String load to block boundaries and load/store with length

2016-02-10, 05:24   #44
alexvong1995
Dec 2014
37 Posts

Quote:
Originally Posted by ewmayer View Post
Had a look - see nothing actually resembling an instruction set reference in there. Could someone point me to one? With just 139 SIMD instructions it wouldn't have taken up more than a decent-sized chapter or appendix in such a document.

I did note this, however (Chapter 3. Central processor complex system design, p91), which mentions no floating-point among the SIMD - that would be a curious omission if indeed such are supported:

Here are some examples of SIMD instructions:
o Integer byte to quadword add, sub, and compare
o Integer byte to doubleword min, max, and average
o Integer byte to word multiply
o String find 8-bits, 16-bits, and 32-bits
o String range compare
o String find any equal
o String load to block boundaries and load/store with length
I just found this documentation, the z/Architecture Reference Summary, on the internet. Pages 22 to 25 list the 139 vector instructions (of course I did not actually count them!), for example VMAH (vector multiply and add high)...

Also, I have created an s390x testing branch (only 2 commits so far). Anyone interested is encouraged to test whether it builds and passes the tests. The instructions are as follows:
$ git clone https://gitlab.com/mlucas-ll/mlucas.git
$ cd mlucas && touch * && git checkout s390x
$ mkdir build && cd build && ../configure && make -j && make -j check
(of course you must have git, gcc and make installed!)

Last fiddled with by alexvong1995 on 2016-02-10 at 05:36

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.