#34
Aug 2010
Republic of Belarus
AD₁₆ Posts
Quote:
Code:
```
448 msec/iter = 12.80 ROE[avg,max] = [0.224609375, 0.250000000] radices = 56 16 16
480 msec/iter = 14.04 ROE[avg,max] = [0.210880824, 0.250000000] radices = 60 16 16
512 msec/iter = 14.23 ROE[avg,max] = [0.281250000, 0.281250000] radices = 128 8 16
576 msec/iter = 16.14 ROE[avg,max] = [0.208354841, 0.250000000] radices = 144 8 16
640 msec/iter = 19.46 ROE[avg,max] = [0.257421875, 0.312500000] radices = 160 8 16
704 msec/iter = 21.52 ROE[avg,max] = [0.274654715, 0.343750000] radices = 176 8 16
768 msec/iter = 21.99 ROE[avg,max] = [0.209895543, 0.250000000] radices = 48 16 16
832 msec/iter = 24.75 ROE[avg,max] = [0.239439174, 0.312500000] radices = 208 8 16
896 msec/iter = 25.59 ROE[avg,max] = [0.227832031, 0.312500000] radices = 56 16 16
960 msec/iter = 28.33 ROE[avg,max] = [0.212360491, 0.250000000] radices = 60 16 16
1024 msec/iter = 28.24 ROE[avg,max] = [0.312500000, 0.312500000] radices = 128 16 16
1152 msec/iter = 32.89 ROE[avg,max] = [0.208562687, 0.253906250] radices = 144 16 16
1280 msec/iter = 40.32 ROE[avg,max] = [0.235714286, 0.312500000] radices = 20 8 16
1408 msec/iter = 42.28 ROE[avg,max] = [0.273688616, 0.343750000] radices = 176 16 16
1536 msec/iter = 44.80 ROE[avg,max] = [0.223493304, 0.281250000] radices = 192 16 16
1664 msec/iter = 48.48 ROE[avg,max] = [0.246149554, 0.312500000] radices = 208 16 16
1792 msec/iter = 51.94 ROE[avg,max] = [0.220703125, 0.281250000] radices = 224 16 16
1920 msec/iter = 61.24 ROE[avg,max] = [0.212430246, 0.257812500] radices = 60 16 32
2048 msec/iter = 56.56 ROE[avg,max] = [0.312500000, 0.312500000] radices = 128 16 16
2304 msec/iter = 65.73 ROE[avg,max] = [0.208895438, 0.250000000] radices = 144 16 16
2560 msec/iter = 79.33 ROE[avg,max] = [0.245312500, 0.281250000] radices = 20 16 16
2816 msec/iter = 85.93 ROE[avg,max] = [0.272896903, 0.343750000] radices = 176 16 16
3072 msec/iter = 91.91 ROE[avg,max] = [0.225892857, 0.281250000] radices = 192 16 16
3328 msec/iter = 97.41 ROE[avg,max] = [0.241322545, 0.281250000] radices = 208 16 16
3584 msec/iter = 105.64 ROE[avg,max] = [0.220870536, 0.250000000] radices = 224 16 16
3840 msec/iter = 132.28 ROE[avg,max] = [0.213867188, 0.242187500] radices = 60 32 32
4096 msec/iter = 116.38 ROE[avg,max] = [0.224023438, 0.250000000] radices = 16 16 16
4608 msec/iter = 141.80 ROE[avg,max] = [0.201425498, 0.250000000] radices = 144 16 32
5120 msec/iter = 162.11 ROE[avg,max] = [0.236607143, 0.281250000] radices = 20 16 16
5632 msec/iter = 186.77 ROE[avg,max] = [0.277120536, 0.312500000] radices = 44 16 16
6144 msec/iter = 192.85 ROE[avg,max] = [0.214425223, 0.250000000] radices = 48 16 16
6656 msec/iter = 223.12 ROE[avg,max] = [0.242299107, 0.281250000] radices = 208 16 32
7168 msec/iter = 230.10 ROE[avg,max] = [0.223437500, 0.281250000] radices = 56 16 16
7680 msec/iter = 253.42 ROE[avg,max] = [0.219891357, 0.250000000] radices = 60 16 16
8192 msec/iter = 252.43 ROE[avg,max] = [0.282589286, 0.312500000] radices = 1024 16 16
9216 msec/iter = 306.68 ROE[avg,max] = [0.208818163, 0.265625000] radices = 144 32 32
10240 msec/iter = 371.75 ROE[avg,max] = [0.248660714, 0.312500000] radices = 160 32 32
11264 msec/iter = 409.54 ROE[avg,max] = [0.275306920, 0.328125000] radices = 176 32 32
12288 msec/iter = 423.42 ROE[avg,max] = [0.209234401, 0.234375000] radices = 48 16 16
13312 msec/iter = 493.18 ROE[avg,max] = [0.236830357, 0.281250000] radices = 208 32 32
14336 msec/iter = 476.82 ROE[avg,max] = [0.218526786, 0.250000000] radices = 56 16 16
15360 msec/iter = 535.51 ROE[avg,max] = [0.217006138, 0.250000000] radices = 60 16 16
16384 msec/iter = 530.52 ROE[avg,max] = [0.276339286, 0.281250000] radices = 1024 16 16
18432 msec/iter = 606.73 ROE[avg,max] = [0.212458147, 0.250000000] radices = 144 16 16
20480 msec/iter = 745.91 ROE[avg,max] = [0.251116071, 0.281250000] radices = 160 16 16
22528 msec/iter = 822.14 ROE[avg,max] = [0.283984375, 0.328125000] radices = 176 16 16
24576 msec/iter = 833.16 ROE[avg,max] = [0.225502232, 0.250000000] radices = 192 16 16
26624 msec/iter = 975.42 ROE[avg,max] = [0.251785714, 0.281250000] radices = 208 16 16
28672 msec/iter = 971.73 ROE[avg,max] = [0.219098772, 0.250000000] radices = 224 16 16
30720 msec/iter = 1162.44 ROE[avg,max] = [0.242522321, 0.281250000] radices = 960 16 32
32768 msec/iter = 1075.11 ROE[avg,max] = [0.281250000, 0.281250000] radices = 1024 16 32
```
#35
Aug 2010
Republic of Belarus
10101101₂ Posts
CPU Load
Code:
```
top - 07:33:43 up 4 days,  3:39,  2 users,  load average: 0,80, 0,58, 1,27
Tasks:  97 total,   1 running,  96 sleeping,   0 stopped,   0 zombie
%Cpu(s): 98,8 us,  0,2 sy,  0,0 ni,  0,8 id,  0,0 wa,  0,0 hi,  0,0 si,  0,2 st
KiB Mem :  2042848 total,  1076884 free,   111976 used,   853988 buff/cache
KiB Swap:   501740 total,   501740 free,        0 used.  1853956 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5580 linux1    20   0   89624  36140   1120 S 197,3  1,8   0:59.77 mlucas
```
Last fiddled with by Batalov on 2016-02-08 at 21:16 Reason: when in top, press 'i', then press 'W'
#36
Aug 2010
Republic of Belarus
173 Posts
But it's great that Mlucas is working on a mainframe!!!
#37
∂²ω=0
Sep 2002
República de California
2×3×1,931 Posts
Thanks, Lorenzo. It seems you truncated the rightmost columns of radices in posting your excerpt of the mlucas.cfg file (e.g. in the first line, 448 means 448K doubles => complex FFT of length 224K = 56*16^3, i.e. there is a trailing 16 missing), but those are easily inferred.
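A minimal sketch of that inference, assuming (as described above) that a cfg FFT length of N K doubles corresponds to a complex FFT of N*1024/2 points, and that the complete radix set multiplies out to exactly that length. The helper `missing_radix_product` is a hypothetical name, not part of Mlucas; it returns the product of whatever radices were cut off, which on some lines is more than a single 16.

```python
from math import prod
from typing import Sequence


def missing_radix_product(fft_kdoubles: int, listed_radices: Sequence[int]) -> int:
    """Product of the radices cut off from a truncated mlucas.cfg line.

    Assumes an FFT length of N K doubles is a complex FFT of N*1024/2 points,
    and that the complete radix set multiplies out to exactly that length.
    """
    complex_len = fft_kdoubles * 1024 // 2
    shown = prod(listed_radices)
    if complex_len % shown != 0:
        raise ValueError("listed radices do not divide the complex FFT length")
    return complex_len // shown


# First line of the excerpt: 448K doubles -> 224K complex = 56*16^3, so one 16 is missing.
print(missing_radix_product(448, [56, 16, 16]))   # 16
# 1280K with radices 20 8 16: a factor of 256 remains, presumably more than one radix was cut.
print(missing_radix_product(1280, [20, 8, 16]))   # 256
```

The function only recovers the product of the missing radices; how that product splits into individual radices (e.g. 16 16) is not determined by the length alone.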
Just as a point of 'slow' reference, the 32768K timing is roughly what I get on my aged Core2Duo running 2-threaded (1 thread per core) using the SSE2 version of the x86_64 build. My Haswell quad (4-threaded AVX2 build) is 10x faster.

Aside from the overall slowness, the various non-powers-of-2 perform decently well, with the notable exception of FFT lengths of the form 15*2^n, which are uniformly dismal; the compiler really doesn't seem to like my scalar-double radix-15 DFT macros. I guess the only positive thing I can say (as with politics and economics, it's all about the optimistic PR spin, you know) is that the scaling to larger runlengths is quite good: compare the 32768K and 1024K timings, for instance, with what one expects based on the asymptotic O(n log n) FFT opcount scaling.

-------------------

Also, to repeat my earlier question: do we have any way of seeing what kind of hardware is running underneath things? IBM's version of PowerPC? It would be silly if it were actually x86_64 and the cloud setup were masking that from users.

Last fiddled with by ewmayer on 2016-02-09 at 00:14
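A rough check of both observations against the cfg timings posted earlier in the thread, as a sketch only: it divides msec/iter by n*log2(n), with n the FFT length in doubles, and ignores constant factors, memory and cache effects, so it is just a proxy for the opcount argument. `TIMINGS`, `nlogn` and `cost_per_nlogn` are illustrative names, and the timing values are copied from Lorenzo's excerpt.

```python
from math import log2

# FFT length (K doubles) -> msec/iter, copied from Lorenzo's mlucas.cfg excerpt:
# each 15*2^n length sits next to the power of 2 that is 16/15 times larger.
TIMINGS = {
    480: 14.04, 512: 14.23,
    960: 28.33, 1024: 28.24,
    1920: 61.24, 2048: 56.56,
    3840: 132.28, 4096: 116.38,
    7680: 253.42, 8192: 252.43,
    15360: 535.51, 16384: 530.52,
    30720: 1162.44, 32768: 1075.11,
}


def nlogn(kdoubles: int) -> float:
    """n*log2(n) for an FFT over kdoubles*1024 doubles (constant factors ignored)."""
    n = kdoubles * 1024
    return n * log2(n)


def cost_per_nlogn(kdoubles: int) -> float:
    """Measured time per unit of n*log2(n), in nanoseconds (msec * 1e6)."""
    return TIMINGS[kdoubles] * 1e6 / nlogn(kdoubles)


# 1) 1024K -> 32768K: n grows 32x and log2(n) goes from 20 to 25,
#    so O(n log n) predicts a factor of 32 * 25/20 = 40.
expected = nlogn(32768) / nlogn(1024)
observed = TIMINGS[32768] / TIMINGS[1024]
print(f"expected {expected:.1f}x, observed {observed:.2f}x")  # expected 40.0x, observed 38.07x

# 2) Normalized cost of each 15*2^n length vs. the next power of 2.
for k in (480, 960, 1920, 3840, 7680, 15360, 30720):
    p2 = k * 16 // 15
    print(f"{k:6d}K: {cost_per_nlogn(k):.3f} ns   {p2:6d}K: {cost_per_nlogn(p2):.3f} ns")
```

On these numbers the 32768K/1024K ratio lands slightly under the n log n prediction, and every 15*2^n length costs more per n log n than its power-of-2 neighbor, which matches the remarks above.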
#38
"/X\(‘-‘)/X\"
Jan 2013
2929₁₀ Posts
cat /proc/cpuinfo should reveal some details.
#39
Aug 2010
Republic of Belarus
10101101₂ Posts
Not much info ...
Code:
```
[linux1@lorenzoibm ~]$ cat /proc/cpuinfo
vendor_id : IBM/S390
# processors : 2
bogomips per cpu: 20325.00
features : esan3 zarch stfle msa ldisp eimm dfp etf3eh highgprs
cache0 : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1 : level=1 type=Instruction scope=Private size=96K line_size=256 associativity=6
cache2 : level=2 type=Data scope=Private size=2048K line_size=256 associativity=8
cache3 : level=2 type=Instruction scope=Private size=2048K line_size=256 associativity=8
cache4 : level=3 type=Unified scope=Shared size=65536K line_size=256 associativity=16
cache5 : level=4 type=Unified scope=Shared size=491520K line_size=256 associativity=30
processor 0: version = FF, identification = 016A77, machine = 2964
processor 1: version = FF, identification = 016A77, machine = 2964
```
Quote:
Last fiddled with by Lorenzo on 2016-02-09 at 08:56
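A sketch for pulling the interesting fields out of output like the above, assuming the s390-style `/proc/cpuinfo` layout shown in this post (other architectures format the file differently). `parse_s390_cpuinfo` and `SAMPLE` are hypothetical names. The `machine = 2964` field is the hardware machine type; as far as I know, 2964 is the type number of the IBM z13, which is consistent with the cache sizes reported here (e.g. 480 MB of L4).

```python
import re

# The /proc/cpuinfo output posted above (Linux on IBM Z, "s390" format).
SAMPLE = """\
vendor_id : IBM/S390
# processors : 2
bogomips per cpu: 20325.00
features : esan3 zarch stfle msa ldisp eimm dfp etf3eh highgprs
cache0 : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1 : level=1 type=Instruction scope=Private size=96K line_size=256 associativity=6
cache2 : level=2 type=Data scope=Private size=2048K line_size=256 associativity=8
cache3 : level=2 type=Instruction scope=Private size=2048K line_size=256 associativity=8
cache4 : level=3 type=Unified scope=Shared size=65536K line_size=256 associativity=16
cache5 : level=4 type=Unified scope=Shared size=491520K line_size=256 associativity=30
processor 0: version = FF, identification = 016A77, machine = 2964
processor 1: version = FF, identification = 016A77, machine = 2964
"""


def parse_s390_cpuinfo(text: str) -> dict:
    """Extract vendor, machine type(s) and cache sizes (in MB) from s390-style cpuinfo text."""
    info = {"vendor": None, "machines": set(), "caches_mb": {}}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "vendor_id":
            info["vendor"] = value
        elif re.fullmatch(r"cache\d+", key):
            # e.g. "level=4 type=Unified ... size=491520K ..."; sizes carry a K suffix here.
            fields = dict(tok.split("=", 1) for tok in value.split())
            label = f"L{fields['level']} {fields['type']}"
            info["caches_mb"][label] = int(fields["size"].rstrip("K")) / 1024
        match = re.search(r"machine = (\w+)", line)
        if match:
            info["machines"].add(match.group(1))
    return info


info = parse_s390_cpuinfo(SAMPLE)
print(info["vendor"], sorted(info["machines"]))  # IBM/S390 ['2964']
print(info["caches_mb"]["L4 Unified"])           # 480.0
```

On a live Linux-on-Z system one would pass `open("/proc/cpuinfo").read()` instead of `SAMPLE`.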
#40
Dec 2014
37 Posts
Yes, this usually works very well, but the proc filesystem is Linux-specific, so it may not work on other kernels.
There is also the lscpu command, which I believe simply reads /proc/cpuinfo and displays it in a nicer way (using your locale setting). For FreeBSD, I found this post.
Last fiddled with by alexvong1995 on 2016-02-09 at 10:27
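A sketch of the fallback this suggests, assuming Python's standard `platform` module is an acceptable portable path: read procfs where it exists, otherwise fall back to uname-style data. `vendor_from_cpuinfo` and `describe_cpu` are hypothetical helper names. Note that `platform.machine()` only reports the architecture (e.g. `amd64` on FreeBSD), not the detailed model and cache data that `/proc/cpuinfo` gives on Linux.

```python
import os
import platform
from typing import Optional


def vendor_from_cpuinfo(text: str) -> Optional[str]:
    """First 'vendor_id' or 'model name' value in /proc/cpuinfo-style text, else None."""
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in ("vendor_id", "model name"):
            return value.strip()
    return None


def describe_cpu(cpuinfo_path: str = "/proc/cpuinfo") -> str:
    """Use procfs where it exists (Linux); otherwise fall back to uname-style data."""
    if os.path.exists(cpuinfo_path):
        with open(cpuinfo_path) as f:
            found = vendor_from_cpuinfo(f.read())
        if found:
            return found
    # procfs is Linux-specific: elsewhere, report what the platform module knows,
    # which is typically just the machine type (e.g. 'x86_64', 'amd64', 's390x').
    return platform.processor() or platform.machine() or "unknown"


print(describe_cpu())
```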
#41
"Victor de Hollander"
Aug 2011
the Netherlands
498₁₆ Posts
Quote:
L1 (per core):
- 96 KB instruction
- 128 KB data

L2 (per core):
- 2 MB instruction
- 2 MB data

L3 (shared): 64 MB eDRAM
L4 (off die, on storage controller chip): 480 MB
Quote:
#42
∂²ω=0
Sep 2002
República de California
26502₈ Posts
Quote:
If we had some relatively efficient way to map x86_64 SIMD code to this arch's SIMD, things could get rather interesting. I shall have a look at the PDF Lorenzo linked later today.
#43
∂²ω=0
Sep 2002
República de California
2×3×1,931 Posts
Quote:
I did note this, however (Chapter 3, "Central processor complex system design", p. 91), which mentions no floating-point operations among the SIMD instructions; that would be a curious omission if such are indeed supported:

Here are some examples of SIMD instructions:
- Integer byte to quadword add, sub, and compare
- Integer byte to doubleword min, max, and average
- Integer byte to word multiply
- String find 8-bits, 16-bits, and 32-bits
- String range compare
- String find any equal
- String load to block boundaries and load/store with length
#44
Dec 2014
37 Posts
Quote:
Also, I have created an s390x testing branch (there are only 2 commits). Anyone interested is encouraged to test whether it builds and passes the tests; the instructions are as follows:

```
$ git clone https://gitlab.com/mlucas-ll/mlucas.git
$ cd mlucas && touch * && git checkout s390x
$ mkdir build && cd build && ../configure && make -j && make -j check
```

(Of course, you must have git, gcc and make installed!)

Last fiddled with by alexvong1995 on 2016-02-10 at 05:36