mersenneforum.org  

2004-01-01, 20:01   #1
delta_t
 
Nov 2002
Anchorage, AK

3×7×17 Posts
mlucas on sun

I'd like to run the latest version of Mlucas on a Sun Fire 4800 and an Enterprise E420R. I've downloaded the precompiled 2.7b binary, but while searching the forum threads I noticed there was (is?) a 2.7c that may be somewhat faster.

Presently I'm running 2.7b (prefetch) on the Sun Fire without an mlucas.cfg because I'm not familiar enough with tweaking the configuration file to make it optimal for that processor. I'm also running 2.7b (prefetch) on the E420R using the bundled config file.

I suppose I'm asking for a little help with the configuration file to get Mlucas running optimally on the Sun Fire 4800, and whether there is a faster (better?) version to run on these machines.
2004-01-02, 06:09   #2
garo
 
Aug 2002
Termonfeckin, IE

2·5·251 Posts

You should definitely try out the 2.8beta. You can download the source from ftp://hogranch.com/pub/mayer/src/C

Get all the files in the directory. You'll need to compile it yourself and play with the configuration settings to find the optimal settings. Mlucas.c has some hints on how to go about compiling it. When I compiled it for a Sun Ultra10 a while back I used the following command:

cc -o Mlucas -Bstatic -fast -xO5 -xsafe=mem -xprefetch *.c -lm &

I found the Sun cc compiler was much better than gcc. You may need to modify a few options for your two machines.

There are some new features in 2.8 that help with automated benchmarking to come up with the optimum mlucas.cfg configuration.

I'm sure Ernst Mayer will post with more suggestions soon. Play with it and skim through Mlucas.c in the meantime.
2004-01-02, 17:41   #3
delta_t
 

Thanks for the link. I downloaded the files and am now trying to compile them on the Sun Fire 4800.
I used your suggested command but added -xtarget=ultra3.

I just checked on it, and here's what I got:

Mlucas.c:
br.c:
mers_mod_square.c:
qfloat.c:
radix10_ditN_cy_dif1.c:
radix11_ditN_cy_dif1.c:
radix12_ditN_cy_dif1.c:
radix13_ditN_cy_dif1.c:
radix14_ditN_cy_dif1.c:
radix15_ditN_cy_dif1.c:
radix16_dif_dit_pass.c:
radix16_ditN_cy_dif1.c:
radix16_wrapper_square.c:
radix18_ditN_cy_dif1.c:
radix32_dif_dit_pass.c:
radix32_ditN_cy_dif1.c:
radix32_wrapper_square.c:
radix5_ditN_cy_dif1.c:
radix6_ditN_cy_dif1.c:
radix7_ditN_cy_dif1.c:
radix8_dif_dit_pass.c:
cg: assertion failed in file ../src/ms_pipe/sp_opt.cc at line 3373
cg: Internal error: bad memory flow arc
cg: 1 errors
cc: cg failed for radix8_dif_dit_pass.c

[1] Exit 2 cc -o Mlucas -Bstatic -fast -xO5 -xsafe=mem -xprefetch -xtarget=ultra3 *.c -lm


Last fiddled with by delta_t on 2004-01-02 at 17:44
2004-01-02, 18:03   #4
ewmayer
2ω=0
 
Sep 2002
República de California

10011001100100₂ Posts

You've got two choices here. The first is to build the latest version yourself (anon-ftp to hogranch.com, cd pub/mayer/src/C, mget *), assuming you have access to the SunPro C compiler. You can then use the automated self-test feature (type Mlucas -h to see the options) to help you find the best set of FFT radices for the runlengths of interest. Those go into the mlucas.cfg file, whose format and purpose are described here.


Your second choice is to try a gzipped version of the sparc binary I use (built for me by Bill Rea - our Sparcs at work only have gcc) here:

ftp://hogranch.com/pub/mayer/bin/SPA...as2.8_sparc.gz

That version of the code has pretty much the same performance as the latest code, but lacks the automated self-test feature. To use it to build your mlucas.cfg file, go to the above .../src/C ftp archive and get only the Mlucas.c file. Scroll to the bottom portion of the source file, where you'll see a table of exponents and 64-bit hex residues, with entries that look like

Code:
/* Array of distinct test cases for self-tests. Add one extra slot to vector for user-specified self-test exponents: */
struct testCase testVec[numTest+1] =
{
	/* FFT #radices  p  100-iter Res64             #bits per digit  FFT radices      AvgMaxErr  */
	/* Small:                                                                         x86  alfa */
	{ 128, 3,  2550001,"CB6030D5790E2460"},/* testVec[ 0]   19.455  16,16,16,16     .1034 .1334 */
	{ 144, 2,  2920013,"7CC1B41482BCB7C0"},/* testVec[ 1]   19.803   9,16,16,32     .1508 .2113 */
	{ 160, 6,  3265007,"B912804D7FE4A9E5"},/* testVec[ 2]   19.928  10,16,16,32     .2020 .2656 */
	{ 176, 3,  3550007,"5059094E256FB886"},/* testVec[ 3]   19.698  11,16,16,32     .1686 .2403 */
	{ 192, 6,  3900067,"4744CB8E5287DA60"},/* testVec[ 4]   19.837  12,16,16,32     .1885 .2523 */
	{ 224, 6,  4540007,"1DA37E1FAC27BC68"},/* testVec[ 5]   19.793  14,16,16,32     .2097 .2929 */
	{ 256, 7,  5190001,"15216788A374E144"},/* testVec[ 6]   19.798  16,16,16,32     .2563 .3086 */
	{ 288, 2,  5780087,"ADB1333A531F6EED"},/* testVec[ 7]   19.599   9,16,32,32     .1774 .2384 */
	{ 320, 3,  6400013,"6B2DF2F4FD779CBC"},/* testVec[ 8]   19.531  10,16,32,32     .1846 .2392 */
	{ 352, 2,  7010011,"4FC7B9144100998F"},/* testVec[ 9]   19.448  11,16,32,32     .1756 .2585 */
	{ 384, 3,  7600013,"2AFA7C90899B583E"},/* testVec[10]   19.328  12,16,32,32     .1383 .1872 */
	{ 416, 2,  8330009,"74AB1D925A0E7DB7"},/* testVec[11]   19.555  13,16,32,32     .2488 .3152 */
	{ 448, 3,  8950001,"7D9DD642E10F2525"},/* testVec[12]   19.509  14,16,32,32     .2041 .2906 */
	{ 480, 2,  9490001,"01A4E738255C522B"},/* testVec[13]   19.307  15,16,32,32     .1642 .2186 */
	{ 512, 3, 10110007,"24AAC84A6CD400BE"},/* testVec[14]   19.283  16,16,32,32     .1884 .2260 */
	/* Medium:                                                                                  */
	{ 576, 2, 11350013,"7087EA4B45F416A6"},/* testVec[15]   19.243   9,32,32,32     .1657 .2181 */
	{ 640, 2, 12590009,"93E43FC168EAF6BF"},/* testVec[16]   19.211  10,32,32,32     .1885 .2382 */
	{ 704, 1, 13799939,"7A8B6F72D5F3A862"},/* testVec[17]   19.143  11,32,32,32     .1747 .2542 */
	{ 768, 2, 15099979,"D731A6D76D99F3F5"},/* testVec[18]   19.201  12,32,32,32     .1692 .2304 */
	{ 832, 1, 16299979,"39AB362A15AF832C"},/* testVec[19]   19.132  13,32,32,32     .2154 .2632 */
	{ 896, 2, 17599997,"EDF99B1D21DE8835"},/* testVec[20]   19.182  14,32,32,32     .2041 .2773 */
	{ 960, 1, 18899999,"AF0F81144A3372A4"},/* testVec[21]   19.226  15,32,32,32     .2186 .2915 */
	{1024, 6, 20099983,"119B2956917D0CC1"},/* testVec[22]   19.169  16,32,32,32     .2457 .2934 */
	{1152, 5, 22500011,"3D81D5C9CC3D1C65"},/* testVec[23]   19.073   9,16,16,16,16  .1845 .2582 */
	{1280, 2, 25000009,"B4A3AF6909228279"},/* testVec[24]   19.073  10,16,16,16,16  .2534 .3083 */
...
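
As a side check on the table above: the "#bits per digit" column appears to be the exponent p divided by the number of FFT elements (FFT length in K times 1024). The Python sketch below recomputes it for three rows under that assumption. The function name bits_per_digit is made up for illustration and is not part of Mlucas.

```python
def bits_per_digit(p: int, fft_k: int) -> float:
    """Average number of bits of an exponent-p residue held in each FFT element.

    Assumes the FFT length is given in K (units of 1024 elements), as in the table.
    """
    return p / (fft_k * 1024)


# (FFT length in K, exponent, value listed in the table) for three testVec rows
rows = [
    (128, 2550001, 19.455),
    (1024, 20099983, 19.169),
    (1152, 22500011, 19.073),
]

for fft_k, p, listed in rows:
    computed = bits_per_digit(p, fft_k)
    print(f"{fft_k:5d}K  p={p:9d}  computed={computed:.3f}  listed={listed}")
```

If the reading is right, the computed and listed values agree to three decimals for every row.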
Find the table rows containing FFT lengths around the current GIMPS wavefront (as of the start of 2004, you'll want 1152K and 1280K). Then do 100-iteration timing tests of the corresponding exponents, using a variety of FFT radix sets. For instance, for 1152K, a single 100-iteration self-test with radix set 0 results from pasting the following (sans my <=== comments) into your command window:

time Mlucas
22500011 <=== exponent for LL test
1152 <=== FFT length (in K) for LL test
1 <=== 0 for a full LL test, 1 for a shorter timing test
100 <=== if previous line was a 1, how many iterations for the timing test
0 <=== This is the radix set index
1 <=== 0 for error checking off, 1 for EC on.
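
Since the same six answers are retyped for every radix set, a small helper can generate them. The Python sketch below only builds the input text in the order listed above. The function name timing_input is hypothetical, and feeding the text to the binary on standard input is an assumption based on the paste-into-the-command-window description, not something checked against the Mlucas source.

```python
def timing_input(exponent: int, fft_k: int, iterations: int,
                 radix_set: int, error_check: bool = True) -> str:
    """Return the six answers for a timing run, one per line, in prompt order."""
    answers = [
        exponent,                 # exponent for the LL test
        fft_k,                    # FFT length in K
        1,                        # 1 = short timing test rather than a full LL test
        iterations,               # number of iterations for the timing test
        radix_set,                # radix set index
        1 if error_check else 0,  # 1 = error checking on, 0 = off
    ]
    return "".join(f"{a}\n" for a in answers)


# Print one input script per radix set index 0..3 for the 1152K test case.
for rs in range(4):
    print(f"--- radix set {rs} ---")
    print(timing_input(22500011, 1152, 100, rs), end="")

# Assumption: each text could then be piped to the binary, e.g.
#   subprocess.run(["./Mlucas"], input=text, text=True)
# That step is left out here so the sketch runs without the executable.
```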

Start with radix set 0 and increase by one each run until you start getting "radix set XYZ not available - using defaults" warnings. All radix sets should give Res64 = 3D81D5C9CC3D1C65, as per the Mlucas.c table entry. Of the radix sets you tried, pick the one that yielded the smallest runtime and add the corresponding entry to your mlucas.cfg file, e.g. if RS 3 gave the best time @1152K, your mlucas.cfg file would look like

#
# mlucas.cfg optimized for UltraSparc blah blah...
#
200000
#
1152 3

The format of the .cfg file is important - you must begin with precisely 3 #-prefixed lines, where you may enter comments to the right of the # as desired. The fourth line tells the program how many initial iterations to do with per-iteration error checking turned on - in the above example, if it gets through the first 200000 iterations on a given exponent with no roundoff errors greater than roughly 0.4, it turns off EC for the rest of the run. You can see if EC slows the code down appreciably by rerunning the self-tests, but entering a 0 instead of a 1 on the last line of input. If EC-on is no more than 1 or 2% slower than EC-off, I recommend putting a large signed 32-bit integer (say 1000000000) on line 4 of the .cfg file, to force EC to be always on.
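
The selection rule and the file layout described above can be sketched in a few lines of Python. This is only an illustration: the function names pick_radix_set and render_cfg are invented, the timings are hypothetical, and only the Res64 value comes from the table in Mlucas.c.

```python
def pick_radix_set(results, expected_res64):
    """Return the fastest radix set index whose residue matches the table entry.

    results maps radix set index -> (runtime in seconds, Res64 hex string).
    """
    valid = {rs: secs for rs, (secs, res) in results.items()
             if res.upper() == expected_res64.upper()}
    if not valid:
        raise ValueError("no radix set reproduced the expected Res64")
    return min(valid, key=valid.get)


def render_cfg(entries, ec_iterations=200000, comment="mlucas.cfg sketch"):
    """Build the file text: 3 '#' lines, the EC iteration count, then the entries."""
    lines = ["#", f"# {comment}", "#", str(ec_iterations), "#"]
    lines += [f"{fft_k} {rs}" for fft_k, rs in sorted(entries.items())]
    return "\n".join(lines) + "\n"


# Hypothetical timings for 1152K; the residue is the one listed for p = 22500011.
results_1152 = {
    0: (35.1, "3D81D5C9CC3D1C65"),
    1: (34.2, "3D81D5C9CC3D1C65"),
    2: (30.0, "0000000000000000"),  # fastest, but wrong residue, so it is skipped
    3: (33.0, "3D81D5C9CC3D1C65"),
}

best = pick_radix_set(results_1152, "3D81D5C9CC3D1C65")
print(render_cfg({1152: best}))
```

Rejecting any run whose residue does not match is the point of the Res64 check: a fast but wrong radix set is of no use.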

Once you've set up your mlucas.cfg file, create a worktodo.ini file in the same dir as your executable and your .cfg file, enter an exponent in it, and invoke the program sans any flags, e.g. with "nice Mlucas &".
2004-01-03, 09:16   #5
delta_t
 

I've downloaded the source and after playing around with the compiler switches, I think I have compiled a binary for the SunFire 4800 running Solaris 9. Here is what I used:

cc -o Mlucas -dalign -fsimple=2 -fns -fsingle -xbuiltin=%all -xlibmil -Bstatic -xO5 -xsafe=mem -xprefetch -xarch=v8plusb *.c -lm

I've also used the same switches (except with -xarch=v8plusa) on an Enterprise E420R running Solaris 8 (SunOS 5.8).

I've run the self tests to set up the configuration files and I'm now running a double check as the first test on one of the CPUs.

If anyone wants any timings or the binary or anything, leave a post.

Thanks for the help.
2004-01-05, 16:55   #6
ewmayer
2ω=0
 

Sounds like you're up and running OK. Note that another nice aspect of the built-in self-test sets in the current (and future) code is that they make it a lot easier to do run-time profiling (RTP) of the binary. Just build a version with all your usual compiler flags and also with -xprofile=collect, then run one or more of the self-test sets, then incorporate the RTP data that were collected by doing a final build with -xprofile=use replacing -xprofile=collect. I believe Bill Rea got a nice (10-20%) speedup at most FFT lengths this way. Note that the optimal FFT radix sets may change once profiling has been done.
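
The collect/use cycle amounts to two builds with a self-test run in between. The Python sketch below just prints the two compile lines, using the -xprofile=collect and -xprofile=use:Mlucas spellings and the common flags that appear in the compile lines posted later in this thread. The machine-specific -xcache setting is left out, and the self-test step is a placeholder because its options are not shown here. The names COMMON_FLAGS and compile_command are invented for the sketch.

```python
# Flags copied from the compile lines posted later in this thread.
COMMON_FLAGS = (
    "-xarch=native -dalign -fsimple=2 -fns -fsingle -xbuiltin=%all "
    "-xlibmil -Bstatic -xO5 -xsafe=mem -xprefetch"
).split()


def compile_command(profile_flag: str) -> str:
    """Return a cc command line with the given profiling flag added."""
    return " ".join(["cc", *COMMON_FLAGS, profile_flag]) + " *.c -lm -o Mlucas"


steps = [
    compile_command("-xprofile=collect"),     # 1. instrumented build
    "# run one or more self-test sets here",  # 2. see Mlucas -h for the options
    compile_command("-xprofile=use:Mlucas"),  # 3. final build using the collected data
]
print("\n".join(steps))
```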

Happy Hunting,
-Ernst
2004-01-05, 19:40   #7
delta_t
 

Hello Ernst,

I got your PM and responded before reading this post. I'll give the profiling a try and run the self tests with and without profiling. However, please see the PM regarding a question I had on the self tests.
2004-01-07, 09:27   #8
delta_t
 

Quote:
Originally posted by ewmayer
Just build a version with all your usual compiler flags and also with -xprofile=collect, then run one or more of the self-test sets, then incorporate the RTP data that were collected by doing a final build with -xprofile=use replacing -xprofile=collect. I believe Bill Rea got a nice (10-20%) speedup at most FFT lengths this way. Note that the optimal FFT radix sets may change once profiling has been done.
Hello Ernst,

Okay, I've recompiled the code several times and have finally come up with the two versions I used for testing and timings. One is the regular compile, while the other is the runtime-profiled version. I will post the two mlucas.cfg files, which include the FFT size, the fastest radix set index, and its associated time.

The RTP version is typically faster, but from the 4096K FFT size upward the profiled version is a little slower at most lengths.

As you said, the best radix set indices are different.

Last fiddled with by delta_t on 2004-01-07 at 09:35
2004-01-07, 09:29   #9
delta_t
 

Here are the results before profiling.
---------

# mlucas.cfg - **before profiling**
# compile flags: cc -xarch=native -xcache=64/32/4:8192/512/2 -dalign -fsimple=2 -fns -fsingle -xbuiltin=%all -xlibmil -Bstatic -xO5 -xsafe=mem -xprefetch -xprofile=collect *.c -lm -o Mlucas
# system 8-way Sun Fire 4800 Solaris 9
1000000000
# Following lines: {FFT length(K) | Radix Set Index} # Best time
128 3 # 2.97
144 2 # 3.54
160 5 # 3.859
176 2 # 4.58
192 5 # 4.4
224 5 # 5.629
256 7 # 5.91
288 1 # 7.469
320 2 # 8.25
352 2 # 9.449
384 3 # 9.5
416 2 # 11.15
448 3 # 12.73
480 0 # 14.189
512 0 # 14.259
576 0 # 16.64
640 1 # 17.48
704 1 # 20.399
768 1 # 20.6
832 0 # 25.8
896 1 # 26
960 1 # 26.989
1024 6 # 26.48
1152 3 # 33.009
1280 1 # 39.21
1408 1 # 47.96
1536 1 # 46.909
1664 1 # 57.02
1792 2 # 1:00.369
1920 2 # 1:04.400
2048 2 # 1:04.109
2304 1 # 1:17.400
2560 2 # 1:26.709
2816 1 # 1:48.239
3072 1 # 1:49.519
3328 2 # 2:00.099
3584 1 # 2:11.750
3840 2 # 2:19.389
4096 2 # 2:21.729
4608 2 # 2:45.530
5120 2 # 3:04.009
5632 1 # 4:08.900
6144 2 # 3:52.069
6656 2 # 4:13.830
7168 1 # 4:33.470
7680 1 # 4:51.589
8192 4 # 5:04.740
2004-01-07, 09:30   #10
delta_t
 

Here's the runtime-profiled version.
----------

# mlucas.cfg - after profiling
# compile flags: cc -xarch=native -xcache=64/32/4:8192/512/2 -dalign -fsimple=2 -fns -fsingle -xbuiltin=%all -xlibmil -Bstatic -xO5 -xsafe=mem -xprefetch -xprofile=use:Mlucas *.c -lm -o Mlucas
# system 8-way Sun Fire 4800 Solaris 9
1000000000
# Following lines: {FFT length(K) | Radix Set Index} # Best time
128 1 # 2.68
144 2 # 3.24
160 3 # 3.6
176 2 # 4.12
192 3 # 4.36
224 6 # 5.389
256 5 # 5.809
288 1 # 6.969
320 3 # 8.009
352 2 # 8.75
384 3 # 9.22
416 2 # 10.96
448 3 # 11.529
480 1 # 11.99
512 2 # 12.439
576 0 # 15.529
640 2 # 16.57
704 1 # 20.129
768 2 # 20.449
832 0 # 24.41
896 2 # 24.289
960 1 # 26.85
1024 4 # 25.25
1152 1 # 31.929
1280 1 # 35.24
1408 1 # 42
1536 2 # 46.109
1664 1 # 54.409
1792 1 # 56.149
1920 1 # 1:00.479
2048 3 # 1:02.95
2304 3 # 1:12.409
2560 1 # 1:23.84
2816 2 # 1:38.909
3072 2 # 1:44.469
3328 2 # 1:59.510
3584 1 # 2:10.210
3840 2 # 2:11.699
4096 5 # 2:22.169
4608 3 # 2:45.460
5120 2 # 3:19.870
5632 2 # 3:51.300
6144 1 # 4:01.699
6656 1 # 4:28.350
7168 2 # 4:59.839
7680 1 # 5:22.290
8192 4 # 5:09.470
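
To compare the two listings side by side, the mixed time formats have to be converted to one unit first. The Python sketch below does that for a handful of rows copied from the two tables. It assumes the values are seconds, written as M:SS.sss once they pass a minute, and the helper name to_seconds is made up for illustration.

```python
def to_seconds(stamp: str) -> float:
    """Convert 'SS.sss' or 'M:SS.sss' to seconds."""
    minutes, _, seconds = stamp.rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


# (FFT length in K, time before profiling, time after profiling)
rows = [
    (128, "2.97", "2.68"),
    (1152, "33.009", "31.929"),
    (1280, "39.21", "35.24"),
    (4096, "2:21.729", "2:22.169"),
    (8192, "5:04.740", "5:09.470"),
]

for fft_k, before, after in rows:
    b, a = to_seconds(before), to_seconds(after)
    change = 100 * (a - b) / b  # negative means the profiled build is faster
    print(f"{fft_k:5d}K  before={b:8.3f}s  after={a:8.3f}s  change={change:+.1f}%")
```

A negative change means the profiled build was faster at that FFT length.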
2004-01-07, 09:49   #11
delta_t
 

I am going to repost these numbers once I do one more compile. I don't think the flags I used for these two compiles give the fastest times. I'll try again and repost.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.