mersenneforum.org Msieve GPU Linear Algebra

 2021-09-17, 19:44   #45
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

11×461 Posts

I'm gonna hand-wave here, since only a few people have bothered taking data: when a relation set is right at the cusp of building a matrix, a few more hours of sieving will save more than a few hours of matrix-solving time on that same machine (meaning CPU in both cases). At the relation counts most e-small and 15e jobs are processed at, 20 more core-hours of sieving might save 5 or 10 core-hours of matrix work (again, both measured on a CPU). I've done a few experiments at home, and I have yet to find a job where the sieving required to build a matrix at TD=120 saved more CPU time than it cost. I believe this could/would be the case on really big jobs, say with matrices at 50M+ in size.

We have historically sieved more than needed because BOINC computation is cheap, while matrix-solving time was in short supply. So, now that GPU matrix solving means matrices are no longer in short supply, we should sieve less. Something like 5-10% fewer relations, which means 5-10% more jobs done per calendar month.
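A throwaway sketch of the arithmetic above (the 20/5/10 figures are VBCurtis's rough estimates, and the function name is mine, not anything from msieve):

```python
# Toy model of the sieve-vs-LA trade-off described above.
# The numbers come straight from the post and are rough estimates, not benchmarks.
def net_cpu_cost(extra_sieve_hours, la_hours_saved):
    """Core-hours lost (positive) or gained (negative) by sieving extra."""
    return extra_sieve_hours - la_hours_saved

# 20 extra core-hours of sieving saving 5-10 core-hours of matrix work
# is a net loss on CPU -- hence the suggestion to sieve less once the
# matrix step moves to a GPU:
print(net_cpu_cost(20, 5))    # 15 core-hours wasted
print(net_cpu_cost(20, 10))   # 10 core-hours wasted
```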
2021-09-17, 19:47   #46
frmky

Jul 2003
So Cal

2⁴·139 Posts

Quote:
 Originally Posted by Xyzzy Is the bottleneck server storage space?
No. The server is currently using 467G of 3.6T.

 2021-09-18, 03:58   #47
frmky

Jul 2003
So Cal

2224₁₀ Posts

For 2,2174L, 1355M relations yielded 734M uniques. With nearly 50% duplicates, we have clearly reached the limit for 16e. Anyway, filtering yielded
Code:
matrix is 102063424 x 102063602 (51045.3 MB) with weight 14484270868 (141.91/col)
Normally I'd try to bring this down, but testing on a quad V100 system with NVLink gives
Code:
linear algebra completed 2200905 of 102060161 dimensions (2.2%, ETA 129h 5m)
So more sieving would only save a day or so in LA. I have the cluster time, so I'll let it run.
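The figures quoted here are easy to double-check (my own quick arithmetic, not part of the post):

```python
# Sanity-check the numbers frmky reports (relation counts in millions).
relations, uniques = 1355, 734
dup_rate = 1 - uniques / relations               # fraction of duplicate relations

total_weight, cols = 14484270868, 102063602
avg_col_weight = total_weight / cols             # nonzeros per matrix column

print(f"{dup_rate:.1%} duplicates, {avg_col_weight:.2f} per column")
# ~45.8% duplicates ("nearly 50%") and 141.91/col, matching the filtering output
```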
2021-09-18, 07:02   #48
pinhodecarlos

"Carlos Pinho"
Oct 2011
Milton Keynes, UK

3·1,663 Posts

Quote:
 Originally Posted by VBCurtis We have historically sieved more than needed because BOINC computation is cheap, while matrix solving time was in short supply. So, now that GPU matrix solving makes matrices not in short supply, we should sieve less. Something like 5-10% fewer relations, which means 5-10% more jobs done per calendar month.
Totally agree with you now. And more: when someone says a number is under LA, I would recommend (I know, Greg!… lol) cancelling all queued WUs, which will also speed up the next number to be sieved. Sievers are wasting a few days (in my experience) processing unnecessary work (I just manually abort them so the WUs go to someone else). Just be careful not to do this during any challenges, since it will interfere with strategic bunkering.

2021-09-18, 13:47   #49
Xyzzy

Aug 2002

20A9₁₆ Posts

Quote:
 Originally Posted by Xyzzy
 If you are using RHEL 8 (8.4) you can install the proprietary Nvidia driver easily via these directions: https://developer.nvidia.com/blog/st...arity-streams/
 Then you will need these packages installed:
 Code:
 gcc
 make
 cuda-nvcc-10-2
 cuda-cudart-dev-10-2-10.2.89-1
 And possibly:
 Code:
 gmp-devel
 zlib-devel
 You also have to manually adjust your path variable in ~/.bashrc:
 Code:
 export PATH="/usr/local/cuda-10.2/bin:$PATH"
Here are simpler instructions.
Code:
sudo subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms
sudo subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms
sudo subscription-manager repos --enable=codeready-builder-for-rhel-8-x86_64-rpms
sudo dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
sudo dnf module install nvidia-driver:latest
sudo reboot
sudo dnf install cuda-11-4
echo 'export PATH=/usr/local/cuda-11.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64/:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
Then just use the attached archive to set up your work.

Attached Files
 msieve.tar.gz (1.90 MB, 45 views)

2021-09-18, 14:37   #50
charybdis

Apr 2020

547 Posts

Quote:
 Originally Posted by frmky For 2,2174L, 1355M relations yielded 734M uniques. With nearly 50% duplicates, we have clearly reached the limit for 16e.
Or is this just the limit for 16e with 33-bit large primes? I know you've avoided going higher because of the difficulty of the LA and the msieve filtering bug, but now that the bug is fixed and GPUs make the LA much easier, might it be worth going up to 34-bit?

2021-09-18, 15:59   #51
frmky

Jul 2003
So Cal

2⁴×139 Posts

Quote:
 Originally Posted by charybdis Or is this just the limit for 16e with 33-bit large primes?
Does the lasieve5 code work correctly with 34-bit large primes? I know the check is commented out, but I haven't tested it.

 2021-09-19, 00:30   #52
charybdis

Apr 2020

547 Posts

I tested the binary from here on 2,2174L with 34-bit large primes and it seemed to work fine. Yield was more than double that at 33-bit, so it definitely looks worth it, as one would expect. There were no issues with setting mfba=99 either.
 2021-09-19, 08:02   #53
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

2·2,969 Posts

I looked through the code a few years ago and found no issues. Lasieve4 is also fine, although it is limited to 96-bit mfba/r.
 2021-09-23, 11:46   #54
wreck

"Bo Chen"
Oct 2005
Wuhan, China

2·5·17 Posts

I gave a try at receiving an NFS@Home WU and found an lpbr/lpba 34 assignment of 2,2174M. Here are the contents of the polynomial file S2M2174b.poly.
Code:
n: 470349924831928271476705309712184283829671891500377511256458133476241008159328553358384317181001385841345904968378352588310952651779460262173005355061503024245423661736289481941107679294474063050602745740433565487767078338816787736757703231764661986524341166060777900926495463269979500293362217153953866146837
skew: 1.22341
c6: 2
c5: 0
c4: 0
c3: 2
c2: 0
c1: 0
c0: 1
Y1: 1
Y0: -3064991081731777716716694054300618367237478244367204352
type: snfs
rlim: 250000000
alim: 250000000
lpbr: 34
lpba: 34
mfbr: 99
mfba: 69
rlambda: 3.6
alambda: 2.6
When q is near 784M, the memory used is 743MB.

Last fiddled with by wreck on 2021-09-23 at 11:49 Reason: fix file name
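As a quick sanity check of the poly file (my own sketch, not from the thread): it encodes f(x) = 2x⁶ + 2x³ + 1 with rational side Y1·x + Y0, so the shared root is m = −Y0 = 2¹⁸¹, giving f(m) = 2¹⁰⁸⁷ + 2⁵⁴⁴ + 1, the Aurifeuillian "M" factor of 2²¹⁷⁴ + 1; the n in the file is the remaining cofactor and should divide it.

```python
# Verify the SNFS polynomial for 2,2174M (my own check, not part of the post).
# f(x) = 2x^6 + 2x^3 + 1; rational side Y1*x + Y0 means the common root is m = -Y0/Y1.
m = 3064991081731777716716694054300618367237478244367204352
assert m == 2**181                       # -Y0 is exactly 2^181

f_m = 2*m**6 + 2*m**3 + 1
assert f_m == 2**1087 + 2**544 + 1       # the Aurifeuillian "M" factor of 2^2174 + 1

# n from the poly file; prints 0 provided the cofactor was transcribed correctly.
n = 470349924831928271476705309712184283829671891500377511256458133476241008159328553358384317181001385841345904968378352588310952651779460262173005355061503024245423661736289481941107679294474063050602745740433565487767078338816787736757703231764661986524341166060777900926495463269979500293362217153953866146837
print(f_m % n)
```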
2021-09-23, 12:46   #55
charybdis

Apr 2020

547 Posts

Quote:
 Originally Posted by wreck Code: lpbr: 34 lpba: 34 mfbr: 99 mfba: 69
@frmky, for future reference, when I tested this I found that rational side sieving with *algebraic* 3LP was fastest. This shouldn't be too much of a surprise: the rational norms are larger, but not so much larger that 6 large primes across the two sides should split 4/2 rather than 3/3 (don't forget the special-q is a "free" large prime).

