#3169
"James Heinrich"
May 2004
ex-Northern Ontario
3246₁₀ Posts
For short-running exponents the answer is there's not much you can do -- mfaktx just doesn't scale all that well to micro assignments. Even with less_classes and tweaks, if your runtime is less than a second you're leaving lots of performance unused. You may be able to recoup some of it by running multiple mfaktc instances simultaneously to try to maximize GPU load (keep adding instances until the sum of your throughput stops increasing).
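A minimal sketch of that multi-instance approach (not code from mfaktc itself): it assumes each instance runs in its own directory containing the mfaktc binary, mfaktc.ini and its own slice of the work as worktodo.txt, since mfaktc picks up worktodo.txt from the directory it is started in. The directory names, binary name and batch size are placeholders.

Code:
#!/usr/bin/env python3
# Rough throughput test: start one mfaktc process per directory, wait for
# all of them, and report the combined exponents per second.
import subprocess
import time

INSTANCE_DIRS = ["inst0", "inst1"]   # keep adding directories until throughput stops improving
EXPONENTS_PER_INSTANCE = 500         # how many Factor= lines each worktodo.txt slice holds

start = time.time()
procs = [subprocess.Popen(["./mfaktc"], cwd=d, stdout=subprocess.DEVNULL)
         for d in INSTANCE_DIRS]
for p in procs:
    p.wait()                         # wait until every slice is finished
elapsed = time.time() - start

total = EXPONENTS_PER_INSTANCE * len(INSTANCE_DIRS)
print(f"{len(INSTANCE_DIRS)} instance(s): {elapsed:.1f} s "
      f"for {total} exponents = {total / elapsed:.1f} exponents/s")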
#3170
Apr 2019
5·41 Posts
I guess I'm basically outing myself as mersenne.ca's TJAOI-copycat here, but James already knew that anyways :)

I had tried before to do much larger batches of around 1 million exponents at a time, but it seemed that rewriting such a large worktodo was causing massive overhead. Maybe that's still a big factor here, possibly limiting things by CPU/disk IO rather than the GPU, and I should be doing even smaller batches than 10k.

Also, somewhat off-topic: on the GTX I can't see the GPU utilization % using the "nvidia-smi" utility on Linux (it says "N/A"). Anyone know if that was just not implemented for GTX 700 series or older GPUs?

Last fiddled with by hansl on 2019-07-06 at 03:04
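Splitting a big exponent list into small batches is simple to script; a sketch only, where the input file name, bit range and batch size are placeholders (each batch would be copied in as worktodo.txt when its turn comes):

Code:
#!/usr/bin/env python3
# Split a list of exponents (one per line in exponents.txt) into small
# worktodo batches of Factor=exponent,from,to lines.
BATCH_SIZE = 10000        # or smaller, if rewriting overhead dominates
BIT_FROM, BIT_TO = 1, 55  # TF range written into every line (placeholder values)

with open("exponents.txt") as f:
    exponents = [int(line) for line in f if line.strip()]

for i in range(0, len(exponents), BATCH_SIZE):
    with open(f"worktodo_{i // BATCH_SIZE:05d}.txt", "w") as out:
        for p in exponents[i:i + BATCH_SIZE]:
            out.write(f"Factor={p},{BIT_FROM},{BIT_TO}\n")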
#3171
"James Heinrich"
May 2004
ex-Northern Ontario
2·3·541 Posts
Last fiddled with by James Heinrich on 2019-07-06 at 03:22
#3172
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
13×373 Posts
Factor=exponent,74,78 becomes Factor=exponent,75,78, then Factor=exponent,76,78, etc., as bit levels are completed. Only after 77,78 is completed would such a line be removed. At higher exponents and bit levels there's ample time to observe this behavior, since run times can be days or weeks per bit level.
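A toy illustration of that bookkeeping (not mfaktc's actual code; the exponent is a placeholder):

Code:
# Advance a Factor=exponent,from,to line by one completed bit level;
# return None once the final level is done and the line would be removed
# from worktodo.txt.
def advance_bit_level(line):
    exponent, bit_from, bit_to = line.strip()[len("Factor="):].split(",")
    bit_from = int(bit_from) + 1
    if bit_from >= int(bit_to):
        return None
    return f"Factor={exponent},{bit_from},{bit_to}"

line = "Factor=332192831,74,78"
while line is not None:
    print(line)               # 74,78 then 75,78 then 76,78 then 77,78
    line = advance_bit_level(line)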
#3173
Apr 2019
5×41 Posts
I did some testing with batches of 500, and settled on a condensed mfaktc.ini; the main settings are discussed below.
Tweaking GPUSievePrimes for these workloads seemed to have some of the most noticeable effects (again, this is just for exponents around the 1e9 range and only to 55 bits).

Note: I also raised GPU_SIEVE_SIZE_MAX in params.h. It doesn't make a lot of difference, but GPUSieveSize=512 seemed maybe a couple % faster than the default max of 128.

I've basically commented out all the status-printing lines in the source now (but carefully leaving in the ones that print to the results file!), so I wouldn't think PrintMode should make a difference at this point, but it looked like it was maybe slightly faster set to 0 rather than 1 (though also maybe within the margin of error).

All in all I went from ~26s/500 exponents down to ~20-21s/500 after testing various changes here (on my Quadro). I still see slower times on the GTX 780 @ ~29s/500, but I haven't tested a ramdisk yet. Running 2 instances on the GTX gives me ~44s/500 per instance, so that averages out to ~22s/500 throughput, which is a definite improvement over a single instance but still *just* slower than the Quadro. Maybe notable is that the GTX 780 is being fed by older (dual-socketed) Xeon E5-2697 v2's, which run consistently @ 3.0GHz, vs my laptop's i7-6820HQ which runs at a constant boost of 3.2GHz.

Running 2 instances on the Quadro gave me ~40s/500, which is only maybe a fraction of a second better per-batch throughput vs a single instance.

I'll try benchmarking on a ramdisk soon and see how that fares.

Last fiddled with by hansl on 2019-07-06 at 19:26 Reason: corrected Xeon model#
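A sketch of how such a tuning sweep could be scripted (assuming a directory containing the mfaktc binary, an mfaktc.ini with a GPUSievePrimes= line, and a fixed benchmark batch saved as worktodo.bench; the candidate values below are made up):

Code:
#!/usr/bin/env python3
# Time the same benchmark batch with a few GPUSievePrimes values.
import re
import shutil
import subprocess
import time

CANDIDATES = [50000, 100000, 200000]   # placeholder GPUSievePrimes values to try

for value in CANDIDATES:
    # rewrite the GPUSievePrimes line in mfaktc.ini
    with open("mfaktc.ini") as f:
        ini = f.read()
    ini = re.sub(r"^GPUSievePrimes=\d+", f"GPUSievePrimes={value}", ini, flags=re.M)
    with open("mfaktc.ini", "w") as f:
        f.write(ini)

    shutil.copy("worktodo.bench", "worktodo.txt")   # fresh copy of the same batch
    start = time.time()
    subprocess.run(["./mfaktc"], stdout=subprocess.DEVNULL, check=True)
    print(f"GPUSievePrimes={value}: {time.time() - start:.1f} s for the batch")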
#3174
Apr 2019
CD₁₆ Posts
One small suggestion based on the issue of rewriting large worktodo.txt files.
I think it may be more efficient to work backwards from the last line, so that only one line needs to be rewritten each time it updates instead of the whole file. I totally understand if that's not considered worth the effort for this extremely niche case, though.
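One way to read that suggestion as code (a sketch only; process() is a hypothetical stand-in for handing the entry to the GPU run): if the last entry is always taken, finishing it only requires truncating the file rather than rewriting it.

Code:
# Remove and return the last non-empty line of the worktodo file,
# changing nothing but the file length.
def pop_last_entry(path):
    with open(path, "rb+") as f:
        data = f.read()
        stripped = data.rstrip(b"\n")
        if not stripped:
            return None
        last = stripped.split(b"\n")[-1]
        f.truncate(len(stripped) - len(last))   # drop only the completed line
        return last.decode()

# entry = pop_last_entry("worktodo.txt")
# while entry is not None:
#     process(entry)                  # hypothetical worker call
#     entry = pop_last_entry("worktodo.txt")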
#3175
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
13·373 Posts
I recall seeing reference to finding two or more factors in the same class of the same bit level being rare, and perhaps not necessary to handle both factors in that case. (I think it was in this thread, but have been unable/not patient enough to find that reference.)
Related are "In some cases it misses factors when there are multiple factors in one class close together but this is not critical. This is a known problem since the first version... This has nothing to do with the calculations itself, it is just how the results are returned from the GPU to the CPU." https://www.mersenneforum.org/showpo...&postcount=131

and "There is no result bitmap in mfaktc, just a small array of integers (32x 4 Byte) after each class is finished. The array can hold up to 10 factors per class." https://www.mersenneforum.org/showpo...4&postcount=35

I've put together some calculated estimates for two to four factors in the same bit level and same class, at https://www.mersenneforum.org/showpo...82&postcount=5.

Last fiddled with by kriesel on 2019-07-08 at 20:39
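A rough way to check an exponent's known factors for that situation (a sketch; it assumes mfaktc's default class count of 4620, or 420 with the fewer-classes build, and that a candidate q = 2kp+1 is assigned to class k mod that count -- treat both as assumptions):

Code:
NUM_CLASSES = 4620   # assumed default; 420 with the "less classes" build

# Group the k values of known factors (q = 2*k*p + 1) of M(p) by class and
# return only the classes holding more than one factor.  Restrict the
# 'factors' list to those found in one bit level for the case above.
def classes_with_multiple_factors(p, factors):
    by_class = {}
    for q in factors:
        k = (q - 1) // (2 * p)
        assert q == 2 * k * p + 1, "factor is not of the form 2*k*p + 1"
        by_class.setdefault(k % NUM_CLASSES, []).append(k)
    return {c: ks for c, ks in by_class.items() if len(ks) > 1}

# Hypothetical usage:
# print(classes_with_multiple_factors(p, [q1, q2]))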
#3176
"Seth"
Apr 2019
354₈ Posts
I modified mfaktc to return k's where pow(2, P-1, 2*P*k+1) is very small so that NF-TF results still have some verifiable output. More details and code links in https://www.mersenneforum.org/showpo...&postcount=199
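A sketch of what verifying such output could look like (the smallness threshold is a placeholder, not the value used in the actual patch):

Code:
# For a reported k, recompute the residue pow(2, P-1, 2*P*k+1); a k kept
# as "verifiable no-factor output" should give an unusually small value.
def residue(p, k):
    return pow(2, p - 1, 2 * p * k + 1)

def plausible_reported_k(p, k, threshold=2**32):   # threshold is an assumption
    return residue(p, k) < threshold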
#3177
"Oliver"
Mar 2005
Germany
2·3·5·37 Posts
Hi there,
I was recently able to "reproduce" the issue where mfaktc reports 38814612911305349835664385407 as a (false) factor of M<insert prime number here>. While the origin of the factor is well known (it is the last factor in the small selftest run before doing some real work), it is still unknown why it shows up "randomly". I have some evidence that this is related to hardware errors. In my case the OS reported:

Code:
[150260.974505] NVRM: GPU at PCI:0000:5e:00: GPU-<some UID>
[150260.974510] NVRM: GPU Board Serial Number:
[150260.974514] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 3, TPC 4, SM 0): Out Of Range Address
[150260.974521] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x51e730=0xc04000e 0x51e734=0x0 0x51e728=0x4c1eb72 0x51e72c=0x174
[150260.974764] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 4, TPC 0, SM 0): Out Of Range Address
[150260.974769] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x524730=0xc05000e 0x524734=0x0 0x524728=0x4c1eb72 0x52472c=0x174
[150260.974857] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 4, TPC 1, SM 1): Out Of Range Address
[150260.974863] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x524fb0=0xc05000e 0x524fb4=0x20 0x524fa8=0x4c1eb72 0x524fac=0x174
[150260.974954] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 4, TPC 2, SM 1): Out Of Range Address
[150260.974959] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x5257b0=0xc05000e 0x5257b4=0x20 0x5257a8=0x4c1eb72 0x5257ac=0x174
[150260.975044] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 4, TPC 3, SM 0): Out Of Range Address
[150260.975050] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x525f30=0xc06000e 0x525f34=0x20 0x525f28=0x4c1eb72 0x525f2c=0x174
[150260.975118] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 4, TPC 3, SM 1): Out Of Range Address
[150260.975123] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x525fb0=0xc06000e 0x525fb4=0x20 0x525fa8=0x4c1eb72 0x525fac=0x174
[150260.975201] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 4, TPC 4, SM 0): Out Of Range Address
[150260.975206] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Global Exception on (GPC 4, TPC 4, SM 0): Multiple Warp Errors
[150260.975211] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x526730=0xc04000e 0x526734=0x24 0x526728=0x4c1eb72 0x52672c=0x174
[150260.975280] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 4, TPC 4, SM 1): Out Of Range Address
[150260.975284] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x5267b0=0xc04000e 0x5267b4=0x20 0x5267a8=0x4c1eb72 0x5267ac=0x174
[150260.984693] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ChID 0010, Class 0000c5c0, Offset 00000000, Data 00000000
[169206.556478] NVRM: Xid (PCI:0000:5e:00): 13, Graphics SM Warp Exception on (GPC 5, TPC 2, SM 1): Out Of Range Address
[169206.556485] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ESR 0x52d7b0=0xc07000e 0x52d7b4=0x0 0x52d7a8=0x4c1eb72 0x52d7ac=0x174
[169206.557301] NVRM: Xid (PCI:0000:5e:00): 13, Graphics Exception: ChID 0010, Class 0000c5c0, Offset 00000000, Data 00000000
[169206.624363] NVRM: Xid (PCI:0000:5e:00): 62, 0cb5(2d50) 8503d428 ffffff80

Oliver
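A quick sanity check that catches this kind of false report: a true factor q of M(p) = 2^p - 1 must satisfy 2^p mod q = 1, so the selftest factor fails the check for any exponent it doesn't actually divide. A minimal sketch (the exponent p is whatever the bogus report was attached to):

Code:
# Returns True only if q really divides the Mersenne number 2^p - 1.
def is_factor_of_mersenne(p, q):
    return pow(2, p, q) == 1

BOGUS = 38814612911305349835664385407   # the selftest factor mentioned above
# For the exponent of a falsely reported result:
# print(is_factor_of_mersenne(p, BOGUS))   # -> False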
#3178
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
13·373 Posts
#3179
Random Account
Aug 2009
U.S.A.
2²·11·41 Posts
I have a 250GB Samsung SSD coming tomorrow. I have been running James Heinrich's project using a RAM-drive. I feel that I should continue to run it this way. Would this be a correct statement?
Thanks.
Thread | Thread Starter | Forum | Replies | Last Post |
mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1668 | 2020-12-22 15:38 |
The P-1 factoring CUDA program | firejuggler | GPU Computing | 753 | 2020-12-12 18:07 |
gr-mfaktc: a CUDA program for generalized repunits prefactoring | MrRepunit | GPU Computing | 32 | 2020-11-11 19:56 |
mfaktc 0.21 - CUDA runtime wrong | keisentraut | Software | 2 | 2020-08-18 07:03 |
World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |