mersenneforum.org  

Old 2022-10-05, 08:42   #34
Rubiksmath
 
Sep 2022

2F₁₆ Posts

Quote:
Originally Posted by SethTro
would you mind sharing the result of running?

Code:
$ echo "10^248+123" | ./ecm -v -cgbn 2e5 0
Okay, here it is:
Code:
CGBN<1024, 8> running kernel<56 block x 256 threads> input number is 824 bits
Computing 100 bits/call, 1/288578 (0.0%)
Computing 110 bits/call, 101/288578 (0.0%)
Computing 121 bits/call, 211/288578 (0.1%)
Computing 256 bits/call, 1585/288578 (0.5%)
Computing 655 bits/call, 5631/288578 (2.0%)
Computing 2997 bits/call, 220921/288578 (76.6%), ETA 2 + 6 = 8 seconds (~5 ms/curves)
Copying results back to CPU ...
Computing 1792 Step 1 took 8ms of CPU time / 8414ms of GPU time
Throughput: 212.971 curves per second (on average 4.70ms per Step 1)
Also, yeah, I don't fully know what happened with WSL. In the end I just installed Ubuntu as a dual boot and it worked that way. I have a feeling it's a problem on my end with how I installed the drivers; I can try installing it on WSL again.

If it helps, here are the last lines of output from running configure (although, as I said, I've probably scuffed the drivers pretty badly, so I'll try this again later on a fresh install):
Code:
./configure --enable-gpu --with-cuda=/usr/local/cuda
......
checking for cInit in -lcuda ..no
configure : error: Couldn't find cuda lib
Depending on how I had set it up I did get other errors, but all of them had something to do with the CUDA toolkit. As I said, it's probably my bad, and I'll try again soon to see if I can work out what went wrong.
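
For anyone hitting the same configure failure: the failing check is a link test against -lcuda, so it stops there whenever the linker cannot find libcuda.so. Below is a minimal sketch for seeing which directories actually contain the library. The candidate directory list is an assumption (on WSL2 the driver library usually sits under /usr/lib/wsl/lib rather than in the toolkit directory), and `find_libcuda` is a hypothetical helper, not part of GMP-ECM.

```python
import glob
import os

# Candidate directories are assumptions; adjust them for your install.
CANDIDATE_DIRS = [
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/lib64/stubs",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib/wsl/lib",
]


def find_libcuda(dirs=CANDIDATE_DIRS):
    """Return every libcuda.so* file found in the given directories."""
    hits = []
    for d in dirs:
        hits.extend(sorted(glob.glob(os.path.join(d, "libcuda.so*"))))
    return hits


if __name__ == "__main__":
    found = find_libcuda()
    if found:
        for path in found:
            print(path)
    else:
        print("no libcuda.so found in the candidate directories")
```

If the library only shows up in a directory the linker does not search by default, pointing LDFLAGS at that directory when running configure may be enough, though that is untested against this particular configure script.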

Last fiddled with by Rubiksmath on 2022-10-05 at 09:02 Reason: new info
Old 2022-10-05, 14:27   #35
wombatman
I moo ablest echo power!
 
 
May 2013

1,847 Posts

Quote:
Originally Posted by SethTro
I'm glad it's much faster! I'm curious how my 1080ti stacks up against the 30 series (if someone in this thread has a 3080/3090 I'd love to see some timing results too), would you mind sharing the result of running?

Code:
$ echo "10^248+123" | ./ecm -v -cgbn 2e5 0
...
CGBN<1024, 8> running kernel<56 block x 256 threads> input number is 824 bits
...
Computing 1694 bits/call, 134631/288578 (46.7%), ETA 8 + 7 = 14 seconds (~8 ms/curves)
Copying results back to CPU ...
Computing 1792 Step 1 took 37ms of CPU time / 14013ms of GPU time
Throughput: 127.885 curves per second (on average 7.82ms per Step 1)
With a 3090:
Code:
CGBN<1024, 8> running kernel<164 block x 256 threads> input number is 824 bits
Computing 100 bits/call, 1/288578 (0.0%)
Computing 110 bits/call, 101/288578 (0.0%)
Computing 121 bits/call, 211/288578 (0.1%)
Computing 256 bits/call, 1585/288578 (0.5%)
Computing 655 bits/call, 5631/288578 (2.0%)
Computing 2725 bits/call, 203513/288578 (70.5%), ETA 3 + 6 = 9 seconds (~2 ms/curves)
Copying results back to CPU ...
Computing 5248 Step 1 took 2393ms of CPU time / 8824ms of GPU time
Throughput: 594.753 curves per second (on average 1.68ms per Step 1)
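
For comparing runs, the throughput line can be reproduced from the "Step 1 took" line alone. This is a minimal sketch, assuming throughput is simply curves divided by GPU time; `step1_stats` is a hypothetical helper, and the regular expression is written against the output shown in this thread, not against any documented format.

```python
import re

STEP1_RE = re.compile(
    r"Computing (\d+) Step 1 took (\d+)ms of CPU time / (\d+)ms of GPU time"
)


def step1_stats(line):
    """Return (curves, curves_per_second, ms_per_curve) from a Step 1 summary line."""
    m = STEP1_RE.search(line)
    if m is None:
        raise ValueError("not a Step 1 summary line")
    curves, _cpu_ms, gpu_ms = (int(g) for g in m.groups())
    # Throughput is assumed to be based on GPU time only.
    return curves, curves / (gpu_ms / 1000.0), gpu_ms / curves


if __name__ == "__main__":
    line = "Computing 5248 Step 1 took 2393ms of CPU time / 8824ms of GPU time"
    curves, rate, ms = step1_stats(line)
    print(f"{curves} curves, {rate:.1f} curves/s, {ms:.2f} ms per curve")
```

The result lands very close to the printed 594.753 curves per second; the small difference is presumably because the program works from unrounded timings.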
Old 2022-10-05, 19:01   #36
SethTro
 
 
"Seth"
Apr 2019

11·43 Posts

Quote:
Originally Posted by wombatman
With a 3090:
Computing 5248 Step 1 took 2393ms of CPU time / 8824ms of GPU time
Throughput: 594.753 curves per second (on average 1.68ms per Step 1)
Thanks for the benchmark. It takes 35% less time AND does 3 times as many curves at a time, that's a big improvement! Makes me want to upgrade my system :)
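
Spelling out that comparison from the two "Step 1 took" lines above (1080 Ti: 1792 curves in 14013 ms of GPU time; 3090: 5248 curves in 8824 ms), as a rough sketch with a hypothetical `compare` helper:

```python
def compare(base_curves, base_gpu_ms, new_curves, new_gpu_ms):
    """Compare two Step 1 runs.

    Returns (fraction of time saved per batch, batch-size ratio, throughput ratio).
    """
    time_saved = 1.0 - new_gpu_ms / base_gpu_ms
    batch_ratio = new_curves / base_curves
    throughput_ratio = (new_curves / new_gpu_ms) / (base_curves / base_gpu_ms)
    return time_saved, batch_ratio, throughput_ratio


if __name__ == "__main__":
    # 1080 Ti: 1792 curves in 14013 ms; 3090: 5248 curves in 8824 ms
    saved, batch, speed = compare(1792, 14013, 5248, 8824)
    print(f"{saved:.0%} less time per batch")
    print(f"{batch:.2f}x curves per batch")
    print(f"{speed:.2f}x throughput")
```

Less time per batch combined with a batch nearly three times the size compounds into a throughput gain of a bit over 4.6x for this input size.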

I occasionally look at this big run, 4.3e9 x 100,000 curves for M1217, which would complete in batches of 1792 every 3.5 days over 200 total days on my 1080. My mind races when I think about that being batches of 5248 every 2.2 days with a 3090.
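
The arithmetic behind those totals, as a quick sketch. `run_plan` is a hypothetical helper; the batch sizes and days per batch are the figures quoted above, and the only assumption is that the last partial batch costs as much as a full one.

```python
import math


def run_plan(total_curves, curves_per_batch, days_per_batch):
    """Return (number of batches, total days) for a run done in fixed-size GPU batches."""
    # A partial final batch is assumed to take as long as a full one.
    batches = math.ceil(total_curves / curves_per_batch)
    return batches, batches * days_per_batch


if __name__ == "__main__":
    for name, size, days in [("1080", 1792, 3.5), ("3090", 5248, 2.2)]:
        batches, total = run_plan(100_000, size, days)
        print(f"{name}: {batches} batches, about {total:.0f} days")
```

That gives 56 batches (roughly 200 days) on the 1080 versus 20 batches (a bit over six weeks) on the 3090.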

I wonder why it's taking 2.3 seconds of CPU time. On my system CPU time is generally very small (sub 100 ms). Could you run "$ echo "10^248+123" | time ./ecm -cgbn 1e5 0" and share the timing results (e.g. "0.04user 0.10system 0:07.21elapsed 2%CPU"), plus any anecdotes about it taking a while to print the first "Computing 100 bits/call" line (or to continue after "Copying results back to CPU")?

I think the CPU time is overlapped with the GPU time, but it's possible that it isn't, in which case there's maybe a 25% speedup available.
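
Back-of-envelope for that figure, under the (unverified) assumption that the 2393 ms of CPU time currently comes entirely on top of the 8824 ms of GPU time and could be hidden completely. `overlap_speedup` is a hypothetical helper for illustration only.

```python
def overlap_speedup(cpu_ms, gpu_ms):
    """Throughput gain from hiding CPU time behind GPU time.

    Assumes the two phases currently run back to back and could overlap perfectly.
    """
    serial = cpu_ms + gpu_ms          # assumed: no overlap today
    overlapped = max(cpu_ms, gpu_ms)  # assumed: perfect overlap
    return serial / overlapped - 1.0


if __name__ == "__main__":
    gain = overlap_speedup(2393, 8824)
    print(f"up to {gain:.0%} more throughput")
```

If the two already overlap, or the reported GPU time already includes the CPU portion, the real gain would be smaller or zero.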

Last fiddled with by SethTro on 2022-10-05 at 19:01
Old 2022-10-06, 02:41   #37
wombatman
I moo ablest echo power!
 
 
May 2013

1,847 Posts

Quote:
Originally Posted by SethTro
Thanks for the benchmark. It takes 35% less time AND does 3 times as many curves at a time, that's a big improvement! Makes me want to upgrade my system :)

I occasionally look at this big run, 4.3e9 x 100,000 curves for M1217, which would complete in batches of 1792 every 3.5 days over 200 total days on my 1080. My mind races when I think about that being batches of 5248 every 2.2 days with a 3090.

I wonder why it's taking 2.3 seconds of CPU time. On my system CPU time is generally very small (sub 100 ms). Could you run "$ echo "10^248+123" | time ./ecm -cgbn 1e5 0" and share the timing results (e.g. "0.04user 0.10system 0:07.21elapsed 2%CPU"), plus any anecdotes about it taking a while to print the first "Computing 100 bits/call" line (or to continue after "Copying results back to CPU")?

I think the CPU time is overlapped with the GPU time, but it's possible that it isn't, in which case there's maybe a 25% speedup available.

Oh, the CPU time is probably higher because I'm running other tasks as well (on the CPU, not the GPU) that are using a good chunk of CPU time.

Running your requested command line gives:

Code:
GMP-ECM 7.0.5-dev [configured with GMP 6.2.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is 10^248+123 (249 digits)
Using B1=100000, B2=0, sigma=3:2295140881-3:2295146128 (5248 curves)
GPU: Using device code targeted for architecture compile_86
GPU: Ptx version is 86
GPU: maxThreadsPerBlock = 896
GPU: numRegsPerThread = 67 sharedMemPerBlock = 0 bytes
Computing 5248 Step 1 took 1490ms of CPU time / 4739ms of GPU time
1.65user 3.56system 0:05.63elapsed 92%CPU (0avgtext+0avgdata 126448maxresident)k
20288inputs+0outputs (129major+3901minor)pagefaults 0swaps
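
To read the `time` line: the CPU percentage is (user + system) divided by elapsed wall-clock time. Here is a minimal sketch that recomputes it, assuming GNU time's default output format as shown above; `cpu_fraction` is a hypothetical helper.

```python
import re

# Matches e.g. "1.65user 3.56system 0:05.63elapsed"; elapsed is [hours:]minutes:seconds.
TIME_RE = re.compile(
    r"([\d.]+)user ([\d.]+)system (?:(\d+):)?(\d+):([\d.]+)elapsed"
)


def cpu_fraction(time_line):
    """Return (user + system) / elapsed for a GNU time summary line."""
    m = TIME_RE.search(time_line)
    if m is None:
        raise ValueError("not a GNU time summary line")
    user, system = float(m.group(1)), float(m.group(2))
    hours = int(m.group(3) or 0)
    minutes = int(m.group(4))
    seconds = float(m.group(5))
    elapsed = hours * 3600 + minutes * 60 + seconds
    return (user + system) / elapsed


if __name__ == "__main__":
    line = "1.65user 3.56system 0:05.63elapsed 92%CPU"
    print(f"CPU busy for {cpu_fraction(line):.1%} of wall-clock time")
```

Of the 5.21 s of CPU time in this run, 3.56 s is system time, so most of it is spent in the kernel rather than in ecm's own user-space code.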

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
