mersenneforum.org  

2021-05-11, 18:57   #1
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2²·107 Posts
Nvidia Quadro P2200 performance on PRP test

I changed the OS on my Dell 7920 from CentOS 7.8 to Ubuntu 20.04 LTS today. That allowed me to build gpuowl, which was looking impossible on CentOS since glibc was too old.

The setup here is
* Dell 7920 tower workstation. (I say tower, as there's a rackmount version of the Dell 7920 too)
* 2 x Intel Xeon Platinum 8167M CPU (non-standard OEM units, 26 cores, 2.0 GHz)
* 384 GB RAM, not well configured due to what I believe is a motherboard fault - on one CPU, not all 6 memory channels are used, despite having 6 DIMMs. (Two memory channels have two DIMMs each).
* Nvidia Quadro P2200 graphics card, which is driving my monitor (only 1920 x 1200 @ 60 Hz). The card has 1280 CUDA cores and 5120 MB of RAM. (I attached a screenshot showing some information reported by the Nvidia tool in Ubuntu.)

I ran gpuowl on the Nvidia graphics card, using a PRP test of 103750501. I chose that exponent because I have it allocated to run on the main CPUs, so I thought I would compare the two. The estimated times are:
mprime
Code:
[Worker #1 May 11 18:57] Iteration: 100000 / 103750501 [0.09%], ms/iter:  1.560, ETA: 44:55:09
gpuowl
Code:
2021-05-11 19:27:55 Quadro P2200-0 103750501 OK    200000   0.19% 604684b5784bb06d 10196 us/it + check 4.44s + save 0.33s; ETA 12d 05:16
So, roughly, the estimated times to complete a PRP test of 103750501 are 293 hours on the Nvidia Quadro P2200 vs 45 hours on one 26-core Xeon, so the Intel Xeon Platinum 8167M is roughly 293/45 = 6.5 times faster than the Nvidia Quadro P2200 on this PRP test.
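The ETAs above are just (remaining iterations) × (time per iteration); both logs count iterations up to the exponent itself. A minimal Python sketch using only the figures from the two log lines (the helper name `eta_hours` is hypothetical, not part of mprime or gpuowl):

```python
def eta_hours(exponent: int, done: int, sec_per_iter: float) -> float:
    """Estimated hours left for a PRP test of 2^exponent - 1.

    Hypothetical helper: assumes the test runs for `exponent` iterations
    in total, as the "100000 / 103750501" counter in the mprime log shows.
    """
    remaining = exponent - done
    return remaining * sec_per_iter / 3600.0


p = 103750501
mprime_h = eta_hours(p, 100_000, 1.560e-3)   # mprime: 1.560 ms/iter
gpuowl_h = eta_hours(p, 200_000, 10196e-6)   # gpuowl: 10196 us/it

print(f"mprime ETA: {mprime_h:.1f} h")       # about 44.9 h
print(f"gpuowl ETA: {gpuowl_h:.1f} h")       # about 293.3 h
print(f"ratio: {gpuowl_h / mprime_h:.2f}")
```

The ratio of the per-iteration times alone, 10196 µs / 1560 µs, gives the same factor of about 6.5 without computing the ETAs.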

For what it is worth, the Xeons cost me 300 GBP ($425 USD at today's exchange rate), exactly the same as the Nvidia Quadro P2200. Both the graphics card and the CPUs were bought second-hand.

Here's how I ran the test with gpuowl. I don't have a configuration file or anything yet; I still need to work out how to use the program.

Code:
drkirkby@jackdaw:~/GPU$ gpuowl -maxAlloc 4096M -prp 103750501
2021-05-11 18:54:09 GpuOwl VERSION 
2021-05-11 18:54:09 GpuOwl VERSION 
2021-05-11 18:54:09 Note: not found 'config.txt'
2021-05-11 18:54:09 config: -maxAlloc 4096M -prp 103750501 
2021-05-11 18:54:09 device 0, unique id ''
2021-05-11 18:54:09 Quadro P2200-0 103750501 FFT: 5.50M 1K:11:256 (17.99 bpw)
2021-05-11 18:54:09 Quadro P2200-0 103750501 OpenCL args "-DEXP=103750501u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0.0070585858722830054 -DIWEIGHT_STEP=-0.007009111457174139 -DIWEIGHTS={0,-0.013969095270929188,-0.02774305491917008,-0.041324604812826758,-0.054716432742092071,-0.067921188951161587,-0.080941486662717207,-0.093779902595084258,} -DFWEIGHTS={0,0.014166995379082403,0.028534694516235748,0.043105940780673195,0.05788361782360639,0.072870650148920399,0.088070003691933282,0.10348468640635508,}  -cl-std=CL2.0 -cl-finite-math-only "
2021-05-11 18:54:11 Quadro P2200-0 103750501 

2021-05-11 18:54:11 Quadro P2200-0 103750501 OpenCL compilation in 2.12 s
2021-05-11 18:54:11 Quadro P2200-0 103750501 trig table : 65 points, cos 73.98 bits, sin 73.34 bits
2021-05-11 18:54:11 Quadro P2200-0 103750501 trig table : 353 points, cos 73.48 bits, sin 73.05 bits
2021-05-11 18:54:12 Quadro P2200-0 103750501 trig table : 360449 points, cos 72.52 bits, sin 72.42 bits
2021-05-11 18:54:12 Quadro P2200-0 103750501 maxAlloc: 4.0 GB
2021-05-11 18:54:12 Quadro P2200-0 103750501 P1(0) 0 bits
2021-05-11 18:54:12 Quadro P2200-0 103750501 PRP starting from beginning
2021-05-11 18:54:17 Quadro P2200-0 103750501 OK         0 on-load: blockSize 400, 0000000000000003
2021-05-11 18:54:17 Quadro P2200-0 103750501 validating proof residues for power 8
2021-05-11 18:54:17 Quadro P2200-0 103750501 Proof using power 8
2021-05-11 18:54:30 Quadro P2200-0 103750501 OK       800   0.00% 6c8aa8e618891740 10038 us/it + check 4.18s + save 0.25s; ETA 12d 01:17
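The `FFT: 5.50M 1K:11:256 (17.99 bpw)` line can be checked by hand. Multiplying out 1K:11:256 gives 2,883,584, and the reported 5.50M (5.5 × 2^20 = 5,767,168 words) is exactly twice that, consistent with the `-DWIDTH=1024u -DMIDDLE=11u -DSMALL_HEIGHT=256u` OpenCL args. Bits per word is then the exponent divided by the word count. A sketch with hypothetical helper names:

```python
def fft_words(width: int, middle: int, small_height: int) -> int:
    # 1K:11:256 multiplies out to 2,883,584; the log's 5.50M is twice that,
    # so each FFT point here is taken to hold two real words.
    return 2 * width * middle * small_height


def bits_per_word(exponent: int, words: int) -> float:
    # Average number of bits of the exponent carried by each FFT word
    return exponent / words


n = fft_words(1024, 11, 256)
print(n, n / 2**20)                           # 5767168 5.5
print(f"{bits_per_word(103750501, n):.2f}")   # 17.99
```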
Attached Thumbnails: Quadro-P2200.png (179.4 KB)

Last fiddled with by drkirkby on 2021-05-11 at 19:35
2021-05-11, 19:15   #2
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2²×107 Posts

Here's the data sheet on the Nvidia Quadro P2200.

Intel will release no information about the Xeon Platinum 8167M, but it has around 35 MB of cache. Its single-threaded performance is pretty poor, even for a 2 GHz CPU, but if one can use all 26 cores actively, then it offers quite a bit of performance for the money.
Attached Files
File Type: pdf quadro-p2200-datasheet-letter-974207-r4-web.pdf (542.0 KB, 29 views)

Last fiddled with by drkirkby on 2021-05-11 at 19:49
2021-05-11, 21:20   #3
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2²×7×349 Posts

Quote:
Originally Posted by drkirkby
I changed the OS on my Dell 7920 from CentOS 7.8 to Ubuntu 20.04 LTS today. That allowed me to build gpuowl, which was looking impossible on CentOS since glibc was too old.
Yeah... There's an ongoing debate about which is the safest option for public-facing code stacks.

CentOS 7.9 is, intentionally, slow in upgrading the SW stacks (including GCC). Discussion of Red Hat's decision to make CentOS sub-optimal for decision-makers is left for another thread.

Ubuntu 20.04 LTS is more "bleeding edge", but also compiles more code stacks.

Trying to explain the subtle differences to pointy-haired bosses can become a full-time job for those who don't constrain such discussions...

I hope that makes sense.
2021-05-11, 22:01   #4
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12461₈ Posts

Please run and submit GPU benchmarks at
https://www.mersenne.ca/mfaktc.php for TF
https://www.mersenne.ca/cudalucas.php
since the P2200 is not in either list.
2021-05-11, 22:34   #5
Xyzzy
 
"Mike"
Aug 2002

3·41·67 Posts

P2200: https://www.techpowerup.com/gpu-spec...ro-p2200.c3442
GP106: https://www.techpowerup.com/gpu-specs/nvidia-gp106.g797

It should be about the same speed as a GTX 1060.

2021-05-11, 22:52   #6
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12461₈ Posts

FP64 (double) performance: 119.4 GFLOPS (1:32 of FP32)
In other words, there's good reason for it to be slow in GpuOwl, and slower still in ClLucas. It's really only suitable for TF. You could have bought a 6GB GTX1060 for less. (Search eBay for "GTX1060 6GB")
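The 119.4 GFLOPS figure follows from the usual peak-throughput formula: CUDA cores × 2 FLOPs per clock (one fused multiply-add) × clock, divided by 32 for the 1:32 FP64 rate. A sketch, assuming 1280 CUDA cores and a 1493 MHz boost clock for the P2200 (spec-sheet values not quoted in this thread); the helper name is hypothetical:

```python
def peak_gflops(cores: int, clock_ghz: float, fp64_ratio: int = 1) -> float:
    # 2 FLOPs per core per clock (one fused multiply-add), scaled down
    # by the FP64:FP32 ratio (32 means FP64 runs at 1/32 of FP32 speed).
    return cores * 2 * clock_ghz / fp64_ratio


fp32 = peak_gflops(1280, 1.493)        # assumed P2200 core count and clock
fp64 = peak_gflops(1280, 1.493, 32)
print(f"FP32 ~{fp32:.0f} GFLOPS, FP64 ~{fp64:.1f} GFLOPS")
```

GpuOwl's FFTs run in double precision, so the FP64 figure is what limits PRP speed, whereas trial factoring with mfaktc relies on integer arithmetic and is not held back by the 1:32 ratio.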

Last fiddled with by kriesel on 2021-05-11 at 23:00
2021-05-11, 23:05   #7
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2²×107 Posts

Quote:
Originally Posted by kriesel
Please run and submit GPU benchmarks at
https://www.mersenne.ca/mfaktc.php for TF
https://www.mersenne.ca/cudalucas.php
since the P2200 is not in either list.

Lots of the posts on this topic are rather old. Where do I get mfaktc and CUDALucas from? I see several places for programs of the same name (e.g. SourceForge and GitHub). I found binaries of both mfaktc and CUDALucas, but neither would run.
Code:
drkirkby@jackdaw:~/mfaktc-0.21$ ./mfaktc.exe
./mfaktc.exe: error while loading shared libraries: libcudart.so.6.5: cannot open shared object file: No such file or directory


drkirkby@jackdaw:~/CUDALucus$ ./CUDALucas
./CUDALucas: error while loading shared libraries: libcufft.so.10: cannot open shared object file: No such file or directory
I tried building one from source, but that also failed because of a missing library. I have not installed a CUDA development environment, yet gpuowl built easily. The others did not.
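Both errors come from the dynamic loader failing to find a CUDA runtime library. Running the standard `ldd` tool on a binary lists every shared library it needs and marks missing ones with `=> not found`. A small Python sketch that runs `ldd` and collects the missing names (the function names are hypothetical):

```python
import subprocess
import sys


def missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints lines such as "\tlibcudart.so.6.5 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> list[str]:
    # ldd exits non-zero for files that are not dynamic executables,
    # so the return code is deliberately not checked here.
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)


if __name__ == "__main__":
    for binary in sys.argv[1:]:
        print(binary, check_binary(binary) or "all libraries found")
```

Run against `./mfaktc.exe` and `./CUDALucas`, it should report `libcudart.so.6.5` and `libcufft.so.10` respectively; the version suffix in each name indicates which CUDA toolkit release the binary was built against, and therefore which toolkit's libraries need to be installed.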



Dave
2021-05-12, 01:55   #8
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5²·7·31 Posts

1) Heinrich will accept benchmarks for GPUs from GpuOwL. It says so on one of the pages posted earlier.

2) Gpuowl builds without vendor libraries because, rather than using a vendor-provided FFT library, Preda programmed his own in the gpuowl source code.
Also, GpuOwl is an OpenCL application, so it has no need of or use for CUDA.
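For readers wondering why an FFT is needed at all: squaring a multi-million-digit number can be done by treating its digits as a sequence and convolving that sequence with itself, which an FFT does in O(n log n) time. The sketch below is a toy, pure-Python illustration of that general technique (small base, no error checking); it is not Preda's code, and gpuowl additionally applies an irrational-base weighted transform (the IWEIGHTS/FWEIGHTS in its OpenCL args) so that reduction mod 2^p - 1 is folded into the transform.

```python
import cmath


def fft(a: list[complex], invert: bool = False) -> list[complex]:
    """Recursive radix-2 FFT; len(a) must be a power of two.

    The inverse (invert=True) is left unscaled; the caller divides by len(a).
    """
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def fft_square(x: int, base_bits: int = 8) -> int:
    """Square a non-negative integer via FFT convolution of its digits."""
    base = 1 << base_bits
    digits = []
    while x:
        digits.append(x & (base - 1))
        x >>= base_bits
    if not digits:
        return 0
    # Pad to a power of two at least twice as long, so the cyclic
    # convolution does not wrap around.
    size = 1
    while size < 2 * len(digits):
        size *= 2
    spectrum = fft([complex(d) for d in digits] + [0j] * (size - len(digits)))
    squared = fft([v * v for v in spectrum], invert=True)
    # Scale the inverse transform, round to integers, and let Python's
    # big ints absorb the carries when the shifted coefficients are summed.
    result = 0
    for i, v in enumerate(squared):
        result += round(v.real / size) << (i * base_bits)
    return result
```

At small sizes Python's built-in `int` multiplication is much faster than this; the FFT approach only pays off at the millions of digits GIMPS deals with.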

3) <broken record mode>Use the reference info.</mode>
Perhaps you already did.
https://mersenneforum.org/showthread.php?t=24607 Bookmark that.
Browser search for "software" finds it:
Available Mersenne Prime hunting software http://www.mersenneforum.org/showpos...91&postcount=2
Download and save the attachment from that linked post. And bookmark the post, since the attachment gets updated after I discover the software has.
The CUDA .dll part there is more subtle and has been updated / expanded today.

It also mentions the download mirror at mersenne.ca, which includes a few linux builds.

4) NVIDIA CUDA Windows DLLs or Linux .so files (or GPU drivers, for that matter) generally must be acquired separately for CUDA applications. CUDA DLLs are available at the mirror, but apparently not .so files for the various versions of the various Linux distros, so it's off to NVIDIA for a big free download, then extract what you need afterward: https://developer.nvidia.com/cuda-downloads or, for previous versions, the download archive at https://developer.nvidia.com/cuda-toolkit-archive

I have a libcufft.so.6.5.14, but it is far too big to post, even compressed. And I'm unsure which Linux variants the .so files are compatible with. It's old, so very unlikely to work with Ubuntu 20.x.
Searching https://mersenneforum.org/showthread.php?t=24607 for compatibility leads to
CUDA Toolkit compatibility vs CUDA level https://www.mersenneforum.org/showpo...1&postcount=11

Perhaps someone who uses Linux far more than I do could assist or advise regarding .so files.

Last fiddled with by kriesel on 2021-05-12 at 02:04
2021-05-12, 02:19   #9
moebius
 
Jul 2009
Germany

607₁₀ Posts

Quote:
Originally Posted by kriesel
Please run and submit GPU benchmarks at
https://www.mersenne.ca/mfaktc.php for TF
https://www.mersenne.ca/cudalucas.php
since the P2200 is not in either list.
Yes, please also run a benchmark for my gpuowl-only list, using the slightly larger exponent 77936867. Please post it in this thread:
https://mersenneforum.org/showthread...22204&page=246
2021-05-12, 06:17   #10
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2²·107 Posts

Quote:
Originally Posted by kriesel
FP64 (double) performance 119.4 GFLOPS (1:32)
In other words, good reason to be slow in GpuOwl, slower in ClLucas. Really only suitable for TF. Could have bought a 6GB GTX1060 for less cost. (Search eBay for "GTX1060 6GB")
Many independent software vendors of engineering products (e.g. Ansys) support Quadro cards in their products (e.g. HFSS),

https://www.ansys.com/content/dam/it...es-2019-r2.pdf
but not the consumer-grade gaming cards. Hence for those applications, a Quadro is a more suitable card. Conversely, for games, the Quadro is a poor choice as the games are not optimised for them.

I bought this computer for engineering applications. About the only games I play are chess and minesweeper, neither of which would benefit from a better graphics card; hence my choice of graphics card.

Can anyone give me any sort of ideas of what graphics cards could do a PRP test of 103750501 in under 44 hours, which is what one of my Xeons does? With the pair of Xeons I'm averaging 22 hours (roughly one a day), for exponents around that value. If an affordable graphics card could do significantly better I might be persuaded to put my hands in my pocket and buy one for GIMPS.
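Turning the target into a per-iteration budget makes it easy to compare against benchmark tables, which usually quote ms/it. A sketch with a hypothetical helper name, assuming (as the logs above show) that a PRP test of exponent p runs for p iterations:

```python
def required_ms_per_iter(exponent: int, target_hours: float) -> float:
    # Time budget per iteration if `exponent` iterations must finish
    # within target_hours.
    return target_hours * 3600 * 1000 / exponent


p = 103750501
print(f"{required_ms_per_iter(p, 44):.3f} ms/it")  # match one Xeon: ~1.527
print(f"{required_ms_per_iter(p, 22):.3f} ms/it")  # match both Xeons: ~0.763
```

So a card would need to beat roughly 1.53 ms/it at this exponent just to match one Xeon, and roughly 0.76 ms/it to match the pair.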

Dave

Last fiddled with by drkirkby on 2021-05-12 at 06:17
2021-05-12, 08:18   #11
moebius
 
Jul 2009
Germany

607 Posts

Quote:
Originally Posted by drkirkby
Can anyone give me any sort of ideas of what graphics cards could do a PRP test of 103750501 in under 44 hours, which is what one of my Xeons does? With the pair of Xeons I'm averaging 22 hours (roughly one a day), for exponents around that value. If an affordable graphics card could do significantly better I might be persuaded to put my hands in my pocket and buy one for GIMPS.

Dave
Have a look at this sheet:
https://drive.google.com/file/d/10fC...enkBdAaRP/view
A Vega64 does 1.755 ms/it for PRP on 103750501. I think you will need at least an AMD RX 6700 XT.
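Plugging the sheet's Vega64 figure into the same time-per-iteration arithmetic shows why it falls short of the 44-hour target. A sketch with a hypothetical helper name, again assuming p iterations for exponent p:

```python
def prp_hours(exponent: int, ms_per_iter: float) -> float:
    # Total run time if each of `exponent` iterations takes ms_per_iter
    return exponent * ms_per_iter / 1000 / 3600


vega64 = prp_hours(103750501, 1.755)
print(f"Vega64: {vega64:.1f} h")   # about 50.6 h, vs about 44 h for one Xeon
```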

Last fiddled with by moebius on 2021-05-12 at 08:33
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.