mersenneforum.org Nvidia Quadro P2200 performance on PRP test
2021-05-12, 09:12   #12
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

449 Posts

Quote:
 Originally Posted by moebius Have a look at this sheet: https://drive.google.com/file/d/10fC...enkBdAaRP/view A Vega64 is at 1.755 ms/it for PRP.103750501. I think you will need at minimum an AMD RX 6700 XT,
A used Vega64 is more expensive than a used Intel Xeon Platinum 8167M, but the Xeon is faster at around 1.5 ms/iteration. An AMD RX 6700 XT is well over twice the price of the Xeon.

I am fairly confident that the Dell 7920 can be speeded up considerably, as I think the RAM is running at 1200 MHz, not the 2400 MHz supported by the CPUs. At the moment I can’t get more than 4 DIMMs to work in the appropriate 6 DIMM sockets for the first CPU, so only 4 of the 6 memory channels are in use. (Two memory channels have 2 DIMMs, so although both CPUs have 6 DIMMs, they are not configured optimally.) The CPU utilisation never goes above about 45% when running PRP tests, but hits 100% on trial factoring, to small numbers at least. When I run the memory tests from the Passmark website, my memory is in the bottom 5% of systems submitted!!! In other words, 95% are better. The memory performance is appalling.

You might reasonably ask why I don’t get the PC sorted out. Unfortunately the Dell warranty only covers the system with the original parts (2 x 8 GB DIMMs). I don’t have enough Dell DIMMs to reproduce the fault, so Dell will not do anything about it. The cost of buying Dell DIMMs is huge - I am contemplating whether it’s less hassle to just buy another motherboard from eBay. I can get a Dell 7920 sealed reconditioned motherboard for less than the cost of a single 8 GB DIMM from Dell. I feel pretty peed off about it to be honest. I don’t see why I should buy a motherboard and fix a computer under warranty, but it is looking the most cost effective way to get it repaired.

Irrespective of whether I can speed up the computer with a new motherboard or not, I don’t think buying a graphics card would be good value.

Dave.

Last fiddled with by drkirkby on 2021-05-12 at 09:18

2021-05-12, 09:28   #13
moebius

Jul 2009
Germany

625₁₀ Posts

I paid 260 Euro 6 months ago for the RX Vega64 mining version. And I would rather suggest an RX 6800, because the RX 6700 XT only has 0.825 TFLOPS FP64; otherwise it will be tight.

2021-05-12, 09:28 #13
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

449 Posts

Quote:
 Originally Posted by moebius Yes, please make also a benchmark for my pure gpuowl list with the slightly larger exponent 77936867. Please post it in this thread: https://mersenneforum.org/showthread...22204&page=246
That exponent is smaller than the one I did before, so it will take less time.
Code:
2021-05-12 11:00:40 Quadro P2200-0 77936867 OK       800   0.00% 1579c241dc63eca6 7271 us/it + check 3.12s + save 0.19s; ETA 6d 13:24
I will post a full benchmark at the link suggested.
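As a rough cross-check of the ETA in the log line above, here is a minimal Python sketch. It assumes a PRP test needs about one iteration per bit of the exponent, and the regular expression is a hypothetical pattern written for this one line layout only; other gpuowl versions may log differently.

```python
import re

# Log line copied from the post above.
LINE = ("2021-05-12 11:00:40 Quadro P2200-0 77936867 OK       800   0.00% "
        "1579c241dc63eca6 7271 us/it + check 3.12s + save 0.19s; ETA 6d 13:24")

# Hypothetical pattern for this single line layout (an assumption, not a gpuowl spec).
PATTERN = re.compile(r"(?P<exponent>\d+) OK\s+(?P<done>\d+)\s.*?(?P<us>\d+) us/it")


def remaining_seconds(line: str) -> float:
    """Estimate remaining time, assuming one iteration per bit of the exponent."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("unrecognised log line")
    exponent = int(m["exponent"])
    done = int(m["done"])
    us_per_it = int(m["us"])
    return (exponent - done) * us_per_it / 1e6


def format_eta(seconds: float) -> str:
    """Render seconds as 'Nd HH:MM', the layout used in the log line."""
    minutes = int(seconds // 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours:02d}:{mins:02d}"


print(format_eta(remaining_seconds(LINE)))  # 6d 13:24
```

The result agrees with the ETA printed by the program, which suggests the ETA is simply the remaining iterations multiplied by the current time per iteration.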

Last fiddled with by drkirkby on 2021-05-12 at 10:05

2021-05-12, 10:10   #15
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

2×11×53 Posts

Quote:
 Originally Posted by drkirkby I am fairly confident that the Dell 7920 can be speeded up considerably, as I think the RAM is running at 1200 MHz, not the 2400 MHz supported by the CPUs.
Depending on the tool you use, that reading is expected and the memory is already running at the maximum frequency. It would be very weird if it was really running at 1200 MHz (I guess you should say MTransfers instead?), since I never saw DDR4 running that slow.
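To illustrate the point about clock versus transfer rate, a small Python sketch follows. It relies only on the fact that DDR memory transfers data on both edges of the clock; the function names are made up for this example.

```python
import math


def ddr_transfer_rate(clock_mhz: float) -> float:
    """Return MT/s for a DDR I/O clock in MHz (two transfers per clock cycle)."""
    return 2.0 * clock_mhz


def reading_matches_rated_speed(reported_mhz: float, rated_mts: float) -> bool:
    """True if a tool's MHz reading corresponds to the rated MT/s figure."""
    return math.isclose(ddr_transfer_rate(reported_mhz), rated_mts)


print(ddr_transfer_rate(1200))                  # 2400.0
print(reading_matches_rated_speed(1200, 2400))  # True
```

So a tool that reports the actual clock would show 1200 MHz for memory running at the full 2400 MT/s.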

Quote:
 Originally Posted by drkirkby The CPU utilisation never goes above about 45% when running PRP tests, but hits 100% on trial factoring to small numbers at least.
That is also expected. TF will run with the HT cores (2 threads per physical core), while PRP, LL, P-1 etc. will not (1 thread per physical core, so only 50 % total CPU usage in task manager, etc.), because that way it is (usually) the fastest.
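The 50 % figure can be reproduced with a short Python sketch. The core counts are an assumption taken from the two 26-core CPUs mentioned elsewhere in this thread, and the function name is invented for the example.

```python
def expected_utilisation(physical_cores: int, threads_per_core: int,
                         worker_threads: int) -> float:
    """Percentage of logical CPUs kept busy by the given number of worker threads."""
    logical_cpus = physical_cores * threads_per_core
    return 100.0 * min(worker_threads, logical_cpus) / logical_cpus


# Assumed setup: 2 x 26 cores with hyperthreading enabled (104 logical CPUs).
print(expected_utilisation(52, 2, 52))   # PRP, one thread per core: 50.0
print(expected_utilisation(52, 2, 104))  # TF, two threads per core: 100.0
```

An observed figure of about 45 % is therefore close to the most a PRP run could show with hyperthreading enabled.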

2021-05-12, 11:15   #16
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

449 Posts

Quote:
 Originally Posted by kruoli Depending on the tool you use, that reading is expected and the memory is already running at the maximum frequency. It would be very weird if it was really running at 1200 MHz (I guess you should say MTransfers instead?), since I never saw DDR4 running that slow. That is also expected. TF will run with the HT cores (2 threads per physical core), while PRP, LL, P-1 etc. will not (1 thread per physical core, so only 50 % total CPU usage in task manager, etc.), because that way it is (usually) the fastest.
The machine came with Windoze. When I run this benchmark, which I think is fairly common,
https://www.passmark.com/
it gives a report on various aspects of the computer - CPUs, graphics, memory etc. These are compared to other systems submitted. I think it's reasonable to assume anyone submitting a system has some interest in performance, so the list is probably not representative of the computers in use in the world, but of some higher-end ones. This is how my Dell scores on the various factors, as percentiles.

Passmark 66th percentile - that's the overall rating.
CPU mark 99th percentile - obviously two 26-core CPUs are quick
2D mark 52nd percentile
3D mark 53rd percentile
Memory mark 22nd percentile
Disk mark 67th percentile

I know they weight the overall mark, so no single strong attribute can push the mark up much, but one weak one can drop it a lot. I think it's pretty clear that the CPUs are working well, but the memory is working very poorly. This is for a machine which, according to a YouTube video, can jointly claim with the HP Z8 to be the world's fastest workstation. I think it's clear there's something very wrong with the memory performance on my machine. I think it was CPU-Z, which is one of the other CPU performance indicators, that indicated the RAM is running at 1200 MHz (yes, 1.2 GT/s would seem more logical).

There are 24 DIMM sockets - 12 for each CPU. For each CPU, 6 are coloured white, and 6 black. One should occupy the white ones first, then the black. But any attempt to put more than 4 DIMMs in the white ones for CPU0 (first CPU) will result in the machine failing to power on. Only if I move the DIMMs to the black sockets, so putting 2 DIMMs on some memory channels and none on others, can I fit 6 DIMMs on the first CPU. The problem does not occur with the 2nd CPU, and swapping the positions of the CPUs does not help, so I don't think it is a CPU fault. I've also tried the original 8-core CPU, and have the exact same issue.
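To put a number on what the two unused channels could cost, here is a back-of-the-envelope Python sketch of theoretical peak bandwidth per CPU. It assumes DDR4 at 2400 MT/s and the standard 64-bit (8-byte) data bus per channel; real throughput will be lower, and the function name is invented for the example.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel has a 64-bit data bus


def peak_bandwidth_gb_s(channels: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s for the populated memory channels."""
    return channels * transfer_rate_mts * BYTES_PER_TRANSFER / 1000.0


full = peak_bandwidth_gb_s(6, 2400)     # all six channels populated
partial = peak_bandwidth_gb_s(4, 2400)  # only four channels populated
print(f"6 channels: {full:.1f} GB/s")
print(f"4 channels: {partial:.1f} GB/s ({partial / full:.0%} of full)")
```

Putting two DIMMs on one channel adds capacity but not bandwidth, so under these assumptions the first CPU is limited to about two thirds of its theoretical peak.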

Unfortunately dealing with Dell is a nightmare. Long waits on the phone to get through to Indian call centres, with staff with little knowledge. Sometimes, after waiting on the UK phone line for a long time (20 minutes or so), one gets a message to dial another number, which starts +91, so India. I'm very unimpressed with Dell, but their argument is that the machine is not supported with Kingston DIMMs. So I seem to be in a catch-22, with three options:

* Pay a fortune for new Dell DIMMs
* Pay a lot of money for 2nd hand Dell DIMMs, then no doubt Dell would argue they don't support used components.
* Replace the motherboard myself.

Dave (not a happy Dell customer).

Last fiddled with by drkirkby on 2021-05-12 at 11:29

2021-05-12, 11:31   #17
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

449 Posts

PS, I will try disabling hyperthreading in the BIOS, and see if the apparent CPU utilisation rises. I'm running Linux, so don't use task manager. Later I will test with a smaller PRP exponent that will fit in the cache. I think performance drops a lot when memory is accessed, but it's clear no affordable graphics card is going to be worth using in place of the current configuration.

Last fiddled with by drkirkby on 2021-05-12 at 11:33

2021-05-12, 11:47   #18
drkirkby

"David Kirkby"
Jan 2021
Althorne, Essex, UK

701₈ Posts

Yes, with hyperthreading disabled in the BIOS, the CPU usage is closer to maximum. It's 19.2% idle, which is a lot lower than it was with hyperthreading enabled. Anyway, I still think there is an issue with the RAM on this, but am pondering my options for resolving it.
Code:
top - 12:43:25 up 4 min, 1 user, load average: 39.49, 18.73, 7.32
Tasks: 698 total, 1 running, 697 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 2.7 sy, 77.8 ni, 19.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
top - 12:43:27 up 4 min, 1 user, load average: 40.17, 19.22, 7.54
MiB Mem : 386593.7 total, 384006.8 free, 1778.2 used,
%Cpu(s): 0.0 us, 2.7 sy, 80.6 ni, 16.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 stfree, 0.0 used. 382
MiB Mem : 386593.7 total, 384007.4 free, 1777.7 used, 808.7 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 3821 127:47.64 mprime 784.8 avail Mem
1 root 20 0 167596 11824 8520 S 0.0 0.0 0:02.52 systemd
2273 drkirkby 30 10 4542704 572836 6736 S 4331 0.1 129:30.71 mprime
1841 drkirkby 20 0 4127312 309948 119092 S 1.7 0.1 0:06.74 gnome-s+
1339 root 20 0 82212 4068 3456 S 0.4 0.0 0:00.13 irqbala+
1583 root -51 0 0 0 0 S 0.4 0.0 0:01.03 irq/167+
1585 root 20 0 0 0 0 S 0.4 0.0 0:01.08 nv_queue
2233 drkirkby 20 0 823232 50752 38268 S 0.4 0.0 0:01.53 gnome-t+
2413 drkirkby 20 0 21180 4528 3180 R 0.4 0.0 0:00.04 top
1 root 20 0 167596 11824 8520 S 0.0 0.0 0:02.52 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par+
5 root 20 0 0 0 0 I 0.0 0.0 0:00.09 kworker+
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker+
7 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker+
8 root 20 0 0 0 0 I 0.0 0.0 0:00.22 kworker+
9 root 20 0 0 0 0 I 0.0 0.0 0:00.01 kworker+
10 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_perc+

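The per-process and system-wide figures in the top output can be reconciled with a short Python sketch. It assumes 52 logical CPUs (two 26-core CPUs with hyperthreading disabled) and uses the second %Cpu(s) line with its trailing debris trimmed; the regular expression is written for this one line format only.

```python
import re

# Second summary line from the output above, trailing debris trimmed.
SUMMARY = "%Cpu(s): 0.0 us, 2.7 sy, 80.6 ni, 16.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st"
MPRIME_CPU_PERCENT = 4331  # %CPU column for mprime, relative to one CPU
LOGICAL_CPUS = 52          # assumption: 2 x 26 cores, hyperthreading off


def parse_cpu_summary(line: str) -> dict:
    """Map each two-letter state (us, sy, ni, id, ...) to its percentage."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+) (\w\w)\b", line)}


states = parse_cpu_summary(SUMMARY)
system_busy = 100.0 - states["id"]
mprime_share = MPRIME_CPU_PERCENT / LOGICAL_CPUS

print(f"system busy:  {system_busy:.1f} %")
print(f"mprime share: {mprime_share:.1f} %")
```

Under that assumption mprime keeps roughly 43 of the 52 cores busy, which agrees with the system-wide figure to within a fraction of a percent.
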
2021-05-12, 15:25   #19
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×2,293 Posts

Quote:
 Originally Posted by drkirkby Can anyone give me any sort of ideas of what graphics cards could do a PRP test of 103750501 in under 44 hours, which is what one of my Xeons does? With the pair of Xeons I'm averaging 22 hours (roughly one a day), for exponents around that value. If an affordable graphics card could do significantly better I might be persuaded to put my hands in my pocket and buy one for GIMPS. Dave
Radeon VII GPUs can do it in about half that. These used to retail new for $600. (Now fully functional used no-warranty VIIs are going for ~US$2K, or somewhat above the list price of a scarce new Radeon Pro VII.)
All GPU models' prices are currently badly inflated by the GPU and general chip shortage. This should eventually pass.

Numerous benchmark timings vs. GpuOwL version and FFT length for RX480 and Radeon VII are available (where else, reference thread post attachments!)
Note, those tabulated values are from Windows with reduced GPU power, and can be beaten by operating the GPU at full power, or on Linux with the ROCm driver, or both.
The Radeon VII's stellar GpuOwL performance is why folks in the know such as Woltman and Mayer and Preda adopted Radeon VII GPUs for PRP testing. They also report somewhat higher total throughput by running two instances per GPU.

Any GPU scoring at least half the GHzD/day rating of a Radeon VII in James Heinrich's table should meet your 44 hour threshold or come close, if properly installed and operated.
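The rule of thumb above amounts to scaling run time inversely with the throughput rating. A minimal Python sketch follows, assuming a Radeon VII needs about 22 hours (half of 44) and that run time scales inversely with the rating, which is only a rough approximation; the ratings are normalised placeholders, not values from the table.

```python
def estimated_hours(reference_hours: float, reference_rating: float,
                    candidate_rating: float) -> float:
    """Scale a known run time by the ratio of throughput ratings."""
    return reference_hours * reference_rating / candidate_rating


RADEON_VII_HOURS = 22.0  # assumption: "about half" of the 44 hour figure

# Ratings normalised so that the Radeon VII is 1.0 (placeholder values).
print(estimated_hours(RADEON_VII_HOURS, 1.0, 0.5))  # 44.0
print(estimated_hours(RADEON_VII_HOURS, 1.0, 1.0))  # 22.0
```
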

2021-05-12, 15:51   #20
masser

Jul 2003
Behind BB

41·47 Posts

Quote:
 Originally Posted by kriesel FP64 (double) performance 119.4 GFLOPS (1:32) In other words, good reason to be slow in GpuOwl, slower in ClLucas. Really only suitable for TF. Could have bought a 6GB GTX1060 for less cost. (Search eBay for "GTX1060 6GB")
(I got a good deal on a GTX 1050 because of this post. Thanks!)

2021-05-13, 21:02   #21
moebius

Jul 2009
Germany

1001110001₂ Posts

Quote:
 Originally Posted by masser (I got a good deal on a GTX 1050 because of this post. Thanks!)
Nice. The GPU has a GP107 chip. I would also ask you for a gpuowl benchmark (when you receive the card), as mentioned earlier in the thread. Of course, it's more of a good trial factoring card.

