#1
"Ed Hall"
Dec 2009
Adirondack Mtns
33·167 Posts
Title Note: Only the first three posts are concerned with the original title.
I have several desktop Ubuntu machines running headless, with remote access via ssh and VNC. Would a rack server, such as a Dell PowerEdge R720xd, work the same way? Would RAM for a rack server be the same, or might a rack server take only low-profile RAM modules? Anything else to consider? Thanks!
Last fiddled with by EdH on 2021-03-23 at 13:40
#2
"Curtis"
Feb 2005
Riverside, CA
5,279 Posts
The main thing to consider is that rack machines use high-volume / high-speed / very, very high-noise fans.
I was gifted a Dell C6100 years ago (4 nodes, dual 4-core DDR2-era Xeons in each node), and besides the ~kilowatt it drew, I found I could not run the thing at home: its fan noise from two rooms away was louder than the 5-fan desktop three feet from me. Think closer to a quiet vacuum cleaner than a desktop. It's possible a 1U server might want low-profile memory, but most servers of the sort you might find used are 2U and highly likely to take normal ECC RAM. There are a couple of flavors of ECC, though: I believe registered and unregistered sticks cannot be used in the same system, so one should wait until one has physical possession of a machine, check the memory type/label, and order upgrades to match.
#3
"Ed Hall"
Dec 2009
Adirondack Mtns
33×167 Posts
Excellent point! Two routers I picked up are way too loud to use in the house. A loud server might fit that profile.
I need to study the RAM issue a bit. My latest build has a motherboard that accepts only 24GB of non-ECC or unbuffered ECC, but 96GB of registered. Something I was looking at seemed to say that registered memory performs quite a bit slower than the other types.
#4
P90 years forever!
Aug 2002
Yeehaw, FL
17320₈ Posts
And with Gerbicz error checking, higher-priced ECC RAM no longer provides benefits to the PRP tester.
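To make the GEC idea concrete, here is a minimal Python sketch of a Gerbicz-checked base-3 PRP test on a toy Mersenne number. It assumes the commonly described formulation: a running product d multiplies in the residue at every block boundary, and a check verifies d = x0·d_prev^(2^b) mod N, rolling back to the last verified state on mismatch. The function name, the tiny block sizes, and the `glitch_at` fault injection are hypothetical illustrations, not prime95 or gpuowl code; real programs do the same algebra on FFT-sized numbers.

```python
def mersenne_prp_gec(p, block=4, blocks_per_check=4, base=3, glitch_at=None):
    """Toy base-3 PRP test of N = 2**p - 1 with a Gerbicz error check.

    x runs through base^(2^i) mod N; d is the running product of x at every
    block boundary. If no error occurred since the last good check, then
    d == x0 * d_prev^(2^block) (mod N), where d_prev is d one block earlier.
    glitch_at (hypothetical) flips one bit of x after that squaring, once.
    """
    N = (1 << p) - 1
    x0 = base % N
    x = d = x0
    i = errors = 0
    good = (x, d, i)                        # last state that passed a check
    glitch_pending = glitch_at is not None
    interval = block * blocks_per_check     # B = b * blocks_per_check
    while i + interval <= p:
        for _ in range(blocks_per_check):
            d_prev = d
            for _ in range(block):
                x = x * x % N
                i += 1
                if glitch_pending and i == glitch_at:
                    x = (x ^ 1) % N         # simulated transient fault
                    glitch_pending = False
            d = d * x % N
        # Gerbicz check: costs `block` squarings, once per interval
        if d == x0 * pow(d_prev, 1 << block, N) % N:
            good = (x, d, i)
        else:
            errors += 1
            x, d, i = good                  # roll back to the last verified state
    for _ in range(p - i):                  # leftover tail: not covered by a check here
        x = x * x % N
    # base^(2^p) == base^2 (mod N) is equivalent to the Fermat test base^(N-1) == 1
    return x == base * base % N, errors


print(mersenne_prp_gec(127))                # M127 is prime
print(mersenne_prp_gec(127, glitch_at=50))  # the injected fault is caught and rolled back
print(mersenne_prp_gec(67))                 # M67 is composite
```

With the fault injected, the run still reports a probable prime, and the error counter records the one detection, which is the behavior the "EE" lines in the logs below reflect.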
#5
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
6,481 Posts
Code:
2021-03-18 21:26:38 roa/radeonvii 340000607 OK 96450000 28.37%; 4139 us/it; ETA 11d 16:00; 2560ad1b65fd94e8 (check 2.36s) 26 errors
2021-03-18 21:30:07 roa/radeonvii 340000607 EE 96500000 28.38%; 4140 us/it; ETA 11d 16:01; 3a397a709767f8b4 (check 2.24s) 26 errors
2021-03-18 21:30:09 roa/radeonvii 340000607 OK 96450000 loaded: blockSize 400, 2560ad1b65fd94e8
2021-03-18 21:33:39 roa/radeonvii 340000607 OK 96500000 28.38%; 4140 us/it; ETA 11d 16:00; 614a86697c2e4515 (check 2.41s) 27 errors
2021-03-18 21:37:08 roa/radeonvii 340000607 OK 96550000 28.40%; 4140 us/it; ETA 11d 15:59; 38b95efc08eff002 (check 2.46s) 27 errors
Last fiddled with by kriesel on 2021-03-19 at 02:50
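A minimal sketch for tallying log lines like these, assuming the field layout seen in the quoted logs: date, time, device, exponent, OK/EE, iteration, then free text with an optional trailing "N errors". The regular expressions and the function name are hypothetical helpers, not part of gpuowl, and "loaded:" lines are treated as rollbacks rather than check results.

```python
import re
from collections import Counter

# Field layout inferred from the gpuowl log lines quoted in this thread;
# this is not an official gpuowl log specification.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<device>\S+) "
    r"(?P<exponent>\d+) (?P<status>OK|EE) (?P<iteration>\d+)(?P<rest>.*)$"
)
ERRORS_RE = re.compile(r"(\d+) errors\s*$")


def summarize_gec_log(lines):
    """Count GEC check results (OK/EE) and reloads, and report the last error count."""
    checks = Counter()
    reloads = 0
    last_errors = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue                      # other log lines (OpenCL args, proof setup, ...)
        rest = m.group("rest")
        if "loaded:" in rest:
            reloads += 1                  # rollback to the last good save point
            continue
        checks[m.group("status")] += 1
        found = ERRORS_RE.search(rest)
        if found:
            last_errors = int(found.group(1))
    return {"OK": checks["OK"], "EE": checks["EE"], "reloads": reloads,
            "last_errors": last_errors}


SAMPLE = """\
2021-03-18 21:26:38 roa/radeonvii 340000607 OK 96450000 28.37%; 4139 us/it; ETA 11d 16:00; 2560ad1b65fd94e8 (check 2.36s) 26 errors
2021-03-18 21:30:07 roa/radeonvii 340000607 EE 96500000 28.38%; 4140 us/it; ETA 11d 16:01; 3a397a709767f8b4 (check 2.24s) 26 errors
2021-03-18 21:30:09 roa/radeonvii 340000607 OK 96450000 loaded: blockSize 400, 2560ad1b65fd94e8
2021-03-18 21:33:39 roa/radeonvii 340000607 OK 96500000 28.38%; 4140 us/it; ETA 11d 16:00; 614a86697c2e4515 (check 2.41s) 27 errors
2021-03-18 21:37:08 roa/radeonvii 340000607 OK 96550000 28.40%; 4140 us/it; ETA 11d 15:59; 38b95efc08eff002 (check 2.46s) 27 errors
"""

print(summarize_gec_log(SAMPLE.splitlines()))
```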
#6
Aug 2006
5979₁₀ Posts
#7
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6456₁₀ Posts
ECC protects all data flows, not just the FFT data. So the RAM contents for code, the OS, the drivers, etc. are also less likely to be corrupted, leading to fewer crashes, more uptime, and more throughput. This is in addition to kriesel's point about fewer rewinds during the test.
#8
"Robert Gerbicz"
Oct 2005
Hungary
3040₈ Posts
Check me: the expected(!) cost of error checking plus potential rollback(s) over the fixed p iterations is:
Code:
F(p,e,B)=q=(1-e)^B;return(p*2/sqrt(B)/q+p*(1/q-1))
This uses a slightly simplified model in which we assume that the probability of 2 or more errors in a given block is much smaller than the probability of a single error. That is the case in all real situations, unless you have a very faulty GPU/CPU whose error rate is too large (say e>1/4). In your case p=340000607; e=27./(96550000+50000*27); B=50000 [was it fixed?]. (Notice that we used more than 96550000 iterations due to the rollbacks, assuming that 27 errors is the total error count. Does it save this count, or start with count=0 after a restart?)
Code:
? F(p,e,50000)
%113 = 7804224.9927642080311375201439985915790
? F(p,e,23400)
%114 = 6675385.4735534531717799947002467584233
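For readers without PARI/GP, here is a direct Python port of the one-liner above, plus a brute-force scan for the B that minimizes it under the same model. The function name is hypothetical; q is evaluated as exp(B·log1p(-e)), which is mathematically the same as (1-e)^B but with less rounding error for tiny e.

```python
import math


def expected_gec_cost(p, e, B):
    """Expected extra iterations (checks + rollbacks) over p iterations.

    Python port of the PARI/GP F(p,e,B) above; q = (1-e)^B is evaluated as
    exp(B*log1p(-e)) to reduce rounding error for tiny e.
    """
    q = math.exp(B * math.log1p(-e))
    return p * 2 / math.sqrt(B) / q + p * (1 / q - 1)


p = 340000607
e = 27 / (96550000 + 50000 * 27)    # error-rate estimate from the quoted log

print(round(expected_gec_cost(p, e, 50000)))   # 7804225, as in the GP output above
print(round(expected_gec_cost(p, e, 23400)))   # 6675385

# Brute-force scan over integer B (the cost curve is flat near its minimum)
best_B = min(range(1000, 100001), key=lambda B: expected_gec_cost(p, e, B))
print(best_B)
```

The scan lands within a few iterations of 23400, so that value is essentially the model's optimum for this error rate.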
#9
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
14521₈ Posts
Thanks for the suggestion on optimizing the GEC check interval ("blocksize") according to error rate.
I'm working on a detailed case study report, but here is an initial response.
Note that a shorter gpuowl GEC interval (computing the check on the CPU more frequently, to reduce the GPU throughput impact of large blocks with errors) may cost some mprime/prime95 throughput on the CPU. That is not yet accounted for in the optimization calculation. A Radeon VII GPU is far more powerful than any CPU I have, so that is probably a rather small effect. Also, gpuowl allows the check interval B and the individual GEC block size b to be set somewhat independently, so I think the sqrt(B) term does not quite match gpuowl's behavior.
If I understand your math, e is per iteration, so (1-e)^B gets closer to 1 (reliable) with smaller B, which helps avoid 3-strikes-and-program-exit. That is sometimes an issue on one of my RX550 GPUs at B=50000, and lowering B is helping there.
The GpuOwl GEC error-detection count for each exponent's primality test is stored in the header of that exponent's PRP save file. Mostly. For a sufficiently low error rate there is one error per detection; at a high error rate there might be more than one error per detection, as you mentioned. The error counting is also modal: isolated EE occurrences are normally all counted, but 3 in rapid back-to-back succession that cause program exit are counted as 1 in the file header, and the known-bad residues are not saved to the file. This 340M exponent/GPU combination is having only isolated occurrences, so the rapid-fire EE situation is not an issue in this case. An error detection, or I think a set of up to 3 such, could go entirely uncounted if the user presses CTRL-C twice soon enough to make the program terminate immediately without saving its most recent progress. (The first CTRL-C goes to the program for orderly termination, the second terminates it immediately without saving, and the third terminates the batch script running the program.)
So the counts we would obtain directly from logs or save files are lower bounds on how many GEC error detections occurred during an exponent's PRP test. A similar undercount occurs with iterations. GpuOwl can automatically adjust the GEC check/log interval somewhat, depending on error occurrence. There is greater adjustability with the -log option on the command line or in gpuowl's config.txt, but that requires multiples of 10,000 iterations, per the program's help output. The check interval was initially 800, then immediately 200k until errors appeared; it stepped down to 100k after the first error, later to 50k after the second, and has stayed at 50k since. It has logged up to 9 error detections per day; in the past 24 hours there were zero. I think it may recover to a higher blocksize if enough time passes after the last error detection. I have been periodically lowering the GPU RAM clock frequency to try to lower the error rate (intending to vary the error rate, downward). The maximum GPU clock frequency used in this exponent's run was at least 990 MHz; since 2021-03-19 20:09 it has been 900 MHz. (All gpuowl log file times are local, except for the date/time stamps embedded in result lines, which are UTC. Here local time is US CDT, UTC-0500, after 2am 2021-03-14, and US CST, UTC-0600, before.) Some supporting detail follows.
Gpuowl PRP file header fields are:
Code:
>more 340000607.owl
OWL PRP 10 340000607 115850000 400 ec48bc26a538efcb 29
Gpuowl help output says, in part:
Code:
2020-09-07 09:43:38 gpuowl v6.11-380-g79ea0cc
...
-block <value> : PRP GEC block size, or LL iteration-block size. Must divide 10'000.
-log <step> : log every <step> iterations. Multiple of 10'000.
-jacobi <step> : (LL-only): do Jacobi check every <step> iterations. Default 1'000'000.
Code:
2021-02-23 13:06:07 GpuOwl VERSION v7.2-53-ge27846f
...
-block <value> : PRP error-check block size. Must divide 10'000.
-log <step> : log every <step> iterations. Multiple of 10'000.
Observed GEC interval in the p=340000607 PRP run:
Code:
2021-03-14 06:08:00 roa/radeonvii Expected maximum carry32: 89CA0000
2021-03-14 06:08:04 roa/radeonvii OpenCL args "-DEXP=340000607u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMM_CHAIN=2u -DMM2_CHAIN=3u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xf.b18fc6dd93bcp-4 -DIWEIGHT_STEP_MINUS_1=-0xf.d866d332c56p-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-03-14 06:08:13 roa/radeonvii OpenCL compilation in 8.93 s
2021-03-14 06:08:16 roa/radeonvii 340000607 OK 18800 loaded: blockSize 400, 05170f8523ce46a4
2021-03-14 06:08:16 roa/radeonvii validating proof residues for power 9
2021-03-14 06:08:16 roa/radeonvii Proof using power 9
2021-03-14 06:08:21 roa/radeonvii 340000607 OK 19600 0.01%; 3939 us/it; ETA 15d 12:02; a41eedf5cb826649 (check 2.33s)
2021-03-14 06:20:21 roa/radeonvii 340000607 OK 200000 0.06%; 3975 us/it; ETA 15d 15:09; 73ddddae5b6e44d8 (check 2.28s)
2021-03-14 06:33:40 roa/radeonvii 340000607 OK 400000 0.12%; 3983 us/it; ETA 15d 15:43; bcae6edea14aff1e (check 2.44s)
2021-03-14 06:46:59 roa/radeonvii 340000607 OK 600000 0.18%; 3984 us/it; ETA 15d 15:34; 294ad9a594748b84 (check 2.47s)
2021-03-14 07:00:19 roa/radeonvii 340000607 OK 800000 0.24%; 3991 us/it; ETA 15d 16:04; 35097de005cc0e41 (check 2.42s)
2021-03-14 07:13:40 roa/radeonvii 340000607 OK 1000000 0.29%; 3990 us/it; ETA 15d 15:41; f9cc27618d78064d (check 2.28s)
2021-03-14 07:27:00 roa/radeonvii 340000607 OK 1200000 0.35%; 3989 us/it; ETA 15d 15:23; 06be4e94037e3e1f (check 2.36s)
2021-03-14 07:40:23 roa/radeonvii 340000607 OK 1400000 0.41%; 4002 us/it; ETA 15d 16:24; ae27bb42be2d5452 (check 2.30s)
2021-03-14 07:53:43 roa/radeonvii 340000607 OK 1600000 0.47%; 3990 us/it; ETA 15d 15:03; 3af49b33d3719483 (check 2.31s)
2021-03-14 08:07:03 roa/radeonvii 340000607 OK 1800000 0.53%; 3987 us/it; ETA 15d 14:31; a92994ab8a700441 (check 2.36s)
2021-03-14 08:20:25 roa/radeonvii 340000607 OK 2000000 0.59%; 4000 us/it; ETA 15d 15:31; 8a517da67fc6dc32 (check 2.34s)
2021-03-14 08:33:45 roa/radeonvii 340000607 EE 2200000 0.65%; 3991 us/it; ETA 15d 14:31; 40593e3281ef501f (check 2.31s)
2021-03-14 08:33:48 roa/radeonvii 340000607 OK 2000000 loaded: blockSize 400, 8a517da67fc6dc32
2021-03-14 08:40:29 roa/radeonvii 340000607 OK 2100000 0.62%; 3984 us/it; ETA 15d 13:58; ce1ebf29fc1bd481 (check 2.30s) 1 errors
2021-03-14 08:47:12 roa/radeonvii 340000607 OK 2200000 0.65%; 4014 us/it; ETA 15d 16:39; d287dbffb45552c5 (check 2.28s) 1 errors
...
2021-03-14 18:51:17 roa/radeonvii 340000607 OK 11200000 3.29%; 4010 us/it; ETA 15d 06:14; a9adcf39425a4344 (check 2.26s) 1 errors
2021-03-14 18:58:00 roa/radeonvii 340000607 EE 11300000 3.32%; 4018 us/it; ETA 15d 06:50; 6170aa84438debf5 (check 2.16s) 1 errors
2021-03-14 18:58:03 roa/radeonvii 340000607 OK 11200000 loaded: blockSize 400, a9adcf39425a4344
2021-03-14 19:01:26 roa/radeonvii 340000607 OK 11250000 3.31%; 4010 us/it; ETA 15d 06:10; aa4dd62179fe524a (check 2.26s) 2 errors
2021-03-14 19:04:49 roa/radeonvii 340000607 OK 11300000 3.32%; 4025 us/it; ETA 15d 07:28; 37f520680880635f (check 2.55s) 2 errors
...
2021-03-19 13:41:09 roa/radeonvii 340000607 OK 110250000 32.43%; 4175 us/it; ETA 11d 02:26; 5cf5fafe08768177 (check 2.37s) 28 errors
2021-03-19 13:44:39 roa/radeonvii 340000607 EE 110300000 32.44%; 4156 us/it; ETA 11d 01:10; bed66bfc267390c1 (check 2.26s) 28 errors
2021-03-19 13:44:42 roa/radeonvii 340000607 OK 110250000 loaded: blockSize 400, 5cf5fafe08768177
2021-03-19 13:48:12 roa/radeonvii 340000607 OK 110300000 32.44%; 4160 us/it; ETA 11d 01:26; 8c75b681d85f1c84 (check 2.46s) 29 errors
2021-03-19 13:51:42 roa/radeonvii 340000607 OK 110350000 32.46%; 4157 us/it; ETA 11d 01:09; 81b74b9d2bb8c032 (check 2.44s) 29 errors
...
2021-03-20 15:28:06 roa/radeonvii 340000607 OK 131350000 38.63%; 4165 us/it; ETA 10d 01:23; 964dae11549f69da (check 2.39s) 29 errors
2021-03-20 15:31:36 roa/radeonvii 340000607 OK 131400000 38.65%; 4166 us/it; ETA 10d 01:24; 0342098bcec3a40d (check 2.38s) 29 errors
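A minimal sketch that splits the save-file header line quoted in this post into named fields. The field names are hypothetical and inferred from this single example: 400 matches "blockSize 400" in the log, and 29 matches the "29 errors" count, which the post says is stored in the header. The meanings of the other fields are assumptions, and other gpuowl versions may use a different layout.

```python
from dataclasses import dataclass


@dataclass
class OwlPrpHeader:
    """Fields of a gpuowl PRP save-file header line, as inferred from one example."""
    version: int        # assumed: save-file format version (10 in the example)
    exponent: int
    iteration: int      # assumed: iterations completed at the save point
    block_size: int     # matches "blockSize 400" in the log
    checksum: str       # 16 hex digits; residue vs. checksum is not stated in the thread
    errors: int         # matches the "29 errors" count in the log


def parse_owl_header(line: str) -> OwlPrpHeader:
    parts = line.split()
    if len(parts) != 8 or parts[:2] != ["OWL", "PRP"]:
        raise ValueError(f"unrecognized header: {line!r}")
    version, exponent, iteration, block_size = (int(x) for x in parts[2:6])
    return OwlPrpHeader(version, exponent, iteration, block_size, parts[6], int(parts[7]))


hdr = parse_owl_header("OWL PRP 10 340000607 115850000 400 ec48bc26a538efcb 29")
print(hdr.exponent, hdr.iteration, hdr.block_size, hdr.errors)
```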
#10
"Robert Gerbicz"
Oct 2005
Hungary
11000100000₂ Posts
We have probability q=(1-e)^B that all B iterations in a single block are good. The check requires 2*sqrt(B) iterations per block, and there are p/B blocks in the p iterations. If an event has probability pr=q of passing, then on average you need 1/pr=1/q trials to see it [for the maths, see the mean of https://en.wikipedia.org/wiki/Geometric_distribution]. So on average there will be p/B*2*sqrt(B)*1/q=p*2/sqrt(B)/q iterations spent in the check. 1/q-1 is the expected number of rollbacks for a single block of B iterations. Hence the total number of rollbacks will be p/B*(1/q-1), giving a total of p*(1/q-1) iterations in rollbacks. The sum of these two terms is p*2/sqrt(B)/q+p*(1/q-1), which is what we needed.
There is one little issue here: you could also have an error in the error check's own iterations, but for "large" B the probability of this is much smaller than in the "main" iterations, because per block we have only 2*sqrt(B) check iterations, which is much smaller than B. Actually the number of rollbacks, p/B*(1/q-1), is a monotone increasing function of B. But that is no problem: the task is not to minimize the number of rollbacks, but to minimize the expected number of iterations.
What I don't understand about block=400 and (to give it a name) superblock=50000 is this: in an optimal setup, shouldn't it be B=superblock=block^2? Of course, handle the last few of the p iterations separately, or go past the p iterations.
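A quick Monte Carlo check of the derivation above, under exactly its simplifying assumptions: each B-iteration block independently fails with probability 1 - q, every attempt pays 2*sqrt(B) check iterations, each failed attempt also wastes B iterations, and errors during the check itself are ignored. The function names and the parameter values are made up for a fast, low-noise run; they are not taken from the logs in this thread.

```python
import math
import random


def expected_cost(p, e, B):
    """Closed form from the derivation: expected check + rollback iterations."""
    q = (1 - e) ** B
    return p * 2 / math.sqrt(B) / q + p * (1 / q - 1)


def simulate_cost(p, e, B, rng):
    """One simulated run under the same assumptions as the closed form."""
    q = (1 - e) ** B                      # chance a block of B iterations is error-free
    check = 2 * math.sqrt(B)              # check iterations per attempt
    extra = 0.0
    for _ in range(p // B):
        while rng.random() >= q:          # failed attempt: check, detect, roll back B iterations
            extra += check + B
        extra += check                    # the attempt that finally passes
    return extra


rng = random.Random(2021)
p, e, B = 10**9, 1e-5, 10_000             # made-up values; p is a multiple of B
sim = simulate_cost(p, e, B, rng)
print(round(sim), round(expected_cost(p, e, B)))
```

The simulated and closed-form totals should agree to within about one percent, since each block's number of retries is geometric with mean 1/q - 1, which is the step the derivation relies on.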
#11
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1951₁₆ Posts
I've seen it stated elsewhere, by Preda I think, that GEC overhead is 0.2% with b=1000 but 0.5% with b=400, not counting the effect of detected errors. For smaller b, such as ~sqrt(50000), sqrt(20000), or sqrt(10000), the overhead would become substantial. Also, there are probably programming advantages to keeping b fixed during a run. B needs to be an integral multiple of b, and b constant during a B-long interval, doesn't it? Having log entries at multiples of round numbers is more convenient from an end user's point of view than having seemingly random iteration counts appear as the sum of various round-number B values, plus the occasional 2b at start and stop, if b were sometimes 224, 141, or 100. Maybe it's partly aesthetics.
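A rough cost model for comparing candidate block sizes, assuming that the error-free overhead is one multiplication every b iterations plus b verification squarings every B iterations. With `mul_cost` = 1 (a multiplication costed like one squaring), the model reproduces Gerbicz's 2*sqrt(B) count and his optimum b = sqrt(B). `mul_cost` itself is an assumption, not a measured gpuowl figure. A value of about 2 would make the multiplication term alone match the quoted 0.2% (b=1000) and 0.5% (b=400). The divisor test encodes only the "Must divide 10'000" rule from the help text quoted earlier; the helper names are hypothetical.

```python
import math


def gec_overhead(b, B, mul_cost=1.0):
    """Error-free GEC overhead as a fraction of the main squaring iterations.

    mul_cost / b : one multiplication d = d * x every b iterations, costed as
                   mul_cost squarings (mul_cost is an assumption)
    b / B        : b squarings to verify d once every B iterations
    """
    return mul_cost / b + b / B


def allowed_block(b):
    """Rule from the gpuowl help text quoted above: -block must divide 10'000."""
    return 10_000 % b == 0


def nearest_allowed_blocks(B, mul_cost=1.0, k=2):
    """The k allowed block sizes closest to the cost-minimizing b = sqrt(mul_cost * B)."""
    target = math.sqrt(mul_cost * B)
    divisors = [d for d in range(1, 10_001) if allowed_block(d)]
    return sorted(divisors, key=lambda d: abs(d - target))[:k]


B = 50_000
for b in (1000, 400, 224, 141, 100):
    print(b, allowed_block(b), f"{gec_overhead(b, B):.3%}")
print(nearest_allowed_blocks(B))   # divisors of 10'000 bracketing sqrt(50000) ~ 223.6
```

Under that help-text rule, 100 is an allowed block size but 224 and 141 are not; the allowed sizes nearest to sqrt(50000) are 200 and 250.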