mersenneforum.org  

Old 2021-01-05, 20:11   #100
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia
2²·3²·11 Posts

Quote:
Originally Posted by preda
The cache (L1/L2/L3) is used transparently for the *global* memory operations. It is managed automatically by the cache control (probably a variant of LRU), not explicitly by the software. So yes, GpuOwl should benefit from L3 without code changes.

Separate from the caches there is the *local* memory (LDS), which is managed explicitly by the software.

Thanks!

Old 2021-01-07, 17:13   #101
moebius
Jul 2009
Germany
547 Posts

Sad but true, the "best buck for the bang at PRP" at my location is the ASUS Radeon™ RX Vega 56 ROG Strix OC 8GB for 272€ (334 US-$) at Saturn/Media Markt, since its big brothers, the Vega 64 and the Radeon VII, are practically no longer available. I don't have any specific figures for the card, but they should be slightly below those of the Vega 64. But that is also due to various people who should actually be shot dead, like this miner. Four times the price for the current generation of graphics cards with only twice the performance is cheeky, to say the least.
https://pbs.twimg.com/card_img/13461...jpg&name=small

Old 2021-01-07, 18:38   #102
M344587487
"Composite as Heck"
Oct 2017
769 Posts

The person who did the 6900XT benchmarks is looking to sell either that card or a reference 6900XT for MSRP (they just acquired a reference model and haven't decided which one to keep yet). They're in the US, so they are looking for a US buyer. If you're interested, PM me and I'll put you in contact. I met them a month ago by answering a forum question about ROCm, so I can't vouch for them (beyond their seeming genuine) any more than I can vouch for most of you; it's at your own risk. I can supply all correspondence with them if you're interested and let you make up your own mind.


https://forum.level1techs.com/t/big-...port/166054/10

Old 2021-01-07, 20:03   #103
Xyzzy
"Mike"
Aug 2002
17460₈ Posts

RX 6800
Code:
2021-01-07 13:33:05 gfx1030-0 OpenCL compilation in 2.54 s
2021-01-07 13:33:05 gfx1030-0 77936867 OK        0 loaded: blockSize 400, 0000000000000003
2021-01-07 13:33:06 gfx1030-0 77936867 OK      800   0.00%;  871 us/it; ETA 0d 18:51; 1579c241dc63eca6 (check 0.39s)
2021-01-07 13:36:00 gfx1030-0 77936867 OK   200000   0.26%;  870 us/it; ETA 0d 18:47; f0b04b45b0855bd2 (check 0.39s)
2021-01-07 13:38:55 gfx1030-0 77936867 OK   400000   0.51%;  873 us/it; ETA 0d 18:48; c03f94396a5aa29e (check 0.39s)
2021-01-07 13:41:49 gfx1030-0 77936867 OK   600000   0.77%;  869 us/it; ETA 0d 18:40; b9decd65ca71b629 (check 0.38s)
2021-01-07 13:44:43 gfx1030-0 77936867 OK   800000   1.03%;  866 us/it; ETA 0d 18:33; 21ebf3636148f663 (check 0.41s)
2021-01-07 13:47:35 gfx1030-0 77936867 OK  1000000   1.28%;  862 us/it; ETA 0d 18:25; 9bf9d9e6bff4286e (check 0.39s)
2021-01-07 13:50:29 gfx1030-0 77936867 OK  1200000   1.54%;  868 us/it; ETA 0d 18:30; da42731132f00140 (check 0.38s)
2021-01-07 13:53:23 gfx1030-0 77936867 OK  1400000   1.80%;  864 us/it; ETA 0d 18:22; 56d6868b11382d6c (check 0.38s)
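
For anyone cross-checking the log: the ETA column looks like it is simply the remaining iterations multiplied by the us/it figure. The snippet below is a minimal sketch under that assumption (the function name is made up for illustration and is not part of gpuowl, and how gpuowl actually rounds is a guess). It reproduces the ETA printed on the 200000-iteration line above.

```shell
#!/bin/bash
# Estimate the remaining time of a PRP test from the per-iteration time.
# Assumption: ETA = (exponent - iterations done) * us/it, truncated to minutes.
# Arguments: exponent, iterations already done, microseconds per iteration.
eta_from_us_per_it() {
    local exponent=$1 done_iters=$2 us_per_it=$3
    # 60000000 microseconds per minute; integer division truncates.
    local minutes=$(( (exponent - done_iters) * us_per_it / 60000000 ))
    printf '%dd %02d:%02d\n' \
        $(( minutes / 1440 )) $(( minutes % 1440 / 60 )) $(( minutes % 60 ))
}

# Values taken from the 13:36:00 log line (200000 iterations, 870 us/it).
eta_from_us_per_it 77936867 200000 870   # prints: 0d 18:47
```

At roughly 870 us/it the whole 77936867 exponent works out to a little under 19 hours on this card.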

Old 2021-01-07, 21:51   #104
M344587487
"Composite as Heck"
Oct 2017
769 Posts

Nice. Do you mind doing a benchmark spanning every FFT? It might give us an idea of how Infinity Cache scales. If you can run bash scripts with WSL, then this might do the trick when executed from the gpuowl directory:
Code:
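#!/bin/bash
# Run a short PRP benchmark for every FFT spec W:M:H and append the output to benchfft.log.
for W in 256 512 1024 4096; do
    for H in 256 512 1024; do
        for M in 1 3 4 5 6 7 8 9 10 11 12 13 14 15; do
            # M=1 is only used when W*H is below 512*512.
            if [ "$M" -gt 1 ] || [ $((W*H)) -lt $((512*512)) ]; then
                # Pick an exponent derived from the FFT dimensions.
                E=$((W*M*H*2*6+1001))
                ./gpuowl -prp "$E" -fft "$W:$M:$H" -nospin -iters 10000 >>benchfft.log
                # Clean up: remove the directory named after the exponent.
                rm -r "$E"
            fi
        done
    done
done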
Alternatively, attached is an unrolled batch file that calls gpuowl.exe and also needs to be executed from the gpuowl directory. It has the benefit of being sorted in ascending order by exponent (if you don't want to run the very large tests, they can be deleted; I have no idea how long the very large FFTs take, but at only 10k iterations everything else shouldn't take too long). The drawback is that it is completely untested (instead of just mostly untested, like the bash script) and it does no cleanup.
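
To see what the loop would cover before committing GPU time, a dry run can list the exponent and FFT spec of every test, sorted ascending by exponent like the batch file. This is only a sketch that mirrors the loop bounds and the exponent formula from the script above; the function name is made up, nothing is executed, and whether gpuowl accepts every listed spec is as untested as the script itself.

```shell
#!/bin/bash
# Dry run: print "exponent W:M:H" for every configuration the benchmark
# loop would test, sorted ascending by exponent. gpuowl is never called.
list_fft_benchmarks() {
    local W H M
    for W in 256 512 1024 4096; do
        for H in 256 512 1024; do
            for M in 1 3 4 5 6 7 8 9 10 11 12 13 14 15; do
                # Same filter as the benchmark script: M=1 only for small W*H.
                if [ "$M" -gt 1 ] || [ $((W*H)) -lt $((512*512)) ]; then
                    echo "$((W*M*H*2*6+1001)) $W:$M:$H"
                fi
            done
        done
    done | sort -n
}

list_fft_benchmarks
```

The list has 159 entries, running from exponent 787433 (256:1:256) up to 754975721 (4096:15:1024), so trimming the tail is an easy way to skip the very large tests.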
Attached Files
File Type: zip bench.zip (1.3 KB, 20 views)

Old 2021-01-08, 00:14   #105
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia
18C₁₆ Posts

Two-instance benchmarks would be great, too, possibly for all FFTs.

Old 2021-01-08, 01:04   #106
Xyzzy
"Mike"
Aug 2002
2⁴·499 Posts

We will look into running the benchmark.

We have two of these cards that we bought. Both were (up until today) factory sealed.

If someone wants the second card, drop us a PM.

Old 2021-01-09, 02:53   #107
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia
2²·3²·11 Posts

Quote:
Originally Posted by Xyzzy
We will look into running the benchmark.

Could you please run some mfakto, too? Possibly some exponent in the 2M range from 67 to 68 bits, or similar (I have observed peak performance for my 2080Ti in this range).

Old 2021-01-09, 13:53   #108
Xyzzy
"Mike"
Aug 2002
2⁴×499 Posts

Quote:
Originally Posted by Viliam Furik
Could you please run some mfakto, too? Possibly some exponent in the 2M range from 67 to 68 bits, or similar (I have observed peak performance for my 2080Ti in this range).

Do you have a worktodo.txt file of the ranges to be tested?

Old 2021-01-09, 18:36   #109
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia
2²·3²·11 Posts

Quote:
Originally Posted by Xyzzy
Do you have a worktodo.txt file of the ranges to be tested?

I thought you'd just pick some exponent, but I can give you one, too.

Code:
Factor=2000003,67,68
One exponent is enough for me. Just make sure that the GPU is not doing any heavy work besides mfakto.
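
If more than one exponent is wanted later, the worktodo lines can be generated rather than typed. A small sketch, assuming only the Factor=exponent,bit_min,bit_max form shown above (the function name is hypothetical, and the only exponent used is the one from this post):

```shell
#!/bin/bash
# Print one worktodo line per exponent in the form
# Factor=<exponent>,<bit_min>,<bit_max>, as in the post above.
# Arguments: bit_min, bit_max, then any number of exponents.
make_worktodo() {
    local bit_min=$1 bit_max=$2
    shift 2
    local exponent
    for exponent in "$@"; do
        printf 'Factor=%s,%s,%s\n' "$exponent" "$bit_min" "$bit_max"
    done
}

make_worktodo 67 68 2000003   # prints: Factor=2000003,67,68
```

Redirecting the output to worktodo.txt in the mfakto directory gives the file asked about above.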

Old 2021-01-10, 01:54   #110
Xyzzy
"Mike"
Aug 2002
2⁴×499 Posts

Code:
mfakto 0.15pre7-MGW (64bit build)


Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24 Kib
  GPUSieveSize              96 Mib
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300 s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  TimeStampInResults        yes
  VectorSize                2
  GPUType                   AUTO
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
Compiletime options

Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "gfx1030 (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
  name                      gfx1030 (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 2.0 AMD-APP (3188.4) (3188.4 (PAL,LC))
  maximum threads per block 1024
  maximum threads per grid  1073741824
  number of multiprocessors 30 (1920 compute elements)
  clock rate                1815 MHz

Automatic parameters
  threads per grid          0
  optimizing kernels for    GCN

Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
  GPUSievePrimes (adjusted) 81206
  GPUsieve minimum exponent 1037054
Started a simple selftest ...
Selftest statistics
  number of tests           30
  successful tests          30

selftest PASSED!

got assignment: exp=2000003 bit_min=67 bit_max=68 (14.95 GHz-days)
Starting trial factoring M2000003 from 2^67 to 2^68 (14.95 GHz-days)
Using GPU kernel "cl_barrett15_69_gs_2"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 09 19:52 | 4617 100.0% |  0.516   0m00s |   2606.77    81206    0.00%
no factor for M2000003 from 2^67 to 2^68 [mfakto 0.15pre7-MGW cl_barrett15_69_gs_2]
tf(): total time spent:  8m  5.565s (2659.35 GHz-days / day)

ERROR: get_next_assignment(): no valid assignment found in "worktodo.txt"
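
As a sanity check on the final throughput figure, the GHz-days/day rate should be about the credited work divided by the elapsed time in days. The sketch below assumes exactly that relationship (the function name is made up and is not part of mfakto) and uses the 14.95 GHz-days and 8m 5.565s from the output above.

```shell
#!/bin/bash
# Throughput in GHz-days per day: credited work divided by elapsed days.
# Arguments: GHz-days of the assignment, elapsed minutes, elapsed seconds.
ghz_days_per_day() {
    local ghz_days=$1 minutes=$2 seconds=$3
    # 86400 seconds per day; LC_ALL=C keeps the decimal point a dot.
    LC_ALL=C awk -v g="$ghz_days" -v m="$minutes" -v s="$seconds" \
        'BEGIN { printf "%.1f\n", g * 86400 / (m * 60 + s) }'
}

ghz_days_per_day 14.95 8 5.565   # prints: 2660.2
```

That is within a fraction of a percent of the 2659.35 GHz-days/day mfakto reports; the small gap is presumably because 14.95 is a rounded figure.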

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.