mersenneforum.org  

Showing results 1 to 25 of 1000
Search: Posts Made By: preda
Forum: Software 2020-09-22, 21:09
Replies: 25
Views: 968
Posted By preda
I'm thinking of trying out a SP + DP...

I'm thinking of trying out an SP + DP implementation in gpuowl. If that achieves similar effective performance on Radeon VII, it should be a net gain on Nvidia.

For the twiddles, I would...
Forum: GPU Computing 2020-09-22, 20:16
Replies: 9
Views: 273
Posted By preda
Try to use the rocm-smi script to set the RAM...

Try to use the rocm-smi script to set the RAM frequency, I'm curious whether that works. Something along the lines of:

rocm-smi --setmclk 2
rocm-smi --autorespond y --setmemoverdrive 10

If it...
Forum: GPU Computing 2020-09-22, 09:16
Replies: 9
Views: 273
Posted By preda
How does the ROCm 3.8 performance look like? I...

What does the ROCm 3.8 performance look like? I understand that you can't compare directly because powerplay isn't working anymore.

I opened an issue about powerplay:...
Forum: GPU Computing 2020-09-22, 05:11
Replies: 9
Views: 273
Posted By preda
Does "clinfo" work? Did your system update...

Does "clinfo" work? Did your system update recently, or did you upgrade something?

Are you running ROCm? Did ROCm update?
Forum: Software 2020-09-22, 03:41
Replies: 25
Views: 968
Posted By preda
George, how did you work out the number of usable...

George, how did you work out the number of usable bits? (14, 38, 62)

I'm a bit surprised by the big jump from double-SP (14) to triple-SP (38), is that correct?
Forum: Software 2020-09-20, 23:13
Replies: 25
Views: 968
Posted By preda
About the "twiddles", they can be computed in...

About the "twiddles", they can be computed in double-SP this way:

The hardware (GPU) provides very fast but poor-accuracy SP sin/cos. Precompute an SP table with the difference between the "ideal"...
Forum: Software 2020-09-20, 23:07
Replies: 25
Views: 968
Posted By preda
If the above is correct, it means that a...

If the above is correct, it means that a double-MUL is 3xMUL+1xADD, and a double-ADD is 4xADD, which is rather efficient (though I don't know how bad the double-ADD approximation is).
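A minimal sketch of the pair representation mentioned in these posts (a value x held as two SP numbers (a, b) with x = a + b and "a" much larger than "b"), together with one possible FMA-based multiplication. This is an illustration under stated assumptions, not gpuowl code: the helper names f32, fma32, split and df_mul are hypothetical, SP arithmetic is emulated in Python by rounding doubles to float32, and the operation count here is not meant to reproduce the 3xMUL+1xADD figure quoted above.

```python
import math
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def fma32(a, b, c):
    """Emulated SP fused multiply-add, a*b + c with one final SP rounding.

    For SP inputs the product a*b is exact in double (24 + 24 bits <= 53).
    The addition is rounded to double first, so in rare cases the result
    can differ from a true hardware SP FMA (double rounding).
    """
    return f32(a * b + c)


def split(x):
    """Represent a double x as an SP pair (a, b) with x ~= a + b, |a| >> |b|."""
    a = f32(x)
    b = f32(x - a)
    return a, b


def df_mul(x, y):
    """Multiply two SP pairs; the tiny b*d term is dropped (an approximation)."""
    a, b = x
    c, d = y
    p = f32(a * c)          # leading product, rounded to SP
    e = fma32(a, c, -p)     # exact rounding error of a*c, recovered via FMA
    e = fma32(a, d, e)      # add cross term a*d
    e = fma32(b, c, e)      # add cross term b*c
    hi = f32(p + e)         # renormalise so that |hi| >> |lo|
    lo = f32(e - f32(hi - p))
    return hi, lo


if __name__ == "__main__":
    hi, lo = df_mul(split(math.pi), split(math.e))
    exact = math.pi * math.e
    print("double-SP error:", abs((hi + lo) - exact))
    print("plain SP error: ", abs(f32(f32(math.pi) * f32(math.e)) - exact))
```

The key step is `e = fma32(a, c, -p)`: with a fused multiply-add the rounding error of the leading product comes out exactly in a single operation, which is why the multiplication is cheap when FMA is available.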
Forum: Software 2020-09-20, 22:38
Replies: 25
Views: 968
Posted By preda
My takeaway from the above paper is: ...

My takeaway from the above paper is:

double-SP multiplication is fast when FMA is available:
we represent a value "x" by a pair of SP (a,b) such that x=a+b, and "a" much larger than "b".
Then...
Forum: Software 2020-09-20, 10:39
Replies: 25
Views: 968
Posted By preda
Here's an article I found on the topic, I still...

Here's an article I found on the topic; I still have to read it carefully: http://www.andrewthall.com/papers/df64_qf128.pdf
Forum: Software 2020-09-19, 21:06
Replies: 25
Views: 968
Posted By preda
My intuition about using SP in the classic FFT...

My intuition about using SP in the classic FFT way is: the twiddles are too small.

The twiddles (trigonometric values used in the FFT), if represented as SP, do not have enough precision for the...
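To put a number on the precision of SP twiddles, here is a small sketch that measures the worst rounding error incurred when the cos/sin values for a size-n transform are stored as SP instead of DP. The helper names f32 and max_twiddle_error are hypothetical, and the sketch only quantifies the storage error of the table; it says nothing about how that error propagates through a full FFT.

```python
import math
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_twiddle_error(n):
    """Largest absolute error from storing cos/sin(2*pi*k/n) in SP.

    The DP values from math.cos/math.sin are used as the reference.
    """
    worst = 0.0
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        for t in (math.cos(angle), math.sin(angle)):
            worst = max(worst, abs(f32(t) - t))
    return worst


if __name__ == "__main__":
    for n in (1024, 4096):
        print(n, max_twiddle_error(n))
```

Because every twiddle has magnitude at most 1, the SP storage error is bounded by half an SP ulp, 2^-25 (about 3e-8), compared with roughly 1e-16 for DP.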
Forum: Software 2020-09-19, 21:03
Replies: 25
Views: 968
Posted By preda
Were you using a form of fixed-point to represent...

Were you using a form of fixed-point to represent the "float without exponent" as ints?
Forum: Software 2020-09-19, 09:11
Replies: 25
Views: 968
Posted By preda
Yes. But a number of INT32 is used anyway to do...

Yes. But some INT32 is used anyway for pointer arithmetic, conditional expressions, etc., so it's not as if the INT32 is idle when doing a "pure SP" FFT.
Forum: Software 2020-09-19, 03:14
Replies: 25
Views: 968
Posted By preda
Some previous discussion: ...

Some previous discussion:
https://www.mersenneforum.org/showthread.php?t=23926
Forum: Software 2020-09-18, 22:48
Replies: 25
Views: 968
Posted By preda
AKA "The Holy Grail" :)

AKA "The Holy Grail" :)
Forum: GpuOwl 2020-09-18, 21:51
Replies: 56
Views: 4,675
Posted By preda
maxAlloc is in Megabytes, so 100'000 indicates...

maxAlloc is in Megabytes, so 100'000 indicates 100GB.

Maybe you should start with a conservatively small value, such as 3000 or 7000, if you expect GPUs with at least 4GB or at least 8GB of RAM....
Forum: Software 2020-09-18, 21:46
Replies: 25
Views: 968
Posted By preda
The future is 24bit

In light of Nvidia's new GPU launch, it appears we need to find a way of doing big convolutions using SP FP (FP32). This has been an elusive task in the past.

That new GPU has 2x FP32 vs. INT32,...
Forum: GpuOwl 2020-09-13, 23:07
Replies: 2,472
Views: 146,369
Posted By preda
I don't know. I verified that the residue you see...

I don't know. I verified that the residue you see at #800 (281087c3716953d2) is correct (I get the same), so the error is affecting the check or something around it, not the core computation. Can you...
Forum: Marin's Mersenne-aries 2020-09-12, 22:03
Replies: 8
Views: 417
Posted By preda
You should also mention the factored-to (TF)...

You should also mention the factored-to (TF) value you used, as it affects the probabilities.

mprime uses BS (Brent-Suyama extension) (when E>2, usually E==6), which may also bring a slight...
Forum: GpuOwl 2020-09-10, 21:45
Replies: 58
Views: 1,993
Posted By preda
try -log <N> can be put in...

Try:

-log <N>

It can be put in config.txt.
Forum: GpuOwl 2020-09-10, 21:43
Replies: 58
Views: 1,993
Posted By preda
Still the core problem above is *why* did the...

Still, the core problem above is *why* the write of the residue failed (and produced a checksum mismatch), thus preventing the use of power=8. The "fallback" of proof power, with its imperfections,...
Forum: GpuOwl 2020-09-10, 08:56
Replies: 58
Views: 1,993
Posted By preda
Sorry I'm a bit late to this discussion, but yes...

Sorry I'm a bit late to this discussion, but yes I confirm GpuOwl does not handle PRP-CF. It's a mishap that the assignment form for a PRP-CF is accepted instead of being clearly rejected outright,...
Forum: GpuOwl 2020-09-10, 08:46
Replies: 2,472
Views: 146,369
Posted By preda
You might try a lower -block value, something...

You might try a lower -block value, something like 200 or 100 or 50. This should lower the number of kernels that are queued at once to the GPU, but it has the side effect of slightly reducing the...
Forum: GpuOwl 2020-09-08, 00:04
Replies: 58
Views: 1,993
Posted By preda
Do you have the proof file? Would be in a folder...

Do you have the proof file? It would be in a folder named "uploaded", in pool/ if you use -pool or in the run directory of gpuowl otherwise, and is named something like 10496897-8.proof. If you have...
Forum: GpuOwl 2020-09-07, 11:28
Replies: 58
Views: 1,993
Posted By preda
Interesting. It looks as if that file was not...

Interesting. It looks as if that file was not written correctly, or was corrupted on disk. When trying to use it for proof generation, the checksum mismatch was discovered. On restart, it first tries the...
Forum: GpuOwl 2020-09-07, 06:34
Replies: 56
Views: 4,675
Posted By preda
Moebious' post #37...

Moebious' post #37 https://mersenneforum.org/showpost.php?p=555557&postcount=37 indicates that he got it to work, yes. The problem likely was something to do with the generated files...
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.