mersenneforum.org  

Showing results 1 to 25 of 1000
Search: Posts Made By: preda
Forum: Software 2021-09-24, 08:31
Replies: 7
Views: 709
Posted By preda
In my understanding, VIRT in "top" for a process includes "virtual" memory that is not mapped to physical memory. That would happen for example after a malloc() but before writing anything to the...
Forum: Lounge 2021-09-06, 19:12
Replies: 1,787
RIP
Views: 183,951
Posted By preda
Ivan Patzaichin

Ivan Patzaichin was a legendary Romanian canoeist: https://en.wikipedia.org/wiki/Ivan_Patzaichin
Forum: Math 2021-07-06, 17:27
Replies: 24
Views: 2,645
Posted By preda
Maybe related, please see the thread about what I coined "PRP-1": https://www.mersenneforum.org/showthread.php?t=23628
Forum: mersenne.ca 2021-06-14, 19:20
Replies: 16
Views: 1,390
Posted By preda
Yes AFAIK the new formula is correct.

fancySum(a, b) = a + b * (1 - a) == a + b - a * b
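
A minimal Python sketch of the formula above, assuming a and b are probabilities of two independent events (that reading is an assumption; the snippet does not say so explicitly). Under that assumption both forms equal 1 - (1 - a) * (1 - b), the chance that at least one event happens. The names fancy_sum, fancy_sum_expanded and at_least_one are illustrative only.

```python
import math


def fancy_sum(a: float, b: float) -> float:
    """Combine two probabilities as a + b * (1 - a)."""
    return a + b * (1.0 - a)


def fancy_sum_expanded(a: float, b: float) -> float:
    """The expanded form a + b - a * b."""
    return a + b - a * b


def at_least_one(a: float, b: float) -> float:
    """1 - P(neither event happens), assuming the events are independent."""
    return 1.0 - (1.0 - a) * (1.0 - b)


if __name__ == "__main__":
    a, b = 0.03, 0.02  # hypothetical probabilities, for illustration only
    print(fancy_sum(a, b), fancy_sum_expanded(a, b), at_least_one(a, b))
    print(math.isclose(fancy_sum(a, b), at_least_one(a, b)))
```

The combination is symmetric in a and b and never exceeds 1, which a plain a + b does not guarantee.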
Forum: mersenne.ca 2021-06-14, 06:03
Replies: 16
Views: 1,390
Posted By preda
Let's consider a story: on a dangerous trip, somebody must first cross a lake, and afterwards the forest. In the lake there's an alligator that would eat him with a 90% chance. In the unlikely event...
Forum: Software 2021-06-02, 13:58
Replies: 54
Views: 10,600
Posted By preda
There's one more interesting factoid about the difference between "classic" FFT and NTT:

when working with complex numbers in the classic FFT, the inverse transform is equal to the conjugate of...
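
For the complex side of this remark, one well-known form of the identity is ifft(X) = conj(fft(conj(X))) / N. The post is cut off here, so it is not certain that this is the exact form it had in mind; the sketch below only checks that identity with a naive O(N^2) DFT. The function names are illustrative.

```python
import cmath


def dft(x):
    """Naive forward DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def idft_via_conjugate(X):
    """Inverse DFT computed with the forward transform only.

    Conjugate the input, apply the forward DFT, conjugate the result,
    then scale by 1/n.
    """
    n = len(X)
    forward = dft([v.conjugate() for v in X])
    return [v.conjugate() / n for v in forward]


if __name__ == "__main__":
    x = [1 + 2j, 3 - 1j, 0.5j, -2 + 0j]
    roundtrip = idft_via_conjugate(dft(x))
    print(max(abs(a - b) for a, b in zip(x, roundtrip)))  # small rounding error
```

In a modular transform there is no complex conjugation to lean on; the inverse is normally computed with the inverse root of unity instead.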
Forum: Software 2021-06-02, 13:33
Replies: 54
Views: 10,600
Posted By preda
In the same setup (FFT 4M, Radeon VII, exponent around 107M) I gained about 33% performance by tweaking (with inline assembly) the low-level modular primitives (add, sub, mul). So now the performance...
Forum: Software 2021-05-27, 07:03
Replies: 54
Views: 10,600
Posted By preda
I was testing on a Radeon VII.
Forum: Software 2021-05-26, 20:33
Replies: 54
Views: 10,600
Posted By preda
I did some preliminary performance measurements, and the initial results are a bit disappointing -- the NTT being about 3x slower than the equivalent FP64. It seems the code is compute-bound now (vs....
Forum: Software 2021-05-25, 06:50
Replies: 54
Views: 10,600
Posted By preda
Yes this is one option I considered. It's tricky to do the "shift every (couple of) FFT levels" well, because if it's done the conservative way (shift every level) it's wasteful on the precision...
Forum: Software 2021-05-25, 06:41
Replies: 54
Views: 10,600
Posted By preda
There is a difference of 3 bits between M61 and the above 64-bit prime, which explains 3/2=1.5 bits of the difference.

Another important element that enables 25 bits is that now I'm using...
Forum: Software 2021-05-25, 00:49
Replies: 54
Views: 10,600
Posted By preda
Lately I've been experimenting with some non-FP64 FFT transforms.

These are briefly some directions I've looked into, for representing the values that the FFT operates on:

1. a set of 4 SP...
Forum: Math 2021-05-17, 12:17
Replies: 2
Views: 750
Posted By preda
On the GPU, we are limited by the small number of "VGPRs" (registers) per workgroup that are available. Because we're operating at the upper limit of VGPRs, there's not much room to operate on two...
Forum: GpuOwl 2021-05-13, 17:54
Replies: 4
Views: 550
Posted By preda
It's fine to run without a config.txt if you don't need it. It's just a facility to put the flags that you would otherwise pass on the command line, in a file. The format is exactly what you'd put on...
Forum: Hardware 2021-04-29, 10:05
Replies: 77
Views: 11,148
Posted By preda
TF (trial factoring) does not use FFTs.

For primality testing Mersenne numbers, there is LL and PRP; the two are very similar from an implementation perspective. They both require squaring very...
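
How similar the two tests are can be seen in a small sketch using Python's built-in big integers. This is a toy illustration, not how production software does it: real implementations replace the `x * x % m` step with an FFT-based squaring. The PRP variant shown (base 3, square p times, compare with 9) is one common formulation and is an assumption here; the function names are illustrative.

```python
def lucas_lehmer(p: int) -> bool:
    """LL test for M_p = 2^p - 1, with p an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m      # one big squaring per iteration
    return s == 0                # True exactly when M_p is prime


def prp_base3(p: int) -> bool:
    """Fermat-style PRP test for M_p, base 3, with p an odd prime.

    Squaring 3 p times gives 3^(2^p) = 3^(M_p + 1). If M_p is prime this
    equals 9 (mod M_p); a composite M_p could in principle also pass.
    """
    m = (1 << p) - 1
    x = 3
    for _ in range(p):
        x = x * x % m            # one big squaring per iteration
    return x == 9 % m


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 31):
        print(p, lucas_lehmer(p), prp_base3(p))
```

Both loops are dominated by the same operation, a modular squaring of a p-bit number, which is why the two share almost all of their implementation.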
Forum: Software 2021-04-27, 11:20
Replies: 54
Views: 10,600
Posted By preda
The cost of small multiplication

(sorry for the below being so trivial)

At the core of an FFT there are "small multiplications", word-size or some small multiple of word-size, and I've been thinking a bit about their cost.
...
Forum: Software 2021-04-27, 10:20
Replies: 54
Views: 10,600
Posted By preda
Some interesting threads on the topic:

https://www.mersenneforum.org/showthread.php?t=19486
https://www.mersenneforum.org/showthread.php?t=22622
Forum: Software 2021-04-23, 06:51
Replies: 52
Views: 8,895
Posted By preda
Thank you for the explanation of Shoup's mul-mod.
In GCN (AMD GPU), there is a 32-bit mul_hi instruction, but there is no 64-bit mul_hi. "Emulating" the 64-bit mul_hi is slow, almost as slow as the...
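
To make the cost concrete, here is a hedged Python model of building a 64-bit mul_hi out of 32-bit pieces. It is the schoolbook decomposition, not a description of what any particular compiler or GPU emits; mul_lo32, mul_hi32 and mul_hi64 are illustrative names that model 32-bit multiply results with Python integers.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul_lo32(a: int, b: int) -> int:
    """Low 32 bits of a 32x32-bit product."""
    return (a * b) & MASK32


def mul_hi32(a: int, b: int) -> int:
    """High 32 bits of a 32x32-bit product."""
    return (a * b) >> 32


def mul_hi64(a: int, b: int) -> int:
    """High 64 bits of a 64x64-bit product, using only 32-bit multiplies."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    # Everything that lands in bits 32..63 of the full product; its own
    # overflow past 32 bits is the carry into the high half.
    cross = mul_hi32(a_lo, b_lo) + mul_lo32(a_hi, b_lo) + mul_lo32(a_lo, b_hi)
    # Full 64-bit a_hi * b_hi, assembled from its two 32-bit halves.
    high = (mul_hi32(a_hi, b_hi) << 32) | mul_lo32(a_hi, b_hi)
    result = high + mul_hi32(a_hi, b_lo) + mul_hi32(a_lo, b_hi) + (cross >> 32)
    return result & MASK64


if __name__ == "__main__":
    a, b = 0xFEDCBA9876543210, 0x0123456789ABCDEF
    print(mul_hi64(a, b) == (a * b) >> 64)
```

This sketch spends seven 32-bit multiplies plus the carry handling on one 64-bit mul_hi, which gives a feel for why the emulation is expensive.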
Forum: Software 2021-04-18, 18:12
Replies: 52
Views: 8,895
Posted By preda
So, does this mean that p=M31 has all the required roots-of-two for the IBDWT for Z/pZ NTT? So, is NTT(M31) a viable alternative to FGT?

Or, the problem with M31 is that it doesn't have the...
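
Two arithmetic facts behind this question can be checked directly. The sketch below is a hedged aside and does not settle the viability question the post raises: it verifies that 2 has a 2^j-th root modulo M31 for every j (because 2 has odd order 31 there), and it counts how many factors of 2 divide p - 1 and p^2 - 1, which bounds the power-of-two roots of unity available in Z/pZ and in GF(p^2). The function names are illustrative, and `pow(x, -1, m)` needs Python 3.8 or later.

```python
P = (1 << 31) - 1  # M31


def root_of_two(j: int) -> int:
    """Return r with r^(2^j) == 2 (mod M31).

    2 has multiplicative order 31 modulo M31, so 2^e with
    e * 2^j == 1 (mod 31) is a 2^j-th root of 2.
    """
    e = pow(1 << j, -1, 31)  # modular inverse of 2^j modulo 31
    return pow(2, e, P)


def two_adic_valuation(n: int) -> int:
    """Number of times 2 divides n (n > 0)."""
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


if __name__ == "__main__":
    print(all(pow(root_of_two(j), 1 << j, P) == 2 for j in range(1, 25)))
    print(two_adic_valuation(P - 1))      # power-of-two part of |(Z/pZ)*|
    print(two_adic_valuation(P * P - 1))  # power-of-two part of |GF(p^2)*|
```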
Forum: GPU Computing 2021-04-15, 17:48
Replies: 2
Views: 794
Posted By preda
From what I understand, OpenCL 3.0 is closer to OpenCL 1.x than to OpenCL 2.0. I.e. 3.0 is not "more" than 2.0, but instead it reduces the mandatory feature-set to the level of 1.x and offers...
Forum: GpuOwl 2021-04-03, 17:12
Replies: 16
Views: 2,018
Posted By preda
Nice, I like it! The owl has a foxy look :)
Forum: Information & Answers 2021-03-31, 11:55
Replies: 7
Views: 841
Posted By preda
I guess it's because of the "chmod 777 expand.py". Do you need that? (expand.py already has rights 775)
Forum: Hardware 2021-03-29, 07:43
Replies: 16
Views: 1,486
Posted By preda
The need for the general-MUL vs. MUL-3 only appears when changing the "L" step dynamically during a test. This is something GpuOwl does not support (and thus gets away with using MUL-3), but prime95...
Forum: GpuOwl 2021-03-28, 18:58
Replies: 82
Views: 14,781
Posted By preda
The multiplication time was excessive before the restart. One possible cause would be the GPU RAM becoming over-allocated for some reason, which would slow everything down a lot.

If you catch it...
Forum: Hardware 2021-03-21, 20:02
Replies: 16
Views: 1,486
Posted By preda
A small advantage of "b" being fixed is that, in the GEC verification, we have a multiplication by "3" (the PRP base). When "b" is variable, this multiplication must be changed to a general...
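
Where that multiplication by 3 comes from can be shown with a toy version of the Gerbicz error check (GEC) for the sequence x -> x^2 starting at the PRP base 3. This is a sketch under assumptions: it checks after every block of `block` squarings, uses Python big integers, and injects a fault by doubling the residue; none of these choices is taken from GpuOwl or prime95. The name gerbicz_run and its parameters are hypothetical.

```python
def gerbicz_run(modulus: int, block: int, n_blocks: int, corrupt_at=None) -> bool:
    """Run n_blocks * block squarings with a check after each block.

    Returns True if every check passes, False at the first failed check.
    corrupt_at is an optional 1-based iteration index at which a fault
    is injected into x.
    """
    x = 3  # x_0 = 3 and x_i = x_{i-1}^2, so x_i = 3^(2^i)
    d = 3  # d_0 = x_0 and d_t = d_{t-1} * x_{t*block}
    iteration = 0
    for _block_index in range(n_blocks):
        d_prev = d
        for _step in range(block):
            x = x * x % modulus
            iteration += 1
            if iteration == corrupt_at:
                x = 2 * x % modulus  # injected fault, for illustration
        d = d * x % modulus
        # Verification: squaring d_{t-1} block times shifts every factor
        # forward by one block, so only x_0 = 3 is missing from d_t.
        # That missing factor is the multiplication by 3.
        expected = 3 * pow(d_prev, 1 << block, modulus) % modulus
        if d != expected:
            return False
    return True


if __name__ == "__main__":
    m61 = (1 << 61) - 1
    print(gerbicz_run(m61, block=8, n_blocks=5))
    print(gerbicz_run(m61, block=8, n_blocks=5, corrupt_at=10))
```

With a fixed base the check multiplies by the small constant 3; if the base were a general residue, that step would become a full-size multiplication.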
All times are UTC. The time now is 19:40.


This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.