Thread: Prime95 sorcery
2022-05-28, 16:40   #5
jtravers
 
May 2022

112 Posts

Quote:
Originally Posted by axn
We store about 18 bits per IEEE fp64, so it is more like 6-7M. But I guess that just strengthens your case.
True
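
For anyone following the arithmetic, here is a rough sketch of how the "6-7M" figure relates to 18 bits per double. The function names and the 115,000,000-bit example are my own assumptions for illustration, not anything taken from Prime95 itself, and 18 is only the approximate figure quoted above.

```python
# Rough sizing arithmetic: how many fp64 words are needed to hold a number
# of a given bit length at roughly 18 bits per word (figure quoted above).
BITS_PER_DOUBLE = 18  # approximate; assumption carried over from the quote


def doubles_needed(total_bits: int, bits_per_double: int = BITS_PER_DOUBLE) -> int:
    """Smallest word count whose combined capacity is at least total_bits."""
    return -(-total_bits // bits_per_double)  # integer ceiling division


def bits_covered(n_doubles: int, bits_per_double: int = BITS_PER_DOUBLE) -> int:
    """Total bits that n_doubles words can hold at the given density."""
    return n_doubles * bits_per_double


if __name__ == "__main__":
    # Hypothetical 115,000,000-bit number, chosen only to land inside the range.
    print(doubles_needed(115_000_000))
    # Bit lengths that 6M and 7M words would cover at 18 bits each.
    print(bits_covered(6_000_000), bits_covered(7_000_000))
```

So at this density, 6-7M words corresponds to numbers of very roughly 108M to 126M bits.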

Quote:
Originally Posted by axn
Modern processors can do 8-16 floating point ops/cycle (including muls & adds). Data movement latency can be hidden with enough compute operations. So the 10 cycles/kernel might be more like 0.2 cycles.
I think this gets to the crux of it. It has been a long time since I looked into processor architecture, and I had assumed that 1 operation per core per cycle was standard. This would account for the large discrepancy.
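
To put numbers on the gap between my old assumption and the 8-16 figure, here is a back-of-envelope sketch. The configurations are my own illustrative assumptions (256-bit vectors holding 4 doubles, two arithmetic pipes per core, and a fused multiply-add counted as 2 operations); this is peak per-core arithmetic only, not a measurement of any particular chip or of Prime95.

```python
# Back-of-envelope peak double-precision operations per cycle per core.
# All parameters below are illustrative assumptions, not measured values.


def peak_flops_per_cycle(simd_lanes: int, pipes: int, flops_per_instr: int) -> int:
    """Peak ops/cycle = lanes per vector * vector pipes * ops per instruction."""
    return simd_lanes * pipes * flops_per_instr


# My old mental model: one scalar operation per cycle.
scalar = peak_flops_per_cycle(simd_lanes=1, pipes=1, flops_per_instr=1)

# 256-bit vectors (4 doubles), one add pipe plus one multiply pipe, no FMA.
vector_no_fma = peak_flops_per_cycle(simd_lanes=4, pipes=2, flops_per_instr=1)

# 256-bit vectors, two FMA pipes, each FMA counted as a multiply and an add.
vector_fma = peak_flops_per_cycle(simd_lanes=4, pipes=2, flops_per_instr=2)

if __name__ == "__main__":
    print(scalar, vector_no_fma, vector_fma)
```

Under those assumptions the two vector cases come out at 8 and 16 operations per cycle, which matches the range axn quoted and is an order of magnitude above what I had assumed.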

Quote:
Originally Posted by axn
These are all O(n) which is a smaller component. And also, no explicit modulo due to IBDWT.
Accepted
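
For my own understanding, here is a toy sketch of why no explicit modulo is needed. It is deliberately simplified: it assumes the word size divides the exponent exactly, so none of the irrational-base weights that the real IBDWT uses are required, and it does the cyclic convolution with a plain double loop rather than an FFT. The function name is hypothetical. The point is only that a cyclic (wrap-around) convolution of the digits already gives the product mod 2^N - 1, because 2^N is congruent to 1.

```python
# Toy demonstration: cyclic convolution of digits yields x*y mod (2^N - 1).
# Simplified case only: N = n_words * bits, so no IBDWT weights are needed.


def mulmod_mersenne_cyclic(x: int, y: int, n_words: int, bits: int) -> int:
    """Return x*y mod (2^N - 1), N = n_words*bits, for 0 <= x, y < 2^N."""
    n_total = n_words * bits
    word_mask = (1 << bits) - 1
    modulus = (1 << n_total) - 1

    # Split both operands into n_words digits of `bits` bits each.
    xs = [(x >> (bits * i)) & word_mask for i in range(n_words)]
    ys = [(y >> (bits * i)) & word_mask for i in range(n_words)]

    # Cyclic convolution: index i+j wraps around, which is where the
    # reduction happens "for free" since 2^(bits*n_words) == 1 mod 2^N - 1.
    z = [0] * n_words
    for i in range(n_words):
        for j in range(n_words):
            z[(i + j) % n_words] += xs[i] * ys[j]

    # Carry propagation: recombine digits, then wrap any overflow past
    # bit N back into the low end (again using 2^N == 1).
    total = sum(z[k] << (bits * k) for k in range(n_words))
    while total >> n_total:
        total = (total & modulus) + (total >> n_total)
    return 0 if total == modulus else total


if __name__ == "__main__":
    m = (1 << 32) - 1
    a, b = 0xDEADBEEF, 0x12345678
    print(mulmod_mersenne_cyclic(a, b, n_words=4, bits=8) == (a * b) % m)
```

As I understand it, the real thing replaces the double loop with a floating-point FFT and adds the weighting so that exponents that do not split evenly into words still work, but the wrap-around idea is the same.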

Quote:
Originally Posted by axn
George (or Ernst) could give you the actual details (as opposed to just superficial knowledge that I have).
Thanks, but I think that your "superficial knowledge" was more than adequate. I should work on bringing mine up to that level.