2020-07-05, 22:44   #8
retina ("The unspeakable one")
Originally Posted by preda
If using FP, we'd like to maximize the mantissa size. One way would be to reduce the number of bits in the exponent (giving a wider mantissa). Another would be to use a larger FP format than DP (e.g. 80-bit or 128-bit FP).

Next: what would be a good elementary operation (a basic building block) for computing large FP FFTs? For example, what we have right now (on CPUs/GPUs) is FMA ("fused multiply-add"), which is generally useful but not particularly great for FFTs (especially in the high-register-pressure context of GPUs).

I was thinking of having some giant "twiddle OP":
twiddle(A,B,C): return (A*B+C, A*B-C)

where A,B,C are complex values; such an OP may be great for FFTs.
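As a sketch of how such an op would map onto an FFT: in a radix-2 decimation-in-time butterfly, taking A as the twiddle factor w and B, C as the odd/even half-transforms, twiddle(A, B, C) produces the first butterfly output directly and the second up to a sign flip. A minimal Python sketch (names and structure are my own, not from the post; Python's complex type stands in for whatever wide FP format the chip would use):

```python
import cmath

def twiddle(a, b, c):
    """The proposed fused op: one complex multiply shared by both outputs."""
    p = a * b
    return p + c, p - c

def fft_dit(x):
    """Radix-2 DIT FFT with every butterfly expressed via twiddle().
    len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_dit(x[0::2])
    odd = fft_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)
        # Standard butterfly: (even + w*odd, even - w*odd).
        # twiddle(w, odd, even) gives (w*odd + even, w*odd - even),
        # so the second output only needs a negation.
        hi, lo = twiddle(w, odd[k], even[k])
        out[k] = hi
        out[k + n // 2] = -lo
    return out
```

So one such op plus a negation covers a whole butterfly, which supports preda's point that it would be a better FFT primitive than plain FMA; whether hardware would fold the negation into the op for free is a separate design question.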
Sure, you could do all that. But at 130 nm and 10 mm × 10 mm you would be doing well to fit just one multiply unit on the chip.

And what RAM interface would you use? With no space for internal caches you will need an awesome RAM interface.

I think this open source chip will be useless for anything that needs high performance with large numbers.
