#1
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
37×163 Posts
This is probably too restricted to be useful to the forum, but it sounds interesting.
https://www.theregister.com/AMP/2020...chip_hardware/

#2
Dec 2012
The Netherlands
11100001000₂ Posts
So what is the commercial benefit for them?

#3
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
37·163 Posts

#4
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2×3²×7×53 Posts

#5
∂²ω=0
Sep 2002
República de California
5×2,351 Posts

#6
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
23×1,361 Posts

#7
"Mihai Preda"
Apr 2015
11·131 Posts
My dream would be a processor (hardware) that is designed to compute large convolutions, i.e. squaring giant numbers. I wonder how much faster a specialized design could be compared to the "general purpose" CPUs (including GPUs) we use now.
For example, such a design could either go down the established floating-point-FFT route, or could have fast specialized integer units for NTTs (the problem with NTTs right now is that the current CPUs/GPUs are much faster at FP, so FP-FFT wins). If using FP, we'd like to maximize the mantissa size. One way would be to reduce the number of bits in the exponent (to have a wider mantissa). Another would be to use larger FP than DP. (e.g. 80bit or 128bit FP) Next, what would be a good elementary operation (a basic building block) for computing large FP-FFTs. For example, what we have right now (on CPUs/GPUs) is FMA ("fused multiply add") which is generally useful but not particularly great for FFTs (especially in the "high register pressure" context of the GPUs) I was thinking of having some giant "twiddle OP": twiddle(A,B,C): return (A*B+C, A*B-C) where A,B,C are complex values; such an OP may be great for FFTs. Last fiddled with by preda on 2020-07-05 at 22:34 |

#8
Undefined
"The unspeakable one"
Jun 2006
My evil lair
15026₈ Posts
And what RAM interface would you use? With no space for internal caches, you will need an awesome RAM interface. I think this open-source chip will be useless for anything that needs high performance with large numbers.

Last fiddled with by retina on 2020-07-05 at 22:44

#9
∂²ω=0
Sep 2002
República de California
5×2,351 Posts
Specialized transform-doing hardware for DSPs is widespread; the problem for us is that the precision and convolution-size needs of the mobile-telecoms industry are rather different from ours.
An even bigger "why no such hardware instruction?" for me is complex multiply. I recall having a huge "WTF?" moment when I first saw the x86 SSE2 instruction set specification, seeing instantly how potentially useful it was for the scientific computing community, and seeing how Intel/AMD apparently completely disregarded the needs of said community in their instruction set design. And w.r.to CMUL, all these years later, they still omit it. "Guys, we'd be OK if there were such a SIMD instruction and the latency was high; just give us enough registers to hide the latency and we'll be in Happyville."

Intel et al. are fabulous (ha, made a punny on 'fabless') when it comes to hardware, but absolute shit w.r.to instruction set design. Having worked with both the old DEC Alpha ISA and the current ARMv8 one, the contrast with Intel's long and painful fumbling-in-the-dark road from MMX to SSE and beyond is massive. AVX-512 is actually halfway decent despite the lack of instructions like a vector both-halves-of-128-bit product and CMUL, but it took them, what, 20 years to get there?

A specialized form of CMUL in which one operand is a root of unity would be really useful for convolutions. If there were some digital magic by which one could cheaply interconvert between Cartesian and polar form for a complex number, one could do such a twiddle mul by converting the 2 inputs to polar form, doing 1 real add of the 2 angles, then converting back to (x,y) representation.

Last fiddled with by ewmayer on 2020-07-05 at 22:58
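A minimal Python sketch contrasting the operations discussed above, assuming plain double-precision floats: `cmul` is the 4-multiply, 2-add Cartesian product that a single CMUL instruction would fuse, and `twiddle_mul_polar` does the root-of-unity multiply as one real add of angles. The polar version uses ordinary `math.hypot`/`math.atan2`/`math.cos`/`math.sin` calls for the conversions, which are precisely the expensive step the "digital magic" would have to make cheap. `cmul_3m` (the well-known 3-multiply Gauss trick) is added only for comparison and is not from the post; all function names are hypothetical.

```python
import math


def cmul(ar, ai, br, bi):
    """Cartesian complex multiply (ar + i*ai)(br + i*bi): 4 real multiplies, 2 adds.

    This is the operation a single SIMD CMUL instruction would fuse.
    """
    return ar * br - ai * bi, ar * bi + ai * br


def cmul_3m(ar, ai, br, bi):
    """Same product with 3 real multiplies and 5 adds/subtracts (Gauss's trick)."""
    t1 = br * (ar + ai)
    t2 = ar * (bi - br)
    t3 = ai * (br + bi)
    return t1 - t3, t1 + t2


def twiddle_mul_polar(xr, xi, theta):
    """Multiply x = xr + i*xi by the root of unity e^(i*theta) via polar form.

    The multiply itself is one real add of angles; the Cartesian <-> polar
    conversions (hypot/atan2 and cos/sin here) are the costly part.
    """
    r = math.hypot(xr, xi)
    phi = math.atan2(xi, xr) + theta  # the single real add
    return r * math.cos(phi), r * math.sin(phi)
```

All three agree to rounding error for a twiddle factor w = (cos θ, sin θ); the polar route additionally picks up the rounding of the transcendental calls, so in practice it only pays off if the conversions are both cheap and accurate.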

#10
"Mihai Preda"
Apr 2015
11×131 Posts
Do you have some links to your posts or threads discussing the instructions? (If not too much trouble.)

Last fiddled with by preda on 2020-07-06 at 00:24

#11
"Bob Silverman"
Nov 2003
North of Boston
2×3³×139 Posts
registers. The board could do 512 x 512 bit integer multiplies and 1024/512 bit divides with remainder in just a few cycles. A 1024-bit add/subtract took two cycles [one to handle the carries]. It had a small instruction set.

I was part of a team that designed such a board in the late 1980s when I was at MITRE. It could do a 512 x 512 bit Montgomery multiply in 7 cycles using Texas Instruments 32x32 bit signal processing chips in parallel (with Karatsuba). Division with remainder took 10 cycles. The clock rate was slow [only 10 kHz] because the board was built with prototype wirewrap [proof of concept]. I expect that the register size could be increased to at least 2048 bits with modern hardware.

The board was designed to do very fast public-key crypto operations. Its instruction set was small but included, e.g., register bit count as well as msb and lsb. It supported a full set of arithmetic and logical operations. One would load up the registers and do all computations within the registers until the final answer was moved off board. It had a very small instruction cache: 4K.

Last fiddled with by R.D. Silverman on 2020-07-06 at 00:44 Reason: pagination
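A minimal Python sketch of the kind of arithmetic such a board accelerates, assuming the standard REDC formulation of Montgomery multiplication with R = 2^k (k = 512 for a 512-bit odd modulus) and a textbook Karatsuba split for the big products. It models only the math, not the hardware: how the MITRE board partitioned the work across its 32x32-bit DSP chips is not described in the post, and the names (`karatsuba`, `Montgomery`, `redc`, `modexp`) and the 64-bit cutoff are illustrative assumptions. `pow(n, -1, m)` requires Python 3.8 or later.

```python
def karatsuba(x, y, cutoff_bits=64):
    """Karatsuba product of non-negative ints (native multiply below cutoff_bits)."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    x1, x0 = x >> m, x & mask
    y1, y0 = y >> m, y & mask
    z2 = karatsuba(x1, y1, cutoff_bits)
    z0 = karatsuba(x0, y0, cutoff_bits)
    # Three recursive products instead of four: (x1+x0)(y1+y0) - z2 - z0 = x1*y0 + x0*y1
    z1 = karatsuba(x1 + x0, y1 + y0, cutoff_bits) - z2 - z0
    return (z2 << (2 * m)) + (z1 << m) + z0


class Montgomery:
    """Montgomery arithmetic modulo an odd n, with R = 2**k and k = n.bit_length()."""

    def __init__(self, n):
        if n < 3 or n % 2 == 0:
            raise ValueError("modulus must be odd and > 1")
        self.n = n
        self.k = n.bit_length()
        self.mask = (1 << self.k) - 1  # reduction mod R is a bit mask
        # n_prime satisfies n * n_prime == -1 (mod R)
        self.n_prime = (-pow(n, -1, 1 << self.k)) & self.mask
        self.r2 = (1 << (2 * self.k)) % n  # R^2 mod n, used to enter Montgomery form

    def redc(self, t):
        """Return t * R^-1 mod n for 0 <= t < n * R, using masks, shifts and products."""
        m = karatsuba(t & self.mask, self.n_prime) & self.mask
        u = (t + karatsuba(m, self.n)) >> self.k  # exact: t + m*n is divisible by R
        return u - self.n if u >= self.n else u

    def to_mont(self, a):
        return self.redc(karatsuba(a % self.n, self.r2))

    def from_mont(self, a):
        return self.redc(a)

    def mul(self, a, b):
        """Montgomery product a * b * R^-1 mod n (operands already in Montgomery form)."""
        return self.redc(karatsuba(a, b))

    def modexp(self, a, e):
        """a**e mod n by left-to-right square-and-multiply, entirely in Montgomery form."""
        if e < 0:
            raise ValueError("exponent must be non-negative")
        x = self.to_mont(1)
        base = self.to_mont(a)
        for bit in bin(e)[2:]:
            x = self.mul(x, x)
            if bit == "1":
                x = self.mul(x, base)
        return self.from_mont(x)
```

For example, `Montgomery((1 << 512) - 569).modexp(a, e)` should agree with Python's built-in `pow(a, e, n)`. The appeal of Montgomery form for register-resident hardware like this is that `redc` replaces division by n with masks, shifts and multiplies, and a long exponentiation never leaves Montgomery form until the final answer.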