2011-02-03, 00:14  #1
Bemusing Prompter
"Danny"
Dec 2002
California
100101111000_{2} Posts 
translating double to single precision?
I know that it's generally very hard to efficiently convert single to double precision, except in the case of nVidia's high-end Fermi GPUs. But what about the other way around? For example, would 100 GFLOPS of double precision easily convert to 200 GFLOPS of single precision?
Sorry if this is a dumb question. 
2011-02-03, 00:28  #2
"Mark"
Apr 2003
Between here and the
29×223 Posts 
Simply put, no. An FPU cannot do twice as much SP work as DP work in the same amount of time just because the variables it is working with are half the size. One of the reasons is that the FP registers are 64 bits wide and can only hold one value each. You can't put two 32-bit values in a 64-bit FP register. That restriction doesn't apply to vector programming, though.
Last fiddled with by rogue on 2011-02-03 at 00:29
2011-02-04, 06:26  #3
Bemusing Prompter
"Danny"
Dec 2002
California
2^{3}·3·101 Posts 
Yeah, I was thinking of vector processing. I do know that Intel's "Sandy Bridge" chips are supposed to be up to twice as fast as those of the previous generation due to the use of 256-bit registers. Strangely, the FLOPS numbers of the newly released chips do not reflect this, but then again, FLOPS are not the only means of measuring a processor's performance.
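
For what it's worth, the register-width point can be put in numbers. This is only a rough sketch: the function name `lanes` is made up for illustration, and it counts how many values fit in a vector register, not how fast any particular chip actually runs.

```python
SP_BITS = 32  # width of an IEEE 754 single-precision value
DP_BITS = 64  # width of an IEEE 754 double-precision value


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one vector register.

    `lanes` is a hypothetical helper for this sketch, not a real API.
    """
    return register_bits // element_bits


# 128-bit (SSE-style) and 256-bit (AVX-style) vector registers
for width in (128, 256):
    sp = lanes(width, SP_BITS)
    dp = lanes(width, DP_BITS)
    print(f"{width}-bit register: {sp} SP lanes, {dp} DP lanes, ratio {sp // dp}")
```

At any vector width, twice as many SP values fit as DP values, which is where a 2:1 peak ratio would come from if every lane finished one operation per cycle. Whether real code gets anywhere near that peak is a separate question.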

2012-09-11, 23:55  #4
Bemusing Prompter
"Danny"
Dec 2002
California
100101111000_{2} Posts 
Sorry for bumping such an old thread, but some of the slides from IDF 2012 show that Intel's newer chips are able to do twice as many SP FLOPS as DP FLOPS per clock cycle. Interesting.
Last fiddled with by ixfd64 on 2012-09-11 at 23:56 Reason: missing "to"
2012-09-12, 01:13  #5
∂^{2}ω=0
Sep 2002
República de California
10110110011100_{2} Posts 
Quote:


2012-09-12, 05:10  #6
Romulan Interpreter
"name field"
Jun 2011
Thailand
2^{4}×613 Posts 
That's for sure. SP would only win at (about) 9 to 12 SPFlops per DPFlop. Think about a very simple example: to multiply two DPFloat numbers written as A*f+B and C*f+D, where f is the radix given by the size of an SPFloat, you need 4 SPFloats to store them and 4 SP multiplications to multiply them (or 3 with Karatsuba, at the cost of some extra additions and subtractions). If you can multiply the two DPFloats in a single flop, then you are already 4 times faster. Add to this the ability to store larger numbers (when you do carry propagation) and/or the higher precision, and you see that SPFlops being 2 times (even 4 times) faster is not enough to beat DP.
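
A rough sketch of that multiplication example, using plain Python integers to stand in for the two halves. The 24-bit half width is my own assumption (picked to match the SP significand), the function names are made up for illustration, and real floating-point emulation also has to deal with exponents and rounding, which this ignores.

```python
HALF_BITS = 24      # assumption: width of one half, chosen to match the SP significand
F = 1 << HALF_BITS  # the radix f from the example above


def split(x):
    """Return (A, B) such that x == A*F + B and 0 <= B < F (x non-negative)."""
    return divmod(x, F)


def mul_schoolbook(x, y):
    """(A*f+B)*(C*f+D) with 4 half-width multiplications: AC, AD, BC, BD."""
    a, b = split(x)
    c, d = split(y)
    # Multiplying by f or f*f is only a shift, so it is not counted as a multiply.
    return ((a * c) << (2 * HALF_BITS)) + ((a * d + b * c) << HALF_BITS) + b * d


def mul_karatsuba(x, y):
    """Same product with 3 multiplications plus extra additions and subtractions."""
    a, b = split(x)
    c, d = split(y)
    ac = a * c
    bd = b * d
    mid = (a + b) * (c + d) - ac - bd  # equals a*d + b*c
    return (ac << (2 * HALF_BITS)) + (mid << HALF_BITS) + bd


x, y = 0x123456789ABC, 0xFEDCBA987654
print(mul_schoolbook(x, y) == x * y, mul_karatsuba(x, y) == x * y)
```

Both versions give the same product; Karatsuba saves one multiplication and pays for it with three extra additions/subtractions, which is the overhead mentioned above.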
Another example: think of very fast video cards, which can reach almost 2 TeraFlops in SP but only 300-400 GigaFlops in DP (5-6 times less). If "times 4" or "times 5" were enough, why don't the manufacturers use (micro)programming to do a DPFlop with 4 SPFlops?