mersenneforum.org  

2011-10-23, 07:29   #1
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

100101001000₂ Posts
Haswell New Instructions / AVX2

While George hasn't yet released a version of Prime95 that officially supports AVX, preliminary benchmarks have shown significant increases in performance. Now, Intel promises that its Haswell microarchitecture (which probably won't hit the market until 2013) will support AVX2, which extends most existing integer SIMD instructions to 256 bits. My question is: will AVX2 offer additional benefits compared to AVX?

2011-10-23, 14:39   #2
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

11×673 Posts

256-bit integer AVX will be useful for TF.

Fused multiply-add will be either useful or very, very useful, depending on how it is implemented.

The new gather/permute instructions may be somewhat useful.
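
For readers unfamiliar with fused multiply-add: it computes a*b + c in one instruction with a single rounding at the end, instead of rounding the product first and the sum second. The sketch below only emulates that behaviour in Python with exact rationals; the function names are made up for illustration, and nothing here reflects how Prime95 would actually use the instruction.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Unfused: the product is rounded to a double before the add."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: evaluate a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # Fraction -> float conversion is correctly rounded


# Inputs chosen so the exact product is 1 - 2**-60, which a double cannot hold.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = mul_then_add(a, b, c)      # 0.0: the 2**-60 term is rounded away
fused = fused_multiply_add(a, b, c)  # -2**-60: the small term survives
```

Besides the extra accuracy, the hardware version counts as two floating-point operations per instruction, which is where the "doubles the FLOPS" claims come from.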

2011-10-24, 01:27   #3
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

Quote:
Originally Posted by Prime95
256-bit integer AVX will be useful for TF.

Fused multiply-add will be either useful or very, very useful, depending on how it is implemented.

The new gather/permute instructions may be somewhat useful.
Tasty. We talking 5-10%, or 15-20+%?

2011-10-24, 02:31   #4
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1110011101011₂ Posts

Quote:
Originally Posted by Dubslow
Tasty. We talking 5-10%, or 15-20+%?
A complete shot in the dark: nearly double the TF speed. For LL testing, the possible improvements from FMA and scatter/gather add up to maybe 5%.

Preliminary AVX testing shows 20+% speed improvement (comparing 32-bit AVX executable to 64-bit version 26).

Last fiddled with by Prime95 on 2011-10-24 at 02:32

2011-10-24, 03:52   #5
Christenson
 
Dec 2010
Monticello

1795₁₀ Posts

Yeah, but I think my low-end GPU smokes even the 2x improvement in TF speed... unless P95 is talking about sieving, in which case it will make a *big* difference.
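
To make the sieving-versus-testing distinction concrete: any factor q of 2^p - 1 (p an odd prime) has the form q = 2kp + 1 with q ≡ ±1 (mod 8), so trial factoring first sieves out hopeless candidates and then checks the survivors with a modular exponentiation. The sketch below is a toy illustration only. The function names and the tiny list of sieve primes are made up, and it says nothing about how Prime95 or any GPU program actually divides the work.

```python
def candidate_factors(p: int, k_max: int, small_primes=(3, 5, 7, 11, 13)):
    """Yield candidates q = 2*k*p + 1 that survive a cheap sieve."""
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        # Factors of 2**p - 1 must be congruent to 1 or 7 modulo 8.
        if q % 8 not in (1, 7):
            continue
        # Sieving step: drop candidates with a small prime divisor.
        if any(q % s == 0 and q != s for s in small_primes):
            continue
        yield q


def trial_factor(p: int, k_max: int):
    """Return the first sieved candidate dividing 2**p - 1, or None."""
    for q in candidate_factors(p, k_max):
        # Testing step: q divides 2**p - 1 exactly when 2**p mod q == 1.
        if pow(2, p, q) == 1:
            return q
    return None


factor = trial_factor(11, 10)  # 23, since 2**11 - 1 = 2047 = 23 * 89
```

Both steps are pure integer work, which fits the comment that wider integer vectors would help TF.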

2011-10-24, 04:22   #6
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

2³·3³·11 Posts

Pfft, I was expecting something like 2x or 4x because FMA and AVX are both supposed to double the FLOPS performance. Wishful thinking, I guess. :P

2011-10-24, 07:49   #7
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

Quote:
Originally Posted by Prime95
Preliminary AVX testing shows 20+% speed improvement (comparing 32-bit AVX executable to 64-bit version 26).
That's 20%+ in TF, not LL? With the whole GPU thing, I'm not too worried about TF.

2011-10-24, 08:14   #8
axn
 
Jun 2003

13²×29 Posts

Quote:
Originally Posted by Dubslow
That's 20%+ in TF, not LL? With the whole GPU thing, I'm not too worried about TF.
Methinks that is for LL. And that George's numbers are for improvement of AVX2 over AVX.

2011-10-24, 11:28   #9
Christenson
 
Dec 2010
Monticello

5×359 Posts

Quote:
Originally Posted by ixfd64
Pfft, I was expecting something like 2x or 4x because FMA and AVX are both supposed to double the FLOPS performance. Wishful thinking, I guess. :P
That's probably because there's only so much bandwidth to main memory...and P95 saturates it.
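
One common way to reason about this is the roofline model: attainable throughput is the lower of the compute ceiling and the memory ceiling (bandwidth times FLOPs performed per byte moved). The sketch below uses that model with entirely invented numbers. The function name and every figure are assumptions for illustration, not measurements of Prime95 or of any real CPU.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: the lower of the compute and memory ceilings."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# All numbers below are invented for illustration, not measurements.
bandwidth_gbs = 20.0   # hypothetical main-memory bandwidth in GB/s
flops_per_byte = 2.0   # hypothetical arithmetic intensity of the workload

before = attainable_gflops(50.0, bandwidth_gbs, flops_per_byte)   # 40.0
after = attainable_gflops(100.0, bandwidth_gbs, flops_per_byte)   # 40.0
speedup = after / before                                          # 1.0
```

With these made-up figures the code is already memory-bound, so doubling peak FLOPS changes nothing. A real workload sits somewhere between this extreme and the fully compute-bound case.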

2011-10-24, 20:20   #10
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Quote:
Originally Posted by axn
Methinks that is for LL. And that George's numbers are for improvement of AVX2 over AVX.
I hope so. That would also imply a pretty decent improvement with the switch to AVX, whenever that comes.

2011-11-12, 06:33   #11
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

2376₁₀ Posts

It looks like Haswell may not be as "glorious" as originally planned. According to recent articles, Haswell chips will max out at quad-core for consumers, which is a far cry from the original "8 cores by default" claim. None of the articles mention anything about FMA or vector coprocessors, either. But then again, it's well over a year from being released, and many things could change during this time.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.