mersenneforum.org > Great Internet Mersenne Prime Search > Hardware

2003-03-28, 19:19   #1
cmokruhl10
Mar 2003

The effect of SSE2 in P4s

Just for grins, I ran benchmarks on my 2320MHz P4 with and without SSE2. To disable SSE2, I put "CpuSupportsSSE2=0" in local.ini. The end result is that SSE2 is worth about a 3x speedup. At a similar clock rate, an Athlon crushes the P4 without SSE2, so it stands to reason that an Athlon with SSE2 would be a sweet processor.


WITH SSE2:
[code:1]Intel(R) Pentium(R) 4 CPU 1.60GHz
CPU speed: 2319.69 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 version 23.2, RdtscTiming=1
Best time for 384K FFT length: 16.327 ms.
Best time for 448K FFT length: 19.262 ms.
Best time for 512K FFT length: 21.948 ms.
Best time for 640K FFT length: 28.516 ms.
Best time for 768K FFT length: 34.684 ms.
Best time for 896K FFT length: 42.614 ms.
Best time for 1024K FFT length: 46.313 ms.
Best time for 1280K FFT length: 62.494 ms.
Best time for 1536K FFT length: 76.973 ms.
Best time for 1792K FFT length: 96.039 ms.
Best time for 2048K FFT length: 108.204 ms.[/code:1]


WITHOUT SSE2:
[code:1]Intel(R) Pentium(R) 4 CPU 1.60GHz
CPU speed: 2319.60 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 8 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 version 23.2, RdtscTiming=1
Best time for 384K FFT length: 49.217 ms.
Best time for 448K FFT length: 58.415 ms.
Best time for 512K FFT length: 65.181 ms.
Best time for 640K FFT length: 84.251 ms.
Best time for 768K FFT length: 106.694 ms.
Best time for 896K FFT length: 123.732 ms.
Best time for 1024K FFT length: 143.338 ms.
Best time for 1280K FFT length: 184.298 ms.
Best time for 1536K FFT length: 221.057 ms.
Best time for 1792K FFT length: 261.316 ms.
Best time for 2048K FFT length: 298.231 ms.[/code:1]

ATHLON:
[code:1]AMD Athlon(tm) XP 2600+
CPU speed: 2254.60 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 256
Prime95 version 23.2, RdtscTiming=1
Best time for 384K FFT length: 26.646 ms.
Best time for 448K FFT length: 30.303 ms.
Best time for 512K FFT length: 33.531 ms.
Best time for 640K FFT length: 43.511 ms.
Best time for 768K FFT length: 53.481 ms.
Best time for 896K FFT length: 62.682 ms.
Best time for 1024K FFT length: 71.481 ms.
Best time for 1280K FFT length: 92.451 ms.
Best time for 1536K FFT length: 111.819 ms.
Best time for 1792K FFT length: 137.076 ms.
Best time for 2048K FFT length: 153.008 ms.[/code:1]
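The "about 3x" figure and the per-clock comparison with the Athlon can be checked straight from the timings above. This Python sketch just copies those numbers; the variable and function names are illustrative, and "cycles per iteration" is approximated as time × reported clock speed, which assumes the clock held steady during each run.

```python
# Best times (ms per iteration) from the three Prime95 23.2 benchmark runs above.
FFT_SIZES_K = [384, 448, 512, 640, 768, 896, 1024, 1280, 1536, 1792, 2048]
P4_SSE2 = [16.327, 19.262, 21.948, 28.516, 34.684, 42.614,
           46.313, 62.494, 76.973, 96.039, 108.204]
P4_NO_SSE2 = [49.217, 58.415, 65.181, 84.251, 106.694, 123.732,
              143.338, 184.298, 221.057, 261.316, 298.231]
ATHLON = [26.646, 30.303, 33.531, 43.511, 53.481, 62.682,
          71.481, 92.451, 111.819, 137.076, 153.008]

# Reported clock speeds in MHz.
P4_MHZ_SSE2 = 2319.69
P4_MHZ_NO_SSE2 = 2319.60
ATHLON_MHZ = 2254.60


def ratios(slow, fast):
    """Element-wise slow/fast ratio, i.e. how many times faster 'fast' is."""
    return [s / f for s, f in zip(slow, fast)]


def kilocycles(times_ms, mhz):
    """ms * MHz = thousands of clock cycles per iteration."""
    return [t * mhz for t in times_ms]


# Wall-clock speedup from enabling SSE2 on the same P4.
sse2_speedup = ratios(P4_NO_SSE2, P4_SSE2)

# Clock-normalized: cycles the non-SSE2 P4 needs per cycle the Athlon needs.
p4_vs_athlon_per_clock = ratios(kilocycles(P4_NO_SSE2, P4_MHZ_NO_SSE2),
                                kilocycles(ATHLON, ATHLON_MHZ))

# Clock-normalized: cycles the Athlon needs per cycle the SSE2 P4 needs.
athlon_vs_p4_sse2_per_clock = ratios(kilocycles(ATHLON, ATHLON_MHZ),
                                     kilocycles(P4_SSE2, P4_MHZ_SSE2))

if __name__ == "__main__":
    for k, s, a, b in zip(FFT_SIZES_K, sse2_speedup,
                          p4_vs_athlon_per_clock, athlon_vs_p4_sse2_per_clock):
        print(f"{k:5d}K  SSE2 speedup {s:.2f}x  "
              f"P4(no SSE2)/Athlon cycles {a:.2f}  Athlon/P4(SSE2) cycles {b:.2f}")
    print(f"mean SSE2 speedup: {sum(sse2_speedup) / len(sse2_speedup):.2f}x")
```

Across the eleven FFT sizes the SSE2 speedup falls between roughly 2.7x and 3.1x, averaging about 2.9x. Per clock, the non-SSE2 P4 needs well over 1.8 times as many cycles as the Athlon, while the SSE2 P4 needs fewer cycles than the Athlon.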

2003-03-29, 00:22   #2
Xyzzy
Aug 2002

Yes, but the AMD at 2300MHz is at the limit of its design... The P4 can scale much farther. The very thing that makes the P4's IPC so crappy (its long pipeline) is what lets it scale so high...

In other words, life is (always) a compromise...

2003-03-29, 01:17   #3
pakaran
Aug 2002

Quote:
Originally Posted by Xyzzy
Yes, but the AMD at 2300MHz is at the limit of its design...

Don't they have a 3200+ out now?

2003-03-29, 01:59   #4
xtreme2k
Aug 2002

The point of this post is zero, if you ask me.

SSE2 is part of what makes a P4 a P4. If you want to cripple a P4 to make an Athlon XP look better, that's fine, but it doesn't mean anything.

2003-03-29, 02:08   #5
cmokruhl10
Mar 2003

The point of this post is to show that the only reason P4s are the GIMPS "CPU of choice" is SSE2. For every other distributed computing project, Athlons are the "CPU of choice" by quite a margin. All I'm saying is that if AMD released an Athlon with SSE2 (quite possible... and coming), it would eclipse the P4.

2003-03-29, 03:27   #6
Xyzzy
Aug 2002

Quote:
Originally Posted by pakaran
Don't they have a 3200+ out now?

I don't know... That PR rating crap is so complicated... I think they have a 2.1 or 2.2GHz CPU in real clock speed...

2003-03-29, 09:01   #7
trif
Aug 2002

Quote:
Originally Posted by cmokruhl10
The point of this post is to show that the only reason P4s are the GIMPS "CPU of choice" is SSE2. For every other distributed computing project, Athlons are the "CPU of choice" by quite a margin. All I'm saying is that if AMD released an Athlon with SSE2 (quite possible... and coming), it would eclipse the P4.

That would depend on whether the SSE2 implementation is done well. It would also be nice if they didn't cripple the rest of the chip in the process, the way the P4 is a dog at code that doesn't use SSE2. SSE2 is just an instruction set. RC5 taught us that performance can differ hugely depending on how a single rotate instruction is implemented in hardware.

2003-03-29, 20:52   #8
NookieN
Aug 2002

Re: The effect of SSE2 in P4s

Quote:
Originally Posted by cmokruhl10
So, it stands to reason that an Athlon with SSE2 would be a sweet processor.

Not necessarily. A P4 can at best complete one SSE2 FP instruction per cycle (and usually much less than that). It's unlikely that an Athlon (or Opteron) would be able to complete more than one double-precision SIMD instruction per cycle, so the P4 would probably still have an advantage in SSE2 due to its higher clock rate.

It will still be interesting to see how the Opteron/Athlon 64 performs with Prime95, though.
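The argument above boils down to peak rate = clock × best-case instructions per cycle. A minimal Python sketch, assuming (as the post does) at most one SSE2 FP instruction per cycle on the P4 and, hypothetically, the same limit on an SSE2-capable Athlon; the clock speeds are the two measured in the first post, and the function name is illustrative:

```python
def peak_instr_per_sec(clock_mhz: float, instr_per_cycle: float = 1.0) -> float:
    """Best-case SSE2 FP instructions per second: clock rate x per-cycle limit."""
    return clock_mhz * 1e6 * instr_per_cycle


# Clock speeds reported in the benchmark post.
P4_MHZ = 2319.69
ATHLON_MHZ = 2254.60

# Assumption: both chips top out at one SSE2 FP instruction per cycle.
p4_peak = peak_instr_per_sec(P4_MHZ)
athlon_peak = peak_instr_per_sec(ATHLON_MHZ)

# With equal per-cycle limits, the P4's edge is exactly its clock ratio.
advantage = p4_peak / athlon_peak

if __name__ == "__main__":
    print(f"P4 peak:     {p4_peak / 1e9:.2f} G instr/s")
    print(f"Athlon peak: {athlon_peak / 1e9:.2f} G instr/s")
    print(f"P4 / Athlon: {advantage:.3f}")
```

At these two particular clocks the gap is only about 3%; the post's point is that whatever clock-speed lead the P4 design has would carry straight over to SSE2 throughput if the per-cycle limits were equal.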

2003-06-17, 11:18   #9
Dresdenboy
Apr 2003
Berlin, Germany

Some slides on Opteron optimization even say that for double precision one should use x87, since its peak throughput is nearly the same as SSE2's, with the bonus of having more instructions (log2, sin, cos, ...).

An Athlon with AMD64 can issue FADD + FMUL + FLD/FST per cycle with x87, or, with SSE2 (AMD64 only), 1 FMUL and 1 FADD. Unfortunately, an ADDPD can't run in parallel with a MULPD, because each of them needs 2 of the 3 decoder ports (a packed double-precision operation needs one port for each half). An ADDPS, however, could be issued alongside, since it needs only one port.

On the P4, too, only one 64-bit FP operation per cycle is possible with SSE2. But the P4 reaches higher clock speeds, which is its advantage here.

SSE2 does, however, let you keep 32 double-precision FP numbers in the 16 registers at once (two per 128-bit register), while x87 still has only 8 registers.
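The decoder-port argument can be written as a toy model. This Python sketch encodes only what the post describes (3 decoder ports per cycle; x87 FADD, FMUL and FLD/FST take one port each; a packed-double ADDPD or MULPD takes two, one per 64-bit half) and counts double-precision results per cycle. It is a simplification for illustration, not a model of the real Opteron scheduler, and all names are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Instr(NamedTuple):
    name: str
    ports: int    # decoder ports needed to issue (per the post)
    doubles: int  # double-precision FP results produced


DECODE_PORTS = 3  # decoder ports available per cycle, per the post

FADD = Instr("FADD", 1, 1)
FMUL = Instr("FMUL", 1, 1)
FLD = Instr("FLD", 1, 0)      # load/store: uses a port, produces no FP result
ADDPD = Instr("ADDPD", 2, 2)  # one port per 64-bit half
MULPD = Instr("MULPD", 2, 2)


def doubles_per_cycle(group: Sequence[Instr]) -> Optional[int]:
    """Double results if the group can issue together in one cycle, else None."""
    if sum(i.ports for i in group) > DECODE_PORTS:
        return None
    return sum(i.doubles for i in group)


# Register capacity for doubles: 16 XMM registers (64-bit mode) x 2 vs 8 x87 registers.
XMM_REGS, DOUBLES_PER_XMM, X87_REGS = 16, 2, 8

if __name__ == "__main__":
    print("x87 FADD+FMUL+FLD:", doubles_per_cycle([FADD, FMUL, FLD]))  # 2 (3 ports)
    print("SSE2 ADDPD+MULPD: ", doubles_per_cycle([ADDPD, MULPD]))    # None (4 ports)
    print("SSE2 MULPD alone: ", doubles_per_cycle([MULPD]))           # 2 (2 ports)
    print("doubles in registers: SSE2", XMM_REGS * DOUBLES_PER_XMM, "vs x87", X87_REGS)
```

Under this model both paths peak at two double-precision results per cycle, which is the "nearly the same peak throughput" point; SSE2's advantage shows up instead in register capacity (32 doubles vs. 8).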

DDB

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
