Thread: mlucas on sun
2004-01-07, 09:27   #8
delta_t
Nov 2002
Anchorage, AK

3×7×17 Posts

Originally posted by ewmayer
Just build a version with all your usual compiler flags and also with -xcollect, then run one or more of the self-test sets, then incorporate the RTP data that were collected by doing a final build with -xuse replacing -xcollect. I believe Bill Rea got a nice (10-20%) speedup at most FFT lengths this way. Note that the optimal FFT radix sets may change once profiling has been done.
Hello Ernst,

Okay, I've recompiled the code several times and have finally come up with the two versions I used for testing and timings. One is the regular compile, while the other is the runtime-profiled version. I will post the two mlucas.cfg files, which include the FFT size, the fastest radix set index, and its associated clocks.

The RTP version is typically faster, except above the 4096K FFT size, where the profiled version is a little slower at most lengths.

As you said, the radix index sets are different.
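
For anyone who wants to line up the two cfg files side by side, here is a rough Python sketch. It assumes each data line is simply "FFT size in K, radix set index, clocks" separated by whitespace, which is my guess at a simplified layout and may not match the real mlucas.cfg exactly. The names parse_cfg and compare are made up for this example, and the numbers in the demo are placeholders, not my timings.

```python
# Hypothetical line layout: "<FFT size in K> <radix set index> <clocks>".
# The real mlucas.cfg layout may differ; adjust parse_cfg accordingly.

def parse_cfg(text):
    """Map FFT size (K) -> (radix set index, clocks) for each data line."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip blank lines, comments, and headers
        table[int(fields[0])] = (int(fields[1]), float(fields[2]))
    return table


def compare(regular, profiled):
    """Return rows (fft_k, regular_idx, profiled_idx, speedup) for shared sizes.

    speedup = regular clocks / profiled clocks, so a value above 1
    means the profiled build is faster at that FFT size.
    """
    rows = []
    for fft_k in sorted(regular.keys() & profiled.keys()):
        r_idx, r_clk = regular[fft_k]
        p_idx, p_clk = profiled[fft_k]
        rows.append((fft_k, r_idx, p_idx, r_clk / p_clk))
    return rows


if __name__ == "__main__":
    # Placeholder numbers purely to exercise the code; not measurements.
    regular_text = "1024 0 2.0\n2048 1 4.0\n"
    profiled_text = "1024 2 1.6\n2048 1 4.4\n"
    rows = compare(parse_cfg(regular_text), parse_cfg(profiled_text))
    for fft_k, r_idx, p_idx, speedup in rows:
        note = "" if r_idx == p_idx else "  (radix set changed)"
        print(f"{fft_k}K: speedup {speedup:.2f}x{note}")
```

Only FFT sizes present in both files are compared, so a length that was tested in just one build is silently left out.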

Last fiddled with by delta_t on 2004-01-07 at 09:35