mersenneforum.org  

Old 2013-01-27, 22:09   #1
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

2·11·17² Posts
Small mlucas issue on non-x86

I thought I might as well see just how slowly mlucas runs on a last-year's ARM.

The current downloadable tarfile of mlucas (Mlucas_10.09.2011; I appreciate this is old, is there a newer place to look?) doesn't build unless USE_SSE2 is defined, because the section around lines 1441 to 1458 of radix16_ditN_cy_dif1.c (only) uses *bjmodn0, which is incorrect if !USE_SSE2.

I did

#if !defined(USE_SSE2)
#define BJSTAR
#else
#define BJSTAR *
#endif

then replaced *bjmodn0 with BJSTAR bjmodn0

but I appreciate that makes the code a bit ugly.

It's really not terribly fast:

Code:
M2614999: using FFT length 128K = 131072 8-byte floats.
 this gives an average   19.950859069824219 bits per digit
Using complex FFT radices         8        16        32        16
1000 iterations of M2614999 with FFT length 131072 = 128 K
Res64: 1A184504D2DE2D3C. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Program: E3.0x
Res mod 2^36     =          20717645116
Res mod 2^35 - 1 =           5934292942
Res mod 2^36 - 1 =           4090378120
Clocks = 00:00:45.939

M42643801: using FFT length 2304K = 2359296 8-byte floats.
 this gives an average   18.074799007839626 bits per digit
Using complex FFT radices         9         8         8         8        16        16
10 iterations of M42643801 with FFT length 2359296 = 2304 K
Res64: 9BDB491DF4C00002. AvgMaxErr N/A. MaxErr = 0.000000000. Program: E3.0x
Res mod 2^36     =          59940798466
Res mod 2^35 - 1 =          11033316518
Res mod 2^36 - 1 =          15286304084
Clocks = 00:00:10.410

For comparison, a 3.4GHz Sandy Bridge machine gave 0.045s/i for 42643801 and 0.0021s/i for 2614999, so it is about 22 times faster.

I'm trying different compiler options; I tried enabling multi-threading but got a message saying that the sensitivity list for radix44 needed updating. Have you got a newer version of that?

Last fiddled with by fivemack on 2013-01-28 at 14:51
Old 2013-01-28, 14:01   #2
ldesnogu
 
Jan 2008
France

2⁴·3·11 Posts

What is a last year ARM? :)

Also, what compiler flags did you try, and which gcc version are you using?
Old 2013-01-28, 14:32   #3
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

2·11·17² Posts

ODROID-X, Exynos 4412 @ 1.4GHz (apparently, though /proc/cpuinfo says 2000 bogomips). Running Ubuntu 12.04.

So it's a Cortex-A9; you might reasonably argue that that is an October 2007 CPU, but the Exynos 4412 was only announced in April 2012, and I bought the board on 14 September 2012. I think I should get stuff working nicely on this board before contemplating an A15-based replacement.

I compiled with gcc-4.6.2 -march=armv7-a -mcpu=cortex-a9.

Looking at the disassembly, it is using vfp instructions.

It is slightly embarrassing given my current workplace, but even with an ARM ARM in front of me I can't work out whether this architecture has instructions that treat a 128-bit register as two doubles ...

Last fiddled with by fivemack on 2013-01-28 at 14:35
Old 2013-01-28, 14:39   #4
ldesnogu
 
Jan 2008
France

2⁴·3·11 Posts

Quote:
Originally Posted by fivemack
ODROID-X, Exynos 4412 @ 1.4GHz (apparently, though /proc/cpuinfo says 2000 bogomips). Running Ubuntu 12.04.

I guess this means the board booted at 1 GHz.

Quote:
So it's a Cortex-A9; I compiled with gcc-4.6.2 -march=armv7-a -mcpu=cortex-a9.

Looking at the disassembly, it is using vfp instructions.

Can you check if it is passing function parameters in FP registers? I guess it should, since IIRC Ubuntu is using the hard FP ABI.

Quote:
It is slightly embarrassing given my current workplace, but even with an ARM ARM in front of me I can't work out whether this architecture has instructions that treat a 128-bit register as two doubles ...

Heh, the ARM ARM is not easy to read. ARMv7 doesn't have SIMD with doubles. ARMv8 does, but you'll have to wait for silicon to arrive...
Old 2013-01-28, 21:03   #5
ewmayer
2ω=0
 
Sep 2002
República de California

26376₈ Posts

Quote:
Originally Posted by fivemack
I thought I might as well see just how slowly mlucas runs on a last-year's ARM.

The current downloadable tarfile of mlucas (Mlucas_10.09.2011; I appreciate this is old, is there a newer place to look?) doesn't build unless USE_SSE2 is defined, because the section around lines 1441 to 1458 of radix16_ditN_cy_dif1.c (only) uses *bjmodn0, which is incorrect if !USE_SSE2.

Hi, Tom:

Send me your e-mail address and I'll be happy to provide you with the recent tarball being used by myself and the new-prime verifiers.

It's high time for me to update the code at my ftp page, I suppose - would really like to get AVX support finished before spending time on release packaging, though.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.