2013-01-27, 22:09  #1 
(loop (#_fork))
Feb 2006
Cambridge, England
6,323 Posts 
Small mlucas issue on non-x86
I thought I might as well see just how slowly mlucas runs on a last year's ARM.
The current downloadable tarfile of mlucas (Mlucas_10.09.2011; I appreciate this is old, is there a newer place to look?) doesn't build unless USE_SSE2 is defined, because the section around lines 1441 to 1458 of radix16_ditN_cy_dif1.c (only) uses *bjmodn0, which is incorrect if !USE_SSE2.

I did

Code:
#if !defined(USE_SSE2)
#define BJSTAR
#else
#define BJSTAR *
#endif

then replaced *bjmodn0 with BJSTAR modn0, but I appreciate that makes the code a bit ugly.

It's really not terribly fast:

Code:
M2614999: using FFT length 128K = 131072 8-byte floats.
this gives an average 19.950859069824219 bits per digit
Using complex FFT radices 8 16 32 16
1000 iterations of M2614999 with FFT length 131072 = 128 K
Res64: 1A184504D2DE2D3C. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Program: E3.0x
Res mod 2^36 = 20717645116
Res mod 2^35 - 1 = 5934292942
Res mod 2^36 - 1 = 4090378120
Clocks = 00:00:45.939

M42643801: using FFT length 2304K = 2359296 8-byte floats.
this gives an average 18.074799007839626 bits per digit
Using complex FFT radices 9 8 8 8 16 16
10 iterations of M42643801 with FFT length 2359296 = 2304 K
Res64: 9BDB491DF4C00002. AvgMaxErr N/A. MaxErr = 0.000000000. Program: E3.0x
Res mod 2^36 = 59940798466
Res mod 2^35 - 1 = 11033316518
Res mod 2^36 - 1 = 15286304084
Clocks = 00:00:10.410

I'm trying different compiler options; I tried enabling multithreading but got a message saying that the sensitivity list for radix44 needed updating. Have you got a newer version of that?

Last fiddled with by fivemack on 2013-01-28 at 14:51 
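The BJSTAR workaround from this post can be sketched as a tiny standalone file. This is only an illustration under an assumption inferred from the build error, not checked against the Mlucas source: that bjmodn0 is a pointer to int in USE_SSE2 builds and a plain int otherwise. The variable definitions and the read_bjmodn0 helper are hypothetical stand-ins, not Mlucas code.

```c
#include <stdio.h>

/* BJSTAR expands to a dereference only when USE_SSE2 is defined. */
#if !defined(USE_SSE2)
  #define BJSTAR            /* scalar build: nothing to dereference */
#else
  #define BJSTAR *          /* SSE2 build: bjmodn0 is a pointer */
#endif

/* Hypothetical stand-ins for the real declarations (assumption:
   pointer in SSE2 builds, plain int in scalar builds). */
#if !defined(USE_SSE2)
static int bjmodn0 = 42;
#else
static int bjmodn_storage = 42;
static int *bjmodn0 = &bjmodn_storage;
#endif

/* The same source line compiles in both configurations:
   it becomes "return bjmodn0;" or "return * bjmodn0;". */
int read_bjmodn0(void)
{
    return BJSTAR bjmodn0;
}
```

Compiling once as is and once with -DUSE_SSE2 should give the same return value from read_bjmodn0() in both builds. A function-like macro wrapping the whole access would probably read less oddly than a bare token macro, at the cost of touching more lines.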
2013-01-28, 14:01  #2 
Jan 2008
France
3·179 Posts 
What is a last year ARM? :)
Also what compiler flags did you try and what gcc version do you use? 
2013-01-28, 14:32  #3 
(loop (#_fork))
Feb 2006
Cambridge, England
14263_{8} Posts 
ODROID-X, Exynos 4412 @ 1.4GHz (apparently, though /proc/cpuinfo says 2000 bogomips). Running Ubuntu 12.04.
So it's a Cortex-A9; you might reasonably argue that that is an October 2007 CPU, but the Exynos 4412 was only announced in April 2012, and I bought the board on 14 September 2012. I think I should get stuff working nicely on this board before contemplating an A15-based replacement.

I compiled with gcc-4.6.2 -march=v7a -mcpu=cortex-a9. Looking at the disassembly, it is using vfp instructions. It is slightly embarrassing given my current workplace, but even with an ARM ARM in front of me I can't work out whether this architecture has instructions that treat a 128-bit register as two doubles ...

Last fiddled with by fivemack on 2013-01-28 at 14:35 
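For what it's worth on the two-doubles question: the ARMv7-A Advanced SIMD (NEON) extension handles floating point in single precision only, so doubles on a Cortex-A9 go through scalar VFP instructions; vectors of two doubles in a 128-bit register arrive with AArch64. A minimal sketch of how code might select between the two paths follows. The add2_double function and the HAVE_VEC2_DOUBLE macro are made up for illustration; the intrinsics are the standard arm_neon.h ones available on AArch64.

```c
#include <stdio.h>

#if defined(__aarch64__)
  #include <arm_neon.h>
  #define HAVE_VEC2_DOUBLE 1   /* AArch64: float64x2_t is available */
#else
  #define HAVE_VEC2_DOUBLE 0   /* ARMv7-A, x86, etc.: use plain scalar code */
#endif

/* Adds two pairs of doubles: out[i] = a[i] + b[i] for i = 0, 1. */
void add2_double(const double a[2], const double b[2], double out[2])
{
#if HAVE_VEC2_DOUBLE
    float64x2_t va = vld1q_f64(a);       /* load two doubles into one register */
    float64x2_t vb = vld1q_f64(b);
    vst1q_f64(out, vaddq_f64(va, vb));   /* one vector add, then store */
#else
    out[0] = a[0] + b[0];                /* scalar fallback */
    out[1] = a[1] + b[1];
#endif
}
```

On a 32-bit ARM build the scalar branch is taken, which matches the vfp instructions seen in the disassembly.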
2013-01-28, 14:39  #4  
Jan 2008
France
3·179 Posts 
Quote:
Quote:
Quote:


2013-01-28, 21:03  #5  
∂^{2}ω=0
Sep 2002
República de California
23152_{8} Posts 
Quote:
Send me your email address and I'll be happy to provide you with the recent tarball being used by myself and the new-prime verifiers. It's high time for me to update the code at my ftp page, I suppose, but I would really like to get AVX support finished before spending time on release packaging.
