mersenneforum.org Contributions to msieve source code

 2014-04-25, 12:42 #1 mickfrancis   Apr 2014 Marlow, UK 38₁₆ Posts Contributions to msieve source code I've managed to find a very small change to the source code that speeds up the factoring of an 80 digit number by about 23% and a 90 digit one by nearer 30% (I'm currently testing with a 100 digit number - should know this evening). How do I go about sharing this change? I'm running on a Windows VM with i7 processor, using gcc and Cygwin - I can't tell whether the speed-up will apply to other architectures...
2014-04-25, 13:58   #2
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2B10₁₆ Posts

Quote:
 Originally Posted by mickfrancis I've managed to find a very small change to the source code that speeds up the factoring of an 80 digit number by about 23% and a 90 digit one by nearer 30% (I'm currently testing with a 100 digit number - should know this evening). How do I go about sharing this change? I'm running on a Windows VM with i7 processor, using gcc and Cygwin - I can't tell whether the speed-up will apply to other architectures...
If it is very small, post the changes here, otherwise put the patch somewhere accessible and post a link. If you don't have anywhere accessible mail me (paul@leyland.vispa.com) some of the details and I'll try to sort it out from there.

The original author reads this forum but recently posted that Real Life (tm) is taking up most of his time and he won't be able to do much, if any, development work.

FWIW, many folk here are interested in msieve to factor rather larger numbers, those in the 120-170 digit range by GNFS or 180-250 by SNFS. Your optimizations are less likely to be useful to them because they tend to use the ggnfs siever in conjunction with the msieve toolkit for everything else.

Paul

 2014-04-25, 15:06 #3 firejuggler     "Vincent" Apr 2010 Over the rainbow 2·3²·149 Posts If the speed improvement is that large (claiming 23-30% on a C100), that would be interesting, even if it's just to give others ideas. If those numbers hold, it would be a noticeable change in ECM.
 2014-04-25, 16:11 #4 CRGreathouse     Aug 2006 3×1,993 Posts Definitely interesting (though I usually use yafu for numbers of that size).
 2014-04-25, 17:10 #5 bsquared     "Ben" Feb 2007 3·1,193 Posts Although it has not been under active development for quite some time, msieve's QS has undergone significant optimization. I'm kinda surprised that a small change could result in such large improvements. I'd very much like to see your updates.
 2014-04-25, 18:08 #6 mickfrancis   Apr 2014 Marlow, UK 2³×7 Posts Yes, it is impressively optimised. I was rather surprised by the result, and before I post the changes I'll wait for RSA-100 to finish (this took 11 hours before, so should finish in the next few hours). If there is a similar improvement with this I'll post the change; if not, I'll hang my head in shame and apologise.
2014-04-25, 18:20   #7
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

25420₈ Posts

Quote:
 Originally Posted by mickfrancis Yes, it is impressively optimised. I was rather surprised by the result, and before I post the changes I'll wait for RSA-100 to finish (this took 11 hours before, so should finish in the next few hours). If there is a similar improvement with this I'll post the change, if not I'll hang my head in shame and apologise
Shame on you indeed Sir!

Even if a C100 is no worse than before, the improvements for smaller numbers are well worth including.

Paul

2014-04-25, 18:55   #8
R.D. Silverman

Nov 2003

2²×5×373 Posts

Quote:
 Originally Posted by mickfrancis I've managed to find a very small change to the source code that speeds up the factoring of an 80 digit number by about 23% and a 90 digit one by nearer 30% (I'm currently testing with a 100 digit number - should know this evening). How do I go about sharing this change? I'm running on a Windows VM with i7 processor, using gcc and Cygwin - I can't tell whether the speed-up will apply to other architectures...
Please describe the nature of the change; What part of the algorithm
was involved?

 2014-04-25, 22:39 #9 jasonp Tribal Bullet     Oct 2004 3·1,181 Posts I'd be happy to fold it in; post the changes here or PM me.
 2014-04-26, 10:20 #10 mickfrancis   Apr 2014 Marlow, UK 2³×7 Posts Underwhelmed...

(Struggling to type due to shamed head-hanging...) It would appear that rumours of a significant speed-up have been greatly exaggerated... Factoring RSA-100, which took almost exactly 11 hours before on my (slow!) VM (at work), took 10h46 after the change - a 2% improvement - disappointing. Hard to do timings when the 'tin' is shared by other VMs, I guess, but it's the only place I have a modern Cygwin.

The change is in the function add_to_hashtable in sieve_core.c (possibly applicable to the other sieve_core_xxx.c files?). This is very much a hot-spot, and has clearly been carefully optimised, including pre-fetching to maximise cache usage. What I did was this: instead of allocating a new bucket_entry_t (struct) on the stack, setting its contents and then copying it into the entry's list, avoid the copy by setting the list entry's fields directly. The destination is probably already in the cache due to the PREFETCH. The body of the function is as follows, with the mods conditionally compiled on the __MDF__ flag:

Code:
static void add_to_hashtable(bucket_t *entry, uint32 sieve_offset,
			uint32 mask, uint32 prime_index, uint32 logprime) {

	/* add a 'sieve update' to the hashtable bin pointed
	   to by 'entry'. Hashing is based on the top few bits
	   of sieve_offset, and the range of sieve values
	   represented by one hash bin is given by 'mask'.
	   Note that after the first polynomial it's unlikely
	   that any hash bin will need to grow in size */

#if defined(__MDF__)
	uint32 i = entry->num_used++;
#else
	uint32 i = entry->num_used;
	bucket_entry_t new_entry;
#endif

	/* always prefetch; we can't rely on the processor
	   automatically knowing when to do this */

	if (!(i % 8))
		PREFETCH(entry->list + i + 8);

	if (i == entry->num_alloc) {
		entry->num_alloc = 2 * i;
		entry->list = (bucket_entry_t *)xrealloc(entry->list,
					2 * i * sizeof(bucket_entry_t));
	}

#if defined(__MDF__)
	bucket_entry_t *new_entry = entry->list + i;
	new_entry->logprime = logprime;
	new_entry->prime_index = prime_index;
	new_entry->sieve_offset = sieve_offset & mask;
#else
	new_entry.logprime = logprime;
	new_entry.prime_index = prime_index;
	new_entry.sieve_offset = sieve_offset & mask;

	entry->list[i] = new_entry;
	entry->num_used++;
#endif
}

Note that this includes moving the auto-increment of entry->num_used up to the assignment to i. This may need approval from the style police, and may provide no benefit (it wasn't in the version used for my timings). It would also be possible to combine the sieve_offset and mask parameters into a single masked_sieve_offset (by applying & at the call sites), to reduce the number of parameters, but I suspect this would make little or no difference with clever optimisers. I would be interested to know what difference, if any, this change makes for other people. Apologies for any time wasted - I'll adopt a test-before-post approach from now on!
 2014-04-26, 16:03 #11 jasonp Tribal Bullet     Oct 2004 110111010111₂ Posts It's a pretty nice patch, and on my Core 2 it does drop the sieving time by a few seconds at 80 digits, so I committed it. Thanks!

