mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2014-04-25, 12:42   #1
mickfrancis
 
Apr 2014
Marlow, UK

3816 Posts
Default Contributions to msieve source code

I've managed to find a very small change to the source code that speeds up the factoring of an 80 digit number by about 23% and a 90 digit one by nearer 30% (I'm currently testing with a 100 digit number - should know this evening). How do I go about sharing this change?

I'm running on a Windows VM with i7 processor, using gcc and Cygwin - I can't tell whether the speed-up will apply to other architectures...
mickfrancis is offline   Reply With Quote
Old 2014-04-25, 13:58   #2
xilman
Bamboozled!
 
xilman's Avatar
 
"π’‰Ίπ’ŒŒπ’‡·π’†·π’€­"
May 2003
Down not across

2·3·1,699 Posts
Default

Quote:
Originally Posted by mickfrancis View Post
I've managed to find a very small change to the source code that speeds up the factoring of an 80 digit number by about 23% and a 90 digit one by nearer 30% (I'm currently testing with a 100 digit number - should know this evening). How do I go about sharing this change?

I'm running on a Windows VM with i7 processor, using gcc and Cygwin - I can't tell whether the speed-up will apply to other architectures...
If it is very small, post the changes here, otherwise put the patch somewhere accessible and post a link. If you don't have anywhere accessible mail me (paul@leyland.vispa.com) some of the details and I'll try to sort it out from there.

The original author reads this forum but recently posted that Real Life (tm) is taking up most of his time and he won't be able to much, if any, development work.

FWIW, many folk here are interested in msieve to factor rather larger numbers, those in the 120-170 digit range by GNFS or 180-250 by SNFS. Your optimizations are less likely to be useful to them because they tend to use the ggnfs siever in conjuction with the msieve toolkit for everything else.


Paul
xilman is offline   Reply With Quote
Old 2014-04-25, 15:06   #3
firejuggler
 
firejuggler's Avatar
 
Apr 2010
Over the rainbow

9A916 Posts
Default

If the speed improvement is that large (claiming 23-30% on a C100) that would be interesting, even if it just for giving idea to others. If thoses numbers hold, it will be a noticeable change in ECM.
firejuggler is online now   Reply With Quote
Old 2014-04-25, 16:11   #4
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

593810 Posts
Default

Definitely interesting (though I usually use yafu for numbers of that size).
CRGreathouse is offline   Reply With Quote
Old 2014-04-25, 17:10   #5
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

1101000011012 Posts
Default

Although it has not been under active development for quite some time, msieve's QS has undergone significant optimization. I'm kinda surprised that a small change could result in such large improvements. I'd very much like to see your updates.
bsquared is offline   Reply With Quote
Old 2014-04-25, 18:08   #6
mickfrancis
 
Apr 2014
Marlow, UK

23·7 Posts
Default

Yes, it is impressively optimised. I was rather surprised by the result, and before I post the changes I'll wait for RSA-100 to finish (this took 11 hours before, so should finish in the next few hours). If there is a similar improvement with this I'll post the change, if not I'll hang my head in shame and apologise
mickfrancis is offline   Reply With Quote
Old 2014-04-25, 18:20   #7
xilman
Bamboozled!
 
xilman's Avatar
 
"π’‰Ίπ’ŒŒπ’‡·π’†·π’€­"
May 2003
Down not across

2×3×1,699 Posts
Default

Quote:
Originally Posted by mickfrancis View Post
Yes, it is impressively optimised. I was rather surprised by the result, and before I post the changes I'll wait for RSA-100 to finish (this took 11 hours before, so should finish in the next few hours). If there is a similar improvement with this I'll post the change, if not I'll hang my head in shame and apologise
Shame on you indeed Sir!

Even if a C100 is no worse than before, the improvements for smaller numbers are well worth including.

Paul
xilman is offline   Reply With Quote
Old 2014-04-25, 18:55   #8
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

26·113 Posts
Default

Quote:
Originally Posted by mickfrancis View Post
I've managed to find a very small change to the source code that speeds up the factoring of an 80 digit number by about 23% and a 90 digit one by nearer 30% (I'm currently testing with a 100 digit number - should know this evening). How do I go about sharing this change?

I'm running on a Windows VM with i7 processor, using gcc and Cygwin - I can't tell whether the speed-up will apply to other architectures...
Please describe the nature of the change; What part of the algorithm
was involved?
R.D. Silverman is offline   Reply With Quote
Old 2014-04-25, 22:39   #9
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

348910 Posts
Default

I'd be happy to fold it in; post the changes here or PM me.
jasonp is offline   Reply With Quote
Old 2014-04-26, 10:20   #10
mickfrancis
 
Apr 2014
Marlow, UK

23×7 Posts
Default Underwhelmed...

(Struggling to type due to shamed head-hanging...)

It would appear that rumours of a significant speed-up have been greatly exaggerated... Factoring RSA-100, which took almost exactly 11 hours before on my (slow!) VM (at work) took 10h46 after the change - 2% improvement - disappointing.

Hard to do timings when the 'tin' is shared by other VMs, I guess, but it's the only place I have a modern Cygwin.

The change is in the function add_to_hashtable in sieve_core.c (possibly pplicable to other sieve_core_xxx.c files?). This is very much a hot-spot, and has clearly been carefully optimised, including pre-fetching to maximise cache usage.

What I did was this: Instead of allocating a new bucket_entry_t (struct) on the stack, setting its contents then copying to the entry's list, avoid this copy by setting the list's entry's fields directly. This is probably in the cache due to the PREFETCH.

The body of the function is as follows, with the mods conditionally compiled on the __MDF__ flag:

Code:
static void add_to_hashtable(bucket_t *entry,
                 uint32 sieve_offset,
                 uint32 mask,
                 uint32 prime_index,
                 uint32 logprime) {
    /* add a 'sieve update' to the hashtable bin pointed
       to by 'entry'. Hashing is based on the top few
       bits of sieve_offset, and the range of sieve values
       represented by one hash bin is given by 'mask'.

       Note that after the first polynomial it's unlikely
       that any hash bin will need to grow in size. */

#if defined(__MDF__)
    uint32 i = entry->num_used++;
#else 
    uint32 i = entry->num_used;
    bucket_entry_t new_entry;
#endif

    /* always prefetch; we can't rely on the processor
       automatically knowing when to do this */
    if (!(i % 8))
        PREFETCH(entry->list + i + 8);

    if (i == entry->num_alloc) {
        entry->num_alloc = 2 * i;
        entry->list = (bucket_entry_t *)xrealloc(entry->list,
                        2 * i * sizeof(bucket_entry_t));
    }
#if defined(__MDF__)
    bucket_entry_t *new_entry = entry->list + i;
    new_entry->logprime       = logprime;
    new_entry->prime_index    = prime_index;
    new_entry->sieve_offset   = sieve_offset & mask;
#else
    new_entry.logprime = logprime;
    new_entry.prime_index = prime_index;
    new_entry.sieve_offset = sieve_offset & mask;
    entry->list[i] = new_entry;
    entry->num_used++;
#endif
}
Note that this includes moving the auto-increment of entry->num_used up to the assignmen to i. This may need approval from the style police, and may provide no benefit (and wasn't in the version used for my timings).

It would also be also possible to combine the sieve_offset and mask parameters into a single masked_sieve_offset (by using & at the call sites), to reduce the number of parameters, but I suspect this would make little or no difference with clever optimisers.

I would be interested to know what difference, if any, this change makes for other people.

Apologies for any time wasted, I'll adopt a test-before-post approach from now on!
mickfrancis is offline   Reply With Quote
Old 2014-04-26, 16:03   #11
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3×1,163 Posts
Default

It's a pretty nice patch, and on my Core 2 it does drop the sieving time by a few seconds at 80 digits, so I committed it.

Thanks!
jasonp is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Source code to mprime 289 out there somewhere? graysky Linux 6 2016-04-25 23:03
Source Code for msieve ? mohamed Msieve 8 2013-12-14 01:04
Source code for my program Sam Kennedy Factoring 7 2012-11-22 18:24
llrnet - source code? reezer Prime Sierpinski Project 11 2009-09-11 10:47
Support for other OSs on x86/source code reezer Software 1 2007-02-08 12:57

All times are UTC. The time now is 23:39.

Wed Nov 25 23:39:58 UTC 2020 up 76 days, 20:50, 3 users, load averages: 1.72, 1.30, 1.26

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.