
Old 2009-10-12, 20:06   #1
fivemack
Feb 2006
Cambridge, England
Shape of a CUDA lattice siever

These are very random thoughts; I am wondering whether a dumb implementation of lattice sieving running on nvidia's ludicrously fast parallel hardware might manage to run at reasonable speed.

Jason has already written the basic primitives, with a reasonably fast modular inverse.

Lattice reduction is not obviously a serious problem, and that at least is grotesquely parallel (one prime per thread).

Enumerating lattice points: I should read the bucket-sieving paper carefully, because at the moment I think it's necessary to divide the sieve region into slices (eg X=-2^15..2^15, Y=Y0..Y0+2^11) to get the buckets to fit in the GPU memory, but IIRC their clever lattice reduction lets you list the lattice points in Y order.
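For concreteness, here is a sketch of the naive row-by-row enumeration that slicing implies (function and parameter names are hypothetical): for a prime p with root r the hits are exactly { (x, y) : x ≡ r·y (mod p) }, listed one Y-slice at a time so the number of bucket entries a slice generates stays bounded. The clever reduction in the bucket-sieving paper would replace the inner loop with steps along the reduced lattice basis, already in Y order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Hit { int32_t x; int32_t y; };

// Enumerate all sieve hits of one prime inside the slice
// x in [-half_width, half_width), y in [y0, y0 + rows).
std::vector<Hit> enumerate_slice(int64_t p, int64_t r,
                                 int32_t y0, int32_t rows, int32_t half_width) {
    std::vector<Hit> hits;
    for (int32_t y = y0; y < y0 + rows; ++y) {
        int64_t x0 = ((r * y) % p + p) % p;            // canonical root in [0, p)
        int64_t x = x0 - ((x0 + half_width) / p) * p;  // first hit with x >= -half_width
        for (; x < half_width; x += p)
            hits.push_back({int32_t(x), y});
    }
    return hits;
}
```

With the slice sizes from the post (half_width = 2^15, rows = 2^11) a prime near 2^15 generates a couple of thousand entries per slice, which is the quantity the bucket sizing has to budget for.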

Pulling entries out of buckets stored in global memory into 16k shared sieve arrays on each of 27 thread-blocks on the GPU: probably fine with one block per bucket. Not sure how persistent the shared memory is, but synchronising and writing 16k back to global memory is trivial.

But I can't see how getting the entries into the buckets is going to be anything other than a nightmare on stilts: it's a random store at best (as opposed to the pleasant CPU case where you can be reasonably sure that the ends of the buckets are all in cache), contention on the last-index-for-this-bucket array is enormous, and I don't know what the behaviour when umpteen threads write different values to the same location in global memory is.
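For the last of those worries: if every thread reserves its slot with an atomic fetch-add on the per-bucket counter (the analogue of CUDA's atomicAdd on the last-index array), the writes land in distinct locations and nothing is lost, though the counter itself still serialises under contention. A hypothetical CPU model with std::atomic, just to pin down the invariant:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Fixed-capacity buckets with an atomic "next free index" per bucket.
struct Buckets {
    std::vector<std::vector<uint32_t>> slots;
    std::vector<std::atomic<uint32_t>> counts;
    Buckets(size_t n, size_t cap)
        : slots(n, std::vector<uint32_t>(cap)), counts(n) {
        for (auto& c : counts) c.store(0, std::memory_order_relaxed);
    }
    void push(size_t b, uint32_t value) {
        // fetch_add hands each caller a distinct slot: no lost updates,
        // but heavy contention on counts[b] is exactly the GPU worry.
        uint32_t i = counts[b].fetch_add(1, std::memory_order_relaxed);
        slots[b][i] = value;
    }
};
```

Whether the GPU's atomic units make this tolerable or a disaster is precisely the open question in the paragraph above; the model only shows the scheme is correct, not fast.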

GPGPU guides tend to advise against bucketing, and just suggest writing things out in arbitrary order and then doing a compaction pass and a parallel sort; maybe 60GB/second of bandwidth to global memory is enough to let you be foolish in that way. Sort the factor-base primes by the number of bucket entries they're expected to generate and partition them among threads, so you don't have to be insanely generous with bucket sizes, I suppose.
Old 2009-10-12, 21:44   #2
jasonp
Oct 2004

Originally Posted by fivemack
Jason has already written the basic primitives, with a reasonably fast modular inverse.

Not sure how persistent the shared memory is, but synchronising and writing 16k back to global memory is trivial.
Shared memory becomes undefined once a kernel exits, but otherwise persists for the lifetime of a single kernel launch. All the thread blocks running on a single multiprocessor get to see its shared memory; I think that if a previous block has already filled up shared memory, then later blocks scheduled on that multiprocessor see the filled-up version.

For the record, Alex and Bob are primarily responsible for that fast modular inverse.
Old 2012-12-16, 01:07   #3
"Carlos Pinho"
Oct 2011
Milton Keynes, UK

Has the lattice siever been ported to CUDA or OpenCL code by now?

