mersenneforum.org  

2009-06-02, 19:49   #34
joral
Mar 2008
110111₂ Posts

No; I thought the working directory had to exist beforehand, so it was there, but empty. This then showed up when running bwc.pl with :complete.

Thanks for the information. As I get further along I may have more errors to ask about.
2009-06-02, 22:11   #35
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
1100011101000₂ Posts
I don't quite understand the benchmark output

OK: I'm using the command line

Code:
% /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench snfs.small -impl bucket
and getting plausible diagnostics rather than error messages. Trying all the *_bench tools, deleting snfs.small-bucket.bin between runs:

u128
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
22 iterations in 101s, 4.61/1, 17.77 ns/coeff

u64
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
38 iterations in 102s, 2.67/1, 10.29 ns/coeff

u64k says 'T0 : Check failed Aborted'

u64n also says this.

I assume u128 would want to do exactly half as many iterations as u64, so would be quicker in total; should I be passing a 'k' or 'n' parameter to u64k or u64n in some way?

If u128 does 5582216/128 iterations, the total runtime would be ~200k seconds, which seems pretty good given that msieve's Lanczos took 108242 wall-clock seconds with four threads; but I'm not sure whether there's another factor of two hiding somewhere in the block Wiedemann algorithm.
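The ~200k figure can be reproduced from the bench output. A minimal sketch, assuming one pass takes about N/width iterations (width = 64 or 128 bits) at the single-thread seconds-per-iteration figures above; the function names are hypothetical and not part of CADO-NFS.

```python
# Rough runtime estimate for one pass of ~N/width matrix-times-vector iterations,
# using the single-thread bench figures quoted above (hypothetical helper names).
N = 5582216          # rows of snfs.small, from the bench output
COEFFS = 259518490   # nonzero coefficients, from the bench output

def pass_seconds(sec_per_iter: float, width_bits: int, n: int = N) -> float:
    """Seconds for about n / width_bits iterations at sec_per_iter each."""
    return (n / width_bits) * sec_per_iter

def ns_per_coeff(sec_per_iter: float, coeffs: int = COEFFS) -> float:
    """Per-coefficient cost; roughly reproduces the bench's ns/coeff column,
    up to rounding of the per-iteration time."""
    return sec_per_iter / coeffs * 1e9

if __name__ == "__main__":
    for name, t, w in [("u64", 2.67, 64), ("u128", 4.61, 128)]:
        print(f"{name}: {ns_per_coeff(t):.2f} ns/coeff, ~{pass_seconds(t, w):,.0f} s per pass")
```

With these inputs, u128 comes to about 201,000 s per pass and u64 to about 233,000 s, so u128 wins even though each of its iterations is slower.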

So, time to try threading.

Code:
/home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/balance --in snfs.small --out cabbage --nslices 2x2 --ramlimit 8G
gives me a message 'Matrix has more rows than columns \n Perhaps the matrix should have been transposed first', and produces cabbage.row_perm, cabbage.col_perm and cabbage.h[01].v[01]. Then

Code:
taskset 0f /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench -impl bucket -nthreads 4 -- cabbage.h0.v0 cabbage.h1.v0 cabbage.h0.v1 cabbage.h1.v1
runs with occasionally 400% CPU and says

19 iterations in 102s, 5.35/1, 20.62 ns/coeff

Does this mean that threads are treading on one another's toes and four threads are slower than one, or that each thread has done 19 iterations in 102 seconds, for an effective total speed of 5.16 ns/coeff?
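For readers wondering how a --nslices 2x2 run ends up as four files: a generic two-dimensional split cuts the row range into h slices and the column range into v slices and gives each thread one block. The sketch below does only that even split. It ignores the row and column permutations that balance computes (the .row_perm/.col_perm files), and the assumption that h indexes row slices and v column slices is a guess for illustration, not taken from the CADO-NFS source.

```python
def split_range(total: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, total) into `parts` contiguous half-open ranges whose sizes differ by at most 1."""
    base, extra = divmod(total, parts)
    ranges, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def block_layout(nrows: int, ncols: int, nh: int, nv: int) -> dict:
    """Map a hypothetical block name 'h{i}.v{j}' to its (row range, column range).
    Assumes h indexes row slices and v indexes column slices."""
    rows = split_range(nrows, nh)
    cols = split_range(ncols, nv)
    return {f"h{i}.v{j}": (rows[i], cols[j]) for i in range(nh) for j in range(nv)}

if __name__ == "__main__":
    layout = block_layout(5582216, 5582056, 2, 2)  # snfs.small, 2x2
    for name, (r, c) in sorted(layout.items()):
        print(name, "rows", r, "cols", c)
```

For snfs.small this gives four blocks of about 2.79 million rows by 2.79 million columns each.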
2009-06-02, 23:46   #36
joral
Mar 2008
5×11 Posts

Ok. A little farther.

Now I have had the following:

Computing trsp(x)*M^100
..........Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code

Then a little later...

Failed check at iteration 100
/cado-nfs/linalg/bwc/u64_krylov: exited with status 1

Tried with a different seed, and it failed at iteration 1900.
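The "Computing trsp(x)*M^100" and "Failed check at iteration 100" lines fit a standard consistency check for Krylov iterations: precompute C = x^T M^k once, then every k steps verify that x^T v_(i+k) equals C v_i. The toy version below works over GF(2), with each vector entry packing 64 vectors into one integer, roughly like a 64-bit-wide run. All names are hypothetical; it is a sketch of the idea, not the bwc code.

```python
import random

def matvec(rows, v):
    """Return M*v over GF(2). rows[i] lists the column indices of row i;
    each v[j] is an int whose 64 bits hold 64 independent vectors."""
    out = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= v[j]
        out.append(acc)
    return out

def row_times_matrix(c, rows):
    """Return the 0/1 row vector c^T * M over GF(2) (square M assumed)."""
    out = [0] * len(rows)
    for i, cols in enumerate(rows):
        if c[i]:
            for j in cols:
                out[j] ^= 1
    return out

def dot(c, v):
    """c^T * v: XOR of the packed words v[j] for which c[j] == 1."""
    acc = 0
    for bit, word in zip(c, v):
        if bit:
            acc ^= word
    return acc

def run_with_check(rows, interval, steps, seed, corrupt_at=None):
    """Iterate v <- M*v, checking x^T v_(i+interval) == (x^T M^interval) v_i.
    Returns the first iteration whose check fails, or None."""
    n = len(rows)
    rng = random.Random(seed)
    x = [0] * n
    for j in rng.sample(range(n), 3):  # x has exactly three nonzero entries
        x[j] = 1
    check = x
    for _ in range(interval):
        check = row_times_matrix(check, rows)  # check = x^T M^interval
    v = [rng.getrandbits(64) for _ in range(n)]
    expected = dot(check, v)  # what x^T v should be `interval` steps from now
    for k in range(1, steps + 1):
        v = matvec(rows, v)
        if k == corrupt_at:
            # Simulated fault: flip bit 0 of every entry. Because x has an odd
            # number of nonzeros, x^T v changes, so a check at this step trips.
            v = [w ^ 1 for w in v]
        if k % interval == 0:
            if dot(x, v) != expected:
                return k
            expected = dot(check, v)
    return None

if __name__ == "__main__":
    n = 200
    gen = random.Random(1)
    rows = [gen.sample(range(n), 10) for _ in range(n)]
    print(run_with_check(rows, 100, 1000, seed=42))                  # None: all checks pass
    print(run_with_check(rows, 100, 1000, seed=42, corrupt_at=300))  # 300
```

With a fixed seed the same x and starting vectors are drawn every time, so a genuine bug fails at the same iteration on every run, whereas a random hardware fault would not be expected to.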

I know I had trouble with the msieve version of block Lanczos when the matrix was too sparse, I believe. Is there a similar condition here that could cause it to fail?

Last fiddled with by joral on 2009-06-02 at 23:47
2009-06-03, 08:45   #37
thome
May 2009
22₁₀ Posts

Quote:
Originally Posted by fivemack
OK: I'm using the command line

Code:
% /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench snfs.small -impl bucket
and getting plausible diagnostics rather than error messages. Trying all the *_bench tools, deleting snfs.small-bucket.bin between runs:

u128
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
22 iterations in 101s, 4.61/1, 17.77 ns/coeff

u64
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
38 iterations in 102s, 2.67/1, 10.29 ns/coeff
OK -- which kind of CPU is this? These figures seem a bit large.

Quote:
u64k says 'T0 : Check failed Aborted'

u64n also says this.
Bug. u64k and u64n make little sense for benches, but that's definitely a bug. I'll try to reproduce it.

Quote:
I assume u128 would want to do exactly half as many iterations as u64, so would be quicker in total; should I be passing a 'k' or 'n' parameter to u64k or u64n in some way?
For information, the k in u64k is hard-coded (this code is never used anyway). For u64n_bench, n is set with --nbys=128 (for n=2).

Quote:
If u128 does 5582216/128 iterations, the total runtime would be ~200k seconds, which seems pretty good given that msieve's Lanczos took 108242 wall-clock seconds with four threads; but I'm not sure whether there's another factor of two hiding somewhere in the block Wiedemann algorithm.
N/m+N/n+N/n iterations -- so three times as much (taking m = n). But I wonder: your timings are higher than what I normally get, so perhaps something is wrong somewhere. Was your matrix transposed? If not, i.e. if relation-sets are rows and ideals are columns, then you should give the -t option to the bench program; otherwise the matrix gets organized the wrong way around.
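Spelled out for this matrix, the N/m + N/n + N/n count looks like this. The sketch assumes m = n = 128 for a u128 run and reuses the single-thread 4.61 s/iteration bench figure; splitting the terms into sequence generation (N/m + N/n) and solution construction (N/n) is the usual block Wiedemann bookkeeping, given as background rather than taken from the post.

```python
def bw_iterations(N: int, m: int, n: int) -> float:
    """Approximate number of matrix-times-block-vector products in block Wiedemann."""
    sequence = N / m + N / n  # Krylov sequence generation
    solution = N / n          # building the solution from the generator
    return sequence + solution

if __name__ == "__main__":
    N = 5582216  # rows of snfs.small
    iters = bw_iterations(N, 128, 128)  # assumes m = n = 128
    print(f"{iters:,.0f} iterations, ~{iters * 4.61:,.0f} s at 4.61 s/iteration")
```

That comes to roughly 131,000 iterations, or about 603,000 single-thread CPU seconds, i.e. three times the ~200k estimate.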

Quote:
So, time to try threading.

Code:
/home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/balance --in snfs.small --out cabbage --nslices 2x2 --ramlimit 8G
gives me a message 'Matrix has more rows than columns \n Perhaps the matrix should have been transposed first',
This warning is innocuous cruft, since the bwc tools now handle matrices in both orientations properly -- although it does hint that the arguments you've tried don't tell them to do so.
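In storage terms, transposing means that a matrix held as a list of rows (relation-sets) becomes a list of columns (ideals), so that each ideal's occurrences sit together. A minimal sketch; the list-of-index-lists representation is an assumption for illustration, not the bwc file format.

```python
def transpose(rows: list[list[int]], ncols: int) -> list[list[int]]:
    """Turn a sparse 0/1 matrix stored as rows of column indices into columns of row indices."""
    cols: list[list[int]] = [[] for _ in range(ncols)]
    for i, row in enumerate(rows):
        for j in row:
            cols[j].append(i)  # row indices come out in increasing order
    return cols

if __name__ == "__main__":
    rows = [[0, 2], [1], [0, 1]]  # three relation-sets over three ideals
    cols = transpose(rows, 3)
    print(cols)                   # [[0, 2], [1, 2], [0]]
```

Transposing twice gives the original rows back (with sorted indices within each row).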

Quote:
and produces cabbage.row_perm, cabbage.col_perm and cabbage.h[01].v[01]. Then

Code:
taskset 0f /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench -impl bucket -nthreads 4 -- cabbage.h0.v0 cabbage.h1.v0 cabbage.h0.v1 cabbage.h1.v1
runs with occasionally 400% CPU and says

19 iterations in 102s, 5.35/1, 20.62 ns/coeff

Does this mean that threads are treading on one another's toes and four threads are slower than one, or that each thread has done 19 iterations in 102 seconds, for an effective total speed of 5.16 ns/coeff?
The times here (102 s, 5.35 s) are CPU time, not wall-clock time (wct). So four threads effectively do one iteration every 1.34 s wct, which isn't exactly 4 times better than one thread, but is reasonably acceptable. Threads do indeed tread on one another's toes, because of memory access penalties. Since the penalty is not large here, I suppose you have Opterons, maybe.
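The arithmetic behind that reply, as a sketch: divide CPU seconds by the thread count to get wall-clock time, assuming all four threads stay busy for the whole run, and compare with the single-thread u128 figure of 4.61 s/iteration from earlier in the thread. The helper name is hypothetical.

```python
def wall_seconds(cpu_seconds: float, threads: int) -> float:
    """Wall-clock time if the CPU time is spread evenly over fully busy threads."""
    return cpu_seconds / threads

if __name__ == "__main__":
    single = 4.61                     # s/iteration, u128, one thread
    threaded = wall_seconds(5.35, 4)  # s/iteration, u128, four threads
    speedup = single / threaded
    print(f"{threaded:.2f} s/iteration wall, speedup {speedup:.2f}x, "
          f"efficiency {speedup / 4:.0%}, {wall_seconds(20.62, 4):.2f} ns/coeff effective")
```

This gives about 1.34 s of wall-clock time per iteration and a speedup of roughly 3.4x over one thread (around 86% parallel efficiency).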

E.
2009-06-03, 08:49   #38
thome
May 2009
26₈ Posts

Quote:
Originally Posted by joral
Ok. A little farther.

Now I have had the following:

Computing trsp(x)*M^100
..........Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
This is normal for the u64_secure program. It effectively does transposed multiplications, which are somewhat slower.

Quote:
Then a little later...

Failed check at iteration 100
/cado-nfs/linalg/bwc/u64_krylov: exited with status 1

Tried with a different seed, and it failed at iteration 1900.
That's a problem. The fact that it doesn't even fail deterministically suggests that your RAM might be to blame, but I wouldn't jump to that conclusion too soon.

Care to share your matrix?

Quote:
I know I had trouble with the msieve version of block Lanczos when the matrix was too sparse, I believe. Is there a similar condition here that could cause it to fail?
Maybe, if it's very sparse and I got the padding coeffs wrong in some corner case, but I doubt it.

E.
2009-06-03, 10:52   #39
thome
May 2009
2×11 Posts

Quote:
Originally Posted by fivemack
u64k says 'T0 : Check failed Aborted'

u64n also says this.
Now fixed. Thanks.
2009-06-03, 10:53   #40
thome
May 2009
2×11 Posts

Quote:
Originally Posted by joral
Computing trsp(x)*M^100
..........Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
This warning no longer appears (yes, there's a new tarball).

E.
2009-06-03, 10:56   #41
joral
Mar 2008
5·11 Posts

Quote:
The fact that it doesn't even fail deterministically suggests that your RAM might be to blame, but I wouldn't jump to that conclusion too soon.
I'm going to run some more tests to be sure, but as I recall it is deterministic in this sense:

If I leave the seed parameter unchanged, it always fails at the same iteration.

It's about a 280 MB matrix file uncompressed, so I'll see what it compresses to and where I can put it.
2009-06-03, 11:12   #42
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
2³×797 Posts

The machine I'm doing the benchmarks on is a single-socket 2.66GHz Core i7 (256k 10-cycle L2 cache per core + 8192k 19-cycle L3 cache shared by the four cores + 12G DDR3/8500). I am a little surprised that I don't have to give a load of cache parameters to bench: if, running one thread, it blocks for the 256k cache rather than the 8192k one, then I could understand it being a bit slow.
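For a feel of the numbers: if a blocking scheme keeps a slice of the source vector resident in one cache level, that slice can hold cache_bytes / (width/8) entries. The sketch below just does that division for the caches listed above and counts how many column slices snfs.small's 5582056 columns would need; which cache size bwc actually targets is not stated in the thread, so treat this purely as arithmetic. Helper names are hypothetical.

```python
KIB = 1024
NCOLS = 5582056  # columns of snfs.small, from the bench output

def entries_in_cache(cache_bytes: int, width_bits: int) -> int:
    """How many width_bits-wide vector entries fit in cache_bytes,
    ignoring everything else competing for the cache."""
    return cache_bytes // (width_bits // 8)

def slices_needed(ncols: int, per_slice: int) -> int:
    """Number of column slices of at most per_slice columns covering ncols (integer ceiling)."""
    return (ncols + per_slice - 1) // per_slice

if __name__ == "__main__":
    for name, size in [("L2 256k", 256 * KIB), ("L3 8192k", 8192 * KIB)]:
        for width in (64, 128):
            per = entries_in_cache(size, width)
            print(f"{name}, {width}-bit entries: {per} per slice, "
                  f"{slices_needed(NCOLS, per)} slices")
```

For 128-bit entries that is 16384 columns per slice for the L2 cache (341 slices for snfs.small), against 524288 for the L3 cache (11 slices).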

I will try more sensible benchmarks tonight with the new tarball (correct transpose parameters; 1x4, 2x2 and 4x1 decompositions on four cores; 1x8, 2x4, 4x2 and 8x1 decompositions with eight threads on four cores). I've left a make-matrix-from-relations job running today on relations from a very large SNFS job, and will mention it if that falls over in interesting ways. It's using an awful lot of memory (17G vsize, 10G rsize), but I have an awful lot of memory and a fast swap disc.
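The decomposition lists above are just the factor pairs of the thread count, written in the same AxB form that balance's --nslices option took earlier in the thread (2x2). A small sketch that generates them; the function name is hypothetical.

```python
def decompositions(threads: int) -> list[str]:
    """All 'AxB' splits with A * B == threads, in increasing order of A."""
    return [f"{a}x{threads // a}" for a in range(1, threads + 1) if threads % a == 0]

if __name__ == "__main__":
    for t in (4, 8):
        print(t, decompositions(t))
```

For 4 and 8 threads this yields exactly the lists planned above: 1x4, 2x2, 4x1 and 1x8, 2x4, 4x2, 8x1.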

Last fiddled with by fivemack on 2009-06-03 at 11:15
2009-06-03, 13:28   #43
jasonp
Tribal Bullet
Oct 2004
2²×883 Posts

In case it becomes an issue: the latest GGNFS lattice sievers do not print all the factors in a relation. They print each prime only once, whatever its multiplicity, and omit factors smaller than 1000, so any code that reads relations has to rediscover both.
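A minimal sketch of that rediscovery step, assuming the reader has already computed the integer that one side's primes should multiply to (for NFS, the rational value a - b*m or the algebraic norm of a - b*alpha) and has that side's primes as printed. Trial division by everything below the 1000 bound recovers the omitted small primes, and dividing out each printed prime repeatedly recovers the lost multiplicities. The function name and interface are hypothetical, and parsing the GGNFS relation format is not shown.

```python
def complete_factorization(value: int, printed_primes: list[int], bound: int = 1000) -> dict[int, int]:
    """Recover the full factorization of |value| when primes below `bound`
    were omitted and every printed prime appears only once."""
    n = abs(value)  # the sign is dropped here; a real reader would track it separately
    if n == 0:
        raise ValueError("value must be nonzero")
    factors: dict[int, int] = {}
    # 1. Trial division by every d < bound. Composite d never divide here,
    #    because their prime factors were already divided out.
    for d in range(2, bound):
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
    # 2. Divide out each printed prime as many times as it goes (skip bogus p < 2).
    for p in sorted({p for p in printed_primes if p >= 2}):
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
    # 3. Anything left means the printed list was incomplete or wrong.
    if n != 1:
        raise ValueError(f"unexplained cofactor {n}")
    return factors

if __name__ == "__main__":
    print(complete_factorization(2**3 * 3 * 1009**2 * 10007, [1009, 10007]))
```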
2009-06-03, 17:13   #44
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
2³×797 Posts
Benchmark with -t fails entirely

I issue the command

Code:
nfsslave2@cow:/scratch/fib1039/with-cado$ /home/nfsslave2/cado/cado-nfs-20090603-r2189/build/cow/linalg/bwc/u128_bench -t --impl bucket snfs.small
and it produces a lot of output at the 'large' level before failing with

Code:
Lsl 56 cols 3634827..3699734 w=778884, avg dj=7.2, max dj=34365, bucket hit=1/1834.7-> too sparse
Switching to huge slices. Lsl 56 to be redone
Flushing 56 large slices
Hsl 0 cols 3634827..5582056 (30*64908) ..............................
 w=16383453, avg dj=0.3, max dj=29376, bucket block hit=1/10.2
u128_bench: /home/nfsslave2/cado/cado-nfs-20090603-r2189/linalg/bwc/matmul-bucket.cpp:610: void split_huge_slice_in_vblocks(builder*, huge_slice_t*, huge_slice_raw_t*, unsigned int): Assertion `(n+np)*2 == (size_t) (spc - sp0)' failed.
Aborted

The enormous filtering run was killed by something that terminates SSH sessions which have produced no output for ages; I'll try it again.




Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.