
2009-06-02, 19:49   #34
joral

Mar 2008

110111₂ Posts

No, I thought the working directory needed to previously exist, so it was there, but empty. This then showed up when running bwc.pl with :complete.

Thanks for the information. As I get further along I may have more errors to ask about.

2009-06-02, 22:11   #35
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

1100011101000₂ Posts

I don't quite understand the benchmark output

OK: I'm using the command line

Code:
% /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench snfs.small -impl bucket

and getting plausible diagnostics rather than error messages.

Trying all the *_bench tools, deleting snfs.small-bucket.bin between runs:

u128
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
22 iterations in 101s, 4.61/1, 17.77 ns/coeff

u64
T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs
38 iterations in 102s, 2.67/1, 10.29 ns/coeff

u64k says 'T0 : Check failed Aborted'
u64n also says this.

I assume u128 would want to do exactly half as many iterations as u64, so would be quicker in total; should I be getting a 'k' or 'n' parameter to u64k or u64n in some way?

If u128 does 5582216/128 iterations, the total runtime would be ~200k seconds, which seems pretty good since msieve lanczos took 108242 wall-time seconds with four threads - but I'm not sure whether there's not another factor two hiding somewhere in the block Wiedemann algorithm.

So, time to try threading.

Code:
/home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/balance --in snfs.small --out cabbage --nslices 2x2 --ramlimit 8G

gives me a message 'Matrix has more rows than columns \n Perhaps the matrix should have been transposed first', and produces cabbage.row_perm, cabbage.col_perm and cabbage.h[01].v[01]. Then

Code:
taskset 0f /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench -impl bucket -nthreads 4 -- cabbage.h0.v0 cabbage.h1.v0 cabbage.h0.v1 cabbage.h1.v1

runs with occasionally 400% CPU and says

19 iterations in 102s, 5.35/1, 20.62 ns/coeff

Does this mean that threads are treading on one another's toes and four threads are slower than one, or that each thread has done 19 iterations in 102 seconds for a total speed of effectively 5.16 ns/coeff?

2009-06-02, 23:46   #36
joral

Mar 2008

5×11 Posts

Ok. A little farther. Now I have had the following:

Computing trsp(x)*M^100
..........Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code
Warning: Doing many iterations with bad code

Then a little later...

Failed check at iteration 100
/cado-nfs/linalg/bwc/u64_krylov: exited with status 1

Tried with a different seed, and it failed at iteration 1900.

I know I had trouble with the msieve version of block lanczos if the matrix was too sparse, I believe it was. Is there a similar condition here which could cause it to fail?

Last fiddled with by joral on 2009-06-02 at 23:47

2009-06-03, 08:45   #37
thome

May 2009

22₁₀ Posts

Quote:
 Originally Posted by fivemack OK: I'm using the command line Code: % /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench snfs.small -impl bucket and getting plausible diagnostics rather than error messages. Trying all the *_bench tools, deleting snfs.small-bucket.bin between runs: u128 T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs 22 iterations in 101s, 4.61/1, 17.77 ns/coeff u64 T0 snfs.small: 5582216 rows 5582056 cols 259518490 coeffs 38 iterations in 102s, 2.67/1, 10.29 ns/coeff
OK -- which kind of CPU is this? These figures seem a bit large.

Quote:
 u64k says 'T0 : Check failed Aborted' u64n also says this.
Bug. u64k and u64n make little sense for benches, but that's definitely a bug. I'll try to reproduce it.

Quote:
 I assume u128 would want to do exactly half as many iterations as u64, so would be quicker in total; should I be getting a 'k' or 'n' parameter to u64k or u64n in some way?
For information, the k in u64k is hard-coded (anyway this code is never used). Setting n for u64n_bench is done with --nbys=128 (for n=2).

Quote:
 If u128 does 5582216/128 iterations, the total runtime would be ~200k seconds, which seems pretty good since msieve lanczos took 108242 wall-time seconds with four threads - but I'm not sure whether there's not another factor two hiding somewhere in the block Wiedemann algorithm.
N/m + N/n + N/n iterations -- so three times as much. But I wonder. Your timings exceed what I normally get, so perhaps there's something wrong somewhere. Was your matrix transposed? If not, i.e. if relation-sets are rows and ideals are columns, then you should use the -t option to the bench program, otherwise the matrix gets organized the wrong way around.
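
For a rough feel of what that formula does to the estimate quoted above, here is a small sketch. It assumes m = n = 128 for u128 (an assumption, not something the bench output states) and reuses the single-thread figure of 4.61 s per iteration from the quoted run; the function names are made up for illustration.

```python
# Figures taken from the benchmark quoted in this thread.
N = 5_582_216          # matrix rows
SEC_PER_ITER = 4.61    # single-thread u128 bench, seconds per iteration

def bw_iterations(n_rows, m, n):
    """Total matrix-times-vector iterations: N/m + N/n + N/n."""
    return n_rows / m + n_rows / n + n_rows / n

def estimated_seconds(n_rows, m, n, sec_per_iter):
    """Iteration count times the measured cost of one iteration."""
    return bw_iterations(n_rows, m, n) * sec_per_iter

# ASSUMPTION: m = n = 128 (hypothetical block sizes for u128).
naive = N / 128 * SEC_PER_ITER
full = estimated_seconds(N, 128, 128, SEC_PER_ITER)

print(f"N/128 iterations only: {naive:,.0f} s")
print(f"N/m + N/n + N/n:       {full:,.0f} s")
```

With those assumptions the ~200k seconds figure becomes roughly three times larger, before any gain from threading or from fixing the transposition.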

Quote:
 So, time to try threading. Code: /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/balance --in snfs.small --out cabbage --nslices 2x2 --ramlimit 8G gives me a message 'Matrix has more rows than columns \n Perhaps the matrix should have been transposed first',
This warning is innocuous cruft, since bwc tools now properly handle matrices in both directions -- although it hints that the arguments you've tried don't direct them to do so.

Quote:
 and produces cabbage.row_perm, cabbage.col_perm and cabbage.h[01].v[01]. Then Code: taskset 0f /home/nfsslave2/cado/cado-nfs-20090528-r2167/build/cow/linalg/bwc/u128_bench -impl bucket -nthreads 4 -- cabbage.h0.v0 cabbage.h1.v0 cabbage.h0.v1 cabbage.h1.v1 runs with occasionally 400% CPU and says 19 iterations in 102s, 5.35/1, 20.62 ns/coeff Does this mean that threads are treading on one another's toes and four threads are slower than one, or that each thread has done 19 iterations in 102 seconds for a total speed of effectively 5.16 ns/coeff ?
The times here (102, 5.35) are CPU time, not wall-clock time (wct). So four threads effectively do one iteration every 1.34 s wct, which isn't exactly 4 times better than 1 thread, but is relatively acceptable. Threads do indeed tread on one another's toes, because of the memory access penalties. Since the penalty is not large here, I suppose you have Opterons, maybe.
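
The arithmetic behind that, as a short sketch. It assumes all four threads are busy for the whole iteration, and that for the single-thread run CPU time and wall-clock time are about the same; both are simplifications, and the function name is only illustrative.

```python
CPU_SEC_PER_ITER_4T = 5.35   # 4-thread bench: CPU seconds per iteration
SEC_PER_ITER_1T = 4.61       # 1-thread bench: seconds per iteration
THREADS = 4

def wall_seconds_per_iter(cpu_sec_per_iter, threads):
    """CPU time summed over threads, spread evenly across them.

    ASSUMPTION: every thread is busy for the whole iteration.
    """
    return cpu_sec_per_iter / threads

wct = wall_seconds_per_iter(CPU_SEC_PER_ITER_4T, THREADS)
speedup = SEC_PER_ITER_1T / wct
efficiency = speedup / THREADS

print(f"wall-clock per iteration: {wct:.2f} s")
print(f"speedup over one thread:  {speedup:.2f}x")
print(f"parallel efficiency:      {efficiency:.0%}")
```

So the overhead from threads competing for memory shows up as the gap between the measured speedup and an ideal 4x.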

E.

2009-06-03, 08:49   #38
thome

May 2009

26₈ Posts

Quote:
 Originally Posted by joral Ok. A little farther. Now I have had the following: Computing trsp(x)*M^100 ..........Warning: Doing many iterations with bad code Warning: Doing many iterations with bad code Warning: Doing many iterations with bad code Warning: Doing many iterations with bad code
Normal for the u64_secure program. It effectively does transposed multiplications, which are somewhat slower.

Quote:
 Then a little later... Failed check at iteration 100 /cado-nfs/linalg/bwc/u64_krylov: exited with status 1 Tried with a different seed, and it failed at iteration 1900.
That's a problem. The fact that it doesn't even fail deterministically suggests that perhaps your RAM is to blame, but I wouldn't conclude that too soon.

Care to share your matrix?

Quote:
 I know I had trouble with the msieve version of block lanczos if the matrix was too sparse, I believe it was. Is there a similar condition here which could cause it to fail?
If it's very sparse, and if I got padding coeffs wrong in some corner case, maybe, but I doubt it.

E.

2009-06-03, 10:52   #39
thome

May 2009

2×11 Posts

Quote:
 Originally Posted by fivemack u64k says 'T0 : Check failed Aborted' u64n also says this.
Now fixed. Thanks.

2009-06-03, 10:53   #40
thome

May 2009

2×11 Posts

Quote:
 Originally Posted by joral Computing trsp(x)*M^100 ..........Warning: Doing many iterations with bad code Warning: Doing many iterations with bad code Warning: Doing many iterations with bad code Warning: Doing many iterations with bad code
This warning no longer appears (yes, there's a new tarball).

E.

2009-06-03, 10:56   #41
joral

Mar 2008

5·11 Posts

Quote:
 The fact that it doesn't even fail deterministically suggests that perhaps your RAM is to blame, but I wouldn't conclude that too soon.
I'm going to run some more tests to be sure, but as I recall it is deterministic in this:

If I leave the seed parameter unchanged, it always fails at the same iteration.

It's about a 280 Mb matrix file ungzipped, so I'll see what it compresses to and where I can put it.

2009-06-03, 11:12   #42
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2³×797 Posts

The machine I'm doing the benchmarks on is a single-socket 2.66GHz Core i7 (256k 10-cycle L2 cache per core + 8192k 19-cycle L3 cache per four cores + 12G DDR3/8500). I am a little surprised that I don't have to give a load of cache parameters to bench; if it's running one thread blocking for the 256k cache rather than the 8192k one then I could understand it being a bit slow.

Will try more sensible benchmarks (correct transpose parameters, trying 1x4 2x2 4x1 decompositions on four cores and 1x8 2x4 4x2 8x1 decompositions on eight-threads-on-four-cores) with the new tarball tonight. I've left a make-matrix-from-relations job running today on a set of relations from a very large SNFS job, and will mention if that falls over in interesting ways. It's using an awful lot of memory (17G vsize, 10G rsize), but I have an awful lot of memory and a fast swap disc.

Last fiddled with by fivemack on 2009-06-03 at 11:15

2009-06-03, 13:28   #43
jasonp
Tribal Bullet

Oct 2004

2²×883 Posts

In case it becomes an issue: the latest GGNFS lattice sievers do not print all the factors of relations; they skip multiplicity beyond 1 and skip printing factors smaller than 1000, so that both of these have to be rediscovered by any relation-reading code.

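
As a minimal sketch of what "rediscovering" could look like, assuming the reader already has the norm value and the list of printed primes for one side of a relation: trial-divide by every prime below 1000, then divide out each printed prime repeatedly to recover its multiplicity. The function names and the example numbers are hypothetical; this is not the code of GGNFS, msieve or CADO-NFS.

```python
def small_primes(bound):
    """Primes below bound, by a simple sieve of Eratosthenes."""
    if bound < 3:
        return []
    sieve = bytearray([1]) * bound
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(bound ** 0.5) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, bound, p)))
    return [p for p in range(bound) if sieve[p]]

def complete_factorization(norm, listed_primes, bound=1000):
    """Return {prime: exponent} for |norm|.

    listed_primes holds the printed factors (each given once,
    all >= bound); smaller primes and multiplicities are recovered here.
    """
    remaining = abs(norm)
    if remaining == 0:
        raise ValueError("norm must be non-zero")
    factors = {}
    for p in small_primes(bound) + sorted(set(listed_primes)):
        exponent = 0
        while remaining % p == 0:
            remaining //= p
            exponent += 1
        if exponent:
            factors[p] = exponent
    if remaining != 1:
        raise ValueError(f"unfactored cofactor {remaining}")
    return factors

# Made-up example: only 1009 and 10007 are "printed".
norm = 2**3 * 7 * 1009**2 * 10007
print(complete_factorization(norm, [1009, 10007]))
```

The final check on the leftover cofactor is there so that a relation with a missing printed factor is rejected rather than silently accepted.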
2009-06-03, 17:13   #44
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2³×797 Posts

Benchmark with -t fails entirely

I issue the command

Code:
nfsslave2@cow:/scratch/fib1039/with-cado$ /home/nfsslave2/cado/cado-nfs-20090603-r2189/build/cow/linalg/bwc/u128_bench -t --impl bucket snfs.small

and it produces a lot of output at the 'large' level before failing with

Code:
Lsl 56 cols 3634827..3699734 w=778884, avg dj=7.2, max dj=34365, bucket hit=1/1834.7-> too sparse
Switching to huge slices. Lsl 56 to be redone
Flushing 56 large slices
Hsl 0 cols 3634827..5582056 (30*64908) .............................. w=16383453, avg dj=0.3, max dj=29376, bucket block hit=1/10.2
u128_bench: /home/nfsslave2/cado/cado-nfs-20090603-r2189/linalg/bwc/matmul-bucket.cpp:610: void split_huge_slice_in_vblocks(builder*, huge_slice_t*, huge_slice_raw_t*, unsigned int): Assertion `(n+np)*2 == (size_t) (spc - sp0)' failed.
Aborted

The enormous filtering run got terminated by something that kills SSH sessions that have produced no output for ages; will try that again.
