mersenneforum.org  

2020-02-03, 22:39   #1
Xyzzy
"Mike"
Aug 2002
5·11²·13 Posts
Msieve benchmarking

We have uploaded a data file to use for msieve benchmarking.

https://www.dropbox.com/s/si1kyxq7ye...nchmark.tar.gz

It would be cool if timings for various setups were posted here.

If you need help, please ask!

Attached Files
File Type: gz msieve-1.54.x86_64.gz (342.0 KB, 74 views)
File Type: gz remdups4.gz (4.8 KB, 55 views)

2020-02-15, 23:29   #2
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
1000110010011₂ Posts

Machine: HP Z620, dual 10-core Ivy Bridge Xeon @ ~2.6 GHz
64GB memory per socket
-nc1 was run with target-density 134. After remdups and adding freerels in, msieve states 99.3M unique relations. Matrix came out 4.57M dimensions. TD=140 did not complete filtering.
Used taskset -c 10-19 to pin the job to socket #2; msieve was compiled with the VBITS=128 option.
10-threaded ETA after 1% of job: 5 hr 25 min.
5-threaded ETA after 1% of job: 9 hr 30 min.
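
For anyone comparing thread counts, the two ETAs above can be turned into a speedup figure. This is only a rough sketch using the ETAs as posted (they are estimates after 1% of the job, not final timings); the helper name to_minutes is made up for the example.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'H hr M min' ETA into minutes."""
    return hours * 60 + minutes

# ETAs quoted in the post above (estimates taken after 1% of the job).
eta_10_threads = to_minutes(5, 25)
eta_5_threads = to_minutes(9, 30)

# Speedup gained by doubling the thread count from 5 to 10.
speedup = eta_5_threads / eta_10_threads
# Fraction of the ideal 2x speedup that was actually achieved.
efficiency = speedup / (10 / 5)

print(f"speedup from 5 to 10 threads: {speedup:.2f}x")
print(f"efficiency of the doubling: {efficiency:.0%}")
```

So going from 5 to 10 threads on one socket gives roughly 1.75x, or about 88% of perfect scaling.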

Future tests will explore ETAs of smaller target densities, as well as splitting the job over two sockets.

Last fiddled with by VBCurtis on 2020-02-18 at 01:56

2020-02-17, 22:20   #3
bsquared
"Ben"
Feb 2007
3,347 Posts

Machine: 2 sockets of 20-core Cascade-Lake Xeon
Just used the default density.
matrix is 5149968 x 5150142 (1913.9 MB) with weight 597210677 (115.96/col)

Here is a basic 40-threaded job across both sockets (actually, I guess it is thread-limited to 32 threads):
4 hrs 58 min: /msieve -v -nc2 -t 40

Using MPI helps a lot. Here are various configurations using different VBITS settings (timings after 1% elapsed):

Code:
2x20 core VBITS=64
2 hrs 30 min: mpirun -np  4 msieve -nc2 2,2 -v -t 10
2 hrs 43 min: mpirun -np  8 msieve -nc2 2,4 -v -t 5
3 hrs  1 min: mpirun -np 20 msieve -nc2 4,5 -v -t 2
3 hrs 23 min: mpirun -np 40 msieve -nc2 5,8 -v
3 hrs 23 min: mpirun -np 40 msieve -nc2 8,5 -v

2x20 core VBITS=128
2 hrs 32 min: mpirun -np  8 msieve -nc2 2,4 -v -t 5
2 hrs 36 min: mpirun -np 20 msieve -nc2 4,5 -v -t 2
2 hrs 45 min: mpirun -np 40 msieve -nc2 5,8 -v
2 hrs 47 min: mpirun -np  4 msieve -nc2 2,2 -v -t 10
2 hrs 54 min: mpirun -np 40 msieve -nc2 8,5 -v

2x20 core VBITS=256
2 hrs 43 min: mpirun -np 40 msieve -nc2 5,8 -v
2 hrs 44 min: mpirun -np  8 msieve -nc2 2,4 -v -t 5
2 hrs 47 min: mpirun -np 40 msieve -nc2 8,5 -v
2 hrs 49 min: mpirun -np  4 msieve -nc2 2,2 -v -t 10
3 hrs  2 min: mpirun -np 20 msieve -nc2 4,5 -v -t 2
VBITS=128 seems to be most-consistently fast.

Grids that were significantly less square (e.g., 2x20 or 4x1 -t 10) didn't do as well.
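
In the commands above, -np always equals rows × columns of the grid passed to -nc2, and -np times -t always comes to 40 cores. That relationship is inferred only from the listed commands, not from the msieve documentation, so treat this as a sketch for generating candidate layouts. The function name mpi_layout is hypothetical.

```python
def mpi_layout(rows: int, cols: int, total_cores: int = 40) -> tuple[int, int]:
    """Return (mpi_processes, threads_per_process) for a rows x cols grid.

    Assumes one MPI process per grid cell and that the cores are split
    evenly between processes, as in the commands posted above.
    """
    processes = rows * cols
    if total_cores % processes != 0:
        raise ValueError(f"{total_cores} cores do not divide evenly into {processes} processes")
    return processes, total_cores // processes

# Rebuild the command lines for the grids that were benchmarked.
for rows, cols in [(2, 2), (2, 4), (4, 5), (5, 8), (8, 5)]:
    processes, threads = mpi_layout(rows, cols)
    cmd = f"mpirun -np {processes} msieve -nc2 {rows},{cols} -v"
    if threads > 1:
        cmd += f" -t {threads}"
    print(cmd)
```

Changing total_cores gives the matching layouts for a machine with a different core count.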

2020-02-18, 01:50   #4
RichD
Sep 2008
Kansas
110001101100₂ Posts

I have a Sandy Bridge Core-i5 (4 cores, no HT). Would a benchmark from this small machine be useful? I have two builds of Msieve: one with VBITS=128 and an older one without VBITS that I use for poly search (GPU enabled). When I built the VBITS=128 version I noticed a boost of about 10% (or more) in my post-processing speed.

2020-02-18, 01:58   #5
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
4499₁₀ Posts

RichD- yes! I'd like to see how various generations of hardware compare, regular desktop or Xeon-grade. This also helps others see if perhaps their msieve copy isn't as fast as it could be (e.g. compiling it oneself can prove *much* faster if the binary one finds online isn't compiled for the same architecture).

2020-02-18, 03:28   #6
Xyzzy
"Mike"
Aug 2002
5·11²·13 Posts

Is there any way to tell how many cores and what target density was used by viewing the log file?

Maybe we are looking in the wrong place? Or maybe we can patch the source to include this info?

2020-02-18, 04:51   #7
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
11×409 Posts
Default

Target density is listed in the log just below the polynomial, before msieve begins reading relations. If no density line is evident, then none was specified by the user and default density of 70 was used.
I believe the number of cores is listed when -nc2 phase begins; something like "8 threads" usually appears in the lines just before the first ETA is printed to the log.
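
A quick way to check a log for both items is to scan it for the relevant lines. This is a hedged sketch: the "(N threads)" pattern matches the Lanczos line quoted from linux.log.gz later in this thread, but the exact wording of the target density line is an assumption here, so the code just collects any line containing "density". The function name summarize_log is made up for the example.

```python
import re

def summarize_log(lines):
    """Return (thread_count_or_None, lines_mentioning_density)."""
    threads = None
    density_lines = []
    for line in lines:
        # e.g. "... commencing Lanczos iteration (2 threads)"
        match = re.search(r"\((\d+) threads?\)", line)
        if match and "Lanczos" in line:
            threads = int(match.group(1))
        # Assumption: any target density line contains the word "density".
        if "density" in line.lower():
            density_lines.append(line.strip())
    return threads, density_lines

# Sample lines copied from the linux.log.gz excerpt quoted in this thread.
sample = [
    "Thu Jan 30 18:22:25 2020 commencing Lanczos iteration (2 threads)",
    "Thu Jan 30 18:22:25 2020 memory use: 1762.9 MB",
    "Thu Jan 30 18:23:17 2020 linear algebra at 0.0%, ETA 46h11m",
]
threads, density_lines = summarize_log(sample)
print("threads:", threads)
print("density lines:", density_lines or "none found (default density assumed)")
```

To run it on a real log, pass the open file instead of the sample list, e.g. summarize_log(open("msieve.log")).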

Last fiddled with by VBCurtis on 2020-02-18 at 04:52

2020-02-18, 12:53   #8
Xyzzy
"Mike"
Aug 2002
5×11²×13 Posts

Since there is no mention of threads or target density in the log files, these runs must have been done with the default target density and one thread.

linux.log.gz is from an AMD 1920X CPU with quad-channel DDR4-2666 memory.
windows.log.gz is from an Intel i7-9700K CPU with dual-channel DDR4-3200 memory.

We will re-run these later with various settings to tune our systems better.

Attached Files
File Type: gz linux.log.gz (2.7 KB, 106 views)
File Type: gz windows.log.gz (2.7 KB, 108 views)

2020-02-18, 14:44   #9
bsquared
"Ben"
Feb 2007
3347₁₀ Posts

Looks like 2 threads for 43.9 hrs in the linux case:
Quote:
Originally Posted by linux.log.gz

Thu Jan 30 18:22:25 2020 commencing Lanczos iteration (2 threads)
Thu Jan 30 18:22:25 2020 memory use: 1762.9 MB
Thu Jan 30 18:23:17 2020 linear algebra at 0.0%, ETA 46h11m
Thu Jan 30 18:23:34 2020 checkpointing every 120000 dimensions
Sat Feb 1 14:08:28 2020 lanczos halted after 81439 iterations (dim = 5149917)
Sat Feb 1 14:08:33 2020 recovered 25 nontrivial dependencies
Sat Feb 1 14:08:33 2020 BLanczosTime: 158039
Summary:
43.9 hrs: 2 threads Linux AMD 1920X CPU with quad-channel DDR4-2666 memory
36.6 hrs: 1 thread Windows Intel i7-9700K CPU with dual-channel DDR4-3200
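
The 43.9 hrs figure follows from the BLanczosTime line, assuming that value is in seconds (an inference: it lines up with the log timestamps, but the unit is not stated in the excerpt). A small sketch of the arithmetic, with the timestamps copied from the quoted log:

```python
from datetime import datetime

def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600

# Value from the "BLanczosTime: 158039" line, assumed to be seconds.
blanczos_hours = seconds_to_hours(158039)

# Cross-check against the wall-clock timestamps in the same excerpt.
start = datetime(2020, 1, 30, 18, 22, 25)  # commencing Lanczos iteration
end = datetime(2020, 2, 1, 14, 8, 28)      # lanczos halted
wall_hours = seconds_to_hours((end - start).total_seconds())

print(f"BLanczosTime: {blanczos_hours:.1f} hrs")
print(f"timestamp difference: {wall_hours:.1f} hrs")
```

The two numbers differ by a few minutes, presumably because BLanczosTime covers a little more than the span between those two log lines.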

Last fiddled with by bsquared on 2020-02-18 at 14:46

2020-03-10, 12:45   #10
Xyzzy
"Mike"
Aug 2002
5×11²×13 Posts

Here are benchmarks for 1 through 12 cores on our 1920X and a pretty chart.

The blue line in the chart represents perfect additional core utilization. For example, two cores would be twice as fast as one.
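
For reference, a "perfect" line like that is just the single-core time divided by the core count. The sketch below prints it in relative units (1-core time = 1.0) so that no timings are invented; the function names ideal_time and scaling_efficiency are hypothetical, and the real measurements would come from the attached logs.

```python
def ideal_time(single_core_time: float, cores: int) -> float:
    """Time expected with perfect scaling: n cores are n times as fast."""
    return single_core_time / cores

def scaling_efficiency(single_core_time: float, measured_time: float, cores: int) -> float:
    """Fraction of perfect scaling achieved by a measured n-core run."""
    return ideal_time(single_core_time, cores) / measured_time

# Ideal curve for 1 through 12 cores, relative to the 1-core time.
for cores in range(1, 13):
    print(f"{cores:2d} cores: {ideal_time(1.0, cores):.3f} of the 1-core time")
```

Plugging a measured n-core time into scaling_efficiency shows how far a real run sits from the ideal line.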

We graphed the linear algebra times.

All benchmarks were done on an otherwise idle system. IRL, with lots of stuff running, things slow down dramatically.

Attached Thumbnails
graph.png (19.0 KB, 141 views)
Attached Files
File Type: gz msieve.log.tar.gz (23.6 KB, 71 views)

2020-03-10, 18:07   #11
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
11·409 Posts
Default

Xyzzy-
While unlikely, it is possible that 20 or 24 threads would yield a bit of improvement. Hyperthreads don't always help with matrix solving, but since this is a benchmark thread it might be nice to demonstrate that.

I suggest 20 as alternative because using every possible HT might be impacted by any background process, but that effect should be reduced if we leave a few HTs 'open'. I've found situations where using N-1 cores runs faster than N cores, for what I presume are similar reasons.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.