mersenneforum.org  

2016-12-05, 04:10   #1
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

E81₁₆ Posts
L1-BITS values for gnfs siever compilation

Recently some members helped me solve my compilation trouble with the ggnfs package. Thank you to all who helped.

But now I have some new questions. Empirical data from my machines suggests that sievers built with lower L1_BITS values run faster than those built with higher values, although I haven't tested this extensively. I'm hoping experienced users already know the answers so I don't have to research heavily.

To the point: on more than one machine, raising the L1_BITS value from 14 to 15 to 16 increased the time per relation, with a difference as large as 0.04107 secs/rel (14 bits) vs. 0.05295 secs/rel (16 bits).
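
For scale, those two timings work out to roughly a 29% increase in time per relation. A quick check of the arithmetic, using only the two figures quoted above (the variable names are just for illustration):

```python
# Timings quoted in this post, in seconds per relation.
secs_per_rel_14 = 0.04107  # siever built with L1_BITS=14
secs_per_rel_16 = 0.05295  # siever built with L1_BITS=16

# Relative slowdown of the 16-bit build compared with the 14-bit build.
slowdown = secs_per_rel_16 / secs_per_rel_14 - 1.0

# The same comparison expressed as throughput in relations per second.
rels_per_sec_14 = 1.0 / secs_per_rel_14
rels_per_sec_16 = 1.0 / secs_per_rel_16

print(f"16 bits is {slowdown:.1%} slower per relation than 14 bits")
print(f"{rels_per_sec_14:.2f} rel/s at 14 bits vs. {rels_per_sec_16:.2f} rel/s at 16 bits")
```

This says nothing about whether the relation counts differ between builds; it only compares the per-relation timings.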

I have not studied whether the number of relations is affected in either direction. Is it possible that I just happen to be working with a range of composite sizes that is better handled by a smaller L1_BITS value? Or is this due to my ancient hardware, or am I looking at something wrong?

Thanks...
2016-12-05, 04:52   #2
VBCurtis
"Curtis"
Feb 2005
Riverside, CA

13×367 Posts

My very elementary understanding is that this setting refers to the core's L1 cache size. Intel chips of all but the oldest vintage work best at 15 bits (that's 32k, right?), while some AMD chips work best at 14, though I don't recall which specific generation.

I assume this refers to data cache size, but I'm no programmer so that's just a guess.

I have, at times, wondered if some machine workloads involving heavy thread-loads might benefit from 14 for this setting, even if the architecture is compatible with 15 in theory. I'm pretty sure the software should find no different relations from the different settings, but I'd like confirmation of such.
2016-12-05, 04:57   #3
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

23·11·107 Posts

L1_bits should be set so that the L1 cache size of the CPU you will run on equals 2^L1_bits bytes. That seems to be the paradigm of the code.
8 KB -> L1_bits=13
16 KB -> L1_bits=14 (old AMD chips and old Pentiums)
32 KB -> L1_bits=15 (most Intel chips)
64 KB -> L1_bits=16 (Phenoms, for example)
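
The list above amounts to L1_bits = log2(L1 cache size in bytes). A minimal sketch of that rule, assuming the cache size is an exact power of two; the helper name `l1_bits_for_cache` is made up for illustration and is not part of ggnfs:

```python
def l1_bits_for_cache(cache_bytes: int) -> int:
    """Return n such that 2**n == cache_bytes (hypothetical helper, not from ggnfs)."""
    # A power of two has exactly one bit set, so x & (x - 1) is zero.
    if cache_bytes <= 0 or cache_bytes & (cache_bytes - 1):
        raise ValueError("expected a positive power-of-two cache size in bytes")
    return cache_bytes.bit_length() - 1


# Reproduce the list above from the cache sizes alone.
for kib in (8, 16, 32, 64):
    print(f"{kib} KB -> L1_bits={l1_bits_for_cache(kib * 1024)}")
```

This only restates the rule of thumb; which value is actually fastest on a given chip still has to be measured.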

L1_bits=16 really makes for slightly faster sievers for the Phenoms, but the same binary runs slower (than that with L1_bits=15) on Xeons. There were some exceptions when I tried various binaries (quite a few years ago).

P.S. While I was typing, this already became a cross-post, but I'll just leave it here.
2016-12-05, 18:50   #4
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

47·79 Posts

Thanks! I'll study this a bit more, after I find out why my factmsieve.py mysteriously quit working this morning. I can't even run it manually. But, that is for a different thread, even though the future use of the L1_BITS value depends on it working.
2016-12-05, 20:36   #5
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

7201₈ Posts

The machine that I currently have running with L1_BITS=14 lists the following via lshw:
Code:
          configuration: cores=2 enabledcores=2 threads=2
        *-cache:0
             description: L1 cache
             physical id: 700
             size: 32KiB
             capacity: 32KiB
             capabilities: internal write-back data
Is the 32KiB shared between the two cores, or is it supposed to be per core? If shared, I guess that would explain why 14 bits works better...
2016-12-05, 22:16   #6
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

1011011101101₂ Posts

Per core. What is the CPU? I assume a modern Intel?
It does raise a good point, though. Hyperthreading gives a large speed improvement in sieving. With hyperthreading, we might get even better speed with L1_bits one lower, since the L1 cache is shared between the threads.
It would be nice to find out why hyperthreading helps and fix the slowdown at some point.
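
One way to read that suggestion, purely as a guess to benchmark and not a known rule: if two hardware threads share one L1 data cache, each siever effectively gets half of it, which is one bit less. A small sketch that lists the L1_bits values worth timing; the function name and the even-split assumption are illustrative and not taken from the siever source:

```python
from typing import List


def candidate_l1_bits(l1d_bytes: int, threads_per_core: int = 1) -> List[int]:
    """List L1_bits values worth benchmarking (hypothetical helper).

    Assumes the L1 data cache is split evenly between the hardware
    threads sharing a core, which is only a rough model.
    """
    if threads_per_core <= 0 or l1d_bytes < threads_per_core:
        raise ValueError("need a positive thread count and at least one byte per thread")
    full = l1d_bytes.bit_length() - 1                          # floor(log2(whole cache))
    shared = (l1d_bytes // threads_per_core).bit_length() - 1  # floor(log2(per-thread share))
    # Largest value first; a set removes the duplicate when nothing is shared.
    return sorted({full, shared}, reverse=True)


print(candidate_l1_bits(32 * 1024, threads_per_core=2))
print(candidate_l1_bits(32 * 1024, threads_per_core=1))
```

For a 32 KiB cache with two threads per core this suggests timing builds with 15 and 14; the measured secs/rel decides which one wins.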

Last fiddled with by henryzz on 2016-12-05 at 22:17
2016-12-05, 23:07   #7
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

47·79 Posts

Quote:
Originally Posted by henryzz
Per core. What is the CPU? I assume a modern Intel?
It does raise a good point, though. Hyperthreading gives a large speed improvement in sieving. With hyperthreading, we might get even better speed with L1_bits one lower, since the L1 cache is shared between the threads.
It would be nice to find out why hyperthreading helps and fix the slowdown at some point.
Here's the entire cpu data from lshw:
Code:
     *-cpu
          description: CPU
          product: Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz
          vendor: Intel Corp.
          physical id: 400
          bus info: cpu@0
          slot: Microprocessor
          size: 1866MHz
          width: 64 bits
          clock: 1066MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow
          configuration: cores=2 enabledcores=2 threads=2
        *-cache:0
             description: L1 cache
             physical id: 700
             size: 32KiB
             capacity: 32KiB
             capabilities: internal write-back data
        *-cache:1
             description: L2 cache
             physical id: 701
             size: 2MiB
             capacity: 2MiB
             capabilities: internal varies unified
The following machine runs better at 15 than 16. I haven't tried 14, yet:
Code:
     *-cpu
          description: CPU
          product: Intel(R) Core(TM)2 Duo CPU     U7600  @ 1.20GHz
          vendor: Intel Corp.
          physical id: 4
          bus info: cpu@0
          version: Intel(R) Core(TM)2 Duo CPU     U7600  @ 1.20GHz
          slot: U10
          size: 1200MHz
          capacity: 1200MHz
          width: 64 bits
          clock: 133MHz
          capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx x86-64 constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow vnmi flexpriority cpufreq
        *-cache:0
             description: L1 cache
             physical id: 5
             slot: Internal L1 Cache
             size: 64KiB
             capacity: 64KiB
             capabilities: burst internal write-back unified
        *-cache:1
             description: L2 cache
             physical id: 6
             slot: Internal L2 Cache
             size: 2MiB
             capacity: 2MiB
             capabilities: burst external write-back unified
2016-12-06, 08:06   #8
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Check the processor specifications online. Sometimes the L1 cache might really be two separate stores, one for data and one for instructions. Only the former is usable for user program data (as the name suggests).
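
On Linux, the split between data and instruction caches can usually also be read from sysfs instead of being looked up online. A rough sketch, assuming the kernel exposes the usual `/sys/devices/system/cpu/cpu0/cache/index*/` entries (`level`, `type`, `size`, `shared_cpu_list`) and that `size` looks like `32K`; the function name is made up for illustration:

```python
from pathlib import Path


def l1_data_cache(cpu_dir: str = "/sys/devices/system/cpu/cpu0"):
    """Return (size_bytes, shared_cpu_list) for the L1 data cache, or None.

    Hypothetical helper: reads the Linux sysfs cache description of one CPU.
    """
    units = {"K": 1024, "M": 1024 * 1024}
    for index in sorted(Path(cpu_dir, "cache").glob("index*")):
        level = (index / "level").read_text().strip()
        kind = (index / "type").read_text().strip()
        if level != "1" or kind != "Data":
            continue  # skip the instruction cache and the L2/L3 levels
        size = (index / "size").read_text().strip()  # e.g. "32K"
        if size[-1] in units:
            size_bytes = int(size[:-1]) * units[size[-1]]
        else:
            size_bytes = int(size)  # assume plain bytes when there is no suffix
        shared = (index / "shared_cpu_list").read_text().strip()
        return size_bytes, shared
    return None
```

Calling `l1_data_cache()` on a machine whose core 0 has a private 32 KiB data cache should give something like `(32768, '0')`. A `shared_cpu_list` naming more than one CPU means hardware threads share that cache, and `None` means no separate L1 data cache was reported.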
2016-12-06, 14:39   #9
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

3713₁₀ Posts

Quote:
Originally Posted by Dubslow
Check the processor specifications online. Sometimes the L1 cache might really be two separate stores, one for data and one for instructions. Only the former is usable for user program data (as the name suggests).
So, IOW, the second CPU I listed should be considered 32 KiB:
Quote:
On-die, primary 32-kB instruction cache and 32-kB write-back data cache per core
Bummer, but good to know. Unfortunately, that means a lot of research for all these machines. I don't think I have more than two of any particular processor.

Just to note, the first cpu says:
Quote:
Two 32-KB Level 1 data caches
So, both cpus appear equal in this spec...

Thanks much! That sheds some more light.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.