#1 |
"Ed Hall"
Dec 2009
Adirondack Mtns
3635₁₀ Posts
Recently some members helped me solve my compilation trouble with the ggnfs package. Thank you to all who helped.
But now I have some new questions. Empirical data from my machines suggests that lower L1_BITS values run faster than higher ones, although I haven't done really extensive research. I'm hoping experienced users already know the answers so I don't have to research heavily. To the point: on more than one machine, raising L1_BITS from 14 to 15 to 16 increased the time per relation, by as much as 0.04107 secs/rel (14 bits) vs. 0.05295 secs/rel (16 bits). I have not studied whether the number of relations is affected in either direction. Is it possible that I just happen to be working with a range of composite sizes that is better handled by a smaller L1_BITS value, or is this due to my ancient hardware, or am I looking at something wrong? Thanks...
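To put the two timings quoted above in relative terms, here is a minimal arithmetic sketch. Only the two secs/rel figures come from the post; the helper name `slowdown` is made up for illustration. It works out to roughly 29% more time per relation at 16 bits than at 14.

```python
# Timings quoted in the post above, in seconds per relation, keyed by L1_BITS.
SECS_PER_REL = {14: 0.04107, 16: 0.05295}


def slowdown(fast: float, slow: float) -> float:
    """Return how much longer `slow` takes than `fast`, as a fraction of `fast`."""
    return slow / fast - 1.0


extra = slowdown(SECS_PER_REL[14], SECS_PER_REL[16])
print(f"L1_BITS=16 takes about {extra:.0%} longer per relation than L1_BITS=14")
```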
#2 |
"Curtis"
Feb 2005
Riverside, CA
2·2,339 Posts |
My very elementary understanding is that this setting refers to the core's L1 cache size. All but the oldest Intel chips work best at 15 bits (that's 32k, right?), while some AMD chips work best at 14, though I don't recall which specific generation.
I assume this refers to data cache size, but I'm no programmer, so that's just a guess. I have at times wondered whether heavily threaded workloads might benefit from 14 for this setting, even if the architecture is suited to 15 in theory. I'm pretty sure the software should find the same relations regardless of the setting, but I'd like confirmation of that.
#3 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3²×17×61 Posts
L1_bits should be set such that the L1 cache size (of the CPU where you will run) = 2^L1_bits. That seems to be the paradigm of the code.
8 KB -> L1_bits=13
16 KB -> L1_bits=14 (old AMD chips and old Pentiums)
32 KB -> L1_bits=15 (most Intel chips)
64 KB -> L1_bits=16 (Phenoms, for example)
L1_bits=16 really makes for slightly faster sievers for the Phenoms, but the same binary runs slower (than that with L1_bits=15) on Xeons. There were some exceptions when I tried various binaries (quite a few years ago).
P.S. While I was typing, this already became a cross-post, but I'll just leave it here.
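The rule of thumb above (cache size = 2^L1_bits) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything taken from the ggnfs sources; the function name `l1_bits_for_cache` is hypothetical, and it assumes the cache size is an exact power of two in bytes.

```python
def l1_bits_for_cache(size_bytes: int) -> int:
    """Return n such that 2**n == size_bytes (the rule of thumb quoted above).

    Hypothetical helper for illustration; assumes a power-of-two cache size.
    """
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError(f"expected a positive power of two, got {size_bytes}")
    # For an exact power of two, bit_length() - 1 is the base-2 logarithm.
    return size_bytes.bit_length() - 1


# Reproduce the table from the post: 8, 16, 32 and 64 KiB caches.
for kib in (8, 16, 32, 64):
    print(f"{kib} KiB -> L1_BITS={l1_bits_for_cache(kib * 1024)}")
```

As the post itself notes, the value predicted this way is a starting point; timing the siever on the actual machine is what decides.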
#4 |
"Ed Hall"
Dec 2009
Adirondack Mtns
7063₈ Posts
Thanks! I'll study this a bit more, after I find out why my factmsieve.py mysteriously quit working this morning. I can't even run it manually. But, that is for a different thread, even though the future use of the L1_BITS value depends on it working.
#5 |
"Ed Hall"
Dec 2009
Adirondack Mtns
E33₁₆ Posts
The machine that I currently have running with L1_BITS=14 lists the following via lshw:
Code:
       configuration: cores=2 enabledcores=2 threads=2
     *-cache:0
          description: L1 cache
          physical id: 700
          size: 32KiB
          capacity: 32KiB
          capabilities: internal write-back data
#6 |
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×41×71 Posts |
That is per core. What is the CPU? I assume a modern Intel?
It does raise a good point though. Hyperthreading gives a large speed improvement in sieving. We would possibly get an even better speed with L1_bits one lower with hyperthreading, as the L1 cache would be shared between the threads. It would be nice to find out why hyperthreading helps and fix the slowdown at some point.
Last fiddled with by henryzz on 2016-12-05 at 22:17
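The hyperthreading idea above can be made concrete with a back-of-the-envelope sketch. It assumes, purely for illustration, that sibling threads split the L1 data cache evenly, which real hardware does not guarantee; the function name `l1_bits_per_thread` and its parameters are hypothetical and not part of ggnfs.

```python
def l1_bits_per_thread(l1d_bytes: int, threads_sharing: int = 1) -> int:
    """Largest n with 2**n <= the per-thread share of the L1 data cache.

    Assumption (not from the thread): the cache is divided evenly between
    the hardware threads that share it.
    """
    if l1d_bytes <= 0 or threads_sharing <= 0:
        raise ValueError("cache size and thread count must be positive")
    share = l1d_bytes // threads_sharing
    if share == 0:
        raise ValueError("more threads than bytes of cache")
    # bit_length() - 1 is floor(log2(share)) for any positive integer.
    return share.bit_length() - 1


# A 32 KiB L1D used by one thread vs. shared by two hyperthreads.
print(l1_bits_per_thread(32 * 1024, 1))
print(l1_bits_per_thread(32 * 1024, 2))
```

Under that even-split assumption a 32 KiB cache shared by two threads suggests 14 rather than 15, which matches the "one lower" guess; whether it is actually faster would still need to be measured.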
#7 | |
"Ed Hall"
Dec 2009
Adirondack Mtns
5×727 Posts |
Quote:
Code:
  *-cpu
       description: CPU
       product: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
       vendor: Intel Corp.
       physical id: 400
       bus info: cpu@0
       slot: Microprocessor
       size: 1866MHz
       width: 64 bits
       clock: 1066MHz
       capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow
       configuration: cores=2 enabledcores=2 threads=2
     *-cache:0
          description: L1 cache
          physical id: 700
          size: 32KiB
          capacity: 32KiB
          capabilities: internal write-back data
     *-cache:1
          description: L2 cache
          physical id: 701
          size: 2MiB
          capacity: 2MiB
          capabilities: internal varies unified
Code:
  *-cpu
       description: CPU
       product: Intel(R) Core(TM)2 Duo CPU U7600 @ 1.20GHz
       vendor: Intel Corp.
       physical id: 4
       bus info: cpu@0
       version: Intel(R) Core(TM)2 Duo CPU U7600 @ 1.20GHz
       slot: U10
       size: 1200MHz
       capacity: 1200MHz
       width: 64 bits
       clock: 133MHz
       capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx x86-64 constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow vnmi flexpriority cpufreq
     *-cache:0
          description: L1 cache
          physical id: 5
          slot: Internal L1 Cache
          size: 64KiB
          capacity: 64KiB
          capabilities: burst internal write-back unified
     *-cache:1
          description: L2 cache
          physical id: 6
          slot: Internal L2 Cache
          size: 2MiB
          capacity: 2MiB
          capabilities: burst external write-back unified
#8 |
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
16065₈ Posts
Check the processor specifications online. Sometimes the L1 cache might really be two separate stores, one for data and one for instructions. Only the former is usable for user program data (as the name suggests).
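On Linux, one way to see whether the L1 is split into data and instruction stores is to read the kernel's per-CPU cache entries under sysfs. The sketch below assumes the usual `/sys/devices/system/cpu/cpu0/cache/index*/` layout with `level`, `type` and `size` files; the function names are made up for illustration, and on systems without that directory it simply returns an empty result.

```python
from pathlib import Path


def parse_size(text: str) -> int:
    """Convert a sysfs size string such as '32K' or '2M' to bytes."""
    text = text.strip().upper()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def l1_caches(cpu_dir: str = "/sys/devices/system/cpu/cpu0/cache") -> dict:
    """Map cache type ('Data', 'Instruction', 'Unified') to bytes for level-1 caches."""
    found = {}
    for index in sorted(Path(cpu_dir).glob("index*")):
        try:
            level = (index / "level").read_text().strip()
            kind = (index / "type").read_text().strip()
            size = (index / "size").read_text()
        except OSError:
            continue  # entry missing or unreadable; skip it
        if level == "1":
            found[kind] = parse_size(size)
    return found


if __name__ == "__main__":
    # The 'Data' entry, if present, is the figure relevant to L1_BITS.
    print(l1_caches())
```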
#9 | |||
"Ed Hall"
Dec 2009
Adirondack Mtns
5×727 Posts |
Quote:
Quote:
Just to note, the first cpu says: Quote:
Thanks much! That sheds some more light.