 2021-01-18, 22:49 #892 charybdis   Apr 2020 225₁₀ Posts How's CPU usage looking? Is msieve actually doing anything or is it just hanging? And while we're at it, how's memory usage?
2021-01-19, 05:43   #893
frmky

Jul 2003
So Cal

2·3·347 Posts

Quote:
 Originally Posted by pinhodecarlos Maybe get in touch with Greg from NFS@Home to see if he can give any support or advice?!
This is well outside of anything I've run. My largest has been about 1 billion relations. However, if the relations are available online to download I'll be happy to play with it.

 2021-01-19, 10:13 #894 wreck     "Bo Chen" Oct 2005 Wuhan,China 167 Posts If possible, could you give it another try with fewer than 1600M unique relations? As a comparison, VBCurtis did 2,2330L (GNFS-207) with 162M unique relations. Also, as I remember it, fivemack once finished an NFS job successfully with a relation count of 720M, while 800M failed (using lpb 33). A rough guess is that some time ago there was a barrier near 800M, and it has now jumped to 1600M for some reason.
 2021-01-19, 15:18 #895 VBCurtis     "Curtis" Feb 2005 Riverside, CA 2²×7×13² Posts No, I didn't use 162M uniques. Did you miss a zero? This job is tougher than the GNFS-207 by quite a lot, and uses bounds which are expected to require more relations (36/33 should require more than 35/34). 2e9 relations may not be enough, but is quite surely not too many. Citing relations counts for 33-lp jobs is totally irrelevant to this job, which is using much larger bounds. The number of relations left heading into merge shows rather clearly that this is not oversieved. There is no reason to think the old msieve large-dataset bug is the culprit here. However, Charybdis' idea to cull all 36-bit-large-prime relations from the dataset and try to filter as a 33/35 job has merit.
2021-01-19, 15:25   #896
ryanp

Jun 2012
Boulder, CO

2²×3×23 Posts

Quote:
 Originally Posted by VBCurtis There is no reason to think the old msieve large-dataset bug is the culprit here. However, Charybdis' idea to cull all 36-bit-large-prime relations from the dataset and try to filter as a 33/35 job has merit.
I'm willing to try culling the 36-bit large prime relations. Would you be able to construct the "grep" command? I don't quite know the msieve relation format well enough.

2021-01-19, 15:47   #897
charybdis

Apr 2020

3²×5² Posts

Quote:
 Originally Posted by ryanp I'm willing to try culling the 36-bit large prime relations. Would you be able to construct the "grep" command? I don't quite know the msieve relation format well enough.
grep -v ",[8-9a-f]........$" should remove all lines ending with a 36-bit prime, which is what we need.

 2021-01-23, 22:13 #898 frmky Jul 2003 So Cal 2×3×347 Posts
There's definitely an msieve filtering bug. Good to have a data set that triggers it. Unfortunate that msieve needs to run for 15 hours to trigger it. Let's see what gdb says...

Code:
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 39309.4 MB
commencing in-memory singleton removal
begin with 2074342591 relations and 1985137022 unique ideals
reduce to 992888838 relations and 765115141 ideals in 20 passes
max relations containing the same ideal: 35
reading ideals above 720000
commencing singleton removal, initial pass
memory use: 21024.0 MB
reading all ideals from disk
memory use: 46989.5 MB
keeping 913886427 ideals with weight <= 200, target excess is 5352837
commencing in-memory singleton removal
begin with 992888838 relations and 913886427 unique ideals
reduce to 992241034 relations and 913238552 ideals in 15 passes
max relations containing the same ideal: 200
removing 8630643 relations and 8331224 ideals in 2000000 cliques
commencing in-memory singleton removal
[kepler-0-0:29616] *** Process received signal ***
[kepler-0-0:29616] Signal: Segmentation fault (11)
[kepler-0-0:29616] Signal code: Address not mapped (1)
[kepler-0-0:29616] Failing at address: 0x7f001013550c
[kepler-0-0:29616] [ 0] /lib64/libpthread.so.0(+0xf5e0)[0x7eff125965e0]
[kepler-0-0:29616] [ 1] ./msieve993_new[0x43ffd0]
[kepler-0-0:29616] [ 2] ./msieve993_new[0x463ae7]
[kepler-0-0:29616] [ 3] ./msieve993_new[0x43c2fb]
[kepler-0-0:29616] [ 4] ./msieve993_new[0x4288dd]
[kepler-0-0:29616] [ 5] ./msieve993_new[0x415bc4]
[kepler-0-0:29616] [ 6] ./msieve993_new[0x405b1b]
[kepler-0-0:29616] [ 7] ./msieve993_new[0x404987]
[kepler-0-0:29616] [ 8] ./msieve993_new[0x40454c]
[kepler-0-0:29616] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7eff11f03c05]
[kepler-0-0:29616] [10] ./msieve993_new[0x4045f2]
[kepler-0-0:29616] *** End of error message ***

 2021-01-25, 01:21 #899 wreck "Bo Chen" Oct 2005 Wuhan,China 167 Posts
Could you compile a debug version to see which line it is crashing on?

 2021-01-27, 02:42 #900 frmky Jul 2003 So Cal 2·3·347 Posts
We’re in filter_purge_singletons_core() at common/filter/singleton.c:441, looping through the relations counting the number of times that each ideal occurs. But the last relation is broken! We’re at the last relation since i = num_relations-1. The ideal_list for this last relation contains entries greater than num_ideals. Don’t know why though… Probably an overflow somewhere. But where? Trying to track it down but might take a while since I don't have a lot of time to devote to this and individual tests take a day.

Code:
read 2050M relations
read 2060M relations
read 2070M relations
found 578077506 hash collisions in 2074342600 relations
commencing duplicate removal, pass 2
found 9 duplicates and 2074342591 unique relations
memory use: 16280.0 MB
reading ideals above 1549860864
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 39309.4 MB
commencing in-memory singleton removal
begin with 2074342591 relations and 1985137022 unique ideals
reduce to 992888838 relations and 765115141 ideals in 20 passes
max relations containing the same ideal: 35
reading ideals above 720000
commencing singleton removal, initial pass
memory use: 21024.0 MB
reading all ideals from disk
memory use: 46989.5 MB
keeping 913886427 ideals with weight <= 200, target excess is 5352837
commencing in-memory singleton removal
begin with 992888838 relations and 913886427 unique ideals
reduce to 992241034 relations and 913238552 ideals in 15 passes
max relations containing the same ideal: 200
removing 8630643 relations and 8331224 ideals in 2000000 cliques
commencing in-memory singleton removal

Program received signal SIGSEGV, Segmentation fault.
0x000000000044a178 in filter_purge_singletons_core (obj=0x6de250, filter=0x7fffffffc710) at common/filter/singleton.c:441
441     freqtable[ideal]++;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 gmp-6.0.0-15.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) backtrace
#0 0x000000000044a178 in filter_purge_singletons_core (obj=0x6de250, filter=0x7fffffffc710) at common/filter/singleton.c:441
#1 0x0000000000475e26 in filter_purge_cliques (obj=0x6de250, filter=0x7fffffffc710) at common/filter/clique.c:646
#2 0x0000000000443cf6 in filter_make_relsets (obj=0x6de250, filter=0x7fffffffc710, merge=0x7fffffffc6e0, min_cycles=5352837) at common/filter/filter.c:65
#3 0x000000000042f0fb in do_merge (obj=0x6de250, filter=0x7fffffffc710, merge=0x7fffffffc6e0, target_density=130) at gnfs/filter/filter.c:187
#4 0x000000000042fad0 in nfs_filter_relations (obj=0x6de250, n=0x7fffffffc960) at gnfs/filter/filter.c:411
#5 0x00000000004172ac in factor_gnfs (obj=0x6de250, input_n=0x7fffffffcb40, factor_list=0x7fffffffcbd0) at gnfs/gnfs.c:153
#6 0x0000000000404dcd in msieve_run_core (obj=0x6de250, n=0x7fffffffcb40, factor_list=0x7fffffffcbd0) at common/driver.c:158
#7 0x00000000004051b4 in msieve_run (obj=0x6de250) at common/driver.c:268
#8 0x00000000004038a4 in factor_integer (buf=0x7fffffffd650 "38315657995194363034877423503084547947166751578940985843521212522635100246118059073205923746544331860205171086654671434719340358393954962433533212457600196112076644876654207767427267797808629935905445"..., flags=1027, savefile_name=0x0, logfile_name=0x0, nfs_fbfile_name=0x0, seed1=0x7fffffffd64c, seed2=0x7fffffffd648, max_relations=0, cpu=cpu_core, cache_size1=32768, cache_size2=20971520, num_threads=0, which_gpu=0, nfs_args=0x7fffffffdcee "target_density=130") at demo.c:235
#9 0x00000000004046bd in main (argc=4, argv=0x7fffffffd988) at demo.c:601
(gdb) info frame
Stack level 0, frame at 0x7fffffffc340:
 rip = 0x44a178 in filter_purge_singletons_core (common/filter/singleton.c:441); saved rip 0x475e26
 called by frame at 0x7fffffffc370
 source language c.
 Arglist at 0x7fffffffc2b8, args: obj=0x6de250, filter=0x7fffffffc710
 Locals at 0x7fffffffc2b8, Previous frame's sp is 0x7fffffffc340
 Saved registers:
  rip at 0x7fffffffc338
(gdb) info locals
ideal = 2057043263
i = 983610390
j = 5
freqtable = 0x7fff1d2ad010
relation_array = 0x7ff47e0f1010
curr_relation = 0x7ffc79bde3a0
old_relation = 0x7f1fd8001e8480
orig_num_ideals = 913238552
num_passes = 32767
num_relations = 983610391
num_ideals = 913238552
new_num_relations = 8630643
(gdb) print *curr_relation
$2 = {rel_index = 15834702, ideal_count = 36 '\$', gf2_factors = 69 'E', connected = 156 '\234', ideal_list = {885450581, 598542783, 158747510, 638930804, 786848709, 2057043263, 3845, 186587920, 18476918, 67526419, 598542783, 872055544, 2057043265, 2046824196, 3942562, 102078889, 58908383, 865042570, 2057043267, 872418055, 9125741, 85351335, 11880544, 43981132, 865042570, 873512089, 893921179, 2057043271, 2567, 93072473, 26460704, 33365801, 865042570, 517341201, 275602560, 862343378, 2057043273, 83889159, 66167424, 46818875, 59842776, 59333874, 194384291, 865042570, 172206968, 2057043276, 50334725, 905653709, 628443801, 865042570, 801305779, 869019178, 2057043277, 2046821898, 20184373, 101514515, 16353075, 87715774, 36505563, 58989284, 865042570, 598565998, 334060622, 469101029, 2057043280, 83889158, 73623668, 106612925, 359795440, 9473259, 157931537, 772472752, 2057043282, 218106376, 140592574, 157045250, 477152215, 866943502, 6146950, 41607604, 44380953, 772472752, 2057043284, 150998022, 105306193, 842728936, 7879065, 444703037, 772472752, 403730401, 2057043289, 83889414, 320662844, 329981033, 248067990, 772472752, 23316642, 631501233, 2057043290, 822087174}}
(gdb)
 2021-01-31, 05:49 #901 wreck     "Bo Chen" Oct 2005 Wuhan,China A7₁₆ Posts
After about eight hours of reading the code (msieve r1030; 1767 lines read in folder common/filter: singleton.c, clique.c, etc.), this filter problem does not seem easy to solve. But here are some thoughts.

1. From common/filter/filter_priv.h, the definition of ideal_map_t is

typedef struct {
	uint32 payload : 30; /* offset in list of ideal_relation_t structures where the linked list of ideal_relation_t's for this ideal starts */
	uint32 clique : 1; /* nonzero if this ideal can participate in a clique */
	uint32 connected : 1; /* nonzero if this ideal has already been added to a clique under construction */
} ideal_map_t;

The maximum value of payload is 2^30 - 1, which is about 1000M. If the number of ideals exceeds about 1000M in purge_cliques_core(), the filter might not work properly. Here, on entering purge_cliques_core(), the relation count is 992888838, which is less than 2^30, so this 30-bit field should not be the reason for the crash.

2. 2057043265 = 0x7A9BFD41, and with bit 30 cleared, 0x3A9BFD41 = 983301441. This number is close to num_relations (983610391). It is possible that the ideal_map_t.clique bit is not cleared properly in purge_cliques_core(). But this is only a guess.

3. In filter_purge_singletons_core(), curr_relation->ideal_count is 36, but three values in curr_relation->ideal_list are the same (865042570): curr_relation->ideal_list[17], curr_relation->ideal_list[24] and curr_relation->ideal_list[32]. That is a little strange.

4. In purge_cliques_core(), line 370 is ideal_map[ideal].payload = num_reverse++; the variable num_reverse could possibly exceed 2^32, while its type is uint32.

5. A question: has ryanp tried using fewer than 1600M unique relations? If so, what was the result?
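
The bit arithmetic in points 1 and 2 can be checked with a few lines of Python. This is only a sketch for verifying the numbers taken from the gdb dump in post #900; the constant names are made up here and nothing in it calls msieve. It also assumes the common GCC layout in which the clique flag sits directly above the 30 payload bits, which the C standard does not guarantee.

```python
# Values copied from the gdb dump in post #900.
BAD_IDEAL = 2057043265      # one of the out-of-range ideal_list entries
NUM_RELATIONS = 983610391   # num_relations at the time of the crash

# A 30-bit bitfield such as ideal_map_t.payload can hold 0 .. 2^30 - 1.
PAYLOAD_BITS = 30
payload_max = (1 << PAYLOAD_BITS) - 1

# Keep only the low 30 bits, i.e. clear bit 30 and everything above it.
# Assumption: bit 30 corresponds to the clique flag in the packed struct.
low30 = BAD_IDEAL & payload_max
bit30_set = bool((BAD_IDEAL >> PAYLOAD_BITS) & 1)

print(f"payload max           = {payload_max}")
print(f"bad ideal             = {BAD_IDEAL:#x}")
print(f"bit 30 set            = {bit30_set}")
print(f"low 30 bits           = {low30} ({low30:#x})")
print(f"num_relations - low30 = {NUM_RELATIONS - low30}")
```

With bit 30 masked off, the value lands a little below num_relations, which is the coincidence point 2 is drawing attention to.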
 2021-01-31, 07:53 #902 Happy5214     "Alexander" Nov 2008 The Alamo City 1071₈ Posts The length of freqtable is num_ideals (line 430), and ideal (the index) is greater than that, so the array reference is out-of-bounds and thus we get the segfault. The real question is why there are so many entries in ideal_list that are above num_ideals. Last fiddled with by Happy5214 on 2021-01-31 at 07:55
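
To make the out-of-bounds point concrete, here is a minimal Python sketch that scans the first ideal_count (36) entries of the dumped ideal_list for values that could not be valid indices into a table of length num_ideals. It is a stand-in for the C loop, not msieve's code; the function name out_of_range is hypothetical, and the data is copied from the gdb output in post #900.

```python
# First ideal_count (36) entries of curr_relation->ideal_list from the gdb dump.
IDEAL_LIST = [
    885450581, 598542783, 158747510, 638930804, 786848709, 2057043263,
    3845, 186587920, 18476918, 67526419, 598542783, 872055544,
    2057043265, 2046824196, 3942562, 102078889, 58908383, 865042570,
    2057043267, 872418055, 9125741, 85351335, 11880544, 43981132,
    865042570, 873512089, 893921179, 2057043271, 2567, 93072473,
    26460704, 33365801, 865042570, 517341201, 275602560, 862343378,
]
NUM_IDEALS = 913238552  # length of freqtable, per the gdb locals


def out_of_range(ideals, num_ideals):
    """Return (index, ideal) pairs that would index past the end of freqtable."""
    return [(j, ideal) for j, ideal in enumerate(ideals) if ideal >= num_ideals]


bad = out_of_range(IDEAL_LIST, NUM_IDEALS)
for j, ideal in bad:
    print(f"ideal_list[{j}] = {ideal} >= num_ideals ({NUM_IDEALS})")
```

The first hit is at index 5 with value 2057043263, which matches the j and ideal values gdb reported at the crash.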
