
include 2012-11-10 11:59

Segmentation fault in msieve.
 
Hello, all!

I am trying to factor a number with msieve-1.50 and received this error.
I built msieve with the option MPI=1.
If you need additional information, contact me and I will try to provide it.

[msieve-1.50]$ ./msieve 58423868674360640853764652174229357809650619008996889226248823243442030054249 -q
[include-laptop:04187] *** Process received signal ***
[include-laptop:04187] Signal: Segmentation fault (11)
[include-laptop:04187] Signal code: Address not mapped (1)
[include-laptop:04187] Failing at address: 0xa0
[include-laptop:04187] [ 0] [0xb77a040c]
[include-laptop:04187] [ 1] /usr/lib/openmpi/libmpi.so.1(PMPI_Bcast+0x16) [0xb752b746]
[include-laptop:04187] [ 2] ./msieve() [0x8095962]
[include-laptop:04187] [ 3] ./msieve() [0x806c290]
[include-laptop:04187] [ 4] ./msieve() [0x805946a]
[include-laptop:04187] [ 5] ./msieve() [0x804cb71]
[include-laptop:04187] [ 6] ./msieve() [0x804b942]
[include-laptop:04187] [ 7] ./msieve() [0x804b178]
[include-laptop:04187] [ 8] /lib/libc.so.6(__libc_start_main+0xf5) [0xb73053d5]
[include-laptop:04187] [ 9] ./msieve() [0x804b4b9]
[include-laptop:04187] *** End of error message ***
Segmentation fault

Best regards,
Evgeny.

Batalov 2012-11-10 17:58

Looks like you have two incompatible libmpi.so libraries: you built against one and are running with the other (the system one). You can link against the static library to check whether this is the reason.
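
One way to test this hypothesis, besides static linking, is a small diagnostic program (a sketch, assuming Open MPI, whose mpi.h provides the OMPI_*_VERSION macros; these names are not from the thread). Compile it with the same mpicc and flags as msieve, then compare the compile-time version against what the loaded libmpi reports at run time.

[code]/* mpi_version_check.c -- hypothetical diagnostic sketch, assuming Open MPI.
 * Build it exactly the way msieve is built, then run it and compare the
 * compile-time Open MPI version against what the loaded library reports.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int major, minor;

    MPI_Init(&argc, &argv);
    MPI_Get_version(&major, &minor);   /* MPI standard level of the runtime library */

#ifdef OMPI_MAJOR_VERSION
    /* Compile-time Open MPI version, taken from the mpi.h used for the build */
    printf("compiled against Open MPI %d.%d.%d\n",
           OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
#endif
    printf("runtime library implements MPI %d.%d\n", major, minor);

    MPI_Finalize();
    return 0;
}[/code]
Running ldd against the msieve binary alongside this would also show which libmpi.so actually gets loaded.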

jrk 2012-11-10 20:10

I think the problem is that the MPQS code does not set up the MPI grid, and this example is using MPQS. This leads to an error in the lanczos code, which always uses MPI when available.

When I try to run the example with an MPI-aware msieve, it fails in block_lanczos() in common/lanczos/lanczos.c, on this line:
[code]	/* tell all the MPI processes whether a post lanczos matrix
	   was constructed */

	MPI_TRY(MPI_Bcast(&have_post_lanczos, 1, MPI_INT, 0,
			obj->mpi_la_col_grid))[/code]
And outputs:
[code][atlas:12145] *** An error occurred in MPI_Bcast
[atlas:12145] *** on communicator MPI_COMM_WORLD
[atlas:12145] *** MPI_ERR_COMM: invalid communicator
[atlas:12145] *** MPI_ERRORS_ARE_FATAL (goodbye)
[/code]
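
The failure mode is reproducible outside msieve. Below is a minimal sketch (not msieve code) that calls MPI_Bcast on a communicator that was never created, which is essentially what happens when the QS path skips the grid setup. Depending on what the unset handle contains, Open MPI either aborts with MPI_ERR_COMM as above or segfaults as in the original report.

[code]/* bcast_null_comm.c -- minimal sketch, not msieve code.
 * Demonstrates what happens when a collective is called on a communicator
 * that was never set up.
 *
 * Build/run (assuming Open MPI):  mpicc bcast_null_comm.c && mpirun -np 1 ./a.out
 */
#include <mpi.h>

int main(int argc, char **argv) {
    int value = 0;
    MPI_Comm grid = MPI_COMM_NULL;   /* stands in for obj->mpi_la_col_grid
                                        when the QS path never fills it in */

    MPI_Init(&argc, &argv);

    /* Broadcasting on MPI_COMM_NULL is erroneous; with the default
       MPI_ERRORS_ARE_FATAL handler Open MPI aborts with MPI_ERR_COMM,
       much like the output above.  If the handle were uninitialized
       garbage instead of MPI_COMM_NULL, a segfault like the one in the
       original report is the likely outcome. */
    MPI_Bcast(&value, 1, MPI_INT, 0, grid);

    MPI_Finalize();
    return 0;
}[/code]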

include 2012-11-13 20:17

[QUOTE=Batalov;317853]Looks like you have two incompatible libmpi.so libraries: you built against one and are running with the other (the system one). You can link against the static library to check whether this is the reason.[/QUOTE]

Hello!

This is the output of the find command in the /usr/lib folder:
find /usr/lib -name \*libmpi\*
/usr/lib/openmpi/libmpi.so.1.0.3
/usr/lib/openmpi/libmpi_f90.so.1.1.0
/usr/lib/openmpi/libmpi.so
/usr/lib/openmpi/libmpi.so.1
/usr/lib/openmpi/libmpi_f77.so.1
/usr/lib/openmpi/libmpi_f77.so.1.0.3
/usr/lib/openmpi/libmpi_f90.so.1
/usr/lib/openmpi/libmpi_f77.so
/usr/lib/openmpi/libmpi_f90.so
/usr/lib/openmpi/libmpi_cxx.so
/usr/lib/openmpi/libmpi_cxx.so.1
/usr/lib/openmpi/libmpi_cxx.so.1.0.1

I tried updating openmpi and rebuilding msieve. Nothing changed.

Sorry for the slow replies.

jasonp 2012-11-14 00:59

jrk is right: building with MPI and running the QS code will never work. It's not that difficult to *make* it work, but there is no point in doing so for such a small input (one thread will finish the resulting matrix in about one second).
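
For the curious, one way "making it work" could look is to have the solver skip the collectives whenever no process grid was ever created. The following is an illustrative sketch, not msieve's actual code: the field name mpi_la_col_grid is taken from the snippet jrk quoted, while initializing the grid to MPI_COMM_NULL and falling back to a serial path are assumptions made for illustration.

[code]/* guard_sketch.c -- illustrative only, not msieve code.
 * Models one way the linear algebra could avoid MPI collectives when the
 * caller (e.g. the QS driver) never set up the process grid: initialize
 * the grid communicator to MPI_COMM_NULL and fall back to the serial path.
 */
#include <mpi.h>
#include <stdio.h>

struct solver_state {               /* stand-in for msieve's state object */
    MPI_Comm mpi_la_col_grid;       /* name taken from the snippet above */
};

static void solve_matrix(struct solver_state *obj) {
    int have_post_lanczos = 1;

    if (obj->mpi_la_col_grid == MPI_COMM_NULL) {
        /* QS path: grid never created, so stay on the serial code path */
        printf("no MPI grid; solving on a single process\n");
        return;
    }

    /* NFS path with a real grid: the broadcast is now safe */
    MPI_Bcast(&have_post_lanczos, 1, MPI_INT, 0, obj->mpi_la_col_grid);
}

int main(int argc, char **argv) {
    struct solver_state obj = { MPI_COMM_NULL };   /* what the QS path would leave behind */

    MPI_Init(&argc, &argv);
    solve_matrix(&obj);
    MPI_Finalize();
    return 0;
}[/code]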

