mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > YAFU

Reply
 
Thread Tools
Old 2016-01-07, 04:48   #1
Demonslay335
 
Demonslay335's Avatar
 
Jan 2016

5 Posts
Default Understanding NFS

Hello! I have been recently helping victims of a particular ransomware that has been discovered to have a weakness in it's implementation of storing the AES key. In assisting with this, I (and several others) have seemingly been thrust into the exciting world of factoring large numbers!

I've been slowly understanding atleast how to use Yafu to factor these numbers (ranging from C78 to C156 on average), but would like to learn more to make sure we are utilizing the program to its fullest potential. The goal is to factor as fast as possible in order to help all of these victims.

My current interest is in learning how to interpret the NFS step, and most importantly, have a rough idea on an ETA possibly. It's fine if a number legitimately needs to take a day or two to factor, but it's the mystery of whether it will finish in a minute or keep running through the night that is frustrating.

Can I have some help on interpreting these lines?

Quote:
nfs: commencing algebraic side lattice sieving over range: 1837931 - 1841931
nfs: commencing algebraic side lattice sieving over range: 1841931 - 1845931
nfs: commencing algebraic side lattice sieving over range: 1865931 - 1869931
nfs: commencing algebraic side lattice sieving over range: 1861931 - 1865931
nfs: commencing algebraic side lattice sieving over range: 1845931 - 1849931
nfs: commencing algebraic side lattice sieving over range: 1849931 - 1853931
nfs: commencing algebraic side lattice sieving over range: 1853931 - 1857931
nfs: commencing algebraic side lattice sieving over range: 1857931 - 1861931
Warning: lowering FB_bound to 1837930.
Warning: lowering FB_bound to 1857930.
Warning: lowering FB_bound to 1841930.
Warning: lowering FB_bound to 1853930.
Warning: lowering FB_bound to 1849930.
Warning: lowering FB_bound to 1861930.
Warning: lowering FB_bound to 1865930.
Warning: lowering FB_bound to 1845930.
total yield: 47281, q=1841941 (0.01336 sec/rel)
total yield: 49675, q=1869947 (0.01328 sec/rel)
total yield: 53341, q=1853939 (0.01333 sec/rel)
total yield: 55511, q=1845931 (0.01292 sec/rel)
total yield: 55440, q=1861961 (0.01297 sec/rel)
total yield: 57361, q=1865939 (0.01292 sec/rel)
total yield: 58309, q=1857931 (0.01281 sec/rel)
total yield: 56229, q=1849933 (0.01345 sec/rel)
This is part of the following command, with yafu 1.34.

Quote:
yafu-x64 factor(0x497FC916CFDCD6C67651B9A7009F9239FD0AAEAD53F738D31C47E8AAC737B08FF7200D7ACA0BA5AC5C6D753A12A99D2596735CD44B831E5D0DDCB749BBE9F035) -threads 8
Of course after ECM curves, this is reduced to a C108 for NFS to run with, but it has been running NFS for about 24 hours on my i7 920, and I just don't know how close to completion it is. I like that the SIQS sieving gives me a more clear-cut ETA, but realize yafu optimizes which algorithm it should run based on the composite; I'm no major mathematician in this field, so I trust its judgement.

Thanks for any help, and for this very helpful program.
Demonslay335 is offline   Reply With Quote
Old 2016-01-07, 13:50   #2
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

354310 Posts
Default

The source of the uncertainty here is that the numbers everyone is trying to factor from Teslacrypt are random, and hence have factors of random sizes. If a particular number has small factors they can be found with specialized factorization methods very quickly; but if it has no small factors you are left having to use the quadratic sieve and number field sieve. Those methods don't care whether factors are large or small, and the time they take is predictable and based only on the size of the number to be factored. Unfortunately they take a significant amount of time (i.e. overnight for a 120 digit input, 1-2 weeks for a 150 digit input).

So the time needed to factor a random number completely depends on the size of the factors. Of course for a random number you don't know the size of the factors until you find them, so the best we can do is hope that the number contains small factors, try methods which will only work if that's the the case, and if they succeed the uncertainty is reduced because the number to factor has been reduced in size. Once you try all of the specialized methods for a little while and they stop succeeding, and the number hasn't been completely factored yet, you can be fairly sure any remaining factors are large and can switch to the high-powered factorization methods.

If the Teslacrypt authors had not made a mistake but instead made the numbers to be factored strong semiprimes, i.e. numbers with only two factors and both of them large, then you can skip all the specialized methods because only QS and NFS have a chance of succeeding, and every number to factor will take the worst-case time.
jasonp is offline   Reply With Quote
Old 2016-01-07, 14:06   #3
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

67738 Posts
Default

Jason's answer is excellent - I'll just try to add a little bit more.

As you've probably seen, NFS can be thought of as 3 separate processes:
1) polynomial search
2) sieving
3) post processing

although 3) can be further broken down into
a) filtering
b) linear algebra (matrix step)
c) square root

The lines you posted come from the sieving step. After this has gone on awhile you should start to see yafu printing lines like this (you may have to run the command with the -v option):

Code:
nfs: found 105308 relations, need at least 1936352 (filtering ETA: 0h 9m), continuing with sieving ...
For given job parameters, yafu knows about how many relations it needs to find during the sieving step. It keeps track of the rate at which you are finding them in order to come up with an ETA for the beginning of step 3 (filtering). Sieving proceeds in smallish batches, so these messages should be printed every few minutes or so.

Once you start filtering, you are nearly done, so the ETA for filtering is close to the ETA for the whole job. Postprocessing time also increases with size of the input number, but for C150 or less it will be a small fraction of the total time.

Last fiddled with by bsquared on 2016-01-07 at 14:08 Reason: -v is required when running the job...
bsquared is offline   Reply With Quote
Old 2016-01-07, 15:03   #4
Demonslay335
 
Demonslay335's Avatar
 
Jan 2016

1012 Posts
Default

Thanks for the information. How did you ever guess it was TeslaCrypt. ;)

I paused that particular task, re-ran tune(), and resumed it overnight. It took only 3 hours this time - maybe something goofed up my first tune().

I've paused and resumed another task with the verbose flag, and it is telling me 7h ETA. Very helpful! Is 10M relations reasonable for a C121?
Demonslay335 is offline   Reply With Quote
Old 2016-01-07, 15:16   #5
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3·1,181 Posts
Default

Quote:
Originally Posted by Demonslay335 View Post
Thanks for the information. How did you ever guess it was TeslaCrypt. ;)
We've had a lot of new members in the last two weeks
jasonp is offline   Reply With Quote
Old 2016-01-07, 15:22   #6
Demonslay335
 
Demonslay335's Avatar
 
Jan 2016

5 Posts
Default

I figured, lol. Sorry I missed that post - I jumped straight to yafu as it has been (for the most part) quicker at getting factors for most of my cases. I tried msieve with CUDA on my GTX 760, and it didn't seem like it was doing as well. I'll take a deeper read through that topic.

Never got hit by the virus myself, I'm just helping others.

Thanks for the help. Hello from BleepingComputer!

Last fiddled with by Demonslay335 on 2016-01-07 at 15:23
Demonslay335 is offline   Reply With Quote
Old 2016-01-07, 15:40   #7
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

3×1,193 Posts
Default

Attached Thumbnails
Click image for larger version

Name:	Screen Shot 01-07-16 at 09.38 AM.PNG
Views:	370
Size:	35.4 KB
ID:	13685  
bsquared is offline   Reply With Quote
Old 2016-01-07, 17:06   #8
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

72·31 Posts
Default

Quote:
Originally Posted by jasonp View Post
The source of the uncertainty here is that the numbers everyone is trying to factor from Teslacrypt are random, and hence have factors of random sizes. If a particular number has small factors they can be found with specialized factorization methods very quickly; but if it has no small factors you are left having to use the quadratic sieve and number field sieve.
..
So the time needed to factor a random number completely depends on the size of the factors. Of course for a random number you don't know the size of the factors until you find them,
..
If the Teslacrypt authors had not made a mistake but instead made the numbers to be factored strong semiprimes, i.e. numbers with only two factors and both of them large, then you can skip all the specialized methods because only QS and NFS have a chance of succeeding, and every number to factor will take the worst-case time.
I've checked 12 fully factored numbers from the appropriate forum and for all of them it was true that there exists K and L for that n=K*L, where the factor and cofactor (so K and L) is in the (2^252,2^256) interval. ( I can still imagine that the size of the factor,cofactor could be anything, but with smaller probability ).

Note that this is similar to a standard choice for RSA, but in those cases K and L is prime, but not here, and this gives that you can use this fact to early abort in Ecm (or to continue it to a larger level !).

Example: check this Tesla number: http://factordb.com/index.php?query=...98158773983475
after Ecm finds the p27 say at t30 and no other large factor. Say p27 divides K,
then with large probability there is only one missing prime factor of K since log(sqrt(2^256/p27))/log(10)=25.3, and we can also see the size of this unknown factor, since K is in (2^252,2^256), if P is the missing factor, then

Code:
2^252<K<2^256, and
P*p27<=K<=P*p27*5^2*251, from these:
10^45.61<P<10^50.62
And we are also almost sure that there is only one or two missing prime factor of L, where we see only small factors of L (these can be 5,5^2,251). Pushing Ecm to t35-t36 gives that it is very likely that we miss only one large factor of L, and that is huge (Q>2^252/5^2/251), larger than P. So we can stop Ecm.

Last fiddled with by R. Gerbicz on 2016-01-07 at 17:15 Reason: edit url
R. Gerbicz is offline   Reply With Quote
Old 2016-01-08, 15:26   #9
Demonslay335
 
Demonslay335's Avatar
 
Jan 2016

5 Posts
Default

Quote:
Originally Posted by R. Gerbicz View Post
I've checked 12 fully factored numbers from the appropriate forum and for all of them it was true that there exists K and L for that n=K*L, where the factor and cofactor (so K and L) is in the (2^252,2^256) interval. ( I can still imagine that the size of the factor,cofactor could be anything, but with smaller probability ).

Note that this is similar to a standard choice for RSA, but in those cases K and L is prime, but not here, and this gives that you can use this fact to early abort in Ecm (or to continue it to a larger level !).

Example: check this Tesla number: http://factordb.com/index.php?query=...98158773983475
after Ecm finds the p27 say at t30 and no other large factor. Say p27 divides K,
then with large probability there is only one missing prime factor of K since log(sqrt(2^256/p27))/log(10)=25.3, and we can also see the size of this unknown factor, since K is in (2^252,2^256), if P is the missing factor, then

Code:
2^252<K<2^256, and
P*p27<=K<=P*p27*5^2*251, from these:
10^45.61<P<10^50.62
And we are also almost sure that there is only one or two missing prime factor of L, where we see only small factors of L (these can be 5,5^2,251). Pushing Ecm to t35-t36 gives that it is very likely that we miss only one large factor of L, and that is huge (Q>2^252/5^2/251), larger than P. So we can stop Ecm.
So if I am understanding you correctly, if I've only factored down to, say, a C125 or less after around t35 of curves, I might as well abort ECM and start factoring that cofactor with only QS/NFS. That would potentially slim a good hour off some of these.

Is there any other pattern I can use to my advantage? Kind of along the lines you are saying, I am noticing when I get down to a C130 or less after ECM, I end up with only two primes once factored, and the sum of their digits is close to the digits of the composite. Is there a way to make yafu skip a block of primes, or give it a suggested range maybe? Or does it kind of already figure that out on its own based on maths above my head. :P
Demonslay335 is offline   Reply With Quote
Old 2016-01-08, 15:34   #10
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3·1,181 Posts
Default

Quote:
Originally Posted by Demonslay335 View Post
I am noticing when I get down to a C130 or less after ECM, I end up with only two primes once factored, and the sum of their digits is close to the digits of the composite. Is there a way to make yafu skip a block of primes, or give it a suggested range maybe?
If there are only two factors, by definition their digit count is close to the digit count of the product. Don't make the rookie mistake of thinking the sieve methods have anything in common with trial division, there are no blocks of primes that anyone is trying (we'd never finish even one factorization of this size if that's what the code was doing).

The sieve methods don't care what the factors look like; even if you know they are very nearly the same size, it has no effect on the runtime of NFS.
jasonp is offline   Reply With Quote
Old 2016-01-08, 15:48   #11
Demonslay335
 
Demonslay335's Avatar
 
Jan 2016

5 Posts
Default

Quote:
Originally Posted by jasonp View Post
If there are only two factors, by definition their digit count is close to the digit count of the product. Don't make the rookie mistake of thinking the sieve methods have anything in common with trial division, there are no blocks of primes that anyone is trying (we'd never finish even one factorization of this size if that's what the code was doing).

The sieve methods don't care what the factors look like; even if you know they are very nearly the same size, it has no effect on the runtime of NFS.
I at least understood the difference between trial division and sieving to some extent, but I'm definitely still a rookie in this world. :P The animation on this wiki page kind of helped illustrate the process to me: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

Man, 6 years out of calculus class and I feel like I've forgotten everything about number theory... thanks for all of the info, even if it's basic stuff, lol. easy for this stuff to go over your head when you've not had to do it everyday in a long time.

Even if I'm mostly using this for helping with this current ransomware decryption efforts, this whole process has definitely peeked my interest in how these algorithms work. Factoring is kind of fun. :)
Demonslay335 is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Understanding assignment rules Fred PrimeNet 3 2016-05-19 13:40
Understanding CPUs on the Benchmark Page Fred Hardware 33 2016-04-27 04:42
Understanding the LL proof and then more related stuff following it Raman Math 4 2012-05-24 05:37
LL Test: Understanding the math zacariaz Homework Help 32 2007-05-16 15:18
Are Parts of Human Experience Beyond Scientific Understanding? edorajh Science & Technology 11 2007-01-22 06:04

All times are UTC. The time now is 02:30.


Sat Dec 4 02:30:45 UTC 2021 up 133 days, 20:59, 0 users, load averages: 1.48, 1.32, 1.29

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.