mersenneforum.org  

Go Back   mersenneforum.org > Math Stuff > Tales From the Crypt(o)

Reply
 
Thread Tools
Old 2019-05-25, 11:01   #34
TacticalCoder
 
May 2019

13 Posts
Default

Quote:
Originally Posted by kevinhake View Post
Hi! Forum newbie here :)

I also took a look at the problem late last year and a test program using MPIR made me realize it was solvable in only a few years with general purpose hardware. I had a guess that an FPGA could do it in months... never got an implementation off the ground tho (my HDL skills are not really existant!).

To help answer TacticalCoder's question about why it was solvable so much faster:
Some theories:
  1. Rivest made a typo for his initial value. Perhaps squaringsPerSecond should have been 30000 and not 3000. It probably wasn't a typo tho.
Hi there,


I'm a bit of a hurry and my situation is "complicated" as I put my core i7-6700 in the time capsule at the MIT and now it's pure hell to find a sixth gen CPU where I live (plus credit cards issues, whatever)... Whatever, I'll come back later but here are a few notes ^ ^



The GMP authors kindly tried GMP from 1999 on 1999 hardware for me: they got basically 3800 sq/s. The original estimate was made by someone from RSA (not Rivest) and he got 3400 sq/s estimate (as compared to the 3800 sq/s the GMP maintainers just got on 1999 hardware running 1999 GMP).



So the original estimate was super close to the actual value back then. However latest GMP version, on the same 1999 hardware, is basically 3.4x faster.


So software alone there's more than a 3x speedup already. And at point GMP stopped being optimized for 32 bit architectures: GMP are sure another 25% speedup could be had.



Switch from 32 to 64 bit then brought a 4x speedup basically: it's not just "twice the bits", it's also less instructions needed.



Some instructions are now taking less cycles per instruction: so speedup there too.



Basically from my research original estimates were very close (and this has been tested)... But it's hard to predict the future, even for Ron Rivest ; )


If anyone wants to look at Ron talking then me (private link atm):


https://www.youtube.com/watch?v=hmwT...ature=youtu.be


The ceremony with the capsule being opened / re-sealed (4 minutes vids):



https://www.youtube.com/watch?v=9qKk...ature=youtu.be



Talk to you all later,
TacticalCoder is offline   Reply With Quote
Old 2019-05-25, 14:48   #35
tServo
 
tServo's Avatar
 
"Marv"
May 2009
near the Tannhäuser Gate

3×223 Posts
Default

Quote:
Originally Posted by TacticalCoder View Post
EDIT 2: on my Core i7-6700 using R. Gerbicz' code I get about 910k sq/s. Simon Peffer's team's FPGA (a Xilinx vu9p) is getting about 15M sq/s (they'll have the solution by the 11th of May). There's a third team which is actually building an ASIC and they're using a FPGA to validate their ASIC design. Ballpark estimation from what I understood reading the emails: 10x improvement compared to the FPGA. So the ASIC team should have hardware solving this in a matter of days (as of opposed to 2 years' ish using good code and a fast machine).


EDIT 3: I'm neither a cryptographer nor a number cruncher. I just happened to realize upon reading the problem and doing a few tests that it was possible to solve this before 2034 ; )
There was a thread a while ago started by Air Squirrels that explored the feasibility of using an FPGA as a co-processor for GIMPS:
https://www.mersenneforum.org/showthread.php?t=21634

I wonder if using the ASIC being discussed would be more feasible ( assuming it works ) ?
tServo is offline   Reply With Quote
Old 2019-05-25, 15:45   #36
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

101110101102 Posts
Default

The new puzzle from Ronald L. Rivest is out!!!!!
http://people.csail.mit.edu/rivest/p...new-puzzle.txt
Now n has 3072 bits, t=2^56, with milestone puzzles, you have 15 years to match the original year 2034.
R. Gerbicz is offline   Reply With Quote
Old 2019-05-25, 16:58   #37
TacticalCoder
 
May 2019

D16 Posts
Default

Quote:
Originally Posted by 2M215856352p1 View Post
The error check you proposed has a 2-5% overhead (my estimate) and has a 1/c chance of leaving an error undetected.
If I'm not mistaken this is the error check Shamir suggested to Rivest for the LCS35 puzzle and that error check is explained in the original puzzle description : )
TacticalCoder is offline   Reply With Quote
Old 2019-05-25, 16:59   #38
TacticalCoder
 
May 2019

13 Posts
Default

Quote:
Originally Posted by R. Gerbicz View Post
The new puzzle from Ronald L. Rivest is out!!!!!
http://people.csail.mit.edu/rivest/p...new-puzzle.txt
Now n has 3072 bits, t=2^56, with milestone puzzles, you have 15 years to match the original year 2034.
YES! My name in the first paragraph of the new puzzle description : )
TacticalCoder is offline   Reply With Quote
Old 2019-05-25, 17:08   #39
TacticalCoder
 
May 2019

11012 Posts
Default

Quote:
Originally Posted by kevinhake View Post
Somewhat related, I found in MPIR that performing a square operation followed by modulo was actually faster than the powmod function...
Now that is very interesting because I'm 99.99% positive that the CPU + GMP version I first tried this on in late 2014 it was faster to do square + mod than powm. Which is why I then ran the whole computation using two instructions. But then the GMP guys (and Gerbicz here too I think) showed me I was wrong (at least on the latest version of GMP / Core i7). I'm sure I didn't "see things" back then.

Also MPIR is a fork of GMP right?

So maybe I tried on an old GMP version which for whatever reason did behave that way.
TacticalCoder is offline   Reply With Quote
Old 2019-05-25, 20:28   #40
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

2·32·83 Posts
Default

For the original puzzle the seed value for p was in the message, but easily we can get the very likely original seed value for (the other prime) q.
Code:
? nextprime(lift(Mod(5,2^1024)^67229033994487263541238668735266739881324356552223677298756897562133))
%1 = 40853923313791905056390006230327028982596606945346318846549337657379367189284051739393623398120773967666443351405399733470781263258210328375697986182263865342952431430451931199880262185216184022204644820522254304557640038220504811993426511340296997627585129527426806995411578324937406976125239089383993333731
if you see both of these seeds (the seed for p is 712238904468723561162000937465778229877598711342253664788091132335) then my wild guess is that these are far from random, the large digits (5-9) and the small ones (0-4) are in larger groups. You could make some tests on it.
R. Gerbicz is offline   Reply With Quote
Old 2019-05-25, 23:08   #41
TacticalCoder
 
May 2019

11012 Posts
Default

Quote:
Originally Posted by theyowguy View Post
I managed to enable GMP in my java implementation and it takes my 2 year old machine ~18.5 minutes to generate 100 million t's using GMP (I think that works out to almost 29 years to get a solution).

Sounds an order of magnitude too low for GMP though!



22 minutes to do 1 billion operation using GMP on my core (ex) i7-6700 from 2015. "ex" because now my core i7-6700 ain't mine anymore: he's inside the time capsule and belongs to the MIT : )


I was saving a file every 1 billion iteration to disk.



And that's not the most optimized way to do it. I was doing about 750 000 steps per second. While preparing my slides and reading some of the suggestions here by Gerbicz I could reach about 880 000 steps per second IIRC. Someone with a core i9 told me he had 990 000 steps per second (probably that a bit more is doable).
TacticalCoder is offline   Reply With Quote
Old 2019-05-26, 03:39   #42
2M215856352p1
 
May 2019

3·37 Posts
Default

Quote:
Originally Posted by TacticalCoder View Post
If I'm not mistaken this is the error check Shamir suggested to Rivest for the LCS35 puzzle and that error check is explained in the original puzzle description : )
You are right. Thanks for pointing that out. If I have not mistaken, the Gerbicz error check applies and it is much stronger than the error check Shamir suggested back in 1999.
2M215856352p1 is offline   Reply With Quote
Old 2019-05-26, 23:55   #43
kevinhake
 
"Kevin W. Hake"
May 2019

5 Posts
Default

Quote:
Originally Posted by TacticalCoder View Post
Now that is very interesting because I'm 99.99% positive that the CPU + GMP version I first tried this on in late 2014 it was faster to do square + mod than powm. Which is why I then ran the whole computation using two instructions. But then the GMP guys (and Gerbicz here too I think) showed me I was wrong (at least on the latest version of GMP / Core i7). I'm sure I didn't "see things" back then.

Also MPIR is a fork of GMP right?

So maybe I tried on an old GMP version which for whatever reason did behave that way.
Yes GMP is a fork, and not very well maintained, tho with decent support for visual studio.
When I implemented the square + mod it was slower until I used the lower level interface to reuse buffers between the two. But yeah I should try it in the latest GMP and see.
kevinhake is offline   Reply With Quote
Old 2019-06-03, 00:07   #44
TacticalCoder
 
May 2019

13 Posts
Default

Quote:
Originally Posted by axn View Post
Would that be faster than Shamir's trick?

Just quoting a random message talking about verification perfs...


For this kind of problem (LCS35 / CSAIL19) the speed of the verification method has not much importance if I'm not mistaken because you can always, as the second team did (the ones using a FPGA) parallelize as many verification as you want as soon as you have the "main" computation starting to give up intermediate results. For the main computation you want to be as fast as possible: so no verification at all (this also simplifies, for example, FPGA code/setup).


Once the main computation has computed up to, say, 32 billion (I'm just making up an example using multiple of 1 bn and 32 cores), you can start 32 computation on 32 cores on other machine(s), using whatever method you want, and have each core start at a different intermediate result (taken from the main computation). So you're effectively computing in parallel.


This works as long as your have a main computation giving you intermediate results.


Which means there's no need to implement any verification check at all in the "main" computation. And the cost of the verification is not very important: you can parallelize the second computation to as many cores as you want (so using say the verification Shamir suggested in the original LCS35 puzzle with a bigger prime ain't a problem).


The cryptophage team did something similar but, as far as I know, they didn't implement any verification at all: they just ran the same computation parallelized, on regular PCs (while the main one was running on a FPGA). But they could just as easily have added a check to the second computation.


This keeps the FPGA / ASIC implementation simple (and faster).


But then it's late and I may be way off!

Last fiddled with by TacticalCoder on 2019-06-03 at 00:16 Reason: grammar
TacticalCoder is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Crypto News Nick Tales From the Crypt(o) 52 2020-12-17 21:16
I hate this time of year davieddy Lounge 4 2009-10-18 04:39
Crypto 2007 R.D. Silverman Lounge 2 2007-08-08 20:24
crypto game MrHappy Lounge 0 2005-01-19 16:27
Is it time to change the CPU year measurement? E_tron Lounge 7 2004-06-29 10:17

All times are UTC. The time now is 14:32.


Fri Oct 22 14:32:32 UTC 2021 up 91 days, 9:01, 1 user, load averages: 1.45, 1.35, 1.32

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.