2014-07-12, 16:33 #1 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 1929₁₆ Posts
This year's small computer

After a certain degree of fuss (the only EU supplier who had it in stock was SECO in Italy, and more than three months elapsed between order and delivery), my nVidia Jetson TK1 board arrived yesterday. Cost £178.49 including delivery.

Take it out of the box, plug in an Ethernet cable, turn it on, ask your DHCP server which address it most recently handed out, and you can ssh into it without trouble; building GMP6 required explicitly giving armv7l in the system triplet (it was auto-detected as 'neon' which the rest of the configure didn't understand), but after about an hour it's running gmp-ecm. Not especially fast:

Code:
echo "(10^71-1)/9" | ecm -c 1 1e6

takes 8.5 seconds on one CPU (also 8.5 seconds on each CPU if you run four in parallel), so five times slower than one core of i7/4930K, but twice as fast as the ODROID Cortex-A9 that I had previously. Comparing with a more realistic competitor, one core of Avoton C2750 takes 4.7 seconds; obviously 64-bit arithmetic is really useful for ECM!

There is a quite capable GPU, but I haven't found a build of the nVidia tools for the Ubuntu-14.04 that is installed on the board; so I have it doing polynomial selection without involving the GPU.

Haven't got a spare power meter to work out how much electricity it uses yet; there is a fan, but it's pretty quiet.

I imagine next year's small computer will be an AMD Seattle, with 64-bit Cortex-A57 processors, and I wouldn't be amazed if that was more competitive with the Avoton; but nor would I be amazed if Intel brought out a 3GHz 16-core 14nm lots-of-Atoms SoC.
For a more serious problem (2340_736),

Code:
echo 18140989185283655973469449579704944039975362878755465578374551246406222559010534117329360835005049754874622232679931513368003677417541836205899311159219761141488825676850446673 | gmpecm/ecm-6.4.4/ecm -v -c 1 4e8
Step 1 took 7107196ms
Step 2 took 1387825ms

Last fiddled with by fivemack on 2014-07-12 at 16:40
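For scale, the two step times quoted for the B1=4e8 curve can be converted from milliseconds to hours. This is only a unit-conversion sketch in Python; the helper name `ms_to_hours` is made up for illustration, and the inputs are the two figures printed by `ecm -v` above.

```python
# Step times for the B1=4e8 curve on the Jetson TK1, in milliseconds,
# copied from the ecm -v output quoted in the post.
STEP1_MS = 7_107_196
STEP2_MS = 1_387_825


def ms_to_hours(ms):
    """Convert a duration in milliseconds to hours (hypothetical helper)."""
    return ms / 1000 / 3600


total_ms = STEP1_MS + STEP2_MS

print(f"step 1: {ms_to_hours(STEP1_MS):.2f} h")          # step 1: 1.97 h
print(f"step 2: {ms_to_hours(STEP2_MS):.2f} h")          # step 2: 0.39 h
print(f"total:  {ms_to_hours(total_ms):.2f} h per curve")  # total:  2.36 h per curve
```

So a single curve at this B1 takes a little under two and a half hours on one core of the board.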
 Originally Posted by fivemack After a certain degree of fuss (the only EU supplier who had it in stock was SECO in Italy, and more than three months elapsed between order and delivery), my nVidia Jetson TK1 board arrived yesterday. Cost £178.49 including delivery.
Playing with small computers myself.

Four Parallellas fitted with 16-core Epiphany co-processors arrived from Adapteva about 3 weeks ago. Not yet had chance to play with them, partly for lack of connecting gubbins (now largely sorted) but mostly because of Real Life™ issues. The four credit card machines, four US-pronged power supplies (one of the connecting gubbins problems here in the UK), four SD cards pre-loaded with Ubuntu, shipping, sundry baksheesh and a "free" T-shirt cost a total of £463.46 --- about $790 at the spot rate today.

Also got hold of a DE0-Nano FPGA dev kit at about 15% of the above cost. Cute little thing, also CC sized, with no processor at all --- unless you want to build one yourself. A 32-bit RISC processor is freely available and clocks in at 100 BogoMIPS --- roughly five times as fast as my first Linux box.

2014-07-12, 19:43 #3 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 1011100100000₂ Posts
I had a 32-bit binary around so I gave it a whirl:

Code:
GMP-ECM 6.2.3 [powered by GMP 4.2.1_MPIR_1.1.1] [ECM]
(10^71-1)/9
Input number is (10^71-1)/9 (71 digits)
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=3673851391
Step 1 took 2714ms
Step 2 took 3370ms

64-bit

Code:
GMP-ECM 6.4.2 [configured with MPIR 2.5.1] [ECM]
(10^71-1)/9
Input number is (10^71-1)/9 (71 digits)
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=1986288319
Step 1 took 2153ms
Step 2 took 2184ms

Code:
GMP-ECM 7.0-dev [configured with MPIR 2.6.0, --enable-openmp] [ECM]
(10^71-1)/9
Input number is (10^71-1)/9 (71 digits)
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=1:12988648
Step 1 took 1779ms
Step 2 took 2168ms

Around a 1.5x speedup for 64-bit but it is also an older ecm and mpir. This is on a Q6600.
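The "around 1.5x" figure can be checked from the step times in the three runs above. The sketch below is just that arithmetic in Python; the dictionary keys are labels chosen here, and treating the 7.0-dev run as a 64-bit build is an assumption based on where it appears in the post.

```python
# (step 1 ms, step 2 ms) copied from the three GMP-ECM runs quoted above.
# The labels are illustrative; the 7.0-dev build is assumed to be 64-bit.
runs = {
    "6.2.3 (32-bit)": (2714, 3370),
    "6.4.2 (64-bit)": (2153, 2184),
    "7.0-dev": (1779, 2168),
}


def total_ms(name):
    """Step 1 plus step 2 time for one run, in milliseconds."""
    step1, step2 = runs[name]
    return step1 + step2


baseline = total_ms("6.2.3 (32-bit)")
speedups = {name: baseline / total_ms(name) for name in runs}

for name in runs:
    print(f"{name:16s} {total_ms(name):5d} ms  {speedups[name]:.2f}x")
```

On total time this works out to roughly 1.40x for 6.4.2 and 1.54x for 7.0-dev relative to the 32-bit 6.2.3 binary. As the post says, the comparison mixes word size with newer ecm and MPIR versions, so the ratio is not a clean measure of 64-bit arithmetic alone.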
 2014-09-30, 00:06 #5 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 3×19×113 Posts In a moment of lunacy, I ran linear algebra on the Jetson board. The right options seem to be la_block=8192 la_superblock=98304 (the board has 32k-per-core L1 caches and a shared 2M L2 cache over four cores; la_superblock=196608 is about the same speed, 393216 is a good deal slower, la_block=16384 is a lot slower). It takes just under ten hours on four threads for a 1.98M matrix, compared to just over two hours for four threads on i7/4770. Not bad for a machine that fits under an iPad Mini.
 2014-09-30, 10:55 #7 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 1011100100000₂ Posts That's quite nice. Have you had a chance to measure power yet?
 2014-09-30, 12:48 #8 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 1100100101001₂ Posts Power metering is mostly telling me that USB3 external hard drives are much more serious power hogs than I'd anticipated: 25-30W while spinning up, 8W at idle. Will switch around the drives and the devboard tonight and see if I can get some better numbers.
2014-09-30, 19:20 #9 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 14451₈ Posts
Sitting at a command prompt: 3.5W
Running one thread of ECM: 7.0W
Running four threads of ECM: up to 13.7W in stage-1, up to 14.2W in stage-2
Reading the cycles for msieve (from a USB3 drive whose power is not being monitored): 4.0W
Making the matrix: 5.5W
Constructing packed rows: 3.8W
Actual four-thread LA phase: 13.0W
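Combining these wattages with the run times given earlier in the thread gives a rough energy budget. The Python sketch below makes several assumptions: "just under ten hours" for the 1.98M matrix is rounded to 10 h, the 8.5 s ECM curve time from the first post is assumed to apply to the runs that were metered, and the "up to" readings are peaks, so the four-thread result is an upper bound. The function names are made up for illustration.

```python
def energy_wh(watts, hours):
    """Energy in watt-hours for a constant power draw."""
    return watts * hours


def joules_per_curve(watts, seconds, curves_in_parallel):
    """Energy per ECM curve when several curves run side by side."""
    return watts * seconds / curves_in_parallel


# Four-thread LA phase: 13.0 W for roughly 10 hours (rounded, see above).
la_wh = energy_wh(13.0, 10.0)

# ECM at B1=1e6 on (10^71-1)/9: 8.5 s per curve on each core.
one_thread = joules_per_curve(7.0, 8.5, 1)
# Peak stage-2 reading used for the whole curve, so this is an upper bound.
four_threads = joules_per_curve(14.2, 8.5, 4)

print(f"LA phase: about {la_wh:.0f} Wh")
print(f"ECM, one thread:   about {one_thread:.1f} J per curve")
print(f"ECM, four threads: at most about {four_threads:.1f} J per curve")
```

Under those assumptions the LA run costs on the order of 0.13 kWh, and running four ECM threads roughly halves the energy per curve compared with one thread, since the 3.5W idle draw is shared across four curves.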
 2014-09-30, 21:18 #10 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 2⁵·5·37 Posts That looks like it is possibly slightly more power efficient than a modern PC (quite a bit better than my Q6600). It is a shame it doesn't have enough memory for large LA jobs or it would be a good machine to leave on a 6 month job.
