mersenneforum.org  

Old 2018-05-30, 23:22   #1
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

clLucas-specific reference material

Note: this material is kept only for historical purposes, or for use with hardware that can't run gpuowl but can run clLucas. Gpuowl is about twice as fast as clLucas on the same hardware and has other advantages, including superior error detection and superior gpu device support. clLucas is no longer being maintained, while gpuowl is frequently updated.
See the gpuowl reference thread at https://www.mersenneforum.org/showthread.php?t=23386
and the Available Software summary at http://www.mersenneforum.org/showpos...91&postcount=2

This thread is intended to hold only reference material specifically for clLucas, the OpenCL-based Lucas-Lehmer test program (not to be confused with CUDALucas).
(Suggestions are welcome. Discussion posts in this thread are not encouraged; please use the reference material discussion thread at http://www.mersenneforum.org/showthread.php?t=23383. Off-topic posts may be moved or removed to keep the reference threads clean, tidy, and useful.)

Table of contents
  1. This post
  2. Run time versus exponent or fft length for the RX550 of clLucas 1.04 https://www.mersenneforum.org/showpo...71&postcount=2
  3. clLucas bug and wish list https://www.mersenneforum.org/showpo...72&postcount=3
  4. Getting started with clLucas https://www.mersenneforum.org/showpo...73&postcount=4
  5. cllucas v1.04 -h help output https://www.mersenneforum.org/showpo...00&postcount=5
  6. Interim file sizes https://www.mersenneforum.org/showpo...55&postcount=6
  7. etc tbd

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-11-15 at 16:12 Reason: added interim file sizes
Old 2018-05-31, 02:03   #2
kriesel
 
Run time versus exponent or fft length for the RX550 of clLucas 1.04

Learning how to run and benchmark clLucas took a while, since its user interface differs in many details from those of CUDALucas and CUDAPm1, which were already familiar to me. The benchmarking convenience features of the CUDA code were not fully carried over to the clLucas code, so I supplemented them with a set of Windows batch files.

While fft benchmarking clLucas, I found its timings were not very reproducible. So I put the timings for the various settings choices and fft lengths into a large spreadsheet, and reran the cases with the shorter timings, iteratively, as the per-fft-length minimum moved among the parameter choices.
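
The spreadsheet bookkeeping described above can be approximated in a few lines of Python. This is only a sketch, not the spreadsheet or batch files actually used: the function name summarize_timings and the record layout (fft length, threads, sixstep flag, msec) are assumptions made for illustration. It keeps the fastest timing seen for each fft length, which tolerates non-reproducible reruns, and also reports the ratio of slowest to fastest setting.

```python
from collections import defaultdict


def summarize_timings(records):
    """Reduce benchmark records to one summary row per fft length.

    records: iterable of (fft_length, threads, sixstep, msec) tuples
             (hypothetical layout, one tuple per benchmark line, reruns included).
    Returns {fft_length: (best_msec, (threads, sixstep), max_over_min)}.
    """
    by_length = defaultdict(list)
    for fft_length, threads, sixstep, msec in records:
        by_length[fft_length].append((msec, threads, sixstep))

    summary = {}
    for fft_length, rows in sorted(by_length.items()):
        # Tuples sort by msec first, so min() picks the fastest setting.
        best_msec, threads, sixstep = min(rows)
        worst_msec = max(rows)[0]
        summary[fft_length] = (best_msec, (threads, sixstep), worst_msec / best_msec)
    return summary
```

Appending the records from each rerun and calling summarize_timings again gives the updated per-fft-length minimum without manual re-sorting.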

After this extensive benchmarking of the various fft lengths, thread counts, and sixstepfft choices, I ran a single double-check using the 3670016 (3584K) fft length. I obtained a per-iteration timing of 18.41 msec, so the fft benchmark output of clLucas understates the time to do an actual full iteration by a factor of 18.41/9.36 ≈ 1.97.
For comparison, gpuOwL v1.9-74f1a38 4M -legacy took 10.88 msec on the same gpu.
Non-power-of-two fft lengths in clLucas were plentiful, but many did not provide speed advantages over its power-of-two lengths, and none provided speed advantages over gpuOwL's small set of power-of-two lengths in their useful ranges. clLucas offers larger fft lengths than gpuOwL, so it can run exponents gpuOwL does not currently support.

I sliced and diced the clLucas benchmark a few different ways in plots.
The first attachment shows all threads and sixstep choices plotted together, above 1M fft length.
The second shows the per-fft-minimum timings versus fft length.
The third shows the ratio for each fft length of max timing option / min timing option.
The fourth divides the fft timing by the fft length in K, which flattens the plot.
Note that all clLucas values there are over 2 microseconds per K of fft length in the fft benchmark; roughly double that to get the actual iteration timing scale of about 4 microseconds per K. The power-of-two ffts are the low points there. For comparison, gpuOwL's iteration timings are 5.01 msec/2048K = 2.45 microseconds/K; 10.88 msec/4096K = 2.66 microseconds/K; and 21.26 msec/8192K = 2.60 microseconds/K.
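
The per-K normalization and the benchmark-versus-iteration ratio can be rechecked with a short Python sketch. The helper name usec_per_k is an assumption for illustration; the numbers are the ones quoted in this post.

```python
def usec_per_k(msec_per_iteration, fft_length_k):
    """Convert msec per iteration into microseconds per K of fft length."""
    return msec_per_iteration * 1000.0 / fft_length_k


# gpuOwL figures quoted in this post: (fft length in K, msec per iteration)
gpuowl_timings = [(2048, 5.01), (4096, 10.88), (8192, 21.26)]
for fft_k, msec in gpuowl_timings:
    print(f"gpuOwL {fft_k}K: {usec_per_k(msec, fft_k):.2f} microseconds/K")

# clLucas at 3584K: fft benchmark figure versus measured full iteration
benchmark_msec = 9.36
iteration_msec = 18.41
print(f"iteration/benchmark ratio: {iteration_msec / benchmark_msec:.2f}")
print(f"clLucas 3584K iteration: {usec_per_k(iteration_msec, 3584):.2f} microseconds/K")
```

The same helper applied to the fft benchmark column reproduces the vertical axis of the fourth plot.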


Attached Thumbnails
cllucas fft timings threads and sixstep above 1M.png (127.2 KB)
fft timings plot 1024 to 1024x65536.png (71.6 KB)
ratio of fft timings per fft length 1k and up.png (78.4 KB)
msec per k optimized per fft length.png (48.1 KB)

Last fiddled with by kriesel on 2019-11-18 at 14:28
Old 2018-05-31, 02:09   #3
kriesel
 
clLucas bug and wish list

The attachment contains some observations made while getting familiar with clLucas 1.04 for Windows. As always, it is shared in appreciation of the efforts of the code author and those who helped in the early development and testing. Where applicable I've included pointers to thread posts. Please PM me with any additions or corrections. This particular list is based on less usage than the lists I've made for other software (partly because I only recently acquired AMD gpus).


Attached Files
File Type: pdf cllucas bug and wish list table.pdf (146.6 KB, 356 views)

Last fiddled with by kriesel on 2019-11-18 at 14:28
Old 2018-05-31, 02:32   #4
kriesel
 
Getting started with clLucas

It was necessary to build some batch files to benchmark clLucas. Extending them provides a guided sequence for setting up and tuning clLucas after the program files have been placed in a working directory.

Unzip the attachment and read the batch files before proceeding.
The main batch file is cllstart.
It will prompt for actions and wait.
Ctrl-C will stop it.
Use at your own risk.


Attached Files
File Type: zip cllucas-batchfiles.zip (8.2 KB, 319 views)

Last fiddled with by kriesel on 2019-11-18 at 14:29
Old 2019-08-12, 16:32   #5
kriesel
 
cllucas v1.04 -h help output

Note: when standard output is redirected to a file, only the Platform line(s) are redirected; the rest is apparently sent to stderr.

Code:
Platform 0 : Advanced Micro Devices, Inc.
$ clLucas -h|-v

$ clLucas [-d device_number] [-info] [-sixstepfft] [-i inifile] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-polite iteration] [-k] exponent|input_filename

$ clLucas [-d device_number] [-info] [-sixstepfft] [-i inifile] [-polite iteration] -r

$ clLucas [-d device_number] [-info] [-sixstepfft] -clfftbench start end distance

                       -h          print this help message
                       -v          print version number
                       -info       print device information
                       -sixstepfft use Six Step FFT
                       -i          set .ini file name (default = "clLucas.ini")
                       -f          set fft length (if round off error then exit)
                       -s          save all checkpoint files
                       -t          check round off error all iterations
                       -polite     GPU is polite every n iterations (default -polite 1) (-polite 0 = GPU aggressive)
                       -clfftbench exec clFFT benchmark (Ex. $ ./clLucas -d 1 -clfftbench 1048576 8388608 1048576 )
                       -r          exec residue test.
                       -k          enable keys (p change -polite, t disable -t, s change -s)
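
Regarding the redirection note at the top of this post: one way to capture the whole help text is to merge stderr into stdout. At a command prompt, `clLucas -h > help.txt 2>&1` should do that in both cmd.exe and POSIX shells. The Python sketch below does the same thing programmatically. The function name capture_all_output is an assumption for illustration, and the claim that the remaining text goes to stderr is the observation from this post, not something verified against the clLucas source.

```python
import subprocess


def capture_all_output(command):
    """Run command and return its stdout and stderr merged into one string."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same pipe as stdout
        text=True,
        check=False,  # do not assume a help request exits with status 0
    )
    return result.stdout
```

For example, capture_all_output(["clLucas", "-h"]) would return the Platform line(s) together with the usage text, assuming a clLucas executable can be found on the PATH or in the working directory.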


Last fiddled with by kriesel on 2019-11-18 at 14:29
Old 2021-11-15, 15:43   #6
kriesel
 
Interim file sizes

Based on a brief test on a large exponent, 1143276383, requiring a 64Mi fft length, the size of a p or q file is 536870928 bytes, which is fft_length x 8 bytes + 16 bytes. So a p file and q file pair for that exponent together occupy just over 1 GiB.
clLucas appears to store the interim residue in double-precision floating-point format. Some other programs store a much more compact packed binary representation whose size is independent of fft_length.
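
The arithmetic can be checked with a short Python sketch. The function names are made up for illustration. The 8 bytes per fft element and the 16 extra bytes come from the observation above; the packed-size estimate of one bit per bit of the exponent, with any header ignored, is an assumption about what a compact format might need and not a description of any particular program's file format.

```python
BYTES_PER_ELEMENT = 8   # one double-precision value per fft element (observed)
EXTRA_BYTES = 16        # fixed overhead observed in each p or q file


def interim_file_bytes(fft_length):
    """Size of one clLucas p or q file, per the observation in this post."""
    return fft_length * BYTES_PER_ELEMENT + EXTRA_BYTES


def packed_residue_bytes(exponent):
    """Assumed lower bound for a packed residue: ceil(exponent / 8) bytes."""
    return (exponent + 7) // 8


fft_length = 64 * 1024 * 1024   # 64Mi
exponent = 1143276383

one_file = interim_file_bytes(fft_length)
print("one p or q file:", one_file, "bytes")          # 536870928
print("p + q pair:     ", 2 * one_file, "bytes")      # 1073741856, just over 1 GiB
print("packed estimate:", packed_residue_bytes(exponent), "bytes")
```

Under that assumption, a packed residue for this exponent would be roughly a quarter the size of a single p or q file.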



Last fiddled with by kriesel on 2021-11-15 at 16:11

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
