mersenneforum.org  

Old 2022-04-04, 19:55   #122
rogue
 
"Mark"
Apr 2003
Between here and the

2²·1,663 Posts

Now that I finally have an Apple M1, I am starting the process of getting mtsieve to build on it. So far I have only worked on mfsieve, the CPU-only multi-factorial siever.

For a range that I have been sieving, a single core on my 10-core i9 iMac is less than half as fast as a single core on the 8-core M1 MacBook Pro. I was not expecting the M1 to be that much faster.

On the downside, the GPU code in mfsievecl on that i9 is at least 50x faster than the CPU code on that same i9.

In order to compare "apples to apples" in the GPU I will need to use Metal instead of OpenCL for the GPU kernel. That will be a bit more work since I have not worked with Metal. I don't think that Metal will be too terrible to work with, but to have it build with Metal on OS X and OpenCL on other platforms could require a lot of effort and it adds yet another code path to support.

If anyone has an M1 and wants to do some sieving (CPU only), send me a PM and I will make it a priority to get that program running on OS X.

My first focus will probably be to get as many running on the CPU as I can before tackling Metal as that is mostly busy work. I think that some of the sieving programs require AVX or x86 ASM for the main loop, so those are going to have lower priority.

My goal is "no ARM ASM" for the M1 ports. I see nothing which makes that impossible.
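
To illustrate why avoiding ARM assembly looks feasible, here is a minimal sketch of a modular multiply written with the compiler's 128-bit integer type instead of inline asm. It is not taken from mtsieve: the names `mulmod` and `powmod` are hypothetical, and `unsigned __int128` is a GCC/Clang extension that is assumed to be available on both x86-64 and ARM64 targets.

```cpp
#include <cassert>
#include <cstdint>

// Computes (a * b) mod p without overflow by widening the product to 128 bits.
// unsigned __int128 is a GCC/Clang extension, not standard C++.
inline uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    assert(p != 0);
    return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) % p);
}

// Computes base^exp mod p by square-and-multiply, built on mulmod.
inline uint64_t powmod(uint64_t base, uint64_t exp, uint64_t p)
{
    uint64_t result = 1 % p;
    base %= p;
    while (exp > 0) {
        if (exp & 1) result = mulmod(result, base, p);
        base = mulmod(base, base, p);
        exp >>= 1;
    }
    return result;
}
```

The same source compiles unchanged for an i9 or an M1. Whether it matches the speed of hand-written x86 assembly would have to be measured.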

Of course if anyone wants to pitch in with the effort, send me a PM.

There appears to be a bug in Apple's OpenCL driver. The multi-sequence kernel used by srsieve2cl crashes immediately, but works fine on other platforms. Apple won't fix it since they want everyone to use Metal. I haven't had issues with any other kernels on OS X, so the problem could reside between the keyboard and the chair. I have no idea how easy or difficult it could be to fix that issue if I wanted to get it to work.
Old 2022-04-08, 18:50   #123
rogue
 

Apparently the OpenCL framework exists on the M1, as I can compile and link programs with it. But the kernels do not work correctly, and not in a predictable way, at least not that I have been able to figure out yet. I could submit a bug report to Apple, but I would not expect Apple to fix it. I suspect it is only there for backward compatibility, and if your code doesn't work, then you have to switch to Metal. If I get the same bad results with Metal, then I will submit a bug report to Apple.
Old 2022-04-19, 18:58   #124
rogue
 

Support of Metal is requiring changes to the framework. The areas affected the most are the makefile, the GPU kernel code, how the GPU workers create the kernels, and the GpuWorker classes.

It appears that the Metal kernel and OpenCL kernel have few differences. I should be able to write a single kernel that can be compiled for both Metal and OpenCL. This means that updating kernels to support both should be very easy.

I have modified the makefile so that it can convert the kernel source into a header that can be included by the GpuWorker. This was a manual process previously.
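
The conversion step can be pictured with the following sketch. It is an assumption about what such a step might look like, not the makefile's actual mechanism: each line of kernel source is wrapped in a quoted string so that the result can be included as a single constant. The function names and the variable name passed in are hypothetical.

```cpp
#include <sstream>
#include <string>

// Escapes one line of kernel source so it can sit inside a C string literal.
// Backslashes and double quotes get a leading backslash; carriage returns are dropped.
static std::string escapeForLiteral(const std::string &line)
{
    std::string out;
    for (char c : line) {
        if (c == '\\' || c == '"') out += '\\';
        if (c != '\r') out += c;
    }
    return out;
}

// Returns the text of a header that defines one string constant holding the
// kernel source. Adjacent string literals are concatenated by the compiler,
// so each source line becomes its own quoted line ending in an escaped newline.
std::string kernelToHeader(const std::string &source, const std::string &name)
{
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;

    out << "const char *" << name << " =\n";
    while (std::getline(in, line))
        out << "\"" << escapeForLiteral(line) << "\\n\"\n";
    out << "\"\";\n";   // trailing empty literal keeps the header valid for empty input
    return out.str();
}
```

A small driver would read the kernel file, call `kernelToHeader`, and write the result next to the GpuWorker source, which then includes it and hands the string to the GPU API at runtime.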

The makefile also has the ability to create a metallib file on OS X. The application does not use that library, but the process to create that library is a quick way for me to identify syntax bugs in the kernel that I otherwise would only discover at runtime.

The KernelArgument class is gone. This is due to how Metal manages them as the M1 shares memory between the GPU and CPU. This means that the Kernel class has new methods to add arguments and is completely responsible for managing CPU and GPU memory needed for the GpuWorkers. The key is that the GpuWorkers are mostly "agnostic" regarding OpenCL or Metal.
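
A rough sketch of that split of responsibilities follows. Everything in it is hypothetical: the method names `AddArgument` and `Execute`, the `HostKernel` stand-in, and the toy doubling "kernel" are invented for illustration and are not the real mtsieve classes. The point is only that the worker asks the kernel for argument memory and never touches OpenCL or Metal types itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: a worker sees only this, never a GPU API type.
class Kernel {
public:
    virtual ~Kernel() = default;

    // Allocates a zeroed argument buffer owned by the kernel and returns a
    // pointer the worker may read and write on the CPU side.
    virtual uint64_t *AddArgument(const std::string &name, size_t count) = 0;

    // Runs the kernel over workSize items.
    virtual void Execute(size_t workSize) = 0;
};

// Stand-in backend that runs on the CPU so the sketch can be exercised.
// An OpenCL or Metal backend would implement the same two methods.
class HostKernel : public Kernel {
public:
    uint64_t *AddArgument(const std::string &name, size_t count) override
    {
        names_.push_back(name);
        sizes_.push_back(count);
        buffers_.push_back(std::unique_ptr<uint64_t[]>(new uint64_t[count]()));
        return buffers_.back().get();
    }

    void Execute(size_t workSize) override
    {
        if (buffers_.empty()) return;
        // Toy "kernel": doubles each element of the first argument.
        size_t n = std::min(workSize, sizes_[0]);
        for (size_t i = 0; i < n; i++) buffers_[0][i] *= 2;
    }

private:
    std::vector<std::string> names_;
    std::vector<size_t> sizes_;
    std::vector<std::unique_ptr<uint64_t[]>> buffers_;
};
```

With shared memory, as on the M1, the pointer returned to the worker could be the very memory the GPU reads. On a discrete GPU the backend would instead copy to and from device memory around `Execute`. Either way the worker code stays the same.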

In short, lots of interesting things, but I haven't tested anything yet. My biggest fear is that I cannot use Metal in the way that I think I can use it. That is for later this week. If I can get mfsieve to build and run with both OpenCL and Metal (with the correct results), then migrating the other GPU sievers to support both should be fairly easy. My second biggest fear is getting incorrect results and trying to figure out the root cause.
Old 2022-04-21, 21:22   #125
rogue
 

Well, this is annoying. I discovered that g++ in msys2 produces a buggy version of xyyxsieve when compiled with -O3. It works fine with -O2, and -O3 only impacts the AVX code. I will modify the makefile so that Windows uses -O2 to compile some of the xyyx source files. Fortunately it only affects two source files, but it is very annoying. gcwsieve uses the same AVX routines but has no problems with -O3, so this is definitely a compiler bug. Since I am in the middle of refactoring a lot of code, I cannot submit a bug report at this time.

And yes, I updated to the latest g++ in msys2. This requires a few other changes to my code to be compliant with the newer compiler. Fortunately that isn't too painful to change.

At this time my focus is to get everything built on Windows and OS X (x86) using OpenCL, then commit. I have had to do a lot of refactoring to get as far as I have. Fortunately, refactoring each program that uses OpenCL takes about an hour, assuming I don't try to bite off too much by doing more.

I do realize that porting some of these programs to ARM will be a lot more work, and I will not be porting some of them because of x86 routines that are specific to those sieves. pixsieve and afsieve are two examples. Fortunately those are not widely used, so they can wait. It is more likely that I can remove the x86 code from them completely without losing speed, but that remains to be seen.
Old 2022-05-04, 18:07   #126
rogue
 

I am close to finishing the first set of changes. Most of these changes support the refactoring of the GPU logic to support both OpenCL and Metal abstractly. In other words, the Worker won't know if the underlying kernel is running in OpenCL or Metal.

The only sieve that is broken right now is the multi-sequence GPU sieve for srsieve2cl. The single sequence one works fine. I did find a slowdown in the framework with the GPU kernel for single sequence sieving in srsieve2cl. I've added a command line switch for that kernel that can improve the speed by 50% over the previous build. This same change will probably benefit gfndsievecl, but it will have to wait.

Some of the sieves (not GPU) will compile and run on M1 out of the box since they don't rely on x86 asm. More will have such support (not GPU) with the upcoming release.

Once the current set of changes is working, I will commit all of my changes and post new Windows builds. Then comes the next fun part of the Metal support that started this whole thing.
Old 2022-06-06, 10:33   #127
twobombs
 
Jun 2022

2 Posts

Sorry to break into this thread; I found it via Google. I am looking for a setting in srsieve2cl to generate primes in the range from 10^19 and beyond. This is to generate a feed for a quantum algorithm called Shor's. I can go to 10^18, but beyond that I get a range error (see attachment). I must be doing something wrong, right? :)
I need ranges from 56/64 bits (fp64) all the way to 4096 bits (BigInt range).
Attachment: Screenshot 2022-06-06 at 12.30.12.png
Old 2022-06-13, 15:06   #128
rogue
 

Quote:
Originally Posted by twobombs View Post
Sorry to break into this thread; I found it via Google. I am looking for a setting in srsieve2cl to generate primes in the range from 10^19 and beyond. This is to generate a feed for a quantum algorithm called Shor's. I can go to 10^18, but beyond that I get a range error (see attachment). I must be doing something wrong, right? :)
I need ranges from 56/64 bits (fp64) all the way to 4096 bits (BigInt range).
Sorry, but I didn't see this until now.

srsieve2cl is limited to 2^62. Technically I could probably raise that to 2^63, but nobody is sieving that deeply, so the mtsieve framework has no support for p > 2^63.
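
The arithmetic behind the range error can be checked at compile time. This sketch takes the requested start of the range to be 10^19 and treats the 2^62 limit as inclusive, which is an assumption. The constant and function names are hypothetical and not part of mtsieve.

```cpp
#include <cstdint>

// 2^62 = 4611686018427387904, the srsieve2cl limit stated above.
constexpr uint64_t kSieveLimit = UINT64_C(1) << 62;
constexpr uint64_t kTenPow18   = UINT64_C(1000000000000000000);
constexpr uint64_t kTenPow19   = UINT64_C(10000000000000000000);

// True when p does not exceed the limit.
constexpr bool withinLimit(uint64_t p) { return p <= kSieveLimit; }

static_assert(withinLimit(kTenPow18),  "10^18 is below 2^62");
static_assert(!withinLimit(kTenPow19), "10^19 is above 2^62");
static_assert(kTenPow19 > (UINT64_C(1) << 63), "10^19 is also above 2^63");
```

So 10^18 fits with room to spare, while 10^19 is past both 2^62 and 2^63, which is consistent with the error appearing between the two.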

Last fiddled with by rogue on 2022-06-13 at 15:07
Old 2022-06-13, 15:15   #129
rogue
 

Before I get to Metal support I decided to make more changes. Kim Walisch (the creator of primesieve) gave me a few hints on how to use his library more efficiently.

Along with his changes I am making another big change. With this change the framework will adjust the CPU worksize "on the fly" in an effort to ensure that each "chunk" of work needs between 1 and 5 seconds to process. This adjustment will only occur after p > 1e5. This provides two benefits. The first is that ^C will terminate CPU workers more quickly, so if you have an overly large worksize it will terminate within 5 seconds rather than you having to wait much longer. The second is that if you have a lot of workers, this should do a better job of ensuring that all workers have work. For those of you with 32 cores, I will be curious to see if the next release will do better at using all of those cores.
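
One way such an adjustment could work is sketched below. Only the 1 to 5 second window and the p > 1e5 threshold come from the description above. The doubling and halving policy, the cap, and all names are assumptions made for illustration, not the framework's actual logic.

```cpp
#include <cstdint>

const uint64_t kMinPForAdjust   = 100000;             // 1e5: no adjustment at or below this p
const double   kMinChunkSeconds = 1.0;                // lower edge of the target window
const double   kMaxChunkSeconds = 5.0;                // upper edge of the target window
const uint64_t kMaxWorkSize     = UINT64_C(1) << 40;  // arbitrary cap (assumption)

// Returns the work size to use for the next chunk, given how long the
// previous chunk of workSize primes took and the current value of p.
uint64_t adjustWorkSize(uint64_t workSize, double lastChunkSeconds, uint64_t currentP)
{
    if (currentP <= kMinPForAdjust)
        return workSize;

    // Chunk finished too quickly: give the worker more primes next time.
    if (lastChunkSeconds < kMinChunkSeconds && workSize <= kMaxWorkSize / 2)
        return workSize * 2;

    // Chunk took too long: shrink it so that ^C is honored sooner.
    if (lastChunkSeconds > kMaxChunkSeconds && workSize > 1)
        return workSize / 2;

    return workSize;
}
```

Called once per completed chunk, this converges on a chunk size inside the window after a few iterations. A proportional adjustment would converge faster, at the cost of reacting more strongly to a single noisy measurement.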
Old 2022-06-13, 19:16   #130
rogue
 

I have committed my changes, but have not officially released yet. I have done some testing, but not a lot. I have only committed the code so that ryanp and others who build on linux can run tests on their environments.
Old 2022-06-13, 20:27   #131
ryanp
 
Jun 2012
Boulder, CO

2×199 Posts

Quote:
Originally Posted by rogue View Post
I have committed my changes, but have not officially released yet. I have done some testing, but not a lot. I have only committed the code so that ryanp and others who build on linux can run tests on their environments.
Updated to r192. I am getting this; it looks like probably a simple fix to use bool instead?

Code:
$ make -j 16 srsieve2
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o core/App_cpu.o core/App.cpp 
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o core/Worker_cpu.o core/Worker.cpp 
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/GenericWorker_cpu.o sierpinski_riesel/GenericWorker.cpp 
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/CisOneWithOneSequenceWorker_cpu.o sierpinski_riesel/CisOneWithOneSequenceWorker.cpp 
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/CisOneWithMultipleSequencesWorker_cpu.o sierpinski_riesel/CisOneWithMultipleSequencesWorker.cpp 
core/App.cpp: In member function ‘void App::Sieve()’:
core/App.cpp:489:7: error: ‘boolean’ was not declared in this scope; did you mean ‘bool’?
  489 |       boolean gotNewWork = false;
      |       ^~~~~~~
      |       bool
core/App.cpp:514:13: error: ‘gotNewWork’ was not declared in this scope
  514 |             gotNewWork = true;
      |             ^~~~~~~~~~
core/App.cpp:519:12: error: ‘gotNewWork’ was not declared in this scope
  519 |       if (!gotNewWork)
      |            ^~~~~~~~~~
make: *** [makefile:237: core/App_cpu.o] Error 1
Old 2022-06-13, 20:32   #132
rogue
 

Quote:
Originally Posted by ryanp View Post
Updated to r192. I am getting this; it looks like probably a simple fix to use bool instead?

Fixed. I don't know why it compiles for me. boolean is not a standard C++ type, so one of the headers on my system must be defining it.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
