mersenneforum.org  

Old 2023-09-07, 01:46   #45
R.D. Silverman
 

Quote:
Originally Posted by axn View Post
IIUC, 64x64=128 (and 128/64=(64,64)) can only be done in x64 mode, so this whole thread is a non-starter.
Strangely enough the _udiv128 intrinsic is available in x86 mode.
Old 2023-09-07, 02:46   #46
retina

Quote:
Originally Posted by R.D. Silverman View Post
(3) The mov instruction seems to be mov <destination> <src> but Intel's opcode documentation
has the arguments reversed.
Intel syntax is 100% dst, src. AT&T is src, dst.

AT&T syntax is awful IMO. MASM syntax is less bad (but still awful). Intel syntax is better (i.e. usable).
Old 2023-09-07, 03:51   #47
axn
 

Quote:
Originally Posted by R.D. Silverman View Post
Strangely enough the _udiv128 intrinsic is available in x86 mode.
By x86 mode, I assume it is generating a 32-bit executable. That would mean it can't possibly execute as a native instruction in x86 mode, since the required registers (RAX, RDX) are not available.

According to https://learn.microsoft.com/en-us/cp...?view=msvc-170, this is available in x64. Might the compiler be generating a code sequence to emulate the behavior?

BTW, if you have to do multiple divisions with the same divisor, and this is performance critical, stay away from the DIV instruction - it is _slow_.
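
One common way to avoid repeated DIV instructions when the divisor is fixed is to precompute a scaled reciprocal once and then replace each division with a multiply and a shift. The following is only a minimal sketch, not code from this thread: the names `magic_u32` and `fast_div_u32` are hypothetical, it is restricted to 32-bit operands with a divisor of at least 2, and it assumes either MSVC targeting x64 (for `__umulh`) or a GCC/Clang target that provides `unsigned __int128`.

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#endif

/* Precompute ceil(2^64 / d) once per divisor.
   Requires d >= 2: for d == 1 the value would wrap around to 0. */
static inline uint64_t magic_u32(uint32_t d)
{
    assert(d >= 2);
    return UINT64_MAX / d + 1;
}

/* Return n / d, where magic == magic_u32(d). The quotient is the high
   64 bits of the 128-bit product magic * n, so no DIV is executed here. */
static inline uint32_t fast_div_u32(uint32_t n, uint64_t magic)
{
#if defined(_MSC_VER) && defined(_M_X64)
    return (uint32_t)__umulh(magic, n);
#else
    return (uint32_t)(((unsigned __int128)magic * n) >> 64);
#endif
}
```

The magic value would be computed once outside the hot loop and reused for every dividend. Wider operands need a more careful choice of reciprocal and a correction step, which this sketch does not attempt.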
Old 2023-09-07, 04:35   #48
R.D. Silverman
 

We are in agreement.
Old 2023-09-08, 13:07   #49
jasonp

Note that the gcc assembler toolchain has supported Intel syntax for many years (example).

RDS: In Gladman's assembly, PROC and ENDPROC are not instructions but hints to the assembler that guide what section of an object file the generated assembly is linked into. You can probably get MSVC to generate assembly language and look at the similar directives that it uses. Unfortunately, this doesn't get you out of knowing the parameter-passing and stack-handling conventions in x86 and x64. In particular, Gladman's code doesn't need to do any pushes and pops because the x64 calling conventions use registers for the first few input parameters and treat rax/rdx as volatile across calls. There's no stack handling because the function doesn't need a frame pointer, which is good because the stack setup conventions for x64 are very painful, and you have to use them to make debuggers work on x64.

Last fiddled with by jasonp on 2023-09-08 at 13:22
Old 2023-09-08, 15:36   #50
R.D. Silverman
 

Terrific. I assume that if I want to make additional space on the stack I push the frame pointer (to save it) and
increment the stack pointer by the appropriate amount. Then do the reverse on exit. Is there anything else
that needs to be done? When using MASM it takes care of managing stack space for you.

I found 'Microsoft Learn': https://learn.microsoft.com/en-us/cp...?view=msvc-170

It has been helpful.
Old 2023-09-08, 16:16   #51
R.D. Silverman
 

One thing that I don't see is a discussion of pointer sizes used by the compiler. I presume that in x64 the compiler
does use the entire address space; that 4-byte pointers are inadequate and that it uses 8 bytes, including for void*.
Up to 4 integer inputs are passed in registers by the compiler, as you indicated. Are pointers passed the
same way?

I also guess that if passing ints rather than __int64s, one could pack two of them together and then unpack them
inside the .asm, allowing 8 params to be passed and avoiding stack handling.
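
The packing idea can be sketched on the C side as follows. This is an illustration only: the helper names `pack_i32_pair`, `unpack_lo` and `unpack_hi` are hypothetical, and the unpacking is shown in C rather than in the .asm routine, where it would amount to a 32-bit register move and a 32-bit right shift.

```c
#include <stdint.h>

/* Pack two signed 32-bit values into one 64-bit word:
   'lo' occupies bits 0..31 and 'hi' occupies bits 32..63. */
static inline uint64_t pack_i32_pair(int32_t lo, int32_t hi)
{
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

/* Recover the low half. The conversion back to int32_t assumes the usual
   two's-complement behaviour of mainstream compilers. */
static inline int32_t unpack_lo(uint64_t packed)
{
    return (int32_t)(uint32_t)packed;
}

/* Recover the high half (same two's-complement assumption). */
static inline int32_t unpack_hi(uint64_t packed)
{
    return (int32_t)(uint32_t)(packed >> 32);
}
```

The casts through `uint32_t` matter: without them a negative `lo` would be sign-extended and overwrite the upper half of the packed word.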

I have a copy of Dunne's book on Windows 64-bit ASM, but it doesn't say a lot about conventions used
by different compilers. The thing I hate most about this is that .asm's are so <expletive deleted>
non-portable. If I write a routine for Windows and want to port it to Linux, it must be completely
rewritten. gcc syntax is totally different from Microsoft's ML64. There is much weirdness. The x64
kernel library is named kernel32.lib, for example.

Dunne's book assumes the use of VS 2017. I assume that calling conventions are the same for VS 2019
and VS 2022. I have not installed VS 2022 yet. I'm still using VS 2019. I don't want to introduce
another potential source of trouble.

I have decided to take the plunge and do a full conversion of my NFS code to x64. However, to make things
easier, much of my .asm code can be replaced by C with the additional use of _umul128 and _udiv128.
I have looked for an intrinsic that does multiply and add, but can't find one. There are others that would be useful
as well if they exist, e.g. add with carry, sign extension of an __int64 to 128 bits using the rdx:rax register pair,
and 64-bit shifts using rdx:rax (one can do sign extension if 64-bit arithmetic shifts are available, etc.). I will keep
looking.
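
In the absence of a single multiply-and-add intrinsic, one possible approach is to build it from a widening multiply followed by an add with manual carry propagation. The sketch below is an assumption about what such a helper could look like, not code from this thread: `mul_add_u64` is a hypothetical name, the MSVC branch relies only on `_umul128`, and the other branch assumes a GCC/Clang target with `unsigned __int128`.

```c
#include <stdint.h>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#endif

/* Compute a * b + c as a full 128-bit value. Returns the low 64 bits and
   stores the high 64 bits in *hi. The sum cannot overflow 128 bits because
   (2^64 - 1)^2 + (2^64 - 1) < 2^128. */
static inline uint64_t mul_add_u64(uint64_t a, uint64_t b, uint64_t c,
                                   uint64_t *hi)
{
#if defined(_MSC_VER) && defined(_M_X64)
    uint64_t high;
    uint64_t low = _umul128(a, b, &high);
    low += c;
    high += (low < c);   /* carry out of the low word */
    *hi = high;
    return low;
#else
    unsigned __int128 t = (unsigned __int128)a * b + c;
    *hi = (uint64_t)(t >> 64);
    return (uint64_t)t;
#endif
}
```

For the add-with-carry case specifically, MSVC also provides `_addcarry_u64`, which could replace the manual carry test above.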

etc. etc.

Hey! I'm retired. It will keep me busy for a while. I might also redo my BL code using AVX.... another learning curve to climb....
Old 2023-09-08, 23:18   #52
retina

Quote:
Originally Posted by R.D. Silverman View Post
One thing that I don't see is a discussion of pointer sizes used by the compiler. I presume that in x64 the compiler
does use the entire address space; that 4-byte pointers are inadequate and that it uses 8 bytes, including for void*.
Both 64-bit and 32-bit pointers can be used in 64-bit code.

There are compiler settings to set which one to use.

32-bit pointers require the OS to cooperate when allocating memory, to keep all addresses below 4G. Windows and Linux support 32-bit pointers in 64-bit mode. Other OSes vary in their support.
Old 2023-09-20, 21:38   #53
Ken_g6
 

Quote:
Originally Posted by R.D. Silverman View Post
I have a copy of Dunne's book on Windows 64-bit ASM, but it doesn't say a lot about conventions used
by different compilers. The thing I hate most about this is that .asm's are so <expletive deleted>
non-portable. If I write a routine for Windows and want to port it to Linux, it must be completely
rewritten. gcc syntax is totally different from Microsoft's ML64. There is much weirdness. The x64
kernel library is named kernel32.lib, for example.
Yep, it's kinda upside-down and backwards. When I wrote my sieves I wrote everything in Linux and used a GCC compiler on/for Windows. Except for the CUDA sieve where I couldn't, so I wrote a 64-bit multiply for Win32 in C. Slow, but it wasn't on the GPU, so it didn't matter.
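
For readers wondering what a wide multiply written in plain C might look like, here is a minimal sketch that forms the 128-bit product of two 64-bit values from four 32x32 partial products. It is not the code referred to above, and it assumes that "64-bit multiply" means a 64x64 to 128-bit product; `mul64_wide` is a hypothetical name.

```c
#include <stdint.h>

/* Return the low 64 bits of a * b and store the high 64 bits in *hi,
   using only 64-bit arithmetic (no intrinsics, no 128-bit type). */
static inline uint64_t mul64_wide(uint64_t a, uint64_t b, uint64_t *hi)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four 32x32 -> 64 partial products. */
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Middle column: three terms below 2^32 each, so no 64-bit overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return (mid << 32) | (uint32_t)p0;
}
```

This costs four multiplies plus several shifts and adds, which is consistent with the remark that such a routine is slow compared with a native widening multiply.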

Quote:
Originally Posted by R.D. Silverman View Post
I have looked for an intrinsic that does multiply and add, but can't find one.
The only processor I've found with an integer fused multiply-add (FMA) is an Nvidia GPU. SSE and AVX have FMA for floats and doubles, if that should happen to float your boat.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
