2020-12-30, 06:22   #152
ewmayer
2ω=0
 
Sep 2002
República de California

10110101101001₂ Posts

Quote:
Originally Posted by Mark Rose View Post
It seems the M1 has some undocumented instructions that may be useful for Mlucas: https://gist.github.com/dougallj/7a7...e7c36dc75e0d6f
Interesting, but ugh - that kind of nonportability is only worth it if it offers huge performance benefits for one's application.

Brief update re. Mlucas-on-M1: Laurent Desnogues has a gcc-under-brew build working, and we are experimenting to see what maximizes total throughput on the big+little processor pair. I need to ask him how much detail I may release publicly; for now let me just say that 4-threaded performance on the big core alone is well over 10x that of my Odroid C2, clock-for-clock. (But the C2 ain't exactly world-beating, so that's not saying all that much, except "the M1 doesn't suck".)

I finished debugging some code mods designed to accommodate clang-on-M1's tighter-than-gcc macro-#args constraint. They are tested on my Odroid, but I am still waiting to hear whether they solve his Clang build issues on M1. Clearly we want both build options so that we can compare timings - the asm shouldn't care too much, but all the surrounding C code might.