mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Programming

Reply
 
Thread Tools
Old 2023-01-04, 00:04   #23
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×17×227 Posts
Default

Quote:
Originally Posted by Dr Autonomy View Post
For the record, Grade school squaring is NOT time complexity O(p²), it's O(T_p). The latter is approx. half the former.

GP_LLT was written to achieve that "impossible" task..
Small differences in the magnitude of a scaling law's leading constant are of little interest compared to the large difference in order between ~k1·w² and k2·w·log w·log log w, where w is the number of words required for an exponent p, for LARGE p. The factor of a half, and the considerable overhead of a complex algorithm, are overcome by sufficiently large p.
From Don Knuth on at least, grade school multiplication has been understood as ~c·m·n complexity, where m and n are the operand lengths. In squaring, m = n, hence ~c·m². Yes, almost half the digit-by-digit multiplies can be skipped and instead added in twice, after initializing an array with the diagonal terms, as I did back in the '80s, or shifted left by one bit and added once if that is faster. A factor of a half is little help when countered by an order-of-growth disadvantage of thousands or millions for the operands of interest. https://en.wikipedia.org/wiki/Big_O_...mmon_functions
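For anyone following along, that triangular trick reads like this in a quick sketch (illustrative Python, not the actual 80's code, and still O(n²)):

```python
def triangular_square(digits, base=10):
    """Grade-school squaring computing only the upper triangle.

    Each off-diagonal partial product digits[i]*digits[j] (i < j)
    occurs twice in the square, so it is computed once and doubled;
    diagonal terms digits[i]^2 are added once. That skips almost half
    the digit-by-digit multiplies, but the count is still ~n^2/2,
    i.e. O(n^2). digits is little-endian.
    """
    n = len(digits)
    acc = [0] * (2 * n)
    for i in range(n):
        acc[2 * i] += digits[i] * digits[i]          # diagonal term, once
        for j in range(i + 1, n):
            acc[i + j] += 2 * digits[i] * digits[j]  # off-diagonal, doubled
    for k in range(2 * n - 1):                       # propagate carries
        acc[k + 1] += acc[k] // base
        acc[k] %= base
    return acc
```

E.g. triangular_square([9, 9]) returns [1, 0, 8, 9], i.e. 99² = 9801 in little-endian digits.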

Still waiting to see any real signs of appreciation of the feedback that you requested.

Just because there are 2n or 4n sets of registers in a processor core does not mean one gets 2n or 4n computational throughput. Good code in this area tends to be memory bound. I have systems with 256 and 272 hyperthreads (Xeon Phi are x4).

Last fiddled with by kriesel on 2023-01-04 at 00:55
kriesel is offline   Reply With Quote
Old 2023-01-04, 00:37   #24
charybdis
 
charybdis's Avatar
 
Apr 2020

1023₁₀ Posts
Default

Quote:
Originally Posted by Dr Autonomy View Post
For the record, Grade school squaring is NOT time complexity O(p²), it's O(T_p). The latter is approx. half the former.
Wrong. It is complexity O(n²). It is also complexity O(T_n). Or complexity O(1000000·n²), for that matter. I suggest you read up on big-O notation.
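For reference, the standard definition, which makes all three statements compatible (big-O is an upper bound up to an arbitrary constant factor):

```latex
f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ |f(n)| \le c\,|g(n)| \ \text{for all}\ n \ge n_0
```

With f(n) = T_n = n²/2 + n/2, taking c = 1 gives T_n ∈ O(n²), and taking c = 2 in the other direction gives n² ∈ O(T_n); the constant is absorbed into c, which is why the "half" makes no difference to the order.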
charybdis is offline   Reply With Quote
Old 2023-01-04, 01:47   #25
retina
Undefined
 
retina's Avatar
 
"The unspeakable one"
Jun 2006
My evil lair

1A73₁₆ Posts
Default

Quote:
Originally Posted by Dr Autonomy View Post
Grade school squaring is NOT time complexity O(p²), ...
Yeah it is.

But no matter, it still doesn't stand a chance when compared to FFT.

If FFTs scare you, then it would be useful to have code doing NTT LLs. Even though LLs are not much use nowadays for MP testing, they are still good for verification when/if a new MP is found.

Are you up to it?

Last fiddled with by retina on 2023-01-04 at 03:33
retina is offline   Reply With Quote
Old 2023-01-04, 03:20   #26
chalsall
If I May
 
chalsall's Avatar
 
"Chris Halsall"
Sep 2002
Barbados

10110000100001₂ Posts
Default

Quote:
Originally Posted by retina View Post
Are you up to it?
An early mentor of mine introduced me to the concept of the FFT.

The idea of using it to do large multiplication blew my mind at the time. What people are doing with it now continues to blow my mind.
chalsall is offline   Reply With Quote
Old 2023-01-04, 21:23   #27
Dr Autonomy
 
Jan 2023
Alberta, Canada

2²·3 Posts
Default

"Methodology Proof Of Concept" - "MPOC".

Compares timings for FFT and Karatsuba, of varying block sizes, against grade-school multiplication, as a percentage.
Attached Files
File Type: zip MPOC.zip (22.1 KB, 28 views)
Dr Autonomy is offline   Reply With Quote
Old 2023-01-04, 22:15   #28
Uncwilly
6809 > 6502
 
Uncwilly's Avatar
 
"""""""""""""""""""
Aug 2003
101×103 Posts

25416₈ Posts
Default

What's with the posting of the .exe? Can't you post some source code? Please. Or instead post timings of selected exponents.
Uncwilly is online now   Reply With Quote
Old 2023-01-04, 22:27   #29
jnml
 
Feb 2012
Prague, Czech Republ

2×101 Posts
Default

Quote:
Originally Posted by Uncwilly View Post
What's with the posting of the .exe? Can't you post some source code? Please. Or instead post timings of selected exponents.
I for one suggest deleting, as soon as a moderator sees it, any post that links to a binary executable, zipped/archived or not, that is not cryptographically signed by a commonly trusted party.

(FTR: I don't use Windows anywhere; it's not about me.)
jnml is offline   Reply With Quote
Old 2023-01-05, 19:54   #30
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·227 Posts
Default Comparison of estimated and measured run times

Following is an attempt to compute a lower-bound estimate for the run time of GP_LLT primality testing at 1 Gbit and 1 Gdigit, and to provide comparative values for other available software. It is based on Dr Autonomy's posts earlier in this thread, as I understood them (a 64Ki fft-length pass, and an up-to-4096-wide grammar-school O(n^2) pass), various assumptions that favor GP_LLT, empirical measurements of prime95 speed on specific hardware, and extrapolation to larger fft lengths unimplemented in prime95. Comparisons to Mlucas benchmarks and an estimated gpuowl run time are also included.

Benchmarking in prime95 v30.8b15, 64Ki length, 4 cores 1 worker gives 0.09 msec/iter; 4 cores 4 workers gives 0.26 msec/iter each in Windows on an otherwise-idle i5-1035G1 ("modern" 10th generation AVX512 capable laptop i5 processor https://www.intel.com/content/www/us...fications.html).
Let us suppose, for run-time estimation purposes, that GP_LLT is as efficient as prime95 at the 64Ki fft performance level, take the GP_LLT author's word that the layers above are grammar-school multiplication, and estimate its performance at 64Mi. Also assume it is a triangular squaring, which would be 1024^2/2 + 1024/2 partial products to compute, or 524800 superdigit multiplies. (I think it very unlikely that anyone's early fft implementation would be as efficient as the hardware-specific-tuned, fastest-benchmark-of-alternatives-wins ffts of George or Ernst, or even close, but assuming speed parity is convenient for ballpark-estimating a run time lower bound. I would be astounded to find any that were significantly faster on the same hardware and inputs while still producing correct results.)
To compute a simple lower bound for iteration time, assume no setup time, no summing or carry-propagation time beyond that included in the 64Ki fft, and zero overhead for any error checking. (That means zero time used in the grammar-school multiplication pass of the GP_LLT code for setup, summing partial products, doubling the off-diagonal terms, carry propagation with the -2 included, and any error checks, yet accurate results computed.) Also assume perfect parallelism at 4 cores (or more cores on other hardware), zero incremental time for the mod 2^p-1 at each iteration at the grammar-school multiplication level, and code efficient enough throughout that GP_LLT is memory-bandwidth bound, as prime95 typically is.

Estimate GP_LLT iteration time as 0.26 msec/64Ki_superdigit x 524800 superdigit_multiplies/iteration / 4 cores = 34.112 seconds/iteration.
A 1Gbit exponent's 1G iterations would take > 34.112 seconds/iteration x 1G iterations / 3600 / 24 / 365 ~ 1082. years.
The GP_LLT author indicated using 4096 superdigits wide for OBD iterations, in grammar school multiplication.
A gigadigit exponent's 3.322G iterations would take ~ 34.112 seconds x 4^2 x 3.322E9 / 3600 / 24 / 365 = 57,494. years. (I've seen worse scaling.)
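The arithmetic above, collected in one place (a sketch that merely reproduces the stated assumptions):

```python
SEC_PER_YEAR = 3600 * 24 * 365

n = 1024                                 # superdigits for a 1 Gbit exponent
partials = n * n // 2 + n // 2           # triangular squaring: 524800 multiplies
iter_time = 0.26e-3 * partials / 4       # sec/iter at 0.26 ms per 64Ki multiply,
                                         # perfect 4-core scaling: 34.112 s
years_1gbit = iter_time * 1e9 / SEC_PER_YEAR             # ~1082 years
years_obd = iter_time * 4 ** 2 * 3.322e9 / SEC_PER_YEAR  # ~57,494 years
```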

(In prime95 we could do better than that, because it supports non-power-of-two lengths. But GP_LLT perhaps does not, at least yet, since the author indicated using 4096-wide 64K-size superdigits = 256M for OBD computations, not the 192M that Mlucas could do it with. I don't yet understand why, because in my ancient experience it's straightforward to code a grammar-school algorithm with length variable at a granularity of one word. That would save ~1-(3/4)^2 of the run time, ~44%.)

Now let's compare prime95's 64Mi or 56Mi fft benchmark times for the same i5-1035G1 processor. (Posted in last attachment of https://www.mersenneforum.org/showpo...19&postcount=5)
For 64Mi, best throughput is 6.23 iters/sec; for 56Mi, 7.46 iters/sec, corresponding to 0.1605 and 0.134 seconds/iter. (For no use of hyperthreading; hyperthreading gave lower performance in prime95 benchmarking.)
Using the slower 64Mi timing, prime95 could do one 1G PRP or LL in 0.1605 1G / 3600 / 24 / 365 ~ 5.09 years.
That is 1082. / 5.09 ~ 213. times faster than the estimated lower bound run time for GP_LLT.
(The 56Mi length would be ~213. * 7.46 / 6.23 ~ 255. times faster than the estimate computed here for GP_LLT at 1Gbit.) A speed disadvantage of a factor of two or more on the same hardware is enough to nearly eliminate a program's usage; CUDALucas is no longer used much on hardware that can run gpuowl, for example.

For a rough estimate of what run time for prime95 might be on the i5-1035G1 if extended to 192Mi fft, consider the 16M and 48M benchmark throughputs, 30.78 iters/sec and 8.73 respectively, a ratio of 3.526. Extrapolating from 64Mi to 192Mi by that ratio gives 6.23 /3.526 ~ 1.77 iters/sec, ~566. msec/iter for the i5-1035G1, corresponding to ~59.6 years.
Performing a similar extrapolation on a Xeon Phi 7250 benchmark using single-worker nonHT timings (see second benchmark attachment at https://www.mersenneforum.org/showpo...4&postcount=11) yields ~23.94 iter/sec / (105.72 /34.84) ~ 7.89 iter/sec, corresponding to 0.127 sec/iteration, ~13.35 years. (I think it unlikely that George will create such an extension while there are more productive uses for his time.)

The highest performance CPU hardware I have currently (Xeon Phi 7250 68 core & X4 HT) benchmarked at 30.7 iters/sec at 56Mi in prime95 (without hyperthreading, which was also benchmarked, and at almost every fft length supported, slower). That's 4.12 times the performance of the i5-1035G1.
That would reduce GP_LLT's OBD primality test run time estimate to ~13,955. years.
Using dimension 3072^2 rather than 4096^2 in the grammar school pass would reduce that estimate to ~7,850. years.

Buying newer compatible hardware with ten or even a hundred times the performance if available at an acceptable price is not sufficient to reduce the GP_LLT run time estimate to a sufficiently small fraction of a human lifetime.

Next I estimate what replacing the grammar school multiplication pass in GP_LLT with Karatsuba might provide.
I optimistically assume the overhead of Karatsuba's increased number of additions and subtractions would be zero.
Using maximum recursion depth of 12 for the 4096 = 2^12 wide pass, to reduce the number of 64Ki fft based superdigit multiplications required, (3/4)^12 ~0.0317. 0.0317 x 13955 ~ 442. years.
Or recursion level 11 for the hypothesized 3072 wide pass, 0.0422 x 7850 ~ 332. years.
And remember, these are thought to be optimistic lower bounds due to the effect of numerous favorable assumptions and approximations.
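The recursion-depth savings used above, as a sketch (again optimistically assuming the extra additions and subtractions are free):

```python
def karatsuba_fraction(depth):
    # Each Karatsuba level replaces 4 half-size multiplies with 3,
    # so the bottom-level multiply count shrinks by 3/4 per level.
    return (3 / 4) ** depth

years_4096 = 13955 * karatsuba_fraction(12)  # depth 12 for 4096 wide: ~442 yr
years_3072 = 7850 * karatsuba_fraction(11)   # depth 11 for 3072 wide: ~332 yr
```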

In Mlucas, on dual-Xeon e5-2697v2 hardware or e5-2690, which in typical prime95 GIMPS work is slower than the Xeon Phi's combined throughput, I have benchmarked 192M fft length down to about 400. msec/iter. https://www.mersenneforum.org/showpo...8&postcount=20 For minimal prime exponent OBD primality testing on a no-known-factor candidate, that would be p~3,321,928,171 or higher, and 3321928170 iterations or more in PRP (PRP chosen for superior error detection possibilities, pretty much mandatory at such long expected run times);
0.4 sec/iteration x 3321928170 iterations / 3600 / 24 / 365 = 140. years. So ~70 years on the Xeon Phi (which I've not benchmarked in Mlucas in Linux yet to be able to use all or most available cores).
That's still much too long for someone my age or near it. It's also orders of magnitude too long to run without state of the art number-theory-based error detection and correction such as the Gerbicz error check for PRP type 1.

In a recent fast, efficient gpuowl version, if extended to higher fft lengths than currently supported and released, iteration time would be ~60. msec/iteration at 192M fft length (3-pass, such as 4K:6:4K), corresponding to
0.06 sec/iteration x 3321928170 iterations / 3600 / 24 / 365 ~ 6.32 years per OBD PRP/GEC test.
Gpuowl's superior speed and power efficiency on selected modern GPUs is why even the authors of prime95 and Mlucas run a significant amount of work on gpuowl. An LL test of equal duration is almost certain to be in error, even if guarded by the Jacobi symbol check's 50% detection rate and the other error checks known to be applicable to LL on fft.

To be competitive for GIMPS use, a fast, accurate implementation of the best available algorithm is needed: GP_LLT, or any primality-test software, needs a multipass fully-fft implementation, as in prime95, Mlucas, and gpuowl. I speculate a shallow x2, x3, or x4 fft-length extension by a grammar-school or Karatsuba pass might be justifiable for very rare use, if it is easier to implement than fully fft.

Last fiddled with by kriesel on 2023-01-05 at 20:08
kriesel is offline   Reply With Quote
Old 2023-01-06, 20:43   #31
Dr Autonomy
 
Jan 2023
Alberta, Canada

12₁₀ Posts
Default

Quote:
Originally Posted by kriesel View Post
The highest performance CPU hardware I have currently (Xeon Phi 7250 68 core & X4 HT) benchmarked at 30.7 iters/sec at 56Mi in prime95 (without hyperthreading, which was also benchmarked, and at almost every fft length supported, slower). That's 4.12 times the performance of the i5-1035G1.
That would reduce GP_LLT's OBD primality test run time estimate to ~13,955. years.
Using dimension 3072^2 rather than 4096^2 in the grammar school pass would reduce that estimate to ~7,850. years.
I think your analysis does not fully take into account the fact that ALL of what you are calling the "superdigit multiplications" in GP_LLT are performed concurrently - i.e. in parallel. They are implemented as software threads which, if I'm not mistaken, the OS will attempt to implement, as far as possible, as hardware threads.

But maybe I'm wrong. The way to find out how good your hypothetical estimate is, would be to simply run a few (say, three) iterations of any Mersenne number you like under GP_LLT on your "Xeon Phi 7250 68 core" machine, exit, then view the resultant "rdl" file with "Proof.exe". It will tell you EXACTLY how long each iteration took, in microseconds (actually, tenths of a microsecond).

What's difficult about that? The hardest part would be uploading to this forum the resultant screenshot, or better yet, both the "rdl" and "snp" files. That's all I'm looking for here.
Dr Autonomy is offline   Reply With Quote
Old 2023-01-06, 23:12   #32
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·227 Posts
Default

Quote:
Originally Posted by kriesel View Post
Benchmarking in prime95 v30.8b15, 64Ki length, 4 cores 1 worker gives 0.09 msec/iter; 4 cores 4 workers gives 0.26 msec/iter each in Windows on an otherwise-idle i5-1035G1 ("modern" 10th generation AVX512 capable laptop i5 processor
That's an example of 4 threads on a single task having efficiency less than one: 0.09 msec x 4 cores = 0.36 core-msec, vs. 0.26 core-msec for single-core tasks; 72% thread efficiency at x4.

Quote:
Estimate GP_LLT iteration time as 0.26 msec/64Ki_superdigit x 524800 superdigit_multiplies/iteration / 4 cores = 34.112 seconds/iteration.
(Assumed perfect multithread efficiency of utilization of the 4 physical cores for GP_LLT.) We cannot justify assuming /8 hyperthreads, because caches, memory data lines, etc. are not duplicated; only registers and not much else, IIRC. Even Intel claims only ~30% gain from hyperthreading, and that depends on there being some idle time, which well-designed number-crunching software avoids. That would be 1.3x, not 2x. https://www.intel.com/content/www/us...threading.html

Quote:
The highest performance CPU hardware I have currently (Xeon Phi 7250 68 core & X4 HT) benchmarked at 30.7 iters/sec at 56Mi in prime95 (without hyperthreading, which was also benchmarked, and at almost every fft length supported, slower). That's 4.12 times the performance of the i5-1035G1.
That 4.12 times performance ratio is the MEASURED AGGREGATE THROUGHPUT, of 68 slower-clocked lower-wattage x4 HT cores on the 7250 Xeon Phi & very fast MCDRAM, divided by the measured aggregate throughput of 4 faster clocked newer-design x2 HT cores on the i5-1035G1 & SODIMM ram. It would be wrong to multiply that figure by number of cores ratio and number of hyperthreads ratio between the two processors. The prime95 program can support and benchmark dozens of cores and hundreds of threads provided for by this hardware, and that's what the 4.12 ratio is based on.

Quote:
Originally Posted by Dr Autonomy View Post
I think your analysis does not fully take into account the fact that ALL of what you are calling the "superdigit multiplications" in GP_LLT are performed concurrently - i.e. in parallel. They are implemented as software threads which, if I'm not mistaken, the OS will attempt to implement, as far as possible, as hardware threads.
Clearly you are wrong about the effect of launching many more threads in parallel than the hardware provides simultaneous execution support for. It's likely to cause thrashing and suboptimal performance. A better strategy is to launch in batches of ~n up to ~n x HT, where n is the number of physical cores, or perhaps to keep n x (HT+1) in the queue until the iteration's work is nearly complete. (Prime95 does a thorough job of keeping the available hardware well utilized with a single thread per core during fft operations, and the cache hit rate high; hyperthreading, that is, two or more threads per physical core, very frequently results in lowered performance, probably due to memory bandwidth and lowered cache efficiency.)
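The batching strategy might look like this (a hypothetical sketch; the names and the placeholder "work" are mine, not GP_LLT's):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def partial_product(task):
    i, j = task
    # placeholder for multiplying superdigit i by superdigit j
    return i * j

def run_iteration(n_superdigits, n_workers=None):
    # One pool of ~n_cores workers consumes all n^2/2 + n/2 triangular
    # tasks, instead of spawning one thread per partial product.
    n_workers = n_workers or os.cpu_count()
    tasks = [(i, j) for i in range(n_superdigits)
             for j in range(i, n_superdigits)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_product, tasks))
```

The pool amortizes thread creation and keeps the number of runnable threads near the core count, which is the whole point of the batching argument.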

4096^2 = 16Mi threads at once, or ~half that, for a few iterations of an OBD LL? For a performance-critical app, yikes! What do you think that does to the caches when millions of threads compete for a relative few cores and get cycled in and out before completion? Or can you guarantee they each complete within a timeslice? Or what it does to total RAM usage?

"A basic kernel stack is 12K on 32-bit Windows and 24K on 64-bit Windows."
https://techcommunity.microsoft.com/...ds/ba-p/723824 (somewhat dated, referencing XP, found in a web search specifying Windows 10)
Also, https://superuser.com/questions/1469...-in-windows-10

4096^2 = 16M threads x 24KB/thread = 384 GB just for the thread-count overhead that it appears may be needed to time a few iterations on an OBD with your software. That's the maximum possible installed (costly LRDIMM) memory in the Xeon Phi, 6 times the max in the i5-1035G1, and 3 times the memory currently installed in any single box accessible to me. And that's for thread management alone, not the app's data storage or code, the operating system, page caching, etc. Maybe that explains why you won't post results; perhaps you can't run it on your old i5 for lack of memory.
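In code form (24 KB kernel stack per thread, per the linked article):

```python
threads = 4096 ** 2                        # one thread per partial product
kstack_gb = threads * 24 * 1024 / 2 ** 30  # kernel-stack overhead alone
# 16,777,216 threads x 24 KiB = 384.0 GiB, before any app data or code
```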

The above linked techcommunity article concludes with
"In any case, applications that create enough threads or processes to get anywhere near these limits should rethink their design, as there are almost always alternate ways to accomplish the same goals with a reasonable number. For instance, the general goal for a scalable application is to keep the number of threads running equal to the number of CPUs (with NUMA changing this to consider CPUs per node) and one way to achieve that is to switch from using synchronous I/O to using asynchronous I/O and rely on I/O completion ports to help match the number of running threads to the number of CPUs."

In post 9 you indicate you've run M6972593 on your i5, of unspecified model and memory capacity.
A 2M fft in prime95 covers exponents to ~39.5M; 512K to ~10M; 256K to ~5M.
So GP_LLT might employ a 64K-superdigit, 8x8 grammar-school pass, which would be a very manageable 64 threads. Or 8x8/2 + 8/2 = 36. That small thread count would have hidden the impact of the ~n^2/2 thread count exploding for larger exponents, which I regard as a fundamental design flaw, if I've understood your latest post correctly: that you launch a separate thread for the computation of each required grammar-school-squaring partial product, as simultaneously as possible. Spawning so many threads is not free, in memory or in processor time.

Quote:
But maybe I'm wrong.
That seems clear to me. But perhaps you've been unclear in your explanation of how the program operates. Something that source code would help with.

Quote:
The way to find out how good your hypothetical estimate is, would be to simply run a few (say, three) iterations of any Mersenne number you like under GP_LLT on your "Xeon Phi 7250 68 core" machine, exit, then view the resultant "rdl" file with "Proof.exe". It will tell you EXACTLY how long each iteration took, in microseconds (actually, tenths of a microsecond).
Somehow, the software you posted as an attachment and described as providing milliseconds/iteration timing now provides 100 ns/iteration timing resolution? Re iteration times, I'm confused: post 9 says milliseconds (units?), post 21 says microseconds, post 31 says tenths of microseconds (resolution?). This is the sort of circus that ensues from withholding source code and other documentation.

Quote:
What's difficult about that?
Planning. Preparation, both for possibly hostile software and for thorough investigation of speed, accuracy, GP_LLT data-file content and structure, etc., at multiple well-spaced exponents, for empirical determination of run-time scaling, for GP_LLT and for prime95 and perhaps others in the same sandbox instance. Other priorities. And I'm under no obligation to do your software testing and black-box debugging for you.

I notice you steadfastly refuse to run it yourself and disclose well-documented benchmarks on well-described hardware. Or source code. Or even what hardware is required. Any competent author of assembly code ought to REMEMBER what instruction-set enhancements are required, because he studied and coded for them. We have no such difficulty getting such essential information from the authors of the software in common use here. And they respond graciously to feedback, requested or volunteered, whether positive, neutral, or negative. Try it.
Is the software even capable of performing the computations you claim it does? Or is it just a scout program for credit card numbers etc. or a new zero-day vehicle? Does it contain a 64Ki fft, but lifted from somewhere else, perhaps now distributed without authorization or attribution? Source code and build instructions address such doubts.

I am increasingly left with the impression that you don't know what you're talking about, don't know that you don't know, and try to compensate by bluster for our reluctance to trust our systems or networks to a pig-in-a-poke executable or two or three. That does not work well around here. (At least with a pig one can hear a squeal and know the species!)

Continuing on, assuming for the moment it's a legit program, and just beset by a culture clash of some sort:

Suggestions for command line parameters:
-exponent <base10_integer>
-LL [<iters>]
-TF <kmin> <kmax>
-logfile <filename.txt>
meaning: operate on exponent base10_integer, either -LL for iters iterations (or to completion if not specified), or -TF from k=kmin through k=kmax, or maybe both, TF first,
optionally logging every LL iteration or TF class to a plain-text ASCII logfile named filename.txt or whatever, until completion or program halt, frequently flushing the buffer to disk for minimal loss in case of early program termination, power failure, etc.
Plain text logs are standard in GIMPS apps.

I don't recall seeing mention whether the program periodically saves a work in progress state for possible resumption if interrupted. If so, launch without any parameters, or perhaps with command line parameter -resume, would continue the interrupted work from that saved state.
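As a concrete illustration of the suggested interface (hypothetical, sketched with Python's argparse; GP_LLT's actual flags, if any, are unknown):

```python
import argparse

def build_parser():
    # Flag names follow the suggestion above; none are confirmed GP_LLT options.
    p = argparse.ArgumentParser(prog="gp_llt")
    p.add_argument("-exponent", type=int, help="Mersenne exponent p, base 10")
    p.add_argument("-LL", nargs="?", const=0, type=int, metavar="iters",
                   help="run LL for iters iterations (omit iters: to completion)")
    p.add_argument("-TF", nargs=2, type=int, metavar=("kmin", "kmax"),
                   help="trial factor from k=kmin through k=kmax")
    p.add_argument("-logfile", metavar="filename.txt",
                   help="plain-text log, flushed frequently")
    p.add_argument("-resume", action="store_true",
                   help="continue from a saved work-in-progress state")
    return p
```

E.g. `gp_llt -exponent 6972593 -LL 3 -logfile m6972593.txt` would then run three LL iterations with logging.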

Last fiddled with by kriesel on 2023-01-06 at 23:44
kriesel is offline   Reply With Quote
Old 2023-01-06, 23:47   #33
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

37·157 Posts
Default

Just stop assuming it's a legit program. Any "author" who badgers people into testing when he refuses to do it himself should be assumed to be misrepresenting his software.
VBCurtis is offline   Reply With Quote
Similar Threads
Thread Thread Starter Forum Replies Last Post
Any way to pipe GMP mpz_t into PFGW? hansl Software 3 2019-04-09 19:44
Is there any program... mart_r Software 2 2009-11-15 20:06
So you think you can program rogue Lounge 5 2009-10-02 15:02
Program Primeinator Information & Answers 5 2009-07-16 21:42
which program? drakkar67 Prime Sierpinski Project 14 2005-11-29 06:25

All times are UTC. The time now is 13:56.


Mon Jun 5 13:56:17 UTC 2023 up 291 days, 11:24, 0 users, load averages: 1.09, 1.04, 1.05

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
