20090510, 23:07  #1 
Aug 2004
Melbourne, Australia
2^{3}×19 Posts 
An arithmetic permutation
This problem was proposed by Ian Wanless. It's published in a list of problems from the 21st British Combinatorial Conference: P.J. Cameron, Research Problems from the BCC21, Discrete Mathematics (2009), doi:10.1016/j.disc.2009.04.016
Let be a prime congruent to or modulo . Let and be the sets of quadratic residues and nonresidues modulo . Consider the permutation whose tworow form is the following array (all entries modulo ): For example, when and is the single cycle . Conjecture: Always is a single cycle of order . Comment: The conjecture can be formulated in terms of permutation polynomials. It has been verified for . 
20090511, 11:55  #2  
Nov 2003
1110100100100_{2} Posts 
Quote:
how is derived from L. Please explain further. There may be a purely grouptheoretic explanation. Also, what does the word 'single' mean as in 'single cycle'? Clearly is a 'single' cycle. I fail to see how anyone would consider it to be more than one cycle. May I also suggest that you be careful in using the word 'cycle' in a context that is both numbertheoretic and combinatoric. In a number theory sense, a 'cycle' is the full orbit of (the powers of an integer) mod p. It can not contain 0 unless the integer is not coprime to p. You mean 'cycle' in its combinatoric sense, as a permutation. 

20090511, 12:14  #3  
Aug 2004
10000001_{2} Posts 
Quote:
is simply writing the permutation as a product of cycles, and in this case there is only one cycle in the product. Quote:
Chris 

20090511, 12:25  #4  
Feb 2005
2^{2}·3^{2}·7 Posts 
Quote:
Quote:
Last fiddled with by maxal on 20090511 at 12:26 

20090512, 21:37  #5 
Aug 2004
Melbourne, Australia
2^{3}×19 Posts 
Yep, I think maxal and Chris Card have got the correct idea. I spoke to Ian and he wasn't even aware that this had been published. He mentioned this conjecture to me several years ago and I wasn't able to make any progress.
I thought I'd just post a copy of it here, in case anyone was interested. 
20100530, 02:20  #6 
Apr 2010
2^{2}·37 Posts 
Note that takes the form of a commutator: Define
represents the cycle (0,1,...p1), and is a permutation as long as (which is why p must be 1 or 3 mod 8). Then . Hence is in the commutator subgroup of which is . This implies that is an even permutation. This is compatible with the conjecture because p is odd, and oddlength cycles are even permutations. being an even permutation also means that cannot have a cycle decomposition into an even number of cycles (with fixed points counted as well), because then the number of evenlength cycles would have to be odd, which would make an odd permutation. Hence, if is not a pcycle, its shortest cycle has length at most floor(p/3). If additionally has no fixed points (which can be shown), then its longest cycle has length at most p4. 
20100601, 17:23  #7 
"Lucan"
Dec 2006
England
1100101001010_{2} Posts 

20100613, 00:39  #8 
Apr 2010
2^{2}×37 Posts 
Program
I have written a small ISO C++ program for numerical verification of the conjecture.
The program reads a number p from stdin, checks it for validity (e.g. primality and congruence to 1 or 3 mod 8) and checks by brute force whether where is the permutation map associated with p. The result is reported to stdout, then the next input number is read, and so on, until the input reaches endoffile (which is ControlD for tty users). Correctness: Consider the cycle decomposition of . The program checks whether the cycle containing 0 has length dividing p. Since p is prime and , the checked condition is equivalent to being a pcycle. Unfortunately, I have not found a representation of that remains compact when nested. Therefore, I have not implemented a squareandmultiply scheme. Instead, the program walks through the iterations one by one. Thus it uses O(p) steps and O(p log p) bit operations. On 32bit systems with 256 MBytes available for program data, the program can check primes up to 2^31. On platforms with size_t longer than 32 bits, you can reasonably try p values up to the amount of available RAM, measured in bits. The reason why the program needs O(p) memory is that every iteration needs to evaluate two Jacobi symbols. Instead of calculating them from scratch each time, the program first sets up a table with all Jacobi symbol values and then looks them up during the iteration. The setup goes fast: To enumerate the first n squares, you just need to add the first n odd positive integers. (This is done mod p of course.) While the setup is easy but not very cachefriendly, the lookups made during the iteration happen to reference nearby locations in at least 75% of the time, thus the processor's caches are used efficiently. As a side benefit, the setup can check whether the input number is prime or not, simply by testing if a square has been tabled more than once. (Prime powers are ruled out too because the program marks 0 as a "square" before filling the rest of the table.) Here are some examples, run on my Powerbook G4 (a PPC 7450 @ 1.5 GHz). Lines that begin with $ contain shell (bash) commands. I use Pari/GP to output suitable primes for the program. Code:
$ g++4.3 O3 arithperm.cc o arithperm $ max=100; echo "default(primelimit,$max); \ forprime(p=3,$max,if(((p%8)(p%4))==0,print(p)))"  gp q  \ ./arithperm "$max" 3: true 11: true 17: true 19: true 41: true 43: true 59: true 67: true 73: true 83: true 89: true 97: true Code:
$ time echo 100000049  ./arithperm 100000049: true real 0m11.585s user 0m11.300s sys 0m0.115s Let's test the largest prime below 2^31 that is congruent to 1 or 3 mod 8: Code:
$ time echo 2147483587  ./arithperm 2147483587: true real 5m46.655s user 5m38.374s sys 0m2.909s I have tried another version of the program that does not use a table and computes Jacobi symbols directly. After all, computing a Jacobi symbol value of two nbit machineword integers should need only two additional registers and at most about 2n shifts, subtracts, negations, tests and jumps. With n=32 and zerocycle branch cost, this should deliver comparable performance to the tablebased approach. However, my highlevel C++ Jacobi code is about five times slower yet, therefore I present the tablebased version here for experiments. For details, look at the source . If you want to check the iteration, uncomment the line containing cerr << in the function permute, recompile, run and input 11 (or whatever you can check). Do not forget to comment out that line and to recompile again before starting larger runs. 
20100617, 22:01  #9 
Apr 2010
10010100_{2} Posts 
More timings
My machine has a 500 kByte (= 4 MBit) L2 cache; therefore I ran the following test:
Code:
$ time echo 4000043  ./arithperm 4000043: true real 0m0.128s user 0m0.096s sys 0m0.014s On the other hand, onthefly computation of Jacobi symbols, as well as the rest of the iteration, can be carried out in registers alone (at least on the PowerPC). Now that would be nice: No memory consumption, and no CPU idling. However, my current code for the "binary" approach to Jacobi symbol computation needs about 6 cycles per bit of the Jacobi symbol operands (2x32) which at 1.5 GHz results in about 250ns per Jacobi symbol, thus more than 500ns per iteration. That is discouraging. For the time being, I recommend using the tablebased version I already posted. 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Antipodean arithmetic  ewmayer  Science & Technology  34  20151003 01:31 
Some arithmetic...  science_man_88  science_man_88  11  20140730 22:53 
modular arithmetic  science_man_88  Miscellaneous Math  42  20110726 02:02 
Simple Arithmetic!  mfgoode  Puzzles  82  20070502 08:13 
Easy Arithmetic  davar55  Puzzles  6  20070320 17:47 