mersenneforum.org  

Go Back   mersenneforum.org > Fun Stuff > Puzzles

Reply
 
Thread Tools
Old 2022-02-17, 21:58   #1
fatphil
 
fatphil's Avatar
 
May 2003

2·127 Posts
Default Simultaneously Square Sums of Mostly Squares

Late last year I came up with this puzzle, but brute force and ignorance are now beginning to stall when it comes to finding solutions, so I thought perhaps you guys would like a stab:

Find triples of positive integers a,b,c where all of three permutations of the not-quite-sums-of-squares a+b²+c², a²+b+c², and a²+b²+c, are perfect squares.

Here's the background to the problem, and what I've got so far: http://fatphil.org/maths/trivia/squaresums.html
fatphil is offline   Reply With Quote
Old 2022-02-18, 17:20   #2
Dr Sardonicus
 
Dr Sardonicus's Avatar
 
Feb 2017
Nowhere

3·11·181 Posts
Default

Quote:
Originally Posted by fatphil View Post
Late last year I came up with this puzzle, but brute force and ignorance are now beginning to stall when it comes to finding solutions, so I thought perhaps you guys would like a stab:

Find triples of positive integers a,b,c where all of three permutations of the not-quite-sums-of-squares a+b²+c², a²+b+c², and a²+b²+c, are perfect squares.

Here's the background to the problem, and what I've got so far: http://fatphil.org/maths/trivia/squaresums.html
I note that there are no solutions to the 2-variable version, a^2 + b and b^2 + a both squares with a, b positive integers: Assuming 1 ≤ a ≤ b, we have b^2 < b^2 + a ≤ b^2 + b < (b + 1)^2.

If a ≤ b ≤ c then

a^2 + b^2 + c ≤ a^2 + b + c^2 ≤ a + b^2 + c^2

Difference between first and second terms (c-b)(b+c-1)

Difference between second and third terms (b-a)(b+a-1)

Difference between first and third term (c-a)(c+a+1)

The differences perhaps give some information about the restrictions (mod 8), although these have already been worked out.

Note that (b+c-1) + (c-b) = 2*c - 1 is odd, and similarly for the factors of the other differences. Thus, all differences are the product of an even factor and an odd factor, hence all are even.

If a difference of two squares is even, it is divisible by 4. [If both squares are even, both are divisible by 4. If both are odd, both are congruent to 1 (mod 8) so their difference is divisible by 8.]

Therefore, if a^2 + b^2 + c, a^2 + b + c^2, and a + b^2 + c^2 are all squares, none of c-b, b+c+1, b-a, b+a+1, c-a, or c+a+1 can be congruent to 2 (mod 4). Three of them must be divisible by 4.

Last fiddled with by Dr Sardonicus on 2022-02-18 at 17:21 Reason: xifnif posty
Dr Sardonicus is offline   Reply With Quote
Old 2022-02-18, 22:12   #3
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

157510 Posts
Default

Quote:
Originally Posted by fatphil View Post
Here's the background to the problem, and what I've got so far: http://fatphil.org/maths/trivia/squaresums.html
Replace the worker with the following:
Code:
void *worker(void *data)
{
	uint64_t a;
	uint64_t b;
	uint64_t c;
 
	(void)data;
 
	while ((a = ga++) < high) {
		if ((a & 7) == 5)
			continue;
 
                uint64_t rc=a;
		for (b = 1; b <= a; b++) {
			const uint8_t f3 = f2[a & 7][b & 7];
			if (f3) {
                                uint64_t a2b2 = a * a + b * b;
                                while(rc*rc<a2b2)rc++;

     		        	c=rc*rc-a2b2;
			        if (c<=b && f3 & (1 << (c & 7)) &&
				    issquare(a * a + b + c * c) &&
				    issquare(a + b * b + c * c))
					printf("%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", c, b, a);
			}
		}
	}
 
	return NULL;
}
The basic idea is that there is at most one good value for c if you're given a>=b (where b>=c is also known)
and that is c=(sqrtint(a^2+b^2)+1)^2-(a^2+b^2), and replaced/saved also some sqrt operations in your code.
[the next c value would be c>=(b+1)^2-b^2>b, because sqrt(a^2+b^2)>=b]
Tested, works for a<10^5, but check it.
R. Gerbicz is offline   Reply With Quote
Old 2022-02-19, 21:25   #4
fatphil
 
fatphil's Avatar
 
May 2003

2×127 Posts
Default

Quote:
Originally Posted by R. Gerbicz View Post
Replace the worker with the following:
Code:
                                while(rc*rc<a2b2)rc++;

     		        	c=rc*rc-a2b2;
			        if (c<=b && f3 & (1 << (c & 7)) &&
Yup, that's what I've been running with most of the time:

Code:
                // the square greater or equal to a^2+b^2, and its root
                uint64_t rt_ge_a2b2 = a + 1;
                uint64_t sq_ge_a2b2 = a2 + 2 * a + 1;

                for (b = 1; b <= a; b+=db1) {
                        db1 ^= db2;
                        const uint64_t f3 = f2[a & 7][b & 7];
                        if (!f3)
                                continue;

                        const uint64_t a2b2 = a2 + b * b;
                        while (sq_ge_a2b2 < a2b2) {
                                sq_ge_a2b2 += 2 * rt_ge_a2b2 + 1;
                                rt_ge_a2b2++;
                        }
                        uint64_t c = sq_ge_a2b2 -a2b2;
                        // only one c that creates a square a^2+b^2+c is <= b
                        if (c <= b) {
(IIRC about 2 months ago, the code was *exactly* your code, to the character!)

I've not fed my changes back into inz's repo yet. There's a few more percent available here or there through micro-optimisations, but I was hoping for a new fresh mathematically-oriented attack, rather than just taking the same approach. In particular as it would be nice to be able to double-check the results.

Cheers for your input though, and clearly you've got the insights into the problem.

Last fiddled with by fatphil on 2022-02-19 at 21:27
fatphil is offline   Reply With Quote
Old 2022-02-19, 21:35   #5
fatphil
 
fatphil's Avatar
 
May 2003

2×127 Posts
Default

Quote:
Originally Posted by Dr Sardonicus View Post
I note that there are no solutions to the 2-variable version
Yeah, that's close enough to the problem the olympiad video was about, which was made me realise this puzzle was just part of a family in the first place.
fatphil is offline   Reply With Quote
Old 2022-02-19, 22:35   #6
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

32×52×7 Posts
Default

Quote:
Originally Posted by fatphil View Post
Yup, that's what I've been running with most of the time:

(IIRC about 2 months ago, the code was *exactly* your code, to the character!)
OK, (almost) completely new approach.
Code:
Using M=100800 with t=8 threads done a=1..8000000, found 126 solutions in 1627 sec.
Roughly seven times faster what could be your code, and 20 or so times faster than nz's original code for large input.

with the following code: (or download the attachment)
Code:
#include <stdint.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <inttypes.h>
#include <stdatomic.h>
#include <pthread.h>
#include <time.h>

//compile: gcc -Wall -Wextra -O3 -o ss sssms.c -lm -lpthread
 
#define M 100800

static inline bool issquare(uint64_t x)
{
	uint64_t sqr = sqrtl(x);
	return sqr * sqr == x;
}
 
static _Atomic uint64_t ga = ATOMIC_VAR_INIT(1);
static _Atomic int res=ATOMIC_VAR_INIT(0);
static _Atomic int numsol=ATOMIC_VAR_INIT(0);
static uint64_t high;

int compare( const void* a, const void* b){
     int int_a = * ( (int*) a );
     int int_b = * ( (int*) b );

     if ( int_a == int_b ) return 0;
     else if ( int_a < int_b ) return -1;
     else return 1;
}

const uint64_t A[2]={3467829026060516384ULL,3467829027134258208ULL};

static inline bool valid(uint64_t a){
    
    a&=127;
    return (A[a>>6]&(1ULL<<(a&63)))==0;
}

void *worker(void *data){

    (void)data;
    
    int i,j,count,rm,*arr,asize=4096;
    uint64_t a,b,c,a2b2,T[64];
    bool R64[64];
    arr=(int*)malloc(asize*sizeof(int));

    while((rm=res++)<M){
        
        if((M&7)==0&&(rm&7)==5)continue;
 
        a=(ga/M)*M+rm;
        if(a<ga)  a+=M;
        
        for(i=0;i<64;i++)T[i]=0;
        for(i=0;i<64;i++)R64[i]=false;
        for(i=0;i<64;i++)R64[(i*i)&63]=true;

        int r0=(a&63);
        bool found=false;
        for(int r1=0;r1<64;r1++)
            for(int r2=0;r2<64;r2++)
                if(R64[(r0+r1*r1+r2*r2)&63]&&R64[(r0*r0+r1+r2*r2)&63]&&R64[(r0*r0+r1*r1+r2)&63]){
                    T[r1]+=(1ULL<<r2);
                    found=true;
                }
        
        if((M&63)==0&&!found)  continue;
        
        int64_t m=1,m2=M;
        bool cont=false;
        
        for(int p=2;p<=m2;p++)if(m2%p==0){
            m2/=p;
            int64_t q=p;
            while(m2%p==0){m2/=p;q*=p;}
            
            bool check[q],found=false;
            
            for(i=0;i<q;i++)check[i]=false;
            for(int64_t I=0;I<q;I++)check[(I*I)%q]=true;
            
            int64_t R0=rm%q;
            for(int64_t R1=0;R1<q&&!found;R1++)
                for(int64_t R2=0;R2<q;R2++)
                    if(check[(R0+R1*R1+R2*R2)%q]&&check[(R0*R0+R1+R2*R2)%q]&&check[(R0*R0+R1*R1+R2)%q]){
                        found=true;
                        break;
                    }
            
            if(!found){
                cont=true;
                break;
            }
        }
        if(cont)continue;
        
        m=1;
        m2=M;
        arr[0]=0;
        count=1;
        for(int p=2;p<=m2;p++)if(m2%p==0){
            m2/=p;
            int64_t q=p;
            while(m2%p==0){m2/=p;q*=p;}
            
            int64_t inv=1;
            for(;(inv*m)%q != 1;inv++);
             
            int ns=0;
            int rt[q];
            bool check[q];
            
            for(i=0;i<q;i++)check[i]=false;
            for(int64_t I=0;I<q;I++)check[(I*I)%q]=true;
            
            int64_t R0=rm%q;
            for(int64_t R1=0;R1<q;R1++)
                for(int64_t R2=0;R2<q;R2++)
                    if(check[(R0+R1*R1+R2*R2)%q]&&check[(R0*R0+R1+R2*R2)%q]&&check[(R0*R0+R1*R1+R2)%q]){
                        rt[ns++]=R1;
                        break;
                    }
            
            if(asize<count*ns){
               asize=count*ns;
               arr=(int*)realloc(arr,asize*sizeof(int));
            }
            
            int pos=count*ns-1;
            for(i=count-1;i>=0;i--)
                for(j=ns-1;j>=0;j--){
                    int64_t x=arr[i];
                    int64_t r=rt[j];
                    int64_t mult=((r-x)*inv)%q;
                    if(mult<0)mult+=q;
                    arr[pos--]=x+mult*m;
                }
            m*=q;
            count*=ns;            
        }
        
        qsort(arr,count,sizeof(int),compare);
        
        int arr0=arr[0];
        int new_arr0=M+arr[0]-arr[count-1];

        for(i=count-1;i>0;i--)
            arr[i]=arr[i]-arr[i-1];
        
        for(;a<=high;a+=M){
    
        if(!valid(a))continue;
            
        int rec=0,ext[count];
        uint64_t rc=a;
        for(i=0;i<count;i++)ext[i]=0;
        
        arr[0]=0;
		for(b=arr0;b<=a;){
            int sh=(int) (256.0*(sqrt(a*a+(b+1)*(b+1))-sqrt(a*a+b*b)));
            if(sh>rec){
                rec=sh;
                for(i=0;i<count;i++)
                    ext[i]=(arr[i]*sh)>>8;
            }
            
            for(i=0;i<count;i++){
                b+=arr[i];
                rc+=ext[i];
                
                a2b2=a*a+b*b;

                while(rc*rc<a2b2)rc++;                
                
     			c=rc*rc-a2b2;
				if (c<=b && T[b&63] & (1ULL << (c & 63)) &&
				    issquare(a * a + b + c * c) &&
				    issquare(a + b * b + c * c) && c>0 && b<=a){
					   printf("%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", c, b, a);
                       numsol++;
                    }
			}
            
			arr[0]=new_arr0;
		}
    }}
    free(arr);
    
    return NULL;
}


int main(int argc, char **argv)
{
	int t = 1;
	int i;
	high = 1000000;

    if (argc >= 3) {
		ga = strtoull(argv[1], NULL, 10);
		high = strtoull(argv[2], NULL, 10);
 
		if (argc >= 4)
			t = atoi(argv[3]);
	}
	uint64_t a0=ga,a1=high;
 
    printf("Started the a=%" PRIu64 "..%" PRIu64 " range with M=%d using t=%d threads.\n",a0,a1,M,t);
    
	pthread_t threads[t - 1];
 
    time_t sec=time(NULL);
	for (i = 0; i < t - 1; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	worker(NULL);
 
	for (i = 0; i < t - 1; i++)
		pthread_join(threads[i], NULL);
 
    printf("Using M=%d with t=%d threads done a=%" PRIu64 "..%" PRIu64 ", found %d solutions in %ld sec.\n",
           M,t,a0,a1,numsol,time(NULL)-sec);
    
	return 0;
}
Used a pretty large wheel M=100800=2^6*3^2*5^2*7,
with fixed a%M you can construct an array that could be the possible values for b%M. And we also avoid a lot of jumping work when you update c, because of the sparse b values. Of course the solutions will appear in random order due to the search for "a" in arithmetic progression.

ps. you could try larger M values, but I haven't seen much better running times (and using larger M values could increase the time thanks to the more inits).

ps2. M should be divisible by 64, otherwise it is missing some solutions! OK, it is logical, because when it doesn't hold, then the final check at the "if (c<=b && T[b&63]" is not that good, because (a%64) is not fixed in these cases. [And you can still use 128|M or 256|M etc]
Attached Files
File Type: c sssms.c (5.7 KB, 47 views)

Last fiddled with by R. Gerbicz on 2022-02-20 at 00:19 Reason: In text the constant was wrong for M.
R. Gerbicz is offline   Reply With Quote
Old 2022-02-20, 14:21   #7
fatphil
 
fatphil's Avatar
 
May 2003

111111102 Posts
Default

Quote:
Originally Posted by R. Gerbicz View Post
OK, (almost) completely new approach.
Roughly seven times faster what could be your code, and 20 or so times faster than nz's original code for large input.
Sounds excellent! I've just started it on a known range that I ran with either my mark-I or mark-II code to check that the results are the same. Alas the chunks I was testing were multi-day on multi-core, so it could take more than a day to complete even if it's as fast as you say.

The latest version I'm running does have a p=3 sieve to reduce the work, and I was contemplating a p=5 layer too, but felt that just applying additional sieve layers was just getting small constant factors. Eventually N^2 would win, I was just putting off the inevitable.
fatphil is offline   Reply With Quote
Old 2022-02-20, 15:58   #8
Dr Sardonicus
 
Dr Sardonicus's Avatar
 
Feb 2017
Nowhere

10111010101012 Posts
Default

Assuming a ≤ b ≤ c [as on the web site and my previous post] we can use

a ≥ (1 + sqrtint(b^2 + c^2))^2 - b^2 - c^2

to rule out swathes of pairs (b, c). Let f be the fractional part of sqrt(b^2 + c^2).

If f is "too small," it will force a to be too large for a ≤ b to hold. We have

a ≥ (1 - f + sqrt(b^2 + c^2))^2 - b^2 - c^2, i.e.

(1) a ≥ (1 - f)^2 + 2*(1 - f)*sqrt(b^2 + c^2).

Since we are assuming a ≤ b ≤ c, we have

sqrt(b^2 + c^2) ≥ sqrt(2)*a. Therefore, we always have

2*(1 - f)*sqrt(2)*a < a, or

1 - f < 1/(2*sqrt(2)), or

(2) f > 1 - 1/(2*sqrt(2)).
Dr Sardonicus is offline   Reply With Quote
Old 2022-02-20, 17:50   #9
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

32×52×7 Posts
Default

Quote:
Originally Posted by fatphil View Post
Sounds excellent! I've just started it on a known range that I ran with either my mark-I or mark-II code to check that the results are the same. Alas the chunks I was testing were multi-day on multi-core, so it could take more than a day to complete even if it's as fast as you say.
OK, but testing smaller trivial ranges is also not that bad, because those contain many solutions so it is easier and quicker to find a problem, when observed that the code needs 64|M I've justed tests when it doesn't hold for the trivial a=1..1e6 range and not surprisingly it found only 80 solutions out of 87. [it is also giving that when the number of (known) solutions is small in a range a wrong M value could even find all solutions].

For larger ranges it is somewhat faster, but of course its is still testing const*(a1^2-a0^2) pair of (a,b) numbers for fixed M (when testing a=a0..a1).

You wouldn't go that large, but I think that a<2^31.5-M should be true to avoid overflow in unsigned 64 bits.
R. Gerbicz is offline   Reply With Quote
Old 2022-02-20, 20:31   #10
fatphil
 
fatphil's Avatar
 
May 2003

2×127 Posts
Default

Quote:
Originally Posted by R. Gerbicz View Post
OK, but testing smaller trivial ranges is also not that bad, because those contain many solutions so it is easier and quicker to find a problem, when observed that the code needs 64|M I've justed tests when it doesn't hold for the trivial a=1..1e6 range and not surprisingly it found only 80 solutions out of 87. [it is also giving that when the number of (known) solutions is small in a range a wrong M value could even find all solutions].

For larger ranges it is somewhat faster, but of course its is still testing const*(a1^2-a0^2) pair of (a,b) numbers for fixed M (when testing a=a0..a1).

You wouldn't go that large, but I think that a<2^31.5-M should be true to avoid overflow in unsigned 64 bits.
Should I stop my test, or should I let it just complete so that everything it misses can be known? Am I right in thinking that all results are delivered in mod M order? The last thing printed was ~24000 mod M and that was way earlier today, so I'm well over a quarter of the way through. It's slow as I've given it just 1 core while the main code runs on 6 on an 8-core box (which isn't mine - I no longer have any boxes that can crunch numbers at all).

The range I'm testing is 60M-70M (it's the range futhest back in my tmux history), so well beyond the 26-bits needed for an exact square root. However, we did some tests, and the sqrt test works even when the sqrt loses precision, because the comparison is with the re-squared value as integers, it's not comparing a root with an integer as floats. We flung some test numbers at it, and couldn't get it to fail. Is there any other reason to think that 2^25 would be an upper limit to the code?
fatphil is offline   Reply With Quote
Old 2022-02-20, 20:57   #11
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

32×52×7 Posts
Default

Quote:
Originally Posted by fatphil View Post
Am I right in thinking that all results are delivered in mod M order? The last thing printed was ~24000 mod M and that was way earlier today, so I'm well over a quarter of the way through. It's slow as I've given it just 1 core while the main code runs on 6 on an 8-core box (which isn't mine - I no longer have any boxes that can crunch numbers at all).
Using one core that is true, and it is almost true for many cores/threads, because we're testing res values in increasing order, just check a%M values.

Quote:
Originally Posted by fatphil View Post
well beyond the 26-bits needed for an exact square root. However, we did some tests, and the sqrt test works even when the sqrt loses precision, because the comparison is with the re-squared value as integers, it's not comparing a root with an integer as floats. We flung some test numbers at it, and couldn't get it to fail. Is there any other reason to think that 2^25 would be an upper limit to the code?
Good question, but it seems that the result is exact for square numbers up to N<2^64:
Code:
    uint64_t missed=0;
    for(uint64_t i=0;i<4294967296ULL;i++)
        if(!issquare(i*i))missed++;
    printf("missed=%" PRIu64"\n",missed);
it gives no missing squares, so at least it won't miss any solution, but could give (some) false solutions, but this is still not the case, tested n^2-1 and n^2+1 values (for n>1), and reported all of them as non-squares. So likely it won't give false results.
R. Gerbicz is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Sums of all Squares davar55 Puzzles 183 2019-12-12 22:31
Basic Number Theory 12: sums of two squares Nick Number Theory Discussion Group 0 2016-12-11 11:30
Sums of three squares CRGreathouse Math 6 2009-11-06 19:20
Computer Factoring 2 Exponents Simultaneously? jinydu Lounge 13 2005-03-07 01:37
Identifing perfect squares and calculating square roots.. dsouza123 Math 2 2003-07-19 17:17

All times are UTC. The time now is 08:56.


Sun Sep 25 08:56:12 UTC 2022 up 38 days, 6:24, 0 users, load averages: 1.17, 1.38, 1.25

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔