2014-03-27, 11:27  #1 
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
13600_{8} Posts 
Bunghole linear regression
I am trying to fit a linear regression to the borehole function using latin hypercube sampling.
http://www.sfu.ca/~ssurjano/borehole.html Outside of a narrow range of numbers of points (46–252) I get warnings about rank deficiency when I run the regress function with up to quadratic interactions. Lower than 46 obviously won't work, as there are 45 predictors. Why am I having trouble with more than 252? The code I am using: Code:
% inverse-CDF sampling of a normal truncated to [L,U]
tNorm = @(mu, sigma, L, U, N, D) norminv((1-D).*normcdf(L,mu,sigma) + D.*normcdf(U,mu,sigma), mu, sigma);
n = 2^8-4;                  % 252 points
X = ones(n,1);
X(:,2:9) = lhsdesign(n,8);  % latin hypercube sample of the 8 inputs
X(:,2) = tNorm(0.10, 0.0161812, 0.05, 0.15, n, X(:,2));             % rw: truncated normal
X(:,3) = exp(tNorm(7.71, 1.0056, log(100), log(50000), n, X(:,3))); % r:  truncated lognormal
X(:,4) = X(:,4)*(115600-63070) + 63070;  % Tu
X(:,5) = X(:,5)*(1110-990) + 990;        % Hu
X(:,6) = X(:,6)*(116-63.10) + 63.10;     % Tl
X(:,7) = X(:,7)*(820-700) + 700;         % Hl
X(:,8) = X(:,8)*(1680-1120) + 1120;      % L
X(:,9) = X(:,9)*(12045-9855) + 9855;     % Kw
% borehole response
Y = (2*pi*X(:,4).*(X(:,5)-X(:,7))) ./ (log(X(:,3)./X(:,2)).*(1 + (2*X(:,8).*X(:,4))./(log(X(:,3)./X(:,2)).*X(:,2).^2.*X(:,9)) + X(:,4)./X(:,6)));
% append all quadratic and interaction terms (36 columns -> 45 predictors)
col = size(X,2)+1;
for i=2:9
    for j=i:9
        X(:,col) = X(:,i).*X(:,j);
        col = col+1;
        %X = cat(2, X, X(:,i).*X(:,j));
    end
end
lm = regress(Y,X);
residuals = abs(Y - (lm'*(X'))');
rss = sum(residuals.^2);
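For readers outside MATLAB, the key trick in `tNorm` is inverse-CDF sampling of a truncated normal: each uniform value is mapped linearly onto [CDF(L), CDF(U)] and then pushed through the inverse CDF. A sketch of the same idea in Python (scipy assumed; the function name here is made up):

```python
import numpy as np
from scipy.stats import norm

def trunc_norm_ppf(mu, sigma, lo, hi, u):
    """Sample a normal(mu, sigma) truncated to [lo, hi] from uniforms u.

    Each u in [0, 1] is mapped linearly onto [CDF(lo), CDF(hi)] and
    then inverted, mirroring the MATLAB tNorm anonymous function.
    """
    p = (1 - u) * norm.cdf(lo, mu, sigma) + u * norm.cdf(hi, mu, sigma)
    return norm.ppf(p, mu, sigma)

rng = np.random.default_rng(0)
u = rng.uniform(size=1000)
# the rw column from the code above
rw = trunc_norm_ppf(0.10, 0.0161812, 0.05, 0.15, u)
```

By construction every sample lands inside [0.05, 0.15], which is the property the borehole inputs need.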
2014-03-30, 14:53  #2 
Jun 2005
USA, IL
193 Posts 
Are the columns of X linearly dependent when n is over 252? (That would cause regress to set as many coefficients to zero as necessary to try to resolve the rank deficiency.)
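That zeroing behaviour is easy to reproduce: when one column is an exact linear combination of the others, the least-squares solver sees a numerical rank below the number of columns. A minimal numpy sketch (not the borehole design itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
x3 = x1 + 2 * x2  # exactly dependent on the other two columns
X = np.column_stack([np.ones(n), x1, x2, x3])
y = rng.uniform(size=n)

# lstsq reports the numerical rank it actually used
coef, res, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(X.shape[1], rank)  # 4 columns, but rank 3
```

MATLAB's regress issues its rank-deficiency warning in the same situation; the question is what in the borehole design makes the columns (nearly) dependent for larger n.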

2014-03-30, 17:21  #3 
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
1780_{16} Posts 
That is what everything points to. Even when I make all the predictors uniform distributions I get this problem.
I get the same problem when using a function that produces LPTAU sampling (with the same range of n working). I would be surprised if the two sampling methods shared a property like this. 
2014-04-11, 15:10  #4 
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
2^{7}×47 Posts 
I think I have got to the bottom of this.
Because the model is polynomial, I had introduced multicollinearity. https://www.google.co.uk/search?q=mu...PM6n8gP8qIDoBQ For some reason that still didn't solve it completely: r still had high multicollinearity with its square and with its interactions with the other variables. I have started fitting the logarithm of r instead, which has removed the problem. I am not certain whether that was the correct course of action, though. 
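The log transform plausibly helps because r spans [100, 50000], so the r, r² and r-interaction columns live on wildly different scales and are nearly collinear in floating point, while log r compresses the range to roughly [4.6, 10.8]. One quick way to see it (a Python/numpy sketch, not the full borehole design) is the condition number of a quadratic design in r versus log r:

```python
import numpy as np

rng = np.random.default_rng(2)
r = rng.uniform(100, 50000, size=500)

# quadratic design in raw r: columns span many orders of magnitude
A = np.column_stack([np.ones_like(r), r, r**2])
# same design in log r: columns are on comparable scales
B = np.column_stack([np.ones_like(r), np.log(r), np.log(r)**2])

cond_raw = np.linalg.cond(A)
cond_log = np.linalg.cond(B)
print(cond_raw, cond_log)  # cond_raw is enormously larger
```

Centering the predictor (or orthogonalising the polynomial terms) is the other standard remedy for polynomial multicollinearity; the log transform is reasonable here mainly because r was generated lognormally in the first place.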