2017-04-20, 08:24  #1 
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
3·3,613 Posts 
Experiments in photometry
Tom provided me with some images of the Pleiades. Pretty pictures derived from that data have been posted in another thread. Since then I've been learning how to do photometry.
First off, all the images with reasonably round stars were stacked together to increase the SNR. Then the DAOPHOT package in IRAF was used to find the stars in the DSLR green channel and perform aperture photometry, an interesting learning exercise in its own right. This produced a list of around 20K "stars", some of which are doubtless peaks in the sky noise. Some genuine stars were missed because their images were saturated. Star coordinates are in pixels and magnitudes have an arbitrary zero offset.

A search of the NOMAD1 catalogue for RA,Dec coordinates and BVR magnitudes yielded 5429 entries when constrained to a 2-degree square with all magnitudes brighter than 17.0. IRAF's wcsctran utility converted the catalogue's coordinates to pixels and then xyxymatch produced a list of correspondences between the two files. From this data I made a file with measured magnitudes (which I call M) matched with the corresponding B, V and R magnitudes: 1239 of them.

Then the fun began. Some matches were clearly coincidences: faint NOMAD1 stars matched with noise. Some were double identifications: two close NOMAD1 stars unresolved in Tom's images. To see what was going on, the file was loaded into R (the data-analysis package, not the photometry band!) and various least-squares fits performed. This was in itself a learning exercise because I really don't know how to drive R. As expected, the faintest matches showed enormous residuals, so I slung out everything fainter than V=14 and refitted.

The initial fits showed what background reading has led me to expect: the DSLR green channel is very well correlated with the V band but less so with B and R. Good. They also showed a number of outliers scattered pretty much at random, presumably due to false matches to faint stars or unresolved doubles, and quite a lot of scatter at the bright end, where it's much more difficult to measure the star images. Removing those yielded the following results.

Code:
lm(formula = M ~ V, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.66014 -0.15829 -0.02395  0.13325  0.88001 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.560642   0.086998   40.93   <2e-16 ***
V           1.000470   0.007241  138.17   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2242 on 450 degrees of freedom
Multiple R-squared:  0.977, Adjusted R-squared:  0.9769 
F-statistic: 1.909e+04 on 1 and 450 DF,  p-value: < 2.2e-16
Code:
lm(formula = M ~ V + V2, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58434 -0.14479 -0.01454  0.10051  0.85987 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.093384   0.433187  16.375  < 2e-16 ***
V            0.337277   0.080168   4.207 3.12e-05 ***
V2           0.030318   0.003652   8.302 1.21e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.209 on 449 degrees of freedom
Multiple R-squared:  0.98, Adjusted R-squared:  0.9799 
F-statistic: 1.102e+04 on 2 and 449 DF,  p-value: < 2.2e-16
A plot of the residuals from the linear fit is attached. It's broadly as I expected, with more scatter at the faint end, but the residuals are disappointingly large and there's an obvious correlation up to V=11 or so. I'm not sure how to proceed from here, but I suspect that paying closer attention to DAOPHOT would be a good start. Alternatively, perhaps the data just isn't good enough for photometry below V=11. This series of harangues has been documenting my attempt to educate myself. I think I'm learning, slowly. 
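For anyone wanting to play along without R, here is a minimal sketch in Python of the same two fits. The data are synthetic stand-ins (the coefficients 3.56 and 0.03 merely echo the fitted values above); it only illustrates why adding the V2 term shrinks the residual scatter when there is genuine curvature.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the matched (V, M) table: M tracks V
# with a zero-point offset plus noise, and a mild quadratic term
# to mimic the curvature suggested by the V2 fit above.
V = rng.uniform(6.0, 14.0, 450)
M = 3.56 + V + 0.03 * (V - 10.0) ** 2 + rng.normal(0.0, 0.2, V.size)

# Linear fit M ~ V (np.polyfit returns highest power first)
b1, a1 = np.polyfit(V, M, 1)
res1 = M - (a1 + b1 * V)

# Quadratic fit M ~ V + V^2
c2, b2, a2 = np.polyfit(V, M, 2)
res2 = M - (a2 + b2 * V + c2 * V ** 2)

# The quadratic term soaks up the curvature, so its residual
# scatter is smaller than the linear fit's
print("linear residual sd:   ", round(res1.std(), 3))
print("quadratic residual sd:", round(res2.std(), 3))
```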
2017-04-20, 16:22  #2 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2^{4}×3^{2}×41 Posts 
One way of combating the heteroscedasticity you are seeing is to log-transform the dependent variable. Once the result is back-transformed you should be able to predict M better from V, if that is your aim.
It looks like you may be calculating the residuals by hand. A shortcut is the residuals() function. Plotting the linear model object produces some plots that might help identify whether there are points that are huge outliers skewing your linear model. I would imagine that some of the residuals are from misclassified stars, for example.

It does look like something changes significantly when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11? Assuming that the camera was perfect, you should expect a fit of 0 + 1*V. Given that the fitted slope is very close to this, I am surprised by the apparent slope below V=11. I suspect that there are two separate things going on messing up this fit. Any modelling will only end up correcting errors, since we know that the slope should be 1. 
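Something like this, sketched in Python rather than R, with synthetic numbers purely to show the mechanics of the transform-fit-back-transform round trip (the noise is made to grow with the predictor, mimicking the faint-end scatter):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data whose scatter grows with the predictor,
# mimicking the larger scatter at the faint end
V = rng.uniform(6.0, 14.0, 500)
M = 3.56 + V + rng.normal(0.0, 0.02 * V, V.size)

# Fit with the dependent variable log-transformed...
b, a = np.polyfit(V, np.log(M), 1)

# ...then back-transform to predict M on the original scale
M_hat = np.exp(a + b * V)
rms = np.sqrt(np.mean((M - M_hat) ** 2))
print("rms prediction error:", round(rms, 3))
```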
2017-04-20, 17:03  #3  
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
3·3,613 Posts 
Quote:
Not sure why you are surprised. The fit is Code:
V           1.000470   0.007241  138.17   <2e-16 ***
The fit should not have an intercept of zero because the photometry was performed with respect to an arbitrary zero magnitude. Without precise knowledge of the sky brightness in terms of photons per square arcsec per second, equally precise knowledge of the properties of the optics and detectors, and knowledge of the air mass between the star and the detector, it's impossible to perform absolute photometry. Differential photometry by its very nature typically has a non-zero constant offset.

At the faint end there will inevitably be much greater scatter because the estimated star photons become an ever smaller fraction of the estimated sky photons and of the intrinsic noise in the detector. Why there appears to be a breakdown in linearity escapes me at the moment. It may well arise from my inexperience at performing the initial aperture photometry. That's something I'm going to explore further.

Thanks for the tip about log transformation. Something else to explore. I can see that you know vastly more about R and data modelling than I do. Consider yourself on notice for further questioning. 
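In fact the constant offset can be estimated directly from the matched stars rather than read off the regression intercept. A minimal sketch in Python, on synthetic data (the 3.56 is just a hypothetical zero point echoing the fitted intercept, not a measured value):

```python
import numpy as np

rng = np.random.default_rng(7)

# Catalogue V magnitudes and a hypothetical zero-point offset
V_cat = rng.uniform(7.0, 13.0, 200)
ZP_true = 3.56

# An instrumental magnitude, -2.5*log10(counts), measures flux
# only up to a constant, so m_instr = V + ZP + noise here
m_instr = V_cat + ZP_true + rng.normal(0.0, 0.1, V_cat.size)

# The offset is recovered from the matched stars themselves;
# a median is robust against a few false matches
ZP_est = np.median(m_instr - V_cat)
print("estimated zero point:", round(ZP_est, 2))

# V is then predicted for any measured star as m_instr - ZP_est
```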

2017-04-20, 17:15  #4  
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
3×3,613 Posts 
Quote:
Code:
summary(foo)

Call:
lm(formula = M ~ V, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.16135 -0.04791 -0.01078  0.04423  0.18402 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4.803201   0.085455   56.21   <2e-16 ***
V           0.868028   0.009436   92.00   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07683 on 60 degrees of freedom
Multiple R-squared:  0.993, Adjusted R-squared:  0.9928 
F-statistic: 8463 on 1 and 60 DF,  p-value: < 2.2e-16
The residual plot is attached. It looks fine to my untutored eye. Last fiddled with by xilman on 2017-04-20 at 17:17 

2017-04-20, 20:16  #5  
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5904_{10} Posts 
Quote:
Care is needed when removing outliers. Any fit will look good if all the outliers are removed. Quote:


2017-04-20, 20:22  #6  
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
1710_{16} Posts 
Quote:
I don't think we are going to win with this data. I suspect that a piecewise model is needed. Even then, wherever we divide the dataset there is still some appearance of a pattern. I would not like to get this data from a medical trial. 
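A piecewise model of the sort I have in mind is easy to sketch. In Python, on synthetic data with a deliberate slope change at V = 11 (the break point and the two slopes are illustrative assumptions, not fits to the real data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic M(V) with a change of slope at V = 11: slope 0.87 on
# the bright side, 1.0 on the faint side, continuous at the break
V = rng.uniform(6.0, 14.0, 400)
M_true = np.where(V >= 11.0, 3.56 + V, 4.99 + 0.87 * V)
M = M_true + rng.normal(0.0, 0.1, V.size)

# Fit each segment separately
bright, faint = V < 11.0, V >= 11.0
b_b, a_b = np.polyfit(V[bright], M[bright], 1)
b_f, a_f = np.polyfit(V[faint], M[faint], 1)
print("bright-side slope:", round(b_b, 2))
print("faint-side slope: ", round(b_f, 2))
```

A fancier version would fit both segments jointly with a continuity constraint at the break, or fit the break point itself.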

2017-04-20, 20:40  #7  
If I May
"Chris Halsall"
Sep 2002
Barbados
2^{3}·1,231 Posts 
Quote:
OpenCV was initially underwritten by Intel and is still being supported; it codifies known and published algorithms. The libraries are licensed under the BSD licence. Could this possibly help? 

2017-04-21, 09:02  #8  
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
3×3,613 Posts 
Quote:
General philosophy aside, I can make the data available for others to analyse, though I suspect not many here will want half a gig of NEF-format images. More likely is the table of matched NOMAD1 values and DAOPHOT-reduced observational magnitudes and their estimated errors. Given a day or so I could also download neighbouring areas of NOMAD1 data and match them with the observations. This effort should produce another few thousand entries. 

2017-04-21, 09:08  #9 
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
3×3,613 Posts 
Another thought occurred overnight. Some of the stars are almost certain to be variables. If their brightness at the time of observation was significantly different from their catalogued value, the residuals would be larger. There are (at least) two issues which could bias the selection. One is that variables which happened to be too faint to reach the cut-off would be removed from consideration, despite meeting the NOMAD1 criterion. Another is that I do not know, yet, whether NOMAD1 records maximum, minimum or some kind of weighted mean magnitude for variables. Whichever is chosen, there is an opportunity for bias.

2017-04-23, 19:18  #10 
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
25127_{8} Posts 
OK, here's the summary of the raw fit to four times as many catalogue entries. Raw in the sense that no attempt has been made to remove outliers or data with magnitudes far fainter than can plausibly be justified.
To be clear, what I'm interested in is finding V given a value of M measured by aperture photometry on a stacked set of images. Code:
> summary(foo)

Call:
lm(formula = V ~ M, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.0316 -0.5887 -0.4071  0.0404  6.9252 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.06930    0.24615  -8.407   <2e-16 ***
M            0.93904    0.01507  62.326   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.151 on 2886 degrees of freedom
Multiple R-squared:  0.5737, Adjusted R-squared:  0.5736 
F-statistic: 3884 on 1 and 2886 DF,  p-value: < 2.2e-16
Last fiddled with by xilman on 2017-04-23 at 19:19 Reason: Remember to upload images 
2017-04-23, 19:24  #11 
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
2A57_{16} Posts 
ISTM from fit1.jpg that if I arbitrarily remove all data with abs(residual) greater than, say, 0.1, the remainder will enable me to predict V from M as required. I've absolutely no theoretical justification for making that decision. Any ideas and analysis are very welcome.
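One slightly more defensible alternative to a fixed residual cut is iterative sigma clipping: fit, reject everything beyond k standard deviations of the residuals, refit, and repeat until stable. A sketch in Python on synthetic data (the injected "false matches" and all the numbers are purely illustrative):

```python
import numpy as np

def sigma_clip_fit(x, y, nsigma=3.0, iters=5):
    """Iteratively fit a straight line and drop points more than
    nsigma residual standard deviations from it."""
    keep = np.ones(x.size, dtype=bool)
    for _ in range(iters):
        b, a = np.polyfit(x[keep], y[keep], 1)
        res = y - (a + b * x)
        s = res[keep].std()
        new_keep = np.abs(res) < nsigma * s
        if np.array_equal(new_keep, keep):
            break          # converged: no change in the kept set
        keep = new_keep
    return a, b, keep

rng = np.random.default_rng(5)
M = rng.uniform(9.0, 17.0, 500)
V = -3.56 + M + rng.normal(0.0, 0.15, M.size)
V[:25] += rng.uniform(2.0, 5.0, 25)    # false matches: big residuals
a, b, keep = sigma_clip_fit(M, V)
print("slope after clipping:", round(b, 2))
print("points rejected:", int((~keep).sum()))
```

The attraction over a hard abs(residual) > 0.1 cut is that the threshold adapts to the actual scatter of the data instead of being picked by eye.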
Last fiddled with by xilman on 2017-04-23 at 19:28 Reason: s/if if/if/ 