Experiments in photometry

2017-04-20, 08:24   #1
xilman

Tom provided me with some images of the Pleiades. Pretty pictures derived from that data have been posted in another thread. Since then I've been learning how to do photometry.

First off, all the images with reasonably round stars were stacked together to increase the SNR. Then the DAOPHOT package in IRAF was used to find the stars in the DSLR green channel and perform aperture photometry --- an interesting learning exercise in its own right. This produced a list of around 20K "stars", some of which are doubtless peaks in the sky noise. Some genuine stars were missed because their images were saturated. Star coordinates are in pixels and magnitudes have an arbitrary zero offset.
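For anyone wondering what the aperture-photometry step actually computes, here is a toy R version (not DAOPHOT itself, which does this far more carefully; the function and parameter names are mine, for illustration): sum the counts inside a small aperture, subtract a local sky level estimated from a surrounding annulus, and convert to a magnitude with an arbitrary zero point.
Code:
## Toy illustration only. img is a numeric matrix of pixel counts;
## (x, y) is the star position in pixel coordinates.
aperture_mag <- function(img, x, y, r_ap = 4, r_in = 8, r_out = 12) {
  dx <- col(img) - x
  dy <- row(img) - y
  d  <- sqrt(dx^2 + dy^2)
  sky  <- median(img[d >= r_in & d <= r_out])  # local sky from an annulus
  flux <- sum(img[d <= r_ap] - sky)            # sky-subtracted star counts
  -2.5 * log10(flux)                           # magnitude, arbitrary zero point
}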

A search of the NOMAD1 catalogue for RA,Dec coordinates and BVR magnitudes yielded 5429 entries when constrained to a 2-degree square with all magnitudes brighter than 17.0. IRAF's wcsctran utility converted the catalogue's coordinates to pixels and then xyxymatch produced a list of correspondences between the two files. From this data I made a file with measured magnitudes (which I call M) matched with the corresponding B, V and R magnitudes --- 1239 of them.
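The matching step is conceptually simple even if the IRAF plumbing takes some learning. A toy R sketch of what xyxymatch does (my own version; the column names x, y, B, V, R and the 3-pixel tolerance are assumptions for illustration):
Code:
## For each detected star, take the nearest catalogue star in pixel space,
## accepting the match only if it lies within the tolerance.
match_lists <- function(det, cat, tol = 3) {
  idx <- vapply(seq_len(nrow(det)), function(i) {
    d <- sqrt((cat$x - det$x[i])^2 + (cat$y - det$y[i])^2)
    j <- which.min(d)
    if (d[j] <= tol) j else NA_integer_
  }, integer(1))
  cbind(det, cat[idx, c("B", "V", "R")])  # NA rows where nothing matched
}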

Then the fun began. Some matches were clearly coincidences --- faint NOMAD1 stars matched with noise. Some were double identifications: two close NOMAD1 stars unresolved in Tom's images. To see what was going on, the file was loaded into R (the data-analysis package, not the photometry band!) and various least-squares fits were performed. This was in itself a learning exercise because I really don't know how to drive R. As expected, the faintest matches showed enormous residuals, so I slung out everything fainter than V=14 and refitted.
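For concreteness, the whole R session boils down to something like this (the file and column names are mine, for illustration):
Code:
## Read the matched M/B/V/R table, cut the faint tail, and fit by least squares.
df  <- read.table("matched.dat", header = TRUE)
df  <- subset(df, V <= 14)          # sling out everything fainter than V = 14
fit <- lm(M ~ V, data = df)         # ordinary least-squares fit
summary(fit)                        # produces tables like the ones below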

The initial fits showed what background reading had led me to expect: the DSLR green channel is very well correlated with the V band but less so with B and R. Good. They also showed a number of outliers scattered pretty much at random, presumably due to false matches to faint stars or unresolved doubles, and quite a lot of scatter at the bright end, where it's much more difficult to measure the star images. Removing those yielded the following results.

Code:
lm(formula = M ~ V, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.66014 -0.15829 -0.02395  0.13325  0.88001 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.560642   0.086998   40.93   <2e-16 ***
V           1.000470   0.007241  138.17   <2e-16 ***
---
Signif. codes:  0 β€˜***’ 0.001 β€˜**’ 0.01 β€˜*’ 0.05 β€˜.’ 0.1 β€˜ ’ 1

Residual standard error: 0.2242 on 450 degrees of freedom
Multiple R-squared:  0.977,	Adjusted R-squared:  0.9769 
F-statistic: 1.909e+04 on 1 and 450 DF,  p-value: < 2.2e-16
Plotting the data suggested a quadratic might give better results (I have no theoretical justification). This happened:
Code:
lm(formula = M ~ V + V2, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58434 -0.14479 -0.01454  0.10051  0.85987 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 7.093384   0.433187  16.375  < 2e-16 ***
V           0.337277   0.080168   4.207 3.12e-05 ***
V2          0.030318   0.003652   8.302 1.21e-15 ***
---
Signif. codes:  0 β€˜***’ 0.001 β€˜**’ 0.01 β€˜*’ 0.05 β€˜.’ 0.1 β€˜ ’ 1

Residual standard error: 0.209 on 449 degrees of freedom
Multiple R-squared:   0.98,	Adjusted R-squared:  0.9799 
F-statistic: 1.102e+04 on 2 and 449 DF,  p-value: < 2.2e-16
The fit is not noticeably better.
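(For clarity: V2 here is just V squared held in its own column; R can express the same quadratic without the helper column.)
Code:
## Equivalent ways of fitting the quadratic in R:
df$V2 <- df$V^2
fit2  <- lm(M ~ V + V2, data = df)
## or, without the helper column:
fit2b <- lm(M ~ V + I(V^2), data = df)
## or orthogonal polynomials, which reduce collinearity between V and V^2:
fit2c <- lm(M ~ poly(V, 2), data = df)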

A plot of the residuals from the linear fit is attached. It's broadly as I expected, with more scatter at the faint end, but the residuals are disappointingly large and there's an obvious correlation up to V=11 or so. I'm not sure how to proceed from here, but I suspect that paying closer attention to DAOPHOT would be a good start. Alternatively, perhaps the data just isn't good enough for photometry below V=11.

This whole series of harangues has been documenting my attempt to educate myself. I think I'm learning, slowly.
[Attachment: M45_V_residuals.jpg (residual plot from the linear fit)]
2017-04-20, 16:22   #2
henryzz

One way of combating the heteroscedasticity you are seeing is log-transforming the dependent variable. Once the result is back-transformed you should be able to predict M from V better, if that is your aim.
It looks like you may be calculating the residuals by hand. A shortcut is the residuals() function.
Plotting the linear-model object produces diagnostic plots that can help identify whether any points are huge outliers skewing your model. I would imagine that some of the large residuals come from misclassified stars, for example.
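In R terms, the suggestions above look something like this (a sketch; foo stands for the fitted model object from lm):
Code:
## Log-transform the response, then back-transform the predictions:
logfit <- lm(log(M) ~ V, data = df)
pred   <- exp(predict(logfit))       # back on the magnitude scale
## Residuals without hand calculation, and the standard diagnostic plots:
res <- residuals(foo)
par(mfrow = c(2, 2))
plot(foo)   # residuals vs fitted, Q-Q, scale-location, residuals vs leverage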

It does look like there is something that changes significantly when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11?
Assuming the camera were perfect you would expect a fit of 0 + 1*V. Given that the fitted slope is very close to this, I am surprised by the apparent slope for V < 11. I suspect there are two separate things going on, messing up this fit. Any modelling will only end up correcting errors, since we know the slope should be 1.
2017-04-20, 17:03   #3
xilman

Quote:
Originally Posted by henryzz View Post
One way of combating the heteroscedasticity you are seeing is log-transforming the dependent variable. Once the result is back-transformed you should be able to predict M from V better, if that is your aim.
It looks like you may be calculating the residuals by hand. A shortcut is the residuals() function.
Plotting the linear-model object produces diagnostic plots that can help identify whether any points are huge outliers skewing your model. I would imagine that some of the large residuals come from misclassified stars, for example.

It does look like there is something that changes significantly when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11?
Assuming the camera were perfect you would expect a fit of 0 + 1*V. Given that the fitted slope is very close to this, I am surprised by the apparent slope for V < 11. I suspect there are two separate things going on, messing up this fit. Any modelling will only end up correcting errors, since we know the slope should be 1.
I didn't quote everything. In practice, I ran foo <- lm(...) and followed up with residuals(foo) and plot(foo). Those were used iteratively to remove outliers and to suggest where the limiting magnitude should be.

Not sure why you are surprised. The fit is
Code:
V           1.000470   0.007241  138.17   <2e-16 ***
which says the slope differs from unity by only about one-sixteenth of the standard error in its estimate. Seems pretty convincing to me. Anyway, the correlation need not be perfect: the transmission curves of the standard V filter and the Nikon G filter are quite different, so if there is a systematic trend in the stellar population (if fainter stars tend to have more prominent absorption lines, for example) the correlation would be expected to be poorer.

The fit should not have an intercept of zero because the photometry was with respect to an arbitrary zero magnitude. Without precise knowledge of the sky brightness in terms of photons per square arcsec per second, an equally precise knowledge of the properties of the optics and detectors, and knowledge of the air mass between the star and the detector, it's impossible to perform absolute photometry. Differential photometry by its very nature typically has a non-zero constant offset.
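In symbols (textbook definitions, nothing specific to this data set): an instrumental magnitude is defined only up to an additive constant,
Code:
m_inst = -2.5 * log10(counts / t_exp) + C     # C arbitrary
M      = V + ZP                               # ZP absorbs optics, QE, air mass
so the intercept of about 3.56 in the first fit above is essentially this zero-point offset ZP.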

At the faint end there will inevitably be a much greater scatter because the estimated star photons become an ever smaller fraction of the estimated sky photons and of the intrinsic noise in the detector. Why there appears to be a breakdown in linearity escapes me at the moment. It may well arise from my inexperience at performing the initial aperture photometry. That's something I'm going to explore further.
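The expected behaviour at the faint end can be quantified with the standard CCD error equation (again textbook material, not derived from this data); a small R helper shows how the magnitude error blows up as star counts shrink relative to sky and detector noise:
Code:
## sigma_m = (2.5/ln 10) * noise/flux, i.e. ~1.0857 * fractional flux error.
mag_err <- function(n_star, sky_per_pix, npix, read_noise) {
  noise <- sqrt(n_star + npix * (sky_per_pix + read_noise^2))
  2.5 / log(10) * noise / n_star
}
mag_err(1e5, 50, 100, 10)   # bright star: a few millimag
mag_err(1e3, 50, 100, 10)   # faint star: ~0.14 mag, growing rapidly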

Thanks for the tip about log transformation. Something else to explore.

I can see that you know vastly more about R and data modelling than I. Consider yourself on notice for further questioning.
2017-04-20, 17:15   #4
xilman

Quote:
Originally Posted by henryzz View Post
It does look like there is something that changes significantly when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11?
Assuming the camera were perfect you would expect a fit of 0 + 1*V. Given that the fitted slope is very close to this, I am surprised by the apparent slope for V < 11. I suspect there are two separate things going on, messing up this fit. Any modelling will only end up correcting errors, since we know the slope should be 1.
Ah, now I understand why you are surprised. The fit for V<11 is summarized as
Code:
summary (foo)

Call:
lm(formula = M ~ V, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.16135 -0.04791 -0.01078  0.04423  0.18402 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4.803201   0.085455   56.21   <2e-16 ***
V           0.868028   0.009436   92.00   <2e-16 ***
---
Signif. codes:  0 β€˜***’ 0.001 β€˜**’ 0.01 β€˜*’ 0.05 β€˜.’ 0.1 β€˜ ’ 1

Residual standard error: 0.07683 on 60 degrees of freedom
Multiple R-squared:  0.993,	Adjusted R-squared:  0.9928 
F-statistic:  8463 on 1 and 60 DF,  p-value: < 2.2e-16
and the slope is significantly different from 1. Other than handwaving about different spectral passbands I have no explanation.

The residual plot is attached. It looks fine to my untutored eye.
[Attachment: foobar.jpg (residual plot for the V < 11 fit)]

2017-04-20, 20:16   #5
henryzz

Quote:
Originally Posted by xilman View Post
I didn't quote everything. In practice, I ran foo <- lm(...) and followed up with residuals(foo) and plot(foo). Those were used iteratively to remove outliers and to suggest where the limiting magnitude should be.
It wasn't clear how much you had done with, or knew about, R.
Care is needed when removing outliers: any fit will look good if all the outliers are removed.
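One way to avoid hand-pruning altogether (my suggestion, not something run on this data yet) is a robust fit, which down-weights outliers automatically instead of deleting them:
Code:
## rlm from the MASS package: iteratively re-weighted least squares with
## Huber weights, so gross outliers barely influence the fitted slope.
library(MASS)
rfit <- rlm(M ~ V, data = df)
summary(rfit)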

Quote:
I can see that you know vastly more about R and data modelling than I. Consider yourself on notice for further questioning.
It's my job.
2017-04-20, 20:22   #6
henryzz

Quote:
Originally Posted by xilman View Post
The residual plot is attached. It looks fine to my untutored eye.
Depends on how fussy you are. Ideally the lowess fit would be a straight line; there is still a bit too much of a pattern for me.
I don't think we are going to win with this data. I suspect that a piecewise model is needed, and even then, wherever we divide the dataset there is still some appearance of a pattern.
I would not like to get this data from a medical trial.
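For concreteness, a minimal broken-stick version of the piecewise idea, with the knot placed (by assumption) at the apparent break near V = 11:
Code:
## pmax(V - 11, 0) is zero below the knot and rises linearly above it,
## so the fitted line stays continuous but can change slope at V = 11.
pwfit <- lm(M ~ V + pmax(V - 11, 0), data = df)
summary(pwfit)   # third coefficient = change in slope above the knot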
2017-04-20, 20:40   #7
chalsall

Quote:
Originally Posted by henryzz View Post
I suspect that a piecewise model is needed, and even then, wherever we divide the dataset there is still some appearance of a pattern.
Just putting this out there for consideration...

OpenCV was initially underwritten by Intel and is still supported; it codifies known and published algorithms. The libraries are licensed under the BSD licence.

Could this possibly help?
2017-04-21, 09:02   #8
xilman

Quote:
Originally Posted by henryzz View Post
Depends on how fussy you are. Ideally the lowess fit would be a straight line; there is still a bit too much of a pattern for me.
I don't think we are going to win with this data. I suspect that a piecewise model is needed, and even then, wherever we divide the dataset there is still some appearance of a pattern.
I would not like to get this data from a medical trial.
Quite often in astronomy you have to take the data you are given. That's often the case in medicine too, but in either science you can run more trials to examine a long-lived phenomenon. Short-lived phenomena, such as a cometary collision with Jupiter or the teratogenic effects of Thalidomide, are more constrained.

General philosophy aside, I can make the data available for others to analyze, though I suspect not many here will want half a gig of NEF-format images. More useful, probably, is the table of matched NOMAD-1 values and DAOPHOT-reduced observational magnitudes with their estimated errors.

Given a day or so I could also download neighbouring areas of NOMAD-1 data and match them with the observations. This effort should produce another few thousand entries.
2017-04-21, 09:08   #9
xilman

Quote:
Originally Posted by henryzz View Post
It does look like there is something that significantly changes when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11?
Another thought occurred overnight. Some of the stars are almost certain to be variables. If their brightness at the time of observation was significantly different from their catalogued value, the residuals would be larger. There are (at least) two issues which could bias the selection. Variables which happened to be too faint at the time of observation to pass the cut-off would be removed from consideration, despite meeting the NOMAD-1 selection criterion. Another is that I do not know, yet, whether NOMAD-1 records maximum, minimum or some kind of weighted mean magnitude for variables. Whichever is chosen, there is an opportunity for bias.
2017-04-23, 19:18   #10
xilman

OK, here's the summary of the raw fit to four times as many catalogue entries: raw in that no attempt has been made to remove outliers or to drop data with magnitudes far fainter than can plausibly be justified.

To be clear, what I'm interested in is finding V given a value of M measured by aperture photometry on a stacked set of images.
Code:
> summary (foo)

Call:
lm(formula = V ~ M, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.0316 -0.5887 -0.4071 -0.0404  6.9252 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.06930    0.24615  -8.407   <2e-16 ***
M            0.93904    0.01507  62.326   <2e-16 ***
---
Signif. codes:  0 β€˜***’ 0.001 β€˜**’ 0.01 β€˜*’ 0.05 β€˜.’ 0.1 β€˜ ’ 1

Residual standard error: 1.151 on 2886 degrees of freedom
Multiple R-squared:  0.5737,	Adjusted R-squared:  0.5736 
F-statistic:  3884 on 1 and 2886 DF,  p-value: < 2.2e-16
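Given that fit, the stated aim (finding V from a measured M) is then a one-liner in R; the value 12.3 below is just an illustrative instrumental magnitude:
Code:
## A prediction interval reflects the (currently large) residual scatter.
predict(foo, newdata = data.frame(M = 12.3), interval = "prediction")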
All four diagnostic plots this time, because the others may mean more to some people (specifically henryzz) than they do to me, and might tell us something important.
[Attachments: fit1.jpg, fit2.jpg, fit3.jpg, fit4.jpg (the four diagnostic plots)]

2017-04-23, 19:24   #11
xilman

ISTM from fit1.jpg that if I arbitrarily remove all data with abs(residual) greater than, say, 0.1, the remainder will enable me to predict V from M as required. I have absolutely no theoretical justification for making that decision. Any ideas and analysis are very welcome.
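In R the proposed cut would look like this (assuming df has no missing rows, so the residuals line up with the data):
Code:
## Drop points whose first-pass residual exceeds 0.1 mag, then refit.
keep <- abs(residuals(foo)) <= 0.1
foo2 <- lm(V ~ M, data = df[keep, ])
summary(foo2)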
