mersenneforum.org > Extra Stuff > Programming
2015-04-22, 03:19   #1
Uncwilly
Real world problem, involving the surface of the earth.

This is related to my work, so I need to obfuscate it to avoid problems....

Let's assume I am a marine biologist (I am not). I must monitor for red tide problems in several bays and harbours. Each of these is divided up into zones, based upon the shore, inlets, storm drains, subsurface features, etc. So the zone out in the open may be rectangular, while others can be concave ribbons and other odd shapes. Zones are typically 25,000 square meters. There are over 200 individual zones that must be monitored each reporting period. I have a Magic Red Tide Particle Counter 9000™. It samples the seawater at a steady rate, reports the value on a screen, and can feed this to a data collector (these readings can be GPS tagged). There are two forms of monitoring that must go on.

One is to find the mean value over the entire zone. If the average is less than 1,000 particles per cubic meter things are fine.

The second is to find all of the high concentrations of RT particles, investigate known potential trouble areas (pipes, stagnant areas), and check anything that catches my eye (red areas, floating dead fish). This generates a single peak value for the zone (ideally below the limit) and written reports of any areas that exceed 20,000 particles per cubic meter (these areas must be dosed with Special Anti-Red Tide Serum™ [patent pending], then checked later to make sure that they are back below 20,000).


While these two tasks look largely similar, they are currently achieved in two different ways (for good reasons).

The average is produced by motoring back and forth over a zone at a constant rate while running a constant-rate pump into a tank. Once the zone has been covered to the required specs, the MRTPC 9000™ is deployed into the tank to find the average.

The trouble-finding method is to put the meter into the water and watch it while motoring around the zone; when the value goes up, slow down and tack back and forth. Also, motor to any areas with likely problems or investigate any oddities. While doing so, the speed of the boat may be very slow and many readings may be taken in a small area (to pin-point a problem or prove that there is not one).

What I would like to be able to do is:
Put the meter in the water, motor around at whatever rate I choose (slowing down or stopping if desired), in whatever path I choose (including across multiple zones), and have all of that data GPS tagged. Then, when I get back on shore, dump the .csv to a PC and run a magic program. The program would take the data, sort it out by zone (the boundaries are defined), then take whatever data there is and compute a spatial average (mean) for each zone. So, if I spent 3 minutes in a 5 square meter spot that had very high numbers, those readings would only affect that area, and if I zipped along and readings were low and 3 meters apart, the low readings would cover the broad area.

The troubles that I see are:
The zones can be very oddly shaped, making the spatial averaging difficult. (Defining each as a rectangle won't work; some would overlap.)
The distance between readings can vary quite a bit, so we cannot just sum them and take the mean. The spatial averaging has to extend to the edge of the zone, but no further. The effective area for each reading may be oddly shaped.
While doing investigations there may be multiple readings with the same GPS location, because the resolution of the GPS is lower than my ability to spot with the MRTPC 9000™, and I may have seen spikes going by that I want to check on. These should be averaged before putting them into the larger average.
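The duplicate-GPS-fix point above could be handled by a small pre-processing pass before any spatial averaging. A minimal sketch in Python (all names and coordinates invented for illustration), averaging readings that share a GPS fix:

```python
# Sketch: average multiple readings taken at the same GPS fix before they
# enter the zone average, as described above. Names and data are illustrative.
from collections import defaultdict

def collapse_duplicate_fixes(readings):
    """readings: iterable of (lat, lon, value); returns one averaged reading per fix."""
    groups = defaultdict(list)
    for lat, lon, value in readings:
        groups[(lat, lon)].append(value)
    return [(lat, lon, sum(vals) / len(vals)) for (lat, lon), vals in groups.items()]

raw = [
    (33.7401, -118.2701, 18000.0),   # three readings at the same fix
    (33.7401, -118.2701, 22000.0),
    (33.7401, -118.2701, 20000.0),
    (33.7405, -118.2710, 900.0),     # a lone reading elsewhere
]
print(collapse_duplicate_fixes(raw))
# → [(33.7401, -118.2701, 20000.0), (33.7405, -118.2710, 900.0)]
```

The collapsed list can then be fed to whatever per-zone averaging scheme is chosen, so the dense cluster counts as one point at that location.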

Any hints, ideas, good pseudo-code, pointers to reference works, etc. would be appreciated.
If I can home-brew a system that works, it would be great and reduce the workload.

This seemed to best fit in the programming forum.
2015-04-22, 17:50   #2
TheMawn

Is there anything mathematically special about the term "spatial average"? By my knowledge of the words "spatial" and "average" I certainly have a general understanding of what that means, but if there's anything special about the words then I don't know it.

If you're just looking for an average over the area, for example, particles per square meter, then that's fine. The oddly shaped zones could be problematic though unless they're laid out for a very specific reason such as natural currents creating a separation between two areas that would otherwise mix.


As much as I hate to attack your analogy with yet another analogy, I think I am safe in not wandering back to your real story. There is a similar task in surveying. If you want to measure the amount of dirt in a big pile, you take a whole bunch of points around the base of the pile and then a whole bunch more on the pile itself, trying to generate an accurate 3-D depiction of the pile with discrete points.

The people in drafting use AutoCAD after that to figure out the volume of the shape. The points around the toe of the pile create a "base plane" and the points on the top are tessellated into triangles, making a whole bunch of triangular cylinders with vertical lines drawn from the surface down to the base plane.

In your case, there is no base plane. You can just use zero particles per square meter whereas surveyors don't start at sea level nor is the pile placed on a flat surface. If you convert your "particles per square meter" into a vertical distance, you can get a "volume" which, knowing the area of your zone, gives you an average height i.e. average particles per square meter.

This takes into account the very dense readings, because they generate thinner chunks.
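The prism scheme described above can be sketched in a few lines of Python. This is a hypothetical illustration (invented names and data), assuming each reading is an (x, y, value) point and a triangulation is already given as index triples:

```python
# Sketch of the "triangular prism" averaging described above (names invented).
# Each triangle contributes volume = area * mean of its three corner values;
# the zone average is total volume / total area.

def triangle_area(p, q, r):
    # Shoelace formula for the area of a 2-D triangle.
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def zone_average(points, triangles):
    """points: list of (x, y, value); triangles: list of index triples."""
    total_volume = 0.0
    total_area = 0.0
    for i, j, k in triangles:
        p, q, r = points[i], points[j], points[k]
        area = triangle_area(p, q, r)
        mean_value = (p[2] + q[2] + r[2]) / 3.0
        total_volume += area * mean_value   # a prism of that mean "height"
        total_area += area
    return total_volume / total_area

# Example: a unit square split into two triangles, readings at the corners.
pts = [(0, 0, 100.0), (1, 0, 200.0), (1, 1, 300.0), (0, 1, 200.0)]
tris = [(0, 1, 2), (0, 2, 3)]
print(zone_average(pts, tris))  # → 200.0
```

Because a dense cluster of readings produces many small triangles, each cluster's weight is proportional to the area it actually covers, which is exactly the "thinner chunks" effect.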
2015-04-22, 17:57   #3
TheMawn

Unfortunately, AutoCAD is a bit of a black-box solution. I have no idea how it generates the triangles using all the points. I'm sure there's a fairly simple and "optimal" way of doing it. (EDIT: What I mean here is you could skip AutoCAD entirely but you would need your own summation scheme)

Also important to know is how your zones are defined. If you have GPS coordinates then you can draw them out in AutoCAD. Ideally you would want a measurement as close as possible to your boundaries because if you don't, AutoCAD can't know what the "elevation" is right on the boundaries, and assuming zero (which is true for the pile example) is inaccurate. AutoCAD might be able to interpolate using points on either side of the boundary, but this is well outside of my knowledge of the program.
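If AutoCAD is skipped entirely, the triangulation step itself is available off the shelf. A minimal sketch, assuming scipy is installed (the readings and values below are invented for illustration):

```python
# Minimal sketch: generate the triangulation that AutoCAD would otherwise
# provide. Assumes scipy is installed. The readings below are invented;
# only the (x, y) positions are triangulated, the values ride along.
import numpy as np
from scipy.spatial import Delaunay

readings = np.array([
    [0.0,  0.0,   150.0],
    [10.0, 0.0,   900.0],
    [10.0, 10.0,  300.0],
    [0.0,  10.0,   50.0],
    [5.0,  5.0, 20000.0],   # a hot spot in the middle
])

tri = Delaunay(readings[:, :2])   # triangulate on the position columns only
print(tri.simplices)              # each row: three indices into `readings`
```

The `simplices` rows are index triples that can feed a prism-volume summation of the kind described in the previous post.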

2015-04-23, 01:39   #4
Uncwilly

Quote:
Originally Posted by TheMawn
Is there anything mathematically special about the term "spatial average"? By my knowledge of the words "spatial" and "average" I certainly have a general understanding of what that means, but if there's anything special about the words then I don't know it.
No, it is just that there are temporal averaging schemes. One is from a company that wants everyone to 'save time' with their scheme. This only works if the rate of traverse is constant. This does not allow the investigative method to be done at the same time as the averaging method. It also discourages good investigation.
Quote:
The oddly shaped zones could be problematic though unless they're laid out for a very specific reason such as natural currents creating a separation between two areas that would otherwise mix.
They are defined; somewhere there are AutoCAD files (first started in the 1980s), but using a pre-GPS co-ordinate system. With some cajoling I could get the powers that be to get the data converted.
I have attached some poor illustrations of how odd the zones can be. The first attachment shows the rough shape of 4 actual zones (each in a different color). The second shows how the 2 R zones have one completely within the outer corners of the other. The others are freehand illustrations of typical odd shapes.
Quote:
In your case, there is no base plane. You can just use zero particles per square meter whereas surveyors don't start at sea level nor is the pile placed on a flat surface. If you convert your "particles per square meter" into a vertical distance, you can get a "volume" which, knowing the area of your zone, gives you an average height i.e. average particles per square meter.

This takes into account the very dense readings, because they generate thinner chunks.
This sounds good. There are times when the values vary by 3 or 4 orders of magnitude from one decimeter to the next.
I would not want to use AutoCAD itself, because of licensing issues, etc.

I will have to look more at tessellation.

Thanks for the help.
Attached Thumbnails: Zones1.jpg, Zones2.png
2015-04-23, 04:00   #5
jwaltos

Putting myself in your shoes within this hypothetical scenario, these are some of the baseline criteria I would have: get to know the area intimately, so that on a day-to-day basis I can sense normalcy and notice if something is the slightest bit off. Gather historical information that covers the area as far back as possible: prevailing currents and tidal flows, winds, temperature gradients and the like, over all time frames and seasons. Know the aquaculture (the sea smells different in different parts of the world for a reason) and the local ecosystems, as well as the people who travel the area. Satellite spectral and sonar imaging, magnetometers and gravimeters are some high-tech tools to consider. Dive the areas you consider important; they must be seen up close. Few things match being immersed in the environment you're studying. Be prepared to recognize those things you aren't looking for: if you wipe the labels off all the graphs and tables you look at, you should still be able to place what you are observing spatially and temporally. Regarding analysis, here are some dated (~2007) but useful papers: "Computational Intelligence Techniques: A Study of Scleroderma Skin Disease" and "Virtual Reality Spaces: Visual Data Mining with a Hybrid Computational Intelligence Tool."
Regarding low-tech resourcefulness, watch a few episodes of MacGyver... you can do magic with duct tape.

"In 2006, Anderson appeared in a MasterCard television commercial for xxx. In it, he manages to cut the ropes binding him to a chair using a pine tree air freshener, uses an ordinary tube sock as the pulley for a zip-line, and somehow repairs and hot-wires a nonfunctional truck using a paper clip, ballpoint pen, rubber band, tweezers, nasal spray, and a turkey baster. "

2015-04-23, 05:13   #6
Uncwilly

Quote:
Originally Posted by jwaltos
Putting myself in your shoes within this hypothetical scenario,
......
prevailing currents and tidal flows, winds, temperature gradients and the like over all time frames and seasonality. Knowing the aquaculture
Again, I am using marine biology as an obfuscation. The real problem could be ant counts in a forest, it could be bacteria on work tables, it could be radiation above a city, etc. It is about blending the two work types into one to save time, man-power, and money.
Quote:
Regarding analysis, here are some dated (~2007) but useful papers: "Computational Intelligence Techniques: A study of Scleroderma Skin Disease", "Virtual Reality Spaces: Visual Data Mining with a Hybrid Computational Intelligence Tool."
I will look into those, but they may be a bit too deep for my computational skills.

If I get far enough along to challenge the big players in the industry, I have a friend that I would ask to partner with me. He wrote code for the first iteration of the ISS, back when it was Space Station Alpha. Later he sold his firm to one of the top ten big players in the computer field.
2015-04-23, 15:50   #7
jwaltos

I was completely aware of your analogy and responded with my own. Nearly every aspect of what I had written could be substituted with an appropriate and equivalent model. I programmed with djgpp years ago (where an initial incarnation of "Doom" was developed) and found it to be a simple and robust environment. The two papers referenced lead to a number of others so I hope that something you turn up will be of use.
2015-04-23, 16:24   #8
Mark Rose

You have essentially two problems from what I can see:

1. Determining which measurements are relevant to a zone.
2. Calculating a weighted average for the zone.

I would use a tree structure of polygons using binary space partitioning. This will allow you to sort your measurements and coordinates into a list for each zone. You can then run your spatial averaging algorithm for all the measurements in a zone, quickly.

You can do better than taking a simple mean, but I am not the one who can tell you the best approach. A simple mean is going to be inaccurate if your measurements in the zone are not uniformly distributed. You should consult a data scientist. Perhaps post this question on http://datascience.stackexchange.com/ ?
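The first problem above, minus the BSP speed-up, reduces to a point-in-polygon test. A minimal self-contained sketch (the zone shapes and readings are invented); a BSP tree or other spatial index, as suggested, would only make the lookup faster:

```python
# Sketch: sort GPS-tagged readings into zones by point-in-polygon test.
# Names and data are hypothetical. The ray-casting test below is the
# unoptimized core that a BSP tree would accelerate.

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside `polygon` (a list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sort_into_zones(readings, zones):
    """readings: list of (x, y, value); zones: dict of name -> polygon."""
    by_zone = {name: [] for name in zones}
    for x, y, value in readings:
        for name, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                by_zone[name].append((x, y, value))
                break   # a reading belongs to at most one zone
    return by_zone

zones = {"A": [(0, 0), (10, 0), (10, 10), (0, 10)],
         "B": [(10, 0), (20, 0), (20, 10), (10, 10)]}
readings = [(2, 3, 500.0), (15, 5, 25000.0), (11, 9, 800.0)]
print(sort_into_zones(readings, zones))
# → {'A': [(2, 3, 500.0)], 'B': [(15, 5, 25000.0), (11, 9, 800.0)]}
```

The ray-casting test handles concave "ribbon" zones as well as rectangles, which matters given the shapes described earlier in the thread.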
2015-04-23, 17:34   #9
TheMawn

Well I suppose the super-high variance within less than a meter can be troublesome in terms of finding peak values. For averages though it's a bit less important.

The idea of "connecting all the points with triangles" is to take into account the irregular distribution of points. One just imagines that everything varies linearly between any two connected points. The more points you have, the better.

There is a lot of flexibility on the human end if they understand the data they're working with. For the dirt-pile analogy, if I can see that a certain 50m x 50m area is very, very flat, I can get away with taking maybe 9 measurements total whereas I've also had to plot a hundred points for a 20m x 20m pile because of how irregular it is.

Unfortunately the difference here is I can SEE the complicated areas, the big peaks and the big holes, whereas you have to actually drive over them first.


Everything submitted to drafting looks like a hodge-podge of co-ordinates, and it doesn't take them very long to calculate, nor do they complain about it, so AutoCAD must be friendly enough in that regard.

Mark Rose clearly has more knowledge about handling that data than me. I have no idea how your own program would go, so telling you to "make triangles" and "calculate their volume" gets you like 2% of the way there...
2015-04-23, 23:39   #10
Uncwilly

Quote:
Originally Posted by jwaltos
I was completely aware of your analogy and responded with my own. Nearly every aspect of what I had written could be substituted with an appropriate and equivalent model.
Ok. I know my zones better than anyone. I have a reputation for discovering troubles that no one else does.

@TheMawn: We have looked at a system that uses a Trimble unit for the data collection. Sometimes the first 2% (finding the right path) is the most important.
"Unfortunately the difference here is I can SEE the complicated areas, the big peaks and the big holes, whereas you have to actually drive over them first."
Yes, sometimes my track would look like that of a drunkard.

2015-04-24, 00:17   #11
Uncwilly

Quote:
Originally Posted by Mark Rose
You have essentially two problems from what I can see:
....
You can do better than taking a simple mean, but I am not the one who can tell you the best approach. A simple mean is going to be inaccurate if your measurements in the zone are not uniformly distributed. You should consult a data scientist. Perhaps post this question on http://datascience.stackexchange.com/ ?
For the first part, yes there is a bit of finding which points go into which zone. I need to look at the Wiki link some more.

The 'standard' as laid out in the regulations for the averaging method is actual physical sampling at a uniform rate, with a uniform rate of traverse and uniform spacing of straight lines. A few years ago this was reconfirmed, when the limit was lowered.
One group's non-sampling solution is to do the same thing, taking data points at fixed intervals. That takes almost as much work as the actual sampling. The math is easy, but it does not give a good reflection of where there are issues, and it does not really take care of the work of the investigative method.

When doing the investigative method:
Speed is not uniform. I slow down as the numbers increase. This does generate more data per unit area. There are great expanses that are fast and easy, because there are no issues (the fluctuations are below the sig-fig range).
I will tighten my spacing or change course. As I cross other traverse lines I generate more data between other points.
When I find trouble I may spend several minutes to define the issue fully.

If I can take 5000 readings and map them out in the zone, then find what the value should be for every 100 square cm area within the 25,000 square meter zone, then take the mean, would that not be most reflective of a true average? The regulators only want a single number per zone (for the averaging method) with no +/- or other additional qualifications.
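The per-cell idea above can be sketched directly. This is a hypothetical illustration (invented names and data) that estimates each grid cell by inverse-distance weighting (one reasonable choice, not anything from the regulations) and averages the cells; the zone here is rectangular for brevity, and an odd-shaped zone would additionally need a point-in-polygon clip:

```python
# Sketch of the gridded average described above: estimate a value for each
# small cell from the scattered readings (inverse-distance weighting, as one
# reasonable choice), then take the mean over all cells in the zone.
# All names and numbers are illustrative.

def idw_estimate(x, y, readings, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, value) readings."""
    num = den = 0.0
    for xi, yi, value in readings:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return value            # grid point coincides with a reading
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den

def gridded_zone_mean(readings, x0, x1, y0, y1, step):
    """Mean of IDW estimates over a rectangular zone gridded at `step` spacing."""
    total = 0.0
    count = 0
    y = y0 + step / 2
    while y < y1:
        x = x0 + step / 2
        while x < x1:
            total += idw_estimate(x, y, readings)   # sample at the cell center
            count += 1
            x += step
        y += step
    return total / count

readings = [(1.0, 1.0, 100.0), (9.0, 9.0, 100.0), (5.0, 5.0, 100.0)]
print(gridded_zone_mean(readings, 0.0, 10.0, 0.0, 10.0, 1.0))  # → 100.0
```

Because every cell counts once regardless of how many readings fall in it, a three-minute loiter over one hot spot only influences the cells near that spot, which is the weighting behaviour asked for at the start of the thread.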

You have given me a bunch to think about.


An additional goal is to take the data and produce color-coded maps, showing higher readings in red. Also, with GPS tracking, these could be used to show changes over time and long-term trouble areas.
(And it is also good for finding those that are not doing the work as required....)