All About Monocular Cues and How We Use Them:
The human eye has two types of photoreceptors: rods, which handle dim-light vision, and cones, which handle color. There are three types of cone cells, each most sensitive to a different band of wavelengths: short (blue), medium (green), and long (red).
For example, green cones respond most strongly to shorter wavelengths than red cones do. The brain compares the signals from the three cone types to produce our perception of color.
Monocular vision is vision using a single eye. One eye on its own still registers color and brightness normally; what it lacks is the comparison between two viewpoints that binocular vision provides. Monocular cues are the depth and motion signals that remain available under these conditions, which is why we can still judge distance reasonably well with one eye closed.
We have two broad groups of monocular cues: monocular depth perception and monocular motion perception. Depth perception refers to how well we can judge the distance of the objects around us.
For example, if I stand next to a table with my right hand resting on top of it, and you stand behind me holding your left arm straight out in front of you, each of us can easily tell where the other is without any special equipment.
Monocular motion perception is a bit different. It allows us to tell the direction of movement based on the position of an object in our field of view.
If you wave your left arm from side to side, we can ‘see’ it moving from side to side even if our eyes remain fixed on a single spot.
Monocular depth and motion perception cues are distinct from each other, and the visual system uses them in concert to accomplish a wide variety of tasks.
Binocular Vision: Unlike the monocular cues above, this one requires both eyes working together. Each eye sees a slightly different image of an object, and the brain merges the two into a single three-dimensional image.
The difference between the two eyes' images is known as retinal disparity, and strictly speaking it is a binocular cue rather than a monocular one. For example, if we drew the Eiffel Tower on a post-it note and held it in front of you, each of your eyes would see a slightly different image, because the two eyes view it from slightly different horizontal positions. This cue cannot work without two points of view.
Detecting Disparity: Disparity is not detected in the eye itself. The photoreceptors at the back of each eye simply respond to light; it is binocular neurons in the visual cortex, each receiving input from both eyes, that compare the two images and respond to the offset between them. The retinal receptors alone can tell us the direction of a bright light, but they are not very good at telling us how far away its source is.
Images Converging at a Point: Each eye views an object from a slightly different position, so each receives a slightly different image of it. The closer the object, the more the eyes must converge and the greater the difference between the two images; for distant objects the two views become nearly identical.
This can be seen by holding up a finger at arm's length and closing each eye in turn: the finger appears to jump from side to side against the background, because each eye is sending a different image to your brain.
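The geometry behind this cue can be sketched numerically. Under a simple pinhole model (the function name, focal length, and baseline below are illustrative assumptions, not measurements of the eye), depth is inversely proportional to disparity:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth implied by the shift of a point between two viewpoints.

    Assumes a pinhole model: depth = focal * baseline / disparity.
    focal_px     -- focal length expressed in pixels (illustrative value)
    baseline_m   -- separation between the two viewpoints in metres
    disparity_px -- horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# A larger shift between the two images means a nearer point:
near = depth_from_disparity(focal_px=800, baseline_m=0.065, disparity_px=52)
far = depth_from_disparity(focal_px=800, baseline_m=0.065, disparity_px=13)
# Quadruple the disparity, a quarter the depth.
```

The 0.065 m baseline loosely mirrors the spacing of human eyes, but the pixel values are arbitrary; the point is only the inverse relationship between disparity and depth.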
Images Overlapping (Interposition): When one object partially blocks our view of another, the brain concludes that the blocked object is farther away. This works with a single eye, because it depends only on which outline is interrupted, not on comparing two views.
If you look at the corner of a bookshelf, for example, you'll see one shelf's edge cut off by another, and the interrupted shelf reads as the more distant one.
Relative Size: When looking at two objects assumed to be the same physical size, the brain judges the one that casts the smaller image to be farther away, because projected size shrinks with distance.
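The relationship can be sketched in code. Assuming two objects of equal physical size under a pinhole projection (a modeling assumption, not something the article's sources specify), the ratio of their distances is the inverse of the ratio of their image sizes:

```python
def distance_ratio(image_size_a_px, image_size_b_px):
    """Ratio (distance of A) / (distance of B) for two objects assumed to be
    the same physical size: projected size is inversely proportional to
    distance under a pinhole model."""
    return image_size_b_px / image_size_a_px

# An object whose image is half as tall is judged to be twice as far away:
ratio = distance_ratio(image_size_a_px=20, image_size_b_px=40)
```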
High-Low (Relative Height): The brain can tell which of two objects is farther away from their height in the visual field. Below the horizon, higher means farther: if we see two balls resting on a field, the one that appears higher up, closer to the horizon line, is perceived as the more distant one.
Texture Gradients: The brain can judge the distance of a surface from how its texture changes across the view. On a grassy field, the blades near our feet appear large and distinct, while farther away they become progressively smaller and denser until they blur together; the brain reads this gradient as receding depth.
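A minimal numerical sketch of the same idea: if we assume the surface texels are physically uniform and that projected size falls off as one over distance (a pinhole-model assumption; the pixel sizes below are made up), relative depth can be read straight off the measured texel sizes:

```python
def relative_depths(texel_sizes_px, reference_depth=1.0):
    """Assign relative depths from projected texel sizes, assuming uniform
    physical texels whose projected size shrinks as 1/distance.  The largest
    texel is taken as the nearest and given the reference depth."""
    nearest = max(texel_sizes_px)
    return [reference_depth * nearest / size for size in texel_sizes_px]

# Texels shrinking from 16 px to 4 px imply a fourfold increase in depth:
depths = relative_depths([16, 8, 4])
```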
Monocular Motion Perception:
Retinal Motion (Motion Parallax): This cue uses the fact that, as we move, the images of stationary objects sweep across the retina at different speeds: nearby objects sweep past quickly, while distant ones barely move. Looking out the side window of a moving car, fence posts race by while far-off hills seem to stand still.
The brain combines these retinal speeds with its sense of how fast we ourselves are moving to judge how far away things are.
Relative Motion: If two things shift across our view at different rates while we move, the brain judges the faster-shifting one to be closer. From a rowboat, for example, a buoy near the boat slides past quickly while a lighthouse on the shore creeps by slowly, so the buoy must be the nearer of the two.
The same comparison works across a whole scene: viewing a mountain range from a moving boat, the brain can order the ridges in depth by how much each one shifts against the ridges behind it.
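The core of motion parallax can be written down in one line. For an observer translating sideways at speed v, a stationary point at perpendicular distance Z drifts across the retina at an angular speed of roughly v / Z (a small-angle approximation; the speeds below are invented for illustration):

```python
def depth_from_parallax(observer_speed_mps, angular_speed_radps):
    """Perpendicular distance implied by motion parallax: for sideways
    observer motion, angular drift w ~= v / Z, so Z ~= v / w."""
    return observer_speed_mps / angular_speed_radps

# Walking at 1.5 m/s: a fence post drifting across the view at 0.5 rad/s is
# about 3 m away, while a tree drifting at 0.05 rad/s is about 30 m away.
near = depth_from_parallax(1.5, 0.5)
far = depth_from_parallax(1.5, 0.05)
```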
Familiar Size: The brain can tell how far away something is based on past experience with similar objects. If we recognize an animal as a cat, its known physical size lets the brain convert image size into distance: a tiny cat-shaped image implies a faraway cat rather than a miniature one.
The related tendency to perceive an object as keeping a constant physical size even as its image shrinks with distance is called "size constancy".
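Unlike relative size, familiar size supports an absolute distance estimate. A sketch under a pinhole model (the focal length and the 0.25 m "cat height" are illustrative assumptions):

```python
def distance_from_familiar_size(focal_px, known_size_m, image_size_px):
    """Absolute distance from a known physical size: under a pinhole model,
    image size s = focal * S / Z, so Z = focal * S / s."""
    return focal_px * known_size_m / image_size_px

# A roughly 0.25 m-tall cat whose image spans 100 px (with an assumed
# 800 px focal length) would be about 2 m away:
dist = distance_from_familiar_size(focal_px=800, known_size_m=0.25,
                                   image_size_px=100)
```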
Retinal Grain (Loss of Detail): The eye resolves only a limited amount of detail, and the fine features of distant objects fall below that limit. How blurred and detail-poor something appears therefore serves as a rough distance cue.
A mountain far in the distance, for example, looks soft and featureless compared with objects near at hand.
Relative Speed: If we see something moving and have some idea of its physical size, the brain can judge its distance from how fast its image appears to move: a known-size object that crosses our view quickly must be close, while one that appears to drift slowly must be far away. From a rowboat, a lighthouse that grows and closes in only gradually is judged to be far off, and the rate at which it appears to approach tells us how soon we will reach it.
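A well-studied relative of this cue is "time to contact": for an object being approached at constant speed, the ratio of its current angular size to the rate at which that size is growing approximates the time left before reaching it, with no absolute size or distance needed. A minimal sketch with invented numbers:

```python
def time_to_contact(angular_size_rad, expansion_rate_radps):
    """Approximate time until contact with an object approached at constant
    speed: tau = (angular size) / (rate of angular expansion)."""
    return angular_size_rad / expansion_rate_radps

# A lighthouse subtending 0.02 rad whose image grows at 0.002 rad/s will be
# reached in roughly 10 seconds at the current closing speed:
ttc = time_to_contact(0.02, 0.002)
```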
Surface Reflectance: Different surfaces reflect different proportions of the light that falls on them. The absolute amount of light reaching the eye from a surface changes with the illumination, but the contrast between a surface and its surroundings stays roughly constant, so we can tell how light or dark a surface is across lighting conditions.
The brain knows that a surface which looks brighter than its surroundings in every lighting condition is probably a highly reflective (white) one, because it attributes the consistent contrast to the surface itself rather than to the light.
Light-Dark Contrast: Extreme differences between light and dark are more easily seen than subtle ones, and it is this contrast with the surroundings, rather than absolute brightness, that the brain uses to decide how light or dark a surface really is.
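One classical way to formalize this is the luminance-ratio rule: perceived lightness tracks the ratio between a patch and its surround, and that ratio is unchanged when the illumination scales both together. A sketch with made-up luminance values:

```python
def lightness_ratio(patch_luminance, surround_luminance):
    """Ratio-rule sketch: the patch/surround luminance ratio is invariant to
    uniform changes in illumination, one account of lightness constancy."""
    return patch_luminance / surround_luminance

# The same white paper keeps the same ratio under bright and dim light:
bright = lightness_ratio(90.0, 30.0)  # sunlit
dim = lightness_ratio(9.0, 3.0)       # shaded
```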
Highlights and Shadows: The way light falls on a surface also tells us about its shape and position. If we see a highlight on a dark object, the brain assumes that part of the surface is angled toward the light source.
If we see a dark shadow on a light object, the brain assumes that part of the surface is blocked from the light, and the shape of the shadow helps place objects relative to the light and to each other.
Shading: Gradual changes in brightness across a surface tell the brain how that surface curves, which is why a smoothly shaded disc reads as a sphere; this is how we judge how "deep" or rounded an object is from a flat image. Distance changes appearance too: faraway objects lose fine detail and, because of atmospheric haze, shift toward blue and fade toward the background.
Objects up close, by contrast, tend to be in sharp focus with lots of detail. Looking across a landscape, mountains in the distance appear blue and fuzzy, while mountains up close are full of detail and not blue at all.
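The link between shading and shape is often modeled with Lambertian reflectance: brightness varies with the angle between the surface and the light, so a brightness gradient implies a curving surface. A minimal sketch (the albedo value and direction vectors are illustrative):

```python
def _normalize(v):
    """Scale a 3-vector to unit length."""
    length = sum(x * x for x in v) ** 0.5
    return [x / length for x in v]

def lambert_intensity(albedo, surface_normal, light_direction):
    """Lambertian shading: reflected intensity is proportional to the cosine
    of the angle between the surface normal and the light direction,
    clamped at zero for surfaces facing away from the light."""
    n = _normalize(surface_normal)
    l = _normalize(light_direction)
    cosine = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cosine)

# A facet facing the light is brightest; one turned 90 degrees away is dark:
facing = lambert_intensity(0.8, [0, 0, 1], [0, 0, 1])
tilted = lambert_intensity(0.8, [1, 0, 0], [0, 0, 1])
```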