Machine vision is the fusion of a number of technologies that enable industrial or other automated equipment to derive from images an advanced understanding of the environment at hand. Without machine vision software, digital images with varying color values and tonal intensities would be nothing more than a simple, unconnected collection of pixels to such equipment. Machine vision allows a computer (usually connected to a machine controller) to detect edges and shapes in such images in order to allow a higher-level processor to recognize a pre-defined target object. Images in this sense are not limited to photographic images in the visible spectrum; they can also include images obtained using infrared, laser, X-ray and ultrasonic signals.
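To make the edge-detection step concrete, here is a minimal sketch of gradient-based edge detection on a toy grayscale frame, using plain Python lists as a stand-in for camera data. The Sobel-style kernels and the threshold value are illustrative choices, not taken from any particular vision library.

```python
def sobel_edges(img, threshold=100):
    """Return a binary edge map: 1 where the local gradient
    magnitude exceeds `threshold`, else 0."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses.
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges

# A 6x6 frame: dark background (0) with a bright square (255).
frame = [[255 if 2 <= x <= 4 and 2 <= y <= 4 else 0 for x in range(6)]
         for y in range(6)]
edge_map = sobel_edges(frame)
```

A higher-level processor would then consume the edge map, not the raw pixels, when matching against a pre-defined target object.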
In industrial environments, it is quite common for machine vision applications to recognize specific parts from the many parts placed in a clutter of material bins. Here, machine vision helps the pick-and-place robot automatically pick up the correct parts. Of course, if the parts are all neatly arranged in the same orientation on a pallet, it would be relatively simple to recognize them with imaging feedback. However, powerful machine vision algorithms can recognize objects that are at different distances from the camera (and therefore appear as different sized images on the imaging sensor) as well as objects that are not oriented in the same direction as the camera.
The most sophisticated machine vision systems are enabling emerging designs far more complex than picking parts from bins; indeed, perhaps no identification task is more demanding than that of a self-driving car.

Techniques related to machine vision
The term machine vision is sometimes reserved for the more streamlined and efficient mathematical methods of extracting information from images. By contrast, the term computer vision typically describes more modern, computationally demanding systems, including black-box approaches that use machine learning or artificial intelligence (AI). However, machine vision can also serve as an all-encompassing term covering every method of extracting high-level information from images; in that case, computer vision describes the underlying theory of operation.
Techniques that can extract high-level meaning from images abound. In the research community, such techniques are often considered distinct from machine vision. In reality, however, they are all different ways of implementing machine vision, and in many cases they overlap.
Digital image processing is a form of digital signal processing that involves image enhancement, restoration, coding and compression. Advantages over analog image processing are the minimization of noise and distortion as well as the multitude of algorithms available. One of the first types of image enhancement was used to correct the first close-up images of the lunar surface. In this process, photogrammetric mapping as well as noise filters were used and corrections were made for geometric distortions caused by the alignment of the imaging camera to the lunar surface.
Digital image enhancement usually involves increasing contrast, along with geometric corrections for viewing angle and lens distortion. Compression is often achieved by approximating complex signals as a combination of cosine functions, a Fourier-related transform known as the discrete cosine transform (DCT). The JPEG file format is the most common application of the DCT. Image restoration can also use the Fourier transform to remove noise and blur.
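The compression idea can be sketched with a one-dimensional orthonormal DCT-II, the transform at the heart of JPEG. Compressing means keeping only the first few coefficients; the sample row and the choice to keep three of eight coefficients are arbitrary demo values.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2*i + 1) * k / (2*n))
                for i, x in enumerate(signal))
        scale = math.sqrt(1/n) if k == 0 else math.sqrt(2/n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    return [sum((math.sqrt(1/n) if k == 0 else math.sqrt(2/n))
                * c * math.cos(math.pi * (2*i + 1) * k / (2*n))
                for k, c in enumerate(coeffs))
            for i in range(n)]

row = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of pixel values
coeffs = dct(row)
compressed = coeffs[:3] + [0.0] * 5      # keep 3 of 8 coefficients
approx = idct(compressed)                # smooth approximation of row
```

With all eight coefficients kept, the round trip is exact; discarding the high-frequency coefficients loses only fine detail, which is why JPEG compression is visually forgiving.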
Photogrammetry uses some kind of feature recognition to extract measurements from images. These measurements can include 3D information when multiple images of the same scene are acquired from different locations. The simplest photogrammetric systems use a scale to measure the distance between two points in an image. To do this, it is often necessary to include a known reference scale in the image.
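The simplest case described above amounts to fixing the image scale from a known reference and measuring by proportion, which assumes the measured points lie in the same flat, camera-facing plane as the reference. The numbers below are made up for illustration.

```python
def scale_from_reference(ref_length_mm, ref_pixels):
    """Millimetres per pixel, derived from a known reference object
    visible in the image."""
    return ref_length_mm / ref_pixels

def measure(pixels, mm_per_pixel):
    """Convert a pixel distance to a real-world distance."""
    return pixels * mm_per_pixel

# A 100 mm calibration bar spans 400 pixels in the image.
mm_per_px = scale_from_reference(100.0, 400)

# A part in the same plane spans 260 pixels.
part_width_mm = measure(260, mm_per_px)
```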
Feature detection allows the computer to recognize edges, corners or points in the image. This is the first step needed for photogrammetry and for recognizing objects and motion. Blob detection identifies areas whose edges are too smooth for edge or corner detection.
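Blob detection can be sketched as connected-component labelling on a binary image: group touching foreground pixels into regions, regardless of how smooth their outlines are. A real system would first threshold the camera image; here the binary mask is given directly, and an explicit stack replaces recursion.

```python
def find_blobs(mask):
    """Return a list of blobs, each a set of (y, x) coordinates
    belonging to one 4-connected region of 1s."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                blob, stack = set(), [(y, x)]
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    blob.add((cy, cx))
                    # Visit the four edge-adjacent neighbours.
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                blobs.append(blob)
    return blobs

mask = [[0, 1, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 1]]
blobs = find_blobs(mask)   # two separate regions
```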
Pattern recognition is used to recognize specific objects. In the simplest case, this might mean finding a well-defined specific mechanical part on a conveyor belt.
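For the conveyor-belt case, the simplest workable approach is template matching: slide a small template of the part over the image and report where it matches. Real systems use correlation scores that tolerate noise and lighting changes; an exact pixel match keeps this sketch short.

```python
def match_template(img, tpl):
    """Return (row, col) positions where `tpl` occurs in `img`
    as an exact pixel-for-pixel match."""
    H, W = len(img), len(img[0])
    h, w = len(tpl), len(tpl[0])
    hits = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            if all(img[y + dy][x + dx] == tpl[dy][dx]
                   for dy in range(h) for dx in range(w)):
                hits.append((y, x))
    return hits

image = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]
template = [[9, 9],
            [9, 9]]
positions = match_template(image, template)
```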
3D reconstruction determines the 3D shape of an object from 2D images. It can be realized by photogrammetric methods: the heights of features common to images taken from different observation points are determined using triangulation. 3D reconstruction is also possible from a single 2D image; here, the software instead interprets the geometric relationships between edges or shaded areas.
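The triangulation step can be sketched for the common case of two horizontally separated, rectified cameras: the same feature appears shifted between the two views (the disparity), and depth follows from similar triangles. The focal length and baseline values below are illustrative.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Depth (mm) of a feature seen by a rectified stereo camera
    pair, via depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("feature must shift between the two views")
    return focal_px * baseline_mm / disparity_px

# Feature at x=320 px in the left image, x=300 px in the right image,
# with an 800 px focal length and a 60 mm camera baseline.
depth_mm = depth_from_disparity(focal_px=800, baseline_mm=60,
                                disparity_px=320 - 300)
```

Note the inverse relationship: nearby objects shift a lot between views, distant objects barely at all, which is why stereo depth accuracy degrades with range.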
Humans reconstruct a cube from a simple line drawing, and a sphere from a shaded circle, through processing in the brain; the shading reveals the slope of the surface. This derivation is far more complex than one might think, however, because shading is a one-dimensional parameter while surface slope has two degrees of freedom. The mismatch can produce ambiguous situations, a fact validated by the art of depicting physically impossible objects.
How machine vision tasks are sequenced
Many machine vision systems incorporate the above techniques incrementally by starting with low-level operations and then progressing to higher-level operations. At the lowest level, all pixels of an image are stored as high-bandwidth data. Each operation in the sequence then recognizes image features and represents the information of interest with a relatively small amount of data.
The sequence begins with the low-level operations of image enhancement and restoration, followed by feature detection. When multiple sensors are used, these low-level operations can be performed by distributed processes specialized for the individual sensors. Once features are detected in the individual images, more advanced photogrammetry can be performed, as can object recognition or any other task that relies on combined data from multiple images and sensors.
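The sequencing above can be sketched as a chain of stages, each consuming the previous stage's output and emitting a smaller, more abstract representation: a full pixel frame becomes a list of feature coordinates, which becomes a single bounding box. The stage implementations are deliberately trivial placeholders.

```python
def enhance(img):
    """Low level: stretch contrast to the full 0-255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1
    return [[(p - lo) * 255 // span for p in row] for row in img]

def detect_features(img, threshold=128):
    """Mid level: reduce the frame to bright-pixel coordinates."""
    return [(y, x) for y, row in enumerate(img)
            for x, p in enumerate(row) if p > threshold]

def recognize(features):
    """High level: summarize features as one bounding box
    (top, left, bottom, right)."""
    ys = [y for y, _ in features]
    xs = [x for _, x in features]
    return (min(ys), min(xs), max(ys), max(xs))

frame = [[10, 10, 10, 10],
         [10, 90, 95, 10],
         [10, 92, 94, 10]]
box = recognize(detect_features(enhance(frame)))
```

Each hand-off carries less data than the last, which is what allows the low-level stages to run close to the sensors while a single higher-level processor fuses their results.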
Direct computation and learning algorithms
In the case of machine vision, direct computation is a set of mathematical functions defined by the programmer. These functions take inputs such as image pixel values and produce outputs such as object edge coordinates. In contrast, learning algorithms are not written directly by humans, but are trained on example datasets that correlate inputs with desired outputs. As a result, learning algorithms are used as black boxes. Most such machine learning now uses deep learning based on artificial neural networks for computation.
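The contrast can be shown on a toy task, classifying a pixel as "bright" or "dark". In the direct version the programmer encodes the rule by hand; in the learned version the same rule is derived from labelled examples. Both are deliberately trivial, and the midpoint rule stands in for real training procedures.

```python
def classify_direct(value, threshold=128):
    """Direct computation: the programmer fixes the rule."""
    return "bright" if value > threshold else "dark"

def train_threshold(examples):
    """Learning: derive the threshold from labelled data, here as
    the midpoint between the brightest 'dark' example and the
    darkest 'bright' example."""
    darks = [v for v, label in examples if label == "dark"]
    brights = [v for v, label in examples if label == "bright"]
    return (max(darks) + min(brights)) / 2

training_set = [(20, "dark"), (40, "dark"),
                (200, "bright"), (220, "bright")]
learned_t = train_threshold(training_set)
```

The learned threshold is only as good as the examples it came from, which foreshadows the training-data pitfalls discussed below.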
Simple machine vision for industrial applications tends to be more reliable and less computationally demanding when based on direct computation. Of course, there are limits to what direct computation can achieve. For example, no one should hope to directly code the advanced pattern recognition required to identify faces, especially in video footage of crowded public spaces. In contrast, machine learning handles such applications skillfully. It is therefore not surprising that machine learning is increasingly being deployed even for low-level machine vision operations, specifically image enhancement, restoration, and feature detection.
Improved training methods (not algorithms)
The increasing sophistication of deep learning techniques has made it clear that it is not the learning algorithms themselves that most need improvement, but rather the way the algorithms are trained. One improved training procedure is known as data-centric computer vision. Here, a deep learning system ingests a very large training set consisting of thousands, millions, or even billions of images, and stores the synthesized information its algorithms extract from each one. The algorithms learn efficiently by working through these examples and then consulting an "answer book" of known-correct labels to verify that the right values have been derived.
There is an old cautionary tale about digital pattern recognition. The U.S. military once intended to use machine vision for target recognition, and a defense contractor's demonstration reliably identified both U.S. and Russian tanks. Tanks of all different kinds could be correctly distinguished, one after another, in the contractor's aerial photographs. However, when the system was tested again with the Pentagon's own image library, it kept giving incorrect answers. The problem was that the contractor's pictures all depicted American tanks in the desert and Russian tanks on green fields. Instead of identifying the different tanks, the system had learned to identify the differently colored backgrounds. The lesson: learning algorithms work as intended only when their training data is carefully curated.
Conclusion: a safe vision for robotic workcells
Machine vision is no longer a niche technology. The industrial sector is currently the largest growth area for machine vision deployments. The most notable development here is how machine vision now complements safety systems in industrial plants, e.g., systems that sound an alarm or issue voice notifications when a worker enters a work area without a helmet, mask, or other appropriate protective gear. Machine vision can also be used in systems that alert when moving machinery, such as forklifts, gets too close to personnel.
These and similar machine vision systems can sometimes replace hard guarding around industrial robots, making operations more efficient. Machine vision systems can also replace or enhance light-curtain safety systems that stop machinery whenever a worker is detected entering the work cell. When machine vision monitors the factory floor around a work cell, the robots in that cell can instead slow down gradually as people approach.
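The graded slow-down can be sketched as a speed limit that scales with the distance of the nearest detected person, rather than a hard stop at a fixed boundary. The distance bands below are invented for illustration; a real deployment would take them from the applicable safety assessment.

```python
def speed_limit(nearest_person_m, full_speed=1.0,
                stop_at=0.5, full_at=2.0):
    """Fraction of full speed allowed, given the nearest detected
    person's distance in metres: 0 inside `stop_at`, ramping
    linearly up to `full_speed` at `full_at` and beyond."""
    if nearest_person_m <= stop_at:
        return 0.0
    if nearest_person_m >= full_at:
        return full_speed
    return full_speed * (nearest_person_m - stop_at) / (full_at - stop_at)

# Nobody nearby, someone mid-range, someone far away.
limits = [speed_limit(d) for d in (0.3, 1.25, 3.0)]
```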
As the design of industrial environments evolves to accommodate collaborative robots and other workcell equipment that allow plant personnel to walk around safely (even while the equipment is running), these and other machine vision-based systems will become a more common part of plant processes.
