MIT gives AI a more accurate view of the world



Thanks to a new system for identifying images down to the pixel, the “vision” of artificial intelligences is a little closer to that of humans.

For a human being, recognizing the elements of a scene by sight is relatively easy. From an early age, a child can tell a cat from a dog, whatever environment the animal is in.

For a digital system, this learning process is far more complicated, requiring roughly 800 hours of manual labeling of sample images to reach satisfactory accuracy. To give AIs a perception closer to our own, MIT researchers have developed STEGO, an algorithm that identifies objects down to the pixel.

What this algorithm changes

Currently, object recognition involves a human drawing a box around a specific object in an image. This process is well known to the general public, since it is the very basis of CAPTCHAs (where the user roughs out the work by selecting the squares containing a traffic light, for example).

©MIT CSAIL - Differentiation of objects according to processing method

The problem is that other elements end up grouped with the object. For example, a dog sitting in the grass produces a box containing the dog, but also some grass. STEGO instead uses a semantic segmentation technique to assign a label to each pixel. The "dog" object therefore no longer contains traces of grass, a bit like using the magnetic lasso in Photoshop instead of the polygonal lasso.
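The difference between the two approaches can be sketched with a toy example. The label map, image size, and class values below are hypothetical, not taken from STEGO; the point is simply that even the tightest bounding box around a non-rectangular object still contains background pixels, while a per-pixel label map isolates the object exactly.

```python
import numpy as np

# Hypothetical 4x4 label map: 0 = grass, 1 = dog.
# Semantic segmentation assigns one class to every pixel.
labels = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])

# The tightest bounding box around the dog still captures grass.
rows, cols = np.where(labels == 1)
box = labels[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

print(box.size)                  # 4 pixels fall inside the box
print(int((labels == 1).sum()))  # only 3 pixels actually belong to the dog
```

The pixel mask `labels == 1` plays the role of the "magnetic lasso": it follows the object's actual outline instead of a rectangle around it.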

The catch is that each of the 65,536 pixels of a 256×256 image would have to be processed one by one, and that for a system requiring thousands (even hundreds of thousands) of images. The task would quickly become impossible if the algorithm did not instead look for similar objects across its dataset in order to complete its learning.

Find the solution before humans

For complicated images, such as medical or space imagery, it is difficult to pinpoint a precise element without expert knowledge. And with the emergence of new fields and ongoing technological progress, even an expert can struggle to analyze an image.

“In this type of situation, where you want to design a method to operate at the limits of science, you can’t rely on humans to find the solution before machines,” said Mark Hamilton, an MIT doctoral student and software engineer at Microsoft.

©MIT CSAIL - The world seen by STEGO

To this end, STEGO is trained on a wide variety of images, from house interiors to high-altitude shots. With twice the performance of previous semantic segmentation systems, STEGO can distinguish roads from vegetation and buildings in aerial views, or produce a much clearer picture of the environment from footage captured by autonomous cars.

Nevertheless, as powerful as it is, the algorithm has its limits. While it can identify both oatmeal and pasta as “food”, it struggles to tell them apart. Likewise, STEGO handles nonsensical images very poorly, such as a banana placed on a telephone handset. The team is continuing its work to bring a little more flexibility to the algorithm in a future version.


