Search by photo: trend or demand?
BY LONG LONG YU, CHIEF SCIENTIST
Today we are living through a genuine search engine revolution, driven above all by the rise of smartphones, which have burst into our lives and changed even our habits. Conventional search engines such as Google have had to adapt not only to the mobile era, but also to the demands of a fully digital generation that wants to interact with different information sources more visually and in real time.
Young people demand ‘search by photo’, that is, the ability to start a search from a picture or screen capture. It should come as no surprise, then, that we are witnessing the rise of ‘visual search technology’, which is taking different shapes and formats. Apps such as Goggles are clear examples.
The new generations are calling for ‘search by photo’, but the conditions in which users take their images pose a real technological challenge
Starting from a photo taken with a mobile phone, Goggles tries to recognize the item (by matching the photo against the images in its archive) and return relevant search results. For places (helped by the precision of GPS) and for objects with flat surfaces (text, CD covers, bar codes, etc.), the results are surprisingly good.
Take these examples as a reference: Goggles identified our logo and Taylor Swift’s signature and CD perfectly. On the other hand, despite how clear the image was, Goggles could not recognize the iPod Touch. The reason is that the conditions in which users take their images (light, perspective, focus, etc.) pose a real technological challenge. Even so, among the results we found objects that were ‘semantically’ similar to the image (different types of iPods) but ‘visually’ different.
Wide Eyes Technologies has bet on specialization to deliver better results. Its fashion visual search engine can identify items and find similar ones without human interaction
Computer vision is one of the branches of Artificial Intelligence that has developed the most in recent years, and object detection in particular is one of its hottest topics. Working in this field is Wide Eyes Technologies, another company that, like Google, is trying to expand the potential of visual recognition. Wide Eyes Technologies, however, has opted for specialization, creating a visual search engine for the fashion industry. This fashion visual search ranges from finding garments that are ‘visually’ similar to a given image to identifying the complete look in a photo and returning similar items. Quite literally: ‘get the look’.
So, what is the difference between Google’s visual search, Goggles, and Wide Eyes Technologies’ visual search? The answer is complex, but here are 5 main differences:
- Goggles can ‘identify’ types of objects in an image. If it cannot identify the object (e.g. the iPod), it returns images that are ‘semantically’ similar. Wide Eyes Technologies’ visual search engine, by contrast, can ‘find’ garments that are ‘visually’ similar.
- Google’s Goggles can detect text and bar codes/QR codes, while Wide Eyes works exclusively with images in a fashion context.
- Both visual search engines can automatically describe a fashion garment with keywords. The difference lies in how they interpret the image. The keywords estimated by Wide Eyes Technologies are descriptive (e.g. silver, high heel, high toe), while Google opts for more conceptual keywords (e.g. wedding, bridal, shoes).
If we were using a ‘semantic’ search engine to find those shoes, we would probably type the very description that Wide Eyes Technologies produced automatically. According to recent research, adding semantic information has considerably improved the performance of visual search engines.
- Goggles (and also Google Images) offer excellent results as long as the images are indexed in their database. Wide Eyes’ visual search engine goes further, because it can return similar items from photos taken with mobile phones (photos that are not indexed). Bear in mind that the database of fashion products is made up of ‘perfect’ images (from catalogs), that is, photos taken in ideal conditions (good lighting, uniform white background, etc.), while the images submitted to the ‘fashion search engine’ are photos taken with different camera devices (different lighting, perspectives, focus, etc.). Finding visually similar images across such different conditions is a real challenge!
- Finally, Google’s visual search engine is generic/universal rather than specialized, while Wide Eyes Technologies has a visual search engine built specifically for fashion. This means that the feature vectors Google uses to represent images are very generic, while the vectors used by Wide Eyes are specialized for fashion and give better results as long as the search stays within the fashion context. Outside that context, Google’s universal visual search engine offers better results.
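To make the idea of feature vectors more concrete, here is a minimal sketch of how ‘visually’ similar items could be retrieved once every image is represented as a vector. It is only an illustration, not the method used by Google or Wide Eyes Technologies: the three-number vectors, the item names, the function names and the choice of cosine similarity are all assumptions made for this example. A real engine would compute much longer vectors from the image pixels with a trained model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def most_similar(query, catalog, top_k=2):
    """Rank catalog items by similarity to the query vector, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical catalog: each vector stands for (metallic colour, heel height, toe shape).
CATALOG = {
    "silver high-heel shoe": [0.9, 0.8, 0.7],
    "black flat shoe": [0.1, 0.1, 0.4],
    "silver sandal": [0.8, 0.3, 0.9],
}

# Hypothetical vector extracted from a photo taken with a mobile phone.
query_photo = [0.85, 0.75, 0.65]

results = most_similar(query_photo, CATALOG)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these made-up numbers the silver high-heel shoe ranks first and the silver sandal second. The more specialized the vectors are for fashion, the more closely this ranking should follow what a person would call ‘visually’ similar garments.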
We imagine a future (not so far away) in which visual recognition will be as precise as the human brain, with systems capable of finding a given object in an image without human interaction. This is already possible; now we just need to fine-tune it so that the results ‘always’ satisfy the user.