Developing a method for efficient object recognition in 360° images
The company Frameless generates interactive virtual walkthroughs from 360° images. Using smartphones or head-mounted displays, users can move freely through these worlds, embed media content and interact with shared items. An important prerequisite for advanced services, such as searching and annotating the content of these virtual walkthroughs, is the detection and recognition of objects and, potentially, higher-level (scene-related) concepts in these immersive environments. This requires advanced methods for object detection and scene understanding that can be applied directly in the walkthroughs.
Distortion in 360° images
Current Artificial Intelligence (AI) methods for object recognition are typically designed for and trained on undistorted images. 360° images, however, introduce significant distortions, especially along the vertical axis, because spherical geometry cannot be mapped onto a flat plane without distortion. This raises a key challenge for object recognition: objects undergo strong non-linear distortions that depend on where in the panorama they are located, and these distortions severely impair object detectors.
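To give a feeling for how strongly the distortion depends on position, the following minimal sketch computes the horizontal stretch of a panorama at different latitudes. It assumes the 360° images are stored in the common equirectangular projection, which the project description does not state explicitly; the function name horizontal_stretch is purely illustrative.

```python
import math

def horizontal_stretch(latitude_deg: float) -> float:
    """Horizontal stretch factor of an equirectangular panorama.

    Assumption: equirectangular projection, where every image row spans the
    full 360 degrees of longitude. A circle of latitude shrinks by
    cos(latitude) on the sphere, so content on that row is stretched by
    1 / cos(latitude) in the image.
    """
    return 1.0 / math.cos(math.radians(latitude_deg))

# Objects near the horizon are almost undistorted, objects near the poles
# are stretched several times over.
for lat in (0, 30, 60, 80):
    print(f"latitude {lat:2d} deg: stretched x{horizontal_stretch(lat):.2f}")
```

Under this assumption an object at 60° latitude appears about twice as wide as the same object at the horizon, which illustrates why a detector trained on ordinary photographs struggles in the upper and lower parts of a panorama.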
Efficient object recognition in 360° data
The overall aim of the project is to develop an algorithm for efficient object recognition in 360° data that detects objects directly in the distorted images. The specific aims are:
• Establishing a representative collection of 360° images as a testbed for object recognition
• Evaluating the current state-of-the-art in object recognition on 360° content
• Developing concepts for making state-of-the-art object recognizers compatible with non-linear distortions while minimising the necessary adaptations, so that future recognition methods can be integrated seamlessly
• Proposing and prototypically implementing a first workflow for object recognition in 360° images
Several approaches have been proposed to make object recognition compatible with the distortions in panoramic images, each with its own strengths and weaknesses. This project therefore takes a new direction: developing powerful data augmentation methods that integrate non-linear distortions directly into the training samples. For this purpose, existing annotated datasets (e.g. Pascal VOC, ILSVRC) can be re-used. This strategy makes the approach compatible with arbitrary neural networks and applicable even if no annotated 360° content is available.
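One way such an augmentation could look is sketched below. This is not the project's actual method but an illustration under stated assumptions: the panorama uses the equirectangular projection, the training image is treated as a perspective view with an assumed field of view, and the warp is done with a gnomonic (tangent-plane) projection and nearest-neighbour sampling. The function name equirect_augment and its parameters are hypothetical.

```python
import numpy as np

def equirect_augment(image, center_lat_deg, fov_deg=60.0):
    """Re-render a planar image as it might appear in an equirectangular
    panorama when centred at the given latitude (illustrative sketch)."""
    h, w = image.shape[:2]
    lat0 = np.radians(center_lat_deg)

    # Half extents of the source image on the tangent plane (assumed FOV).
    half_x = np.tan(np.radians(fov_deg) / 2.0)
    half_y = half_x * h / w

    # Latitude/longitude window of the output patch. The longitude window is
    # widened by 1 / cos(lat0) because rows near the poles are stretched.
    lat_span = np.arctan(half_y)
    lon_span = min(np.arctan(half_x) / max(np.cos(lat0), 1e-3), np.pi)
    lat = np.clip(lat0 + np.linspace(lat_span, -lat_span, h),
                  -np.pi / 2, np.pi / 2)
    lon = np.linspace(-lon_span, lon_span, w)
    lon, lat = np.meshgrid(lon, lat)

    # Gnomonic projection of each output direction onto the tangent plane
    # that touches the sphere at (longitude 0, latitude lat0).
    cos_c = np.sin(lat0) * np.sin(lat) + np.cos(lat0) * np.cos(lat) * np.cos(lon)
    visible = cos_c > 1e-6
    safe = np.where(visible, cos_c, 1.0)
    x = np.cos(lat) * np.sin(lon) / safe
    y = (np.cos(lat0) * np.sin(lat)
         - np.sin(lat0) * np.cos(lat) * np.cos(lon)) / safe

    # Tangent-plane coordinates -> source pixel indices (nearest neighbour).
    col = np.rint((x / half_x + 1.0) * 0.5 * (w - 1)).astype(int)
    row = np.rint((1.0 - y / half_y) * 0.5 * (h - 1)).astype(int)
    inside = visible & (col >= 0) & (col < w) & (row >= 0) & (row < h)

    # Pixels that fall outside the source image stay zero (black).
    out = np.zeros_like(image)
    out[inside] = image[row[inside], col[inside]]
    return out

# Usage: distort a dummy training sample as if it sat at 60 degrees latitude.
sample = np.arange(1, 26).reshape(5, 5)
warped = equirect_augment(sample, center_lat_deg=60.0)
```

In a training pipeline, the latitude would typically be drawn at random per sample, and bounding-box annotations would have to be transformed with the same mapping; both steps are omitted here for brevity.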
The developed methods represent initial solutions that demonstrate how best to adapt existing AI solutions to 360° content with minimum effort. They are therefore a starting point for Frameless to build a powerful AI backend. This provides a strong new unique selling point (USP) and allows the company to offer its customers new data management services beyond the current state of the art. The developed techniques will enable searching for and matching similar content across different walkthroughs, and thereby the creation of advanced recommendation services for end users.
Media Computing Research Group
Institute of Creative\Media/Technologies
Department of Media and Digital Technologies
- Frameless GmbH