Table of contents
AI has become a powerful force in computer vision, unleashing tangible business opportunities for 2D visual data such as images and videos. Applying AI can bring tremendous results in a number of fields. To learn more about this exciting area, read our overview of 2D computer vision algorithms and applications.
Despite its popularity, there is nothing inherent to 2D imagery that makes it uniquely suitable for AI. In fact, artificial intelligence systems can analyze many forms of information, including volumetric data. Yet despite the growing number of companies already gathering 3D data with lidars or 3D cameras, AI applications are not yet mainstream in their industries.
In this post, we describe how to leverage 3D data across multiple industries with the use of AI. Later in the article, we take a closer look at the nuts and bolts of the technology and show what it takes to apply AI to 3D data. At the end of the post, you'll also find an interactive demo to play with.
In the 3D world, there is no Swiss Army Knife
3D data is what we call volumetric information. The most common types include:
- 2.5D data, which contains information on depth, i.e. the distance to visible objects, but no volumetric information about what's hidden behind them. Lidar data is an example.
- 3D data, with full volumetric information. Examples include MRI scans or objects rendered with computer graphics.
- 4D data, where volumetric information is captured as a sequence, and the outcome is a recording where one can go back and forth in time to see the changes occurring in the volume. We refer to this as 3D + time, which we can treat as the 4th dimension. Such representation enables us to visualize and model dynamic 3D processes, which is especially useful in medical applications such as respiratory or cardiac monitoring.
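To make these representations concrete, here is a minimal sketch of the array shapes each type of data typically takes. The concrete dimensions are illustrative assumptions, not a standard:

```python
import numpy as np

# 2.5D: a depth map -- one distance value per pixel, nothing behind surfaces
depth_map = np.random.rand(480, 640)        # (height, width)

# 2.5D: a lidar point cloud -- N points with (x, y, z) coordinates
point_cloud = np.random.rand(100_000, 3)    # (num_points, 3)

# 3D: a dense volume, e.g. an MRI scan stacked from 2D slices
volume = np.random.rand(128, 256, 256)      # (depth, height, width)

# 4D: a sequence of volumes -- "3D + time", e.g. cardiac monitoring
sequence = np.stack([volume] * 10)          # (time, depth, height, width)
```

Note how the point cloud is an unordered list of coordinates, while the volume and the sequence are dense grids; this difference drives the choice of network architecture.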
Let's take a closer look at a few examples.
1. Autonomous driving
- Task: 3D object detection and classification
- Data: 2.5D point clouds captured with a lidar: sparse data, big distances between points
- The distances between objects in outdoor environments are significant.
- In the majority of cases, lidar rays emitted from the front and rear of the car don't return to the sensor, since there are no objects to reflect them.
- The resolution of objects gets worse the further they are from the laser scanner: due to the angular expansion of the beam, it's impossible to determine the precise shape of objects that are far away.
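The angular-expansion effect above is easy to quantify: a lidar samples the scene at a fixed angular step, so the lateral spacing between neighbouring beams grows linearly with range. A quick sketch, assuming a hypothetical 0.2-degree horizontal resolution:

```python
import numpy as np

# Hypothetical lidar with 0.2-degree horizontal angular resolution
angular_res = np.deg2rad(0.2)

# The same angular step covers a wider arc the farther away the object is,
# so distant objects are hit by fewer, more widely spaced beams.
for distance in (10, 50, 100):  # metres
    spacing = distance * angular_res
    print(f"{distance:>3} m -> beam spacing ~{spacing:.2f} m")
```

At 100 m the beams are roughly ten times farther apart than at 10 m, which is why the shape of distant objects cannot be recovered precisely.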
Image source: "From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network"
2. Indoor scene mapping
- Task: Object instance segmentation
- Data: Point clouds, sparse data, relatively small distances between points
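A common preprocessing step for sparse indoor point clouds like these is voxelization: converting the unordered points into a regular occupancy grid that a segmentation network can consume. A minimal sketch, with room dimensions and voxel size as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical indoor scan: 10,000 points inside a 5 x 5 x 3 m room
points = rng.random((10_000, 3)) * np.array([5.0, 5.0, 3.0])

# Voxelize at 10 cm resolution: map each point to an integer grid cell,
# then keep the unique cells as the occupied voxels
voxel_size = 0.1
voxel_idx = np.floor(points / voxel_size).astype(int)
occupied = np.unique(voxel_idx, axis=0)
```

Because indoor distances between points are relatively small, even a fine grid like this stays manageable; the same approach on outdoor lidar data would produce a mostly empty grid.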
3. Medical diagnosis
- Task: 3D Semantic segmentation
- Data: Stacked 2D images, dense data, small distance between images
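Since the medical data arrives as a stack of dense 2D slices, assembling the network input is a matter of stacking them into one 3D volume. A sketch, with slice count and resolution as illustrative assumptions:

```python
import numpy as np

# Hypothetical CT/MRI study: 120 axial slices of 512 x 512 pixels each
slices = [np.zeros((512, 512), dtype=np.float32) for _ in range(120)]

# Stack the 2D images into one dense 3D volume: (depth, height, width)
volume = np.stack(slices, axis=0)

# Add batch and channel axes for a typical channels-first 3D network input
batch = volume[np.newaxis, np.newaxis, ...]  # (1, 1, 120, 512, 512)
```

The small, regular spacing between slices is what makes this data dense, in contrast to the sparse point clouds of the previous examples.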
4. A 3D-enhanced 2D approach
There is also another case where, luckily, it can be relatively straightforward to apply expertise and technology developed for 2D cases to 3D applications. One such scenario is when 2D labels are available but the data and the inference products are in 3D. Another is when 3D information can play a supportive role. In such a case, a depth map produced by a 3D camera can be treated as an additional image channel alongside the regular RGB colors. This extra information increases the sensitivity of neural networks to edges and thus yields better object boundaries. Examples of the projects we have delivered in such a setup include:
- Defect detection based on 2D and 3D images
- Object detection in a factory
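The depth-as-an-extra-channel idea described above can be sketched in a few lines. Image dimensions and values are illustrative assumptions; in practice the depth map must be aligned (registered) to the RGB frame:

```python
import numpy as np

# Hypothetical RGB frame and aligned depth map from a 3D camera
rgb = np.random.rand(480, 640, 3).astype(np.float32)    # (H, W, 3)
depth = np.random.rand(480, 640).astype(np.float32)     # (H, W)

# Treat depth as a fourth image channel: the network input becomes RGB-D,
# so a standard 2D architecture can use it with a 4-channel first layer
rgbd = np.concatenate([rgb, depth[..., np.newaxis]], axis=-1)  # (H, W, 4)
```

The appeal of this setup is that everything downstream, including labels, augmentation, and architecture, stays 2D; only the input layer changes.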