Learning visual overlapping image pairs for SfM via CNN fine-tuning with photogrammetric geometry information

 

Efficient and accurate identification of visually overlapping image pairs remains a challenge for large-scale Structure from Motion (SfM). CNN-based methods have recently demonstrated the ability to find visually similar image pairs, yet Bag-of-Words (BoW) or visual vocabulary tree (VoC) retrieval with hand-crafted or learned local features is still widely used in 3D reconstruction pipelines. To explore the differences between these approaches, we fine-tuned several popular CNNs (AlexNet, VGG, ResNet) with rules tailored to determining visually overlapping image pairs for SfM. Specifically, we generated a new training dataset (called LOIP), consisting of regular photogrammetric images and crowdsourced images from the Internet, by taking photogrammetric requirements and 3D mesh models into account. Local regional overlap information from paired images was used in the fine-tuning procedure. To aggregate feature maps from various channels, multiple learnable NetVLAD layers, one for each region, are employed to further improve retrieval performance. Comprehensive experiments demonstrate that image retrieval performance is improved and that the time cost of image matching is significantly reduced when matching is restricted to the identified visually overlapping pairs. Furthermore, the SfM results are largely on par with those of several state-of-the-art CNN-based and VoC methods.
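
The way retrieved overlapping pairs cut matching cost can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy descriptors, and the `top_k` parameter are all hypothetical, and the vectors stand in for global descriptors that a fine-tuned CNN with NetVLAD aggregation would produce. Each image is compared with every other image by cosine similarity, and only its most similar neighbours are kept as candidate pairs for feature matching.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def candidate_pairs(descriptors, top_k):
    """Return pairs (i, j), i < j, where one image is among the other's
    top_k most similar images by cosine similarity."""
    normed = [l2_normalize(d) for d in descriptors]
    pairs = set()
    for i, query in enumerate(normed):
        scores = [
            (sum(a * b for a, b in zip(query, other)), j)
            for j, other in enumerate(normed)
            if j != i
        ]
        scores.sort(reverse=True)  # highest similarity first
        for _, j in scores[:top_k]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


# Toy global descriptors (hypothetical values, one per image).
descriptors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]

pairs = candidate_pairs(descriptors, top_k=1)
exhaustive = len(list(combinations(range(len(descriptors)), 2)))
print(sorted(pairs), f"{len(pairs)} of {exhaustive} possible pairs selected")
```

With n images, exhaustive matching examines n(n-1)/2 pairs, whereas this selection keeps at most n times `top_k` pairs, which is where the saving in matching time comes from.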

Graphical Abstract of proposed framework

How to cite

Hou, Q., Xia, R., Zhang, J., Feng, Y., Zhan, Z., & Wang, X. (2023). Learning visual overlapping image pairs for SfM via CNN fine-tuning with photogrammetric geometry information. Int. J. Appl. Earth Obs. Geoinformation, 116, 103162.
