4:00 PM - 4:20 PM
[2K5-IS-1b-02] Cross-Modal Fish Species Detection in Underwater Environments with Semantic Guidance
Keywords: Classification, Deep learning, Environment Recognition, GAN Model
This study addresses the challenges of small object detection with a semantics-guided cross-modal method that uses natural language processing to assist image recognition, enabling the model to locate and identify small targets accurately. In underwater fish species recognition in particular, lighting variations, interference from suspended particles, and high-density fish populations undermine the stability of traditional methods. This study therefore integrates the BERT pre-trained model with the PRB-FPN-Net image recognition model, applies Retinex- and GAN-based image enhancement to improve image quality, and uses semantic annotation to strengthen fish species identification. Experimental results show that the proposed method achieves an accuracy of 73.2% and a recall of 60.4%, and maintains stable detection performance across varying lighting and background conditions. We also demonstrate a workflow that integrates a robotic fish platform for capturing underwater image data and running inference on board. Future research will focus on optimizing open-vocabulary recognition and on integrating acoustic data and underwater 3D sensing technologies to support ecological monitoring and fish behavior analysis.
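The abstract names Retinex as one of the enhancement steps but gives no details. As a generic illustration only (not the authors' implementation), the classic single-scale Retinex (SSR) estimates illumination with a Gaussian blur and keeps the log-ratio of the image to that estimate, which flattens uneven underwater lighting. The function names, the default `sigma`, the +1 offset, and the per-channel min-max normalization below are all assumptions chosen for this sketch.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D normalized Gaussian kernel truncated at 3 sigma (assumed cutoff)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(channel: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D array; output has the input's shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(channel, r, mode="reflect")  # mirror borders to avoid dark edges
    # Horizontal pass: each padded row shrinks back to the original width.
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, padded)
    # Vertical pass: each padded column shrinks back to the original height.
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)


def single_scale_retinex(image: np.ndarray, sigma: float = 15.0) -> np.ndarray:
    """Hypothetical SSR helper: log(I) - log(G * I), rescaled to [0, 1] per channel.

    `image` is an H x W x C array (e.g. uint8 RGB). The +1 offset avoids log(0).
    """
    img = image.astype(np.float64) + 1.0
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        illumination = gaussian_blur(img[:, :, c], sigma)  # smooth lighting estimate
        reflectance = np.log(img[:, :, c]) - np.log(illumination)
        lo, hi = reflectance.min(), reflectance.max()
        # Stretch the log-reflectance to [0, 1]; a flat channel maps to 0.
        out[:, :, c] = (reflectance - lo) / (hi - lo) if hi > lo else 0.0
    return out
```

For full-resolution frames, larger `sigma` values and an optimized blur (for example OpenCV's `cv2.GaussianBlur`) would be more practical than the `np.apply_along_axis` loops used here for clarity; multi-scale Retinex averages the SSR output over several `sigma` values.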