[3Win5-97] Investigation of Annotation Cost Reduction in Temporal Action Segmentation for Assembly Tasks Using Text Information
Keywords: Temporal Action Segmentation, CLIP, Deep learning
In assembly manufacturing lines, work analysis, such as measuring task times and verifying that work procedures are followed correctly, is required to understand and improve productivity. Traditionally, these measurements have been made manually through visual observation, which demands significant effort and makes it difficult to automate measurement and analysis. Recently, Temporal Action Segmentation, which divides work videos into basic action units, has been actively studied, and many deep learning models have been proposed. However, these models require costly annotations that assign an action label to every frame, which hinders practical deployment. In this study, we evaluated action segmentation performance on assembly processes using Caseg, a CLIP-based model that accepts action labels as linguistic input. By providing action labels extracted from assembly work standards as text prompts, we achieved a certain level of inference accuracy. This suggests that the approach can reduce annotation costs and thereby contribute to the practical deployment of action segmentation, and we report these findings.
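The abstract does not describe Caseg's internals, but the general idea of matching video frames against action labels supplied as text can be illustrated with a minimal zero-shot sketch using OpenAI's CLIP package. The action labels, the prompt template, and the per-frame nearest-prompt assignment below are all illustrative assumptions, not the authors' method.

# Minimal sketch (not the Caseg architecture itself): zero-shot frame
# labeling with CLIP, assuming action labels extracted from an assembly
# work standard are supplied as text prompts. Labels, prompt template,
# and the similarity-based assignment are hypothetical.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical action labels taken from an assembly standard document.
action_labels = [
    "pick up the screw",
    "fasten the screw with a driver",
    "attach the side panel",
    "inspect the finished part",
]
prompts = [f"a photo of a worker who is {a}" for a in action_labels]
text_tokens = clip.tokenize(prompts).to(device)

with torch.no_grad():
    text_feats = model.encode_text(text_tokens)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def label_frame(frame: Image.Image) -> str:
    """Assign the most similar action label to a single video frame."""
    image = preprocess(frame).unsqueeze(0).to(device)
    with torch.no_grad():
        img_feats = model.encode_image(image)
        img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
        sims = (img_feats @ text_feats.T).squeeze(0)  # cosine similarity per prompt
    return action_labels[int(sims.argmax())]

In a full segmentation pipeline, such per-frame predictions would typically be smoothed by a temporal model into contiguous action segments; this sketch shows only the text-prompt matching step.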