[3Xin2-39] Speaker Estimation Using Voice Recognition for Specific Fraud
Keywords: Specific Fraud, Speech Recognition
Specific fraud primarily targets the elderly, and the damage it causes continues to grow. The Tokyo Metropolitan Police Department recommends consulting a third party; however, victims are expected to have difficulty handling the situation on their own because of psychological distress. To help victims tell a third party whom they were speaking with, this study analyzed 66 voice files of Specific fraud released by the police and measured how accurately the speaker's profession or relationship could be predicted. The method transcribed the voice files with two Whisper models (Large and Tiny), built word groups through morphological analysis, and used ChatGPT to predict the speaker. With the Large model, speaker prediction accuracy reached 75.8%; with the Tiny model, it was considerably lower at 30.3%. This difference indicates that transcription accuracy affects prediction accuracy, and it highlights the optimization of sound quality as a future challenge. In conclusion, this study demonstrated the potential of a technical approach to combating Specific fraud and provides a foundation for future research.
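As a quick sanity check on the reported figures, the sketch below finds which whole-number counts of correct predictions out of 66 files would round to the stated percentages. The function name `consistent_counts` and the rounding tolerance are assumptions made for illustration; the abstract itself reports only the percentages, not the underlying counts.

```python
def consistent_counts(total_files: int, reported_pct: float, tol: float = 0.05) -> list[int]:
    """Return every count k such that k / total_files, expressed as a
    percentage, lies within `tol` of the reported (one-decimal) value."""
    return [
        k
        for k in range(total_files + 1)
        if abs(100.0 * k / total_files - reported_pct) < tol
    ]


if __name__ == "__main__":
    # 66 files and the two accuracies are taken from the abstract;
    # the counts printed here are inferred, not reported by the authors.
    for model_name, pct in [("Large", 75.8), ("Tiny", 30.3)]:
        print(model_name, consistent_counts(66, pct))
```

Under these assumptions, each reported percentage is consistent with exactly one count, which suggests that both models were evaluated on the same set of 66 files.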