5:00 PM - 5:20 PM
[1U5-IS-2b-01] Comparing Feature Extraction Methods for Sarcasm Detection in Twitter
[[Online, Regular]]
Keywords: sarcasm detection, feature extraction, machine learning
Sarcasm detection is a challenging task that involves identifying expressions whose intended meaning is the opposite of what is written. Most previous work measures only the sentiment polarity of sentences; however, more features are needed to improve the results. In this paper, we compare different feature extraction methods for sarcasm detection, including n-gram, sentiment, punctuation, and part-of-speech features. First, sarcastic data are collected using the Twitter API and preprocessed by removing all hashtags, mentions, and URLs. Then, after all features are extracted, they are combined using one-hot encoding. Finally, we use two classification methods, Support Vector Machine and Logistic Regression, for comparison. In our experimental results, the n-gram feature gives the best performance among the individual features. Support Vector Machine performs better than Logistic Regression, with an F1-measure of 79.64%. This shows the potential of combining different features for sarcasm detection.
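The preprocessing and feature extraction steps described in the abstract might be sketched as follows. This is a minimal illustration only, not the authors' implementation: the regular expressions, the function names (`preprocess`, `word_ngrams`, `punctuation_features`, `extract_features`), the choice of unigrams and bigrams, and the specific punctuation counts are all assumptions, since the abstract does not give these details. Sentiment and part-of-speech features are omitted because they would require an external lexicon or tagger.

```python
import re

# Hypothetical patterns; the abstract does not state the exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(tweet):
    """Remove URLs, mentions, and hashtags, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        tweet = pattern.sub(" ", tweet)
    return " ".join(tweet.split())


def word_ngrams(text, n):
    """Return lowercase word n-grams as space-joined strings."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def punctuation_features(text):
    """Count a few punctuation marks (an assumed, illustrative feature set)."""
    return {
        "punct_exclamation": text.count("!"),
        "punct_question": text.count("?"),
        "punct_ellipsis": text.count("..."),
    }


def extract_features(tweet):
    """Combine binary n-gram presence flags with punctuation counts."""
    clean = preprocess(tweet)
    features = {}
    for n in (1, 2):  # assumed n-gram orders
        for gram in word_ngrams(clean, n):
            features["ngram=" + gram] = 1  # presence flag, one-hot style
    features.update(punctuation_features(clean))
    return features


if __name__ == "__main__":
    example = "Oh great, another Monday!!! #sarcasm @boss https://t.co/abc"
    print(preprocess(example))
    print(extract_features(example))
```

Feature dictionaries of this shape could then be turned into a sparse matrix (for example with scikit-learn's `DictVectorizer`) and passed to an SVM or logistic regression classifier; that pairing is a suggestion for experimentation and is not stated in the abstract.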