Abstract

Hate speech on social media poses significant societal challenges, necessitating accurate and context-sensitive automated detection. Traditional machine learning (ML) models typically rely on lexical or superficial features, limiting their ability to capture nuanced or contextually ambiguous expressions of hate speech. Recent transformer-based methods (e.g., RoBERTa) provide improved contextual understanding but often lack explicit mechanisms guiding the model’s attention to critical semantic tokens, thereby reducing interpretability and sensitivity to nuanced linguistic contexts. This paper introduces a novel contextual attention-guided transformer model that explicitly incorporates lexicon-guided attention supervision into RoBERTa fine-tuning, significantly enhancing semantic precision in hate speech detection on Twitter. Evaluations on a publicly available Twitter hate speech dataset demonstrate that our proposed model achieves 98.2% accuracy, substantially outperforming classical ML baselines (e.g., Logistic Regression, SVM, Random Forest; best baseline ∼95%) with a notable increase in macro-F1 (from ∼0.88 to ∼0.95), particularly improving precision and recall for the minority hate speech class (F1 increasing from ∼0.74 to ∼0.85). We provide interpretability analyses using attention visualization and LIME explanations, offering transparency into model decisions, an essential feature for real-world deployment.
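The abstract's central idea, supervising the model's attention with a hate-speech lexicon during fine-tuning, can be illustrated with a minimal sketch. The paper does not specify its exact loss formulation; the function below is one common, hypothetical realization: an auxiliary penalty that is small when the classifier's attention mass falls on lexicon-flagged tokens and large when it does not, to be added to the standard classification loss.

```python
import math

def lexicon_attention_loss(attn, lexicon_mask, eps=1e-8):
    """Auxiliary loss encouraging attention mass on lexicon-flagged tokens.

    attn: per-token attention weights for one example (sums to ~1),
          e.g. the <s>/[CLS] row of a RoBERTa attention head
    lexicon_mask: 1.0 for tokens matching the hate lexicon, else 0.0
    """
    mass = sum(a * m for a, m in zip(attn, lexicon_mask))
    return -math.log(mass + eps)  # low when lexicon tokens get attention

# Attention concentrated on the flagged token is penalized less
attn_focused = [0.7, 0.1, 0.1, 0.1]
attn_diffuse = [0.1, 0.3, 0.3, 0.3]
mask = [1.0, 0.0, 0.0, 0.0]
print(lexicon_attention_loss(attn_focused, mask) <
      lexicon_attention_loss(attn_diffuse, mask))  # True
```

In training, such a term would typically be combined with cross-entropy as `loss = ce_loss + lam * lexicon_attention_loss(...)`, where the weight `lam` is a hyperparameter; both the combination rule and `lam` are assumptions here, not details taken from the paper.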
