Abstract
Many language classification systems rely on language models that use machine learning approaches and require rather long recordings to achieve satisfactory accuracy. This paper aims to extract information from short recording intervals that is sufficient to classify the spoken languages under test successfully. The classification is based on frames of 2–18 seconds, whereas most previous language classification systems are based on much longer time frames (from 3 seconds to 2 minutes). This paper defines and implements many low-level features using Mel-frequency cepstral coefficients. The data set contains speech files in five languages (English, French, German, Italian and Spanish) and is drawn from voxforge.org, an open source that consists of user-submitted audio clips in various languages. This paper applies a convolutional neural network algorithm for classification, with near-perfect results. Binary language classification has an accuracy of 100%, and five-language classification with six languages has an accuracy of 99.8%.
Recommended Citation
Rammo, Fawziya M. and Al-Hamdani, Mohammed N. (2022) "Detecting the Speaker Language Using CNN Deep Learning Algorithm," Iraqi Journal for Computer Science and Mathematics: Vol. 3: Iss. 1, Article 5.
DOI: https://doi.org/10.52866/ijcsm.2022.01.01.005
Available at: https://ijcsm.researchcommons.org/ijcsm/vol3/iss1/5