Visual Speech recognition for Sinhala language using CNN

Jayarathne, W.M.U.; Perera, W.A.S.C.; Ketheesan, T.

dc.contributor.author	Jayarathne, W.M.U.
dc.contributor.author	Perera, W.A.S.C.
dc.contributor.author	Ketheesan, T.
dc.date.accessioned	2021-12-13T04:06:38Z
dc.date.available	2021-12-13T04:06:38Z
dc.date.issued	2021-02-17
dc.identifier.issn	1391-8796
dc.identifier.uri	http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/4606
dc.description.abstract	Visual Speech Recognition (VSR) is an essential tool that helps visually impaired people understand speech from video. Moreover, VSR plays an important role in analyzing CCTV footage for crime investigations where audio is not available. However, VSR for the Sinhala language is still under research and has not been explored extensively. Hence, this preliminary study examines the suitability of a convolutional neural network (CNN) for recognizing Sinhala characters from images that contain the mouth region. The proposed methodology trains the CNN on lip pose features and the corresponding character labels. The CNN architecture employs three convolutional layers, two fully connected layers and one max-pooling layer. No data set is publicly available for Sinhala visual speech recognition, so a data set was created to evaluate the system, covering five Sinhala characters with the phonetic sounds a, e, i, l and m. The data set was augmented to enlarge the feature domain, and outliers were removed to reduce ambiguity. The system was trained with fifteen images and tested with ten images, each containing the lip pose produced when pronouncing one of the five sounds. For evaluation, the confusion matrix was analyzed and the accuracy was assessed using the F1 score. The F1 score, calculated from precision and recall, was found to be 0.83, which indicates that the proposed methodology performs well.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Science, University of Ruhuna, Matara, Sri Lanka	en_US
dc.subject	CNN	en_US
dc.subject	Sinhala	en_US
dc.subject	Character	en_US
dc.subject	Visual	en_US
dc.title	Visual Speech recognition for Sinhala language using CNN	en_US
dc.type	Article	en_US
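
The abstract reports an F1 score derived from precision and recall via a confusion matrix. A minimal sketch of that calculation for a five-class problem follows. The macro-averaging step, the function names and the example matrix are all assumptions for illustration: the abstract does not say how the per-class scores were combined, and the counts below are made up, not the authors' results.

```python
def per_class_scores(confusion):
    """Return (precision, recall, f1) per class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, but wrong
        fn = sum(confusion[k]) - tp                       # true k, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = per_class_scores(confusion)
    return sum(f1 for _, _, f1 in scores) / len(scores)


# Hypothetical confusion matrix for the sounds a, e, i, l, m
# (two test images per class; illustrative counts only).
LABELS = ["a", "e", "i", "l", "m"]
EXAMPLE = [
    [2, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],  # one "e" image misread as "i"
    [0, 0, 2, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2],
]

if __name__ == "__main__":
    for label, (p, r, f1) in zip(LABELS, per_class_scores(EXAMPLE)):
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
    print(f"macro F1 = {macro_f1(EXAMPLE):.2f}")
```

With only ten test images, a single misclassification shifts the score noticeably, so a value computed this way is best read as a preliminary indication.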