Automated Lip-Reading Recognition Using EfficientNet Architectures

dc.contributor.author Wijerathne, W.S.H.L.
dc.contributor.author Hameed, P.N.
dc.contributor.author Herath, D.
dc.date.accessioned 2023-02-10T08:55:07Z
dc.date.available 2023-02-10T08:55:07Z
dc.date.issued 2023-01-18
dc.identifier.issn 1391-8796
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/11017
dc.description.abstract Lip-reading recognition focuses on recognizing the words spoken by a talking face using only video, without audio. It can assist people with communication difficulties, such as those caused by removal of the larynx in a total laryngectomy. Manual lip-reading by humans is difficult, so lip-reading recognition models with good accuracy and efficiency are needed to build practical applications. Lip-reading recognition involves preprocessing steps such as face recognition, facial landmark detection, and image preprocessing, followed by mouth Region of Interest (ROI) extraction. These preprocessing techniques have improved considerably in efficiency and accuracy, so most recent work focuses on improving performance by developing an optimal architecture. In this paper, we propose a new lip-reading recognition model that uses Temporal Convolutional Networks (TCN) for classification and different EfficientNet architectures for feature extraction. First, we developed a base model with EfficientNet-B0 and TCN. Second, we evaluated models in which EfficientNet-B0 was replaced by the scaled variants of the EfficientNet family, EfficientNet-B1 to B6. All models were trained for 80 epochs with the Adam optimizer and a batch size of 32, and we compared the performance of the different EfficientNet variants. The results demonstrate that lip-reading recognition improves when TCN is combined with the EfficientNet-B1, B2, and B3 architectures, which achieve accuracies of 83.8%, 81.8%, and 84.32%, respectively. en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna, Matara, Sri Lanka en_US
dc.subject Temporal Convolutional Networks (TCN) en_US
dc.subject Neural Networks en_US
dc.subject Deep Learning en_US
dc.subject Visual Speech recognition en_US
dc.title Automated Lip-Reading Recognition Using EfficientNet Architectures en_US
dc.type Article en_US
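
The abstract describes a pipeline in which an EfficientNet backbone produces a feature vector per video frame and a Temporal Convolutional Network classifies the resulting sequence. The core building block of a TCN is a causal dilated 1-D convolution, which the following minimal NumPy sketch illustrates. The function name, shapes, and kernel layout are illustrative assumptions, not the authors' implementation; a real model would stack several such blocks (with dilations 1, 2, 4, ...) on top of EfficientNet features and add residual connections and a classification head.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution over a per-frame feature sequence.

    x: (T, C) array of per-frame features (e.g. from an EfficientNet backbone).
    w: (K, C, C_out) kernel; w[K-1] is the tap on the current frame.
    Returns a (T, C_out) array where the output at time t depends only
    on frames at times <= t (the causality property of a TCN).
    """
    T, C = x.shape
    K, _, C_out = w.shape
    pad = (K - 1) * dilation
    # Left-pad with zeros so every output sees a full receptive field
    # without looking into the future.
    xp = np.concatenate([np.zeros((pad, C)), x], axis=0)
    y = np.zeros((T, C_out))
    for t in range(T):
        for k in range(K):
            # Tap k steps of `dilation` back from the current frame.
            y[t] += xp[t + pad - k * dilation] @ w[K - 1 - k]
    return y
```

Stacking such layers with exponentially growing dilations lets the receptive field cover a whole word-length clip with few layers, which is one reason TCNs are a common classification head in lip-reading models.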

