RECOGNIZING AND UNDERSTANDING AMERICAN SIGN LANGUAGE USING DEEP LEARNING AND MEDIAPIPE
DOI: https://doi.org/10.18173/2354-1059.2026-0006
Keywords: American Sign Language, convolutional neural network, recurrent neural network, long short-term memory, MediaPipe
Abstract
Sign language serves as the primary medium of expression for deaf and hard-of-hearing individuals. However, interaction with people who do not sign remains challenging, since deaf and hard-of-hearing individuals rely primarily on sign language to express their thoughts. To address this issue, this study proposes an automated sign language recognition and interpretation framework that integrates static and dynamic recognition components. Specifically, a Convolutional Neural Network (CNN) is employed for static gesture classification, while a hybrid CNN-Long Short-Term Memory (CNN-LSTM) architecture captures the spatiotemporal features of dynamic signs. In addition, MediaPipe is used for robust landmark localization to strengthen feature extraction. The American Sign Language (ASL) dataset used in this research provides diversity in sign representation, including variations in hand shape, position, and movement. The proposed models achieved high accuracy on the test sets: 93.1% for the CNN model and 94.1% for the CNN-LSTM model, confirming their effectiveness for ASL recognition.
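The abstract does not describe how MediaPipe output is turned into network input. As a rough illustration only, the sketch below assumes the models consume MediaPipe Hands landmarks (21 points per hand, each with x, y, z coordinates) rather than raw pixels. The wrist-relative normalization, the helper names `normalize_hand` and `to_fixed_length`, and the 30-frame sequence length are hypothetical choices, not the authors' pipeline. Under these assumptions, a static sign maps to a single 63-value vector suited to a per-frame classifier, while a dynamic sign becomes a fixed-length sequence of such vectors, the (time, features) layout an LSTM layer expects.

```python
from typing import List, Sequence, Tuple

# (x, y, z) for one landmark, as produced per point by MediaPipe Hands.
Landmark = Tuple[float, float, float]

NUM_HAND_LANDMARKS = 21  # MediaPipe Hands predicts 21 landmarks per hand
WRIST = 0                # landmark 0 is the wrist in MediaPipe's hand model
SEQ_LEN = 30             # hypothetical clip length for the CNN-LSTM input


def normalize_hand(landmarks: Sequence[Landmark]) -> List[float]:
    """Flatten 21 (x, y, z) landmarks into a 63-value feature vector.

    Hypothetical normalization, not taken from the paper: coordinates are
    shifted so the wrist is the origin, then divided by the largest absolute
    offset, making the vector independent of hand position and size.
    """
    if len(landmarks) != NUM_HAND_LANDMARKS:
        raise ValueError(f"expected {NUM_HAND_LANDMARKS} landmarks, got {len(landmarks)}")
    wx, wy, wz = landmarks[WRIST]
    rel = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    scale = max(abs(v) for point in rel for v in point) or 1.0  # guard against 0
    return [v / scale for point in rel for v in point]


def to_fixed_length(frames: Sequence[List[float]], seq_len: int = SEQ_LEN) -> List[List[float]]:
    """Resample a variable-length clip of feature vectors to exactly seq_len frames.

    Assumed policy: longer clips are uniformly subsampled, shorter clips are
    padded at the end with zero vectors.
    """
    if not frames:
        raise ValueError("clip contains no frames")
    dim = len(frames[0])
    if len(frames) >= seq_len:
        step = len(frames) / seq_len
        return [list(frames[int(i * step)]) for i in range(seq_len)]
    padding = [[0.0] * dim for _ in range(seq_len - len(frames))]
    return [list(f) for f in frames] + padding


if __name__ == "__main__":
    # Synthetic hand: wrist at (0.5, 0.5, 0.0), other points spread along x.
    hand = [(0.5, 0.5, 0.0)] + [(0.5 + 0.01 * i, 0.5, 0.0) for i in range(1, 21)]
    static_features = normalize_hand(hand)          # input for a static-sign classifier
    print(len(static_features))                      # 63
    clip = to_fixed_length([static_features] * 45)  # input for a dynamic-sign model
    print(len(clip), len(clip[0]))                   # 30 63
```

Uniform subsampling keeps the overall timing of a long clip, while zero padding lets short clips share one tensor shape. Other policies (interpolation, repeating the last frame, masking padded steps) are equally plausible, and the abstract does not say which one the study used.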
