A Data-Centric Approach for Portuguese Speech Recognition: Language Model And Its Implications



automatic speech recognition, language model, Brazilian Portuguese, wav2vec2, KenLM


Recent advances in Automatic Speech Recognition (ASR) have achieved unprecedented transcription quality. This holds both for languages with abundant data, such as English, which has been studied extensively, and for Portuguese, which has more limited resources and fewer studies. The most recent approaches address speech recognition with Transformer-based models, which can perform the task directly from the raw signal, without manual feature extraction. Some studies have already shown that the transcription quality of these models can be further improved by using language models in the decoding stage. However, the actual impact of such language models is still unclear, especially for Brazilian Portuguese. The quality of the training data is also known to be of paramount importance, yet few works in the literature address this issue. This work takes a data-centric approach to explore the impact of language models on Portuguese speech recognition, in terms of both data quality and computational performance. We propose an approach to measure the similarity between datasets and thus assist decision-making during training. The approach points to directions for advancing the state of the art in Portuguese speech recognition. It shows that the size of the language model can be reduced by 80% while still achieving an error rate of around 7.17% on the Common Voice dataset. The source code is available at https://github.com/joaoalvarenga/language-model-evaluation.
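To make "using language models in the decoding stage" concrete, here is a minimal sketch. It is not the authors' method. It rescores a hypothetical N-best list of transcriptions from an acoustic model with a shallow-fusion-style score: the acoustic log-probability plus a weighted language-model log-probability (`alpha`) plus a word-insertion bonus (`beta`). The toy add-one-smoothed bigram model, the class and function names, the weights, the corpus, and the acoustic scores are all illustrative assumptions. The keywords above list KenLM, which in practice would supply the n-gram probabilities instead of this toy model.

```python
import math
from collections import Counter


class BigramLM:
    """Hypothetical toy bigram LM with add-one smoothing (stand-in for a KenLM model)."""

    def __init__(self, sentences):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.context_counts.update(tokens[:-1])  # counts of each word used as context
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        # Vocabulary size used for add-one smoothing (seen words plus end-of-sentence).
        self.vocab_size = len({w for s in sentences for w in s.split()} | {"</s>"})

    def log_prob(self, sentence):
        """Natural-log probability of a sentence under the smoothed bigram model."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def rescore(nbest, lm, alpha=0.5, beta=1.0):
    """Return the hypothesis maximizing am_logprob + alpha * lm_logprob + beta * num_words."""

    def score(hypothesis):
        text, am_logprob = hypothesis
        return am_logprob + alpha * lm.log_prob(text) + beta * len(text.split())

    return max(nbest, key=score)[0]


# Illustrative data only: a tiny Portuguese "corpus" and an N-best list whose
# acoustic scores slightly favor a mis-segmented transcription.
corpus = [
    "o gato subiu no telhado",
    "o cachorro subiu na escada",
    "o gato dormiu no sofá",
]
nbest = [
    ("o gato subiu no telhado", -4.2),
    ("o gato subiu no te lhado", -4.0),
]

lm = BigramLM(corpus)
print(rescore(nbest, lm))  # prints: o gato subiu no telhado
```

With `alpha=0.0` and `beta=0.0` the acoustic score alone decides, and the mis-segmented hypothesis wins. The language model term is what pulls the decoder toward the well-formed sentence, and `alpha` and `beta` are normally tuned on held-out data.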



Author Biographies

João Paulo Reis Alvarenga, Universidade Federal de Ouro Preto

João Alvarenga received a bachelor's degree in Computer Science from the Federal University of Ouro Preto (UFOP) in 2019. He is currently Team Lead and Senior Machine Learning Engineer at Stilingue Inteligência Artificial and a master's student in the Graduate Program in Computer Science at UFOP. His research interests include deep learning, natural language processing, and speech recognition.

Luiz Henrique de Campos Merschmann, Universidade Federal de Lavras

Luiz H. C. Merschmann is a Professor in the Department of Applied Computing at the Federal University of Lavras, Brazil. He received a BSc degree in Mining Engineering from the Federal University of Ouro Preto, Brazil, an MSc degree in Production Engineering from the Federal University of Rio de Janeiro, Brazil, and a PhD degree in Computer Science from Fluminense Federal University, Brazil. In 2012, he carried out postdoctoral research at the University of Kent, UK. He has published several peer-reviewed papers in journals and conference proceedings. His research interests include data mining, machine learning, artificial intelligence, and natural language processing.

Eduardo José da Silva Luz, Universidade Federal de Ouro Preto

Eduardo Luz holds a bachelor's degree in Electrical Engineering from the Federal University of Minas Gerais (2005) and a Ph.D. in Computer Science from the Federal University of Ouro Preto (2019). He is an Adjunct Professor in the Department of Computing (DECOM) at the Federal University of Ouro Preto and a permanent member of its Graduate Program in Computer Science. His research interests include pattern recognition, machine learning, computer vision, and embedded systems.


How to Cite

Alvarenga, J. P. R., Merschmann, L. H. de C., & Luz, E. J. da S. (2023). A Data-Centric Approach for Portuguese Speech Recognition: Language Model And Its Implications. IEEE Latin America Transactions, 21(4), 546–556. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/7464

