Automatic Phonetic Segmentation of the Yuhmu Language Using Mel Scale Spectral Parameters

Keywords:

Implicit Segmentation, Phoneme Analysis, Low-Resource Language, SER, Yuhmu Language

Abstract

Applying digital signal processing and machine learning techniques, together with implicit segmentation, to the phonetic segmentation of Mexico's indigenous languages is challenging given their linguistic and phonetic diversity. The analysis of Mel-scaled spectrograms offers an effective approach for identifying patterns that highlight relevant information; comparing the results against the actual number of phonemes in a word reveals both successes and areas for improvement. This article proposes a methodology for automatic segmental analysis of the Yuhmu language that searches for parameters on the Mel scale and computes the cosine distance between spectrogram vectors. In addition, relevant data within the resulting matrices are selected according to four key thresholds. The analysis yields a Segment Error Rate (SER) ranging from 38.79% to 41.35%, which aligns with results reported in the literature on the subject.
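The cosine-distance step described above can be sketched roughly as follows. This is an illustrative toy example with synthetic data, not the paper's actual implementation: the spectrogram size, the `eps` guard, and the mean-plus-one-standard-deviation boundary threshold are assumptions made here for demonstration only.

```python
import numpy as np

# Synthetic stand-in for a Mel-scaled spectrogram (mel_bands x frames).
rng = np.random.default_rng(0)
mel_spec = np.abs(rng.normal(size=(40, 120)))

def cosine_distance(u, v, eps=1e-10):
    """Cosine distance between two frame vectors (eps avoids division by zero)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

# Distance between each pair of consecutive spectrogram frames.
distances = np.array([
    cosine_distance(mel_spec[:, i], mel_spec[:, i + 1])
    for i in range(mel_spec.shape[1] - 1)
])

# Illustrative threshold: frames whose spectral change exceeds it are
# treated as candidate phoneme boundaries.
threshold = distances.mean() + distances.std()
boundaries = np.where(distances > threshold)[0]
print(f"{len(boundaries)} candidate boundaries out of {len(distances)} frame pairs")
```

In a real pipeline, the synthetic matrix would be replaced by a Mel spectrogram computed from recorded speech, and the candidate boundaries would be compared against the known phoneme count of each word to compute the SER.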


Author Biographies

Eric Ramos-Aguilar, Benemérita Universidad Autónoma de Puebla

Eric Ramos-Aguilar is currently a Ph.D. candidate in Language and Knowledge Engineering at the Faculty of Computer Science, Autonomous University of Puebla. His primary research interests include natural language processing, pattern recognition, digital audio processing and analysis, and the study of the indigenous languages of Mexico.

J. Arturo Olvera-Lopez, Benemérita Universidad Autónoma de Puebla

J. Arturo Olvera-Lopez completed a PhD degree in Computer Science from the National Institute of Astrophysics, Optics and Electronics (INAOE, Mexico). He is interested in problems related to the areas of pattern recognition, data mining, machine learning, data preprocessing, data reduction, digital image/signal processing and analysis, and biometry. He currently works at the Benemérita Universidad Autónoma de Puebla, in the Faculty of Computer Science.

Ivan Olmos-Pineda, Benemérita Universidad Autónoma de Puebla

Ivan Olmos-Pineda completed a PhD degree in Computer Science at the National Institute of Astrophysics, Optics and Electronics (INAOE, Mexico). He is interested in problems related to the areas of pattern recognition, data mining, machine learning, data preprocessing, data reduction, digital image/signal processing and analysis, and biometrics. He currently works at the Benemérita Universidad Autónoma de Puebla, in the Faculty of Computer Science.

Ricardo Ramos-Aguilar, Instituto Politécnico Nacional

Ricardo Ramos-Aguilar completed a PhD degree in Language and Knowledge Engineering from the Faculty of Computer Sciences at the Autonomous University of Puebla. His main research interests include natural language processing, pattern recognition, and computer vision. He currently works at the National Polytechnic Institute in the fields of Artificial Intelligence and Data Science.

References

Europa Press, Los idiomas, en cifras: ¿cuántas lenguas hay en el mundo? [Languages in figures: How many languages are there in the world?], n.d., https://www.europapress.es/sociedad/noticia-idiomas-cifras-cuantas-lenguas-hay-mundo-20190221115202.html, Accessed: October 1, 2024.

F. Lasi, A study on the ability of supra-segmental and segmental aspects in English pronunciation, Ethical Lingua: Journal of Language Teaching and Literature, vol. 7, no. 2, pp. 426–437, 2020. DOI:10.30605/25409190.222

M. A. Hedderich, L. Lange, H. Adel, J. Strötgen, and D. Klakow, "A survey on recent approaches for natural language processing in low-resource scenarios," Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 2021, pp. 2545–2568, Jun. 2021. [Online]. Available: https://aclanthology.org/2021.naacl-main.201/. DOI: 10.18653/v1/2021.naacl-main.201

Secretaría de Cultura, México es uno de los países con mayor diversidad lingüística en el mundo [Mexico is one of the countries with the greatest linguistic diversity in the world], n.d., https://www.gob.mx/cultura/prensa/mexico-es-uno-de-los-paises-con-mayor-diversidad-linguistica-en-el-mundo#:~:text=Las%20lenguas%20que%20m%C3%A1s%20se,m%C3%A1s%20de%20200%20mil%20hablantes, Accessed: October 1, 2024.

P. M. Rogerson-Revell, Computer-assisted pronunciation training (CAPT): Current issues and future directions, Relc Journal, vol. 52, no. 1, pp. 189–205, 2021. DOI:10.1177/0033688220977406

M. Javed, M. M. A. Baig, and S. A. Qazi, Unsupervised phonetic segmentation of classical Arabic speech using forward and inverse characteristics of the vocal tract, Arabian Journal for Science and Engineering, vol. 45, pp. 1581–1597, 2020. DOI:10.1007/s13369-019-04065-5

K. Penner, Prosodic structure in Ixtayutla Mixtec: Evidence for the foot, 2019. DOI:10.13140/RG.2.2.28786.96965

R. Turnbull, The phonetics and phonology of lexical prosody in San Jerónimo Acazulco Otomí, Journal of the International Phonetic Association, vol. 47, no. 3, pp. 251–282, 2017. DOI:10.1017/S0025100316000384

E. P. Velásquez Upegui, Entonación del español en contacto con el otomí de San Ildefonso Tultepec: Enunciados declarativos e interrogativos absolutos [Intonation of Spanish in contact with the Otomí of San Ildefonso Tultepec: Declarative utterances and absolute interrogatives], Anuario de Letras. Lingüística y Filología, vol. 8, no. 2, pp. 143–168, 2020. DOI: 10.19130/iifl.adel.2020.24875

A. S. Sahagún, Spanish VOT Production by L1 Nahuat Speakers, PhD thesis, University of Saskatchewan, 2021.

S. Brognaux and T. Drugman, HMM-based speech segmentation: Improvements of fully automatic approaches, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 1, pp. 5–15, 2015. DOI: 10.1109/TASLP.2015.2456421

M. Aissiou, A genetic model for acoustic and phonetic decoding of standard Arabic vowels in continuous speech, International Journal of Speech Technology, vol. 23, no. 2, pp. 425–434, 2020. DOI:10.1007/s10772-020-09694-y

W. Peng, Y. Gao, B. Lin, and J. Zhang, A practical way to improve automatic phonetic segmentation performance, in 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1–5, 2021, IEEE. DOI:10.1109/ISCSLP49672.2021.9362107

S. Cuervo, M. Grabias, J. Chorowski, G. Ciesielski, A. Łańcucki, P. Rychlikowski, and R. Marxer, Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words, in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3189–3193, 2022, IEEE. DOI: 10.1109/ICASSP43922.2022.9746102

A. B. Ibrahim, Y. M. Seddiq, A. H. Meftah, M. Alghamdi, S. A. Selouani, M. A. Qamhan, Y. A. Alotaibi, and S. A. Alshebeili, Optimizing Arabic speech distinctive phonetic features and phoneme recognition using genetic algorithm, IEEE Access, vol. 8, pp. 200395–200411, 2020. DOI:10.1109/ACCESS.2020.3034762

F. Kreuk, Y. Sheena, J. Keshet, and Y. Adi, Phoneme boundary detection using learnable segmental features, in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8089–8093, 2020. DOI: 10.48550/arXiv.2002.04992

I. Al-Hassani, O. Al-Dakkak, and A. Assami, Phonetic segmentation using a wavelet-based speech cepstral features and sparse representation classifier, Journal of Telecommunications and Information Technology, no. 4, pp. 12–22, 2021. DOI: 10.26636/jtit.2021.153321

Z. Kh. Abdul and A. K. Al-Talabani, "Mel frequency cepstral coefficient and its applications: A review," IEEE Access, vol. 10, pp. 122136–122158, 2022. DOI: 10.1109/ACCESS.2022.3223444

P. Bhagath and P. K. Das, Quadrilaterals based phoneme segmentation technique for low resource spoken languages, in TENCON 2022-2022 IEEE Region 10 Conference (TENCON), pp. 1–6, 2022. DOI:10.1109/TENCON55691.2022.9977455

R. Alarcon Montero, Manual para la escritura de los sonidos del Yuhmu [Manual for writing the sounds of Yuhmu], INAH, 2023.

M. V. Belodedov, R. V. Fonkants, and R. R. Safin, "Development of an algorithm for optimal encoding of WAV files using genetic algorithms," in 2023 5th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE), vol. 5, pp. 1–6, 2023. DOI: 10.1109/REEPE57272.2023.10086837

R. Liu, J. Zhang, and G. Gao, "Multi-space channel representation learning for mono-to-binaural conversion based audio deepfake detection," Information Fusion, vol. 105, p. 102257, 2024. DOI: 10.1016/j.inffus.2024.102257

H. M. Fayek, “Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between,” 2016. [Online]. Available: https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html

J. Kim, Iterated grid search algorithm on unimodal criteria, Virginia Polytechnic Institute and State University, 1997.

S. Pangaonkar and A. Panat, A review of various techniques related to feature extraction and classification for speech signal analysis, in ICDSMLA 2019: Proceedings of the 1st International Conference on Data Science, Machine Learning and Applications, pp. 534–549, Springer Singapore, Singapore, 2020. DOI:10.1007/978-981-15-1420-3_57

E. Ramos-Aguilar, J. A. Olvera-López, and I. Olmos-Pineda, A general overview of language pronunciation analysis based on machine learning, Research in Computing Science, vol. 152, no. 10, 2023.

X. Li, S. Dalmia, J. Li, M. Lee, P. Littell, J. Yao, A. Anastasopoulos, D. R. Mortensen, G. Neubig, A. W. Black, and others, Universal phone recognition with a multilingual allophone system, in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8249–8253, 2020. DOI: 10.48550/arXiv.2002.11800

M. Buisson, B. McFee, S. Essid, and H. C. Crayencour, Self-supervised learning of multi-level audio representations for music segmentation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2141–2152, 2024. DOI: 10.1109/TASLP.2024.3379894

Published

2025-10-01

How to Cite

Ramos-Aguilar, E., Olvera-Lopez, J. A., Olmos-Pineda, I., & Ramos-Aguilar, R. (2025). Automatic Phonetic Segmentation of the Yuhmu Language Using Mel Scale Spectral Parameters. IEEE Latin America Transactions, 23(11), 950–959. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/9659