Deep Learning Convolutional Network for Bimodal Biometric Recognition with Information Fusion at Feature Level
Keywords: Multimodal biometrics, Deep Learning, Speaker recognition, Face recognition

Abstract
Biometric recognition has been extensively researched in recent years owing to the growth of its applications in daily activities. State-of-the-art work in biometrics proposes multimodal systems that employ two or more traits to increase security, since it is more difficult for an impostor to acquire, falsify, or forge multiple samples of different traits from an enrolled user. In this paper, we propose a Deep Learning bimodal network that combines the voice and face modalities. Voice features are extracted with a SincNet architecture, and face image features are extracted with a set of convolutional layers. The feature vectors of both modalities are combined within the network by one of two methods: averaging or concatenation. The averaged or concatenated vector is further processed with a fully connected layer to output a bimodal vector that contains discriminatory information about an individual. For identification, the bimodal vector is fed to a fully connected layer with the softmax function. For verification, the bimodal vector is matched against a template to obtain a score used to accept or reject a user's claimed identity. We compared the results yielded by both fusion methods on both recognition tasks. Both methods achieved an accuracy as high as 99% in identification and an Equal Error Rate (EER) as low as 0.14% in verification. These results were obtained by combining the BIOMEX-DB and VidTimit databases.
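The feature-level fusion pipeline described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: the feature dimensions (`D`, `E`), number of identities, random stand-in weights and features, and the 0.5 decision threshold are all hypothetical assumptions, and the SincNet and convolutional extractors are replaced by random vectors of the right shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not fix them in the abstract.
D = 512         # per-modality feature size (SincNet voice / CNN face output)
E = 256         # bimodal embedding size
N_CLASSES = 10  # enrolled identities

voice_feat = rng.standard_normal(D)  # stand-in for the SincNet voice features
face_feat = rng.standard_normal(D)   # stand-in for the convolutional face features

# Fusion method 1: element-wise averaging (keeps dimension D).
fused_avg = (voice_feat + face_feat) / 2.0

# Fusion method 2: concatenation (dimension 2 * D).
fused_cat = np.concatenate([voice_feat, face_feat])

# A fully connected layer maps the fused vector to the bimodal vector.
W_fc = rng.standard_normal((E, 2 * D)) * 0.01  # concatenation branch shown here
bimodal = W_fc @ fused_cat

# Identification: a final fully connected layer with softmax over identities.
W_id = rng.standard_normal((N_CLASSES, E)) * 0.01
logits = W_id @ bimodal
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_id = int(np.argmax(probs))

# Verification: match the bimodal vector against a stored template to get a
# score (cosine similarity here), then threshold to accept or reject.
template = rng.standard_normal(E)
score = bimodal @ template / (np.linalg.norm(bimodal) * np.linalg.norm(template))
accept = score > 0.5  # in practice the threshold is set at an operating point such as the EER
```

The averaging branch requires both modalities to share the same feature dimension, while concatenation doubles the input size of the fully connected layer; the abstract compares exactly these two trade-offs.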