Reconstructing fundamental frequency from noisy speech using initialized autoencoders
Keywords: Deep Learning, fundamental frequency, Neural Network, LSTM
In this paper, we present a new approach for fundamental frequency ($f_0$) detection in noisy speech based on Long Short-Term Memory (LSTM) neural networks. The $f_0$ is one of the most important parameters of human speech. Its detection is relevant to many areas of speech signal processing and remains a significant challenge for severely degraded signals.
In previous work on $f_0$ detection for speech enhancement and noise reduction tasks, LSTMs have been initialized with random weights, which are then adjusted with the back-propagation through time (BPTT) algorithm. We propose an alternative, more efficient initialization based on supervised training of an auto-associative network. This initialization provides a better starting point for $f_0$ detection in noisy speech.
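The two-stage idea (train an auto-associative network first, then reuse its learned weights instead of random ones) can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: a single feed-forward tanh hidden layer stands in for the LSTM layer (recurrence and BPTT are omitted for brevity), the auto-associative stage is assumed to reconstruct its own input, training is plain SGD on squared error, and all function names are hypothetical.

```python
import math
import random


def tanh_layer(W, b, x):
    """Hidden layer h = tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def linear_layer(W, b, h):
    """Output layer y = W h + b."""
    return [sum(w * hj for w, hj in zip(row, h)) + bi
            for row, bi in zip(W, b)]


def rand_matrix(rows, cols, rng, scale=0.5):
    """Uniform random weights in [-scale, scale] (the 'random initialization')."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def new_net(n_in, n_hidden, n_out, rng):
    return {"W1": rand_matrix(n_hidden, n_in, rng), "b1": [0.0] * n_hidden,
            "W2": rand_matrix(n_out, n_hidden, rng), "b2": [0.0] * n_out}


def train(net, data, targets, epochs=200, lr=0.05):
    """Plain SGD on 0.5 * ||y - t||^2; `net` is updated in place."""
    W1, b1, W2, b2 = net["W1"], net["b1"], net["W2"], net["b2"]
    for _ in range(epochs):
        for x, t in zip(data, targets):
            h = tanh_layer(W1, b1, x)
            y = linear_layer(W2, b2, h)
            err = [yi - ti for yi, ti in zip(y, t)]
            # Gradient w.r.t. hidden pre-activations, computed before W2 changes.
            dh = [(1.0 - h[j] ** 2) * sum(err[i] * W2[i][j] for i in range(len(err)))
                  for j in range(len(h))]
            for i, e in enumerate(err):
                for j, hj in enumerate(h):
                    W2[i][j] -= lr * e * hj
                b2[i] -= lr * e
            for j, d in enumerate(dh):
                for k, xk in enumerate(x):
                    W1[j][k] -= lr * d * xk
                b1[j] -= lr * d
    return net


def mse(net, data, targets):
    """Mean squared error of the network over a data set."""
    total = 0.0
    for x, t in zip(data, targets):
        y = linear_layer(net["W2"], net["b2"], tanh_layer(net["W1"], net["b1"], x))
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(t)
    return total / len(data)


def pretrain_then_init(data, n_hidden, n_out, seed=0):
    """Stage 1: auto-associative training (targets = inputs).
    Stage 2: copy the learned hidden layer into a fresh regression net whose
    output layer (e.g. one f0 value per frame) is still randomly initialized."""
    rng = random.Random(seed)
    n_in = len(data[0])
    ae = train(new_net(n_in, n_hidden, n_in, rng), data, data)
    reg = new_net(n_in, n_hidden, n_out, rng)
    reg["W1"] = [row[:] for row in ae["W1"]]
    reg["b1"] = ae["b1"][:]
    return ae, reg
```

The returned `reg` network would then be fine-tuned on the actual $f_0$ targets with `train`, starting from the pre-trained hidden weights rather than from `rand_matrix`.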
We demonstrate the advantages of this pre-training using objective measures of both the estimated parameter and the training process, on speech with artificial and natural noise added at different signal-to-noise ratios (SNRs). Results show that the performance of the LSTM improves relative to random initialization, representing a significant improvement over traditional neural network initialization for $f_0$ detection in noisy conditions.
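Adding noise "at a given SNR" means scaling the noise by a gain $g$ so that $10\log_{10}(P_s / (g^2 P_n))$ equals the target value, where $P_s$ and $P_n$ are the mean powers of the speech and noise. A minimal sketch of that mixing step follows; the function names are hypothetical, and it assumes global signal power is used (the paper may instead use a different level measure, such as the ITU-T P.56 active speech level). Any noise signal, artificial (e.g. synthetic white noise) or recorded, is passed in the same way.

```python
import math


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with g chosen so that
    10 * log10(P_speech / P_scaled_noise) == snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Required noise power is P_speech / 10^(SNR/10); the gain is the square
    # root of its ratio to the current noise power.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to the clean reference signal."""
    residual = [y - s for s, y in zip(clean, noisy)]
    p_s = sum(s * s for s in clean)
    p_r = sum(r * r for r in residual)
    return 10.0 * math.log10(p_s / p_r)
```

For example, mixing a 200 Hz sine sampled at 8 kHz with Gaussian noise from `random.gauss` at 0 dB yields a signal whose `measured_snr_db` is 0 dB up to floating-point error.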