|Topic:||Methods for speaker identification from an acoustic signal|
|Department:||Analysis and Interpretation of Biomedical Data|
|Supervisor:||Ing. Jan Macek Ph.D.|
|Announce as:||Master's thesis, Bachelor's thesis, Semester project|
|Description:||• Introduction: This bachelor thesis gives an overview of the machine learning and deep learning techniques used for speaker identification, a form of biometric identification based on acoustic characteristics of the voice that are unique to each person.
• Theory: Because the shape and size of the vocal cords differ from person to person, each person has a distinctive voice. Speaker recognition relies on characteristics derived from both the spectral envelope (vocal tract characteristics) and the supra-segmental features (voice source characteristics) of speech, and uses pattern recognition methods to identify a person from previously processed data for that person. Speaker identification is the process of matching an unknown voice against a set of known speakers and thereby identifying the speaker.
• Use cases: Speaker recognition technologies can be used for purposes including biometric identification, voice dialing, secure banking over the phone, security verification for access to confidential data, reservation and information services, and other applications that require voice-based biometric identification or verification.
• Survey of methods and portrayal of data: Speaker identification can be categorized along several dimensions, for example text-dependent versus text-independent, and numerous deep learning methods, such as classification with convolutional neural network (CNN) models, can be used to identify an unknown speaker. This bachelor thesis aims to explore these and other state-of-the-art methods and to provide a comprehensive survey.
• Practical part: This bachelor thesis will deliver a working prototype of a speaker identification system written in Python, using the state-of-the-art technique from the preceding survey that is best suited to a particular use case.
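As an illustration of the closed-set identification task described under Theory (matching an unknown voice against a set of known speakers), the following is a minimal sketch only, not the method the thesis will adopt. It assumes each utterance has already been reduced to a fixed-length, non-zero feature vector (for example averaged cepstral coefficients); the function names, the speaker names, and the three-dimensional toy vectors are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Average equal-length feature vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(utterances_by_speaker):
    """Build one reference vector (centroid) per known speaker."""
    return {
        name: mean_vector(vectors)
        for name, vectors in utterances_by_speaker.items()
    }


def identify(models, features):
    """Return the enrolled speaker whose centroid is most similar."""
    return max(
        models,
        key=lambda name: cosine_similarity(models[name], features),
    )


# Toy enrollment data: hypothetical feature vectors, not real speech.
enrollment = {
    "alice": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "bob": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]],
}
models = enroll(enrollment)

# An "unknown" utterance is assigned to the closest enrolled speaker.
print(identify(models, [0.8, 0.3, 0.1]))
```

A real system would replace the toy vectors with features extracted from audio and the centroid comparison with a trained model such as a CNN classifier; the sketch only shows the enroll-then-match structure of the task.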
|Bibliography:||• Matějka et al., “Analysis of DNN approaches to speaker identification,” 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 5100-5104, doi: 10.1109/ICASSP.2016.7472649.
• L. Schmidt, M. Sharifi and I. L. Moreno, “Large-scale speaker identification,” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 1650-1654, doi: 10.1109/ICASSP.2014.6853878.
• R. Li, J.-Y. Jiang, X. Wu, C.-C. Hsieh and A. Stolcke, “Speaker Identification for Household Scenarios with Self-Attention and Adversarial Training,” Proc. Interspeech 2020, 2020, pp. 2272-2276, doi: 10.21437/Interspeech.2020-3025.
• C. Kumar, F. ur Rehman, S. Kumar, A. Mehmood and G. Shabir, “Analysis of MFCC and BFCC in a speaker identification system,” 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), 2018, pp. 1-5, doi: 10.1109/ICOMET.2018.8346330.
• R. Jahangir et al., “Text-Independent Speaker Identification Through Feature Fusion and Deep Neural Network,” in IEEE Access, vol. 8, pp. 32187-32202, 2020, doi: 10.1109/ACCESS.2020.2973541.