A Novel Method for Estimating the Number of Speakers Based on Generalized Eigenvalue–Vector Decomposition and Adaptive Wavelet Transform by Using K-Means Clustering

Journal
Signal, Image and Video Processing
ISSN
1863-1711
Date Issued
2020
Author(s)
Adasme-Soto, P  
DOI
https://doi.org/10.1007/s11760-020-01634-2
Abstract
The aim of this article is to estimate the number of simultaneous speakers from overlapped speech signals. The percentage of correctly estimated speaker counts is an important performance criterion for the proposed algorithm. The proposed method is based on spectrum estimation using the adaptive wavelet transform in combination with generalized eigenvalue–vector decomposition (GEVD) and K-means clustering. First, the speech signals are captured by a uniform circular array, and each pair of adjacent microphones is considered for processing. Then, the spectral estimation method is applied to all microphone signals to select the best part of the speech spectrum. Next, the microphone signals are divided into different subbands using the adaptive wavelet transform. The GEVD algorithm is applied to each microphone pair in the different subbands and time frames to estimate the room impulse response and the time difference of arrival (TDOA). Finally, K-means clustering with the silhouette criterion is used to estimate the number of speakers (the K value). The proposed algorithm is evaluated on simulated and real data to show its superiority over the PENS, Bessel, i-vector PLDA, Hilbert envelope and DNN-based methods. The proposed scheme outperforms the other evaluated schemes by 18% in terms of correct estimations in noisy–reverberant conditions with five simultaneous speakers. © 2020, Springer-Verlag London Ltd., part of Springer Nature.
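The first processing step records the speech with a uniform circular array and works on adjacent microphone pairs. The sketch below assumes an array of `num_mics` microphones on a circle of radius `radius` (metres) and a speed of sound of 343 m/s; neither value is given in the abstract, and the function names are illustrative only. It lists the adjacent pairs and the largest TDOA each pair can physically observe, which bounds the TDOA search range.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def uca_positions(num_mics, radius):
    """(x, y) coordinates of num_mics microphones evenly spaced on a circle."""
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def adjacent_pairs(num_mics):
    """Index pairs (m, m + 1) around the circle; the last mic wraps to the first."""
    return [(m, (m + 1) % num_mics) for m in range(num_mics)]


def max_tdoa(positions, pair, c=SPEED_OF_SOUND):
    """Largest physically possible TDOA (seconds) for a pair: spacing / c."""
    (x1, y1), (x2, y2) = positions[pair[0]], positions[pair[1]]
    return math.hypot(x2 - x1, y2 - y1) / c
```

For example, with `uca_positions(8, 0.1)` every adjacent pair is 2 · 0.1 · sin(π/8) ≈ 0.077 m apart, so any adjacent-pair TDOA lies within about ±0.22 ms.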
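The abstract splits each microphone signal into subbands with an adaptive wavelet transform, but does not describe the wavelet family or the adaptation rule. As a plainly simplified stand-in, the sketch below uses a fixed orthonormal Haar discrete wavelet transform that repeatedly halves the low band; it only illustrates subband decomposition and is not the paper's adaptive transform.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail): the approximation carries the lower half
    of the band, the detail the upper half.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_subbands(signal, levels):
    """Split a signal into levels + 1 subbands: [detail_1, ..., detail_L, approx_L].

    The signal length must be divisible by 2 ** levels.
    """
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)  # split the current low band in two
        bands.append(detail)
    bands.append(approx)
    return bands
```

Because the transform is orthonormal, the total energy of the subbands equals the energy of the input, so per-subband processing neither gains nor loses signal power.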
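GEVD solves A v = λ B v for a pair of matrices; in speech processing, A is often a noisy-signal covariance and B a noise covariance. The abstract does not say which matrices the paper decomposes per microphone pair, subband and frame, so those roles are an assumption here. The sketch only shows the decomposition itself for the symmetric 2×2 case, solving det(A − λB) = 0 in closed form.

```python
import math


def gevd_2x2(A, B):
    """Generalized eigendecomposition A v = lam * B v for 2x2 matrices.

    A must be symmetric and B symmetric positive definite, so both eigenvalues
    are real. Returns [(lam, v), ...] sorted by ascending eigenvalue, each v
    scaled to unit Euclidean length.
    """
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    # det(A - lam*B) = 0 expands to p*lam**2 + q*lam + r = 0
    p = b11 * b22 - b12 ** 2  # det(B) > 0 for positive-definite B
    q = -(a11 * b22 + a22 * b11 - 2 * a12 * b12)
    r = a11 * a22 - a12 ** 2
    disc = math.sqrt(max(q * q - 4 * p * r, 0.0))  # clamp tiny negative rounding
    result = []
    for lam in sorted(((-q - disc) / (2 * p), (-q + disc) / (2 * p))):
        # A - lam*B is singular: a vector orthogonal to its larger-norm row
        # spans the null space, i.e. it is the generalized eigenvector
        row1 = (a11 - lam * b11, a12 - lam * b12)
        row2 = (a12 - lam * b12, a22 - lam * b22)
        row = row1 if math.hypot(*row1) >= math.hypot(*row2) else row2
        v = (-row[1], row[0])
        norm = math.hypot(*v)
        if norm == 0.0:
            # A == lam*B: every vector is an eigenvector; pick the first basis vector
            v, norm = (1.0, 0.0), 1.0
        result.append((lam, (v[0] / norm, v[1] / norm)))
    return result
```

With B equal to the identity this reduces to an ordinary symmetric eigendecomposition; a non-identity B reweights the problem, which is what lets GEVD account for a known noise covariance.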
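Once impulse responses for a microphone pair have been estimated, a common way to read off the TDOA is to compare the positions of their direct-path peaks. The abstract does not state the exact rule the paper uses, so the sketch below is an assumption: the direct path is taken to be the largest-magnitude tap of each response, and `fs` is the sampling rate in Hz.

```python
def tdoa_from_rirs(h1, h2, fs):
    """TDOA (seconds) between two microphones from estimated impulse responses.

    The direct-path arrival is assumed to be the largest-magnitude tap in each
    response. A positive result means the source reached microphone 1 first.
    """
    peak1 = max(range(len(h1)), key=lambda n: abs(h1[n]))
    peak2 = max(range(len(h2)), key=lambda n: abs(h2[n]))
    return (peak2 - peak1) / fs
```

The resolution of this estimate is one sample (1/`fs` seconds); finer estimates would need interpolation around the peaks.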
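The final step chooses the number of speakers as the K for which K-means clustering scores best under the silhouette criterion. The sketch below assumes the features are scalar TDOA estimates (for example, one per subband and frame), clusters them in one dimension with Lloyd's algorithm, and returns the K in 2..`k_max` with the highest mean silhouette. The feature choice, search range and quantile initialization are assumptions, not details from the paper; because the silhouette is undefined for a single cluster, this sketch cannot report one speaker.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's K-means on scalar values; returns (labels, centroids)."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: centroids at the mid-quantiles of the sorted data
    centroids = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def nearest(x):
        return min(range(k), key=lambda j: abs(x - centroids[j]))

    labels = None
    for _ in range(iters):
        new_labels = [nearest(x) for x in values]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = sum(members) / len(members)
    return labels, centroids


def silhouette(values, labels):
    """Mean silhouette coefficient with absolute difference as the distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; rank it below any valid score
    total = 0.0
    for i, x in enumerate(values):
        own = [abs(x - y) for j, y in enumerate(values)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: silhouette defined as 0
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(abs(x - y) for y, lab in zip(values, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(values)


def estimate_num_speakers(tdoas, k_max):
    """Pick K in 2..k_max that maximizes the mean silhouette of K-means."""
    best_k, best_score = None, float("-inf")
    for k in range(2, k_max + 1):
        labels, _ = kmeans_1d(tdoas, k)
        score = silhouette(tdoas, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

For example, with the hypothetical TDOA values (in milliseconds) `[-0.21, -0.20, -0.19, -0.205, 0.0, 0.01, -0.01, 0.005, 0.24, 0.25, 0.26, 0.255]`, which form three tight groups, `estimate_num_speakers(tdoas, 5)` returns 3.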

Universidad de Santiago de Chile
Avenida Libertador Bernardo O'Higgins nº 3363. Estación Central. Santiago Chile.
ciencia.abierta@usach.cl © 2023