Overview of methods for determining the location of sound sources

Authors

M. Ryaby, M. Shatokhin

DOI:

https://doi.org/10.18372/2073-4751.84.20901

Keywords:

acoustic source localization, microphone array, time difference of arrival, deep learning, physics-informed learning, multi-source acoustic scene

Abstract

The paper provides an overview and systematic comparison of modern methods for sound source localization: time-difference-based methods (TDoA, GCC), subspace-based methods (MUSIC, ESPRIT), beamforming methods, statistical tracking approaches, and neural-network and hybrid solutions. For each class of methods, the key advantages and limitations are analyzed, along with their suitability for real-world acoustic conditions, particularly in the presence of noise, reverberation, and multiple simultaneous sources. The comparative analysis indicates that hybrid architectures, which combine physically interpretable features with adaptive machine learning models, are the most promising option for practical systems. A hybrid multi-source localization concept is proposed that uses GCC cross-spectral features and a multi-head neural network to estimate the number of active sources, their directions, and the confidence of each prediction.
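The GCC cross-spectral features mentioned in the abstract build on the generalized cross-correlation time-delay estimator of Knapp and Carter, most often used with PHAT weighting. The sketch below is illustrative only and is not the authors' implementation. It assumes NumPy, a single two-microphone pair, far-field propagation, and a speed of sound of 343 m/s. The names `gcc_phat` and `tdoa_to_angle` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x (in seconds) with GCC-PHAT.

    A positive result means y lags x. Also returns the correlation curve
    reordered so that lag 0 sits in the middle.
    """
    n = len(x) + len(y)  # zero-pad so the circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Only search physically possible lags (at least one sample each way)
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Negative lags live at the end of the IFFT output; move them to the front
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs), cc


def tdoa_to_angle(tau, mic_distance, c=SPEED_OF_SOUND):
    """Far-field direction of arrival in degrees from broadside for a mic pair."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)  # guard against |s| > 1
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, mic_distance = 16000, 0.2
    x = rng.standard_normal(4096)
    y = np.concatenate((np.zeros(5), x[:-5]))  # y is x delayed by 5 samples
    tau, _ = gcc_phat(x, y, fs, max_tau=mic_distance / SPEED_OF_SOUND)
    print(tau, tdoa_to_angle(tau, mic_distance))
```

With several microphone pairs, the per-pair correlation curves (not only their peaks) could be stacked as the input features of a learned model. This is one plausible way to read "GCC cross-spectral features" here, but it is an assumption and not a description taken from the paper.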

References

Yost W. A. History of sound source localization: 1850–1950 // Proceedings of Meetings on Acoustics. 2017. Vol. 30, No. 1. Art. 050002. DOI: https://doi.org/10.1121/2.0000529

Talanov A. V. Zvukovaya razvedka artillerii [Acoustic reconnaissance of artillery]. Moscow: Military Publishing House of the Ministry of the Armed Forces of the USSR, 1948.

Zimmerman D. Tucker’s acoustical mirrors: Aircraft detection before radar // War & Society. 1997. Vol. 15, No. 1. pp. 73–99. DOI: https://doi.org/10.1179/072924797791201003

Knapp C. H., Carter G. C. The generalized correlation method for estimation of time delay // IEEE Transactions on Acoustics, Speech, and Signal Processing. 1976. Vol. 24, No. 4. pp. 320–327. DOI: https://doi.org/10.1109/TASSP.1976.1162830

Carter G., Nuttall A., Cable P. The smoothed coherence transform // Proceedings of the IEEE. 1973. Vol. 61, No. 10. pp. 1497–1498. DOI: https://doi.org/10.1109/PROC.1973.9300

Chen L., Liu Y., Kong F., He N. Acoustic source localization based on generalized cross-correlation time-delay estimation // Procedia Engineering. 2011. Vol. 15. pp. 4912–4919. DOI: https://doi.org/10.1016/j.proeng.2011.08.915

Pena D. S., Lima A. D. L., de Sousa Jr. V. A., Silveira L. F., Martins A. M. Robust time delay estimation based on non-Gaussian impulsive acoustic channel // Journal of Communication and Information Systems. 2020. Vol. 35, No. 1. pp. 86–93. [Electronic resource]. URL: https://jcis.sbrt.org.br/jcis/article/view/687/482

Wang J., Qian X., Pan Z., Zhang M. GCC-PHAT with speech-oriented attention for robotic sound source localization // 2021 IEEE International Conference on Robotics and Automation (ICRA). 2021. pp. 13752–13758. DOI: https://doi.org/10.1109/ICRA48506.2021.956188

Schmidt R. O. Multiple emitter location and signal parameter estimation // IEEE Transactions on Antennas and Propagation. 1986. Vol. 34, No. 3. pp. 276–280. DOI: https://doi.org/10.1109/TAP.1986.1143830

Hwang H. K., Aliyazicioglu Z., Grice M., Yakovlev A. Direction of arrival estimation using a Root-MUSIC algorithm // International MultiConference of Engineers and Computer Scientists 2008 (IMECS 2008). 2008. Vol. II.

Liu X., Liu C., Liao G. Polynomial coefficient finding for Root-MUSIC // Journal of Electronics (China). 2009. Vol. 26, No. 5. pp. 543–548. DOI: https://doi.org/10.1007/s11767-009-0142-7

Das O., Abel J. S., Smith J. O. FAST MUSIC—An efficient implementation of the MUSIC algorithm for frequency estimation of approximately periodic signals // 21st International Conference on Digital Audio Effects (DAFx-18). 2018.

Huang Q., Lu N. Optimized real-time MUSIC algorithm with CPU–GPU architecture // IEEE Access. 2021. DOI: https://doi.org/10.1109/ACCESS.2021.3070980

Aaltonen T. FPGA implementation of MUSIC direction of arrival algorithm using high-level synthesis : Master's thesis. Tampere University, 2023. [Electronic resource]. URL: https://urn.fi/URN:NBN:fi:tuni-202401091317

Roy R., Kailath T. ESPRIT—Estimation of signal parameters via rotational invariance techniques // IEEE Transactions on Acoustics, Speech, and Signal Processing. 1989. Vol. 37, No. 7. pp. 984–995. DOI: https://doi.org/10.1109/29.32276

Haardt M., Zoltowski M. D., Mathews C. P., Nossek J. A. 2D unitary ESPRIT for efficient 2D parameter estimation // 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95). 1995. Vol. 3. pp. 2096–2099. DOI: https://doi.org/10.1109/ICASSP.1995.478488

Haardt M., Nossek J. A. Unitary ESPRIT: How to obtain increased estimation accuracy with a reduced computational burden // IEEE Transactions on Signal Processing. 1995. Vol. 43, No. 5. pp. 1232–1242. DOI: https://doi.org/10.1109/78.382406

Römer F., Haardt M., Del Galdo G. Analytical performance assessment of multi-dimensional matrix- and tensor-based ESPRIT-type algorithms // IEEE Transactions on Signal Processing. 2014. Vol. 62, No. 10. pp. 2611–2625. DOI: https://doi.org/10.1109/TSP.2014.2313530

Zeng W., He J., Li H., Zhu X. A SVT-ESPRIT estimation algorithm in sparse array // International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2016). 2016. pp. 12–17. DOI: https://doi.org/10.2991/iccia-16.2016.3

Ramos A. L. L., Holm S., Gudvangen S., Otterlei R. Delay-and-sum beamforming for direction of arrival estimation applied to gunshot acoustics // Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense X (Proc. SPIE). 2011. Vol. 8019. Art. 80190U. DOI: https://doi.org/10.1117/12.886833

Perrot V., Polichetti M., Varray F., Garcia D. So you think you can DAS? A viewpoint on delay-and-sum beamforming // Ultrasonics. 2021. Vol. 111. Art. 106309. DOI: https://doi.org/10.1016/j.ultras.2020.106309

Capon J. High-resolution frequency-wavenumber spectrum analysis // Proceedings of the IEEE. 1969. Vol. 57, No. 8. pp. 1408–1418. DOI: https://doi.org/10.1109/PROC.1969.7278

Brandstein M. S., Silverman H. F. A practical methodology for speech source localization with microphone arrays // Computer Speech & Language. 1997. Vol. 11, No. 2. pp. 91–126.

Grinstein E., Tengan E., Çakmak B., Dietzen T., Nunes L., van Waterschoot T., Brookes M., Naylor P. A. Steered response power for sound source localization: A tutorial review [Electronic resource]. DOI: https://doi.org/10.48550/arXiv.2405.02991

Grondin F., Michaud F. Lightweight and optimized sound source localization and tracking methods for open and closed microphone array configurations // Robotics and Autonomous Systems. 2019. Vol. 113. pp. 63–80. DOI: https://doi.org/10.1016/j.robot.2019.01.002

Sathish K., Chinthaginjala R., Kim W., Rajesh A., Corchado J. M., Abbas M. Underwater wireless sensor networks with RSSI-based advanced efficiency-driven localization and unprecedented accuracy // Sensors. 2023. Vol. 23, No. 15. Art. 6973. DOI: https://doi.org/10.3390/s23156973

Deng F., Guan S., Yue X., Gu X., Chen J., Lv J. Energy-based sound source localization with low power consumption in wireless sensor networks // IEEE Transactions on Industrial Electronics. 2017. Vol. 64, No. 6. pp. 4894–4902. DOI: https://doi.org/10.1109/TIE.2017.2652394

Alves M., Coelho R., Dranka E. Effective acoustic energy sensing exploitation for target sources localization in urban acoustic scenes. [Electronic resource]. DOI: https://doi.org/10.48550/arXiv.1910.02709

Hu Y. H., Li D. Energy-based collaborative source localization using acoustic micro-sensor array // IEEE Workshop on Multimedia Signal Processing (MMSP 2002). 2003. pp. 509–512. DOI: https://doi.org/10.1109/MMSP.2002.1203323

Khalaf-Allah M. Emitter location using frequency difference of arrival measurements only // Sensors. 2022. Vol. 22, No. 24. Art. 9642. DOI: https://doi.org/10.3390/s22249642

Zhang B., Hu Y., Wang H., Zhuang Z. Underwater source localization using TDoA and FDOA measurements with unknown propagation speed and sensor parameter errors // IEEE Access. 2018. Vol. 6. pp. 36645–36661. DOI: https://doi.org/10.1109/ACCESS.2018.2852636

Li X., Girin L., Horaud R., Alameda-Pineda X. Multiple sound source localization with DP-RTF features and GMM-based clustering. [Electronic resource]. URL: https://arxiv.org/abs/1611.01172

Park M., Sim K., Yang H. Multiple sound source localization using GMM-based mask with diffuse component suppression // INTER-NOISE 2020. 2020. Vol. 261, No. 3. [Electronic resource]. URL: https://www.ingentaconnect.com/contentone/ince/incecp/2020/00000261/00000003/art00084

Fuchs J. Monaural sound localization: A probabilistic approach. Graz University of Technology, 2008. [Electronic resource]. URL: https://www.spsc.tugraz.at/system/files/MonauralSoundLocalization.pdf

Tan T.-H., Lin Y.-T., Chang Y.-L., Alkhaleefah M. Sound source localization using a convolutional neural network and regression model // Sensors. 2021. Vol. 21, No. 23. Art. 8031. DOI: https://doi.org/10.3390/s21238031

Tang D., Taseska M., van Waterschoot T. Toward learning robust contrastive embeddings for binaural sound source localization // Frontiers in Neuroinformatics. 2022. Vol. 16. Art. 942978. DOI: https://doi.org/10.3389/fninf.2022.942978

Correia S. D., Tomic S., Beko M. A feed-forward neural network approach for energy-based acoustic source localization // Journal of Sensor and Actuator Networks. 2021. Vol. 10, No. 2. Art. 29. DOI: https://doi.org/10.3390/jsan10020029

Adavanne S., Politis A., Nikunen J., Virtanen T. Sound event localization and detection of overlapping sources using convolutional recurrent neural networks // IEEE Journal of Selected Topics in Signal Processing. 2018. Vol. 13, No. 1. pp. 34–48. DOI: https://doi.org/10.1109/JSTSP.2018.2885636

Hu F., Song X., He R., Yu Y. Sound source localization based on residual network and channel attention module // Scientific Reports. 2023. Vol. 13. Art. 32657. DOI: https://doi.org/10.1038/s41598-023-32657-7

Kuang S., Shi J., van der Heijden K., Mehrkanoon S. BAST: Binaural audio spectrogram transformer for binaural sound localization [Electronic resource]. URL: https://arxiv.org/abs/2207.03927v2

Berg A., Gulin J., O'Connor M., Zhou C., Åström K., Oskarsson M. wav2pos: Sound source localization using masked autoencoders // International Conference on Indoor Positioning and Indoor Navigation (IPIN). 2024. DOI: https://doi.org/10.1109/IPIN62893.2024.10786105

Zhang D., Wang S., Belatreche A., Wei W., Xiao Y., Zheng H., Zhou Z., Zhang M., Yang Y. Spike-based neuromorphic model for sound source localization // NeurIPS 2024. 2024. [Electronic resource]. URL: https://openreview.net/forum?id=CyCDqnrymT

Wu Y., Ayyalasomayajula R., Bianco M. J., Bharadia D., Gerstoft P. SSLIDE: Sound source localization for indoors based on deep learning [Electronic resource]. DOI: https://doi.org/10.48550/arXiv.2010.14420

Berg A., Engman J., Gulin J., Åström K., Oskarsson M. Learning multi-target TDoA features for sound event localization and detection [Electronic resource]. URL: https://arxiv.org/abs/2408.17166

Pujol H., Bavu É., Garcia A. BeamLearning: An end-to-end deep learning approach for the angular localization of sound sources using raw multichannel acoustic pressure data // The Journal of the Acoustical Society of America. 2021. Vol. 149, No. 6. pp. 4069–4081. DOI: https://doi.org/10.1121/10.0005046

Merkofer J. P., Revach G., Shlezinger N., van Sloun R. J. G. Deep augmented MUSIC algorithm for data-driven DoA estimation // ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. 2022. pp. 4613–4617. DOI: https://doi.org/10.1109/ICASSP43922.2022.9746637

Elbir A. M. DeepMUSIC: Multiple signal classification via deep learning // IEEE Sensors Letters. 2020. Vol. 4, No. 6. pp. 1–4. DOI: https://doi.org/10.1109/LSENS.2020.2980384

Ji J., Mao W., Xi F., Chen S. TransMUSIC: A transformer-aided subspace method for DOA estimation with low-resolution ADCs [Electronic resource]. DOI: https://doi.org/10.48550/arXiv.2309.08174

Chen J., Hudson R. E., Yao K. A comparative study on time delay estimation in reverberant and noisy environments // Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005). 2005. URL: https://www.jingdongchen.com/conferencespapers/%282005%29A%20comparative%20study%20on%20time%20delay%20estimation%20in%20reverberant%20and%20noisy%20environments.pdf

DiBiase J. H. A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays : PhD thesis. Providence, Rhode Island : Brown University, 2000. [Electronic resource]. URL: https://www.glat.info/ma/av16.3/2000-DiBiaseThesis.pdf

Brandstein M. S., Ward D. B. Microphone arrays: signal processing techniques and applications. Berlin : Springer, 2001. [Electronic resource]. URL: https://link.springer.com/book/10.1007/978-3-662-04619-7

Published

2025-12-30

How to Cite

Ryaby, M., & Shatokhin, M. (2025). Overview of methods for determining the location of sound sources. Problems of Informatization and Management, 4(84), 103–121. https://doi.org/10.18372/2073-4751.84.20901

Issue

Problems of Informatization and Management, No. 4(84) (2025)

Section

Articles