Overview of methods for determining the location of sound sources
DOI:
https://doi.org/10.18372/2073-4751.84.20901
Keywords:
acoustic source localization, microphone array, time difference of arrival, deep learning, physics-informed learning, multi-source acoustic scene
Abstract
The paper reviews and systematically compares modern methods for sound source localization: time-based methods (TDoA, GCC), subspace-based methods (MUSIC, ESPRIT), beamforming, statistical tracking approaches, and neural network and hybrid solutions. For each class of methods, the key advantages and limitations are analyzed, along with suitability for real acoustic conditions, in particular in the presence of noise, reverberation, and multiple simultaneous sources. The comparative analysis leads to the conclusion that hybrid architectures combining physically interpretable features with adaptive machine learning models are the most promising for practical systems. A hybrid concept of multi-source localization is proposed that uses GCC cross-spectral features and a multi-head neural network model to estimate the number of active sources, their directions, and the confidence of the prediction.
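As a rough illustration of the kind of GCC feature the abstract refers to, the following is a minimal sketch of GCC with PHAT weighting for a single microphone pair. It is not the authors' implementation: the function name `gcc_phat`, the parameters `fs` and `max_tau`, and the use of NumPy are assumptions made only for this example.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate how much x lags y (in seconds) using GCC-PHAT.

    Hypothetical helper for illustration: x and y are 1-D signals from
    two microphones, fs is the sampling rate in Hz, and max_tau is an
    optional bound on the physically possible delay in seconds.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)       # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs, cc

# Usage: y is a noise burst, x is the same burst delayed by 7 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay = 7
x = np.concatenate((np.zeros(delay), s))
y = np.concatenate((s, np.zeros(delay)))
tau, cc = gcc_phat(x, y, fs=16000)
```

In a hybrid scheme of the kind described above, the correlation vector `cc` (or the phase-normalized cross-spectrum), rather than only the peak delay `tau`, would presumably serve as the input feature for the learned model; that choice is likewise an assumption here.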
License
Authors who publish in this journal agree to the following terms:
- Authors retain the right to authorship of their work and grant the journal the right of first publication of the work under the Creative Commons Attribution License, which allows others to freely distribute the published work with mandatory attribution to the authors of the original work and to its first publication in this journal.
- Authors may enter into separate, additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal (for example, posting it to an institutional repository or publishing it as part of a monograph), provided that the first publication in this journal is acknowledged.
- Journal policy permits and encourages authors to post their manuscript online (for example, in institutional repositories or on personal websites), both before submission to the editors and during editorial processing, since this fosters productive scholarly discussion and has a positive effect on the speed and dynamics of citation of the published work (see The Effect of Open Access).