A Review of Sound Source Localization Methods

М.О. Рябий; М.А. Шатохін

doi:10.18372/2073-4751.84.20901

Author(s)

М.О. Рябий https://orcid.org/0000-0002-9651-9135
М.А. Шатохін https://orcid.org/0000-0003-0028-6208

Keywords:

acoustic source localization, microphone array, time delay of arrival, deep learning, physics-informed learning, multi-source acoustic scene

Abstract

This paper reviews and systematically compares modern sound source localization methods: time-based methods (TDoA, GCC), subspace methods (MUSIC, ESPRIT), beamforming methods, statistical tracking approaches, and neural-network and hybrid solutions. For each class of methods, the key advantages and limitations are analyzed, along with its suitability for real acoustic conditions, in particular in the presence of noise, reverberation, and several simultaneous sources. Based on the comparative analysis, the authors conclude that hybrid architectures, which combine physically interpretable features with adaptive machine learning models, are the most promising for practical systems. A hybrid concept for multi-source localization is proposed that uses GCC cross-spectral features and a multi-head neural network model to estimate the number of active sources, their directions, and the prediction confidence.
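To make the GCC features mentioned in the abstract more concrete, the following is a minimal sketch of time-delay estimation with GCC using the PHAT weighting, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' implementation: the function names `dft` and `gcc_phat_delay`, the naive O(N²) transform, and the synthetic two-microphone signals are all hypothetical choices made for this example.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(ref, sig, max_lag):
    """Estimate by how many samples `sig` lags `ref` (hypothetical helper)."""
    # Zero-pad so that the circular correlation does not wrap around.
    n = len(ref) + len(sig)
    ref_f = dft(list(ref) + [0.0] * (n - len(ref)))
    sig_f = dft(list(sig) + [0.0] * (n - len(sig)))

    # Cross-spectrum with PHAT weighting: keep the phase, drop the magnitude.
    cross = []
    for s, r in zip(sig_f, ref_f):
        c = s * r.conjugate()
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)

    # Back to the lag domain; the peak position is the delay estimate.
    cc = [v.real for v in dft(cross, inverse=True)]

    # Negative lags are stored at the end of the circular buffer.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])


# Synthetic example (assumption): the same noise burst reaches the second
# microphone 5 samples later than the first.
random.seed(0)
burst = [random.gauss(0.0, 1.0) for _ in range(32)]
mic_a = burst + [0.0] * 16
mic_b = [0.0] * 5 + burst + [0.0] * 11

print(gcc_phat_delay(mic_a, mic_b, max_lag=10))
```

For a microphone pair with spacing d, sampling rate fs, and speed of sound c, a far-field direction estimate would follow from the delay τ (in samples) as θ = arcsin(c·τ / (fs·d)). In practice a library FFT would replace the naive `dft`, and the full correlation curve (rather than only its peak) could serve as the input feature for a learned model.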
