Auditory Stream Segregation in Auditory Scene Analysis with a Multi-Agent System

Tomohiro Nakatani, Hiroshi G. Okuno, Takeshi Kawabata

We propose a novel approach to auditory stream segregation, which extracts individual sounds (auditory streams) from a mixture of sounds in auditory scene analysis. The HBSS (Harmonic-Based Stream Segregation) system is designed and implemented as a multi-agent system. HBSS uses only harmonics as a cue for segregation and extracts auditory streams incrementally. When the tracer-generator agent detects a new sound, it spawns a tracer agent, which extracts an auditory stream by tracing its harmonic structure. The tracer sends a feedforward signal so that the generator and the other tracers do not work on the stream that is already being traced. The quality of segregation may be poor because of redundant and ghost tracers. HBSS copes with this problem by introducing monitor agents, which detect and eliminate redundant and ghost tracers. HBSS can segregate two streams from a mixture of a man’s and a woman’s speech. Speech or other sounds can easily be resynthesized from the corresponding streams. Additionally, HBSS can be easily extended by adding agents with new capabilities. HBSS can be considered a first step toward computational auditory scene analysis.
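The agent roles described above can be illustrated with a minimal Python sketch. It is not the HBSS implementation: it assumes each input frame has already been reduced to a list of candidate fundamental frequencies in Hz, and the names `Tracer`, `segregate`, `TOLERANCE_HZ`, and `MIN_FRAMES` are hypothetical. The generator spawns a tracer for every candidate that no existing tracer has claimed, claiming a candidate stands in for the feedforward signal, and the monitor step is reduced to pruning short-lived (ghost) tracers. Because claims are exclusive in this simplification, redundant tracers do not arise and are not handled.

```python
from dataclasses import dataclass, field

TOLERANCE_HZ = 10.0  # assumed: how far a tracked fundamental may drift per frame
MIN_FRAMES = 3       # assumed: shorter streams are treated as ghosts


@dataclass
class Tracer:
    """Follows one fundamental frequency across frames (hypothetical agent)."""
    f0: float
    stream: list = field(default_factory=list)  # (frame_index, f0) pairs
    alive: bool = True

    def trace(self, frame_index, unclaimed):
        """Claim the nearest unclaimed candidate, or terminate if none is close."""
        near = [c for c in unclaimed if abs(c - self.f0) <= TOLERANCE_HZ]
        if not near:
            self.alive = False
            return None
        self.f0 = min(near, key=lambda c: abs(c - self.f0))
        self.stream.append((frame_index, self.f0))
        return self.f0


def segregate(frames):
    """Return one stream per sound, processing frames incrementally."""
    active, finished = [], []
    for i, candidates in enumerate(frames):
        unclaimed = list(candidates)
        # Tracers run first; a claimed candidate is hidden from everyone else.
        for tracer in active:
            claimed = tracer.trace(i, unclaimed)
            if claimed is not None:
                unclaimed.remove(claimed)
        # Generator: spawn a tracer for each candidate nobody claimed.
        for f0 in unclaimed:
            active.append(Tracer(f0=f0, stream=[(i, f0)]))
        finished += [t for t in active if not t.alive]
        active = [t for t in active if t.alive]
    finished += active
    # Monitor (simplified): discard ghost tracers that lived too briefly.
    return [t.stream for t in finished if len(t.stream) >= MIN_FRAMES]


if __name__ == "__main__":
    # Two overlapping voices plus a one-frame spurious candidate at 400 Hz.
    frames = [
        [120.0, 220.0],
        [122.0, 218.0],
        [124.0, 221.0],
        [125.0, 219.0, 400.0],
        [126.0, 222.0],
    ]
    for stream in segregate(frames):
        print(stream)
```

In this sketch a new capability would be added as another pass over `active` or over the returned streams, which loosely mirrors the extensibility claimed for the agent-based design.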

