This page provides laboratory recordings in which moving speakers are simulated by a loudspeaker mounted on a rotating arm. We are working on making this webpage more interactive; for now, only links are available.
Videos of the recordings of the separate utterances are available here. The recordings are indexed from 1 through 6 for each speaker (1 = static, …, 6 = very fast movement over a wide angular range).
Mixtures (4-channel WAVs) are available here. There are 36 mixtures, corresponding to all combinations of the separate recordings above. For example, index 11 denotes the mixture of the static male speech and the static female speech. The initial signal-to-interference ratio is 8 dB (signal = male speech, interferer = female speech).
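The indexing and the 8 dB figure can be illustrated with a minimal sketch. It rests on assumptions not stated on this page: that the first digit of a mixture index refers to the male recording and the second to the female recording, and that the signal-to-interference ratio is the ratio of average powers of the two signals. The function names are hypothetical and the sample values are toy data, not taken from the recordings.

```python
import math


def mixture_indices(n_levels=6):
    """List all two-digit mixture indices: '11', '12', ..., '66'.

    Assumption: first digit = male recording, second = female recording.
    """
    levels = range(1, n_levels + 1)
    return [f"{m}{f}" for m in levels for f in levels]


def average_power(samples):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def sir_db(signal, interferer):
    """Signal-to-interference ratio in dB, computed from average powers."""
    return 10.0 * math.log10(average_power(signal) / average_power(interferer))


def scale_to_sir(signal, interferer, target_db):
    """Rescale the interferer so that the SIR equals target_db."""
    gain = 10.0 ** ((sir_db(signal, interferer) - target_db) / 20.0)
    return [gain * v for v in interferer]


# Toy samples standing in for the male (signal) and female (interferer) speech.
male = [0.5, -0.4, 0.3, -0.2]
female = [0.1, 0.6, -0.5, 0.2]

indices = mixture_indices()
scaled_female = scale_to_sir(male, female, 8.0)
print(len(indices), indices[0], indices[-1])
print(round(sir_db(male, scaled_female), 6))
```

Six movement levels per speaker give 6 × 6 = 36 index pairs, which matches the number of mixtures listed above.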
The following links contain mono WAVs produced by the compared blind algorithms when extracting the male speech. AuxIVA and Online AuxIVA are conventional methods based on the static mixing model, whereas Block AuxIVE is based on the more advanced CSV mixing model, which allows for movement of the target source. The main characteristics you should hear are as follows:
AuxIVA: The target male speech vanishes once the source moves away from the targeted position. The signal distortion is low at the targeted position. The interferer (female speech) is attenuated to a degree that depends on the movements.
Online AuxIVA: The signal distortion is high overall because the algorithm works only with a local context of the data (it is a sequential algorithm, and the size of the context must be balanced to preserve the algorithm's adaptability). It can adapt to the sources' movements; however, the overall performance is low.
Block AuxIVE: The algorithm performs similarly to AuxIVA because it uses the entire batch of the recording. However, it can extract the target male speech without distortion even when the source is moving. This ability is gained from the CSV mixing model and is what distinguishes it from the above methods.
Extracted male speech by AuxIVA: here
Extracted male speech by Online AuxIVA: here
Extracted male speech by the proposed Block AuxIVE algorithm: here