Shoji Makino, Shoko Araki, Stefan Winter, and Hiroshi Sawada
NTT Communication Science Laboratories
People can hold intelligible conversations at a noisy cocktail party. This is the well-known “cocktail-party effect,” whereby a listener can pick out what one person is saying despite competing talkers and background noise. The aim of blind source separation (BSS) for speech applications is to give computers this ability, enabling them to recover the individual speech waveforms from their mixtures.
Blind source separation has already been applied to problems in fields such as wireless communication and biomedicine. However, speech signals mixed in a natural (i.e., reverberant) environment generally form convolutive mixtures, and separating them is structurally much more challenging than separating the instantaneous mixtures that are prevalent in many other applications. BSS is an approach for estimating source signals using only the mixtures observed at each sensor. The estimation is performed without information on the individual sources, such as their locations or frequency characteristics, and without knowledge of how they are mixed. The BSS technique for speech dealt with in this chapter has many applications, including hands-free teleconference systems and automatic meeting minute generators. ...
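The difference between the two mixing models can be made concrete with a small sketch. In an instantaneous mixture each sensor observes a weighted sum of the sources at the same time instant, whereas in a convolutive mixture each source is first filtered by the impulse response of its path to the sensor. The code below is only an illustration under stated assumptions: the function names, the mixing gains in `A`, and the three-tap filters in `H` are invented for this example, and real room impulse responses are typically far longer.

```python
import random


def convolve(h, s):
    """Full linear convolution of impulse response h with signal s."""
    out = [0.0] * (len(h) + len(s) - 1)
    for l, h_l in enumerate(h):
        for t, s_t in enumerate(s):
            out[t + l] += h_l * s_t
    return out


def mix_instantaneous(A, sources):
    """x_j(t) = sum_i A[j][i] * s_i(t): no delays, no reverberation."""
    n = len(sources[0])
    return [
        [sum(a_ji * s_i[t] for a_ji, s_i in zip(row, sources)) for t in range(n)]
        for row in A
    ]


def mix_convolutive(H, sources):
    """x_j(t) = sum_i sum_l H[j][i][l] * s_i(t - l), truncated to the source length."""
    n = len(sources[0])
    mixtures = []
    for row in H:
        x = [0.0] * n
        for h_ji, s_i in zip(row, sources):
            y = convolve(h_ji, s_i)
            for t in range(n):
                x[t] += y[t]
        mixtures.append(x)
    return mixtures


# Two toy "sources" (random samples standing in for speech; an assumption).
rng = random.Random(0)
sources = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(2)]

# Hypothetical 2x2 mixing gains and 2x2 sets of three-tap path filters.
A = [[1.0, 0.5],
     [0.4, 1.0]]
H = [[[1.0, 0.0, 0.3], [0.5, 0.2, 0.0]],
     [[0.4, 0.1, 0.0], [1.0, 0.0, 0.2]]]

x_inst = mix_instantaneous(A, sources)
x_conv = mix_convolutive(H, sources)
```

With one-tap filters the convolutive model reduces to the instantaneous one. A BSS method for speech must therefore undo a matrix of filters rather than a matrix of scalars, while seeing only the mixtures and knowing neither `H` nor the sources.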