Conference papers

User Assisted Separation Using Tensor Factorisations

Derry Fitzgerald — Wed, 03 Oct 2012 07:35:48 PDT

Recent research has demonstrated that user assisted techniques, where the user provides a "guide" version of the source to be separated, are capable of giving good sound source separation. Here the user sings or plays along with the target source, and the user input is used to guide the separation towards the source of interest. This is typically done in a factorisation framework, such as non-negative matrix factorisation. Here we extend such approaches to a tensor factorisation framework to deal with multichannel signals. Further, we demonstrate how this framework can be used to improve the output from other user assisted techniques, such as the Adress algorithm, where the user manually selects a region from the stereo space corresponding to a given source.

On the Use of Masking Filters in Sound Source Separation

Derry Fitzgerald et al. — Wed, 03 Oct 2012 07:25:52 PDT

Many sound source separation algorithms, such as NMF and related approaches, disregard phase information and operate only on magnitude or power spectrograms. In this context, generalised Wiener filters have been widely used to generate masks which are applied to the original complex-valued spectrogram before inversion to the time domain, as these masks have been shown to give good results. However, these masks may not be optimal from a perceptual point of view. To this end, we propose new families of masks and compare their performance to generalised Wiener filter masks using three different factorisation-based separation algorithms. Further, to date no analysis has been carried out of how the performance of masking varies with the number of iterations performed when estimating the separated sources. We perform such an analysis and show that when using these masks, running to convergence may not be required in order to obtain good separation performance.
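
As a rough illustration of the masking step described in this abstract, the sketch below builds generalised Wiener-style masks from per-source magnitude estimates and applies them to complex mixture bins. The function names, the `power` exponent and the flat list-of-bins layout are assumptions made for this example; the new mask families proposed in the paper are not reproduced here.

```python
def generalised_wiener_masks(source_mags, power=2.0, eps=1e-12):
    """Return one soft mask per source.

    source_mags: list of K magnitude estimates, each a list with one
    non-negative value per time-frequency bin (hypothetical layout).
    Each mask value is S_k**power / sum_j S_j**power, so the masks
    sum to (almost exactly) one in every bin that has any energy.
    """
    n_bins = len(source_mags[0])
    masks = [[0.0] * n_bins for _ in source_mags]
    for b in range(n_bins):
        powered = [mags[b] ** power for mags in source_mags]
        total = sum(powered) + eps  # eps guards against division by zero
        for k, value in enumerate(powered):
            masks[k][b] = value / total
    return masks


def apply_mask(mask, mixture_bins):
    """Scale each complex mixture bin by the real-valued mask."""
    return [m * x for m, x in zip(mask, mixture_bins)]


# Toy data: two source estimates over three bins, plus a complex mixture.
estimates = [[3.0, 0.0, 1.0], [1.0, 2.0, 1.0]]
mixture = [complex(4, 0), complex(0, 2), complex(1, 1)]

masks = generalised_wiener_masks(estimates, power=2.0)
separated = [apply_mask(m, mixture) for m in masks]

# The masked sources add back up to the mixture (up to eps).
recombined = [sum(parts) for parts in zip(*separated)]
```

Because the masks are applied to the complex-valued mixture, each separated source keeps the mixture phase, which is why the magnitude-only factorisation never needs to estimate phase itself.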

Vocal Separation Using Nearest Neighbours and Median Filtering

Derry Fitzgerald — Wed, 03 Oct 2012 07:15:54 PDT

Recently, single channel vocal separation algorithms have been proposed which exploit the fact that most popular music can be regarded as a repeating musical background over which a locally non-repeating vocal signal is superimposed. In this paper we describe a novel vocal separator inspired by these approaches which finds the k nearest neighbours to each frame of a spectrogram of the mixture signal. The median value of these frames is then used as the estimate of the background music at the current frame. This is then used to generate a mask on the original complex-valued spectrogram before inversion to the time domain. The effectiveness of the approach is demonstrated on a number of real-world signals.
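
The nearest-neighbour and median step can be sketched on a toy magnitude spectrogram as follows. This is only a sketch under stated assumptions: squared Euclidean distance between frames, a neighbour set that includes the frame itself, and clipping of the background estimate to the mixture magnitude are choices made for this example, not details taken from the paper.

```python
from statistics import median


def knn_median_background(frames, k):
    """Estimate the repeating background for each spectrogram frame.

    frames: list of magnitude spectra, one list of bins per time frame.
    For each frame, the k most similar frames are found and the
    per-bin median over those neighbours is used as the background.
    """
    background = []
    for frame in frames:
        # Squared Euclidean distance to every frame (assumed metric).
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(frame, other)), j)
            for j, other in enumerate(frames)
        )
        neighbours = [frames[j] for _, j in dists[:k]]
        estimate = [median(column) for column in zip(*neighbours)]
        # The background cannot exceed the mixture in any bin.
        background.append([min(e, m) for e, m in zip(estimate, frame)])
    return background


# Toy spectrogram: a repeating background with a "vocal" in frame 2, bin 1.
frames = [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 5.0, 1.0],
    [1.0, 0.0, 1.0],
]

background = knn_median_background(frames, k=3)
vocal_estimate = [
    [m - b for m, b in zip(frame, bg)] for frame, bg in zip(frames, background)
]
```

In this toy case the median over the neighbours rejects the non-repeating peak, so the residual `vocal_estimate` is non-zero only where the vocal was added. A mask derived from the two estimates would then be applied to the complex mixture spectrogram, as the abstract describes.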

On Inpainting the Adress Algorithm

Derry Fitzgerald et al. — Wed, 03 Oct 2012 07:05:54 PDT

The Adress algorithm has been demonstrated to be capable of separating sound sources from instantaneous linear mixtures, provided that the sources have a unique pan position in the stereo field. However, a shortcoming of the Adress algorithm is that all time-frequency bins outside of the chosen azimuth range are set to zero, resulting in audible artifacts in the resynthesised sound. Here we show that an inpainting algorithm based on NMF is capable of estimating these missing values and improves on the results obtained using Adress only.

Shifted NMF with Group Sparsity for Clustering NMF Basis Functions

Rajesh Jaiswal et al. — Mon, 17 Sep 2012 03:10:56 PDT

Recently, Non-negative Matrix Factorisation (NMF) has found application in separation of individual sound sources. NMF decomposes the spectrogram of an audio mixture into an additive parts-based representation where the parts typically correspond to individual notes or chords. However, there is a need to cluster the NMF basis functions to their sources. Although many attempts have been made to improve the clustering of the basis functions to sources, much research is still required in this area. Recently, Shifted Non-negative Matrix Factorisation (SNMF) was used to cluster these basis functions. To this end, we propose that the incorporation of group sparsity to the Shifted NMF based methods may benefit the clustering algorithms. We have tested this on SNMF algorithms with improved separation quality. Results show that this gives improved clustering of pitched basis functions over previous methods.

Formative Assessment Practices in Built Environment Higher Education Programmes and the Enhancement of the Student Learning Experience

Lloyd Scott et al. — Fri, 11 May 2012 08:07:09 PDT

It is widely accepted across Higher Education (HE) that assessment has a strong link with learning and a key factor in this link is formative assessment. Formative assessment is generally defined as an activity taking place during a programme or unit of learning with the express purpose of improving and enhancing student learning. However, there is still considerable disagreement over the roles of lecturers and students in this process. It is therefore very important to understand how lecturers in built environment (BE) undergraduate education perceive their own roles and the role of their students in using assessment strategy to deliver deep learning. An investigation into lecturers' perceptions of their roles and their conceptions related to the assessment process of students in BE programmes is reported. An on-line survey was conducted with over 130 Irish BE academics involved with the delivery of undergraduate programmes in the areas of Architecture, Architectural Technology, Quantity Surveying and Construction Management. Additional data were also obtained and analysed from their associated programme documentation. Discussion is focused on a critical evaluation of the findings of the study with the current literature on the roles of BE academics in the formative assessment process. As a result recommendations are made on how lecturers may better formulate appropriate assessments for their students that will encourage deep learning and thus create enhanced HE learning experiences.

On the Use of a Dynamic Hybrid Tempo Detection Model for Beat Tracking

Mikel Gainza — Tue, 13 Sep 2011 05:09:08 PDT

In this paper, an approach that estimates the times at which musical beats occur is presented. The system uses a hybrid multi-band decomposition in order to estimate the music tempo. Following this, beat events are tracked by using a dynamic programming approach, which is updated by using short time tempo estimates. The hybrid decomposition is used in order to calculate the tempo by using different onset detection functions in different frequency bands. In addition, a method that estimates which frequency bands provide reliable periodicities is also presented. The accuracy of the model is evaluated by comparing the presented system against existing approaches using a database of 474 songs.

Sub-band Independent Subspace Analysis for Drum Transcription

Derry Fitzgerald et al. — Tue, 13 Sep 2011 05:09:06 PDT

While Independent Subspace Analysis provides a means of separating sound sources from a single channel signal, making it an effective tool for drum transcription, it does have a number of problems. Not least of these is that the amount of information required to allow separation of sound sources varies from signal to signal. To overcome this indeterminacy and improve the robustness of transcription an extension of Independent Subspace Analysis to include sub-band processing is proposed. The use of this approach is demonstrated by its application in a simple drum transcription algorithm.

Drum Transcription in the Presence of Pitched Instruments using Prior Subspace Analysis

Derry Fitzgerald et al. — Tue, 13 Sep 2011 05:09:03 PDT

This paper demonstrates the use of Prior Subspace Analysis (PSA) as a method for transcribing drums in the presence of pitched instruments. PSA uses prior subspaces that represent the sources to be transcribed to overcome some of the problems associated with other subspace methods such as Independent Subspace Analysis (ISA) or sub-band ISA. The use of prior knowledge results in improved robustness for transcription purposes and enables the method to work more readily in the presence of pitched instruments than other subspace methods. The system presented in this paper attempts to extend the use of PSA to transcribe drum sounds in the presence of interfering pitched instruments.

Onset Detection, Music Transcription and Ornament Detection for the Traditional Irish Fiddle

Aileen Kelleher et al. — Tue, 13 Sep 2011 05:09:01 PDT

By combining techniques used in previous onset detectors, a system that detects note onsets in traditional Irish fiddle tunes has been implemented. The notes detected also include the most common types of ornamentation played by the fiddle. Ornaments are notes of extremely short duration, at most a fifth the length of a regular note. A Short Time Fourier Transform based sub-band technique, which previously gave good results for the Irish tin whistle, was modified to include a threshold approximation more suitable for the fiddle. This system has been tested on a database of real recorded fiddle tunes and good results have been achieved.

Development of a Computer-Based Violin Teaching Aid: ViTool

Jane Charles et al. — Tue, 13 Sep 2011 05:08:59 PDT

This paper considers the development of a violin teaching aid, called ViTool, which is based on violin pedagogy, sound analysis, and comparison of beginner and good player recordings. It is a computer-based teaching aid and will ultimately consist of at least four task dependent tools. Typical beginner faults have been identified, and features that best describe them for classification purposes are considered. The ViTool is not intended as a replacement or electronic teacher, but as a teaching aid. Presently, it seems that no such violin learning aid or tool exists and an opportunity exists for the development of such home learning aids. This paper puts forward the initial steps towards such a teaching aid.

Single Channel Source Separation using Short-time Independent Component Analysis

Dan Barry et al. — Tue, 13 Sep 2011 05:08:57 PDT

In this paper we develop a method for the sound source separation of single channel mixtures using Independent Component Analysis within a time-frequency representation of the audio signal. We apply standard Independent Component Analysis techniques to contiguous magnitude frames of the short-time Fourier transform of the mixture. Provided that the amplitude envelopes of each source are sufficiently different, it can be seen that it is possible to recover the independent short-time power spectra of each source. A simple scoring scheme based on auditory scene analysis cues is then used to overcome the source ordering problem ultimately allowing each of the independent spectra to be assigned to the correct source. A final stage of adaptive filtering is then applied which forces each of the spectra to become more independent. Each of the sources is then resynthesised using the standard inverse short-time Fourier transform with an overlap add scheme.

Non-negative Tensor Factorisation for Sound Source Separation

Derry Fitzgerald et al. — Tue, 13 Sep 2011 05:08:55 PDT

An algorithm for Non-negative Tensor Factorisation is introduced which extends current matrix factorisation techniques to deal with tensors. The effectiveness of the algorithm is then demonstrated through tests on synthetic data. The algorithm is then employed as a means of performing sound source separation on two channel mixtures, and the separation capabilities of the algorithm demonstrated on a two channel mixture containing saxophone, strings and bass guitar.

Blind Source Separation and Automatic Transcription of Music Using Tensor Decompositions

Derry Fitzgerald — Tue, 13 Sep 2011 05:08:53 PDT

Recent advances in the use of tensor decompositions for the analysis of music are described. In particular, the use of such decompositions for sound source separation and the automatic transcription of music are explored.

Using Tensor Factorisation Models to Separate Drums from Polyphonic Music

Derry Fitzgerald et al. — Tue, 13 Sep 2011 05:08:51 PDT

This paper describes the use of Non-negative Tensor Factorisation models for the separation of drums from polyphonic audio. Improved separation of the drums is achieved through the incorporation of Gamma Chain priors into the Non-negative Tensor Factorisation framework. In contrast to many previous approaches, the method used in this paper requires little or no pre-training or use of drum templates. The utility of the technique is shown on real-world audio examples.

Musical Sound Source Separation using Extended Tensor Decompositions

Derry Fitzgerald — Tue, 13 Sep 2011 05:08:49 PDT

Recently, tensor decompositions have found use in sound source separation. In particular, non-negative tensor decompositions have received a lot of attention due to their ability to decompose audio spectrograms into meaningful "parts" such as individual notes. Extensions to the basic non-negative tensor factorisation framework allow the incorporation of additional constraints, such as shift-invariance in both frequency and time. This enables the factorisations to capture more complex structures than individual notes, such as individual sources playing different pitches and time-evolving instrument timbres. Further music specific constraints such as harmonicity and source-filter modeling have been shown to improve separation performance for musical signals. Other recent advances also allow the incorporation of Bayesian priors into these models, thereby further improving the separations obtained.

On the use of the Beta Divergence for Musical Source Separation

Derry Fitzgerald et al. — Tue, 13 Sep 2011 05:08:47 PDT

Non-negative Tensor Factorisation based methods have found use in the context of musical sound source separation. These techniques require the use of a suitable cost function to determine the optimal factorisation, and most work has focused on the use of the generalised Kullback-Leibler divergence, and more recently the Itakura-Saito divergence. These divergences can be regarded as limiting cases of the parameterised Beta divergence. This paper looks at the use of the Beta Divergence in the context of musical source separation with a view to determining an optimal value of Beta for this problem. This is considered for both magnitude and power spectrograms. In an effort to avoid potential local minima in the Beta divergence, the use of a "tempered" Beta Divergence is also explored.
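
To make the "limiting cases" statement concrete, the sketch below evaluates the Beta divergence in its commonly used element-wise form, with the generalised Kullback-Leibler divergence at Beta = 1 and the Itakura-Saito divergence at Beta = 0 handled as explicit limits. The function names and toy values are assumptions for this example, and the "tempered" variant explored in the paper is not shown.

```python
import math


def beta_divergence(x, y, beta):
    """Element-wise Beta divergence d_beta(x | y) for x > 0 and y > 0.

    beta == 1 is the generalised Kullback-Leibler limit,
    beta == 0 is the Itakura-Saito limit,
    beta == 2 reduces to half the squared Euclidean distance.
    """
    if beta == 0:
        return x / y - math.log(x / y) - 1.0
    if beta == 1:
        return x * math.log(x / y) - x + y
    return (
        x ** beta + (beta - 1.0) * y ** beta - beta * x * y ** (beta - 1.0)
    ) / (beta * (beta - 1.0))


def spectrogram_cost(target, model, beta):
    """Sum the divergence over all bins of a (flattened) spectrogram."""
    return sum(beta_divergence(x, y, beta) for x, y in zip(target, model))


# Toy bins: observed spectrogram values and a model reconstruction.
target = [2.0, 1.0]
model = [1.0, 1.0]

cost_euclidean = spectrogram_cost(target, model, beta=2.0)
cost_kl = spectrogram_cost(target, model, beta=1.0)
cost_is = spectrogram_cost(target, model, beta=0.0)

# Values of beta close to 1 and 0 approach the two limiting cases.
near_kl = spectrogram_cost(target, model, beta=1.0 + 1e-6)
near_is = spectrogram_cost(target, model, beta=1e-6)
```

Sweeping `beta` over a range and comparing separation quality at each value is one way to look for a suitable setting, which is the kind of question the abstract raises for magnitude and power spectrograms.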

Interactive Music Archive Access System

Martin Gallagher et al. — Tue, 13 Sep 2011 05:08:45 PDT

The goal of the Interactive Music Archive Access System (IMAAS) project was to develop an interactive music archive access system which was capable of allowing an end-user to easily extract rhythmic, melodic and harmonic musical metadata descriptors from audio, and allow the user to interact with the archive contents in a manner not typically allowed in archive access systems. To this end, the IMAAS system incorporates a range of real-time interaction tools which allow the user to modify the retrieved audio in a number of ways including the ability to isolate individual instruments in stereo mixes, pitch and time-scale modification, and beat-synchronous looping. This demo gives an overview of the capabilities of the IMAAS application.

Harmonic/Percussive Separation Using Median Filtering

Derry Fitzgerald — Tue, 13 Sep 2011 05:08:43 PDT

In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique involves the use of median filtering on a spectrogram of the audio signal, with median filtering performed across successive frames to suppress percussive events and enhance harmonic components, while median filtering is also performed across frequency bins to enhance percussive events and suppress harmonic components. The two resulting median filtered spectrograms are then used to generate masks which are then applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.
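
The two median filtering passes and the mask generation can be sketched on a toy magnitude spectrogram as below. The filter length, the truncated windows at the edges and the squared soft-mask form are assumptions chosen for this example rather than settings taken from the paper.

```python
from statistics import median


def median_filter_1d(values, length):
    """Running median; windows are truncated at the edges (assumed)."""
    half = length // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def hpss_masks(spec, length=3, eps=1e-12):
    """Return (harmonic_mask, percussive_mask) for spec[frame][bin]."""
    n_frames, n_bins = len(spec), len(spec[0])
    # Median across successive frames, per bin: keeps steady partials.
    per_bin = [
        median_filter_1d([spec[t][f] for t in range(n_frames)], length)
        for f in range(n_bins)
    ]
    harm = [[per_bin[f][t] for f in range(n_bins)] for t in range(n_frames)]
    # Median across frequency bins, per frame: keeps broadband onsets.
    perc = [median_filter_1d(frame, length) for frame in spec]
    harm_mask = [
        [h * h / (h * h + p * p + eps) for h, p in zip(hr, pr)]
        for hr, pr in zip(harm, perc)
    ]
    perc_mask = [
        [p * p / (h * h + p * p + eps) for h, p in zip(hr, pr)]
        for hr, pr in zip(harm, perc)
    ]
    return harm_mask, perc_mask


# Toy spectrogram: a steady tone in bin 2 and a click in frame 2.
spec = [
    [1.0 if (f == 2 or t == 2) else 0.0 for f in range(5)]
    for t in range(5)
]

harm_mask, perc_mask = hpss_masks(spec, length=3)
```

In this toy example the horizontal line (the steady tone) ends up in the harmonic mask and the vertical line (the click) in the percussive mask, with the bin where they cross shared equally between the two.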

Multi-channel Audio Time-scale Modification

David Dorran et al. — Tue, 13 Sep 2011 05:08:40 PDT

Phase vocoder based approaches to audio time-scale modification introduce a reverberant artefact into the time scaled output. Recent techniques have been developed to reduce the presence of this artefact; however, these techniques have the effect of introducing additional issues relating to their application to multi-channel recordings. This paper addresses these issues by collectively analysing all channels prior to time-scaling each individual channel.