Matlab Audio Analysis Library


(C) 2014
Theodoros Giannakopoulos
Aggelos Pikrakis

This document outlines the Matlab Audio Analysis Library, which accompanies the book Introduction to Audio Analysis: A MATLAB® Approach, 1st Edition.

The provided material is organized as follows:

  • Folder "library"
    • In the root of this folder you can find:
      • the core m-files of the Matlab Audio Analysis Library
      • a number of mat files
    • Folder "demos" contains m-files that demonstrate particular functionalities of the library. Most of these demos are presented in the book. Note that, in order to run the demos, the root path (i.e., the path of the "library" folder) must be added to the MATLAB path.
  • Folder "data" contains basic audio data that have been used to evaluate and train several algorithms described in the book.
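
The path setup mentioned above can be sketched as follows. This is a minimal example and not part of the library: the variable libraryRoot and the assumption that the material was unpacked into the current working directory are illustrative only, so adjust the path to your own setup.

```matlab
% Hypothetical location of the unpacked "library" folder (an assumption;
% change it to wherever the material actually lives on your machine).
libraryRoot = fullfile(pwd, 'library');

if exist(libraryRoot, 'dir') == 7
    % Make the core m-files and mat files visible to MATLAB.
    addpath(libraryRoot);
    % Optional: also expose the demos, so they can be called by name.
    addpath(fullfile(libraryRoot, 'demos'));
else
    warning('Folder not found: %s', libraryRoot);
end
```

The change made by addpath() lasts only for the current MATLAB session; calling savepath() afterwards is one way to keep it across sessions.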

Contents of "/data"

Name Description
Clarinet This folder contains pitch-tracking sequences (WindInstrumentPitch.mat file) that have been extracted from a set of monophonic recordings of a wind instrument, the clarinet. The recordings are variations of two melodies (patterns) and are organized into two sets (folders Pattern_1 and Pattern_2, respectively).
1WORD.wav, 3WORDS.wav Speech examples that can be used in silence detection or speech filtering (Chapter 6)
4ClassStream.wav, 4ClassStreamGT.mat 4-class (female speech, male speech, silence and music) example to be used for supervised segmentation methods (Chapter 6). The mat file contains the respective ground truth.
BassClarinet_model1.mat, frequency.txt A small sample of a bass clarinet sound. The text file contains the ground truth frequencies of the respective sound (used to demonstrate fundamental frequency estimation in demo "demoFo()")
diarizationExample.wav Audio example for speaker diarization (Chapter 6).
DubaiAirport.wav, KingGeorgeSpeech_1939_53sec.wav, KingGeorgeSpeech_1939_small.wav Three general-purpose speech files (used for silence detection, segmentation, filtering and so on).
musicLargeData.mat, musicSmallData.mat Two datasets of mid-term features extracted from 300 and 40 music tracks respectively. Used for music visualization tasks (Chapter 8)
speech_music_sample.wav An audio stream of speech and music segments. Used for speech-music segmentation methods (Chapter 6)
topGear.wav, topGearGT.mat An audio stream from a TV show with respective ground-truth. Used by signal change detection methods (Chapter 6)
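
Before using any of the files listed above, it can help to check what they contain. The following is a minimal sketch that relies only on built-in MATLAB functions (whos, audioinfo), not on the library itself. The variable dataDir and the assumption that the "data" folder sits in the current working directory are illustrative only.

```matlab
% Hypothetical location of the "data" folder (an assumption; adjust as needed).
dataDir = fullfile(pwd, 'data');

% List the variables stored in a ground-truth mat file without loading them.
matFile = fullfile(dataDir, '4ClassStreamGT.mat');
if exist(matFile, 'file') == 2
    whos('-file', matFile);
else
    warning('File not found: %s', matFile);
end

% Print basic properties of one of the audio streams.
wavFile = fullfile(dataDir, 'speech_music_sample.wav');
if exist(wavFile, 'file') == 2
    info = audioinfo(wavFile);
    fprintf('%s: %d Hz, %d channel(s), %.1f s\n', ...
        wavFile, info.SampleRate, info.NumChannels, info.Duration);
else
    warning('File not found: %s', wavFile);
end
```

Note that audioinfo() is available only in MATLAB R2012b and later.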

Contents of "/library"

The following table provides a short description of the core Matlab functions of the library, i.e., the functions stored in the "library" folder (not the ones stored in the "demos" folder).
For a description of the five (5) .mat files (i.e., the kNN models of the respective classification tasks), please refer to Table 5.1 of the book.
m-file Description Chapter
audioRecorderOnline Demonstrates audio recording using the MATLAB audiorecorder() function. Calls the audioRecorderTimerCallback() callback function. 2
audioRecorderTimerCallback Callback function used to record audio data (through the audiorecorder() MATLAB function) 2
classifyKNN_D_Multi Classifies an unknown sample using the kNN algorithm, in its multi-class mode. Returns probability estimates 5, 6
computePerformanceMeasures Computes the confusion matrix and performance measures of a classification process 5
dctCompress Demonstrates the use of the DCT for audio compression 3
dctDecompress Demonstrates the use of the DCT for audio decompression 3
dynamicTimeWarpingItakura Computes the Dynamic Time Warping cost between two feature sequences based on the Itakura local path constraints 7
dynamicTimeWarpingSakoeChiba Computes the Dynamic Time Warping cost between two feature sequences based on the Sakoe-Chiba local path constraints 7
em_alg_function EM algorithm for estimating the parameters of a mixture of normal distributions, with diagonal covariance matrices 7
EM_pdf_est EM estimation of the pdfs of c classes 7
evaluateClassifier Implements the repeated hold-out and leave-one-out validation methods 5
feature_chroma_vector Computes the chroma vector of a short-term window 4
feature_energy Computes the energy of a short-term window 4
feature_energy_entropy Computes the entropy of energy of a short-term window 4
featureExtractionDir Extracts mid-term features for a list of WAV files stored in a given folder 8
featureExtractionFile Reads a WAVE file and computes audio feature statistics on a mid-term basis 4,5,6
feature_harmonic Computes the harmonic ratio and fundamental frequency of a window (autocorrelation method) 4
feature_mfccs Computes the MFCCs of a short-term window (based on Slaney's Auditory Toolbox) 4
feature_mfccs_init Initializes the computation of the MFCCs (see feature_mfccs()) 4
feature_spectral_centroid Computes the spectral centroid of a short-term window 4
feature_spectral_entropy Computes the spectral entropy of a short-term window 4
feature_spectral_flux Computes the spectral flux of a short-term window 4
feature_spectral_rolloff Computes the spectral rolloff of a short-term window 4
feature_zcr Computes the zero crossing rate of a short-term window 4
fftExample Demonstrates how to use the getDFT() function 3
fftSNR Demonstrates the use of the getDFT() function using a noisy signal 3
fileClassification Demonstrates the classification of an audio segment from a WAVE file (not to be confused with mtFileClassification(), which performs joint segmentation-classification of an audio file) 5
fld Finds a linear discriminant subspace using the LDA algorithm. Used for dimensionality reduction in the context of music visualization. This m-file was not implemented by the authors; it was taken from the MathWorks File Exchange (Fisher Linear Discriminant Analysis, by Sergios Petridis) 8
getDFT Returns the (normalized) magnitude of the DFT of a signal. 3,4
kNN_model_add_class Adds an audio class to a kNN classification setup. As the kNN classifier requires no actual training, the function only performs a feature extraction stage for a set of WAVE files stored in a given directory 5
kNN_model_load Loads a kNN classification setup, i.e., a feature matrix for each class, along with the respective normalization parameters (means and standard deviations of the features) 5
mixturepdf Computes the value of a pdf that is given as a mixture of normal distributions, at a given point. 7
mp3toWav Performs MP3 to WAVE conversion with the FFMPEG command-line tool 2
mp3toWavDIR Transcodes each MP3 file in a given folder to the WAVE format, using the FFMPEG command-line tool 2
mtFeatureExtraction Computes the mid-term statistics for a set of sequences of short-term features. It returns a matrix, whose columns contain the vectors of mid-term feature statistics. 4
mtFileClassification Splits an audio signal into fixed-size segments and classifies each segment separately (fixed-size window segmentation) 5,6
musicMeterTempoInduction Performs joint estimation of the music meter and tempo of a music recording 8
musicThumbnailing Extracts pairs of thumbnails from music recordings 8
musicVisualizationDemo Demonstrates three linear dimensionality reduction methods for music content visualization (random projection, PCA and LDA) 8
musicVisualizationDemoSOM Demonstrates SOM-based music content visualization 8
plotFeaturesFile Plots a given feature sequence that has been computed over a WAVE file 4
printPerformanceMeasures Prints a table of classification performance measures (confusion matrix, recall, etc.) in LaTeX format 5
readWavFile Demonstrates how to read the contents of a WAVE file, using two different modes: (a) all the contents of the WAVE file are loaded (b) blocks of data are read and each block is processed separately 2
readWavFileScript Generates experiments that measure the elapsed time of different WAVE file I/O approaches 2
scaledBaumWelchContObs Implements the scaled version of the Baum-Welch algorithm (continuous features) 7
scaledBaumWelchDisObs Implements the scaled version of the Baum-Welch algorithm (discrete observations) 7
scaledViterbiContObs Implements the Viterbi algorithm for continuous features 7
scaledViterbiDisObs Implements the Viterbi algorithm for discrete observations 7
scriptClassificationPerformance Loads a kNN classification setup (stored in a mat file) and extracts the respective classification performance measures. For the best value of k, it prints the respective confusion matrix and class-specific performance measures. 5
segmentationCompareResults Visualizes two different segmentation results for the sake of comparison. 6
segmentationPlotResults Provides a simple user interface to view and listen to the results of a segmentation - classification procedure. 6
segmentationProbSeq Segments an audio stream based on the estimated posterior probabilities for each class. Implements: (a) naive merging and (b) Viterbi-based probability smoothing. To be called after mtFileClassification(). 6
segmentationSignalChange Basic unsupervised signal change segmentation (no classifier needed). 6
showHistogramFeatures This auxiliary function is used to plot the histograms of a particular feature for different audio classes. It has been used to generate the histograms of Chapter 4 4
silenceDetectorUtterance Computes the endpoints of a single speech utterance. Based on Rabiner and Schafer, Theory and Applications of Digital Speech Processing, Section 10.3. 6
silenceRemoval Applies a semi-supervised algorithm for detecting speech segments (removing silence) in an audio stream stored in a WAVE file. 6
smithWaterman Implements the Smith-Waterman algorithm for sequence alignment 7
soundOS An alternative to the Matlab sound() function, in case problems are encountered in Linux-based systems 2
speakerDiarization Implements a simple unsupervised speaker diarization procedure. 6
stFeatureExtraction Breaks an audio signal into possibly overlapping short-term windows and computes sequences of audio features. It returns a matrix whose rows correspond to the extracted feature sequences 4
stpFile Demonstrates the short-term processing stage of an audio signal. 2
viterbiBestPath Finds the most-likely state sequence given a matrix of probability estimations. Used for smoothing segmentation results. 6
viterbiTrainingDo Implements the Viterbi training scheme for the case of discrete observations 7
viterbiTrainingMultiCo Implements the Viterbi training scheme for the case of continuous, multidimensional features, under the assumption that the density function at each state is Gaussian 7
viterbiTrainingMultiCoMix Implements the Viterbi training scheme for the case of Gaussian mixtures 7

M-files dependency graph