| Name | Description |
| --- | --- |
| Clarinet | This folder contains pitch-tracking sequences (the WindInstrumentPitch.mat file) that have been extracted from a set of monophonic recordings of a wind instrument, the clarinet. The recordings are variations of two melodies (patterns) and are organized into two sets (folders Pattern_1 and Pattern_2, respectively). |
| 1WORD.wav, 3WORDS.wav | Speech examples that can be used in silence detection or speech filtering (Chapter 6). |
| 4ClassStream.wav, 4ClassStreamGT.mat | A 4-class (female speech, male speech, silence and music) example to be used with supervised segmentation methods (Chapter 6). The .mat file contains the respective ground truth. |
| BassClarinet_model1.mat, frequency.txt | A small sample of a bass clarinet sound. The text file contains the ground-truth frequencies of the respective sound (used to demonstrate fundamental frequency estimation in the demo "demoFo()"). |
| diarizationExample.wav | Audio example for speaker diarization (Chapter 6). |
| DubaiAirport.wav, KingGeorgeSpeech_1939_53sec.wav, KingGeorgeSpeech_1939_small.wav | Three general-purpose speech files (used for silence detection, segmentation, filtering and so on). |
| musicLargeData.mat, musicSmallData.mat | Two datasets of mid-term features extracted from 300 and 40 music tracks, respectively. Used for music visualization tasks (Chapter 8). |
| speech_music_sample.wav | An audio stream of speech and music segments. Used for speech-music segmentation methods (Chapter 6). |
| topGear.wav, topGearGT.mat | An audio stream from a TV show with the respective ground truth. Used by signal change detection methods (Chapter 6). |

The following table gives a short description of the core MATLAB functions of the library, i.e., the functions stored in the "library" folder (not the ones stored in the "demos" folder).
For a description of the five (5) .mat files (i.e., the kNN models of the respective classification tasks), please refer to Table 5.1 of the book.

| m-file | Description | Chapter |
| --- | --- | --- |
| audioRecorderOnline | Demonstrates audio recording using the audiorecorder() MATLAB function. Calls the audioRecorderTimerCallback() callback function. | 2 |
| audioRecorderTimerCallback | Callback function used to record audio data (through the audiorecorder() MATLAB function). | 2 |
| classifyKNN_D_Multi | Classifies an unknown sample using the kNN algorithm, in its multi-class mode. Returns probability estimates. | 5, 6 |
| computePerformanceMeasures | Computes the confusion matrix and performance measures of a classification process. | 5 |
| dctCompress | Demonstrates the use of the DCT for audio compression. | 3 |
| dctDecompress | Demonstrates the use of the DCT for audio (de)compression. | 3 |
| dynamicTimeWarpingItakura | Computes the Dynamic Time Warping cost between two feature sequences based on the Itakura local path constraints. | 7 |
| dynamicTimeWarpingSakoeChiba | Computes the Dynamic Time Warping cost between two feature sequences based on the Sakoe-Chiba local path constraints. | 7 |
| em_alg_function | EM algorithm for estimating the parameters of a mixture of normal distributions, with diagonal covariance matrices. | 7 |
| EM_pdf_est | EM estimation of the pdfs of c classes. | 7 |
| evaluateClassifier | Implements the repeated hold-out and leave-one-out validation methods. | 5 |
| feature_chroma_vector | Computes the chroma vector of a short-term window. | 4 |
| feature_energy | Computes the energy of a short-term window. | 4 |
| feature_energy_entropy | Computes the entropy of energy of a short-term window. | 4 |
| featureExtractionDir | Extracts mid-term features for a list of WAV files stored in a given folder. | 8 |
| featureExtractionFile | Reads a WAVE file and computes audio feature statistics on a mid-term basis. | 4, 5, 6 |
| feature_harmonic | Computes the harmonic ratio and fundamental frequency of a window (autocorrelation method). | 4 |
| feature_mfccs | Computes the MFCCs of a short-term window (based on Slaney's Auditory Toolbox). | 4 |
| feature_mfccs_init | Initializes the computation of the MFCCs (see feature_mfccs()). | 4 |
| feature_spectral_centroid | Computes the spectral centroid of a short-term window. | 4 |
| feature_spectral_entropy | Computes the spectral entropy of a short-term window. | 4 |
| feature_spectral_flux | Computes the spectral flux of a short-term window. | 4 |
| feature_spectral_rolloff | Computes the spectral rolloff of a short-term window. | 4 |
| feature_zcr | Computes the zero crossing rate of a short-term window. | 4 |
| fftExample | Demonstrates how to use the getDFT() function. | 3 |
| fftSNR | Demonstrates the use of the getDFT() function using a noisy signal. | 3 |
| fileClassification | Demonstrates the classification of an audio segment from a WAVE file (not to be confused with mtFileClassification(), which performs joint segmentation-classification of an audio file). | 5 |
| fld | Finds a linear discriminant subspace using the LDA algorithm. Used for dimensionality reduction in the context of music visualization. This m-file has not been implemented by the authors; it was taken from the MathWorks File Exchange (Fisher Linear Discriminant Analysis, by Sergios Petridis). | 8 |
| getDFT | Returns the (normalized) magnitude of the DFT of a signal. | 3, 4 |
| kNN_model_add_class | Adds an audio class to a kNN classification setup. As the kNN classifier requires no actual training, the function only performs a feature extraction stage for a set of WAVE files stored in a given directory. | 5 |
| kNN_model_load | Loads a kNN classification setup, i.e., a feature matrix for each class, along with the respective normalization parameters (means and standard deviations of the features). | 5 |
| mixturepdf | Computes the value of a pdf that is given as a mixture of normal distributions, at a given point. | 7 |
| mp3toWav | Performs MP3 to WAVE conversion with the FFMPEG command-line tool. | 2 |
| mp3toWavDIR | Transcodes each MP3 file in a given folder to the WAVE format, using the FFMPEG command-line tool. | 2 |
| mtFeatureExtraction | Computes the mid-term statistics for a set of sequences of short-term features. It returns a matrix whose columns contain the vectors of mid-term feature statistics. | 4 |
| mtFileClassification | Splits an audio signal into fixed-size segments and classifies each segment separately (fixed-size window segmentation). | 5, 6 |
| musicMeterTempoInduction | Performs joint estimation of the music meter and tempo of a music recording. | 8 |
| musicThumbnailing | Extracts pairs of thumbnails from music recordings. | 8 |
| musicVisualizationDemo | Demonstrates three linear dimensionality reduction methods for music content visualization (random projection, PCA and LDA). | 8 |
| musicVisualizationDemoSOM | Demonstrates SOM-based music content visualization. | 8 |
| plotFeaturesFile | Plots a given feature sequence that has been computed over a WAVE file. | 4 |
| printPerformanceMeasures | Prints a table of classification performance measures (confusion matrix, recall, etc.) in LaTeX format. | 5 |
| readWavFile | Demonstrates how to read the contents of a WAVE file, using two different modes: (a) all the contents of the WAVE file are loaded, and (b) blocks of data are read and each block is processed separately. | 2 |
| readWavFileScript | Generates experiments that measure the elapsed time of different WAVE file I/O approaches. | 2 |
| scaledBaumWelchContObs | Implements the scaled version of the Baum-Welch algorithm (continuous features). | 7 |
| scaledBaumWelchDisObs | Implements the scaled version of the Baum-Welch algorithm (discrete observations). | 7 |
| scaledViterbiContObs | Implements the Viterbi algorithm for continuous features. | 7 |
| scaledViterbiDisObs | Implements the Viterbi algorithm for discrete observations. | 7 |
| scriptClassificationPerformance | Loads a kNN classification setup (stored in a .mat file) and extracts the respective classification performance measures. For the best value of k, it prints the respective confusion matrix and class-specific performance measures. | 5 |
| segmentationCompareResults | Visualizes two different segmentation results for the sake of comparison. | 6 |
| segmentationPlotResults | Provides a simple user interface to view and listen to the results of a segmentation-classification procedure. | 6 |
| segmentationProbSeq | Segments an audio stream based on the estimated posterior probabilities for each class. Implements: (a) naive merging and (b) Viterbi-based probability smoothing. To be called after mtFileClassification(). | 6 |
| segmentationSignalChange | Basic unsupervised signal change segmentation (no classifier needed). | 6 |
| showHistogramFeatures | This auxiliary function is used to plot the histograms of a particular feature for different audio classes. It has been used to generate the histograms of Chapter 4. | 4 |
| silenceDetectorUtterance | Computes the endpoints of a single speech utterance. Based on Rabiner and Schafer, Theory and Applications of Digital Speech Processing, Section 10.3. | 6 |
| silenceRemoval | Applies a semi-supervised algorithm for detecting speech segments (removing silence) in an audio stream stored in a WAVE file. | 6 |
| smithWaterman | Implements the Smith-Waterman algorithm for sequence alignment. | 7 |
| soundOS | An alternative to the MATLAB sound() function, in case problems are encountered on Linux-based systems. | 2 |
| speakerDiarization | Implements a simple unsupervised speaker diarization procedure. | 6 |
| stFeatureExtraction | Breaks an audio signal into possibly overlapping short-term windows and computes sequences of audio features. It returns a matrix whose rows correspond to the extracted feature sequences. | 4 |
| stpFile | Demonstrates the short-term processing stage of an audio signal. | 2 |
| viterbiBestPath | Finds the most likely state sequence given a matrix of probability estimates. Used for smoothing segmentation results. | 6 |
| viterbiTrainingDo | Implements the Viterbi training scheme for the case of discrete observations. | 7 |
| viterbiTrainingMultiCo | Implements the Viterbi training scheme for the case of continuous, multidimensional features, under the assumption that the density function at each state is Gaussian. | 7 |
| viterbiTrainingMultiCoMix | Implements the Viterbi training scheme for the case of Gaussian mixtures. | 7 |
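
To make the short-term processing idea behind stFeatureExtraction, feature_energy and feature_zcr more concrete, here is a minimal, language-independent sketch written in Python rather than MATLAB. It is not the library's code: the function name `short_term_features`, its parameters, and the exact energy and zero-crossing-rate normalizations are assumptions chosen for illustration, and the library's own definitions may differ.

```python
import math


def short_term_features(signal, window_length, step):
    """Hypothetical helper: return [energies, zero_crossing_rates],
    i.e. one row per feature and one value per short-term window."""
    energies = []
    zcrs = []
    # Slide a window of `window_length` samples in hops of `step` samples;
    # step < window_length yields overlapping windows.
    for start in range(0, len(signal) - window_length + 1, step):
        frame = signal[start:start + window_length]
        # Energy (assumed definition): mean of the squared samples.
        energies.append(sum(x * x for x in frame) / window_length)
        # Zero-crossing rate (assumed definition): fraction of consecutive
        # sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcrs.append(crossings / (window_length - 1))
    return [energies, zcrs]


# Hypothetical input: one second of a 440 Hz sine sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
# 50 ms windows (400 samples) with a 25 ms step (200 samples), i.e. 50% overlap.
features = short_term_features(signal, window_length=400, step=200)
print(len(features), len(features[0]))
```

Each row of `features` is one feature sequence and each column corresponds to one short-term window, mirroring the row-per-feature layout described for stFeatureExtraction above. For the sine input, the energy stays close to 0.5 and the zero-crossing rate close to 0.11 in every window.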