TarsosLSH is a Java library implementing Locality-sensitive Hashing (LSH), a practical nearest neighbor search algorithm for high dimensional vectors that operates in sublinear time. The open source software package is authored by me and is available on GitHub: TarsosLSH on GitHub.
With TarsosLSH, Joseph Hwang and Nicholas Kwon from Rice University created an Image Mosaic web application. The application chops an uploaded photo into small blocks. For each block, a color histogram is created and compared with an index of color histograms of reference images. Subsequently each block is replaced with one of the top three nearest neighbors, creating a mosaic. Since high dimensional nearest neighbor search is needed, this is an ideal application for TarsosLSH. The application somewhat proves that TarsosLSH can be used in practical applications, which is comforting.
On Wednesday the 18th of December 2013 Olmo Cornelis organized a concert as part of his doctorate; his defense followed the day after. Congratulations once more, Olmo, on the beautiful, eh, mbira grade. Below is a short explanation of the project and the concert.
In his research project ‘Exploring the symbiosis of Western and non-Western Music’, Olmo Cornelis put the description of Central African music center stage. This music was explored via computational techniques that approach the sound as a signal. The information obtained influenced his artistic oeuvre, in which a mixture of implicit and explicit ethnic influences is always at play.
To mark the conclusion of this doctoral research, the HERMESensemble, the Nadar Ensemble, Maja Jantar and Françoise Vanhecke perform work by Olmo Cornelis written during this project on the 18th of December. The research project Exploring the symbiosis of Western and non-Western Music was initiated in 2008 at the Conservatory / School of Arts of HoGent and was financed by the research fund of University College Ghent (Hogeschool Gent).
Image: Noel Cornelis, Reality of Possibilities, 2012
The journal paper Evaluation and Recommendation of Pulse and Tempo Annotation in Ethnic Music by Cornelis, Six, Holzapfel and Leman was published in a special issue about Computational Ethnomusicology of the Journal of New Music Research on the 20th of August 2013. Below you can find the abstract for the article, and the full text author version of the article itself.
Abstract: Large digital archives of ethnic music require automatic tools to provide musical content descriptions. While various automatic approaches are available, they are to a wide extent developed for Western popular music. This paper aims to analyze how automated tempo estimation approaches perform in the context of Central-African music. To this end we collect human beat annotations for a set of musical fragments, and compare them with automatic beat tracking sequences. We first analyze the tempo estimations derived from annotations and beat tracking results. Then we examine an approach, based on mutual agreement between automatic and human annotations, to automate such analysis, which can serve to detect musical fragments with high tempo ambiguity.
@article{cornelis2013tempo_jnmr,
author = {Olmo Cornelis and Joren Six and Andre Holzapfel and Marc Leman},
title = {{Evaluation and Recommendation of Pulse and Tempo Annotation in Ethnic Music}},
journal = {{Journal of New Music Research}},
volume = {42},
number = {2},
pages = {131-149},
year = {2013},
doi = {10.1080/09298215.2013.812123}
}
The DSP library for Tarsos, aptly named TarsosDSP, now includes an implementation of a Constant-Q Transform (as of version 1.6). The Constant-Q transform does essentially the same thing as an FFT, but has the advantage that each octave has the same number of bins. This makes the Constant-Q transform practical for applications processing music. If, for example, 12 bins per octave are chosen, these can correspond with the Western musical scale.
Also included in the newest release (version 1.7) is a way to visualize the transform, or other musical features. The visualization implementation is done together with Thomas Stubbe.
The example application below shows the Constant-Q transform with an overlay of pitch estimations. The corresponding waveform is also shown.
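For the curious, below is a minimal sketch of what using the transform can look like in code. It assumes the ConstantQ class and AudioDispatcher plumbing of recent TarsosDSP releases; package and method names changed between versions, so treat it as an illustration rather than the definitive API:

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.ConstantQ;
import be.tarsos.dsp.io.jvm.AudioDispatcherFactory;

public class ConstantQExample {
    public static void main(String[] args) throws Exception {
        float sampleRate = 44100;
        // 12 bins per octave correspond with the Western musical scale.
        final ConstantQ constantQ = new ConstantQ(sampleRate, 110, 3520, 12);
        AudioDispatcher dispatcher = AudioDispatcherFactory.fromDefaultMicrophone(
                (int) sampleRate, constantQ.getFFTlength(), 0);
        dispatcher.addAudioProcessor(constantQ);
        dispatcher.addAudioProcessor(new AudioProcessor() {
            public boolean process(AudioEvent audioEvent) {
                // One magnitude per bin; print the most energetic bin.
                float[] magnitudes = constantQ.getMagnitudes();
                int max = 0;
                for (int i = 1; i < magnitudes.length; i++)
                    if (magnitudes[i] > magnitudes[max]) max = i;
                System.out.println("Loudest bin: " + max);
                return true;
            }
            public void processingFinished() {}
        });
        dispatcher.run();
    }
}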
The journal paper Tarsos, a Modular Platform for Precise Pitch Analysis of Western and Non-Western Music by Six, Cornelis, and Leman was published in a special issue about Computational Ethnomusicology of the Journal of New Music Research on the 20th of August 2013. Below you can find the abstract for the article, and pointers to audio examples, the Tarsos software, and the author version of the article itself.
Abstract: This paper presents Tarsos, a modular software platform used to extract and analyze pitch organization in music. With Tarsos pitch estimations are generated from an audio signal and those estimations are processed in order to form musicologically meaningful representations. Tarsos aims to offer a flexible system for pitch analysis through the combination of an interactive user interface, several pitch estimation algorithms, filtering options, immediate auditory feedback and data output modalities for every step. To study the most frequently used pitches, a fine-grained histogram that allows up to 1200 values per octave is constructed. This allows Tarsos to analyze deviations in Western music, or to analyze specific tone scales that differ from the 12 tone equal temperament, common in many non-Western musics. Tarsos has a graphical user interface or can be launched using an API – as a batch script. Therefore, it is fit for both the analysis of individual songs and the analysis of large music corpora. The interface allows several visual representations, and can indicate the scale of the piece under analysis. The extracted scale can be used immediately to tune a MIDI keyboard that can be played in the discovered scale. These features make Tarsos an interesting tool that can be used for musicological analysis, teaching and even artistic productions.
Ladrang Kandamanyura (slendro pathet manyura) is the name of the piece used in the article throughout section 2. The album on which the piece can be found is available at wergo. Below a thirty second fragment is embedded. You can also download the thirty second fragment to analyse it yourself.
Below the BibTeX entry for the article is embedded.
In the extended abstract, also titled Computer Assisted Transcription of Ethnic Music, it is described how the Tarsos software program now has features aiding transcription. Tarsos is especially practical for ethnic music of which the tone scale is not known beforehand. The proceedings of FMA 2013 are available as well.
During the conference there also was an interesting panel on transcription. The following people participated: John Ashley Burgoyne, moderator (University of Amsterdam), Kofi Agawu (Princeton University), Dániel P. Biró (University of Victoria), Olmo Cornelis (University College Ghent, Belgium), Emilia Gómez (Universitat Pompeu Fabra, Barcelona), and Barbara Titus (Utrecht University). Some pictures can be found below.
TarsosLSH is a Java library implementing Locality-sensitive Hashing (LSH), a practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time. It supports several Locality Sensitive Hashing (LSH) families: the Euclidean hash family (L2), city block hash family (L1) and cosine hash family. The library tries to hit the sweet spot between being capable enough to get real tasks done, and compact enough to serve as a demonstration on how LSH works. It relates to the Tarsos project because it is a practical way to search for and compare musical features.
git clone https://JorenSix@github.com/JorenSix/TarsosLSH.git
cd TarsosLSH/build
ant #Builds the core TarsosLSH library
ant javadoc #build the API documentation
When everything runs correctly you should be able to run the command line application, and have the latest version of the TarsosLSH library for inclusion in your projects. Also, the Javadoc documentation for the API should be available in TarsosLSH/doc. Drop me a line if you use TarsosLSH in your project. Always nice to hear how this software is used.
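To give an idea of what the Java API looks like, here is a minimal sketch. It is based on the class names in the TarsosLSH source (Vector, HashFamily, EuclidianHashFamily, LSH); exact constructors and packages may differ from the release you are using, so check the included examples and Javadoc:

import java.util.ArrayList;
import java.util.List;

import be.tarsos.lsh.LSH;
import be.tarsos.lsh.Vector;
import be.tarsos.lsh.families.EuclidianHashFamily;
import be.tarsos.lsh.families.HashFamily;

public class TarsosLSHExample {
    public static void main(String[] args) {
        // A tiny data set of four-dimensional vectors.
        List<Vector> dataset = new ArrayList<Vector>();
        dataset.add(new Vector(new double[]{12, 24, 18.5, -45.6}));
        dataset.add(new Vector(new double[]{13, 19, -12.0, 49.8}));
        // The Euclidean (L2) hash family needs a radius and the dimensionality.
        HashFamily family = new EuclidianHashFamily(500, 4);
        LSH lsh = new LSH(dataset, family);
        // Index using 4 hashes per table and 4 hash tables (the defaults).
        lsh.buildIndex(4, 4);
        // Query for the 3 nearest neighbours of a new vector.
        Vector query = new Vector(new double[]{12.5, 21, 10, -40});
        List<Vector> neighbours = lsh.query(query, 3);
        System.out.println(neighbours);
    }
}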
The fastest way to get something on your screen is executing this on your command line: java -jar TarsosLSH.jar. This lets LSH run on a random data set. The full reference of the command line application is included below:
Name
TarsosLSH: finds the nearest neighbours in a data set quickly, using LSH.
Synopsis
java -jar TarsosLSH.jar [options] dataset.txt queries.txt
Description
Tries to find nearest neighbours for each vector in the
query file, using Euclidean (L2) distance by default.
Both dataset.txt and queries.txt have a similar format:
an optional identifier for the vector and a list of N
coordinates (which should be doubles).
[Identifier] coord1 coord2 ... coordN
[Identifier] coord1 coord2 ... coordN
For an example data set with two elements and 4 dimensions:
Hans 12 24 18.5 -45.6
Jane 13 19 -12.0 49.8
Options are:
-f cos|l1|l2
Defines the hash family to use:
l1 City block hash family (L1)
l2 Euclidean hash family (L2)
cos Cosine distance hash family
-r radius
Defines the radius in which near neighbours should
be found. Should be a double. By default a reasonable
radius is determined automatically.
-h n_hashes
An integer that determines the number of hashes to
use. By default 4, 32 for the cosine hash family.
-t n_tables
An integer that determines the number of hash tables,
each with n_hashes, to use. By default 4.
-n n_neighbours
Number of neighbours in the neighbourhood, defaults to 3.
-b
Benchmark the settings.
--help
Prints this helpful message.
Examples
Search for nearest neighbours using the l2 hash family with a radius of 500
and utilizing 5 hash tables, each with 3 hashes.
java -jar TarsosLSH.jar -f l2 -r 500 -h 3 -t 5 dataset.txt queries.txt
Source Code Organization
The source tree is divided into three directories:
src contains the source files of the core TarsosLSH library.
test contains unit tests for some of the functionality.
build contains ANT build files, either to build Java documentation or runnable JAR-files.
Further Reading
This section includes links to resources used to implement this library.
The LSH-page maintained by Alexandr Andoni contains pointers to good resources:
Locality-Sensitive Hashing Scheme Based on p-Stable Distributions a chapter by Alexandr Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni which appeared in the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice, by T. Darrell and P. Indyk and G. Shakhnarovich (eds.), MIT Press, 2006.
The DSP library for Tarsos, aptly named TarsosDSP, now includes an example demonstrating the flanging audio effect. Flanging, essentially mixing the signal with a slightly delayed copy of itself where the delay varies over time, produces an interesting interference pattern.
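The core of the effect fits in a few lines. Below is a minimal, illustrative sketch of the idea in Java, not the actual TarsosDSP implementation: each sample is mixed with a copy of itself delayed by an amount that a low frequency oscillator (LFO) sweeps up and down:

public class FlangerSketch {
    /**
     * Applies a naive flanger to a buffer of samples, in place.
     * wet determines how much of the delayed signal is mixed in (0..1).
     */
    public static void flange(float[] audio, float sampleRate,
            float lfoFrequency, int maxDelaySamples, float wet) {
        float[] delayBuffer = new float[maxDelaySamples];
        int writePos = 0;
        for (int i = 0; i < audio.length; i++) {
            // The LFO sweeps the delay between 0 and maxDelaySamples - 1.
            double lfo = 0.5 + 0.5 * Math.sin(2 * Math.PI * lfoFrequency * i / sampleRate);
            int delay = (int) (lfo * (maxDelaySamples - 1));
            int readPos = (writePos - delay + maxDelaySamples) % maxDelaySamples;
            delayBuffer[writePos] = audio[i];
            // Interference between the dry signal and its delayed copy.
            audio[i] = (1 - wet) * audio[i] + wet * delayBuffer[readPos];
            writePos = (writePos + 1) % maxDelaySamples;
        }
    }
}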
The flanging example works on wav-files or on input from the microphone. Try it yourself: download Flanging.jar, the executable JAR file. Below you can check what flanging sounds like with various parameters.
The source code of the Java implementation can be found on the TarsosDSP github page.
The DSP library for Tarsos, aptly named TarsosDSP, now includes an example showing how to synthesize cat sounds. The inspiration came from this YouTube video.
To hear what exactly it does, listen to the following audio example.
There is also a command line interface.
The DSP library for Tarsos, aptly named TarsosDSP, now includes an example showing how to synthesize pitch estimations. The goal of the example is to show which errors are made by different pitch detectors.
To test the application, download and execute the Resynthesizer.jar file and load an audio file. For the moment only 44.1kHz mono wav is allowed. To hear what exactly it does, compare the following two audio fragments:
There is also a command line interface; the following command does pitch tracking, follows the envelope of in.wav, and immediately plays the result on the default audio device. If you want to save the audio, see the command line options. The flute example is provided for your convenience.
java -jar Resynthesizer-latest.jar in.wav
_______ _____ _____ _____
|__ __| | __ \ / ____| __ \
| | __ _ _ __ ___ ___ ___| | | | (___ | |__) |
| |/ _` | '__/ __|/ _ \/ __| | | |\___ \| ___/
| | (_| | | \__ \ (_) \__ \ |__| |____) | |
|_|\__,_|_| |___/\___/|___/_____/|_____/|_|
----------------------------------------------------
Name:
TarsosDSP resynthesizer
----------------------------------------------------
Synopsis:
java -jar CommandLineResynthesizer.jar [--detector DETECTOR] [--output out.wav] [--combined combined.wav] input.wav
----------------------------------------------------
Description:
Extracts pitch and loudness from audio and resynthesizes the audio with that information.
The result is either played back or written to an output file.
There is also an option to combine source and synthesized material
in the left and right channels of a stereo audio file.
input.wav a readable wav file.
--output out.wav a writable file.
--combined combined.wav a writable output file. One channel original, other synthesized.
--detector DETECTOR defaults to FFT_YIN; alternatives are:
YIN
MPM
FFT_YIN
DYNAMIC_WAVELET
AMDF
The source code of the Java implementation of the synthesizer can be found on the TarsosDSP github page.
The DSP library for Tarsos, aptly named TarsosDSP, now includes an implementation of a pitch shifting algorithm (as of version 1.4) and a time stretching algorithm. Combined, the two can be used for something like phase vocoding. With a phase vocoder you can load an audio snippet and change its pitch and duration to, for example, create a library of samples: by recording one piano key stroke it is possible to generate two octaves of samples of different lengths, and use those instead of synthesized samples. The example application below shows exactly that, implemented in the Java programming language.
The example application below shows how to pitch shift and time stretch a sample to create a sample library with the TarsosDSP library.
Today marks the release of Tarsos 1.0. The new Tarsos release contains practical transcription features. As can be seen in the screenshot below, a time stretching feature makes it easy to loop a certain audio fragment while it plays at a slow tempo. The next loop can be played by pressing the n key, the previous one by pressing b.
Since the pitch classes of a song can be found, and there is a feature that lets you play a MIDI keyboard in the tone scale of the song under analysis, transcription of ethnic music becomes a lot easier.
The new release of Tarsos can be found in the Tarsos release repository. From now on, nightly releases are uploaded there automatically.
The DSP library for Tarsos, aptly named TarsosDSP, now includes an implementation of a pitch shifting algorithm (as of version 1.4). The goal of pitch shifting is to change the pitch of a piece of audio without affecting the duration. The algorithm implemented is a combination of resampling and time stretching. Resampling changes the pitch of the audio, but affects the total duration. Subsequently, the duration of the audio is stretched to the original (without affecting pitch) with time stretching. The result is very similar to phase vocoding.
The example application below shows how to pitch shift input from the microphone in real-time, or pitch shift a recorded track with the TarsosDSP library.
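In code, the resampling plus time stretching chain could look roughly like the sketch below. It assumes the WaveformSimilarityBasedOverlapAdd (WSOLA) and RateTransposer classes of recent TarsosDSP releases; names and packages may differ in version 1.4, so consider it a sketch rather than the exact implementation:

import java.io.File;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.WaveformSimilarityBasedOverlapAdd;
import be.tarsos.dsp.WaveformSimilarityBasedOverlapAdd.Parameters;
import be.tarsos.dsp.io.jvm.AudioDispatcherFactory;
import be.tarsos.dsp.io.jvm.AudioPlayer;
import be.tarsos.dsp.resample.RateTransposer;

public class PitchShiftSketch {
    public static void main(String[] args) throws Exception {
        double cents = -200; // two semitones down
        double factor = Math.pow(2, cents / 1200.0);
        float sampleRate = 44100;
        // Time stretching and resampling cancel each other's effect on the
        // duration, so only the pitch change remains.
        WaveformSimilarityBasedOverlapAdd wsola = new WaveformSimilarityBasedOverlapAdd(
                Parameters.musicDefaults(factor, sampleRate));
        AudioDispatcher dispatcher = AudioDispatcherFactory.fromFile(
                new File("in.wav"), wsola.getInputBufferSize(), wsola.getOverlap());
        wsola.setDispatcher(dispatcher);
        dispatcher.addAudioProcessor(wsola);
        dispatcher.addAudioProcessor(new RateTransposer(factor));
        dispatcher.addAudioProcessor(new AudioPlayer(dispatcher.getFormat()));
        dispatcher.run();
    }
}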
To test the application, download and execute the PitchShift.jar file and load an audio file. For the moment only 44.1kHz mono wav is allowed. To get started you can try this piece of audio.
There is also a command line interface, the following command lowers the pitch of in.wav by two semitones.
java -jar PitchShift.jar in.wav out.wav -200
----------------------------------------------------
_______ _____ _____ _____
|__ __| | __ \ / ____| __ \
| | __ _ _ __ ___ ___ ___| | | | (___ | |__) |
| |/ _` | '__/ __|/ _ \/ __| | | |\___ \| ___/
| | (_| | | \__ \ (_) \__ \ |__| |____) | |
|_|\__,_|_| |___/\___/|___/_____/|_____/|_|
----------------------------------------------------
Name:
TarsosDSP Pitch shifting utility.
----------------------------------------------------
Synopsis:
java -jar PitchShift.jar source.wav target.wav cents
----------------------------------------------------
Description:
Change the pitch of audio without changing the play back speed.
source.wav A readable, mono wav file.
target.wav Target location for the pitch shifted file.
cents Pitch shifting in cents: 100 means one semitone up,
-100 one down, 0 is no change. 1200 is one octave up.
The resampling feature was implemented with libresample4j by Laszlo Systems. libresample4j is a Java port of Dominic Mazzoni’s libresample 0.1.3, which is in turn based on Julius Smith’s Resample 1.7 library.
The 13th International Society for Music Information Retrieval Conference took place in Porto, Portugal, October 8th-12th, 2012. This text contains links to some papers, toolkits and software presented there which are interesting for my research; basically, it contains my personal highlights of the conference. ISMIR 2012 is described as follows:
The annual Conference of the International Society for Music Information Retrieval (ISMIR) is the world’s leading research forum on processing, searching, organizing and accessing music-related data. The revolution in music distribution and storage brought about by digital technology has fueled tremendous research activities and interests in academia as well as in industry. The ISMIR Conference reflects this rapid development by providing a meeting place for the discussion of MIR-related research, developments, methods, tools and experimental results. Its main goal is to foster multidisciplinary exchange by bringing together researchers and developers, educators and librarians, as well as students and professional users.
Tutorials
I saw an interesting tutorial on Jazz music and a tutorial on source separation. After an introduction, which detailed the experimental basis of the system, a source separator was introduced. The REPET source separator is a relatively simple system that yields reasonable results when splitting accompaniment from foreground melody.
The ongoing work by Ciril Bohak and Matija Marolt on segmentation of folk music could be very useful to apply to African musics. The paper is called Finding Repeating Stanzas in Folk Songs.
What follows is about the Conference on Interdisciplinary Musicology and the 15th international Conference of the Gesellschaft für Musikforschung. First this text gives information about our contribution to CIM2012: Revealing and Listening to Scales From the Past; Tone Scale Analysis of Archived Central-African Music Using Computational Means; then a number of highlights of the conference follow. The joint conference took place from the 4th to the 8th of September 2012.
In 2012, CIM will tackle the subject of History. Hosted by the University of Göttingen, whose one time music director Johann Nikolaus Forkel is widely regarded as one of the founders of modern music historiography, CIM12 aims to promote collaborations that provoke and explore new methods and methodologies for establishing, evaluating, preserving and communicating knowledge of music and musical practices of past societies and the factors implicated in both the preservation and transformation of such practices over time.
Revealing and Listening to Scales From the Past; Tone Scale Analysis of Archived Central-African Music Using Computational Means
The work presented by Rytis Ambrazevicius et al., Modal changes in traditional Lithuanian singing: Diachronic aspect, has a lot in common with our research; it was interesting to see their approach. Another highlight of the conference was the whole session organized by Klaus-Peter Brenner around Mbira music.
Rainer Polak gave a talk titled ‘Swing, Groove and Metre. Asymmetric Feels, Metric Ambiguity and Metric Transformation in African Musics’. He showed how research on rhythm in jazz, music theory and empirical musicology (amongst others) could be bridged and applied to ethnic music.
The overview Eleanore Selfridge-Field gave during her talk Between an Analogue Past and a Digital Future: The Evolving Digital Present was refreshing. She had a really clear view on all the different ways musicology and digital media can benefit from each other.
From the concert programme I found two items especially interesting: the lecture-performance by Margarete Maierhofer-Lischka and Frauke Aulbert of Lotofagos, a piece by Beat Furrer, and Burdocks, composed and performed by Christian Wolff and a group of enthusiastic students.
At this year’s ICMC conference, ICMC 2012, we presented a paper describing a way to experiment with tone scales and how to use Tarsos as a compositional tool. What follows are some pointers to the presentation, the paper and other interesting talks that were presented there.
ICMC 2012 was organized in Ljubljana from the 9th to the 14th of September and had a very dense program of talks, posters, presentations, demos and concerts.
Since 1974 the International Computer Music Conference has been the major international forum for the presentation of the full range of outcomes from technical and musical research, both musical and theoretical, related to the use of computers in music. This annual conference regularly travels the globe, with recent conferences in the Americas, Europe and Asia. This year we welcome the conference to Slovenia for the first time.
Sound to Scale to Sound, a Setup for Microtonal Exploration and Composition
@inproceedings{cornelis2012sound_to_scale,
author = {Olmo Cornelis and Joren Six},
title = {{Sound to Scale to Sound, a Setup for Microtonal Exploration and Composition}},
booktitle = {{Proceedings of the 2012 International Computer Music Conference,
(ICMC 2012)}},
year = {2012},
publisher = {The International Computer Music Association}
}
Program highlights
What follows are a number of pointers to my personal program highlights.
Verena Thomas presented two very well polished software tools: one to detect patterns in scores, called motifviewer, and a tool to search score databases in a multi-modal way. The Probado tool does score-to-audio alignment and much more.
Gibber is an impressive live-coding environment with an easy syntax. Since it is all done in JavaScript you can start playing with it immediately. Overtone, another live-coding environment presented at the conference by Sam Aaron, was equally impressive. It is programmed in the Clojure language.
At ICMC there were a number of tools to assist in composition. One of those is The Bach Project, by Andrea Agostini. Together with CataRT by Diemo Schwarz it forms a very expressive platform to work with sound, which was demonstrated by Aaron Einbond and Christopher Trapani in their paper titled Precise Pitch Control In Real Time Corpus-Based Concatenative Synthesis. Diemo Schwarz also presented work on audio mosaicing, which can be seen as a follow-up to AudioGuide by Ben Hackbarth.
I also got to know the work of Thomas Grill; on his website a nice piece of software can be found: a Python implementation of the Non-Stationary Gabor Transform. Another software system I got to know is FAUST, a functional signal processing programming language.
My personal highlights of the concert programme include the works by Johannes Kreidler, Aura Pon, Daniel Mayer, Alexander Schubert and the remarkable performance by Dexter Ford. The concept behind Soundlog by Johannes Kretz was also interesting.
At the 2012 AAWM conference we presented a way to explore tone scales in the music of Central Africa. Since the audience consisted of (ethno)musicologists, the main focus of the presentation was on the application part; the technical aspects were only briefly mentioned.
Today a new version of the TarsosDSP library was released. TarsosDSP is a small library to do audio processing in Java. It features two new pitch detectors: an AMDF pitch detector, contributed by Eder Souza of Brazil, and a faster implementation of YIN kindly provided by Matthias Mauch of Queen Mary University of London.
Thursday the 3rd of May I gave a guest lecture titled ‘Ethnic Music Analysis: Challenges & Opportunities’; it featured Tarsos as a case study. The goal was to identify the difficulties when dealing with ethnic music and to show a possible approach, the approach implemented by Tarsos.
The invitation to give the guest lecture came from Michael Cuthbert who is one of the driving forces behind music21. The audience was a small group of double majors in both musicology and computer science: the ideal profile to gather useful feedback.
After about a year of development and several revisions, TarsosDSP has enough features and is stable enough to slap the 1.0 tag onto it. A ‘read me’, manual, API documentation, source and binaries can be found on the TarsosDSP release directory.
What follows below is the information that can be found in the read me file:
TarsosDSP is a collection of classes to do simple audio processing. It features an implementation of a percussion onset detector and two pitch detection algorithms: YIN and the McLeod Pitch Method. Also included is a Goertzel DTMF decoding algorithm and a time stretch algorithm (WSOLA).
Its aim is to provide a simple interface to some audio (signal) processing algorithms implemented in pure Java. Some TarsosDSP example applications are available.
The following example filters a band of frequencies of an input file testFile. It keeps the frequencies from startFrequency to stopFrequency.
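The code example itself did not survive the conversion of this post, so below is a hedged reconstruction. It uses the BandPass filter class and dispatcher plumbing of recent TarsosDSP releases; the 1.0-era packages were named differently, so exact names may deviate:

import java.io.File;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.filters.BandPass;
import be.tarsos.dsp.io.jvm.AudioDispatcherFactory;
import be.tarsos.dsp.io.jvm.AudioPlayer;

public class BandPassExample {
    public static void main(String[] args) throws Exception {
        String testFile = "test.wav";
        float startFrequency = 200;  // lower bound of the pass band (Hz)
        float stopFrequency = 2000;  // upper bound of the pass band (Hz)
        float sampleRate = 44100;
        AudioDispatcher dispatcher = AudioDispatcherFactory.fromFile(
                new File(testFile), 1024, 0);
        // A band pass filter is defined by its center frequency and bandwidth.
        float center = (startFrequency + stopFrequency) / 2;
        float bandwidth = stopFrequency - startFrequency;
        dispatcher.addAudioProcessor(new BandPass(center, bandwidth, sampleRate));
        // Play the filtered audio on the default audio device.
        dispatcher.addAudioProcessor(new AudioPlayer(dispatcher.getFormat()));
        dispatcher.run();
    }
}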
git clone https://JorenSix@github.com/JorenSix/TarsosDSP.git
cd TarsosDSP/build
ant tarsos_dsp_library #Builds the core TarsosDSP library
ant build_examples #Builds all the TarsosDSP examples
ant javadoc #Creates the documentation in TarsosDSP/doc
When everything runs correctly you should be able to run all example applications and have the latest version of the TarsosDSP library for inclusion in your projects. Also, the Javadoc documentation for the API should be available in TarsosDSP/doc. Drop me a line if you use TarsosDSP in your project. Always nice to hear how this software is used.
Source Code Organization and Examples of TarsosDSP
The source tree is divided into four directories:
src contains the source files of the core DSP libraries.
test contains unit tests for some of the DSP functionality.
build contains ANT build files, either to build Java documentation or runnable JAR-files for the example applications.
examples contains a couple of example applications with a Java Swing user interface:
SoundDetector shows how loudness calculations can be done. When the input sound is over a defined limit an event is fired.
PitchDetector this demo application shows real-time pitch detection. When pitch is detected the hertz value is printed together with a probability.
PercussionDetector shows percussion (onset) detection. Clapping your hands causes an event. This demo application also shows the influence of the two parameters on the algorithm.
UtterAsterisk a game with the goal of singing as close to a melody as possible. Technically it shows real-time pitch detection with YIN or MPM.
Spectrogram in Java shows a spectrogram and detected pitch, either live or from an audio file. It is interesting to see which frequencies are picked as fundamentals.
Goertzel DTMF decoding an implementation of the Goertzel Algorithm. A fancy user interface shows what goes on under the hood.
Audio Time Stretching – Implementation in Pure Java Using WSOLA an implementation of a time stretching algorithm. WSOLA makes it possible to change the play back speed of audio without changing the pitch. The play back speed can be changed at any moment, even when there is audio playing.
The Dan Ellis implementation is nicely documented here: Robust Landmark-Based Audio Fingerprinting. To download, get info about, and decode MP3s, some external binaries are needed:
#install octave if needed
sudo apt-get install octave3.2
#Install the required dependencies for the script
sudo apt-get install mp3info curl
#mpg123 is not present as a package, install from source:
wget http://www.mpg123.de/download/mpg123-1.13.5.tar.bz2
tar xvvf mpg123-1.13.5.tar.bz2
cd mpg123-1.13.5/
./configure
make
sudo make install
In mp3read.m the following code was changed (lines 111 and 112):
mpg123 = 'mpg123'; % was fullfile(path,['mpg123.',ext]);
mp3info = 'mp3info'; % was fullfile(path,['mp3info.',ext]);
Then, the demo program runs flawlessly when executing octave -q demo_fingerprint.m.
Running the demo with the original code in GNU Octave, version 3.2.3, takes 152 seconds on a PC with a Q9650 processor at 3GHz. A small tweak can make it run almost 8 times faster. When working with larger data sets (10k audio files) this makes a big difference. I do not know why, but storing a hash in the large hash table was really slow (0.5s per hash, with 900 hashes per song…). Caching the hashes and adding them all at once makes it faster (at least in Octave, YMMV). The optimized version of record_hashes.m can be found attached. With this alteration the same demo ran in 20s. When caching the data locally the difference is 11.5s versus 141s, or 12 times faster. The code with all the changes can be found here: Robust Landmark-Based Audio Fingerprinting – optimized for Octave 3.2. Please note again that the implementation is by Dan Ellis (2009) (available on Robust Landmark-Based Audio Fingerprinting); I only did some small tweaks.
The 29th of February 2012 there was a symposium on Music Information Retrieval in Utrecht. It was organized on the occasion of Bas de Haas’ PhD defense. The title of the study day was Harmony and variation in music information retrieval.
During the talk by Xavier Serra, rasikas.org was mentioned, a forum with discussions about Carnatic music. Since I could find a couple of discussions about pitch use on that forum, I plugged Tarsos there to see if I could gather some feedback.
The DSP library for Tarsos, aptly named TarsosDSP, now includes an implementation of an audio echo effect. An echo effect is very simple to implement digitally and can serve as a good example of a DSP operation.
The implementation of the effect can be seen below. As can be seen, to achieve an echo one simply needs to mix the current sample i with a delayed sample present in echoBuffer with a certain decay factor. The length of the buffer and the decay are the defining parameters for the sound of the echo. To fill the echo buffer the current sample is stored (line 4). Looping through the echo buffer is done by incrementing the position pointer and resetting it at the correct time (lines 6-9).
//output is the input added with the decayed echo
audioFloatBuffer[i] = audioFloatBuffer[i] + echoBuffer[position] * decay;
//store the sample in the buffer;
echoBuffer[position] = audioFloatBuffer[i];
//increment the echo buffer position
position++;
//loop in the echo buffer
if(position == echoBuffer.length)
position = 0;
To test the application, download and execute the Delay.jar file and start singing in a microphone.
The source code of the Java implementation can be found on the TarsosDSP github page.
This post presents a better version of the spectrogram implementation. It is now included as an example in TarsosDSP, a small Java audio processing library. The application shows a live spectrogram, calculated using an FFT, together with the detected fundamental frequency (in red).
To test the application, download and execute the Spectrogram.jar file and start singing in a microphone.
There is also a command line interface, the following command shows the spectrum for in.wav:
java -jar Spectrogram.jar in.wav
The source code of the Java implementation can be found on the TarsosDSP github page.
To test the application, download and execute the WSOLA jar file and load an audio file. For the moment only 44.1kHz mono wav is allowed. To get started you can try this piece of audio.
There is also a command line interface, the following command doubles the speed of in.wav:
java -jar TimeStretch.jar in.wav out.wav 2.0
_______ _____ _____ _____
|__ __| | __ \ / ____| __ \
| | __ _ _ __ ___ ___ ___| | | | (___ | |__) |
| |/ _` | '__/ __|/ _ \/ __| | | |\___ \| ___/
| | (_| | | \__ \ (_) \__ \ |__| |____) | |
|_|\__,_|_| |___/\___/|___/_____/|_____/|_|
----------------------------------------------------
Name:
TarsosDSP Time stretch utility.
----------------------------------------------------
Synopsis:
java -jar TimeStretch.jar source.wav target.wav factor
----------------------------------------------------
Description:
Change the play back speed of audio without changing the pitch.
source.wav A readable, mono wav file.
target.wav Target location for the time stretched file.
factor Time stretching factor: 2.0 means double the length, 0.5 half. 1.0 is no change.
The source code of the Java implementation of WSOLA can be found on the TarsosDSP github page.
Tarsos contains a couple of useful command line applications. They can be used to execute common tasks on lots of files. Download Tarsos and call the applications using the following format:
The first part java -jar tarsos.jar tells the Java Runtime to start the correct application. The first argument for Tarsos defines the command line application to execute. Depending on the command, required arguments and options can follow.
To get a list of available commands, type java -jar tarsos.jar -h. If you want more information about a command, type java -jar tarsos.jar command -h.
Detect Pitch
Detects pitch for one or more input audio files using a pitch detector. If a directory is given it traverses the directory recursively. It writes CSV data to standard out with five columns: the first is the start of the analyzed window (in seconds), the second the estimated pitch, the third the salience of the pitch. The name of the algorithm follows, and the last column shows the original filename.
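For example, a session could look like the hypothetical transcript below; the command name and output values are illustrative, check java -jar tarsos.jar -h for the real command list:

#list the available command line applications
java -jar tarsos.jar -h
#detect pitch for a file (command name illustrative)
java -jar tarsos.jar detect_pitch in.wav
#output columns: window start (s), pitch, salience, algorithm, file name
0.52,440.0,0.95,TARSOS_YIN,in.wav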
For the Folk Music Analysis (FMA) 2012 conference we (Olmo Cornelis and myself) wrote a paper presenting a new acoustic fingerprint scheme based on pitch class histograms.
The aim of acoustic fingerprinting is to generate a small representation of an audio signal that can be used to identify or recognize similar audio samples in a large audio set. A robust fingerprint generates similar fingerprints for perceptually similar audio signals. A piece of music with a bit of noise added should generate an almost identical fingerprint as the original. The use cases for audio fingerprinting or acoustic fingerprinting are myriad: detection of duplicates, identifying songs, recognizing copyrighted material,…
Using a pitch class histogram as a fingerprint seems like a good idea: it is unique for a song and it is reasonably robust to changes of the underlying audio (length, tempo, pitch, noise). The idea has probably been found a couple of times independently, but there is also a reference to it in the literature, by Tzanetakis, 2003: Pitch Histograms in Audio and Symbolic Music Information Retrieval:
Although mainly designed for genre classification it is possible that features derived from Pitch Histograms might also be applicable to the problem of content-based audio identification or audio fingerprinting (for an example of such a system see (Allamanche et al., 2001)). We are planning to explore this possibility in the future.
Unfortunately they never, as far as I know, did explore this possibility, and I also do not know if anybody else did. I found it worthwhile to implement a fingerprinting scheme on top of the Tarsos software foundation. Most elements are already available in the Tarsos API: a way to detect pitch, construct a pitch class histogram, correlate pitch class histograms with a pitch shift,… I created a GUI application which is presented here. It is, probably, the first open source acoustic / audio fingerprinting system based on pitch class histograms.
It works using drag and drop and the idea is to find a needle (an audio file) in a hay stack (a large amount of audio files). For every audio file in the hay stack and for the needle, pitch is detected using an MPM implementation optimized for speed. A pitch class histogram is created for each file, the histogram for the needle is compared with each histogram in the hay stack and, hopefully, the needle is found in the hay stack.
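The comparison step can be illustrated with a short sketch: a hypothetical bestMatch routine, not the actual Tarsos API, that correlates one pitch class histogram with every circular shift of another and keeps the best score. Trying every shift is what makes the method robust against pitch shifted versions of the same audio:

public class HistogramMatch {
    /**
     * Correlates two pitch class histograms (e.g. 1200 bins, one per cent)
     * for every circular shift and returns the best Pearson correlation,
     * a value between -1 and 1.
     */
    public static double bestMatch(double[] needle, double[] hay) {
        double best = -1;
        for (int shift = 0; shift < needle.length; shift++)
            best = Math.max(best, correlation(needle, hay, shift));
        return best;
    }

    private static double correlation(double[] a, double[] b, int shift) {
        double meanA = 0, meanB = 0;
        for (int i = 0; i < a.length; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= a.length;
        meanB /= b.length;
        double num = 0, denA = 0, denB = 0;
        for (int i = 0; i < a.length; i++) {
            double x = a[i] - meanA;
            double y = b[(i + shift) % b.length] - meanB;
            num += x * y;
            denA += x * x;
            denB += y * y;
        }
        return num / Math.sqrt(denA * denB);
    }
}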
This document describes how pitch can be represented using various units. More specifically, it documents how Tarsos, a software program to analyse pitch in music, represents pitch. This document contains definitions of and remarks on different pitch and pitch interval representations. For good measure we need a definition of pitch; here the definition from [McLeod 2009] is used: The pitch frequency is the frequency of a pure sine wave which has the same perceived sound as the sound of interest. For remarks and examples of cases where the pitch frequency does not coincide with the fundamental frequency of the signal, also see [McLeod 2009]. In this text pitch, pitch interval and pitch ratio are briefly discussed.
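As an example of such a representation, the conversion from a pitch frequency in Hz to a value in cents relative to a reference frequency is a one-liner. A small self-contained sketch (not necessarily the Tarsos API):

public class PitchConversion {
    /**
     * Converts a frequency in Hz to a value in cents relative to a
     * reference frequency, also in Hz. 1200 cents equal one octave.
     */
    public static double hertzToCents(double hertz, double reference) {
        return 1200.0 * Math.log(hertz / reference) / Math.log(2.0);
    }
}

For example, hertzToCents(880, 440) yields 1200 cents: exactly one octave.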
The DSP library of Tarsos, aptly named TarsosDSP, contains an implementation of a game that bears some resemblance to SingStar. It is called UtterAsterisk. It is meant to be a technical demonstration showing real-time pitch detection in pure Java using a YIN implementation.
The goal of the thesis was to develop an automatic transcription system for monophonic music. You can download the latest version of jAM – Java Automatic Music Transcription.
WORKSHOP – Listening to (and picking apart) music on the computer
Is it possible to play the piano on a table? Can a computer listen to music and enjoy it? What actually is music, and how does sound work?
During this workshop these questions are answered with the help of a couple of computer programs!
Concretely, several components of sound (and, by extension, music) are demonstrated with small computer programs made at the conservatory:
Loudness: a decibel meter with a certain threshold. Try to be as loud as you can and see how hard it is, once a certain level is reached, to go up in decibels.
Pitch: a small game to demonstrate pitch. Try to sing or whistle as accurately as possible and compare your score.
Percussion: this program reacts to hand claps. How can you tell the difference between, for example, a whistled tone and a hand clap?
A small part of Tarsos has been turned into an audio fingerprinting application. The idea of audio fingerprinting is to create a condensed representation of an audio file. A perceptually similar audio file should generate similar fingerprints. To test how robust a fingerprinting technique is, a data set with audio files that are alike in some way is practical.
SoX – Sound eXchange is a command line utility for sound processing. It can apply audio effects to a sound. Using these effects and a set of unmodified songs an audio fingerprinting data set can be created. To generate such a data set SoX can be used to:
Trim the first x seconds of a file
Speed-up or slow-down the audio
Change the pitch of a file without modifying the tempo
#Trim the first 10 seconds
sox input.wav output.wav trim 10
#Speed up the audio by 10%
sox input.wav output.wav speed 1.10
#Change the pitch upwards 100 cents (one semitone) without changing the tempo
sox input.wav output.wav pitch 100
#Generate white noise with the length of input.wav
sox input.wav noise.wav synth whitenoise
#Mix the white noise with the input to generate noisy output
#-v defines how loud the white noise is
sox -m input.wav -v 0.1 noise.wav output.wav
#reverse the audio
sox input.wav output.wav reverse
A ruby script to generate a lot of these files can be found attached.
The following video shows Bobby McFerrin demonstrating the power of the pentatonic scale. It is a fascinating demonstration of how quickly a (western) audience of the World Science Festival 2009 adapts to an unusual tone scale:
With Tarsos the scale used in the example can be found. This is the result of a quick analysis: it becomes clear that this is, in fact, a pentatonic scale with an unequal octave division. A perfect fifth is present between 255 and 753 cents: from 753 cents up to 255 cents an octave higher (1455 cents) spans exactly 702 cents, a just perfect fifth:
Friday the second of December I presented a talk about software for music analysis. The aim was to make clear which type of research topics can benefit from measurements by software for music analysis. Different types of digital music representations and examples of software packages were explained.
The following presentation was used during the talk (ppt, odp):
Sonic Visualizer: As its name suggests, Sonic Visualiser contains a lot of different visualisations for audio. It can be used for analysis (pitch, beat, chroma, …) with VAMP-plugins. To quote: “The aim of Sonic Visualiser is to be the first program you reach for when you want to study a musical recording rather than simply listen to it”. It is the swiss army knife of audio analysis.
BeatRoot is designed specifically for one goal: beat tracking. It can be used for e.g. comparing tempi of different performances of the same piece or to track tempo deviation within one piece.
Tartini is capable of doing real-time pitch analysis of sound. You can e.g. play into a microphone with a violin, see the harmonics you produce, and adapt your playing style based on visual feedback. It also contains a pitch deviation measuring apparatus to analyse vibrato.
Tarsos is software for tone scale analysis. It is useful to extract tone scales from audio. Different tuning systems can be seen, extracted and compared. It also contains the ability to play along with the original song with a tuned MIDI keyboard.
To show the different digital representations of music one example (Liebestraum 3 by Liszt) was used in different formats:
The aim of acoustic fingerprinting is to generate a small representation of an audio signal that can be used to identify or recognize similar audio samples in a large audio set. A robust fingerprint generates similar fingerprints for perceptually similar audio signals. A piece of music with a bit of noise added should generate an almost identical fingerprint as the original. The use cases for audio fingerprinting or acoustic fingerprinting are myriad: detection of duplicates, identifying songs, recognizing copyrighted material,…
Using a pitch class histogram as a fingerprint seems like a good idea: it is unique for a song and it is reasonably robust to changes of the underlying audio (length, tempo, pitch, noise). The idea has probably been found a couple of times independently, but there is also a reference to it in the literature, by Tzanetakis, 2003: Pitch Histograms in Audio and Symbolic Music Information Retrieval:
Although mainly designed for genre classification it is possible that features derived from Pitch Histograms might also be applicable to the problem of content-based audio identification or audio fingerprinting (for an example of such a system see (Allamanche et al., 2001)). We are planning to explore this possibility in the future.
Unfortunately they never, as far as I know, did explore this possibility, and I also do not know if anybody else did. I found it worthwhile to implement a fingerprinting scheme on top of the Tarsos software foundation. Most elements are already available in the Tarsos API: a way to detect pitch, construct a pitch class histogram, correlate pitch class histograms with a pitch shift,… I created a GUI application which is presented here. It is, probably, the first open source acoustic / audio fingerprinting system based on pitch class histograms.
It works using drag and drop and the idea is to find a needle (an audio file) in a hay stack (a large amount of audio files). For every audio file in the hay stack and for the needle, pitch is detected using a Yin implementation optimized for speed. A pitch class histogram is created for each file, the histogram for the needle is compared with each histogram in the hay stack and, hopefully, the needle is found in the hay stack.
Unfortunately I do not have time for rigorous testing (by building a large acoustic fingerprinting data set, or another decent test bench) but the idea seems to work. With the following modifications, done with Audacity effects, the needle was still found in a hay stack of 836 files:
A 10% speedup
15 and 30 seconds removed from the needle (a song of 4 minutes 12 seconds)
White noise added
Reversed the audio (This is, I believe, a rather unique property of this fingerprinting technique)
GSM reencoded
The following modifications failed to identify the correct song:
A one semitone pitch shift
A two semitone pitch shift
60 seconds removed from the needle
The original was also found. No failure analysis was done. The hay stack consists of about 100 hours of western pop; the needle is also a western pop song. If somebody wants to pick up this work or has an acoustic fingerprinting data set, drop me a line.
The 21st of October a demo of PeachNote Piano was given at the ISMIR 2011 conference. The demo raised some interest.
The extended abstract about PeachNote Piano can be found on the ISMIR 2011 schedule.
A previous post about PeachNote Piano has more technical details together with a video showing the core functionality (quasi-instantaneous USB-BlueTooth-MIDI communication).
The 17th of October 2011 Tarsos was presented at the Study Day: Tuning and Temperament, which was held at the Institute of Music Research in London. The study day was organised by Dan Tidhar. A short description of the aim of the study day:
This is an interdisciplinary study day, bringing together musicologists, harpsichord specialists, and digital music specialists, with the aim of exploring the different angles these fields provide on the subject, and how these can be fruitfully interconnected.
We offer an optional introduction to temperament for non specialists, to equip all potential listeners with the basic concepts and terminology used throughout the day.
The live demo we gave went well and we got a lot of positive, interesting feedback. The presentation about Tarsos is available here.
It was the first time in the history of ISMIR that there was a session with oral presentations about Non-Western Music. We were pleased to be part of this.
On Tuesday the 4th of October 2011 a lecture was given about useful software for music analysis. The goal was to make clear which types of research questions for bachelor's and master's theses can benefit from objective measurements with software for sound analysis. The approach was also discussed: different kinds of digital representations of music, with examples of software applications, were covered.
The following slides were used during the lecture (ppt, odp):
The software covered for treating sound as a signal was discussed earlier:
Sonic Visualizer: As its name suggests, Sonic Visualiser contains a lot of different visualisations for audio. It can be used for analysis (pitch, beat, chroma, …) with VAMP-plugins. To quote: “The aim of Sonic Visualiser is to be the first program you reach for when you want to study a musical recording rather than simply listen to it”. It is the swiss army knife of audio analysis.
BeatRoot is designed specifically for one goal: beat tracking. It can be used for e.g. comparing tempi of different performances of the same piece or to track tempo deviation within one piece.
Tartini is capable of doing real-time pitch analysis of sound. You can e.g. play into a microphone with a violin, see the harmonics you produce, and adapt your playing style based on visual feedback. It also contains a pitch deviation measuring apparatus to analyse vibrato.
Tarsos is software for tone scale analysis. It is useful to extract tone scales from audio. Different tuning systems can be seen, extracted and compared. It also contains the ability to play along with the original song with a tuned MIDI keyboard.
music21 from their website: “music21 is a set of tools for helping scholars and other active listeners answer questions about music quickly and simply. If you’ve ever asked yourself a question like, “I wonder how often Bach does that” or “I wish I knew which band was the first to use these chords in this order,” or “I’ll bet we’d know more about Renaissance counterpoint (or Indian ragas or post-tonal pitch structures or the form of minuets) if I could write a program to automatically write more of them,” then music21 can help you with your work.”
To indicate which digital representations contain which information, a piece by Franz Liszt was used in different formats:
The DSP library of Tarsos, aptly named TarsosDSP, now contains an implementation of the Goertzel Algorithm. It is implemented using pure Java.
The Goertzel algorithm can be used to detect if one or more predefined frequencies are present in a signal, and it does this very efficiently. One of the classic applications of the Goertzel algorithm is decoding the tones generated by touch-tone telephones, which use DTMF signaling.
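For reference, the algorithm itself is only a few lines. The sketch below is a generic textbook formulation, not necessarily the TarsosDSP code: it returns the power at one target frequency for a block of samples, and a DTMF decoder simply runs it once per DTMF frequency and looks for one strong low and one strong high frequency:

public class Goertzel {
    /**
     * Returns the power of targetFrequency (in Hz) in the given block
     * of samples, sampled at sampleRate.
     */
    public static double power(float[] samples, double targetFrequency, double sampleRate) {
        double omega = 2.0 * Math.PI * targetFrequency / sampleRate;
        double coefficient = 2.0 * Math.cos(omega);
        double s1 = 0, s2 = 0;
        for (float sample : samples) {
            double s0 = sample + coefficient * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coefficient * s1 * s2;
    }
}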
Playing music instruments can bring a lot of joy and satisfaction, but not all aspects of music practice are always enjoyable. In this contribution we address two such sometimes unwelcome aspects: the solitude of practicing and the “dumbness” of instruments.
The process of practicing and mastering of music instruments often takes place behind closed doors. A student of piano spends most of her time alone with the piano. Sounds of her playing get lost, and she can’t always get feedback from friends, teachers, or, most importantly, random Internet users. Analysing her practicing sessions is also not easy. The technical possibility to record herself and put the recordings online is there, but the needed effort is relatively high, and so one does it only occasionally, if at all.
Instruments themselves usually do not exhibit any signs of intelligence. They are practically mechanical devices, even when implemented digitally. Usually they react only to direct actions of a player, and the player is solely responsible for the music coming out of the instrument and its quality. There is no middle ground between passive listening to music recordings and active music making for someone who is alone with an instrument.
We have built a prototype of a system that strives to offer a practical solution to the above problems for digital pianos. From the ground up, we have built a system capable of transmitting MIDI data from a MIDI instrument to a web service and back, exposing it in real-time to the world and optionally enriching it.
A previous post about PeachNote Piano has more technical details together with a video showing the core functionality (quasi-instantaneous USB-BlueTooth-MIDI communication). Some photos can be found below.
While working on a LaTeX document with several collaborators some problems arise:
Who has the latest version of the TeX-files?
Which LaTeX distributions are in use (MiKTeX, TeX Live, …)?
Are all LaTeX packages correctly installed on each computer?
Why is the bibliography, generated with BiBTeX, not included or incomplete?
What does the final PDF look like when it is built by one of the collaborators, with a different LaTeX distribution?
Especially installing and maintaining LaTeX distributions on different platforms (Mac OS X, Linux, Windows) in combination with a lot of LaTeX packages can be challenging. This blog post presents a way to deal with these problems.
Solution
The solution proposed here uses a build server. The server is responsible for compiling the LaTeX source files and creating a PDF-file when the source files are modified. The source files on the server should be kept in sync with the latest versions of the collaborators, and the new PDF-file should be distributed as well. The syncing and distribution of files is done using a Dropbox install. Each author installs a Dropbox share (available on all platforms) which is also installed on the server. When an author modifies a file, this change is propagated to the server, which, in turn, builds a PDF and sends the resulting file back. This has the following advantages:
Everyone always has the latest version of files;
Only one LaTeX install needs to be maintained (on the server);
The PDF is the same for each collaborator;
You can modify files on every platform with Dropbox support (Linux, Mac OS X, Windows) and even smartphones;
Compiling a large LaTeX file can be computationally intensive, a good task for a potentially beefy server.
Implementation
The implementation of this is done with a couple of bash scripts running on Ubuntu Linux. LaTeX compilation is handled by the TeX Live distribution. The first script, compile.bash, handles compilation in multiple stages: the cross referencing and BiBTeX bibliography need a couple of runs to get everything right.
#!/bin/bash
#first iteration: generate aux file
pdflatex -interaction=nonstopmode --src-specials article.tex
#run bibtex on the aux file
bibtex article.aux
#second iteration: include bibliography
pdflatex -interaction=nonstopmode --src-specials article.tex
#third iteration: fix references
pdflatex -interaction=nonstopmode --src-specials article.tex
#remove unused files
rm article.aux article.bbl article.blg article.out
The second script, watcher.bash, is more interesting. It watches the Dropbox directory for changes (only in .tex-files) using the efficient inotify library. If a modification is detected, the compile script (above) is executed.
#!/bin/bash
directory=/home/user/Dropbox/article/
#recursively watch the directory
while inotifywait -r $directory; do
#find all files changed in the last minute that match tex
#if there are matches then do something...
if find $directory -mmin -1 | grep tex; then
#tex files changed => recompile
echo "Tex file changed... compiling"
/bin/bash $directory/compile.bash
#sleep a minute to prevent recompilation loop
sleep 60
fi
done
To summarize: a user-friendly way of collaborating on LaTeX documents was presented. Some server side configuration needs to be done, but the clients only need Dropbox and a simple text editor and can start working together.
The Pidato experiment demonstrates a rather straightforward method to handle vibrato on a digital piano. It solves the age-old problem of what to do with the enigmatic “vibrato” instructions in some piano solo scores of Franz Liszt. The figure on the right is an excerpt of Sonetto 104 del Petrarca.
Since there is no way to perform vibrato on an acoustic piano, there are all kinds of different interpretations. Interpretations of the ‘vibrato’ instruction include: vibrating the pedal, vibrating the key, simply ignoring it, a vibrato-like wiggling with a psychological sound effect, … A pianist specialized in 19th century music explains his embodied use of vibrato in a YouTube video: Brian Ganz on piano vibrato. Those solutions all seem a bit halfhearted, so I created an alternative approach which resulted in the Pidato experiment.
Pidato is a portmanteau of piano and vibrato; the d, a and o hint at the use of an Arduino. Pidato is also Indonesian for speech or expression. To get a feel of what it actually does I created the video below. Please note that this is a technical demonstration, not an artistic performance… in any way.
Vid: The Pidato experiment – Vibrato on a Digital Piano using an Arduino.
The way it works is by translating movement (accelerometer data) into MIDI messages. The hardware consists of an Arduino, MIDI ports and a three-axis accelerometer. The MIDI ports are provided by this MIDI IN & OUT Arduino shield. The accelerometer is an MMA7260Q from SparkFun. Attaching the MMA7260Q to the Arduino is done by following the instructions here. One change was made: by attaching the 3.3V output to AREF and executing analogReference(EXTERNAL); fluctuations in the power supply cease to have an influence on the accelerometer data readings. It is represented by the purple wire in the diagram below.
The software should know when a vibrato-like movement is made and how to translate such movement into MIDI messages. The software therefore contains a periodicity estimator and a frequency detector, to detect how periodic a movement is and how fast it is repeated. This was done with the YIN algorithm (more commonly used in audio signal analysis). A periodicity threshold was determined experimentally so the system does not yield false positives when playing the piano in the usual way. Another interesting bit of code is the interrupt setup that samples the accelerometer at a fixed sample rate and sends MIDI messages, also at a fixed rate.
MIDI messaging is done over a serial connection. From the Arduino, sending a MIDI message is as simple as calling Serial.print with the correct data. For the task at hand (sending vibrato) Pitch Bend messages were used. The standard Arduino UNO firmware is replaced with Arduino MIDI firmware. This makes the Arduino appear as a standard MIDI device when connected to a computer, which makes interfacing with it practical.
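The Pitch Bend message format itself is easy to illustrate on the JVM side. A small Java sketch (illustration only; on the Arduino the same three bytes are written to the serial port):

import javax.sound.midi.ShortMessage;

public class PitchBendExample {
  public static void main(String[] args) throws Exception {
    //14-bit pitch bend value: 8192 is the center, i.e. no bend
    int bend = 8192 + 1024; //bend slightly upwards
    int lsb = bend & 0x7F; //lower seven bits
    int msb = (bend >> 7) & 0x7F; //upper seven bits
    ShortMessage message = new ShortMessage();
    message.setMessage(ShortMessage.PITCH_BEND, 0, lsb, msb);
    System.out.printf("status=0x%X data1=%d data2=%d%n",
        message.getStatus(), message.getData1(), message.getData2());
  }
}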
The YIN algorithm is encapsulated in a reusable Arduino library and can be used to detect periodicity and frequency of any signal. This guy used his implementation to create a chromatic tuner. The source code for both the YIN Arduino library and the Pidato experiment can be found on GitHub or here.
The Pidato experiment was done with the help of the friendly hackers at Hackerspace Ghent.
Tarsos can be used to render MIDI files to audio (WAV) files using arbitrary tone scales. This functionality can be used to (automatically) verify tone scale extraction from audio files. Since I could not find a dataset with audio and corresponding tone scales, creating one using MIDI seemed a good idea.
MIDI files can be found in spades (for example on piano-midi.de or kunstderfuge.com), tone scales on the other hand are harder to find. Luckily there is one massive source, the Scala Tone Scale Archive: a large collection of over 3700 tone scales.
Using Scala tone scale files and MIDI files, a Tone Scale – Audio dataset can be generated. The quality of the audio depends on the (software) synthesizer and the SoundFont used. Tarsos currently uses the Gervill synthesizer. Gervill is a pure Java software synthesizer with support for 24bit SoundFonts and the MIDI tuning standard.
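Rendering can also be scripted through the Tarsos API. A minimal Java sketch (class names as used in the Tarsos API; exact signatures may differ between versions):

import java.io.File;
import be.hogent.tarsos.midi.MidiToWavRenderer;

public class RenderExample {
  public static void main(String[] args) throws Exception {
    //a 10 tone equal tempered scale, defined in cents
    double[] tuning = {0, 120, 240, 360, 480, 600, 720, 840, 960, 1080};
    MidiToWavRenderer renderer = new MidiToWavRenderer();
    renderer.setTuning(tuning);
    renderer.createWavFile(new File("in.mid"), new File("out.wav"));
  }
}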
How To Render MIDI Using Arbitrary Tone Scales with Tarsos
A recent version of the JRE needs to be installed on your system if you want to use Tarsos. Tarsos itself can be downloaded in the form of the MIDI and Scala to Wav – JAR Package.
To test the program you can use a MIDI file and a Scala file and drag and drop those on the graphical interface.
The result should sound like this:
To summarize: by rendering audio with MIDI and Scala tone scale files a dataset with tone scale – audio information can be generated and tone scale extraction algorithms can be tested on the fly.
This is about PeachNote Piano, a project only tangentially related to Tarsos. PeachNote Piano aims to capture as many piano practice sessions as possible and offer useful services using this data. The system does this by capturing and redirecting MIDI events on a Bluetooth enabled smartphone. It is done together with Vladimir Viro and builds on the existing PeachNote infrastructure.
The schema – right – shows the components of the PeachNote Piano system. At the bottom you have a MIDI keyboard connected to the MIDI-Bluetooth-bridge. A smartphone (middle left) receives these MIDI events via Bluetooth and controls the communication to the server (top left). An alternative path goes through a standard computer (top right).
The Arduino based Bluetooth to MIDI bridge is an improvement on the work by Peter Brinkmann. The video below shows communication between USB-MIDI, Bluetooth MIDI and MIDI IN/OUT ports.
As an example application of the PeachNote Piano system we implemented a “Continue a Melody” service which works as follows: a user plays something on a keyboard, maybe just a few notes, and pauses for a few seconds. In the meantime, the server searches through a large database of MIDI piano recordings, finds the longest fuzzy match for the user’s most recent input and, after a short silence on the user’s part, starts streaming the continuation of the best matched performance from the database to the user. This mechanism is, in fact, a way of browsing a music collection. Users may play a known leitmotiv or just improvise something, and the system continues playing a high quality recording, “replying” to the musical proposition of the user.
More technical details
The melody matching is done on the server, which is implemented in JavaScript using the Node.js framework. The whole dataset (about 350 hours of piano recordings) resides in memory in two representations: as a sequence of pitches, and as a sequence of “densities” at the corresponding places of the pitch sequence dataset. This second array stores the rough tempo information (number of notes per second) that is absent from the pitch sequence data.
By combining the two search criteria we can achieve a reasonable approximation of tempo-aware search without its computational complexity.
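A toy Java sketch of that idea, my own reconstruction and not the actual PeachNote code, could look like this:

public class MelodyMatchSketch {
  static int[] pitches = {60, 62, 64, 65, 67, 65, 64, 62, 60}; //dataset pitch sequence
  static double[] densities = {2, 2, 2, 3, 3, 3, 2, 2, 2}; //notes per second per position

  static double score(int[] query, double queryDensity, int pos) {
    int matches = 0;
    for (int i = 0; i < query.length; i++)
      if (pitches[pos + i] == query[i]) matches++;
    double pitchScore = matches / (double) query.length;
    //cheap tempo criterion: compare the local density with the query density
    double tempoScore = 1.0 / (1.0 + Math.abs(densities[pos] - queryDensity));
    return pitchScore * tempoScore;
  }

  public static void main(String[] args) {
    int[] query = {64, 65, 67};
    int bestPos = 0;
    double best = -1;
    for (int pos = 0; pos + query.length <= pitches.length; pos++) {
      double s = score(query, 3.0, pos);
      if (s > best) { best = s; bestPos = pos; }
    }
    //the continuation streamed back would start at bestPos + query.length
    System.out.println("best match at " + bestPos + " (score " + best + ")");
  }
}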
The implementation of the hardware is based on the open-source electronic prototyping platform Arduino. Optocoupled MIDI ports (IN/OUT) and the BlueSMiRF Bluetooth module were attached to the main board, as can be seen in the middle left block of the schema. The Bluetooth module is configured to use the Serial Port Profile (SPP), which emulates RS-232. The software on the Arduino manages bi-directional, low latency message passing between three serial ports: USB (through an FTDI chip), Bluetooth and the hardware MIDI IN and OUT ports.
The standard Arduino firmware has been replaced with firmware that implements the “Universal Serial Bus Device Class Definition for MIDI Devices”: when attached to a computer via USB, the Arduino shows up as a standard MIDI device, which makes it compatible with all available MIDI software. The software client currently works on the Android smartphone platform. It is represented using the middle right block in the schema. The client can send and receive MIDI events over its Bluetooth port. Pairing, connecting and communicating with the device is done using the Amarino software library. The client communicates with the Peachnote Piano server using TCP sockets implemented on the Dalvik Java runtime.
This article describes how to do makam recognition with a script that uses the Tarsos API.
The task we want to perform is to find the theoretical tone scale most similar to the one used in recorded music. To complete this task you need a small set of theoretical scales and a large set of music, each piece brought in one of those scales. To make it more concrete, an example from Turkish classical music is used.
In an article by Bozkurt, pitch histograms are used for – amongst other tasks – makam recognition. A makam defines rules for a composition or performance of classical Turkish music. It specifies melodic shapes and pitch intervals: the scale. The task is to identify which of nine makams is used in a specific song. A simplified, generalized implementation of this task is shown here. In our implementation there is no tonic detection step and only theoretical descriptions of the tone scales are used as templates; no template is constructed from the audio itself, as is done by Bozkurt. Ioannidis Leonidas wrote an interesting master thesis about makam recognition. Since no knowledge about the music itself is used, the approach is generally applicable.
The following is an implementation in Scala, a general purpose programming language that is interoperable with Java. The first step is to write the Scala header. This is just some boilerplate code to be able to run the script from the command line; it assumes a UNIX-like environment and tarsos.jar in the same directory:
The second step constructs the templates: the capability of Tarsos to create theoretical tone scale templates using Gaussian kernels is used (line 8). See the attached images for some examples.
val makams = List( "hicaz","huseyni","huzzam","kurdili_hicazar",
"nihavend","rast","saba","segah","ussak")
var theoreticKDEs = Map[java.lang.String,KernelDensityEstimate]()
makams.foreach{ makam =>
val scalaFile = makam + ".scl"
val scalaObject = new ScalaFile(scalaFile);
val kde = HistogramFactory.createPichClassKDE(scalaObject,35)
kde.normalize
theoreticKDEs = theoreticKDEs + (makam -> kde)
}
The third and last step is matching. First a list of audio files is created by recursively iterating a directory and matching each file to a regular expression. Next, starting from line 4, each audio file is processed. The internal implementation of the YIN pitch detection algorithm is used on the audio file and a pitch class histogram is created (lines 6 and 7). On line 10 the histogram is normalized, to make the correlation calculation meaningful. Lines 11 to 15 compare the histogram created from the audio file with the templates calculated beforehand. The results are stored, ordered and eventually printed on line 19.
val directory = "/home/joren/turkish_makams/"
val audio_pattern = ".*.(mp3|wav|ogg|flac)"
val audioFiles = FileUtils.glob(directory,audio_pattern,true).toList
audioFiles.foreach{ file =>
val audioFile = new AudioFile(file)
val detectorYin = PitchDetectionMode.TARSOS_YIN.getPitchDetector(audioFile)
val annotations = detectorYin.executePitchDetection()
val actualKDE = HistogramFactory.createPichClassKDE(annotations,15);
actualKDE.normalize
var resultList = List[Tuple2[java.lang.String,Double]]()
for ((name, theoreticKDE) <- theoreticKDEs){
val shift = actualKDE.shiftForOptimalCorrelation(theoreticKDE)
val currentCorrelation = actualKDE.correlation(theoreticKDE,shift)
resultList = (name -> currentCorrelation) :: resultList
}
//order by correlation
resultList = resultList.sortBy{_._2}.reverse
Console.println(file + " is brought in tone scale " + resultList(0)._1)
}
An oral presentation about Tarsos will take place on Tuesday the 25th of October in the afternoon, as can be seen on the ISMIR preliminary program schedule.
If you want to cite our work, please use the following data:
@inproceedings{six2011tarsos,
  author = {Joren Six and Olmo Cornelis},
  title = {{Tarsos - a Platform to Explore Pitch Scales in Non-Western and Western Music}},
  booktitle = {Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011},
  year = {2011},
  publisher = {International Society for Music Information Retrieval}
}
Tarsos, a software package to analyse pitch organization in music, contains a new output modality. It is now possible to export a pitch class histogram and a pitch class interval matrix to LaTeX from within Tarsos. This makes documenting tone scales more efficient.
Tarsos, a software package to analyse pitch organization in music, contains a new output modality. It is now possible to export resynthesized pitch annotations, as detected by a pitch detection algorithm, and compare them with the original sound. This is useful for hearing which errors a pitch detection algorithm makes.
Below you can listen to an example of synthesized pitch detection results compared with the original flute piece. The file starts with only the original flute sound (on the right channel) and gradually changes so only the synthesized annotations (on the left channel) can be heard.
On the 25th of May 2011 Tarsos was present at the IPEM open house.
IPEM (Institute for Psychoacoustics and Electronic Music) is the research center of the Department of Musicology, which is part of the Department of Art, Music and Theater Studies of Ghent University. IPEM provides a scientific basis for the cultural and creative sector, especially for music and performance arts, and does pioneering research work on the relationship between music, body movement and new technologies. The institute consists of an interdisciplinary team but also welcomes visiting researchers from all over the world. One of its aims is also to actively try and validate research results during public events and by means of user studies.
This article describes how to make sun-java6 play nice with the PulseAudio sound system on Ubuntu with an x64 processor architecture. With some changes the method should also work on other operating systems and platforms.
The default way sun-java6 operates with respect to sound on Ubuntu is, well, disrespectful. When playing audio it claims an audio device, which then cannot be used by other applications trying to access the same device. This is far from ideal. Changing audio interfaces (e.g. by plugging in a USB audio interface) also goes wrong most of the time.
These problems are addressed by PulseAudio and there is a way to make sun-java6 aware of PulseAudio on Ubuntu. The OpenJDK does this automatically but it has some other, unrelated, issues. If you want to use PulseAudio with java6 on Ubuntu x64 you need to copy pulse-java.jar and the platform dependent libpulse-java.so file to the correct JVM directories.
From this moment on the “PulseAudio Mixer” is available for Java applications. Sharing, switching and assigning audio devices to Java programs becomes smooth as a result. To use the PulseAudio Mixer by default you need to change sound.properties, which can be found at /usr/lib/jvm/java-6-sun/jre/lib/sound.properties. Details can be found here.
“The First International Workshop of Folk Music Analysis: Symbolic and Signal Processing, will take place in Athens, Greece, on the 19th and 20th of May, 2011. … The purpose of the event is to gather researchers who work in the area of computational folk music analysis, using symbolic or signal processing methods, to present their work, discuss and exchange views on the topic.”
TarsosDSP is a collection of classes to do simple audio processing. It features an implementation of a percussion onset detector and two pitch detection algorithms: YIN and the McLeod Pitch Method.
Its aim is to provide a simple interface to some audio (signal) processing algorithms implemented in Java.
To make some of the possibilities clear I coded some examples.
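Along those lines, here is a minimal sketch of real-time pitch detection on microphone input. It is my illustration, with package and class names from a recent TarsosDSP release, so they may differ from the version described here:

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.io.jvm.AudioDispatcherFactory;
import be.tarsos.dsp.pitch.PitchProcessor;
import be.tarsos.dsp.pitch.PitchProcessor.PitchEstimationAlgorithm;

public class MicrophonePitch {
  public static void main(String[] args) throws Exception {
    //read from the default microphone: 22.05kHz, buffers of 1024 samples
    AudioDispatcher dispatcher =
        AudioDispatcherFactory.fromDefaultMicrophone(22050, 1024, 0);
    dispatcher.addAudioProcessor(new PitchProcessor(
        PitchEstimationAlgorithm.YIN, 22050, 1024,
        (result, event) -> System.out.println(result.getPitch() + " Hz")));
    dispatcher.run(); //blocks; use new Thread(dispatcher).start() otherwise
  }
}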
Tarsos is a software program for studying pitch in music, among other things in ethnic music. It can be used to identify tone intervals and tone scales. It makes quarter tones and other (uncommon) intervals visually clear. Tarsos now also has real-time capabilities: sound from a microphone is analysed immediately and instant feedback shows a played or sung interval. Singers or instrumentalists who want to experiment with intonation can try out Tarsos themselves.
More information on how Tarsos works can be found in a short article about Tarsos. It also clarifies the meaning of the different windows in Tarsos.
Installation
To use Tarsos you need, besides Tarsos itself, Java. Java is probably already installed on your PC; this can be checked on java.com, where instructions on how to install Java can also be found.
With a working Java installation, installing Tarsos is simple: download Tarsos and run it (double click).
Detecting a tone scale
To detect a tone scale in a music file with Tarsos, proceed as follows.
Detect the peaks in the histogram by moving the peakpicking slider. If not all peaks are detected you can add a peak by holding alt and moving the mouse over the pitch class histogram; click to confirm the position of the peak. Peaks can be moved while holding ctrl.
These steps can also be seen in a screenshot in the appendix. To export the information, have a look at the output options. With file - export - scala... a Scala file can be exported. These are text files containing a tone scale that can be used in the Scala program.
Real-time analysis
With the real-time analysis you can play intervals and get immediate visual feedback on how close your intonation is to the desired result. You need a microphone for this. After choosing Settings - Tarsos Live and then closing and restarting Tarsos, Tarsos listens to sound from a microphone. Play pitch intervals and they become visually clear. The reset button clears the histogram.
During ARIP Tarsos is presented and can even be tried out. According to the ARIP website: “On 18 March 2011 the different researchers present their research projects: no finished products or final results, but snapshots. Together they offer an interesting and intriguing look at what the research at our Conservatory has to offer”.
The short text about Tarsos:
Tarsos is a software program for studying pitch in music, among other things in ethnic music. Tarsos now also has new, real-time capabilities: sound from a microphone is analysed immediately and instant feedback shows a played or sung interval. It makes quarter tones and other (uncommon) intervals visually clear.
During ARIP a short explanation of Tarsos will be given and you can expect a demo. Singers or instrumentalists who want to experiment with intonation are also more than welcome to try out Tarsos themselves.
This Monday, the 28th of February, Tarsos will be presented at “Lectures on Computational Ethnomusicology”, which is held in Izmir, Turkey. The presentation of Tarsos is available here.
Next to the interesting programme it is a great opportunity to meet Baris Bozkurt, who has been working on similar research, applied to makam music.
On Wednesday the second of March there is a small seminar at the Electrical and Electronics Eng. Dept. of İzmir Yüksek Teknoloji Enstitüsü, where Tarsos will also be presented.
Tarsos Transcoder is a library to transcode audio with Java.
Downloads and more info on http://tarsos.0110.be/tag/TarsosTranscoder
It uses (platform dependent) FFmpeg binaries in the background. It is a fork of JAVE (Java Audio and Video Encoder) by Carlo Pelliccia (www.sauronsoftware.it).
Tarsos Transcoder focuses only on audio, is compatible with more, and more recent, FFmpeg binaries and is less dependent on the text output of the different binaries. The interface is also simplified. It falls back to the ffmpeg binary in the system path, if one is present, so it supports platforms for which no binary is provided within the release.
Getting Started
If you have Apache Ant and git installed on your system the following commands get you started quickly:
git clone https://JorenSix@github.com/JorenSix/TarsosTranscoder.git
cd TarsosTranscoder/build
ant #Compiles and builds the core TarsosTranscoder library
ant javadoc #Creates the javadoc documentation in TarsosTranscoder/doc
java -jar tarsos_transcoder-1.0.jar ../audio/input/tone/tone_10s.wav test.flac FLAC_MONO_44KHZ #Test wav to flac transcoding
If you want to use the transcoder from within Java you need to call Transcoder.
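A minimal sketch of what that looks like; the transcode method and the DefaultAttributes constant are assumptions based on the command line example above, so check the actual API:

import be.hogent.tarsos.transcoder.DefaultAttributes;
import be.hogent.tarsos.transcoder.Transcoder;

public class TranscodeExample {
  public static void main(String[] args) throws Exception {
    //transcode a WAV file to mono FLAC sampled at 44.1kHz,
    //cf. the FLAC_MONO_44KHZ argument of the CLI example above
    Transcoder.transcode("tone_10s.wav", "test.flac",
        DefaultAttributes.FLAC_MONO_44KHZ);
  }
}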
FFmpeg can encode to a lot of audio formats and can decode even more.
Inner workings
Tarsos Transcoder tries to find an FFmpeg binary in the system path. If it does not find one, it tries to copy a binary for the current platform. Tarsos Transcoder contains three binaries: one for Mac OS X, one for Linux (x86) and one for Windows. Tarsos Transcoder has been tested on:
MAC OS X 10.6
Windows 7
Ubuntu linux 10.10 ARM
Ubuntu Linux 10.04 x86_64
It will probably work most of the time.
Alternative Binaries
If TarsosTranscoder does not include binaries for your platform, install ffmpeg and add the ffmpeg executable to your system path. It will be found and used by TarsosTranscoder automatically.
Alternatively, you can provide binaries for your (unsupported) platform by implementing FFMPEGLocator. The PickMe method should yield true on your platform and copy e.g. an FFmpeg binary to a temporary directory.
License
This software is licensed under the GPL; TarsosTranscoder is based on JAVE (GPL).
Credits
JAVE (Java Audio and Video Encoder) by Carlo Pelliccia – www.sauronsoftware.it
FFmpeg: Tarsos Transcoder uses libraries from the FFmpeg project under the LGPLv2.1
For ARIP I wrote an article about Tarsos. It briefly motivates the reasons for Tarsos’ existence – an application to analyse the use of pitch in music – and gives an overview of how Tarsos works by means of an example. Below you can find the multimedia additions to the article.
Ladrang Kandamanyura (slendro pathet manyura) is the name of the music fragment that was used in the article as an example of a piece of music with an unusual (at least for our Western ears) tone scale. The CD on which the piece can be found is available from wergo. A 30 second fragment can be heard here:
The fragment can also be downloaded to analyse it yourself with Tarsos.
Ladrang Kandamanyura (slendro pathet manyura)
Courtesy of: WERGO/Schott Music & Media, Mainz, Germany, www.wergo.de and Museum Collection Berlin
Lestari – The Hood Collection, Early Field Recordings from Java (SM 1712 2)
Recorded in 1957 and 1958 in Java – First release
Tarsos Live
The video fragment below shows how Tarsos can be used to measure tunings in real time. Sound from a microphone is analysed immediately and instant feedback shows a played or sung interval. It makes quarter tones and other (uncommon) intervals visually clear. Tarsos can thus be used by singers or string players who want to experiment with microtonality. It can also be handy for ethnomusicological fieldwork: for example to document kora (an African harp) tone scales.
A new version of Tarsos was uploaded today and it contains an exciting (at least my kind of exciting) new feature. It is capable of real-time pitch analysis and tone scale construction. A video should make its use clear:
The immediate feedback is practical for educational purposes: it makes rather vague things like quarter tones or (uncommon) pitch intervals in general quite tangible. It could be used by singers or string players to explore microtonality or to improve their technique. Another use case is ethnomusicological fieldwork: if you want to research kora tuning (an African harp), Tarsos could be a practical tool for real-time analysis.
As is the yearly custom, the Day of Artistic Research is organised at the Orpheus Institute. Below is a short text about the research project around Tarsos that will appear in the yearbook, a booklet with an overview of artistic research projects at Flemish institutes, published on the occasion of the aforementioned “Day of Artistic Research”.
The goal of this research project is to develop a method to obtain a culture-independent view on musical parameters. More concretely, techniques from Music Information Retrieval are applied to study pitch, tempo and timbre. Adapting existing, mostly Western oriented MIR methods should lead to a structured documentation of different timbres, tone scales, metrical relations and musical forms. That description can serve as inspiration for the development of an artistic compositional language or can be used as source material for scientific research on ethnic music, for example to objectively demonstrate (the possible decline of) the individual character of oral music cultures.
In the first phase of the research the focus lies on one of the more tangible parameters: pitch. In ethnic music the use of pitch is often radically different from Western music, which is mostly based on the subdivision of an octave into twelve equal parts. To extract tone scales from music and to display them, the software platform Tarsos was developed. With Tarsos it is possible to run automatic tone scale analysis on a large dataset or to obtain a detailed manual analysis of a few pieces of music. The culture-independent analysis method Tarsos uses can be applied equally well to Indonesian, Western or African music.
Our intention is to use Tarsos to discover evolutions in the use of tone scales in the enormous dataset of the Royal Museum for Central Africa. Is tone scale diversity in Africa withering under the influence of Western music? Can specific characteristics be found of possibly ‘extinct’ music cultures? These are questions that fit within the overarching research project of Olmo Cornelis and that we try to answer with the help of Tarsos.
Later, the two remaining musical parameters, tempo and timbre, will receive a similar treatment. In the last phase of this rather ambitious research project the relation between the parameters will be investigated.
At the workshop I had an interesting meeting with Dan Tidhar, who researches harpsichord temperament estimation at QMUL. Together with colleagues he created the Tempest web service, where you can upload harpsichord audio and let the system guess the temperament. The process is described in the paper “High precision frequency estimation for harpsichord tuning classification”. Although Tarsos was not officially part of the programme, I hijacked the poster sessions to show a live demo of Tarsos with Dan’s dataset.
Another interesting talk was about 2032, a tunable synthesizer with definable harmonics. It elaborates on the ideas of Sethares about tone scales.
LaTeX (pronounced “latech”) is a document preparation system for high-quality typesetting, based on and succeeding the TeX formatting system. It is a very popular format in academia, as it allows advanced document formatting capabilities not found in other common document formatting systems. Some of these capabilities include table and figure notation, bibliography formatting (see BibTeX), and an advanced macro language.
This post contains links to genuinely useful software to do signal based audio analysis.
Sonic Visualiser: As its name suggests, Sonic Visualiser contains a lot of different visualisations for audio. It can be used for analysis (pitch, beat, chroma, …) with VAMP plugins. To quote: “The aim of Sonic Visualiser is to be the first program you reach for when you want to study a musical recording rather than simply listen to it”. It is the swiss army knife of audio analysis.
BeatRoot is designed specifically for one goal: beat tracking. It can be used e.g. for comparing tempi of different performances of the same piece or for tracking tempo deviation within one piece.
Tartini is capable of real-time pitch analysis of sound. You can e.g. play into a microphone with a violin, see the harmonics you produce and adapt your playing style based on visual feedback. It also contains a pitch deviation measuring apparatus to analyse vibrato.
Tarsos is software for tone scale analysis. It is useful to extract tone scales from audio. Different tuning systems can be seen, extracted and compared. It also contains the ability to play along with the original song with a tuned MIDI keyboard.
Melodic Match is a different beast. It does not work on the signal level but processes symbolic music. More to the point, it searches through MusicXML files, which can be created from MIDI files. See its website for use cases. Melodic Match is only available for Windows.
There is more to Tarsos than meets the eye. The graphical user interface only exposes some functionality; the API exposes all of Tarsos’ capabilities.
Tarsos is programmed in Java, so the API is accessible through Java and other programming languages targeting the JVM, like JRuby, Scala and Groovy. The following examples use the Groovy programming language because I find it the most aesthetically pleasing with regards to interoperability and it gets the job done without getting in your way.
To run the examples a copy of the Tarsos JAR-file needs to be added to the Classpath and the Groovy runtime must be installed correctly. I’ll leave this as an exercise for the reader: godspeed to you, brave soul. Quick protip: placing a copy of the jar in the extensions directory seems to work best, e.g. see important java directories on mac OS X.
The first example extracts pitch class histograms from a bunch of files and saves them as EPS files. It iterates a directory recursively and handles each file that matches a given regular expression. In this example the regular expression matches all WAV files. Batch processing is one of those things scripting is ideal for; doing the same thing with the user interface would be tedious or even mind-numbingly boring, not groovy at all indeed.
import be.hogent.tarsos.*
import be.hogent.tarsos.util.*
import be.hogent.tarsos.util.histogram.ToneScaleHistogram
import be.hogent.tarsos.sampled.pitch.Annotation
import be.hogent.tarsos.sampled.pitch.PitchDetectionMode
dir = "/home/joren/audio"
FileUtils.glob(dir,".*.wav",true).each { file ->
audioFile = new AudioFile(file)
pitchDetector = PitchDetectionMode.TARSOS_YIN.getPitchDetector(audioFile)
pitchDetector.executePitchDetection()
//get some annotations
annotations = pitchDetector.getAnnotations()
//create an ambitus and tone scale histogram
ambitusHistogram = Annotation.ambitusHistogram(annotations)
toneScaleHisto = ambitusHistogram.toneScaleHistogram()
//plot a smoothed version of the histogram
p = new SimplePlot()
p.addData 0, toneScaleHisto.gaussianSmooth(0.2)
p.save FileUtils.basename( file) + ".eps"
}
The second example uses functionality that is currently only available through the API. It takes a MIDI file and synthesizes it to a wave file using an arbitrary scale, in this case 10-TET. The heavy lifting is done by the Gervill synthesizer. The resulting file is available for download, micro—macro?—tonal Bach is great: BWV 1013 in 10-TET. The result of an analysis with Tarsos on the synthesized audio clearly shows an interval of 120 cents with some deviations.
import java.io.File
import be.hogent.tarsos.midi.MidiToWavRenderer
import be.hogent.tarsos.util.ScalaFile
midiFile = new File("BWV_1013.mid")
outFile = new File("out.wav")
tuning = [0,120,240,360,480,600,720,840,960,1080] as double []
MidiToWavRenderer renderer
renderer = new MidiToWavRenderer()
renderer.setTuning(tuning)
renderer.createWavFile(midiFile, outFile)
An extended version of this second example script could be used to generate a dataset with audio and corresponding tone scale information on the fly. The dataset could then be used as a baseline.
The API is not yet well documented and is still in flux or more correctly: superflux. Note to self: I will provide documentation and a number of useful examples when the dust settles down. I’m not even sure if I will stick with Groovy. Scala has a nice Lispy feel to it and seems more developed. Groovy has a less steep learning curve, especially if you have some experience with Ruby. JRuby is also nice but the interoperability with legacy Java looks like an ugly hack.
Yesterday Tarsos was publicly presented at the symposium Perspectives for Computational Musicology in Amsterdam. The first public presentation of Tarsos, excluding this website. The symposium was organized by the Meertens Institute on the occasion of Peter van Kranenburg’s PhD defense.
The presentation included a live demo of a daily build of Tarsos (a Friday evening build) which worked, surprisingly, without hiccups. The presentation was done by Olmo Cornelis. This was the small introduction:
Tarsos – a Platform for Pitch Analysis of Ethnic Music
Ethnic music is a vulnerable cultural heritage that has only recently received more attention within the Music Information Retrieval community. However, access to ethnic music remains problematic, as this music does not always correspond to the Western concepts of music and metadata that underlie the currently available content-based methods. During this lecture we would like to present our current research on pitch analysis of African music. TARSOS, a platform for analysis, will be presented as a powerful tool that can describe and compare scales in great detail.
Drag and drop works for Scala tone scale files and different kinds of audio files. Audio files are transcoded automagically using an embedded ffmpeg binary, which is platform dependent. This works on Linux and Windows; on other platforms only WAV files are supported.
Some of the current features:
Scala file extraction from audio
Real time pitch tracking
Real time pitch class histogram visualization
Alignment of pitch intervals with histogram using mouse dragging
Tarsos can be used to render MIDI files to audio (WAV) files using arbitrary tone scales. This functionality can be used to (automatically) verify tone scale extraction from audio files. Since I could not find a dataset with audio and corresponding tone scales, creating one using MIDI seemed a good idea.
MIDI files can be found in spades, tone scales on the other hand are harder to find. Luckily there is one massive source, the Scala Tone Scale Archive: a large collection of over 3700 tone scales.
Using Scala tone scale files and MIDI files, a Tone Scale – Audio dataset can be generated. The quality of the audio depends on the (software) synthesizer and the SoundFont used. Tarsos currently uses the Gervill synthesizer. Gervill is a pure Java software synthesizer with support for 24bit SoundFonts and the MIDI tuning standard.
How To Render MIDI Using Arbitrary Tone Scales with Tarsos
A recent version of the JRE needs to be installed on your system if you want to use Tarsos. Tarsos itself can be downloaded in the form of the Tarsos JAR Package.
Currently Tarsos has a Command Line Interface. An example using the files attached:
To summarize: by rendering audio with MIDI and Scala tone scale files a dataset with tone scale – audio information can be generated and tone scale extraction algorithms can be tested on the fly.
This method also has some limitations. Because the audio is rendered there is no (background) noise, no fluctuations in pitch and timbre, … all of which are present in recorded audio. So testing tone scale extraction algorithms on recorded audio remains advised.
Tarsos is now capable of reproducing speech using MIDI. The idea to convert speech into MIDI comes from the blog of Corban Brook, where the following video can be found; it is actually a work by Peter Ablinger:
Another example of music inspired by speech is this interview with Louis Van Gaal:
Tarsos sends out MIDI data based on an FFT analysis of the signal. It maps the spectrogram to MIDI messages and uses the power spectrum to calculate the velocity of each note-on message.
The implementation can run in real-time but the output has some delay: the FFT calculation, constructing MIDI messages, calculating velocity, synthesizing sound, … is not instantaneous.
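A hypothetical Java sketch of the mapping step (my illustration, not the actual Tarsos code):

import javax.sound.midi.MidiSystem;
import javax.sound.midi.Receiver;
import javax.sound.midi.ShortMessage;

public class SpectrumToMidiSketch {
  public static void main(String[] args) throws Exception {
    double sampleRate = 44100;
    double fftSize = 2048;
    double[] power = new double[1024]; //power spectrum, normally filled by an FFT
    power[93] = 0.8; //fake a peak around 2kHz for the demonstration
    Receiver receiver = MidiSystem.getReceiver();
    for (int bin = 1; bin < power.length; bin++) {
      if (power[bin] < 0.1) continue; //ignore quiet bins
      double frequency = bin * sampleRate / fftSize; //bin center frequency
      int note = (int) Math.round(69 + 12 * Math.log(frequency / 440.0) / Math.log(2));
      int velocity = (int) Math.min(127, power[bin] * 127); //power determines velocity
      ShortMessage noteOn = new ShortMessage();
      noteOn.setMessage(ShortMessage.NOTE_ON, 0, note, velocity);
      receiver.send(noteOn, -1);
    }
  }
}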
To use this capability Tarsos supports the following syntax. If a MIDI file is given, the MIDI messages are written to that file. If an audio file is given, Tarsos uses the audio as input. If the --pitch switch is used, only the F0 is used to construct MIDI messages, instead of the complete FFT.
Tarsos can be used to search for music that uses a certain tone scale or tone interval(s). Tone scales can be defined by a Scala tone scale file or an exemplifying audio file. This text explains how you can use Tarsos for this task.
Search Using Scala Tone Scale Files
Scala files are text files with information about a tone scale. They are used to share and exchange tone scales. The file format originates from the Scala program:
Scala is a powerful software tool for experimentation with musical tunings, such as just intonation scales, equal and historical temperaments, microtonal and macrotonal scales, and non-Western scales. It supports scale creation, editing, comparison, analysis, …
Tarsos also understands Scala files. It is able to create a pitch class histogram from them using a Gaussian mixture model, a technique described in A. C. Gedik, B. Bozkurt, 2010, "Pitch Frequency Histogram Based Music Information Retrieval for Turkish Music", Signal Processing, vol. 90, pp. 1049-1063 (doi:10.1016/j.sigpro.2009.06.017).
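A toy Java sketch of what such a template construction boils down to (my own simplification with a guessed kernel width, not the Tarsos implementation):

public class PitchClassTemplate {
  public static void main(String[] args) {
    double[] pitchClasses = {0.0, 700.0}; //e.g. a scale with a fifth, in cents
    double sigma = 35; //kernel width in cents, an assumption
    double[] template = new double[1200];
    for (double pc : pitchClasses)
      for (int bin = 0; bin < 1200; bin++) {
        //circular distance: 1190 cents is only 10 cents away from 0
        double d = Math.abs(bin - pc);
        d = Math.min(d, 1200 - d);
        template[bin] += Math.exp(-d * d / (2 * sigma * sigma));
      }
    System.out.println("value at 700 cents: " + template[700]);
  }
}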
An example should make things clear. Let's search for an interval of 300 cents, or exactly three semitones. A Scala file with this interval is easy to define:
! example.scl
! An example of a tone interval of 300 cents
Tone interval of 300 cents
2
!
900
1200.0
The next step is to create a histogram with an interval of 300 cents. In the block diagram this step is called “Peak histogram creation”. The similarity calculation step expects a list of histograms to compare with the newly defined histogram. Feeding the similarity calculation with the Western 12ET tone scale and a pentatonic Indonesian Slendro tone scale shows that a 300 cents interval is used in the Western tone scale but is not available in the Slendro tone scale.
This example only uses Scala files; creating histograms is actually not needed, since intervals can be calculated from the Scala file itself, as the sketch below shows. This changes when audio files are compared with each other or with Scala files.
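Checking for a 300 cents interval needs nothing more than the pitch classes from the Scala file. A toy illustration in Java:

public class IntervalCheck {
  public static void main(String[] args) {
    double[] scale = {900.0, 1200.0}; //from example.scl; 1200 reduces to 0
    for (double a : scale)
      for (double b : scale) {
        double interval = (((b - a) % 1200) + 1200) % 1200; //cents, octave reduced
        if (Math.abs(interval - 300.0) < 1e-6)
          System.out.println(a + " -> " + b + " is 300 cents");
      }
  }
}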
Search Using Audio Files
When audio files are fed to the algorithm additional steps need to be taken.
First of all, pitch detection is executed on the audio file. Currently two pitch extractors are implemented in pure Java; it is also possible to use an external pitch extractor such as aubio.
Using pitch annotations a Pitch Histogram is created.
Peak detection on the Pitch Histogram results in a number of peaks; these should represent the distinct pitch classes used in the musical piece.
With the pitch classes a clean peak histogram is created during the Peak Histogram construction phase.
Finally the Peak histogram is matched with other histograms, as the sketch after this list illustrates.
The last two steps are the same for audio files or scala files.
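The matching in the final step boils down to correlating two histograms, trying every circular shift to align them (compare the shiftForOptimalCorrelation call in the Scala script above). A toy Java version:

public class HistogramMatch {
  //normalized cross correlation between two histograms at a given circular shift
  static double correlation(double[] a, double[] b, int shift) {
    double sum = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      double bv = b[(i + shift) % b.length];
      sum += a[i] * bv;
      normA += a[i] * a[i];
      normB += bv * bv;
    }
    return sum / Math.sqrt(normA * normB);
  }

  //the best correlation over all circular shifts
  static double match(double[] a, double[] b) {
    double best = -1;
    for (int shift = 0; shift < b.length; shift++)
      best = Math.max(best, correlation(a, b, shift));
    return best;
  }

  public static void main(String[] args) {
    double[] histA = new double[1200];
    double[] histB = new double[1200];
    histA[0] = histA[300] = 1; //peaks at 0 and 300 cents
    histB[100] = histB[400] = 1; //the same interval, shifted by 100 cents
    System.out.println(match(histA, histB)); //prints 1.0: a perfect match
  }
}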
Using real audio files can cause dirty histograms. Determining how many distinct pitch classes are used is no trivial task, even for an expert (human) listener. Tarsos should provide a semi-automatic way of peak extraction: a best guess by an algorithm that can easily be corrected by a user. For the moment Tarsos does not allow manual intervention.
Tarsos
To use Tarsos you need a recent Java runtime (1.6) and the following command line arguments:
This post is about the tools I use to keep the source code of Tarsos reasonably clean, consistent and readable. Static code analysis can be of great help if you want to maintain strict coding standards and follow language idioms. Some of the patterns they can detect for you:
Dead code – unused variables, parameters, methods
Suboptimal code – wasteful resource usage
Overcomplicated expressions – unnecessary if statements, for loops that could be while loops
Duplicate code – copied/pasted code is a code smell.
Formatting inconsistencies, e.g. variable modifier order
And even more subtle, but equally important:
Resource management: is a resource handled (closed) correctly on all possible code paths?
Abstraction level: is it needed to expose the concrete type of an object or could an (abstract) supertype or even an interface be used instead?
…
In a previous life I used .NET and the static code analysis tools FxCop & StyleCop. FxCop operates on the bytecode (or intermediate language, in .NET parlance) level, StyleCop analyses the source code itself. Tarsos uses Java, so I looked for Java alternatives and found a few.
PMD & Checkstyle both operate on source code level.
FindBugs operates on bytecode level.
On freesoftwaremagazine.com there is an article series on Java static code analysis software. It covers PMD and FindBugs and integration in Eclipse. It does not cover Checkstyle. Checkstyle is essentially the same as PMD, but it is better integrated in Eclipse: it checks code on save and uses the standard ‘Problems’ interface; PMD does not.
Continuous testing is also a really nice thing to have: detecting unexpected behavior while refactoring/programming can prevent unnecessary bug hunts. A video about immediate feedback using continuous testing makes this clear.
Another tip is a more philosophical one: making your code and code revisions publicly available makes you think twice before implementing (and subsequently publishing) a quick and dirty hack. Tarsos is available on github.
I just finished creating a first release of Tarsos. The release contains several demo applications, some more useful than others. Tarsos is a work in progress: not all functionality is exposed through the CLI demo applications. The demos should however give a taste of the possibilities. All demo applications follow this pattern:
Today I created a spectrogram application using Tarsos. The application listens to an audio input, computes an FFT and at the same time calculates pitch. The estimated pitch is overlaid on the spectrogram. All this happens in real time and is implemented in Java.
The JAVA software program we are developing is called Tarsos and can now be found on GitHub. GitHub is a web-based hosting service for projects that use the Git version control system.
Currently Tarsos is a collection of Java classes to create, compare and process pitch-frequency data using histograms. In its current state it is not usable for end users.
The dataset we use is the sound archive of the department of Ethnomusicology of the Royal Museum for Central Africa at Tervuren, Belgium. The archive was digitized during the DEKKMMA project. More information about the dataset can be found on the website of the DEKKMMA project:
The archive is a collection of sound recordings of traditional music from Central Africa, with a particular focus on Congo and Rwanda. The sound archive contains about 3,000 hours of music recordings, the oldest of which date from 1910: Edison cylinders recorded by Hutereau in the Uele-province in Congo.
The archive contains several sound carriers (Edison cylinders, Sonofil wire, magnetic tapes, audiocassettes, disks, CD’s …) with associated metadata (paper files) and contextual data (photographs, films, video’s, books, documents of all kind).
The collection was created during and after the colonial era of the Belgian Kingdom in Central Africa. The RMCA collection forms for an important part the musical memory of Central Africa and in terms of size, documentation and musical quality, it is – without any doubt – the world’s most important sound archive for this region.
The aim of this research project is to gain novel musicological insights into a large dataset of music from Central Africa. While practising ethnomusicological research on this dataset, we aim to develop and publish useful software and methodologies for the (ethno)musicological research community.
From November 2009 until November 2013 this research project was organised at the School of Arts, University College Ghent, under the supervision of Olmo Cornelis. From November 2013 onwards the project turned into a two year doctoral research project hosted at IPEM, Ghent University, under the supervision of Marc Leman.
Partners
Ghent University, IPEM: The Institute for Psychoacoustics and Electronic Music is part of the Music department at the Faculty of Arts and Philosophy of Ghent University.