The journal paper Tarsos, a Modular Platform for Precise Pitch Analysis of Western and Non-Western Music by Six, Cornelis, and Leman was published in a special issue about Computational Ethnomusicology of the Journal of New Music Research on the 20th of august 2013. Below you can find the abstract for the article, and pointers to audio examples, the Tarsos software, and the author version of the article itself.
Abstract: This paper presents Tarsos, a modular software platform used to extract and analyze pitch organization in music. With Tarsos pitch estimations are generated from an audio signal and those estimations are processed in order to form musicologically meaningful representations. Tarsos aims to offer a flexible system for pitch analysis through the combination of an interactive user interface, several pitch estimation algorithms, filtering options, immediate auditory feedback and data output modalities for every step. To study the most frequently used pitches, a fine-grained histogram that allows up to 1200 values per octave is constructed. This allows Tarsos to analyze deviations in Western music, or to analyze specific tone scales that differ from the 12 tone equal temperament, common in many non-Western musics. Tarsos has a graphical user interface or can be launched using an API – as a batch script. Therefore, it is fit for both the analysis of individual songs and the analysis of large music corpora. The interface allows several visual representations, and can indicate the scale of the piece under analysis. The extracted scale can be used immediately to tune a MIDI keyboard that can be played in the discovered scale. These features make Tarsos an interesting tool that can be used for musicological analysis, teaching and even artistic productions.
Ladrang Kandamanyura (slendro pathet manyura), is the name of the piece used in the article throughout section 2. The album on which the piece can be found is available at wergo. Below a thirty second fragment is embedded. You can also download the thirty second fragment to analyse it yourself.
Below the BibTex entry for the article is embedded.
In the extended abstract, also titled Computer Assisted Transcription of Ethnic Music, it is described how the Tarsos software program now has features aiding transcription. Tarsos is especially practical for ethnic music of which the tone scale is not known beforehand. The proceedings of FMA 2013 are available as well.
During the conference there also was an interesting panel on transcription. The following people participated: John Ashley Burgoyne, moderator (University of Amsterdam), Kofi Agawu (Princeton University), Dániel P. Biró (University of Victoria), Olmo Cornelis (University College Ghent, Belgium), Emilia Gómez (Universitat Pompeu Fabra, Barcelona), and Barbara Titus (Utrecht University). Some pictures can be found below.
Today marks the reslease of Tarsos 1.0 . The new Tarsos release contains practical transcription features. As can be seen in the screenshot below, a time stretching feature makes it easy to loop a certain audio fragment while it is playing in a slow tempo. The next loop can be played with by pressing the n key, the one before by pressing b.
Since the pitch classes can be found in a song, and there is a feature that lets you play a MIDI keyboard in the tone scale of the song under analysis, transcription of ethnic music is made a lot easier.
The new release of Tarsos can be found in the Tarsos release repository. From now on, nightly releases are uploaded there automatically.
What follows is about the Conference on Interdisciplinary Musicology and the 15th international Conference of the Gesellschaft fur Musikfoschung. First this text will give information about our contribution to CIM2012: Revealing and Listening to Scales From the Past; Tone Scale Analysis of Archived Central-African Music Using Computational Means and then a number of highlights of the conference follow. The joint conference took place from the 4th to the 8th of september 2012.
In 2012, CIM will tackle the subject of History. Hosted by the University of Göttingen, whose one time music director Johann Nikolaus Forkel is widely regarded as one of the founders of modern music historiography, CIM12 aims to promote collaborations that provoke and explore new methods and methodologies for establishing, evaluating, preserving and communicating knowledge of music and musical practices of past societies and the factors implicated in both the preservation and transformation of such practices over time.
Revealing and Listening to Scales From the Past; Tone Scale Analysis of Archived Central-African Music Using Computational Means
The work presented by Rytis Ambrazevicius et al. Modal changes in traditional Lithuanian singing: Diachronic aspect has a lot in common with our research, it was interesting to see their approach. Another highlight of the conference was the whole session organized by Klaus-Peter Brenner around Mbira music.
Rainer Polak gave a talk titled ‘Swing, Groove and Metre. Asymmetric Feels, Metric Ambiguity and Metric Transformation in African Musics’. He showed how research about rhythm in jazz research, music theory and empirical musicology ( amongst others) could be bridged and applied to ethnic music.
The overview Eleanore Selfridge-Field gave during her talk Between an Analogue Past and a Digital Future: The Evolving Digital Present was refreshing. She had a really clear view on all the different ways musicology and digital media can benifit from each-other.
From the concert programme I found two especially interesting: the lecture-performance by Margarete Maierhofer-Lischka and Frauke Aulbert of Lotofagos, a piece by Beat Furrer and Burdocks composed and performed by Christian Wolff and a bunch of enthusiastic students.
At this years ICMC Conference, ICMC 2012 we presented a paper describing a way to experiment with tone scales and how to use Tarsos as a compositional tool. What follows are some pointers to the presentation, paper and to other interesting talks that were presented there.
ICMC 2012 was organized in Ljubljana from the 9 to 14 septembre and had a very dense program of talks, posters, presentations, demos and concerts.
Since 1974 the International Computer Music Conference has been the major international forum for the presentation of the full range of outcomes from technical and musical research, both musical and theoretical, related to the use of computers in music. This annual conference regularly travels the globe, with recent conferences in the Americas, Europe and Asia. This year we welcome the conference to Slovenia for the first time.
Sound to Scale to Sound, a Setup for Microtonal Exploration and Composition
@inproceedings{cornelis2012sound_to_scale,
author = {OlmoCornelisandJorenSix},
title = {{Sound to Scale to Sound, a SetupforMicrotonalExplorationandComposition}},
booktitle = {{Proceedings of the 2012InternationalComputerMusicConference,
(ICMC2012)}},
year = {2012},
publisher = {TheInternationalComputerMusicAssociation}
}
Program highlights
What follows are a number of pointers to my personal program highlights.
Verena Thomas presented two very well polished software tools. One to detect patterns in scores, called motifviewer and a tool to search in score databases in a multi-modal way. The Probado tool does score-to-audio alignment and much more.
Gibber is an impressive live-coding environment with an easy syntax. Since it is all done with javascript you can start playing with it immediately. Overtone Another live-coding environment, presented at the conference by Sam Aaron, was equally impressive. It is programmed using the Closure language.
At ICMC there were a number of tools to assist in composition. One of those is The Bach Project, by Andrea Agostini. Togheter with CatART by Diemo Swartz it forms a very expressive platform to work with sound, which was demonstrated by Aaron Einbond and Christopher Trapani in their paper titled Precise Pitch Control In Real Time Corpus-Based Concatenative Synthesis. Diemo Swartz presented work on Audio Mosaicing, it can be seen as a follow-up to AuidioGuild by Ben Hackbarth.
I also got to know the work by Thomas Grill, on his website a nice piece of software can be found a Python implementation of the Non Stationary Gabor Transform. Another software system I got to know is the functional signal processing programming language FAUST
My personal highlights of the concert programme include the works by Johannes Kreidler, Aura Pon, Daniel Mayer, Alexander Schubert and the remarkable performance by Dexter Ford. The concept behind Soundlog by Johannes Kretz was also interesting.
At the 2012 AAWM conference we presented a way to explore tone scales in the music of Central Africa. Since the audience consisted of (ethno)musicologists, the main focus of the presentation was on the applicication part, the technical aspects were only briefly mentioned.
To test the application, download and execute the WSOLA jar file and load an audio file. For the moment only 44.1kHz mono wav is allowed. To get started you can try this piece of audio.
There is also a command line interface, the following command doubles the speed of in.wav:
java -jar TimeStretch.jar in.wav out.wav 2.0
_______ _____ _____ _____
|__ __| | __ \ / ____| __ \
| | __ _ _ __ ___ ___ ___| | | | (___ | |__) |
| |/ _` | '__/ __|/ _ \/ __| | | |\___ \| ___/
| | (_| | | \__ \ (_) \__ \ |__| |____) | |
|_|\__,_|_| |___/\___/|___/_____/|_____/|_|
----------------------------------------------------
Name:
TarsosDSP Time stretch utility.
----------------------------------------------------
Synopsis:
java -jar TimeStretch.jar source.wav target.wav factor
----------------------------------------------------
Description:
Change the play back speed of audio without changing the pitch.
source.wav A readable, mono wav file.
target.wav Target location for the time stretched file.
factor Time stretching factor: 2.0 means double the length, 0.5 half. 1.0 is no change.
The source code of the Java implementation of WSOLA can be found on the TarsosDSP github page.
Tarsos contains a couple of useful command line applications. They can be used to execute common tasks on lots of files. Dowload Tarsos and call the applications using the following format:
The first part java -jar tarsos.jar tells the Java Runtime to start the correct application. The first argument for Tarsos defines the command line application to execute. Depending on the command, required arguments and options can follow.
To get a list of available commands, type java -jar tarsos.jar -h. If you want more information about a command type java -jar tarsos.jar command -h
Detect Pitch
Detects pitch for one or more input audio files using a pitch detector. If a directory is given it traverses the directory recursively. It writes CSV data to standard out with five columns. The first is the start of the analyzed window (seconds), the second the estimated pitch, the third the saillence of the pitch. The name of the algorithm follows and the last column shows the original filename.
This document describes how pitch can be represented using various units. More specifically it documents how a software program to analyse pitch in music, Tarsos, represents pitch. This document contains definitions of and remarks on different pitch and pitch interval representations. For good measure we need a definition of pitch, here the definition from [McLeod 2009] is used: The pitch frequency is the frequency of a pure sine wave which has the same perceived sound as the sound of interest. For remarks and examples of cases where the pitch frequency does not coincide with the fundamental frequency of the signal, also see [McLeod 2009] . In this text pitch, pitch interval and pitch ratio are briefly discussed.
The following video shows Bobby McFerrin demonstrating the power of the pentatonic scale. It is a fascinating demonstration of how quickly a (western) audience of the World Science Festival 2009 adapts to an unusual tone scale:
With Tarsos the scale used in the example can be found. This is the result of a quick analysis: it becomes clear that this, in fact, a pentatonic scale with an unequal octave division. A perfect fifth is present between 255 and 753 cents:
The aim of acoustic fingerprinting is to generate a small representation of an audio signal that can be used to identify or recognize similar audio samples in a large audio set. A robust fingerprint generates similar fingerprints for perceptually similar audio signals. A piece of music with a bit of noise added should generate an almost identical fingerprint as the original. The use cases for audio fingerprinting or acoustic fingerprinting are myriad: detection of duplicates, identifying songs, recognizing copyrighted material,…
Using a pitch class histogram as a fingerprint seems like a good idea: it is unique for a song and it is reasonably robust to changes of the underlying audio (length, tempo, pitch, noise). The idea has probably been found a couple of times independently, but there is also a reference to it in the literature, by Tzanetakis, 2003: Pitch Histograms in Audio and Symbolic Music Information Retrieval:
Although mainly designed for genre classification it is possible that features derived from Pitch Histograms might also be applicable to the problem of content-based audio identification or audio fingerprinting (for an example of such a system see (Allamanche et al., 2001)). We are planning to explore this possibility in the future.
Unfortunately they never, as far as I know, did explore this possibility, and I also do not know if anybody else did. I found it worthwhile to implement a fingerprinting scheme on top of the Tarsos software foundation. Most elements are already available in the Tarsos API: a way to detect pitch, construct a pitch class histogram, correlate pitch class histograms with a pitch shift,… I created a GUI application which is presented here. It is, probably, the first open source acoustic / audio fingerprinting system based on pitch class histograms.
It works using drag and drop and the idea is to find a needle (an audio file) in a hay stack (a large amount of audio files). For every audio file in the haystack and for the needle pitch is detected using an optimized, for speed, Yin implementation. A pitch class histogram is created for each file, the histogram for the needle is compared with each histogram in the hay stack and, hopefully, the needle is found in the hay stack.
Unfortunately I do not have time for rigorous testing (by building a large acoustic fingerprinting data set, or an other decent test bench) but the idea seems to work. With the following modifications, done with audacity effects the needle was still found a hay stack of 836 files :
A 10% speedup
15 and 30 seconds removed form the needle (a song of 4 minutes 12 seconds)
White noise added
Reversed the audio (This is, I believe, a rather unique property of this fingerprinting technique)
GSM reencoded
The following modifications failed to identify the correct song:
A one semitone pitch shift
A two semitone pitch shift
60 seconds removed from the needle
The original was also found. No failure analysis was done. The hay stack consists of about 100 hours of western pop, the needle is also a western pop song. If somebody wants to pick up this work or has an acoustic fingerprinting data set or drop me a line at .
The 17th of Octobre 2011 Tarsos was presented at the Study Day: Tuning and Temperament which was held at the Institue of Music Research in Londen. The study day was organised by Dan Tidhar. A short description of the aim of the study day:
This is an interdisciplinary study day, bringing together musicologists, harpsichord specialists, and digital music specialists, with the aim of exploring the different angles these fields provide on the subject, and how these can be fruitfully interconnected.
We offer an optional introduction to temperament for non specialists, to equip all potential listeners with the basic concepts and terminology used throughout the day.
The live demo we gave went well and we got a lot of positive, interesting feedback. The presentation about Tarsos is available here.
It was the first time in the history of ISMIR that there was a session with oral presentations about Non-Western Music. We were pleased to be part of this.
This article describes how to do makam recognition with a script that uses the Tarsos API.
The task we want to do is to find the tone scales most similar to the one used in recorded music. To complete this task you need a small set of theoretical scales and a large set of music, each brought in one of the scales. To make it more concrete, an example of Turkish classical music is used.
In an article by Bozkurt pitch histograms are used for – amongst other tasks – makam recognition. A maqam defines rules for a composition or performance of classical Turkish music. It specifies melodic shapes and pitch intervals, the scale. The task is to identify which of nine makams is used in a specific song. A simplified, generalized implementation of this task is shown here. In our implementation there is no tonic detection step. Also here we use only theoretical descriptions of the tone scales as a template and do not construct a template using the audio itself, as is done by Bozkurt. Ioannidis Leonidas wrote an interesting master thesis about makam recognition. Since no knowledge of the music itself is used the approach is generally applicable.
The following is an implementation in Scala a general purpose programming language that is interoperable with Jave . The first step is to write the Scala header. This is just some boilerplate code to be able to run the script from the command line – it assumes a UNIX-like environment and tarsos.jar in the same directory:
The second step constructs the templates the capability of Tarsos to create
theoretical tone scale templates using Gaussian kernels is used, line 8. See the attached images for some examples.
val makams = List( "hicaz","huseyni","huzzam","kurdili_hicazar",
"nihavend","rast","saba","segah","ussak")
var theoreticKDEs = Map[java.lang.String,KernelDensityEstimate]()
makams.foreach{ makam =>
val scalaFile = makam + ".scl"
val scalaObject = new ScalaFile(scalaFile);
val kde = HistogramFactory.createPichClassKDE(scalaObject,35)
kde.normalize
theoreticKDEs = theoreticKDEs + (makam -> kde)
}
The third and last step is matching. First a list of audio
files is created by recursively iterating a directory and matching each file to
a regular expression. Next, starting from line 4, each audio file is processed.
The internal implementation of the YIN pitch detection
algorithm is used on the audio file and a pitch class histogram is created
(line 6,7). On line 10 normalization of the histogram is done, to
make the correlation calculation meaningful. Line 11 until 15 compare the
created histogram from the audio file with the templates calculated beforehand.
The results are stored, ordered and eventually printed on line 19.
val directory = "/home/joren/turkish_makams/"
val audio_pattern = ".*.(mp3|wav|ogg|flac)"
val audioFiles = FileUtils.glob(directory,audio_pattern,true).toList
audioFiles.foreach{ file =>
val audioFile = new AudioFile(file)
val detectorYin = PitchDetectionMode.TARSOS_YIN.getPitchDetector(audioFile)
val annotations = detectorYin.executePitchDetection()
val actualKDE = HistogramFactory.createPichClassKDE(annotations,15);
actualKDE.normalize
var resultList = List[Tuple2[java.lang.String,Double]]()
for ((name, theoreticKDE) <- theoreticKDEs){
val shift = actualKDE.shiftForOptimalCorrelation(theoreticKDE)
val currentCorrelation = actualKDE.correlation(theoreticKDE,shift)
resultList = (name -> currentCorrelation) :: resultList
}
//order by correlation
resultList = resultList.sortBy{_._2}.reverse
Console.println(file + " is brought in tone scale " + resultList(0)._1)
}
An oral presentation about Tarsos is going to take place Tuesday, the 25 of October during the afternoon, as can be seen on the ISMIR preliminary program schedule.
If you want to cite our work, please use the following data:
@inproceedings{six2011tarsos,
author = {JorenSixandOlmoCornelis},
title = {Tarsos - a Platform to ExplorePitchScalesinNon-WesternandWesternMusic},
booktitle = {Proceedings of the 12th InternationalSocietyforMusicInformationRetrievalConference,
ISMIR2011},
year = {2011},
publisher = {InternationalSocietyforMusicInformationRetrieval}
}
Tarsos, a software package to analyse pitch organization in music, contains a new output modality. It is now possible to export a pitch class histogram and a pitch class interval matrix to latex from within Tarsos. This makes documenting tone scales more efficient.
Tarsos, a software package to analyse pitch organization in music, contains a new output modality. Now it is possible to export resynthesized pitch annotations, detected by a pitch detection algorithm and compare those with the original sound. This can be interesting to see which errors a pitch detection algorithm makes.
Below you can listen to an example of synthesized pitch detection results compared with the original flute piece. The file starts with only the original flute sound (on the right channel) and gradually changes so only the synthesized annotations (on the left channel) can be heard.
The 25th of May 2011 Tarsos was present at the IPEM open house.
IPEM (Institute for Psychoacoustics and Electronic Music) is the research center of the Department of Musicology, which is part of the Department of Art, Music and Theater Studies of Ghent University. IPEM provides a scientific basis for the cultural and creative sector, especially for music and performance arts, and does pioneering research work on the relationship between music body movement and new technologies. The institute consists of an interdisciplinary team but also welcomes visiting researchers from all over the world. One of its aims is also to actively try and validate research results during public events and by means of user studies.
“The First International Workhop of Folk Music Analysis: Symbolic and Signal Processing, will take place in Athens, Greece, on the 19th and 20th of May, 2011. … The purpose of the event is to gather reseachers who work in the area of computational folk music analysis, using symbolic or singal processing methods, to present their work, discuss and exchange views on the topic.”
TarsosDSP is a collection of classes to do simple audio processing. It features an implementation of a percussion onset detector and two pitch detection algorithms: Yin and the Mcleod Pitch method.
Its aim is to provide a simple interface to some audio (signal) processing algorithms implemented in JAVA.
To make some of the possibilities clear I coded some examples.
Tarsos is een softwareprogramma waarmee toonhoogte in muziek onderzocht kan worden in onder meer etnische muziek. Het kan gebruikt worden om toonintervallen en toonladders te identificeren. Het maakt kwarttonen of andere (ongewone) intervallen visueel duidelijk. Tarsos heeft nu ook real-time mogelijkheden. Geluid afkomstig van een microfoon wordt meteen geanalyseerd en onmiddellijke feedback toont een gespeeld of gezongen interval. Zangers of instrumentalisten die willen experimenteren met intonatie kunnen Tarsos zelf uit te proberen.
Meer info over de werking van Tarsos is te vinden in een kort artikel over de werking van Tarsos. Daarin wordt ook de betekenis van de verschillende vensters in Tarsos duidelijk.
Installatie
Om Tarsos te kunnen gebruiken heb je, naast Tarsos zelf, Java nodig. Java staat waarschijnlijk al op je pc. Dit is te controleren op java.com. Daar zijn ook instructies te vinden hoe je Java kan installeren.
Met een werkende Java is Tarsos installeren eenvoudig: download Tarsos en voer het uit (dubbel klikken).
Toonladder detecteren
Om een toonladder te detecteren in een muziekbestand met Tarsos doe je het volgende.
Detecteer de pieken in het histogram door met de slider peakpicking te bewegen. Als niet alle pieken gedetecteerd worden kan je een piek toevoegen door met alt en een muisbeweging over het pitch class histogram te bewegen. Klik om de positie van de piek te bevestigen. Met ctrl kunnen pieken verplaatst worden.
Deze stappen zijn ook te zien in een screenshot in de bijlage. Om de informatie te exporteren kan je bij de uitvoeropties kijken. Met file - export - scala... kan je een scala bestand exporteren. Dit zijn tekstbestanden met daarin een toonladder die gebruikt kunnen worden in het Scala programma.
Real time analyse
Met de real time analyse kan je intervallen inspelen en via visuele feedback meteen te weten komen hoe dicht je intonatie bij het gewenste resultaat zat. Je hebt er een microfoon voor nodig. Door Settings - Tarsos Live te kiezen en daarna Tarsos af te sluiten en opnieuw te starten luistert Tarsos naar geluid afkomstig van een microfoon. Speel toonhoogte intervallen en die worden visueel duidelijk. De reset knop wist het histogram.
This monday the 28th of February Tarsos will be presented at “Lectures on Computational Ethnomusicology” which is held at Izmir, Turkey. The presentation of Tarsos is available here.
Next to the interesting programme it is a great opportunity to meet Baris Bozkurt who has been working on similar research but applied to Makam music.
On wednesday the second of March there is a small seminar at Electrical and Electronics Eng. Dept. of İzmir Yüksek Teknoloji Enstitüsü where Tarsos will be presented also.
Voor ARIP heb ik een artikel over Tarsos geschreven. Het motiveert kort de bestaansredenen van Tarsos – een applicatie om toonhoogtegebruik in muziek te analyseren – en het artikel geeft een overzicht van de werking van Tarsos aan de hand van een voorbeeld. Hieronder zijn multimediale aanvullingen te vinden bij het artikel.
Ladrang Kandamanyura (slendro pathet manyura), zo heet het muziekfragment dat gebruikt werd in het artikel als voorbeeld van een stuk muziek met een ongewone (voor onze westerse oren toch) toonladder. De CD waarop het stuk te vinden is, is bij wergo te verkrijgen. Een fragment van 30 seconden is hier te beluisteren:
Het fragment kan je ook downloaden om zelf te analyseren met Tarsos.
Ladrang Kandamanyura (slendro pathet manyura)
Courtesy of: WERGO/Schott Music & Media, Mainz, Germany, www.wergo.de and Museum Collection Berlin
Lestari – The Hood Collection, Early Field Recordings from Java (SM 1712 2)
Recorded in 1957 and 1958 in Java – First release
Tarsos Live
Het onderstaande videofragment geeft aan hoe Tarsos gebruikt kan worden om in real time stemmingen te meten. Geluid afkomstig van een microfoon wordt dan meteen geanalyseerd en onmiddellijke feedback toont een gespeeld of gezongen interval. Het maakt kwarttonen of andere (ongewone) intervallen visueel duidelijk. Tarsos kan zo gebruikt worden door zangers of strijkers die willen experimenteren met microtonaliteit. Ook kan het handig zijn voor etnomusicologisch veldwerk: bijvoorbeeld om kora (een Afrikaanse harp) toonladders te documenteren.
A new version of Tarsos was uploaded today and it contains an exciting (at least my kind of exciting) new feature. It is capable of real-time pitch analysis and tone scale construction. A video should make its use clear:
The immediate feedback is practical for educational purposes: it makes rather vague things like quarter tones or (uncommon) pitch intervals in general quite tangible. It could be used by singers or string players to explore microtonality or to improve their technique. Another use case is ethnomusicologic field-work: if you would want to research Kora tuning (an African harp) Tarsos could be a practical tool for real-time analysis.
Naar jaarlijkse gewoonte wordt er in het Orpheus instituut de Dag van het Artistiek onderzoek georganiseerd. Hieronder volgt een tekstje over het onderzoeksproject rond Tarsos dat in het jaarboek komt. Het jaarboek is een boekje met daarin een overzicht van artistieke onderzoeksprojecten aan Vlaamse instituten. Het wordt gepubliceerd naar aanleiding van de eerder aangehaalde “Dag van het Artistiek Onderzoek”.
Het doel van dit onderzoeksproject is het ontwikkelen van een methode om een cultuuronafhankelijke kijk op muzikale parameters te verkrijgen. Meer concreet worden er technieken aangewend uit Music Information Retrieval om toonhoogte, tempo en timbre te bestuderen. Aanpassing van bestaande, meestal westers georiënteerde, MIR-methodes moet leiden tot een gestructureerde documentatie van verschillende klankkleuren, toonschalen, metrische verhoudingen en muzikale vormen. Die beschrijving kan dienen als inspiratie voor de ontwikkeling van een artistieke compsitionele taal of kan gebruikt worden als bronmateriaal voor wetenschappelijk onderzoek rond ethnische muziek. Bijvoorbeeld om (de eventuele
teloorgang van) de eigenheid van orale muziekculturen objectief aan te tonen.
In de eerste fase van het onderzoek ligt de focus van het onderzoek op één van de meer tastbare parameters: toonhoogte. In etnische muziek is het gebruik van toonhoogte vaak radicaal anders dan westerse muziek die meestal gebaseerd is op de onderverdeling van een octaaf in twaalf gelijke delen. Om toonladders uit
muziek te extraheren en weer te geven werd het software platform Tarsos ontwikkeld. Met Tarsos is het mogelijk om automatische toonladderanlyse uit te voeren op een grote dataset of om manueel een gedetailleerde analyse te verkrijgen van enkele muziekstukken. De cultuuronafhankelijke analysemethode waarvan Tarsos gebruik maakt kan even goed toegepast worden op Indonesische, Westerse of Afrikaanse muziek.
Onze bedoeling is om Tarsos te gebruiken om evoluties in toonladdergebruik te ontdekken in de enorme dataset van het Koninklijk Museum voor Midden-Afrika. Is toonladderdiversiteit in Afrika aan het wegkwijnen onder invloed van Westerse muziek? Zijn er specifieke kenmerken te vinden over eventueel ‘uitgestorven’ muziekculturen? Dit zijn vragen die kaderen in het overkoepelende onderzoeksproject van Olmo Cornelis en waar we met behulp van Tarsos een antwoord op proberen te vinden.
Later krijgen de twee overige muzikale parameters, tempo en timbre, een gelijkaardige behandeling. In de laatste fase van dit toch wel ambitieuze onderzoekproject wordt de relatie tussen de parameters onderzocht.
There is more to Tarsos then meets te eye. The graphical user interface only exposes some functionality; the API exposes all of Tarsos’ capabilities.
Tarsos is programmed in Java so the API is accessible trough Java and other programming languages targeting the JVM like JRuby, Scala and Groovy. The following examples use the Groovy programming language because I find it the most aesthetically pleasing with regards to interoperability and it gets the job done without getting in your way.
To run the examples a copy of the Tarsos JAR-file needs to be added to the Classpath and the Groovy runtime must be installed correctly. I’ll leave this as an exercise for the reader: godspeed to you, brave soul. Quick protip: placing a copy of the jar in the extensions directory seems to work best, e.g. see important java directories on mac OS X.
The first example extracts pitch class histograms from a bunch of files and saves them as EPS-files. It iterates a directory recursively and handles each file that matches a given regular expression. In this example the regular expression matches all WAV-files. Batch processing is one of those things scripting is ideal for, doing the same thing with the user interface would be tedious or even mind-numbingly boring, not groovy at all indeed.
import be.hogent.tarsos.*
import be.hogent.tarsos.util.*
import be.hogent.tarsos.util.histogram.ToneScaleHistogram
import be.hogent.tarsos.sampled.pitch.Annotation
import be.hogent.tarsos.sampled.pitch.PitchDetectionMode
dir = "/home/joren/audio"FileUtils.glob(dir,".*.wav",true).each { file ->
audioFile = new AudioFile(file)
pitchDetector = PitchDetectionMode.TARSOS_YIN.getPitchDetector(audioFile)
pitchDetector.executePitchDetection()
//get some annotations
annotations = pitchDetector.getAnnotations()
//create an ambitus and tone scale histogram
ambitusHistogram = Annotation.ambitusHistogram(annotations)
toneScaleHisto = ambitusHistogram.toneScaleHistogram()
//plot a smoothed version of the histogram
p = new SimplePlot()
p.addData 0, toneScaleHisto.gaussianSmooth(0.2)
p.save FileUtils.basename( file) + ".eps"
}
The second example uses functionality that is currently only available trough the API. It takes a MIDI-file and synthesizes it to a wave file using an arbitrary scale. In this case 10-TET. The heavy-work is done by the Gervill synthesizer. The resulting file is available for download, micro—macro?—tonal Bach is great: BWV 1013 in 10-TET. The result of an analysis with Tarsos on the synthesized audio clearly shows an interval of 120 cents with some deviations.
import java.io.File
import be.hogent.tarsos.midi.MidiToWavRenderer
import be.hogent.tarsos.util.ScalaFile
midiFile = new File("BWV_1013.mid")
outFile = new File("out.wav")
tuning = [0,120,240,360,480,600,720,840,960,1080] as double []
MidiToWavRenderer renderer
renderer = new MidiToWavRenderer()
renderer.setTuning(tuning)
renderer.createWavFile(midiFile, outFile)
An extended version of this second example script could be used to generate a dataset with audio and corresponding tone scale information on the fly. The dataset could then be used as a baseline.
The API is not yet well documented and is still in flux or more correctly: superflux. Note to self: I will provide documentation and a number of useful examples when the dust settles down. I’m not even sure if I will stick with Groovy. Scala has a nice Lispy feel to it and seems more developed. Groovy has a less steep learning curve, especially if you have some experience with Ruby. JRuby is also nice but the interoperability with legacy Java looks like an ugly hack.
Drag and drop works for scala tone scale files and different kinds of audio files. Audiofiles are transcoded automagically using an embedded ffmpeg binary which is platform dependend. It works on linux and windows, on other platforms only WAV files are supported.
Some of the current features:
Scala file extraction from audio
Real time pitch tracking
Real time pitch class histogram visualization
Alignment of pitch intervals with histogram using mouse dragging
Tarsos can be used to render MIDI files to audio (WAV) files using arbitrary tone scales. This functionallity can be used to (automatically) verify tone scale extraction from audio files. Since I could not find a dataset with audio and corresponding tone scales creating one using MIDI seemed a good idea.
MIDI files can be found in spades, tone scales on the other hand are harder to find. Luckily there is one massive source, the Scala Tone Scale Archive: A large collection of over 3700 tone scales.
Using Scala tone scale files and a midi files a Tone Scale – Audio dataset can be generated. The quality of the audio depends on the (software) synthesizer and the SoundFont used. Tarsos currently uses the Gervill synthesizer. Gervill is a pure Java software synthesizer with support for 24bit SoundFonts and the MIDI tuning standard.
How To Render MIDI Using Arbitrary Tone Scales with Tarsos
A recent version of the JRE needs to be installed on your system if you want to use Tarsos. Tarsos itself can be downloaded in the form of the Tarsos JAR Package.
Currently Tarsos has a Command Line Interface. An example with the files you can find attached:
To summarize: by rendering audio with MIDI and Scala tone scale files a dataset with tone scale – audio information can be generated and tone scale extraction algorithms can be tested on the fly.
This method also has some limitations. Because audio is rendered there is no (background) noise, no fluctuations in pitch and timbre,… all of which are present in recorded audio. So testing testing tone scale extraction algorithms on recorded audio remains advised.
Tarsos can be used to search for music that uses a certain tone scale or tone interval(s). Tone scales can be defined by a Scala tone scale file or an exemplifying audio file. This text explains how you can use Tarsos for this task.
Search Using Scala Tone Scale Files
Scala files are text files with information about a tone scale. It is used to share and exchange tone scales. The file format originates from the Scala program :
Scala is a powerful software tool for experimentation with musical tunings, such as just intonation scales, equal and historical temperaments, microtonal and macrotonal scales, and non-Western scales. It supports scale creation, editing, comparison, analysis, …
Tarsos also understands Scala files. It is able to create a pitch class histogram using a gaussian mixture model. A technique described in A. C. Gedik, B.Bozkurt, 2010, "Pitch Frequency Histogram Based Music Information Retrieval for Turkish Music ", Signal Processing, vol.10, pp.1049-1063. (doi:10.106/j.sigpro.2009.06.017).
An example should make things clear. Lets search for an interval of 300 cents or exactly three semitones. A scala file with this interval is easy to define:
! example.scl
! An example of a tone interval of 300 cents
Tone interval of 300 cents
2
!
9001200.0
The next step is to create a histogram with an interval of 300 cents. In the block diagram this step is called “Peak histogram creation”. The Similarity calculation step expects a list of histograms to compare with the newly defined histogram. Feeding the similarity calculation with the western12ET tone scale and a pentatonic Indonesian Slendro tone scale shows that a 300 cents interval is used in the western tone scale but is not available in the Slendro tone scale.
This example only uses scala files, creating histograms is actually not needed: calculating intervals can be done using the scala file itself. This changes when audio files are compared with each other or with scala files.
Search Using Audio Files
When audio files are fed to the algorithm additional steps need to be taken.
First of all pitch detection is executed on the audio file. Currently two pitch extractors are implemented in pure Java, it is also possible to use an external pitch extractor such as aubio
Using pitch annotations a Pitch Histogram is created.
Peak detection on the Pitch Histogram results in a number of peaks, these should represent the distinct pitch classes used in the musical piece.
With the pitch classes a clean peak histogram is created during the Peak Histogram construction phase.
Finally the Peak histogram is matched with other histograms.
The last two steps are the same for audio files or scala files.
Using real audio files can cause dirty histograms. Determining how many distinct pitch classes are used is no trivial task, even for an expert (human) listener. Tarsos should provide a semi-automatic way of peak extraction: a best guess by an algorithm that can easily be corrected by a user. For the moment Tarsos does not allow manual intervention.
Tarsos
To use tarsos you need a recent java runtime (1.6) and the following command line arguments:
I just finished creating a first release of Tarsos. The release contains several demo applications, some more usefull than other. Tarsos is a work in progress: not all functionality is exposed with the CLI demo applications. The demos should however give a taste of the possibilities. All demo applications follow this pattern:
Today I created a spectrogram application using Tarsos. The application listens to an audio input, computes an FFT and at the same time calculates pitch. The expected pitch is overlaid on the spectrogram. All this happens real-time and is implemented using JAVA.
The JAVA software program we are developing is called Tarsos and can now be found on GitHub. GitHub is a web-based hosting service for projects that use the Git version control system.
Currently Tarsos is a collection of Java classes to create, compare and process pitch-frequency data using histograms. In it’s current state it is not usable for end-users.