~ Tarsos demonstration
» By Joren on Friday 10 February 2012
We created a video to explain the possibilities of Tarsos, and now in French.
The DSP library for Tarsos, aptly named TarsosDSP, now includes an implementation of a time stretching algorithm. The goal of time stretching is to change the duration of a piece of audio without affecting its pitch. The implemented algorithm is described in An Overlap-Add Technique Based On Waveform Similarity (WSOLA) for High Quality Time-Scale Modification of Speech.
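The core idea of WSOLA is easy to sketch: instead of copying input frames at fixed intervals, each frame is taken from the neighborhood of its nominal position, at the offset where the waveform is most similar to the expected continuation of the previous output frame. The following minimal Java sketch is my own simplification, not the TarsosDSP implementation; the window, overlap, and seek sizes are arbitrary choices:

```java
public class WsolaSketch {
    static final int WINDOW = 1024;  // analysis window length (samples)
    static final int OVERLAP = 512;  // overlap between consecutive windows
    static final int SEEK = 128;     // waveform-similarity search radius

    // Returns the offset in [-SEEK, SEEK] where the input best matches the
    // expected continuation of the previous frame (maximal cross-correlation).
    static int bestOffset(float[] in, int center, float[] prevTail) {
        int best = 0;
        double bestCorr = Double.NEGATIVE_INFINITY;
        for (int off = -SEEK; off <= SEEK; off++) {
            int start = center + off;
            if (start < 0 || start + prevTail.length > in.length) continue;
            double corr = 0;
            for (int i = 0; i < prevTail.length; i++)
                corr += in[start + i] * prevTail[i];
            if (corr > bestCorr) { bestCorr = corr; best = off; }
        }
        return best;
    }

    // Stretches 'in' by 'factor' (2.0 = twice as long) without changing pitch.
    static float[] stretch(float[] in, double factor) {
        int hopOut = WINDOW - OVERLAP;    // synthesis hop
        double hopIn = hopOut / factor;   // analysis hop
        float[] out = new float[(int) (in.length * factor)];
        float[] prevTail = new float[OVERLAP]; // expected continuation
        int outPos = 0;
        double inPos = SEEK; // leave room for the similarity search
        while (inPos + WINDOW + SEEK < in.length && outPos + WINDOW < out.length) {
            int start = (int) inPos + bestOffset(in, (int) inPos, prevTail);
            // cross-fade the overlap region, copy the rest of the frame
            for (int i = 0; i < WINDOW; i++) {
                if (i < OVERLAP) {
                    float fadeIn = i / (float) OVERLAP;
                    out[outPos + i] = out[outPos + i] * (1 - fadeIn) + in[start + i] * fadeIn;
                } else {
                    out[outPos + i] = in[start + i];
                }
            }
            // remember how the signal would continue after this frame
            System.arraycopy(in, start + hopOut, prevTail, 0, OVERLAP);
            outPos += hopOut;
            inPos += hopIn;
        }
        return out;
    }
}
```

The similarity search is what distinguishes WSOLA from a plain overlap-add: it avoids the phase discontinuities a fixed hop would introduce.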
To test the application, download and execute the “WSOLA jar”:[TimeStretch.jar] file and load an audio file. For the moment only 44.1kHz mono wav files are supported. To get started you can try “this piece of audio”:[08._Ladrang_Kandamanyura_10s-20s.wav].
There is also a command line interface. The following command doubles the length (halves the playback speed) of in.wav:
java -jar TimeStretch.jar in.wav out.wav 2.0
_______ _____ _____ _____
|__ __| | __ \ / ____| __ \
| | __ _ _ __ ___ ___ ___| | | | (___ | |__) |
| |/ _` | '__/ __|/ _ \/ __| | | |\___ \| ___/
| | (_| | | \__ \ (_) \__ \ |__| |____) | |
|_|\__,_|_| |___/\___/|___/_____/|_____/|_|
----------------------------------------------------
Name:
TarsosDSP Time stretch utility.
----------------------------------------------------
Synopsis:
java -jar TimeStretch.jar source.wav target.wav factor
----------------------------------------------------
Description:
Change the play back speed of audio without changing the pitch.
source.wav A readable, mono wav file.
target.wav Target location for the time stretched file.
factor Time stretching factor: 2.0 means double the length, 0.5 half. 1.0 is no change.
The source code of the Java implementation of WSOLA can be found on the TarsosDSP GitHub page.
Tarsos contains a couple of useful command line applications. They can be used to execute common tasks on many files. Download Tarsos and call the applications using the following format:
java -jar tarsos.jar command [argument...] [--option [value]...]
The first part java -jar tarsos.jar tells the Java Runtime to start the correct application. The first argument for Tarsos defines the command line application to execute. Depending on the command, required arguments and options can follow.
java -jar tarsos.jar detect_pitch in.wav --detector TARSOS_YIN
To get a list of available commands, type java -jar tarsos.jar -h. If you want more information about a command, type java -jar tarsos.jar command -h.
Detects pitch for one or more input audio files using a pitch detector. If a directory is given it traverses the directory recursively. It writes CSV data to standard out with five columns. The first column is the start of the analyzed window (in seconds), the second the estimated pitch (in Hz), the third the salience of the pitch estimate. The name of the detection algorithm follows, and the last column shows the original filename.
Synopsis
--------
java -jar tarsos.jar detect_pitch [option] input_file...
Option Description
------ -----------
-?, -h, --help Show help
--detector <PitchDetectionMode> The detector to use [VAMP_YIN |
VAMP_YIN_FFT |
VAMP_FAST_HARMONIC_COMB |
VAMP_MAZURKA_PITCH | VAMP_SCHMITT |
VAMP_SPECTRAL_COMB |
VAMP_CONSTANT_Q_200 |
VAMP_CONSTANT_Q_400 | IPEM_SIX |
IPEM_ONE | TARSOS_YIN |
TARSOS_FAST_YIN | TARSOS_MPM |
TARSOS_FAST_MPM | ] (default:
TARSOS_YIN)
The output of the command looks like this:
Start(s),Frequency(Hz),Probability,Source,file
0.52245,366.77039,0.92974,TARSOS_YIN,in.wav
0.54567,372.13873,0.93553,TARSOS_YIN,in.wav
0.55728,375.10638,0.95261,TARSOS_YIN,in.wav
0.56889,380.24854,0.94275,TARSOS_YIN,in.wav
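The CSV output above is easy to consume programmatically. Below is a minimal Java sketch that parses one line of detect_pitch output; the class and field names are my own and not part of the Tarsos API:

```java
public class PitchAnnotation {
    final double start;        // start of the analyzed window, in seconds
    final double frequency;    // estimated pitch, in Hz
    final double probability;  // salience of the pitch estimate
    final String detector;     // name of the pitch detection algorithm
    final String file;         // original file name

    PitchAnnotation(double start, double frequency, double probability,
                    String detector, String file) {
        this.start = start;
        this.frequency = frequency;
        this.probability = probability;
        this.detector = detector;
        this.file = file;
    }

    // Parses one CSV data line as produced by 'detect_pitch'.
    static PitchAnnotation parse(String line) {
        String[] c = line.split(",");
        return new PitchAnnotation(Double.parseDouble(c[0]),
                Double.parseDouble(c[1]), Double.parseDouble(c[2]), c[3], c[4]);
    }
}
```

Skipping the header line and feeding each remaining line to parse gives a time-ordered list of pitch estimates ready for further analysis.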
For the Folk Music Analysis (FMA) 2012 conference we (Olmo Cornelis and myself) wrote a paper presenting a new acoustic fingerprint scheme based on pitch class histograms.
The aim of acoustic fingerprinting is to generate a small representation of an audio signal that can be used to identify or recognize similar audio samples in a large audio set. A robust fingerprint generates similar fingerprints for perceptually similar audio signals. A piece of music with a bit of noise added should generate an almost identical fingerprint as the original. The use cases for audio fingerprinting or acoustic fingerprinting are myriad: detection of duplicates, identifying songs, recognizing copyrighted material,…
Using a pitch class histogram as a fingerprint seems like a good idea: it is unique for a song and it is reasonably robust to changes of the underlying audio (length, tempo, pitch, noise). The idea has probably been found a couple of times independently, but there is also a reference to it in the literature, by Tzanetakis, 2003: Pitch Histograms in Audio and Symbolic Music Information Retrieval:
Although mainly designed for genre classification it is possible that features derived from Pitch Histograms might also be applicable to the problem of content-based audio identification or audio fingerprinting (for an example of such a system see (Allamanche et al., 2001)). We are planning to explore this possibility in the future.
Unfortunately they never, as far as I know, did explore this possibility, and I also do not know if anybody else did. I found it worthwhile to implement a fingerprinting scheme on top of the Tarsos software foundation. Most elements are already available in the Tarsos API: a way to detect pitch, construct a pitch class histogram, correlate pitch class histograms with a pitch shift,… I created a GUI application which is presented here. It is, probably, the first open source acoustic / “audio fingerprinting system based on pitch class histograms”:[fingerprinter.jar].
It works using drag and drop, and the idea is to find a needle (an audio file) in a haystack (a large set of audio files). For every audio file in the haystack, and for the needle, pitch is detected using an MPM implementation optimized for speed. A pitch class histogram is created for each file; the histogram for the needle is compared with each histogram in the haystack and, hopefully, the needle is found in the haystack.
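Comparing pitch class histograms under a pitch shift boils down to a circular cross-correlation: the needle's histogram is scored against a haystack histogram at every possible circular shift, and the best score wins. The sketch below is my own simplification of that idea, not the actual Tarsos API:

```java
public class HistogramMatcher {
    // Similarity between two histograms at a given circular shift:
    // the dot product of 'a' with 'b' rotated by 'shift' bins.
    static double correlation(double[] a, double[] b, int shift) {
        double sum = 0;
        for (int i = 0; i < a.length; i++)
            sum += a[i] * b[(i + shift) % b.length];
        return sum;
    }

    // Finds the circular shift (in histogram bins) that maximizes the
    // correlation, which makes the comparison robust against pitch shifts
    // of the underlying audio.
    static int bestShift(double[] a, double[] b) {
        int best = 0;
        double bestCorr = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < b.length; s++) {
            double c = correlation(a, b, s);
            if (c > bestCorr) { bestCorr = c; best = s; }
        }
        return best;
    }
}
```

The maximum correlation over all shifts serves as the match score; taking the maximum over shifts is exactly what makes a transposed (pitch-shifted) copy still match its original.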
An experiment was done on the audio collection of the Museum for Central Africa. A test dataset was generated using SoX with the following “Ruby script”:[audio_fingerprinting_dataset_generator.rb.txt]. The “raw results”:[fingerprinting_results.txt] were parsed with another “Ruby script”:[fingerprinting_results_parser.rb.txt]. With that data “a spreadsheet with the results”:[fingerprinting_on_dekkmma_results.ods] was created (OpenOffice.org format). Those results are mentioned in the paper.
You can try the system yourself by “downloading the fingerprinter”:[fingerprinter.jar].
To prevent confusion about pitch representation in general, and pitch representation in Tarsos specifically, I wrote a “document about pitch, pitch interval, and pitch ratio representation”:[pitch_representation.pdf]. The abstract goes as follows:
This document describes how pitch can be represented using various units. More specifically it documents how a software program to analyse pitch in music, Tarsos, represents pitch. This document contains definitions of and remarks on different pitch and pitch interval representations. For good measure we need a definition of pitch, here the definition from \[McLeod 2009\] is used: *The pitch frequency is the frequency of a pure sine wave which has the same perceived sound as the sound of interest.* For remarks and examples of cases where the pitch frequency does not coincide with the fundamental frequency of the signal, also see \[McLeod 2009\] . In this text pitch, pitch interval and pitch ratio are briefly discussed.
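One of the representations covered can be made concrete with a small code example: expressing the interval between two frequencies in cents, where 1200 cents span one octave. The helper below is my own sketch, not the actual Tarsos utility class:

```java
public class PitchConverter {
    // Interval between two frequencies f1 and f2, in cents.
    // 1200 cents = one octave; positive when f2 is higher than f1.
    static double hertzToCentInterval(double f1, double f2) {
        return 1200.0 * Math.log(f2 / f1) / Math.log(2.0);
    }
}
```

For example, the octave from 220 Hz to 440 Hz is 1200 cents, and an equal-tempered semitone is exactly 100 cents; the logarithmic unit makes intervals additive, which is what makes cents convenient for comparing tone scales.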
The DSP library of Tarsos, aptly named TarsosDSP, contains an implementation of a game that bears some resemblance to SingStar. It is called UtterAsterisk. It is meant as a technical demonstration showing real-time pitch detection in pure Java using a YIN implementation.
“Download Utter Asterisk”:[UtterAsterisk.jar] and try to sing (utter) as close to the melody as possible. The source code for Utter Asterisk is available on GitHub.
TarsosDSP, a small Java DSP library, has been used in a bachelor's thesis: Entwicklung eines Systems zur automatischen Notentranskription von monophonischem Audiomaterial (development of a system for automatic note transcription of monophonic audio material) by Michael Wager.
The goal of the thesis was to develop an automatic transcription system for monophonic music. You can download the latest version of jAM - Java Automatic Music Transcription.
If you want to use TarsosDSP, please consult the TarsosDSP page on github or read more about TarsosDSP here.
On Sunday 18 December 2011 I gave a workshop for the Ghent children's university. The theme of the children's university was Music under the Microscope. The teaser for the workshop can be found here:
WORKSHOP - Taking music apart on the computer
Is it possible to play the piano on a table? Can a computer listen to music and enjoy it? What exactly is music, and how does sound work?
During this workshop these questions are answered with a couple of computer programs!
Concretely, some components of sound (and, by extension, music) are demonstrated with small computer programs made at the conservatory:
“Loudness”:[SoundDetector.jar]: a decibel meter with a certain threshold. Try to make as much noise as possible and see how hard it is, once a certain level is reached, to rise further in decibels.
“Pitch”:[UtterAsterisk.jar]: a small game to demonstrate pitch. Try to sing or whistle as accurately as possible and compare your score.
“Percussion”:[PercussionDetector.jar]: this program reacts to hand claps. How can you tell the difference between, for example, a whistled tone and a hand clap?
The photos below give an impression of the workshop.
A small part of Tarsos has been turned into an audio fingerprinting application. The idea of audio fingerprinting is to create a condensed representation of an audio file: a perceptually similar audio file should generate a similar fingerprint. To test how robust a fingerprinting technique is, a data set with audio files that are alike in some way is practical.
SoX - Sound eXchange is a command line utility for sound processing. It can apply audio effects to a sound. Using these effects and a set of unmodified songs an audio fingerprinting data set can be created. To generate such a data set SoX can be used to:
Trim the first x seconds of a file
Speed-up or slow-down the audio
Change the pitch of a file without modifying the tempo
Generate background noise (white noise is used)
Reverse the audio stream
```bash
#trim the first 10 seconds
sox input.wav output.wav trim 10
#speed up the audio by 10%
sox input.wav output.wav speed 1.10
#change the pitch (in cents)
#without changing the tempo
sox input.wav output.wav pitch 100
#generate white noise with the length of the input
sox input.wav noise.wav synth whitenoise
#mix the white noise with the input to generate noisy output
#-v defines how loud the white noise is
sox -m input.wav -v 0.1 noise.wav output.wav
#reverse the audio
sox input.wav output.wav reverse
```
A ruby script to generate a lot of these files can be found “attached”:[audio_fingerprinting_dataset_generator.rb.txt].
The following video shows Bobby McFerrin demonstrating the power of the pentatonic scale. It is a fascinating demonstration of how quickly a (western) audience of the World Science Festival 2009 adapts to an unusual tone scale:
With Tarsos the scale used in the example can be found. This is the result of a quick analysis: it becomes clear that this is, in fact, a pentatonic scale with an unequal octave division. A perfect fifth is present between 255 and 753 cents:
Friday the second of December I presented a talk about software for music analysis. The aim was to make clear which type of research topics can benefit from measurements by software for music analysis. Different types of digital music representations and examples of software packages were explained.
The following presentation was used during the talk (“ppt”:[2011.12.02.software_for_music_analysis.ppt], “odp”:[2011.12.02.software_for_music_analysis.odp]):
Sonic Visualiser: As its name suggests, Sonic Visualiser contains a lot of different visualisations for audio. It can be used for analysis (pitch, beat, chroma, …) with VAMP plugins. To quote the project: “The aim of Sonic Visualiser is to be the first program you reach for when you want to study a musical recording rather than simply listen to it”. It is the Swiss army knife of audio analysis.
BeatRoot is designed specifically for one goal: beat tracking. It can be used, for example, to compare tempi of different performances of the same piece or to track tempo deviation within one piece.
Tartini is capable of real-time pitch analysis of sound. You can, for example, play into a microphone with a violin, see the harmonics you produce, and adapt your playing style based on visual feedback. It also contains a pitch deviation measuring apparatus to analyse vibrato.
Tarsos is software for tone scale analysis. It is useful to extract tone scales from audio. Different tuning systems can be seen, extracted and compared. It also contains the ability to play along with the original song with a tuned MIDI keyboard.
To show the different digital representations of music one example (Liebestraum 3 by Liszt) was used in different formats:
“Score (PDF)”:[00.partituur.liebestraum_3.pdf]
“MusicXML”:[01.musicXML-liebestraum_no_3.xml]
“MIDI as notation”:[01.deadpan_midi.wav]
“MIDI as performance”:[02.performed_midi.wav]
“Acoustic performance”:[03.human.performance.wav]
The aim of acoustic fingerprinting is to generate a small representation of an audio signal that can be used to identify or recognize similar audio samples in a large audio set. A robust fingerprint generates similar fingerprints for perceptually similar audio signals. A piece of music with a bit of noise added should generate an almost identical fingerprint as the original. The use cases for audio fingerprinting or acoustic fingerprinting are myriad: detection of duplicates, identifying songs, recognizing copyrighted material,…
Using a pitch class histogram as a fingerprint seems like a good idea: it is unique for a song and it is reasonably robust to changes of the underlying audio (length, tempo, pitch, noise). The idea has probably been found a couple of times independently, but there is also a reference to it in the literature, by Tzanetakis, 2003: Pitch Histograms in Audio and Symbolic Music Information Retrieval:
Although mainly designed for genre classification it is possible that features derived from Pitch Histograms might also be applicable to the problem of content-based audio identification or audio fingerprinting (for an example of such a system see (Allamanche et al., 2001)). We are planning to explore this possibility in the future.
Unfortunately they never, as far as I know, did explore this possibility, and I also do not know if anybody else did. I found it worthwhile to implement a fingerprinting scheme on top of the Tarsos software foundation. Most elements are already available in the Tarsos API: a way to detect pitch, construct a pitch class histogram, correlate pitch class histograms with a pitch shift,… I created a GUI application which is presented here. It is, probably, the first open source acoustic / “audio fingerprinting system based on pitch class histograms”:[AudioFingerprinter.jar].
It works using drag and drop, and the idea is to find a needle (an audio file) in a haystack (a large set of audio files). For every audio file in the haystack, and for the needle, pitch is detected using a Yin implementation optimized for speed. A pitch class histogram is created for each file; the histogram for the needle is compared with each histogram in the haystack and, hopefully, the needle is found in the haystack.
Unfortunately I do not have time for rigorous testing (by building a large acoustic fingerprinting data set, or another decent test bench) but the idea seems to work. With the following modifications, done with Audacity effects, the needle was still found in a haystack of 836 files:
A 10% speedup
15 and 30 seconds removed from the needle (a song of 4 minutes 12 seconds)
White noise added
Reversed the audio (This is, I believe, a rather unique property of this fingerprinting technique)
GSM reencoded
The following modifications failed to identify the correct song:
A one semitone pitch shift
A two semitone pitch shift
60 seconds removed from the needle
The original was also found. No failure analysis was done. The haystack consists of about 100 hours of western pop; the needle is also a western pop song. If somebody wants to pick up this work, or has an acoustic fingerprinting data set, drop me a line.
The source code is available, as always, on the Tarsos GitHub page.
The 21st of October a demo of PeachNote Piano was given at the ISMIR (International Society for Music Information Retrieval) 2011 conference. The demo raised some interest.
The extended abstract about PeachNote Piano can be found on the ISMIR 2011 schedule.
A previous post about PeachNote Piano has more technical details together with a video showing the core functionality (quasi-instantaneous USB-BlueTooth-MIDI communication).
On the 17th of October 2011 Tarsos was presented at the Study Day: Tuning and Temperament, which was held at the Institute of Musical Research in London. The study day was organised by Dan Tidhar. A short description of the aim of the study day:
This is an interdisciplinary study day, bringing together musicologists, harpsichord specialists, and digital music specialists, with the aim of exploring the different angles these fields provide on the subject, and how these can be fruitfully interconnected. We offer an optional introduction to temperament for non specialists, to equip all potential listeners with the basic concepts and terminology used throughout the day.
Olmo Cornelis and I just gave a presentation about Tarsos at the 12th International Society for Music Information Retrieval Conference, which is being held in Miami.
The live demo we gave went well and we got a lot of positive, interesting feedback. The presentation about Tarsos is available here.
It was the first time in the history of ISMIR that there was a session with oral presentations about Non-Western Music. We were pleased to be part of this.
The peer reviewed paper about our work, Tarsos - a Platform to Explore Pitch Scales in Non-Western and Western Music, is available from the ISMIR website.
On the 18th of October 2011, during the demo session of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), a demonstration of Tarsos was given and feedback was gathered.
During the conference I met interesting people and learned about their work:
Carnatic Music Analysis: Shadja, Swara Identification and Raga Verification in Alapana using Stochastic Models
by Ranjani HG, Arthi S, Sreenivas TV
Simulation of the Violin Section Sound based on the analysis of orchestra performance
by Jukka Pätynen, Sakari Tervo, Tapio Lokki
Another interesting paper is Informed Source Separation: Source Coding Meets Source Separation. A demo of this can be found here.
On Tuesday 4 October 2011 a lecture was given on useful software for music analysis. The goal was to make clear which types of research questions in bachelor's and master's projects can benefit from objective measurements with software for sound analysis. The approach was also discussed: different types of digital representations of music were covered, with examples of software applications.
The following slides were used during the lecture (“ppt”:[2011.10.04.bruikbare_software_voor_muziekanalyse.ppt], “odp”:[2011.10.04.bruikbare_software_voor_muziekanalyse.odp]):
The software covered for treating sound as a signal was discussed earlier:
* [Sonic Visualiser](http://www.sonicvisualiser.org): As its name suggests, Sonic Visualiser contains a lot of different visualisations for audio. It can be used for analysis (pitch, beat, chroma, ...) with [VAMP plugins](http://vamp-plugins.org). To quote *"The aim of Sonic Visualiser is to be the first program you reach for when you want to study a musical recording rather than simply listen to it"*. It is the Swiss army knife of audio analysis.
* [BeatRoot](http://www.eecs.qmul.ac.uk/~simond/beatroot/) is designed specifically for one goal: beat tracking. It can be used, for example, to compare tempi of different performances of the same piece or to track tempo deviation within one piece.
* [Tartini](http://tartini.net) is capable of real-time pitch analysis of sound. You can, for example, play into a microphone with a violin, see the harmonics you produce, and adapt your playing style based on visual feedback. It also contains a pitch deviation measuring apparatus to analyse vibrato.
* [Tarsos](http://tarsos.0110.be) is software for tone scale analysis. It is useful to extract tone scales from audio. Different tuning systems can be seen, extracted and compared. It also contains the ability to play along with the original song with a tuned MIDI keyboard.
* [music21](http://mit.edu/music21/), from their website: "music21 is a set of tools for helping scholars and other active listeners answer questions about music quickly and simply. If you've ever asked yourself a question like, 'I wonder how often Bach does that' or 'I wish I knew which band was the first to use these chords in this order,' or 'I'll bet we'd know more about Renaissance counterpoint (or Indian ragas or post-tonal pitch structures or the form of minuets) if I could write a program to automatically write more of them,' then music21 can help you with your work."
To indicate which digital representations contain which information, a piece by Franz Liszt was used in different formats:
“Score (PDF)”:[00.partituur.liebestraum_3.pdf]
“MusicXML”:[01.musicXML-liebestraum_no_3.xml]
“MIDI as notation”:[01.deadpan_midi.wav]
“MIDI as performance”:[02.performed_midi.wav]
“Acoustic performance”:[03.human.performance.wav]
The DSP library of Tarsos, aptly named TarsosDSP, now contains an implementation of the Goertzel Algorithm. It is implemented using pure Java.
The Goertzel algorithm can be used to detect whether one or more predefined frequencies are present in a signal, and it does this very efficiently. One of the classic applications of the Goertzel algorithm is decoding the tones generated by touch-tone telephones, which use DTMF (dual-tone multi-frequency) signaling.
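The algorithm itself fits in a few lines. The sketch below is a textbook formulation (not necessarily identical to the TarsosDSP code): it computes the power of a single target frequency in one pass over the samples, which is cheaper than a full FFT when only a handful of frequencies, such as the eight DTMF tones, are of interest:

```java
public class Goertzel {
    // Returns the squared magnitude (power) of 'frequency' (in Hz)
    // in the given buffer of samples, recorded at 'sampleRate' Hz.
    static double power(float[] samples, double frequency, double sampleRate) {
        double omega = 2 * Math.PI * frequency / sampleRate;
        double coeff = 2 * Math.cos(omega);
        double s1 = 0, s2 = 0; // the two-sample state of the IIR filter
        for (float sample : samples) {
            double s0 = sample + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        // standard Goertzel power computation from the final filter state
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }
}
```

For DTMF decoding one would evaluate power for each of the eight DTMF frequencies and pick the strongest row tone and the strongest column tone; a block size of 205 samples at 8 kHz is the classic choice because the DTMF frequencies then fall close to bin centers.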
To make the algorithm visually appealing a Java Swing interface has been created (visible on the right). You can try this application by running the “Goertzel DTMF Jar-file”:[GoertzelDTMF.jar]. The source code is included in the jar and is available as a separate “zip file”:[GoertzelDTMF_src.zip]. The TarsosDSP GitHub page also contains the source for the Goertzel algorithm Java implementation.
The extended abstract about PeachNote Piano has been accepted as a demonstration presentation to appear at the ISMIR (International Society for Music Information Retrieval) 2011 conference in Miami. To know more about PeachNote Piano come see us at our demo stand (during the Late Breaking and Demo Session) or read the paper: “Peachnote Piano: Making MIDI instruments social and smart using Arduino, Android and Node.js”:[PeachNote_Piano_ISMIR_Demo.pdf]. What follows here is the introduction of the extended abstract:
Playing music instruments can bring a lot of joy and satisfaction, but not all aspects of music practice are always enjoyable. In this contribution we are addressing two such sometimes unwelcome aspects: the solitude of practicing and the "dumbness" of instruments. The process of practicing and mastering of music instruments often takes place behind closed doors. A student of piano spends most of her time alone with the piano. Sounds of her playing get lost, and she can't always get feedback from friends, teachers, or, most importantly, random Internet users. Analysing her practicing sessions is also not easy. The technical possibility to record herself and put the recordings online is there, but the needed effort is relatively high, and so one does it only occasionally, if at all. Instruments themselves usually do not exhibit any signs of intelligence. They are practically mechanic devices, even when implemented digitally. Usually they react only to direct actions of a player, and the player is solely responsible for the music coming out of the instrument and its quality. There is no middle ground between passive listening to music recordings and active music making for someone who is alone with an instrument. We have built a prototype of a system that strives to offer a practical solution to the above problems for digital pianos. From the ground up, we have built a system which is capable of transmitting MIDI data from a MIDI instrument to a web service and back, exposing it in real-time to the world and optionally enriching it.
A previous post about PeachNote Piano has more technical details together with a video showing the core functionality (quasi-instantaneous USB-BlueTooth-MIDI communication). Some photos can be found below.
While working on a LaTeX document with several collaborators, some problems arise:
Who has the latest version of the TeX-files?
Which LaTeX distributions are in use (MiKTeX, TeX Live, …)?
Are all LaTeX packages correctly installed on each computer?
Why is the bibliography, generated with BiBTeX, not included or incomplete?
What does the final PDF look like when it is built by one of the collaborators, with a different LaTeX distribution?
Especially installing and maintaining LaTeX distributions on different platforms (Mac OS X, Linux, Windows) in combination with a lot of LaTeX packages can be challenging. This blog post presents a way to deal with these problems.
The solution proposed here uses a build server. The server is responsible for compiling the LaTeX source files and creating a PDF file when the source files are modified. The source files on the server should be in sync with the latest versions of the collaborators, and the new PDF file should be distributed as well. The syncing and distribution of files is done using a Dropbox install. Each author installs a Dropbox share (available on all platforms) which is also installed on the server. When an author modifies a file, the change is propagated to the server, which, in turn, builds a PDF and sends the resulting file back. This has the following advantages:
Everyone always has the latest version of files;
Only one LaTeX install needs to be maintained (on the server);
The PDF is the same for each collaborator;
You can modify files on every platform with Dropbox support (Linux, Mac OS X, Windows) and even smartphones;
Compiling a large LaTeX file can be computationally intensive, a good task for a potentially beefy server.
The implementation is done with a couple of bash scripts running on Ubuntu Linux. LaTeX compilation is handled by the TeX Live distribution. The first script, compile.bash, handles compilation in multiple stages: the cross-referencing and BibTeX bibliography need a couple of runs to get everything right.
```bash
#!/bin/bash
#first iteration: generate aux file
pdflatex -interaction=nonstopmode --src-specials article.tex
#run bibtex on the aux file
bibtex article.aux
#second iteration: include bibliography
pdflatex -interaction=nonstopmode --src-specials article.tex
#third iteration: fix references
pdflatex -interaction=nonstopmode --src-specials article.tex
#remove unused files
rm article.aux article.bbl article.blg article.out
```
The second script, watcher.bash, is more interesting. It watches the Dropbox directory for changes (only in .tex files) using the efficient inotify library. If a modification is detected, the compile script (above) is executed.
```bash
#!/bin/bash
directory=/home/user/Dropbox/article/
#recursively watch the directory
while inotifywait -r $directory; do
  #find all files changed in the last minute that match tex
  #if there are matches then do something
  if find $directory -mmin -1 | grep tex; then
    #tex files changed => recompile
    echo "Tex file changed... compiling"
    /bin/bash $directory/compile.bash
    #sleep a minute to prevent a recompilation loop
    sleep 60
  fi
done
```
To summarize: a user-friendly way of collaborating on LaTeX documents was presented. Some server-side configuration needs to be done, but the clients only need Dropbox and a simple text editor and can start working together.