0110.be logo

~ Audio Time Stretching - Implementation in Pure Java Using WSOLA

The DSP library for Taros, aptly named TarsosDSP, now includes an implementation of a time stretching algorithm. The goal of time stretching is to change the duration of a piece of audio without affecting the pitch. The algorithm implemented is described in An Overlap-add Technique Based On Waveform Similarity (WSOLA) for High Quality Time-Scale Modification of Speech.

Time Stretching (WSOLA) in Java

To test the application, download and execute the WSOLA jar file and load an audio file. For the moment only 44.1kHz mono wav is allowed. To get started you can try this piece of audio.

There is also a command line interface, the following command doubles the speed of in.wav:

java -jar TimeStretch.jar in.wav out.wav 2.0

 _______                       _____   _____ _____  
|__   __|                     |  __ \ / ____|  __ \ 
   | | __ _ _ __ ___  ___  ___| |  | | (___ | |__) |
   | |/ _` | '__/ __|/ _ \/ __| |  | |\___ \|  ___/ 
   | | (_| | |  \__ \ (_) \__ \ |__| |____) | |     
   |_|\__,_|_|  |___/\___/|___/_____/|_____/|_|     
                                                    
----------------------------------------------------
Name:
	TarsosDSP Time stretch utility.
----------------------------------------------------
Synopsis:
	java -jar TimeStretch.jar source.wav target.wav factor
----------------------------------------------------
Description:
	Change the play back speed of audio without changing the pitch.

		source.wav	A readable, mono wav file.
		target.wav	Target location for the time stretched file.
		factor		Time stretching factor: 2.0 means double the length, 0.5 half. 1.0 is no change.

The source code of the Java implementation of WSOLA can be found on the TarsosDSP github page.


~ Tarsos CLI: Detect Pitch

Tarsos LogoTarsos contains a couple of useful command line applications. They can be used to execute common tasks on lots of files. Dowload Tarsos and call the applications using the following format:

java -jar tarsos.jar command [argument...] [--option [value]...]

The first part java -jar tarsos.jar tells the Java Runtime to start the correct application. The first argument for Tarsos defines the command line application to execute. Depending on the command, required arguments and options can follow.

java -jar tarsos.jar detect_pitch in.wav --detector TARSOS_YIN

To get a list of available commands, type java -jar tarsos.jar -h. If you want more information about a command type java -jar tarsos.jar command -h

Detect Pitch

Detects pitch for one or more input audio files using a pitch detector. If a directory is given it traverses the directory recursively. It writes CSV data to standard out with five columns. The first is the start of the analyzed window (seconds), the second the estimated pitch, the third the saillence of the pitch. The name of the algorithm follows and the last column shows the original filename.

Synopsis
--------
java -jar tarsos.jar detect_pitch [option] input_file...

Option                                  Description                            
------                                  -----------                            
-?, -h, --help                          Show help                              
--detector <PitchDetectionMode>         The detector to use [VAMP_YIN |        
                                          VAMP_YIN_FFT |                       
                                          VAMP_FAST_HARMONIC_COMB |            
                                          VAMP_MAZURKA_PITCH | VAMP_SCHMITT |  
                                          VAMP_SPECTRAL_COMB |                 
                                          VAMP_CONSTANT_Q_200 |                
                                          VAMP_CONSTANT_Q_400 | IPEM_SIX |     
                                          IPEM_ONE | TARSOS_YIN |              
                                          TARSOS_FAST_YIN | TARSOS_MPM |       
                                          TARSOS_FAST_MPM | ] (default:        
                                          TARSOS_YIN) 

The output of the command looks like this:

Start(s),Frequency(Hz),Probability,Source,file
0.52245,366.77039,0.92974,TARSOS_YIN,in.wav
0.54567,372.13873,0.93553,TARSOS_YIN,in.wav
0.55728,375.10638,0.95261,TARSOS_YIN,in.wav
0.56889,380.24854,0.94275,TARSOS_YIN,in.wav

~ A Robust Audio Fingerprinter Based on Pitch Class Histograms - Applications for Ethnic Music Archives

For the Folk Music Analyisis (FMA) 2012 conference we (Olmo Cornelis and myself), wrote a paper presenting a new acoustic fingerprint scheme based on pitch class histograms.

The aim of acoustic fingerprinting is to generate a small representation of an audio signal that can be used to identify or recognize similar audio samples in a large audio set. A robust fingerprint generates similar fingerprints for perceptually similar audio signals. A piece of music with a bit of noise added should generate an almost identical fingerprint as the original. The use cases for audio fingerprinting or acoustic fingerprinting are myriad: detection of duplicates, identifying songs, recognizing copyrighted material,…

Using a pitch class histogram as a fingerprint seems like a good idea: it is unique for a song and it is reasonably robust to changes of the underlying audio (length, tempo, pitch, noise). The idea has probably been found a couple of times independently, but there is also a reference to it in the literature, by Tzanetakis, 2003: Pitch Histograms in Audio and Symbolic Music Information Retrieval:

Although mainly designed for genre classification it is possible that features derived from Pitch Histograms might also be applicable to the problem of content-based audio identification or audio fingerprinting (for an example of such a system see (Allamanche et al., 2001)). We are planning to explore this possibility in the future.

Unfortunately they never, as far as I know, did explore this possibility, and I also do not know if anybody else did. I found it worthwhile to implement a fingerprinting scheme on top of the Tarsos software foundation. Most elements are already available in the Tarsos API: a way to detect pitch, construct a pitch class histogram, correlate pitch class histograms with a pitch shift,… I created a GUI application which is presented here. It is, probably, acoustic / audio fingerprinting system based on pitch class histograms.

Audio fingerprinter based on pitch class histograms

It works using drag and drop and the idea is to find a needle (an audio file) in a hay stack (a large amount of audio files). For every audio file in the haystack and for the needle pitch is detected using an optimized, for speed, MPM implementation. A pitch class histogram is created for each file, the histogram for the needle is compared with each histogram in the hay stack and, hopefully, the needle is found in the hay stack.

An experiment was done on the audio collection of the museum for Central Africa. A test dataset was generated using SoX with the following Ruby script. The raw results were parsed with another Ruby script. With the data a spreadsheet with the results was created (OpenOffice.org format). Those results are mentioned in the paper.

You can try the system yourself by downloading the fingerprinter.


~ Pitch, Pitch Interval, and Pitch Ratio Representation

To prevent confusion about pitch representation in general and pitch representation in Tarsos specifically I wrote a document about pitch, pitch Interval, and pitch ratio representation. The abstract goes as follows:

This document describes how pitch can be represented using various units. More specifically it documents how a software program to analyse pitch in music, Tarsos, represents pitch. This document contains definitions of and remarks on different pitch and pitch interval representations. For good measure we need a definition of pitch, here the definition from [McLeod 2009] is used: The pitch frequency is the frequency of a pure sine wave which has the same perceived sound as the sound of interest. For remarks and examples of cases where the pitch frequency does not coincide with the fundamental frequency of the signal, also see [McLeod 2009] . In this text pitch, pitch interval and pitch ratio are briefly discussed.


~ TarsosDSP sample application: Utter Asterisk

Uttter AsteriskThe DSP library of Tarsos, aptly named TarsosDSP, contains an implementation of a game that bares some resemblance to SingStar. It is called UtterAsterisk. It is meant to be a technical demonstration showing real-time pitch detection in pure java using a YIN -implementation.

Download Utter Asterisk and try to sing (utter) as close to the melody as possible. The souce code for Utter Asterisk is available on github.


~ TarsosDSP used in jAM - Java Automatic Music Transcription

jAM logoTarsosDSP, a small Java DSP library, has been used in a bachelor thesis: Entwicklung eines Systems zur automatischen Notentranskription von monophonischem Audiomaterial by Michael Wager.

The goal of the thesis was to develop an automatic transcription system for monophonic music. You can download the latest version of jAM – Java Automatic Music Transcription.

If you want to use TarsosDSP, please consult the TarsosDSP page on github or read more about TarsosDSP here.


~ Kinderuniversiteit - Muziek onder de microscoop!

Zondag 18 december 2011 gaf ik een workshop voor de Gentse kinderuniversiteit. Het thema van de kinderuniversiteit was Muziek onder de microscoop. De teaser voor de workshop is hier te vinden:

Logo kinderuniversiteitWORKSHOP – Muziek (ont)luisteren op de computer
Is het mogelijk om piano te spelen op een tafel? Kan een computer luisteren naar muziek en er van genieten? Wat is muziek eigenlijk, en hoe werkt geluid?
Tijdens deze workshop worden de voorgaande vragen beantwoord met enkele computerprogramma’s!

Concreet worden enkele componenten van geluid (en bij uitbreiding, muziek) gedemonstreerd met computerprogrammaatjes gemaakt in het conservatorium:

De foto’s hieronder geven een sfeerbeeld.


~ How To: Generate an Audio Fingerprinting Data Set With Sox Audio Effects

A small part of Tarsos has been turned into a audio fingerprinting application. The idea of audio fingerprinting is to create a condensed representation of an audio file. A perceptually similar audio file should generate similar fingerprints. To test how robust a fingerprinting technique is, a data set with audio files that are alike in some way is practical.

SoX – Sound eXchange is a command line utility for sound processing. It can apply audio effects to a sound. Using these effects and a set of unmodified songs an audio fingerprinting data set can be created. To generate such a data set SoX can be used to:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#Trim the first 10 seconds
sox input.wav output.wav trim 10

#speed-up of 10%
sox input.wav output.wav speed 1.10

#change the pitch upwards 100 cents (one semitone)
#without changing the tempo
sox input.wav output.wav pitch 100

#generate white noise with the length of input.wav
sox input.wav noise.wav synth whitenoise
#mix the white noise with the input to generate noisy output
#-v defines how loud the white noise is
sox -m input.wav -v 0.1 noise.wav output.wav

#reverse the audio
sox input.wav output.wav reverse

A ruby script to generate a lot of these files can be found attached.


~ The Power of the Pentatonic Scale

The following video shows Bobby McFerrin demonstrating the power of the pentatonic scale. It is a fascinating demonstration of how quickly a (western) audience of the World Science Festival 2009 adapts to an unusual tone scale:

With Tarsos the scale used in the example can be found. This is the result of a quick analysis: it becomes clear that this, in fact, a pentatonic scale with an unequal octave division. A perfect fifth is present between 255 and 753 cents:

A pentatonic scale, demonstrated by Bobby McFerrin

~ Software for Music Analysis

Friday the second of December I presented a talk about software for music analysis. The aim was to make clear which type of research topics can benefit from measurements by software for music analysis. Different types of digital music representations and examples of software packages were explained.

software for music analysis

Following presentation was used during the talk. (ppt, odp):

To show the different digital representations of music one example (Liebestraum 3 by Liszt) was used in different formats:


Previous blog posts

09-11-2011 ~ Robust Audio Fingerprinting with Tarsos and Pitch Class Histograms

09-11-2011 ~ PeachNote Piano demo at ISMIR 2011

25-10-2011 ~ Tarsos at 'Study Day: Tuning and Temperament - Insitute of Musical Research, London'

25-10-2011 ~ Tarsos presentation at 'ISMIR 2011'

18-10-2011 ~ Tarsos at 'WASPAA 2011'

04-10-2011 ~ Bruikbare software voor muziekanalyse

27-09-2011 ~ Dual-Tone Multi-Frequency (DTMF) Decoding with the Goertzel Algorithm in Java

26-09-2011 ~ PeachNote Piano at the ISMIR 2011 demo session

21-09-2011 ~ Simplify Collaboration on a LaTeX Documents with Dropbox and a Build Server

21-09-2011 ~ The Pidato Experiment: Vibrato on a Digital Piano Using an Arduino