Software
Below you can find links to the open source software I developed during my research. It is always nice to hear how this software is used, so don’t hesitate to drop me a line. Bug reports are welcome as well.
TarsosDSP: a Java Library for Audio Processing
TarsosDSP is a Java library for audio processing. Its aim is to provide an easy-to-use interface to practical music processing algorithms implemented, as simply as possible, in pure Java and without any external dependencies. TarsosDSP features an implementation of a percussion onset detector and a number of pitch detection algorithms: YIN, the McLeod Pitch Method and a “Dynamic Wavelet Algorithm Pitch Tracking” algorithm. Also included are a Goertzel DTMF decoding algorithm, a time stretch algorithm (WSOLA), resampling, filters, simple synthesis, some audio effects, and a pitch shifting algorithm.
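To give an idea of what Goertzel-based DTMF decoding involves, here is a minimal, self-contained sketch. It does not use the TarsosDSP API: the class name `GoertzelSketch` and all of its methods are hypothetical and only illustrate the technique. It assumes the block contains exactly one key press and skips the silence and threshold checks a practical decoder needs.

```java
class GoertzelSketch {
    // Standard DTMF row and column frequencies in Hz, and the keypad they index.
    static final double[] ROWS = {697, 770, 852, 941};
    static final double[] COLS = {1209, 1336, 1477, 1633};
    static final char[][] KEYS = {
        {'1', '2', '3', 'A'},
        {'4', '5', '6', 'B'},
        {'7', '8', '9', 'C'},
        {'*', '0', '#', 'D'}
    };

    /** Signal power at targetHz, computed with the Goertzel recurrence. */
    static double power(float[] samples, double targetHz, double sampleRate) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * targetHz / sampleRate);
        double s1 = 0.0, s2 = 0.0;
        for (float x : samples) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /** Index of the candidate frequency that carries the most power. */
    static int strongest(float[] samples, double[] candidatesHz, double sampleRate) {
        int best = 0;
        double bestPower = -1.0;
        for (int i = 0; i < candidatesHz.length; i++) {
            double p = power(samples, candidatesHz[i], sampleRate);
            if (p > bestPower) { bestPower = p; best = i; }
        }
        return best;
    }

    /** Decodes one block, assuming it holds exactly one DTMF key (no silence check). */
    static char decode(float[] samples, double sampleRate) {
        return KEYS[strongest(samples, ROWS, sampleRate)][strongest(samples, COLS, sampleRate)];
    }

    /** Synthesizes a two-tone test signal. */
    static float[] tone(double hz1, double hz2, double sampleRate, int length) {
        float[] out = new float[length];
        for (int i = 0; i < length; i++) {
            double t = i / sampleRate;
            out[i] = (float) (0.5 * Math.sin(2 * Math.PI * hz1 * t)
                            + 0.5 * Math.sin(2 * Math.PI * hz2 * t));
        }
        return out;
    }

    public static void main(String[] args) {
        float[] block = tone(770, 1336, 8000, 800); // the tone pair for key '5'
        System.out.println(decode(block, 8000));
    }
}
```

The Goertzel recurrence evaluates the power at a handful of known frequencies without computing a full FFT, which is why it suits DTMF: only eight frequencies matter.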
To show the capabilities of the library, TarsosDSP example applications are available. Head over to the TarsosDSP release directory for downloads.
Download TarsosDSP | Fork TarsosDSP on Github | Consult the API documentation | Read the manual | TarsosDSP article
Feature extraction with TarsosDSP: Constant-Q, beat-detection, onset-detection, pitch estimation
Panako: an Acoustic Fingerprinting System
Panako is an extendable acoustic fingerprinting framework. The aim of acoustic fingerprinting is to find small audio fragments in large audio databases. Panako contains several acoustic fingerprinting algorithms to make comparison between them easy. The main Panako algorithm uses key points in a Constant-Q spectrogram as a fingerprint, so that audio can still be matched after pitch-shifting, time-stretching and speed modification. The aim of Panako is to serve as a platform for research on acoustic fingerprinting systems.
Panako fingerprint with modifications.
Web applications or libraries
Thanks to WebAssembly it is possible to repurpose software for use on the web, even if it was originally designed with other uses in mind. I have developed some practical tools and proofs of concept by repackaging and extending existing software for use on the web. The main advantage is the ease of use: software is accessible to anyone with a browser, without having to install anything else. Another advantage should be longevity: web standards have a tendency to be supported for a long time.
I have developed the following web-based proofs of concept. Each links to a blog post explaining it further:
- SyncSink.wasm – Synchronize media files by audio-to-audio alignment. It is able to synchronize video and audio recordings of the same event by using audio-to-audio alignment.
- ffmpeg.audio.wasm – An audio-focused ffmpeg build for the web. Since ffmpeg handles almost any audio format, this software enables developers to build web applications that support even the most diverse audio formats (and containers, sample rates, sample formats, …).
- ltc.wasm – SMPTE decoding in the browser. It does exactly that: decode SMPTE timecode from audio. It is practical for synchronisation purposes.
- pffft.wasm – An FFT library for the web. The FFT is a fundamental building block of many audio applications; pffft.wasm provides a fast FFT calculation by employing SIMD.
- Emotopa – Patterns in Pitch Organization. Easily discover how pitch is used in almost any type of music.
- Gabber – Visualizing constant-Q transform in the browser. A fine-grained spectral transform to demonstrate properties of musical audio.
- Olaf – Acoustic fingerprinting in the browser. A demonstration of acoustic fingerprinting in the browser.
Example of an audio-to-audio alignment in the browser.
Olaf: Overly Lightweight Acoustic Fingerprinting
Olaf is a portable, landmark-based, acoustic fingerprinting system released as open source software. Olaf runs on embedded platforms, traditional computers and in the browser. Olaf is able to extract fingerprints from an audio stream, and either store those fingerprints in a database, or find a match between extracted fingerprints and stored fingerprints. It implements an algorithm similar to the one described in a classic ISMIR paper and has similar retrieval performance. It facilitates the many use cases acoustic fingerprinting has to offer such as duplicate detection, meta-data coupling, and synchronization.
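As a rough illustration of what "landmark-based" means, the sketch below pairs spectral peaks and packs each pair into an integer hash. It is not Olaf's code: the class `LandmarkSketch`, the `Peak` and `Fingerprint` types, the bit layout and the `MAX_DELTA` limit are all assumptions chosen for the example, and peak picking from the spectrogram is left out.

```java
import java.util.ArrayList;
import java.util.List;

class LandmarkSketch {
    // Assumed limit on the frame distance inside a pair, so it fits in 6 bits.
    static final int MAX_DELTA = 63;

    /** A spectral peak: analysis frame index and frequency bin (hypothetical type). */
    static final class Peak {
        final int frame, bin;
        Peak(int frame, int bin) { this.frame = frame; this.bin = bin; }
    }

    /** A fingerprint: the hash of a peak pair plus the frame of its first peak. */
    static final class Fingerprint {
        final int hash, frame;
        Fingerprint(int hash, int frame) { this.hash = hash; this.frame = frame; }
    }

    /** Packs (bin1, bin2, frame delta) into 9 + 9 + 6 bits; the layout is an assumption. */
    static int hash(Peak first, Peak second) {
        int delta = second.frame - first.frame;
        return ((first.bin & 0x1FF) << 15) | ((second.bin & 0x1FF) << 6) | (delta & 0x3F);
    }

    /** Pairs every peak with the later peaks at most MAX_DELTA frames away.
     *  The peak list is expected to be sorted by frame. */
    static List<Fingerprint> extract(List<Peak> peaks) {
        List<Fingerprint> prints = new ArrayList<>();
        for (int i = 0; i < peaks.size(); i++) {
            Peak first = peaks.get(i);
            for (int j = i + 1; j < peaks.size(); j++) {
                Peak second = peaks.get(j);
                int delta = second.frame - first.frame;
                if (delta > MAX_DELTA) break;          // later peaks are even further away
                if (delta > 0) prints.add(new Fingerprint(hash(first, second), first.frame));
            }
        }
        return prints;
    }
}
```

Because the hash only contains the two frequency bins and the time difference, it does not change when the same fragment occurs at another position in a recording. In this kind of scheme a match is typically found by looking up hashes in the database and checking that many of them agree on one time offset between query and reference.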
Olaf blog post | Olaf GitHub Repo | Olaf LBD ISMIR2020 abstract
Olaf in the browser.
JGaborator: High resolution spectral transforms from Java
This library calculates fine-grained constant-Q spectral representations of audio signals quickly from Java. The spectral transform can be visualized or further processed in a (Music Information Retrieval) processing chain.
The calculation of a Gabor transform is done by a C++ library named Gaborator. A Java native interface (JNI) bridge to the C++ Gaborator is provided here. A combination of Gaborator and a fast FFT library (such as pffft) allows fine-grained constant-Q transforms at a rate of about 200 times real-time on moderate hardware.
Fine grained spectral transforms in Java.
SyncSink: A tool to synchronize video and audio files
SyncSink is able to synchronize video and audio recordings of the same event. As long as some audio is shared between the multimedia files, a reliable synchronization solution will be proposed. SyncSink is ideal to synchronize video recordings of the same event by multiple cameras or to align a high definition audio recording with a video recording (with lower-quality audio). SyncSink is also used to facilitate synchronization of multimodal research data, e.g. to research the interaction between movement and music.
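The basic idea of aligning two recordings through their shared audio can be sketched with a plain cross-correlation, as below. This is a generic textbook technique shown for illustration, not a description of how SyncSink is implemented; the class `AlignmentSketch` and the method `bestLag` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

class AlignmentSketch {
    /**
     * Returns the lag (in samples) for which other[i] best matches reference[i + lag],
     * searching lags in [-maxLag, maxLag] with a plain, unnormalized cross-correlation.
     */
    static int bestLag(float[] reference, float[] other, int maxLag) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double score = 0.0;
            for (int i = 0; i < other.length; i++) {
                int j = i + lag;
                if (j >= 0 && j < reference.length) score += reference[j] * other[i];
            }
            if (score > bestScore) { bestScore = score; best = lag; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Noise stands in for the shared audio; the second recording starts 37 samples later.
        Random rnd = new Random(42);
        float[] reference = new float[1000];
        for (int i = 0; i < reference.length; i++) reference[i] = rnd.nextFloat() * 2 - 1;
        float[] other = Arrays.copyOfRange(reference, 37, reference.length);
        System.out.println("offset in samples: " + bestLag(reference, other, 100));
    }
}
```

Dividing the lag by the sample rate gives the offset in seconds. The brute-force search shown here is quadratic in cost, so it only suits short excerpts or small search ranges.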
SyncSink program with some multimedia files.
Tarsos: Software for Pitch Analysis
Tarsos is a software tool to analyze and experiment with pitch organization in all kinds of musics. Most of the analysis is done using pitch histograms and octave reduced pitch class histograms. Tarsos has an intuitive user interface and contains a couple of command line programs to analyze large sets of music.
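A pitch class histogram folds all pitch estimates into a single octave before counting them. The following sketch shows that folding step in isolation. It is not taken from Tarsos: the class `PitchClassHistogramSketch`, the reference frequency `REF_HZ` and the bin width are assumptions made for the example.

```java
class PitchClassHistogramSketch {
    // Assumed reference: the frequency of MIDI note 0, so that 0 cents is a C.
    static final double REF_HZ = 8.1758;

    /** Pitch in cents above the reference frequency. */
    static double absoluteCents(double hz) {
        return 1200.0 * Math.log(hz / REF_HZ) / Math.log(2.0);
    }

    /** Octave-reduced pitch: folded into the range [0, 1200) cents. */
    static double pitchClassCents(double hz) {
        double cents = absoluteCents(hz) % 1200.0;
        return cents < 0 ? cents + 1200.0 : cents;
    }

    /** Counts pitch estimates per bin of binWidthCents; non-positive values are skipped. */
    static int[] histogram(double[] pitchesHz, int binWidthCents) {
        int[] bins = new int[1200 / binWidthCents];
        for (double hz : pitchesHz) {
            if (hz <= 0) continue; // treated as "no pitch detected"
            int bin = (int) (pitchClassCents(hz) / binWidthCents) % bins.length;
            bins[bin]++;
        }
        return bins;
    }

    public static void main(String[] args) {
        // The same pitch class in three octaves, plus one unpitched frame.
        double[] estimates = {225.0, 450.0, 900.0, -1.0};
        int[] bins = histogram(estimates, 100);
        for (int i = 0; i < bins.length; i++) {
            System.out.println((i * 100) + " cents: " + bins[i]);
        }
    }
}
```

Working in cents rather than in note names keeps the histogram independent of any particular tuning system, which matters when the music does not follow twelve-tone equal temperament.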
To run Tarsos you need a recent Java runtime on your machine.
Download Tarsos | Watch the screencast | Tarsos on Github | Tarsos API | Tarsos Manual | The Tarsos JNMR article
Pitch analysis with Tarsos
TarsosLSH: An Implementation of Locality Sensitive Hashing (LSH) in Java
TarsosLSH is a Java library implementing Locality-sensitive Hashing (LSH), a practical nearest neighbor search algorithm for multidimensional vectors that operates in sublinear time. It supports several LSH families: the Euclidean hash family (L2), city block hash family (L1) and cosine hash family. The library tries to hit the sweet spot between being capable enough to get real tasks done, and compact enough to serve as a demonstration on how LSH works.
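To make the notion of a hash family concrete, the sketch below shows one textbook hash function for the Euclidean family and one for the cosine family. It does not reproduce the TarsosLSH API: the classes `LshSketch`, `EuclideanHash` and `CosineHash` are hypothetical, and the bucket width `w` is a free parameter picked by the caller.

```java
import java.util.Random;

class LshSketch {
    /** One hash from the Euclidean (L2) family: floor((a . v + b) / w). */
    static final class EuclideanHash {
        final double[] a; // random Gaussian projection vector
        final double b;   // random offset in [0, w)
        final double w;   // bucket width
        EuclideanHash(int dimensions, double w, Random rnd) {
            this.w = w;
            this.b = rnd.nextDouble() * w;
            this.a = new double[dimensions];
            for (int i = 0; i < dimensions; i++) a[i] = rnd.nextGaussian();
        }
        int hash(double[] v) {
            return (int) Math.floor((dot(a, v) + b) / w);
        }
    }

    /** One hash from the cosine family: which side of a random hyperplane v lies on. */
    static final class CosineHash {
        final double[] a;
        CosineHash(int dimensions, Random rnd) {
            this.a = new double[dimensions];
            for (int i = 0; i < dimensions; i++) a[i] = rnd.nextGaussian();
        }
        int hash(double[] v) {
            return dot(a, v) >= 0 ? 1 : 0;
        }
    }

    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) sum += x[i] * y[i];
        return sum;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        EuclideanHash l2 = new EuclideanHash(3, 4.0, rnd);
        CosineHash cos = new CosineHash(3, rnd);
        double[] v = {1.0, 2.0, 3.0};
        double[] near = {1.1, 2.0, 2.9};
        System.out.println("L2 hashes: " + l2.hash(v) + " and " + l2.hash(near));
        System.out.println("cosine hashes: " + cos.hash(v) + " and " + cos.hash(near));
    }
}
```

A single hash function is a weak filter. In a typical LSH index several of them are concatenated into one bucket key and several such tables are used side by side, so that nearby vectors are likely to share at least one bucket while distant vectors rarely do.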
Head over to the TarsosLSH release directory for downloads.
General vs. locality sensitive hashing (from cybertron.cg.tu-berlin.de)
TeensyDAQ: an application to visualize analog input signals on a Teensy
TeensyDAQ is a Java application to quickly visualize and record analog signals with a Teensy micro-controller and some custom software. It is mainly useful to quickly get an idea of how an analog sensor reacts to different stimuli. Some of the features of the TeensyDAQ:
- Visualize up to five analog signals simultaneously in real-time.
- Capture analog input signals with sampling rates up to 8000 Hz.
- Record analog input to a CSV file. Previously recorded CSV files can be visualized using drag-and-drop.
- Works on Linux, Mac OS X and Windows.
- While a capture session is in progress you can go back in time and zoom, pan and drag to get a detailed view of your data.
- Allows you to listen to your input signal. This is especially practical with analog microphone input.
TeensyDAQ user interface.
JGaborator: a JNI bridge with the gaborator C++ library
This library calculates fine-grained constant-Q spectral representations of audio signals quickly from Java. The calculation of a Gabor transform is done by a C++ library named Gaborator. A Java native interface (JNI) bridge to the C++ Gaborator is provided in this project. A combination of Gaborator and a fast FFT library (such as pffft) allows fine-grained constant-Q transforms at a rate of about 200 times real-time on moderate hardware. It can serve as a front-end for several audio processing or MIR applications.
A spectral visualization tool, a part of the JGaborator package.
AMPEL: The Augmented Movement Platform For Embodied Learning
The idea behind AMPEL (by Lousin Moumdjian, Marc Leman, Peter Feys) is to combine both motor and cognitive rehabilitation in a single combined ‘embodied learning’ paradigm. To this end a device was developed which combines a short term memory task with a movement task: in this case walking over interactive tiles.
AMPEL was built together with a number of collaborators for the concept, the hardware design and the implementation. I was responsible for the development of the embedded software in the tiles and for the development of the GUI application for the researchers.
Schematic representation of the AMPEL platform.
Pidato: Vibrato on a Digital Piano Using an Arduino
The Pidato experiment demonstrates a rather straightforward method to handle vibrato on a digital piano. It ‘solves’ the age-old problem of what to do with the enigmatic “vibrato” instructions on some piano solo scores of Franz Liszt.
The way it works is by translating movement (accelerometer data) to MIDI messages. Pidato consists of a hardware and a software part. The hardware consists of an Arduino, MIDI ports and a three-axis accelerometer. The software should know when a vibrato-like movement is made and how to translate such movement to MIDI messages. The software therefore contains a periodicity estimator and frequency detector to detect how periodic a movement is and how fast the movement is repeated. This was done with the YIN algorithm (more commonly used in audio signal analysis).
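The core of a YIN-style estimator fits in a few lines. The sketch below is written in Java for illustration and is not the Arduino code used in Pidato: the class `YinSketch`, the 100 Hz sensor rate and the 0.15 threshold are assumptions, and the parabolic interpolation step of the full algorithm is left out.

```java
class YinSketch {
    /**
     * Returns {frequencyHz, periodicity}, or {-1, 0} when no period is found.
     * Periodicity is 1 minus the normalized difference at the chosen lag,
     * so values close to 1 indicate a strongly periodic movement.
     */
    static double[] estimate(float[] x, double sampleRate, double threshold) {
        int half = x.length / 2;
        double[] d = new double[half];

        // Step 1: difference function, comparing the signal with a delayed copy of itself.
        for (int tau = 1; tau < half; tau++) {
            for (int j = 0; j < half; j++) {
                double diff = x[j] - x[j + tau];
                d[tau] += diff * diff;
            }
        }

        // Step 2: cumulative mean normalized difference.
        d[0] = 1.0;
        double runningSum = 0.0;
        for (int tau = 1; tau < half; tau++) {
            runningSum += d[tau];
            d[tau] = runningSum == 0.0 ? 1.0 : d[tau] * tau / runningSum;
        }

        // Step 3: first lag below the threshold, then walk down to the local minimum.
        for (int tau = 2; tau < half; tau++) {
            if (d[tau] < threshold) {
                while (tau + 1 < half && d[tau + 1] < d[tau]) tau++;
                return new double[] {sampleRate / tau, 1.0 - d[tau]};
            }
        }
        return new double[] {-1.0, 0.0};
    }

    /** Synthetic stand-in for one accelerometer axis during a steady vibrato movement. */
    static float[] sine(double hz, double sampleRate, int length) {
        float[] out = new float[length];
        for (int i = 0; i < length; i++) {
            out[i] = (float) Math.sin(2 * Math.PI * hz * i / sampleRate);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two seconds of a 5 Hz movement, sampled at an assumed 100 Hz sensor rate.
        double[] result = estimate(sine(5.0, 100.0, 200), 100.0, 0.15);
        System.out.println("frequency: " + result[0] + " Hz, periodicity: " + result[1]);
    }
}
```

The two return values map onto the two questions in the text: the periodicity says whether the movement is vibrato-like at all, and the frequency says how fast it repeats. Both could then be translated to MIDI messages.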
Wire schema of the Pidato implementation.