~ DiscStitch & BAF - Contributions to ISMIR 2022

» By Joren on Sunday 04 December 2022

This year the ISMIR 2022 conference is organized from 4 to 9 December 2022 in Bengaluru, India. ISMIR is the main music technology and music information retrieval (MIR) conference. It is a relief to experience a conference in physical form and not through a screen.

I have contributed to the following work which is in the main paper track of ISMIR 2022:

BAF: An Audio Fingerprinting Dataset For Broadcast Monitoring (version of record)\ Guillem Cortès, Alex Ciurana, Emilio Molina, Marius Miron, Owen Meyers, Joren Six, Xavier Serra\
Abstract: Audio Fingerprinting (AFP) is a well-studied problem in music information retrieval for various use-cases e.g. content-based copy detection, DJ-set monitoring, and music excerpt identification. However, AFP for continuous broadcast monitoring (e.g. for TV & Radio), where music is often in the background, has not received much attention despite its importance to the music industry. In this paper (1) we present BAF, the first public dataset for music monitoring in broadcast. It contains 74 hours of production music from Epidemic Sound and 57 hours of TV audio recordings. Furthermore, BAF provides cross-annotations with exact matching timestamps between Epidemic tracks and TV recordings. Approximately, 80% of the total annotated time is background music. (2) We benchmark BAF with public state-of-the-art AFP systems, together with our proposed baseline PeakFP: a simple, non-scalable AFP algorithm based on spectral peak matching. In this benchmark, none of the algorithms obtain a F1-score above 47%, pointing out that further research is needed to reach the AFP performance levels in other studied use cases. The dataset, baseline, and benchmark framework are open and available for research.

I have also presented a first version of DiscStitch, an audio-to-audio alignment algorithm. This contribution is in the ISMIR 2022 late breaking demo session:

DiscStitch: towards audio-to-audio alignment with robustness to playback speed variabilities (version of record)\ Joren Six\
Abstract: Before magnetic tape recording was common, acetate discs were the main audio storage medium for radio broadcasters. Acetate discs only had a capacity to record about ten minutes. Longer material was recorded on overlapping discs using (at least) two recorders. Unfortunately, the recorders used were not reliable in terms of recording speed, resulting in audio of variable speed. To make digitized audio originating from acetate discs fit for reuse, (1) overlapping parts need to be identified, (2) a precise alignment needs to be found and (3) a mixing point suggested. All three steps are challenging due to the audio speed variabilities. This paper introduces the ideas behind DiscStitch: which aims to reassemble audio from overlapping parts, even if variable speed is present. The main contribution is a fast and precise audio alignment strategy based on spectral peaks. The method is evaluated on a synthetic data set.

Next to my own contributions, the ISMIR conference program is the best overview of the state-of-the art of MIR.

This contribution was made possible thanks to travel funds by the FWO travel grant K1D2222N and the Ghent University BOF funded project PaPiOM.

Music Information Retrieval, ISMIR, and UGent Attachments

2022_DiscStitch__towards_audio-to-audio_alignment_with_robustness_to_playback_speed_variabilities_ISMIR_2022_Late_Breaking___Demo_abstracts.webp, discstitch_static_poster.pdf, BAF_broadcast_audio_fingerprinting.mp4, ismir_tab_icon.png, and baf_broadcast_audio_fingerprinting_thumb.jpg

~ Panako: a scalable audio search system

» By Joren on Wednesday 12 October 2022

Fig: DALL.E 2 imagining a fight between papers and software.

Recently I have published a paper titled ‘Panako: a scalable audio search system’ in the Journal of Open Source Software (JOSS). The journal is a ‘hack’ to circumvent the focus on citable papers in the academic world: getting recognition for publishing software as a researcher is not straightforward.

The research output tracking system of Ghent University (biblio) and Flanders FWO’s academic profile are not built to track software as research output. The focus is still solely on papers, even when custom developed research software has become a fundamental aspect in many research areas. My role is somewhere between that of a ‘pure’ researcher and that of a research software engineer which makes this focus on papers quite relevant to me.

The paper aims to make the recent development on Panako ‘count’. Thanks to the JOSS review process the Panako software was improved considerably: CI, unit tests, documentation, containerization,… The paper was a good reason to improve on all these areas which are all too easy to neglect. The paper itself is a short, rather general overview of Panako:

“Panako solves the problem of finding short audio fragments in large digital audio archives. The content based audio search algorithm implemented in Panako is able to identify a short audio query in a large database of thousands of hours of audio using an acoustic fingerprinting technique.”

Research papers, Panako, and UGent

~ Low impact runner: a music based bio-feedback system

» By Joren on Saturday 01 October 2022

Fig: schema of the low impact runner system.

I have been lucky to have been involved in an interdisciplinary research project around the low impact runner: a music based bio-feedback system to reduce tibial shock in over-ground running. In the beginning of October 2022 the PhD defence of Rud Derie takes place so it is a good moment to look back to this collaboration between several branches of Ghent University: IPEM , movement and sports science and IDLab.

The idea behind the project was to first select runners with a high foot-fall impact. Then an intervention would slightly nudge these runner to a running style with lower impact. A lower repetitive impact is expected to reduce the chance on injuries common for runners. A system was invented in which musical bio-feedback was given on the measured impact. The schema to the right shows the concept.

I was involved in development of the first hardware prototypes which measured acceleration on the legs of the runner and the development of software to receive and handle these measurement on a tablet strapped to a backpack the runner was wearing. This software also logged measurements, had real-time visualisation capabilities and allowed remote control and monitoring over the network. Finally measurements were send to a Max/MSP sonification engine. These prototypes of software and hardware were replaced during a valorization project but some parts of the software ended up in the final Android application.

Video: the left screen shows the indoor positioning system via UWB (ultra-wide-band) and the right screen shows the music feedback system and the real time monitoring of impact of the runner. Video by Pieter Van den Berghe

Over time the first wired sensors were replaced with wireless Bluetooth versions. This made the sensors easy to use and also to visualize sensor values in the browser thanks to the Web Bluetooth API. I have experimented with this and made two demos: a low impact runner visualizer and one with the conceptual schema.

Vid: Visualizing the Bluetooth Low Impact Runner sensor in the browser.

The following three studies shows a part of the trajectory of the project. The first paper is a validation of the measurement system. Secondly a proof-of-concept study is done which finally greenlights a larger scale intervention study.

Van den Berghe, P., Six, J., Gerlo, J., Leman, M., & De Clercq, D. (2019). Validity and reliability of peak tibial accelerations as real-time measure of impact loading during over-ground rearfoot running at different speeds. Journal of Biomechanics, 86, 238-242.
Van den Berghe, P., Lorenzoni, V., Derie, R., Six, J., Gerlo, J., Leman, M., & De Clercq, D. (2021). Music-based biofeedback to reduce tibial shock in over-ground running: A proof-of-concept study. Scientific reports, 11(1), 1-12.
Van den Berghe, P., Derie, R., Bauwens, P., Gerlo, J., Segers, V., Leman, M., & De Clercq, D. (2022). Reducing the peak tibial acceleration of running by music‐based biofeedback: A quasi‐randomized controlled trial. Scandinavian Journal of Medicine & Science in Sports

There are quite a number of other papers but I was less involved in those. The project also resulted in two PhD’s:

Motor retraining by real-time sonic feedback: understanding strategies of low impact running (2021) by Pieter Van den Berghe
Running on good vibes: music induced running-style adaptations for lower impact running (2022) by Rud Derie

I am also recognized as co-inventor on the low impact runner system patent and there are concrete plans for a commercial spin-off. To be continued…

Conceptual schema
Screenshot of the smartphone app
Screening and selection
Music feedback schema

Research papers, Java, IPEM, and UGent

~ SMPTE decoding in the browser

» By Joren on Friday 16 September 2022

I have created a web application to LTC.wasm decodes SMPTE timecodes from an LTC encoded audio signal.

To synchronize multiple music and video recordings a shared SMPTE timecode signal is often used. For practical purposes the timecode signal is encoded in an audio stream. The timecode can then be recorded in sync with microphone inputs or added to a video recording. The timecode is encoded in audio with LTC, linear timecode. A special decoder is needed to extract SMPTE timecode from the audio. This is exactly what the LTC.wasm application does.

Using the [web based SMTE decoder](https://0110.be/attachment/cors/ltc.wasm/ltc_decoder.html

Try out the SMPTE decoder with your own SMPTE files.

The advantage of the web-based version versus the command line ltc-tools is that it does not need to be installed separately and that ffmpeg decodes audio. This means that almost any multimedia format is supported automatically. The command line version only supports a limited number of audio formats.

For further information check out the the LTC.wasm GitHub repository

UGent

~ SyncSink.wasm - Synchronize media files by audio-to-audio alignment

» By Joren on Tuesday 06 September 2022

I have built a tool for audio-to-audio alignment. It has applications for synchronization of media files. It works in the browser and you can synchronize your media files here with SyncSink.wasm. SyncSink.wasm does the following:

From an incoming media-file audio is extracted, downmixed to mono and and resampled. This is done with ffmpeg.audio.wasm a wasm version of ffmpeg.
For each audio track, fingerprints are extracted. These fingerprints reduce the the search space for alignment drastically.
Each list of fingerprints is aligned with the list of fingerprints from the reference. Resulting in a rough alignment
Cross correlation is done to refine the alignment resulting in sample accurate results.

Fig: media synchronization with audio-to-audio alignment.

It supports small time-scale adjustments of around 5%: audio alignment can still be found if audio speed differs a bit.

Some potential use cases where it might be of use:

To stitch partially overlapping audio recordings together resulting in a single long audio recording.
To synchronize multiple independent video recordings of the same event each with an audio recording of the environment.
To align a high quality microphone recording with video/low-quality audio recording of the same event. The low quality audio recorded with a camera can then be replaced with the high quality microphone audio.

The code can be found in the SyncSink.wasm GitHub repository

UGent, ISMIR, and Code Attachments

media_sync_recording.apng

~ Sending audio over a network with ffmpeg

» By Joren on Tuesday 30 August 2022

Fig: stable diffusion imagining a networked music performance

This post describes how to send audio over a network using the ffmpeg suite. Ffmpeg is the Swiss army knife for working with audio and video formats. It is a command line tool that supports almost all audio formats known to man and woman. ffmpeg also supports streaming media over networks.

Here, we want to send audio recorded by a microphone, over a network to a single receiver on the other end. We are not aiming for low latency. Also the audio is going only in a single direction. This can be of interest for, for example, a networked music performance. Note that ffmpeg needs to be installed on your system.

The receiver - Alice

For the receiver we use ffplay, which is part of the ffmpeg tools. The command instructs the receiver to listen to TCP connections on a randomly chosen port 12345. The \?listen is important since this keeps the program waiting for new connections. For streaming media over a network the stateless UDP protocol is often used. When UDP packets go missing they are simply dropped. If only a few packets are dropped this does not cause much harm for the audio quality. For TCP missing packets are resent which can cause delays and stuttering of audio. However, TCP is much more easy to tunnel and the stuttering can be compensated with a buffer. Using TCP it is also immediately clear if a connection can be made. With UDP packets are happily sent straight to the void and you need to resort to wiresniffing to know whether packets actually arrive.

ffplay -nodisp -f mpegts tcp://0.0.0.0:12345\?listen

In this example we use MPEGTS over a plain TCP socket connection. Alteratively RTMP could be used (which also works over TCP). RTP , however is usually delivered over UDP.

The shorthand address 0.0.0.0 is used to bind the port to all available interfaces. Make sure that you are listening to the correct interface if you change the IP address.

The sender - Björn

Björn, aka Bob, sends the audio. First we need to know from which microphone to use. To that end there is a way to list audio devices. In this example the macOS avfoundation system is used. For other operating systems there are similar provisions.

ffmpeg -f avfoundation -list_devices true -i ""

Once the index of the device is determined the command below sends incoming audio to the receiver (which should already be listening on the other end). The audio format used here is MP3 which can be safely encapsulated into mpegts.

Note that the IP address 192.168.x.x needs to be changed to the address of the receiver. Now if both devices are on the same network the incoming audio from Bob should arrive at the side of Alice.

The tunnel

If sender and receiver are not on the same network it might be needed to do Network Addres Translation (NAT) and port forwarding. Alternatively an ssh tunnel can be used to forward local tcp connections to a remote location. So on the sender the following command would send the incoming audio to a local port:

ffmpeg -f avfoundation -i ":1" -acodec libmp3lame -ab 196k -f mpegts tcp://192.168.x.x:12345

The connection to the receiver can be made using a local port forwarding tunnel. With ssh the TCP traffic on port 12345 is forwarded to the remote receiver via an intermediary (remote) host using the following command:

ssh -v -L 12345:192.168.x.x:12345 user@host -N

0110.be and UGent

~ Using Java LMDB on Apple Sillicon or other unsupported platforms

» By Joren on Wednesday 25 May 2022

LMDB is a fast key value store, ideal to store and query sorted data with small keys and values. LMDB is a pure C library but often used from other programming languages via some type of bindings. These bindings are ‘bridges’ between languages and are automatically present on supported platform. On new or unsupported platforms, however, you need to build a this bridge yourself.

This blog post is about getting java-lmdb working on such unsupported platform: arm64. The arm64 platform is much more popular since the introduction of the Apple silicon - M1 platform. On Apple M1 the default architecture of Docker images is also aarch64.

The java-lmdb project uses JNR-FFI in the background. This is only one of the many ways to bridge Java and other programming languages. The new version of JNR-FFI supports the arm64. Currently, only the ‘SNAPSHOT’ version of java-lmdb uses this version. So the dependencies need to be changed to e.g. (when using Gradle):

repositories {
    mavenCentral()
    maven { url 'https://oss.sonatype.org/content/repositories/snapshots' }
}

dependencies {
    implementation group: 'org.lmdbjava', name: 'lmdbjava', version: '0.8.3-SNAPSHOT'
}

Next you need to build the lmdb library for your platform and copy it to a location where Java looks for it. This only works when compilers are already available on your system. In macOS you might need to install the XCode command line tools:

#xcode-select --install
git clone --depth 1 https://git.openldap.org/openldap/openldap.git
cd openldap/libraries/liblmdb
make -e SOEXT=.dylib
cp  liblmdb.dylib ~/Library/Java/Extensions

On Debian aarch64 the procedure is similar but a different extension is used (.so):

#apt install build-essential
git clone --depth 1 https://git.openldap.org/openldap/openldap.git
cd openldap/libraries/liblmdb
make
mv liblmdb.so /lib

Finally, to use the library in a JAR-file is might be needed to allow lmdbjava to access some parts of the JRE:

java -jar your_jar.jar --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED

A very similar setup was needed for the docker version of Panako.

UGent and Code

~ The Augmented Movement Platform For Embodied Learning (AMPEL)

» By Joren on Friday 22 April 2022

I have been lucky to be part of a fruitful interdisciplinary scientific collaboration around AMPEL: ‘The Augmented Movement Platform For Embodied Learning’. The recent publication of an article is an ideal occasion to give a glimpse behind the scenes.

Fig: Schematic representation of AMPEL, a floor with interactive tiles.

Around 2016 the idea arose to search for new potential rehabilitation approaches for persons with multiple sclerosis. Multiple sclerosis causes problems, in varying degrees, with both motor and cognitive function. Common rehabilitation approaches either work on motor or cognitive function. The idea (by Lousin Moumdjian, Marc Leman, Peter Feys) was to combine both motor and cognitive rehabilitation in a single combined ‘embodied learning’ paradigm.

After some discussion we wanted to perform a combined short-term memory and walking task. First the participants would be presented with a target trajectory which would then be performed by walking. During walking we would modulate feedback types (melodic, sounds or visual). To this end, an ‘intelligent floor’ device was needed that was able to present a target trajectory, register a performed trajectory and provide several types of feedback. After a search for off-the-shelf solutions it became clear that a custom hard-and-software platform was required.

After a great deal of cardboard prototyping we settled on a design consisting of interactive tiles. Thomas Vervust of UGhent NamiFab designed a PCB with force sensitive resistors (FSR) on the bottom and RGB LED’s on top. Ivan Schepers provided practical insights during prototyping and developed the hardware around the interactive tiles. I was responsible for programming the system. Custom software was developed for the tiles, a controller to drive the tiles and to run and record experiments. Finally the system was moved to a hospital where the experiments took place. To know more about the exact experiments, please read the following three publications on AMPEL:

The Augmented Movement Platform For Embodied Learning (AMPEL): development and reliability - 2021
Moumdjian, L., Vervust, T., Six, J., Schepers, I., Lesaffre, M., Feys, P., & Leman, M.
This article details the rationale behind AMPEL and provides technical details and reliability measurements. It was published in the Journal on Multimodal User Interfaces
Embodied learning in multiple sclerosis using melodic, sound, and visual feedback: a potential rehabilitation approach - 2022
Moumdjian, L., Six, J., Veldkamp, R., Geys, J., Van Der Linden, C., Goetschalckx, M., Van Nieuwenhoven, J., Bosmans, I., Leman, M. and Feys, P.
The main AMPEL study which presents the rehabilitation potential. This work was published in the Annals of the New York Academy of Sciences.
[Motor sequence learning in a goal-directed stepping task in persons with multiple sclerosis: a pilot study\
- 2022](https://nyaspubs.onlinelibrary.wiley.com/doi/10.1111/nyas.14702)
  Veldkamp, R., Moumdjian, L., van Dun, K., Six, J., Vanbeylen, A., Kos, D. and Feys, P.
  For this study AMPEL was slightly modified for a reaction time task, showing its flexibility. The participants were asked to step on a tile as quickly as possible after it lit up. They were either knowledgable of the tile trajectory or not. This work was also published in the Annals of the New York Academy of Sciences.

Almost connected the full I2C bus
The driving laptop and main Arduino
Testing LEDS
AMPEL schematic
Custom software driving AMPEL
Examples of walking paths
Early test of AMPEL
AMPEL ready to use
First tests of the I2C bus

UGent

~ An audio focused ffmpeg build for the web

» By Joren on Thursday 24 February 2022

I have prepared an audio focused ffmpeg build for the web which facilitates browser based audio applications. I have prepared three demos:

Audio transcoding and playback demo: converts any media file into audio compatible with the Web Audio API for in-browser playback or analysis.
High quality time-stretching or pitch-shifting: demonstrates how pitch and tempo can be modified independently thanks to the Rubber Band Library.
Basic media info: gives information about the streams and encodings used in a media file.

Fig: [audio transcodinging in the browser](/attachment/cors/ffmpeg.audio.wasm/transcode.html). A `wav` file is converted to an `mp3`.

A bit more about the rationale behind this effort: Browsers have become practical platforms for audio processing applications thanks to the combination of Web Audio API , performant Javascript environment and WebAssembly. Have a look, for example, at essentia.JS.

However, browsers only support a small subset of audio formats and container formats. Dealing with many (legacy) audio formats is often a rather painful experience since there are so many media container formats which can contain a surprising variation of audio (and video) encodings. In short, decoding audio for in-browser analysis or playback is often problematic.

Luckily there is FFmpeg which claims to be ‘a complete, cross-platform solution to record, convert and stream audio and video’. It is, indeed, capable to decode almost any audio encoding known to man from about any container. Additionally, it also contains tools to filter, manipulate, resample, stretch, … audio. FFmpeg is a must-have when working with audio. It would be ideal to have FFmpeg running in a browser…

Thanks to WebAssembly ffmpeg can be compiled for use in the browser. There have been efforts to get ffmpeg working in the browser. These efforts have been focusing on the complete ffmpeg suite. Now I have prepared an audio focused ffmpeg build for the web based on these efforts. I have selected only audio parts which makes the resulting .wasm binary four to five times smaller (from \~20MB to \~5MB). I also provided a simplified Javascript wrapper. The project brings audio decoding to the browser but also audio filtering, transcoding, pitch-shifting, sample rate conversions, audio channel manipulation, and so forth. It is also capable to extract audio streams from video container formats.

Next to the pure functionality of ffmpeg there are general advantages to run audio analysis software in the browser at client-side:

Ease-of-use: no software needs to be installed. The runtime comes with a compatible browser.
Privacy: Since media files are not transferred it is impossible for the system running the service to make unauthorised copies of these files. There is no need to trust the service since all processing happens locally, in the browser.
Speed: Downloading and especially uploading large media files takes a while. When files are kept locally, processing can start immediately and no time is wasted sending bytes over the internet. This results in a snappy user experience.
Computational load: the computational load of transcoding is distributed over the clients and not centralised on a (single) server. The server does not do any computing and only serves static files, so it can handle as many concurrent clients as its bandwidth allows.

Check out the audio focused ffmpeg build for the web on GitHub.

UGent, Code, and Command Line Application

~ pffft.wasm: an FFT library for the web

» By Joren on Thursday 10 February 2022

PFFFT is a small, pretty fast FFT library programmed in C with a BSD-like license. I have taken it upon myself to compile a WebAssembly version of PFFFT to make it available for browsers and node.js environments. It is called pffft.wasm and available on GitHub.

The pffft.wasm library comes in two flavours. One is compiled with SIMD instructions while the other comes without these instructions. SIMD stands for ‘single instruction, multiple data’ and does what it advertises: in a single step it processes multiple datapoints. The aim of SIMD is to make calculations several times faster. Especially for workloads where the same calculations are repeated over and over again on similar data, SIMD optimisation is relevant. FFT calculation is such a workload.

Evidently the SIMD version is much faster but there is no need to take my word for it. Below you can benchmark the SIMD version of pffft.wasm and compare it with the non-SIMD version on your machine. A pure Javascript FFT library called FFT.js serves as a baseline.

When running the same benchmark on Firefox and on Chrome it becomes clear that FFT.js on Chrome is about twice as fast thanks to its superior Javascript engine for this workload. The performance of the WebAssembly versions in Chrome and Firefox is nearly identical. Safari unfortunately does not (yet) support SIMD WebAssembly binaries and fails to complete the benchmark.

The source code, the limitations and other info can be found at the pffft.wasm GitHub repository

Edit: PulseFFT might be of interest as well: a (as far as I can tell non-SIMD) WASM version of KissFFT.

Benchmark pffft.wasm - Firefox on a 2010 Macbook Air
Benchmark pffft.wasm - Chrome on an Apple M1 Pro chip
Benchmark pffft.wasm - Chrome on an 2010 Macbook Air
STFT calculated with pffft.wasm

Code and UGent

~ Panako 2.0 - Updates for an acoustic fingerprinting system

» By Joren on Sunday 07 November 2021

At the online ISMIR 2021 conference I have presented updates to Panako, an audio fingerprinting system:

This work presents updates to Panako, an acoustic fingerprinting system that was introduced at ISMIR 2014. The notable feature of Panako is that it matches queries even after a speedup, time-stretch or pitch-shift. It is freely available and has no problems indexing and querying 100k sea shanties. The updates presented here improve query performance significantly and allow a wider range of time-stretch, pitch-shift and speed-up factors: e.g. the top 1 true positive rate for 20s query that were sped up by 10 percent increased from 18% to 83% from the 2014 version of Panako to the new version. The aim of this short write-up is to reintroduce Panako, evaluate the improvements and highlight two techniques with wider applicability. The first of the two techniques is the use of a constant-Q non-stationary Gabor transform: a fast, reversible, fine-grained spectral transform which can be used as a front-end for many MIR tasks. The second is how near-exact hashing is used in combination with a persistent B-Tree to allow some margin of error while maintaining reasonable query speeds.

Together with the paper there is also a poster and a short video presentation which explains the work:

ISMIR, Panako, and UGent Attachments

2021.ismir-lbd-panako-updates.pdf, poster-movie.mp4, and panako_updates_poster.pdf

~ Decoding LTC and SMPTE on Teensy - Now using interrupts

» By Joren on Friday 24 September 2021

Have you ever found yourself wondering how to build an accurate, low-latency LTC decoder with a common micro-controller? Well! Wonder no more and read on! Or, stop reading and do go read something that is more appealing to your predispositions.

SMPTE timecodes were originally used to synchronize audio and video material. SMPTE timecode data is often encoded into audio using LTC or linear time code. This special audio stream can be recorded together with other audio and video material. By decoding the LTC audio afterwards and working back to SMPTE timecodes, synchronization of multiple camera angles and audio material becomes straightforward. This concept tagging data streams with SMPTE timecodes is also used for other types of data.

\ Fig: LTC is a 'self-clocking' protocol for which a period can be found automatically. Once the period is found, transitions within the period are counted. A period with a transition translates to a 1, a period without any transitions to a 0.

SMPTE timecodes supports up to 30 frames per second and this resolution might not be sufficient for some data streams. It helps if the frames could be split up and 60 or 120 frames per second could be generated. With a low latency LTC decoder it would be possible to support this case and, for example, provide four pulses for every SMPTE frame. To be more precise: a SMPTE frame consists of 80 bits and in this case we would send a pulse exactly when decoding bit 0, bit 20, bit 40 and bit 60. We would then be able to sample at 120Hz while staying in sync with the SMPTE.

My first attempt was to treat the signal like audio and use a ready built library for LTC audio decoding The problem there is that sampling is done which might not exactly match the SMPTE bit transition period and relatively large buffers are used to decode LTC. The bit exact decoding is not possible using this method: the latency is too large, the method also uses excessive computational power and memory.

Biasing circuit to offset voltages from zero centred to having a bias

Fig: Biasing circuit to offset voltages

In my second attempt, the current iteration, interrupts are used to detect rising and falling edges in the LTC stream. By counting the number of microseconds between these edges a bit string is constructed. Effectively decoding LTC without any wasted computational power or memory and at a very low latency. If the LTC stream is well-formed, following each incoming bit and reacting to it becomes straightforward. Finally, after gently massaging the LTC bit string, SMPTE timecodes ooze out of the system at a low latency.

I have implemented a low latency LTC and SMPTE timecode data decoder for a Teensy microcontroller. One of the current limitations is that only 30fps SMPTE without skipped frames is supported. Another limitation is that the precision of the derived 120Hz clock is dependent on the sampling rate of the encoded audio signal: if e.g. only 8000Hz is used, transitions can only be precise up until 125µs. The derived clock will jitter slightly but will not drift.

There is still a slight problem with audio and Teensy input: audio is generally transmitted from ~~1.8V to +1.8V and not~~ as a Teensy would expect - from 0 to 3.3V. To make this change a small biasing circuit is placed before the Teensy input. In my case two 100k resistors and a 0.1uF capacitor worked best. The interrupt is relatively robust against signals that are a clipping (outside the 0 - 3.3V) or slightly too silent. If the signal becomes too small LTC decoding obviously fails.

For more information and updates see the GitHub Repository for the low latency LTC and SMPTE timecode data decoder.

Code and UGent

~ Updates for Panako - an acoustic fingerprinting system

» By Joren on Sunday 11 July 2021

Panako is an acoustic fingerprinting system I developed a couple of years ago. With acoustic fingerprinting systems it is possible to find duplicates in digital music archives and compare meta-data or identify unlabelled audio fragments. In the margins of my post-doc project working with large music archives, I have found the time to update Panako significantly. The updates simplify, improve and speed up Panako.

\ Fig. General content based audio search scheme.

The main algorithms are simplified. There is also a reduction of dependencies and a refocus to core functionality. This also simplifies building the software. The retrieval characteristics are improved, mainly thanks to the use of a fine-grained Gabor transform. Also new is the near-exact hashing construct which helps with off-by-one issues when matching time bins. The key-value store used is now LMDB, which speeds up the query performance of Panako significantly. The updates should make Panako stand the test of time somewhat better.

\ Fig. The top one true positive rate for 20s query fragments. The audio playback is speed modified from 84 to 116% with respect to the indexed reference audio. The original query length is 20s, if it is slowed down by 10% it takes, evidently, 22s. Note the improvement of the 2021 version of Panako (blue) vs the 2014 version (light-gray). As a baseline the standard algorithm (wang 2003) is included as well. For the 2021 Panako algorithm, audio recognition performance suffers (below 80) when playback speed is changed more than 10.

A more complete list of updates can be found below and on the Panako GitHub repository:

- The number of dependencies has been drastically cut by removing support for multiple key-value stores. - The key-value store has been changed to a faster and simpler system (from [MapDB](https://mapdb.org) to [LMDB](http://www.lmdb.tech/doc)). - The SyncSink functionality has been moved to another project (with Panako as dependency). - The main algorithms have been replaced with simpler and better working versions: - Olaf is a new implementation of the classic Shazam algorithm. - The algoritm described in the Panako paper was also replaced. The core ideas are still the same. The main change is the use of a [Gabor transform](https://en.wikipedia.org/wiki/Gabor_transform) to go from time domain to the spectral domain (previously a constant-q transform was used). The gabor transform is implemented by [JGaborator](https://github.com/JorenSix/JGaborator) which in turn relies on [The Gaborator](https://gaborator.com/) C library via JNI. - Folder structure has been simplified. - The UI which was mainly used for debugging has been removed. - A new set of helper scripts are added in the `scripts` directory. They help with evaluation, parsing results, checking results, building panako, creating documentation,... - Changed the default panako location to \~/.panako, so users can install and use panako more easily (without need for sudo rights)

Fig: An interactive CLI session with Panako.

UGent and Code

~ SyncSink - Synchronize media by aligning audio

» By Joren on Thursday 10 June 2021

I have just released a new version of SyncSink. SyncSink is a tool to synchronize media files with shared audio. It is ideal to synchronize video captured by multiple cameras or audio captured by many microphones. It finds a rough alignment between audio captured from the same event and subsequently refines that offset with a crosscorrelation step. Below you can see SyncSink in action or you can try out SyncSink (you will need ffmpeg and Java installed on your system).

SyncSink used to be part of the Panako acoustic fingerprinting system but I decided that it was better to keep the Panako package focused and made a separate repository for SyncSink. More information can be found at the SyncSink GiHub repo

SyncSink is a tool to synchronize media files with shared audio. SyncSink matches and aligns shared audio and determines offsets in seconds. With these precise offsets it becomes trivial to sync files. SyncSink is, for example, used to synchronize video files: when you have many video captures of the same event, the audio attached to these video captures is used to align and sync multiple (independently operated) cameras. Evidently, SyncSink can also synchronize audio captured from many (independent) microphones if some environmental sound is shared (leaked in) the each recording.
Fig: SyncSink in action: syncing some audio files

Panako, Music Information Retrieval, UGent, and Code Attachments

SyncSink-synchronizing_audio.gif and syncsink-1.0.jar

~ Calling JNI code from multiple Java threads: sharing state

» By Joren on Tuesday 01 June 2021

Mapping Java threads to C states in a JNI bridge

This post deals with the problem of using stateful C code from multiple Java threads. With JNI (Java Native Interface) it is possible to glue C code to a Java environment. There are many helpful tutorials on how to call C code and receive results. JNI helps to reuse existing, often highly complex and computationally expensive, C code.

The introductory tutorials often stop once it is made clear how to repackage (simple) datatypes and do not mention threads. It is, however, reasonable to expect JNI code to take into account thread-safety and proper multi-threading. In all but the simplest cases it is not that straightforward to share state at the C side and allow JNI code to be called from multiple Java threads. Incorrectly sharing state can lead to memory leaks and segmentation faults (segfaults) and crashes the application. In what follows, a way to share thread-local state is presented.

It is quite common to have an init, work and dispose method to create a state, use that state and do some work and finally dispose of used resources. Each Java thread independently calls these methods and expects results. These results should not change if multiple Java threads are calling the same methods. In other words: the state should remain Java thread-local. A typical Java class could look like the code below.

With the Java code in mind, the C code should know which Java thread is used and which state needs to be used for the work. Luckily there is a way to find out: The JNI specification states that each JNIEnv is local to a Java thread. So we can use the JNIEnv pointer to identify a thread. This is the idea that is used below.

The code maps a JNIEnv pointer to a structure with (any) state information. An unordered map is used for this mapping. There is, however, still a problem: multiple threads can call the init method at once. So multiple threads potentially write to the unordered_map at the same time which leads to problems. To prevent this from happening a mutex is used. The mutex, together with a unique lock, makes sure that only a single thread writes to the unordered map. The same holds for the dispose method.

The work method does not need a unique lock since it does not write to the unordered map and reading from multiple threads is no problem.


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57

  #include <unordered_map>
#include <mutex>

const int DATA_ARRAY_SIZE = 300000 * 2;

struct BridgeState {
    jfloat *data;
};

//A hash map with a JNIEnv * as key and a BridgeState * as value
std::unordered_map<uintptr_t, uintptr_t> stateMap;

//A mutex to ensure that writes to the stateMap are synchronized.
std::mutex stateMutex;

JNIEXPORT jint JNICALL Java_init(JNIEnv *env, jobject object) {
    //Makes sure only one thread writes to the stateMap
    std::unique_lock<std::mutex> lck(stateMutex);

    BridgeState *state = new BridgeState();
    uintptr_t env_addresss = reinterpret_cast<uintptr_t>(env);

    state->data = new jfloat[DATA_ARRAY_SIZE];

    uintptr_t state_addresss = reinterpret_cast<uintptr_t>(state);
    stateMap[env_addresss] = state_addresss;

    return 1;
}

JNIEXPORT jint JNICALL Java_work(JNIEnv *env, jobject object) {
    //get a ref to the state pointer
    uintptr_t env_addresss = reinterpret_cast<uintptr_t>(env);
    BridgeState *state = reinterpret_cast<BridgeState *>(stateMap[env_addresss]);

    //do something with state->data, e.g. calculate the sum
    int sum = 0;
    for (int i = 0; i < DATA_ARRAY_SIZE; i++) {
        state->data[i] = state->data[i] + 1;
        sum += (int)state->data[i];
    }
    return sum;
}

JNIEXPORT jint JNICALL Java_dispose(JNIEnv *env, jobject object) {
    //Makes sure only one thread writes to the stateMap
    std::unique_lock<std::mutex> lck(stateMutex);

    uintptr_t env_addresss = reinterpret_cast<uintptr_t>(env);
    BridgeState *state = reinterpret_cast<BridgeState *>(stateMap[env_addresss]);
    stateMap.erase(env_addresss);

    //cleanup memory
    delete[] state->data;
    delete state;
    return 0;
}

This conceptual code has been lifted from a JNI library doing actual work: The JGaborator JNI bridge . If you need more information on how to compile and use this construct in actual code, please have a look at the JGaborator GitHub repository

UGent and Code

~ JGaborator Updated - Fine grained spectral transforms from Java

» By Joren on Monday 31 May 2021

I have updated the JGaborator library. The library calculates fine grained constant-Q spectral representations of audio signals quickly from Java. Such spectral transform can be used for visualisation or as a front-end for audio processing or music information retrieval applications.

The calculation of a Gabor transform is done by a C library named Gaborator. JGaborator provides a Java native interface (JNI) bridge to that library. Thanks to the recent updates, the library is now automatically unpacked which makes it easy to use on supported platforms (intel macOS and x64 Linux).

The new version of JGaborator now also allows multiple Java threads to call the transform. This has the potential to speed up some audio processing chains dramatically.

The visualisation parts of JGaborator also received light touch-ups. Below a number of screenshots can be seen with of spectral representations of several audio files. If you want to try it yourself download the “JGaborator JAR-file”:[JGaborator-0.6.jar]. Note that it should work only on intel macOS and x64 Linux with ffmpeg installed on your path. For other environments, please read and follow the JGaborator instructions to get it working.

Spectral vizualization with JGaborator
Spectral vizualization with JGaborator
Spectral vizualization with JGaborator

Code and UGent

~ Music-based biofeedback to reduce tibial shock in over-ground running: a proof-of-concept study

» By Joren on Thursday 18 February 2021

For the last couple of years there has been a fruitful collaboration ongoing between the systematic musicology (IPEM) and sports-science departments at Ghent University. IPEM has a rich history of fundamental research on the link between movement and music. In a newly published proof-of-concept study the music-movement link improves running style. The runner is equipped with a musical biofeedback system to lower foot-impact. For more details, see:

Music-based biofeedback to reduce tibial shock in over-ground running: a proof-of-concept study, published in Scientific Reports(2021) by Van den Berghe, P., Lorenzoni, V., Derie, R. et al.

Abstract Methods to reduce impact in distance runners have been proposed based on real-time auditory feedback of tibial acceleration. These methods were developed using treadmill running. In this study, we extend these methods to a more natural environment with a proof-of-concept. We selected ten runners with high tibial shock. They used a music-based biofeedback system with headphones in a running session on an athletic track. The feedback consisted of music superimposed with noise coupled to tibial shock. The music was automatically synchronized to the running cadence. The level of noise could be reduced by reducing the momentary level of tibial shock, thereby providing a more pleasant listening experience. The running speed was controlled between the condition without biofeedback and the condition of biofeedback. The results show that tibial shock decreased by 27% or 2.96 g without guided instructions on gait modification in the biofeedback condition. The reduction in tibial shock did not result in a clear increase in the running cadence. The results indicate that a wearable biofeedback system aids in shock reduction during over-ground running. This paves the way to evaluate and retrain runners in over-ground running programs that target running with less impact through instantaneous auditory feedback on tibial shock.

UGent

~ ISMIR 2020 - Virtual Conference

» By Joren on Tuesday 20 October 2020

From 11-16 October 2020 the latest instalment of the ISMIR conference series was held. Due to the pandemic, the 21st ISMIR conference was the first virtual one. As usual, participants and presenters from around the world joined the conference. For the first time, however, not all participants synchronised their circadian rhythm. By repeating most events with 12h in between, the organisers managed to put together a schedule befitting nearly all participants.

The virtual format had some clear advantages: travel was not needed, so (environmental) cost was low. Attendance fees were lower than usual since no spaces or catering was needed. This democratised the conference experience and attendance reached a record high.

The scientific program of the conference was impressive and varied. It is At the conferences Late Breaking/Demo session I presented Olaf: Overly Lightweight Acoustic Fingerprinting.

Command Line Application, Presentation, and UGent

~ PaPiOM: Patterns in Pitch Organization in Music

» By Joren on Wednesday 30 September 2020

Form the 1st of October 2020 I will start on a new research project. The BOF fund of Ghent University is kind enough to sponsor the project for three years. The abstract is as follows:

Music is present in every culture in the world. We as a species seem to have an urge to make music. While the diversity of music cultures around the world is phenomenal, they do seem to have patterns in common. Especially for pitch, one of the fundamental building blocks of music, there are strong reasons to believe that there are commonalities amongst cultures on how pitch is organised A better insight in these common patterns may help to answer questions on the definition, origins and evolution of music.

Common patterns in pitch organisation can be studied from two perspectives. Firstly, the perspective of how humans perceive and make music can be gained from systematic, experimental work. Over the years this has yielded insights in which pitch organisations might be most fit for our perceptual, neurophysiological system. Secondly, these patterns can be observed directly in large-scale, corpus-based, cross-cultural studies which has a potential that is not exploited as of yet.

During this fellowship a large-scale global corpus with field recordings will be compiled in collaboration. Music Information Retrieval techniques will be employed to describe how pitch is organised in the corpus. More specifically, it will support claims on the use of discrete pitches, octave equivalence, the number of pitch classes in use and the pitch interval structures. The uncovered fundamental properties of pitch will be confronted with findings from experimental work.

Recently I presented the outline of the project with the following slides:

UGent and Computational ethnomusicology

~ Olaf - Acoustic fingerprinting on the ESP32 and in the Browser

» By Joren on Thursday 20 August 2020

Recognition of music. A good year ago I was asked to develop audio recognition technology for an e-costume. The idea was that lights in the costume would follow a sequence synchronised to a certain song. Only a single song should trigger the lights, all other music should be ignored. Recognition of music and synchronisation is typically done using audio fingerprinting techniques. The challenge was that the recognition needed to run on a cheap, battery-powered microcontroller with limited CPU and memory. I delivered a prototype but eventually a cheap, battle-tested, off-the-shelf, IP-cleared, alternative was found.

The prototype gathered dust for a while but the idea stuck in my head. With my daughters fourth birthday approaching during the lockdown, I decided to turn the prototype into an over-engineered birthday gift and let an ‘Elsa-dress’ react to ‘Let It Go’ from the Frozen soundtrack. With the prototype as a starting point, I ordered an RGB-LED-strip, a beefy Li-Ion Battery, an I2S digital microphone and, of course, an Elsa-dress.

I had an ESP32 microcontroller laying around and used it as the core of the system: it supports I²S, has a floating point unit (FPU), is easy to use together with LED strips and has enough memory. The FPU makes it straightforward to use the same code on traditional computers as on embedded devices: fixed-point math can be avoided.

After soldering the components together and with the help from my better half to sew in the LED strip, it all came together. In the video below, the result of our work can be seen. The video first shows a song that should not and is not recognised. Then, “Let It Go” is played and correctly recognised. After the song is stopped, the lights go on for a while and finally stop: this is by design to allow gaps in recognition. Lastly, the song is continued and again correctly recognised.

With my limited C experience the prototype code was not well organised. During my second attempt this improved enough so that I feel comfortable enough to share the code on GitHub: Olaf - Overly Lightweight Acoustic Fingerprinting.

The code went through several iterations and was expanded beyond the original scope and became a capable general purpose acoustic fingerprinting system with its many applications. Olaf performs quite well thanks to its resource friendly design and the use of PFFT and LMDB. Especially LMDB, a fast, B+-tree backed key value store with low storage overhead enables performant storage and lookups.

The GitHub does not contain an example for the ESP32. That code depends on the microcontroller, digital microphone and pins used and Olaf needs to be hacked to exhibit the requested behaviour. All in all that code is much less reusable (and sharable, testable, maintainable). I have, however, included a platformIO project for “Olaf on ESP32”:[ESP32-Olaf.zip] for reference.

WASM: Olaf in the browser

Olaf, being written in ANSI C, can run in the browser thanks to the Emscripten compiler. According to its website, Emscripten ‘…lets you run C and C on the web at near-native speed without plugins’ Combining the Web Audio API and the WASM version of Olaf makes web-based acoustic fingerprinting applications possible.

Below you can try out Olaf. The exact same code is running on your browser as on the ESP32 demonstrated above. This means that Olaf is listening to recognise ‘Let It Go’ from the Frozen soundtrack. For your convenience the song can be started below on the left. On the right, you can start Olaf by allowing incoming audio to be analysed. The FFT is calculated by Olaf and visualised using Pixi.js. After a few seconds the red fingerprints should become green, indicating a match. Once you stop the song, the fingerprints will eventually turn red again. As with the video above: going from a match to no match takes a couple of seconds to allow gaps in recognition.

1. Start the song and play it aloud. Singing along is encouraged. 2. Start the microphone and check whether recognition succeeds.

Olaf was featured on hackaday. There is also a small discussion about Olaf on Hacker News. A write-up of this project also ended up as a contribution to the Late Breaking Demo track of the first virtual ISMIR conference: Olaf ISMIR 2020 LBD abstract.

Code and UGent Attachments

embedded_demo.webm, embedded_use.mp4, ESP32-Olaf.zip, ISMIR2020_LBD_Olaf.pdf, web_embedded_olaf.jpg, and embedded_demo.mp4

~ LTC - SMPTE Decoder on Teensy

» By Joren on Friday 06 December 2019

Teensy with audio shield

For synchronisation between several devices SMPTE timecode data is often encoded into audio using LTC or linear time code.

This blog post presents an LTC decoder for a Teensy 3.2 microcontroller with audio shield.

The audio shield takes care of the line level audio input. This audio input is then decoded. The decoding is done by libltc. The library runs as is on a Teensy without modification. The three elements are combined in a relatively simple teensy patch

To use the decoder connect the line level input left channel to an SMPTE source via e.g. an RCA plug.

For code, comments, pull requests please consult the Github repository for the Teensy SMPTE LTC decoder

A teensy decoding an LTC SMPTE signal

Code and UGent Attachments

teensy_with_audio_shield.jpg and teensy_ltc_smpte_decoder.mp4

~ MIDImorphosis: recording audio and sensor data

» By Joren on Thursday 03 October 2019

During an experiment which monitors a music performance it might be a requirement to record music, video and sensor data synchronously. Recording analog sensors (balance boards, accelerometers, light sensors, distance sensors) together with audio and video is often problematic. Ideally standard DAW software can be used to record both audio and sensor data. A system is presented here that makes it relatively straightforward to record sensor data together with audio/video.

The basic idea is simple: a microcontroller is programmed to appear as a class compliant MIDI device. Analog measurements on the micro-controller are translated to a specific MIDI protocol. The MIDI data, on the capturing side, can then be converted again into the original sensor data. This setup has several advantages:

It makes it easy to record sensor data together with audio data in a standard DAW software package. Recording a recording a midi track and audio track simultaneously in, e.g., Ableton Live, is easy.
Communication with the micro-controller is bi-directional. The micro-controller can be programmed to react to certain MIDI messages. A note-on can, for example, be used to start analog sensor recording. These MIDI commands can be send from any possible source that can ‘speak’ MIDI.
Thanks to the Web MIDI API this construct presents an easy way to let analog sensors and websites interact.
Real time sonification of the sensor-data is also supported. There are many ready to go options to sonify MIDI. Axoloti, Max/MSP, Zupiter are some of the environments that can be pugged into.

Fig: Visualization in html of analog sensor data, captured as MIDI

While the concept is relatively simple, there are many details to get right. Please consult the MIDImorphosis github page which details the system that consists of an analog sensor, a MIDI protocol and a clocking infrastructure.

UGent, Computational ethnomusicology, and Code Attachments

cc_viz.html, cc_viz_screen.png, and cc_viz.html

~ LW Research Day 2019 on Digital Humanities

» By Joren on Tuesday 10 September 2019

On the 9th of September 2019 the second research day organized by the faculty of Arts and Philosophy of Ghent University took place. The theme of the day was ‘Digital Humanities’ and the program gave an overview of the breadth of research at our faculty with topics as logic, history, archeology, chemistry, geography

Together with Jeska, I presented an ongoing study on musical interaction. In the study one of the measurements was the body movement of two participants. This is done with boards that are equipped with weight sensors. The data that comes out of this can be inspected for synchronisation, quality and quantity of movement, movement periodicities.

The hardware is the work of Ivan Schepers, the software used to capture and transmit messages is called “the MIDImorphosis” and developed by me. The research is in collaboration with Jeska Buhman, Marc Leman and Alessandro Dell’Anna. An article with detailed findings is forthcoming.

Presentation, UGent, and Research papers

~ AAWM/FMA 2019 - Birmingham

» By Joren on Tuesday 02 July 2019

I am currently in Birmingham, UK at the 2019 at the joint Analytical Approaches to World Music (AAWM) and Folk Music Conference (FMA). The opening concert by the RBC folk ensemble already provided the most lively and enthusiastic conference opening probably ever. Especially considering the early morning hour (9.30). At the conference, two studies will be presented on which I collaborated:

Automatic comparison of human music, speech, and bird song suggests uniqueness of human scales

“Automatic comparison of human music, speech, and bird song suggests uniqueness of human scales”:[FMA2019_paper_12.pdf] by Jiei Kuroyanagi, Shoichiro Sato, Meng-Jou Ho, Gakuto Chiba, Joren Six, Peter Pfordresher, Adam Tierney, Shinya Fujii and Patrick Savage

The uniqueness of human music relative to speech and animal song has been extensively debated, but rarely directly measured. We applied an automated scale analysis algorithm to a sample of 86 recordings of human music, human speech, and bird songs from around the world. We found that human music throughout the world uniquely emphasized scales with small-integer frequency ratios, particularly a perfect 5th (3:2 ratio), while human speech and bird song showed no clear evidence of consistent scale-like tunings. We speculate that the uniquely human tendency toward scales with small-integer ratios may relate to the evolution of synchronized group performance among humans.

Automatic comparison of global children’s and adult songs

“Automatic comparison of global children’s and adult songs”:[FMA2019_paper_13.pdf] by Shoichiro Sato, Joren Six, Peter Pfordresher, Shinya Fujii and Patrick Savage

Music throughout the world varies greatly, yet some musical features like scale structure display striking crosscultural similarities. Are there musical laws or biological constraints that underlie this diversity? The “vocal mistuning” hypothesis proposes that cross-cultural regularities in musical scales arise from imprecision in vocal tuning, while the integer-ratio hypothesis proposes that they arise from perceptual principles based on psychoacoustic consonance. In order to test these hypotheses, we conducted automatic comparative analysis of 100 children’s and adult songs from throughout the world. We found that children’s songs tend to have narrower melodic range, fewer scale degrees, and less precise intonation than adult songs, consistent with motor limitations due to their earlier developmental stage. On the other hand, adult and children’s songs share some common tuning intervals at small-integer ratios, particularly the perfect 5th (\~3:2 ratio). These results suggest that some widespread aspects of musical scales may be caused by motor constraints, but also suggest that perceptual preferences for simple integer ratios might contribute to cross-cultural regularities in scale structure. We propose a “sensorimotor hypothesis” to unify these competing theories.

Presentation, Music Information Retrieval, Folk Music Analysis (FMA) conference, Research papers, and UGent

~ trix: Realtime audio over IP

» By Joren on Wednesday 13 March 2019

At work we have a really nice piano and I wanted to be able to broadcast a live performance over the internet with low latency to potential live listeners. In all honesty, only my significant other gets moderately lukewarm about the idea of hearing me play live. Anyhow:

I did not find any practical tool to easily pump audio over the internet. I did find something that was very close called trx by Mark Hills: trx is a simple toolset for broadcasting live audio from Linux. It unfortunately only works with the ALSA audio system and is limited to Linux. I decided to extend it to support macOS and Pulse Audio. I also extended its name to form trix.

Audio Transmitter/Receiver over Ip eXchange (trix) is a simple toolset for broadcasting live audio from Linux or macOS. It sends and receives encoded audio over IP networks, via an audio interface. If audio interfaces are properly configured, a low-latency point-to-point or multicast broadband audio connection can be achieved. This could be used for networked music performances. The inclusion of the intermediate rtAudio library provides support for various audio input and outputs.

More information on trix can be found on the trix github page.

Latency

The system can be configured for low latency use. The whole chain is dependent several different components which each add to the total latency: audio input latency, encoder (algorithmic) delay, network latency and finally audio output latency.

Thanks to the use of RtAudio it should be possible to use low latency API’s to access audio devices (ASIO on windows or Jack on Unix). This means that audio input and output latencies can be as low as the hardware allows. The opus encoder/decoder that is used has a low algorithmic delay. By default it has a 25ms delay but it can be configured to only 2.5ms (see here). The network latency (and jitter) is very much dependent on the distance to cover. On a local network this can be kept low, when using wide area networks (the internet) control is lost and latencies can add up depending on the number of hops to take. Jitter can be problematic if the smallest possible buffers are used: then dropouts might occur and this might affect the audio in a noticeable way.

Piano at the krook

UGent and Code

~ Audio marker finder

» By Joren on Friday 22 February 2019

I have uploaded a small piece of software which allows users to find a specific audio marker in audio streams. It is mainly practical to synchronise a camera (audio/video) recording with other audio with the same marker. The marker is a set of three beeps. These three beeps are found with millisecond accurate precision within the audio streams under analysis. By comparing the timing of marker synchronization becomes possible. It can be regarded as an alternative for the movie clapper boards.

The source code for the audio marker finder is on GitHub. The software is used in the Art Science Interaction Lab (ASIL) of the Krook. Below you can download the Audio marker finder and the marker itself.

Code and UGent Attachments

2019.02.19.AudioMarkerFinder.jar, screenshot.png, and marker.wav

~ Nano4Sports in Team Scheire

» By Joren on Tuesday 05 February 2019

‘Team Scheire’ is a Flemish TV program with a similar concept as BBC Two’s ‘The Big Life Fix’. In the program, makers create ingenious new solutions to everyday problems and build life-changing solutions for people in desperate need.

One of the cases is Ben. Ben loves to run but has a recurring running related injury. To monitor Ben’s running and determine a maximum training length a sensor was developed that measures the impact and the amount of steps taken. The program makers were interested in the results of the Nano4Sports project at UGent. One of the aims of that project is to build those type of sensors and knowhow related to correct interpretation of data and use of such devices. Below a video with some background information can be found:

The solution build for the program is documented in a Github Repository One of the scientific results of the Nano4Sports project can be found in an article for the Journal of Biomechanics titled Validity and reliability of peak tibial accelerations as real-time measure of impact loading during over-ground rearfoot running at different speeds.

UGent

~ Validity and reliability of peak tibial accelerations as real-time measure of impact loading during over-ground rearfoot running at different speeds - Journal of Biomechanics

» By Joren on Tuesday 05 February 2019

With the goal in mind to reduce common runner injuries we first need to measure some running style characteristics. Therefore, we have developed a sensor to measure how hard a runners foot repeatedly hits the ground. This sensor has been compared with laboratory equipment which proofs that its measurements are valid and can be repeated. The main advantages of our sensor is that it can be used ‘in the wild’, outside the lab on the runners regular tours. We want to use this sensor to provide real-time biofeedback in order to change running style and ultimately reduce injury risk.

We have published an article on this sensor in the journal of Biomechanics: Pieter Van den Berghe, Joren Six, Joeri Gerlo, Marc Leman, Dirk De Clercq Validity and reliability of peak tibial accelerations as real-time measure of impact loading during over-ground rearfoot running at different speeds, (author version, Journal of Biomechanics, 2019

Studies seeking to determine the effects of gait retraining through biofeedback on peak tibial acceleration (PTA) assume that this biometric trait is a valid measure of impact loading that is reliable both within and between sessions. However, reliability and validity data were lacking for axial and resultant PTAs along the speed range of over-ground endurance running. A wearable system was developed to continuously measure 3D tibial accelerations and to detect PTAs in real-time. Thirteen rearfoot runners ran at 2.55, 3.20 and 5.10 m*s-1 over an instrumented runway in two sessions with re-attachment of the system. Intraclass correlation coefficients (ICCs) were used to determine within-session reliability. Repeatability was evaluated by paired T-tests and ICCs. Concerning validity, axial and resultant PTAs were correlated to the peak vertical impact loading rate (LR) of the ground reaction force. Additionally, speed should affect impact loading magnitude. Hence, magnitudes were compared across speeds by RM-ANOVA. Within a session, ICCs were over 0.90 and reasonable for clinical measurements. Between sessions, the magnitudes remained statistically similar with ICCs ranging from 0.50 to 0.59 for axial PTA and from 0.53 to 0.81 for resultant PTA. Peak accelerations of the lower leg segment correlated to LR with larger coefficients for axial PTA (r range: 0.64–0.84) than for the resultant PTA per speed condition. The magnitude of each impact measure increased with speed. These data suggest that PTAs registered per stand-alone system can be useful during level, over-ground rearfoot running to evaluate impact loading in the time domain when force platforms are unavailable in studies with repeated measurements.

Research papers and UGent

~ ISMIR 2018 Conference - Automatic Analysis Of Global Music Recordings suggests Scale Tuning Universals

» By Joren on Monday 24 September 2018

Thanks to the support of a travel grant by the faculty of Arts and Philosophy of Ghent University I was able to attend the ISMIR 2018 conference. A conference on Music Information Retrieval. I am co author on a contribution for the the Late-Breaking / Demos session

The structure of musical scales has been proposed to reflect universal acoustic principles based on simple integer ratios. However, some studying tuning in small samples of non-Western cultures have argued that such ratios are not universal but specific to Western music. To address this debate, we applied an algorithm that could automatically analyze and cross-culturally compare scale tunings to a global sample of 50 music recordings, including both instrumental and vocal pieces. Although we found great cross-cultural diversity in most scale degrees, these preliminary results also suggest a strong tendency to include the simplest possible integer ratio within the octave (perfect fifth, 3:2 ratio, \~700 cents) in both Western and non-Western cultures. This suggests that cultural diversity in musical scales is not without limit, but is constrained by universal psycho-acoustic principles that may shed light on the evolution of human music.

IPEM, UGent, Research papers, Presentation, Music Information Retrieval, and ISMIR Attachments

ismir_2018_logo.jpg, musical_scales_extended_abstract__0826_revision_copy.pdf, and animated_logo.gif

~ JGaborator - Fast Gabor spectral transforms in Java

» By Joren on Thursday 13 September 2018

Recently I have published a small library on github called JGaborator. The library calculates fine grained constant-Q spectral representations of audio signals quickly from Java. The calculation of a Gabor transform is done by a C library named Gaborator. A Java native interface (JNI) bridge to the C Gaborator is provided. A combination of Gaborator and a fast FFT library (such as pfft) allows fine grained constant-Q transforms at a rate of about 200 times real-time on moderate hardware. It can serve as a front-end for several audio processing or MIR applications.

For more information on the Gaborator C library by Andreas Gustafsson, please see the gaborator.com website or a talk by the author on the library called Exploring time-frequency space with the Gaborator

While the gaborator allows reversible transforms, only a forward transform (from time domain to the spectral domain) is currently supported from Java.A spectral visualization tool for sprectral information is part of this package. See below for a screenshot:

Code and UGent

« Newer blog posts Previous blog posts »