~ SMPTE decoding in the browser

» By Joren on Friday 16 September 2022

I have created a web application to LTC.wasm decodes SMPTE timecodes from an LTC encoded audio signal.

To synchronize multiple music and video recordings a shared SMPTE timecode signal is often used. For practical purposes the timecode signal is encoded in an audio stream. The timecode can then be recorded in sync with microphone inputs or added to a video recording. The timecode is encoded in audio with LTC, linear timecode. A special decoder is needed to extract SMPTE timecode from the audio. This is exactly what the LTC.wasm application does.

Using the [web based SMTE decoder](https://0110.be/attachment/cors/ltc.wasm/ltc_decoder.html

Try out the SMPTE decoder with your own SMPTE files.

The advantage of the web-based version versus the command line ltc-tools is that it does not need to be installed separately and that ffmpeg decodes audio. This means that almost any multimedia format is supported automatically. The command line version only supports a limited number of audio formats.

For further information check out the the LTC.wasm GitHub repository

UGent

~ SyncSink.wasm - Synchronize media files by audio-to-audio alignment

» By Joren on Tuesday 06 September 2022

I have built a tool for audio-to-audio alignment. It has applications for synchronization of media files. It works in the browser and you can synchronize your media files here with SyncSink.wasm. SyncSink.wasm does the following:

From an incoming media-file audio is extracted, downmixed to mono and and resampled. This is done with ffmpeg.audio.wasm a wasm version of ffmpeg.
For each audio track, fingerprints are extracted. These fingerprints reduce the the search space for alignment drastically.
Each list of fingerprints is aligned with the list of fingerprints from the reference. Resulting in a rough alignment
Cross correlation is done to refine the alignment resulting in sample accurate results.

Fig: media synchronization with audio-to-audio alignment.

It supports small time-scale adjustments of around 5%: audio alignment can still be found if audio speed differs a bit.

Some potential use cases where it might be of use:

To stitch partially overlapping audio recordings together resulting in a single long audio recording.
To synchronize multiple independent video recordings of the same event each with an audio recording of the environment.
To align a high quality microphone recording with video/low-quality audio recording of the same event. The low quality audio recorded with a camera can then be replaced with the high quality microphone audio.

The code can be found in the SyncSink.wasm GitHub repository

UGent

media_sync_recording.apng

~ Sending audio over a network with ffmpeg

» By Joren on Tuesday 30 August 2022

Fig: stable diffusion imagining a networked music performance

This post describes how to send audio over a network using the ffmpeg suite. Ffmpeg is the Swiss army knife for working with audio and video formats. It is a command line tool that supports almost all audio formats known to man and woman. ffmpeg also supports streaming media over networks.

Here, we want to send audio recorded by a microphone, over a network to a single receiver on the other end. We are not aiming for low latency. Also the audio is going only in a single direction. This can be of interest for, for example, a networked music performance. Note that ffmpeg needs to be installed on your system.

The receiver - Alice

For the receiver we use ffplay, which is part of the ffmpeg tools. The command instructs the receiver to listen to TCP connections on a randomly chosen port 12345. The \?listen is important since this keeps the program waiting for new connections. For streaming media over a network the stateless UDP protocol is often used. When UDP packets go missing they are simply dropped. If only a few packets are dropped this does not cause much harm for the audio quality. For TCP missing packets are resent which can cause delays and stuttering of audio. However, TCP is much more easy to tunnel and the stuttering can be compensated with a buffer. Using TCP it is also immediately clear if a connection can be made. With UDP packets are happily sent straight to the void and you need to resort to wiresniffing to know whether packets actually arrive.

ffplay -nodisp -f mpegts tcp://0.0.0.0:12345\?listen

In this example we use MPEGTS over a plain TCP socket connection. Alteratively RTMP could be used (which also works over TCP). RTP , however is usually delivered over UDP.

The shorthand address 0.0.0.0 is used to bind the port to all available interfaces. Make sure that you are listening to the correct interface if you change the IP address.

The sender - Björn

Björn, aka Bob, sends the audio. First we need to know from which microphone to use. To that end there is a way to list audio devices. In this example the macOS avfoundation system is used. For other operating systems there are similar provisions.

ffmpeg -f avfoundation -list_devices true -i ""

Once the index of the device is determined the command below sends incoming audio to the receiver (which should already be listening on the other end). The audio format used here is MP3 which can be safely encapsulated into mpegts.

Note that the IP address 192.168.x.x needs to be changed to the address of the receiver. Now if both devices are on the same network the incoming audio from Bob should arrive at the side of Alice.

The tunnel

If sender and receiver are not on the same network it might be needed to do Network Addres Translation (NAT) and port forwarding. Alternatively an ssh tunnel can be used to forward local tcp connections to a remote location. So on the sender the following command would send the incoming audio to a local port:

ffmpeg -f avfoundation -i ":1" -acodec libmp3lame -ab 196k -f mpegts tcp://192.168.x.x:12345

The connection to the receiver can be made using a local port forwarding tunnel. With ssh the TCP traffic on port 12345 is forwarded to the remote receiver via an intermediary (remote) host using the following command:

ssh -v -L 12345:192.168.x.x:12345 user@host -N

UGent

~ Using Java LMDB on Apple Sillicon or other unsupported platforms

» By Joren on Wednesday 25 May 2022

LMDB is a fast key value store, ideal to store and query sorted data with small keys and values. LMDB is a pure C library but often used from other programming languages via some type of bindings. These bindings are ‘bridges’ between languages and are automatically present on supported platform. On new or unsupported platforms, however, you need to build a this bridge yourself.

This blog post is about getting java-lmdb working on such unsupported platform: arm64. The arm64 platform is much more popular since the introduction of the Apple silicon - M1 platform. On Apple M1 the default architecture of Docker images is also aarch64.

The java-lmdb project uses JNR-FFI in the background. This is only one of the many ways to bridge Java and other programming languages. The new version of JNR-FFI supports the arm64. Currently, only the ‘SNAPSHOT’ version of java-lmdb uses this version. So the dependencies need to be changed to e.g. (when using Gradle):

repositories {
    mavenCentral()
    maven { url 'https://oss.sonatype.org/content/repositories/snapshots' }
}

dependencies {
    implementation group: 'org.lmdbjava', name: 'lmdbjava', version: '0.8.3-SNAPSHOT'
}

Next you need to build the lmdb library for your platform and copy it to a location where Java looks for it. This only works when compilers are already available on your system. In macOS you might need to install the XCode command line tools:

#xcode-select --install
git clone --depth 1 https://git.openldap.org/openldap/openldap.git
cd openldap/libraries/liblmdb
make -e SOEXT=.dylib
cp  liblmdb.dylib ~/Library/Java/Extensions

On Debian aarch64 the procedure is similar but a different extension is used (.so):

#apt install build-essential
git clone --depth 1 https://git.openldap.org/openldap/openldap.git
cd openldap/libraries/liblmdb
make
mv liblmdb.so /lib

Finally, to use the library in a JAR-file is might be needed to allow lmdbjava to access some parts of the JRE:

java -jar your_jar.jar --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED

A very similar setup was needed for the docker version of Panako.

UGent

~ The Augmented Movement Platform For Embodied Learning (AMPEL)

» By Joren on Friday 22 April 2022

I have been lucky to be part of a fruitful interdisciplinary scientific collaboration around AMPEL: ‘The Augmented Movement Platform For Embodied Learning’. The recent publication of an article is an ideal occasion to give a glimpse behind the scenes.

Fig: Schematic representation of AMPEL, a floor with interactive tiles.

Around 2016 the idea arose to search for new potential rehabilitation approaches for persons with multiple sclerosis. Multiple sclerosis causes problems, in varying degrees, with both motor and cognitive function. Common rehabilitation approaches either work on motor or cognitive function. The idea (by Lousin Moumdjian, Marc Leman, Peter Feys) was to combine both motor and cognitive rehabilitation in a single combined ‘embodied learning’ paradigm.

After some discussion we wanted to perform a combined short-term memory and walking task. First the participants would be presented with a target trajectory which would then be performed by walking. During walking we would modulate feedback types (melodic, sounds or visual). To this end, an ‘intelligent floor’ device was needed that was able to present a target trajectory, register a performed trajectory and provide several types of feedback. After a search for off-the-shelf solutions it became clear that a custom hard-and-software platform was required.

After a great deal of cardboard prototyping we settled on a design consisting of interactive tiles. Thomas Vervust of UGhent NamiFab designed a PCB with force sensitive resistors (FSR) on the bottom and RGB LED’s on top. Ivan Schepers provided practical insights during prototyping and developed the hardware around the interactive tiles. I was responsible for programming the system. Custom software was developed for the tiles, a controller to drive the tiles and to run and record experiments. Finally the system was moved to a hospital where the experiments took place. To know more about the exact experiments, please read the following three publications on AMPEL:

The Augmented Movement Platform For Embodied Learning (AMPEL): development and reliability - 2021
Moumdjian, L., Vervust, T., Six, J., Schepers, I., Lesaffre, M., Feys, P., & Leman, M.
This article details the rationale behind AMPEL and provides technical details and reliability measurements. It was published in the Journal on Multimodal User Interfaces
Embodied learning in multiple sclerosis using melodic, sound, and visual feedback: a potential rehabilitation approach - 2022
Moumdjian, L., Six, J., Veldkamp, R., Geys, J., Van Der Linden, C., Goetschalckx, M., Van Nieuwenhoven, J., Bosmans, I., Leman, M. and Feys, P.
The main AMPEL study which presents the rehabilitation potential. This work was published in the Annals of the New York Academy of Sciences.
[Motor sequence learning in a goal-directed stepping task in persons with multiple sclerosis: a pilot study\
- 2022](https://nyaspubs.onlinelibrary.wiley.com/doi/10.1111/nyas.14702)
  Veldkamp, R., Moumdjian, L., van Dun, K., Six, J., Vanbeylen, A., Kos, D. and Feys, P.
  For this study AMPEL was slightly modified for a reaction time task, showing its flexibility. The participants were asked to step on a tile as quickly as possible after it lit up. They were either knowledgable of the tile trajectory or not. This work was also published in the Annals of the New York Academy of Sciences.

AMPEL ready to use
First tests of the I2C bus
Testing LEDS
AMPEL schematic
Custom software driving AMPEL
Examples of walking paths
Early test of AMPEL
Almost connected the full I2C bus
The driving laptop and main Arduino

UGent

~ An audio focused ffmpeg build for the web

» By Joren on Thursday 24 February 2022

I have prepared an audio focused ffmpeg build for the web which facilitates browser based audio applications. I have prepared three demos:

Audio transcoding and playback demo: converts any media file into audio compatible with the Web Audio API for in-browser playback or analysis.
High quality time-stretching or pitch-shifting: demonstrates how pitch and tempo can be modified independently thanks to the Rubber Band Library.
Basic media info: gives information about the streams and encodings used in a media file.

Fig: [audio transcodinging in the browser](/attachment/cors/ffmpeg.audio.wasm/transcode.html). A `wav` file is converted to an `mp3`.

A bit more about the rationale behind this effort: Browsers have become practical platforms for audio processing applications thanks to the combination of Web Audio API , performant Javascript environment and WebAssembly. Have a look, for example, at essentia.JS.

However, browsers only support a small subset of audio formats and container formats. Dealing with many (legacy) audio formats is often a rather painful experience since there are so many media container formats which can contain a surprising variation of audio (and video) encodings. In short, decoding audio for in-browser analysis or playback is often problematic.

Luckily there is FFmpeg which claims to be ‘a complete, cross-platform solution to record, convert and stream audio and video’. It is, indeed, capable to decode almost any audio encoding known to man from about any container. Additionally, it also contains tools to filter, manipulate, resample, stretch, … audio. FFmpeg is a must-have when working with audio. It would be ideal to have FFmpeg running in a browser…

Thanks to WebAssembly ffmpeg can be compiled for use in the browser. There have been efforts to get ffmpeg working in the browser. These efforts have been focusing on the complete ffmpeg suite. Now I have prepared an audio focused ffmpeg build for the web based on these efforts. I have selected only audio parts which makes the resulting .wasm binary four to five times smaller (from \~20MB to \~5MB). I also provided a simplified Javascript wrapper. The project brings audio decoding to the browser but also audio filtering, transcoding, pitch-shifting, sample rate conversions, audio channel manipulation, and so forth. It is also capable to extract audio streams from video container formats.

Next to the pure functionality of ffmpeg there are general advantages to run audio analysis software in the browser at client-side:

Ease-of-use: no software needs to be installed. The runtime comes with a compatible browser.
Privacy: Since media files are not transferred it is impossible for the system running the service to make unauthorised copies of these files. There is no need to trust the service since all processing happens locally, in the browser.
Speed: Downloading and especially uploading large media files takes a while. When files are kept locally, processing can start immediately and no time is wasted sending bytes over the internet. This results in a snappy user experience.
Computational load: the computational load of transcoding is distributed over the clients and not centralised on a (single) server. The server does not do any computing and only serves static files, so it can handle as many concurrent clients as its bandwidth allows.

Check out the audio focused ffmpeg build for the web on GitHub.

UGent

~ pffft.wasm: an FFT library for the web

» By Joren on Thursday 10 February 2022

PFFFT is a small, pretty fast FFT library programmed in C with a BSD-like license. I have taken it upon myself to compile a WebAssembly version of PFFFT to make it available for browsers and node.js environments. It is called pffft.wasm and available on GitHub.

The pffft.wasm library comes in two flavours. One is compiled with SIMD instructions while the other comes without these instructions. SIMD stands for ‘single instruction, multiple data’ and does what it advertises: in a single step it processes multiple datapoints. The aim of SIMD is to make calculations several times faster. Especially for workloads where the same calculations are repeated over and over again on similar data, SIMD optimisation is relevant. FFT calculation is such a workload.

Evidently the SIMD version is much faster but there is no need to take my word for it. Below you can benchmark the SIMD version of pffft.wasm and compare it with the non-SIMD version on your machine. A pure Javascript FFT library called FFT.js serves as a baseline.

When running the same benchmark on Firefox and on Chrome it becomes clear that FFT.js on Chrome is about twice as fast thanks to its superior Javascript engine for this workload. The performance of the WebAssembly versions in Chrome and Firefox is nearly identical. Safari unfortunately does not (yet) support SIMD WebAssembly binaries and fails to complete the benchmark.

The source code, the limitations and other info can be found at the pffft.wasm GitHub repository

Edit: PulseFFT might be of interest as well: a (as far as I can tell non-SIMD) WASM version of KissFFT.

Benchmark pffft.wasm - Chrome on an Apple M1 Pro chip
STFT calculated with pffft.wasm
Benchmark pffft.wasm - Chrome on an 2010 Macbook Air
Benchmark pffft.wasm - Firefox on a 2010 Macbook Air

UGent

~ Panako 2.0 - Updates for an acoustic fingerprinting system

» By Joren on Sunday 07 November 2021

At the online ISMIR 2021 conference I have presented updates to Panako, an audio fingerprinting system:

This work presents updates to Panako, an acoustic fingerprinting system that was introduced at ISMIR 2014. The notable feature of Panako is that it matches queries even after a speedup, time-stretch or pitch-shift. It is freely available and has no problems indexing and querying 100k sea shanties. The updates presented here improve query performance significantly and allow a wider range of time-stretch, pitch-shift and speed-up factors: e.g. the top 1 true positive rate for 20s query that were sped up by 10 percent increased from 18% to 83% from the 2014 version of Panako to the new version. The aim of this short write-up is to reintroduce Panako, evaluate the improvements and highlight two techniques with wider applicability. The first of the two techniques is the use of a constant-Q non-stationary Gabor transform: a fast, reversible, fine-grained spectral transform which can be used as a front-end for many MIR tasks. The second is how near-exact hashing is used in combination with a persistent B-Tree to allow some margin of error while maintaining reasonable query speeds.

Together with the paper there is also a poster and a short video presentation which explains the work:

UGent

2021.ismir-lbd-panako-updates.pdf, poster-movie.mp4, and panako_updates_poster.pdf

~ Decoding LTC and SMPTE on Teensy - Now using interrupts

» By Joren on Friday 24 September 2021

Have you ever found yourself wondering how to build an accurate, low-latency LTC decoder with a common micro-controller? Well! Wonder no more and read on! Or, stop reading and do go read something that is more appealing to your predispositions.

SMPTE timecodes were originally used to synchronize audio and video material. SMPTE timecode data is often encoded into audio using LTC or linear time code. This special audio stream can be recorded together with other audio and video material. By decoding the LTC audio afterwards and working back to SMPTE timecodes, synchronization of multiple camera angles and audio material becomes straightforward. This concept tagging data streams with SMPTE timecodes is also used for other types of data.

\ Fig: LTC is a 'self-clocking' protocol for which a period can be found automatically. Once the period is found, transitions within the period are counted. A period with a transition translates to a 1, a period without any transitions to a 0.

SMPTE timecodes supports up to 30 frames per second and this resolution might not be sufficient for some data streams. It helps if the frames could be split up and 60 or 120 frames per second could be generated. With a low latency LTC decoder it would be possible to support this case and, for example, provide four pulses for every SMPTE frame. To be more precise: a SMPTE frame consists of 80 bits and in this case we would send a pulse exactly when decoding bit 0, bit 20, bit 40 and bit 60. We would then be able to sample at 120Hz while staying in sync with the SMPTE.

My first attempt was to treat the signal like audio and use a ready built library for LTC audio decoding The problem there is that sampling is done which might not exactly match the SMPTE bit transition period and relatively large buffers are used to decode LTC. The bit exact decoding is not possible using this method: the latency is too large, the method also uses excessive computational power and memory.

Biasing circuit to offset voltages from zero centred to having a bias

Fig: Biasing circuit to offset voltages

In my second attempt, the current iteration, interrupts are used to detect rising and falling edges in the LTC stream. By counting the number of microseconds between these edges a bit string is constructed. Effectively decoding LTC without any wasted computational power or memory and at a very low latency. If the LTC stream is well-formed, following each incoming bit and reacting to it becomes straightforward. Finally, after gently massaging the LTC bit string, SMPTE timecodes ooze out of the system at a low latency.

I have implemented a low latency LTC and SMPTE timecode data decoder for a Teensy microcontroller. One of the current limitations is that only 30fps SMPTE without skipped frames is supported. Another limitation is that the precision of the derived 120Hz clock is dependent on the sampling rate of the encoded audio signal: if e.g. only 8000Hz is used, transitions can only be precise up until 125µs. The derived clock will jitter slightly but will not drift.

There is still a slight problem with audio and Teensy input: audio is generally transmitted from ~~1.8V to +1.8V and not~~ as a Teensy would expect - from 0 to 3.3V. To make this change a small biasing circuit is placed before the Teensy input. In my case two 100k resistors and a 0.1uF capacitor worked best. The interrupt is relatively robust against signals that are a clipping (outside the 0 - 3.3V) or slightly too silent. If the signal becomes too small LTC decoding obviously fails.

For more information and updates see the GitHub Repository for the low latency LTC and SMPTE timecode data decoder.

UGent

~ Updates for Panako - an acoustic fingerprinting system

» By Joren on Sunday 11 July 2021

Panako is an acoustic fingerprinting system I developed a couple of years ago. With acoustic fingerprinting systems it is possible to find duplicates in digital music archives and compare meta-data or identify unlabelled audio fragments. In the margins of my post-doc project working with large music archives, I have found the time to update Panako significantly. The updates simplify, improve and speed up Panako.

\ Fig. General content based audio search scheme.

The main algorithms are simplified. There is also a reduction of dependencies and a refocus to core functionality. This also simplifies building the software. The retrieval characteristics are improved, mainly thanks to the use of a fine-grained Gabor transform. Also new is the near-exact hashing construct which helps with off-by-one issues when matching time bins. The key-value store used is now LMDB, which speeds up the query performance of Panako significantly. The updates should make Panako stand the test of time somewhat better.

\ Fig. The top one true positive rate for 20s query fragments. The audio playback is speed modified from 84 to 116% with respect to the indexed reference audio. The original query length is 20s, if it is slowed down by 10% it takes, evidently, 22s. Note the improvement of the 2021 version of Panako (blue) vs the 2014 version (light-gray). As a baseline the standard algorithm (wang 2003) is included as well. For the 2021 Panako algorithm, audio recognition performance suffers (below 80) when playback speed is changed more than 10.

A more complete list of updates can be found below and on the Panako GitHub repository:

- The number of dependencies has been drastically cut by removing support for multiple key-value stores. - The key-value store has been changed to a faster and simpler system (from [MapDB](https://mapdb.org) to [LMDB](http://www.lmdb.tech/doc)). - The SyncSink functionality has been moved to another project (with Panako as dependency). - The main algorithms have been replaced with simpler and better working versions: - Olaf is a new implementation of the classic Shazam algorithm. - The algoritm described in the Panako paper was also replaced. The core ideas are still the same. The main change is the use of a [Gabor transform](https://en.wikipedia.org/wiki/Gabor_transform) to go from time domain to the spectral domain (previously a constant-q transform was used). The gabor transform is implemented by [JGaborator](https://github.com/JorenSix/JGaborator) which in turn relies on [The Gaborator](https://gaborator.com/) C library via JNI. - Folder structure has been simplified. - The UI which was mainly used for debugging has been removed. - A new set of helper scripts are added in the `scripts` directory. They help with evaluation, parsing results, checking results, building panako, creating documentation,... - Changed the default panako location to \~/.panako, so users can install and use panako more easily (without need for sudo rights)

Fig: An interactive CLI session with Panako.

UGent

Welcome

Contact

~ SMPTE decoding in the browser

~ SyncSink.wasm - Synchronize media files by audio-to-audio alignment

~ Sending audio over a network with ffmpeg

The receiver - Alice

The sender - Björn

The tunnel

~ Using Java LMDB on Apple Sillicon or other unsupported platforms

~ The Augmented Movement Platform For Embodied Learning (AMPEL)

~ An audio focused ffmpeg build for the web

~ pffft.wasm: an FFT library for the web

~ Panako 2.0 - Updates for an acoustic fingerprinting system

~ Decoding LTC and SMPTE on Teensy - Now using interrupts

~ Updates for Panako - an acoustic fingerprinting system

Previous blog posts

10-06-2021 ~ SyncSink - Synchronize media by aligning audio

01-06-2021 ~ Calling JNI code from multiple Java threads: sharing state

31-05-2021 ~ JGaborator Updated - Fine grained spectral transforms from Java

18-02-2021 ~ Music-based biofeedback to reduce tibial shock in over-ground running: a proof-of-concept study

20-10-2020 ~ ISMIR 2020 - Virtual Conference

30-09-2020 ~ PaPiOM: Patterns in Pitch Organization in Music

20-08-2020 ~ Olaf - Acoustic fingerprinting on the ESP32 and in the Browser

06-12-2019 ~ LTC - SMPTE Decoder on Teensy

03-10-2019 ~ MIDImorphosis: recording audio and sensor data

10-09-2019 ~ LW Research Day 2019 on Digital Humanities