~ NextCube, IRCAM Musical Workstation Demo @ Science Day

» By Joren on Monday 13 November 2023

I will be demoing an early digital music workstation at the Flanders 2023 Science Day. During the Science Day there will be demonstrations of several of the electronic music heritage instruments of the collection of IPEM, which used to be an early electronic music production studio. In the collection is a vintage analog synthesizer (an EMS Synthi 100), a Yamaha DX7, an analog plate reverb audio effect processor and, finally, a NeXTcube with a unique sound-card and early digital music workstation software.

The NeXTcube is an influential machine in computing history. The NeXTcube, with an additional soundcard, was also one of the first off-the-shelf devices for high-quality, real-time music applications. I have restored a NeXTcube to run an early version of MAX, an environment for interactive music applications. This combination of software and hardware was developed at IRCAM and was known as the IRCAM Musical Workstation or IRCAM Signal Processing Workstation (ISPW). See my previous blog posts on Electronic Music and the NeXTcube and USB MIDI interface for the NeXTCube

Fig: the NeXTcube's design stood out compared to the contemporary beige box PCs.`

The IPEM collection of electronic music instruments is unique with the aim to reintroduce the instruments into daily music practice an turn them into living heritage`. For example in 2020, the Dewaele Brothers released the album made exclusively on the IPEM ‘EMS Synthi 100’ synthesizer. The NeXTcube demo will be hands-on as well. See you there!

UGent and IPEM

~ Doorbell triggered Halloween window projection

» By Joren on Wednesday 25 October 2023

</img> Fig: *Door projection as imagined by DALL.E*.

I did a thing, and, similar to most stuff made here, it is quite a bit of effort and rather pointless. In that sense, it is a bit like life itself. Anyhow, it seems that the Halloween tradition of trick-or-treating has found a strong foothold in mainland Europe. Due to social embeddedness, I prepared Halloween themed projection that responds to my door-bell. I have a glass door, which is ideal for scary projections. The idea is to have a continuous door projection but with a twist: when kids press the doorbell a projected ghost reacts and rushes towards them along with a loud ghostly scream.

This blog post details the technical setup with the intention to inspire similar projects and serve as documentation for next year. First we need a way react to the doorbell.

Doorbell trigger setup

I sourced a couple of FSR (Force Sensitive Resistor)’s from a “sound book” that I had taken apart. Most of these sound books with e.g. animal sounds are meant for toddlers and have a some type of button and a small electronics circuit to make sound. Some of these books work with FSR ‘buttons’ which are similar in size to a doorbell. I took a single FSR from such a book.

I attached the FSR to a “Teensy LC” micro-controller with an additional resistor and put it in a small 3D-printed case. The Teensy was programmed to emit a MIDI Note On event when the FSR/doorbell is pressed. A Note Off follows when the button is released. Once it is connected via USB to a computer it is essentially regarded as a digital piano with only a single key. Making a micro-controller pretend to be a standard MIDI device is very practical since the message passing protocol is standardized and well supported by many types of systems. MIDI is also optimized for low-latency communication. Via the Web MIDI API there is even support for MIDI in web browsers.

Video projection

While software like Resolume allows for complex interactive video projections, my requirements are more modest: I need a continuous background video and I want the ‘scare’ video and audio to appear when the doorbell is triggered. I opted for a browser-based solution: multi-media capabilities, scripting and MIDI support are all present in modern browsers. Running things in a browser has advantages: there is no need for specialized software, it is easy to program, easy to run, relatively stable and future-proof. The proof-of-concept can be seen below. For the actual projection on a window or door you need to first cover the glass with a thin layer of white paper which lets most light through. A white paper tablecloth works well.

Demo: click the 'start video' to start the background video and click doorbell if you dare...

The code is not much special and a bit hacky but can be found attached. The download includes the “html, javascript, css, video, audio and the micro-controller software for a doorbell-triggered projection”.

Teensy LC in a 3D printed case and FSR
Schema for the setup

UGent and 0110.be

~ Started at the Ghent Centre for Digital Humanities

» By Joren on Tuesday 03 October 2023

Fig: *Gent Gemapt, an recent GhentCDH project*.

From the first of October I started at the Ghent Centre for Digital Humanities (GhentCDH) as research software engineer. GhentCDH ” engages in the field of ‘Digital Humanities’ at Ghent University, ranging from archaeology and geography to linguistics and cultural studies. GhentCDH develops DH collaboration and supports research projects, teaching activities and infrastructure projects across the faculties”.

I will be helping with the many projects they are involved in: ranging form public research valorization to internal research tools. I am sure I will learn a lot by discussing projects with a diverse range of researchers and hope to consolidate my expertise in the area of mulitimedia analysis and annotation in some ways. The current areas of expertise can be found on their website:

Collaborative databases: offering advice and support for collaborative databases at Ghent University. It helps researchers to develop a database instance, powered by e.g. Nodegoat. It provides advice regarding data standards and linked data.
Digital text analysis: aiming to improve digital text analysis at Ghent University by offering support and information to researchers. You can contact us for advice on TEI and digital editions, working with digital text analysis tools, and using computer-assisted qualitative data analysis.
Geospatial analysis: offering advice, support and training regarding geospatial data management, analysis and visualisation to the humanities and social sciences researchers at the Ghent University.
Digital heritage: offering support in regards to digital heritage, participation and virtual expositions. GhentCDH helps researchers, teachers and students to create, manage and enrich their own digital collections and set up virtual exhibitions around them.

A recent GhentCDH project is Gent Gemapt or Ghent mapped ‘an interatcive platform which connects places, historical maps and heritage collections which each other and the wider audience’.

UGent

~ Acoustic fingerprinting in the browser with Olaf

» By Joren on Wednesday 27 September 2023

The recent version of the OLAF (Overly Lightweight Acoustic Fingerprinting) audio fingerprinting system also includes an updated WASM build which deserves a bit more attention.

The browser version of Olaf enables audio fingerprinting in the browser. This can be used to e.g. react to music playing in the environment, so called second screen applications or to synchronize several devices to an audio stream.

The goal of the demo below is to play music aloud - not using headphones - using the controls on the left. You can either play the reference track or an unrelated distractor. Next, the Olaf fingerpinter system needs to be started using the button on the right which captures the microphone of your device. Then Olaf tries match the incoming sound of the microphone and the reference track. Once a match is found the exact time in the match is displayed until the sound matches no more. Note that there is no direct information flowing between the left and right part. You can also play the reference on another device to be sure.

Reference:

\ Distractor:

To get this demo working with the Web Audio API and use AudioWorklet objects, to process audio in the background an not on the main browser thread. There is surprisingly little info to find on how to combine WASM libraries - I used both Olaf and libsamplerate-js - and the AudioWorklet environment. Thanks to one of the very few resources on combining WASM, emscripten and AudioWorklets led me in the right direction.

For more information, check the Olaf acoustic fingerpinter system source code repository.

UGent and Code

~ Resampling audio via a Web Audio API Audio Worklet

» By Joren on Wednesday 27 September 2023

The Web Audio API offers some great functionality for web based audio applications. The API also has a couple of quirks and is not always easy to use. One of those quirks is the limited support for resampling audio. When requesting a microphone stream of a certain sample rate the API only allows configurations your hardware supports. Ideally there should be an option to resample the incoming stream to a requested sample rate (and format) independent of hardware.

On macOS and Chrome the issue becomes even more confusing: when using multiple AudioContexts they can only have the same sample rate. E.g. starting a microphone on 16kHz by itself is possible but not when there is also audio playback on the same page, then everything switches over to 48kHz. There even seems to be an effect of different browser tabs. Other browsers and platforms have similar issues. This is problematic when you need audio in a fixed sample rate.

The solution is to resample audio incoming samples in your code or use the OfflineAudioContext as a resampler. The OfflineAudioContext way needs a lot of code and, crucially, only works on the main browser thread and not in an AudioWorklet. The AudioWorklet should be the place for computationally intensive audio processing like resampling. To solve the resampling problem I have glued together an AudioWorklet and libsamplerate-js to provide an easy to use audio resampling solution which is demo’d below:

The demo does not seem to do much but it reads incoming microphone data and uses a high quality audio resampling library to resample an audio stream into a requested audio sampling rate. The browser development console shows some info on this process. To get this working in an audio worklet, the libsamplerate-js needed to be recompiled and directly included in the AudioWorklet. To inspect the source, check the “Web Audio API AudioWorklet resampler”:[web-audio-api-resample.zip].

The resampling issue came up in development the browser based component of Olaf, an audio fingerprinting solution.

Code and UGent Attachments

web-audio-api-resample.zip

~ ESP32 Olaf - Overly Lightweight Acoustic Fingerprinting on the ESP32

» By Joren on Tuesday 26 September 2023

Fig: In Dutch 'ESP 32' means 32 Hams...

Olaf is an acoustic fingerprinting system designed with embedded devices in mind. It has a low memory use and computational requirements which are compatible with e.g. the ESP32 line of microcontrollers devices like the SparkFun ESP32 Thing or devices based on the RP2040 chip. Recently I have prepared a demo with the newest version of Olaf running on an ESP32 which deserves some attention.

To match audio, Olaf needs access to streaming audio. This can be audio read from an SD-card but, more likely, audio comes from a microphone. Digital microphones have some great features: a low-noise floor, great at picking up omnidirectional sound and they are inexpensive. I have prepared a demo of Olaf which shows how to use Olaf on an ESP32 with an INMP441 MEMS microphone. To test the MEMS microphone I also made a MEMS microphone to WiFi program which sends incoming sound on the ESP32 over WiFi to a computer where the sound quality can be verified.

The example provides a scaffold for embedded music-reactive applications. Once the microcontroller knows which song is playing and where in the song the match is found it can trigger LED’s (or explosions, fireworks, lyrics, other effects…) which should happen in sync with the music. See the example below to get the idea, this demo runs an older version of Olaf but the idea stays the same:

The main difference between the current and previous versions of Olaf is that now the ESP32 version, the browser version and the PC version are all running the exact same code. No hacks are needed any more to support a platform. This means that testing and debugging can be done on a computer and, if everything goes well, the code should work as expected on the embedded device (or browser).

If you want to know more about Olaf, read the paper on the Olaf audio search, check out the Olaf source code repository or consult the Olaf on ESP32 readme.

UGent and Code

~ ESP32 I2S WiFi Microphone

» By Joren on Tuesday 26 September 2023

Fig: *Audio over WiFi*.

Getting MEMS microphones to work on microcontroller platforms as the ESP32 is challenging. In theory, the I²S protocol provides a standardised, easy way to receive audio from a microphone and send stereo audio to a DAC. In practice, the many parameters make I²S not straightforwards to use. As with most protocols and standards, the mismatch between limitations and quirks of specific hardware and software implementations can cause issues. To debug I2S microphones on ESP32 or the RP2040 I have prepared a small Arduino program.

The IS2 WiFi microphone program sends audio from the microphone over WiFi to a computer which listen to the microphone: this make sure that the microphone works as expected and audio samples are correctly interpreted. It validates the I2S settings like buffer sizes, sample rates, audio formats, stereo or mono settings, … After configuring an SSID, password and IP-address it becomes possible to listen — in real-time — to the microphone which also allows the listener to sense the microphone quality.


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

  size_t bytesIn = 0;
esp_err_t result = i2s_read(I2S_PORT, &sBuffer, bufferLen, &bytesIn, portMAX_DELAY);

int16_t *sample_buffer = (int16_t *)sBuffer;
int16_t samples_read = bytesIn / 2;
float audio_block_float[samples_read];

for (size_t i = 0; i < samples_read; i++) {
    sample_buffer[i] = gain_factor * sample_buffer[i];
    // Max for signed int16_t is 2^15
    audio_block_float[i] = sample_buffer[i] / 32768.f;
}

// Send raw audio 32bit float samples over UDP
Udp.beginPacket(outIp, outPort);
Udp.write((const uint8_t *)audio_block_float, bytesIn * 2);
Udp.endPacket();

Fig: The main part of reading i2s audio from a microphone and sending an UDP packet.

To listen to the incoming audio an UDP port needs to be captured and subsequently send to a program that can interpret and play or store audio. With netcat UDP data can be captured. With ffmpeg and ffplay audio can be payed or stored. In practice the receiving computer might run the following commands to decode UDP packages and hear the microphone:

# for playback, receive UDP packages and interpret raw audio
nc -l -u 3000 | ffplay -f f32le -ar 16000 -ac 1  -

# for playback, receive UDP packages and store in a wav file
nc -l -u 3000 | ffmpeg -f f32le -ar 16000 -ac 1 -i pipe: microphone.wav

The ESP32 WiFi microphone has been developed during the development of Olaf, an audio search system which also works for embedded devices. There is a page with more info on Olaf on ESP32.

Code and UGent Attachments

esp32_inmp441_wifi_mic.ino and audio_over_wifi.webp

~ A Python wrapper for Olaf - Acoustic fingerprinting in Python

» By Joren on Friday 22 September 2023

Fig: *Python wrapping C*.

I have just released a Python wrapper for the Olaf acoustic fingerprinting library. Olaf is a scalable audio search system based on indexing . Olaf is programmed in C but a wrapper now makes its functionality available in Python.

The python wrapper should make it more accessible for developers to get started with it and makes it compatible with other Python libraries. A few notable libraries are the librosa python package for music and audio analysis,nnAudio, A fast GPU audio processing toolbox and other more general plotting, data processing and machine learning libraries. Despite Python’s many flaws, its rich library ecosystem is unmatched.

The associated GitHub repository contains documentation on how to use the Olaf python wrapper and also contains examples. The first shows how to index a song into the database and subsequently query the database. The second visualises the event points extracted by Olaf. The figure below shows shows the resulting event points, extracted with Olaf, plotted on a magnitude spectrogram, calculated with Olaf. The spectrogram on top is calculated using librosa and is meant to be very similar to Olaf.

\
Fig: *A power spectrum from librosa and one from Olaf, with event points marked*.

The wrapper was made with Python CFFI which works reasonably well. The automatically generated wrapper library support a large part of the C language but it needs a compilation step for each platform. Currently, the instructions assume a POSIX-like system, but technically, the wrapper can also function on Windows, albeit with the potential need for Windows-equivalent instructions in place of certain POSIX ones. The wrapper is wrapped in an easy to use python class called Olaf.py:

```python\ from olaf import Olaf, OlafCommand\ import librosa

Store the first ten seconds of an audio file\ audio_file = librosa.ex(‘choice’)\ Olaf(OlafCommand.STORE,audio_file).do(duration=10.0)

Query for a part of the same file (with an offset of 7 seconds), but change volume\ y, sr = librosa.load(audio_file,mono=True, sr=16000,duration=10,offset=7.0)\ y = y * 0.8 #change the volume\ results = Olaf(OlafCommand.QUERY,audio_file).do(y=y)

We expect a match between the stored and partially overlapping query\ print(results)\ ```

Code, Command Line Application, and UGent

~ Installing a self-hosted UniFi Network Server on Debian 11

» By Joren on Friday 08 September 2023

Fig: UniFi manager.

I have been using a couple of UniFi devices in my home network for a couple of years. These devices proved to be reliable and full-featured, especially considering the relatively low price point. To manage UniFi devices a self-hosted instance of the UniFi network server is practical, especially when you already have a home server.

Unfortunately, the official installation instructions for UniFi miss a crucial step for installation on Debian 11. The network manager is not compatible with newer versions of mongodb. To install a version of mongodb compatible with UniFi on Debian 11, use the following commands:

wget -qO - https://www.mongodb.org/static/pgp/server-3.6.asc | sudo apt-key add -
echo "deb http://repo.mongodb.org/apt/debian stretch/mongodb-org/3.6 main" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list
sudo apt-get install mongodb-org=3.6.23
sudo apt-get install mongodb-org-server=3.6.23

UGent

~ How Shazam IDs songs

» By Joren on Monday 04 September 2023

The Wall Street Journal made a video on the internals Shazam fingerprinter. The visuals and technical explanation serves as a very good introduction in spectral-peak-based audio fingerprinting. For those who want a more in depth view or want to try out such systems: I have implemented extensions on the Shazam technique in two open-source systems.

Olaf is a spectral-peak based fingerprinter aimed at embedded systems, traditional computers and browsers. Panako is implemented in Java and has robustness against pitch-shifting and time-stretching which is briefly mentioned in the video below as well:

UGent

~ Dragon! Sound effects for board games

» By Joren on Friday 01 September 2023

Fig: *Rock & Troll* collaborative board game.

I often play board games with my kids. One of them is an absolute board game fan while the other is a sore loser and only wants to play collaborative games. These games are played ‘against the board’ and you win, or lose, together. I myself also still have problems losing games so I do understand this predicament. Genetics…

Rock & Troll is one of those games. It is a chance based game where you collaboratively try to build a path to a treasure before the dragon reaches it. Every player has to flip a tile which is either a part of the path (good) or a dragon (very bad). To increase engagement during play I often add sound effects. I was thinking: this can be improved and automated. For example, by doing this when a dragon tile is flipped:

The idea is to unobtrusively detect game state and add sound effects at critical moments. The sound effect should be playing without too much lag, ideally within about 200ms, so it feels immediate and connected to the game event. To implement this a camera based system with robust, fast object detection seemed like the way to go.

Dragon detection

To detect dragons in a video stream I want to retrain an existing object-detection system. So two things need to happen: first a realistic, labeled dataset needs to be created. Then a system needs to be trained to detect the dragons. We do not want to label a massive dataset so we will use transfer learning to retrain an existing network. This existing network should already have learned basic features like edges, colors, geometries and other basic patterns. With the hope that this would result in robust detection, even with a limited dataset.

To create the dataset I wrote a small script which took a webcam picture every few seconds while I was manipulating the board and tiles. This resulted in about 130 pictures, some with no dragons and some with six, 300 labels in total. For annotating the dataset I used the free roboflow web-app which also hosts the final dragon dataset. After augmentation, the size of the dataset can be tripled. The command to extract images from a webcam looks like this on my system:

ffmpeg -y -r 30 -f avfoundation -i '0' -frames:v 1 snapshot.jpg

After some consideration for alternatives I landed on the YOLOv8 object-detection system: a robust and fast object-detection system. Additionally, it is well-documented, pytorch-based, easy-to-use and it has support for video streams. The annotated roboflow dataset can be downloaded in a YOLOv8 compatible format as well. Transfer learning, was based on the yolov8s.pt weights, which are downloaded automatically. With the system installed correctly and the dataset dowloaded, a local GPU based training command might look like this:

yolo train data=RocknTroll.v3i.yolov8/data.yaml epochs=30 model=yolov8s.pt device=mps imgsz=640 batch=32

Once the system was trained - download the “model wheights here”:[dragons.pt] - a bit of “glue code”:[rock_n_troll_v8.py] is needed. The python script needs to stream images from a camera, here via open cv, and detect dragons in each image. Every time a new dragon is found, the sound effect is played. Note that the Roboflow website automatically trains a model as well which can be tried out with a webcam.

There are a few ways improve the robustness of the system. During a game there are only more and more dragons: if the script detects less dragons than before it is probably a false negative or there is occlusion. Additionally, the dragon tiles remain in the same location once they are placed on the board. This means that new dragons are expected only in certain regions of the image. Both heuristics can be used to together to improve robustness.

Notes

One of the reasons I bought a M1 mac with unified memory is for exactly these types of AI applications. After installing pytorch 2.0, the GPU acceleration resulted in a 10x training speed improvement. Training on a GeForce 1080 GTX from 2016 was still quite a bit faster, probably thanks to years of performance tuning targeting CUDA. It is clear that the mac GPU acceleration software ecosystem can use more effort, even system tools in macOS are limited: e.g. in the macOS activity monitor, GPU activity is very much an afterthought.

I am and hesitant to use cloud based GPU computing due to lack of control and privacy. I am not willing to send pictures from my kids to e.g. Google Cloud GPUs. The dependency on hardware of others might also limit the longevity of systems.

The ease-of-use, performance and accessibility of these deep-learning systems is great. Only a couple of years ago it would take months of hard work to maybe only approach similar detection performance. Adapting this idea for other board games and more types of tiles or board game events should be very possible.

Mac's limited GPU usage guage
Rock & Troll game
Inference results on a webcam stream

Code, UGent, Deep-learning, and 0110.be Attachments

dragons.pt, thumb_rock_n_troll.webp, rock_n_troll_demo.mp4, rockntroll.webp, 315794__bevibeldesign__dragon-roar-distressed-7.mp3, and rock_n_troll_v8.py

~ ☀️ Solar sockets - Delivers power only on solar energy surplus

» By Joren on Wednesday 23 August 2023

Fig: Solar socket.

I recently installed a couple of smart electrical sockets. The sockets only switch on when there is a solar energy surplus: when my rooftop solar panels produce more than the current energy consumption. I use these ‘solar sockets’ to charge the battery of an electric bike, for air conditioning and for charging other smaller devices. This post describes the components needed for such a system with the aim to inspire similar build.

Solar panels and a solar inverter with some form of readout.
A device to measure electrical energy use in a home.
Smart sockets with an easy to use API.
Some software to glue everything together.

1. ☀️ Solar panels and inverter

Most solar inverters have some form of API to readout the current solar panel output. In my case I use a SMA inverter which has two ways to extract this data: via Bluetooth and via wired ethernet. I found the wired ethernet solution to be the most reliable. The SMA inverter does use a somewhat annoying data formatting protocol but luckily there is an open source solution to decode the data: SBFspot

For SMA inverters, and possibly for others as well, there another option: the data is also automatically uploaded to a cloud based platform. This platform has an API which can be used to extract data on solar energy production. I do not like to be dependent on external cloud based software platforms, which might change at any time. Additionally, for real-time data cloud based platforms can be slow.

2. Measuring total electrical energy use

To measure total power use, I use an “Eastron SDM220M” measurement device which communicates with a server over a serial connection. There are adapters to translate serial Modbus to USB. The device is installed in my wiring closet by a professional: it is directly connected to the 60A mains and I would not advise to DIY it.

Alternatively, some places are equipped digital energy meters which might have a way for direct readout or readout via a cloud based API, after a few minutes. This might suffice for a solar socket install.

Energy use measurement might not be strictly needed for the ‘solar sockets’: if energy use is predictable it might be ok to simply switch the sockets on your average peak solar power. Perhaps combined with a local weather API. Finally we need to switch on some sockets.

3. Smart WiFi Socket

There are many smart WiFi sockets on the market. Most come with a smartphone app which allows you to control the socket from anywhere. Behind the scenes the sockets communicates with the vendor’s cloud based system over the internet. Additionally there are some integrations with systems like Apple Home, Amazon Alexa en Google Assistant. For fundamental\ infrastructure like sockets in my home I want to avoid dependencies on a external cloud based systems. Next to the concerns about privacy and ownership there is a very practical concern: the cloud based system might just stop working in a few years. Especially any dependency on a Google service is suspect. Also I am not convinced of Amazon Alexa’s future.

Luckily, there is Tasmota which provides open source firmware targeting many types of ‘smart’ devices including smart sockets. The tagline for Tasmota is ‘Total local control with quick setup and updates’. I bought a couple of Nous A1T Tasmota Smart WiFi sockets which come with Tasmota firmware. Switching the socket on is done by sending an HTTP GET request to an url, which can be scripted easily. There is some

4. Control software

A script glues everything together: it logs energy usage and solar output. It switches on the solar sockets when a surplus is detected and switches again when the surplus is gone. There is some additional logic which ensures that the socket remains on for at least an hour even if there is no solar surplus. This to ensure that batteries are charged to a minimal usable state.

In summary, here we presented a couple of building blocks to build ‘solar sockets’ which are on only when there is a energy surplus. By using simple API’s offered by Tasmota and locally running software, there is no dependency on (in the long term) unreliable cloud based systems which ensures the longevity of the build.

As an additional bonus, the solar sockets also serve as an indicator. A small LED shows when they are on, or, in other words, when there is solar energy surplus and when it is a good idea to switch on other electrical appliances.

Harde waren, 0110.be, and UGent

~ Is a frequency present in a signal? A C implementation.

» By Joren on Thursday 27 July 2023

This post is an efficient way to determine whether a predefined frequency is present in a signal. If such an algorithm can be found, it can serve as a basis for a modem. With a modem data is modulated and demodulated at the receiving side. The modulation allows data to be send over a transmission channel.

With the ability to detect the presence of audible frequencies a modem can transform symbols into a combination of frequencies and send data over sound. This is exactly what happens with DTMF in the sound below. DTMF is also used in the dailup sound.

Audio: dail tone sequence: which numbers are pressed?

Typically, determining the presence of frequencies in a signal is done with an FFT: an FFT divides a signal into e.g. 512 linearly spaced frequency bands and determines the magnitude of each of these frequencies. The annoying thing is that a probe frequency can be right in between two bands: sample rate, FFT size and the frequency to look for need to be carefully chosen to reliably detect a frequency. Also, it is computationally inefficient to calculate the magnitudes for all frequency bands if only one band is actually needed.

Luckily there is an alternative approach which looks like the calculating the FFT but for only one predetermined frequencies. This algorithm is known as the Goertzel algorithm and is used in DTMF dail tone encoding and decoding. With the standard Goertzel algorithm it is still needed to consider sample rate and the frequency of interest.

Finally there is the “Generalized Goertzel”: algorithm. In this version of the algorithm employs a couple of tricks to allow an arbitrary frequency and sample rate while still respecting the Kotelnikov frequency limit, better known as the Nyquist frequency.

Recenlty I needed a piece of ANSI c code to detect the magnitude of an arbitrary frequency for a project. The following is a C implementation of this algorithm. It uses the C support for complex numbers in the complex.h header:


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

  #include <math.h>
#include <stddef.h>
#include <complex.h>

float detect_frequency(float frequency_to_detect,
                       float audio_sample_rate,
                       float *audio_block,
                       float *window,
                       size_t audio_block_size) {
    float audio_block_sizef = (float)audio_block_size;
    float indvec = frequency_to_detect / audio_sample_rate * audio_block_sizef;
    float pik_term = 2 * M_PI * indvec / audio_block_sizef;
    float cos_pik_term2 = cosf(pik_term) * 2;

    float s0 = 0;
    float s1 = 0;
    float s2 = 0;

    for (size_t i = 0; i < audio_block_size; i++) {
        // potential improvement: expect windowed samples
        float windowed_audio_sample = window[i] * audio_block[i];
        s0 = windowed_audio_sample + cos_pik_term2 * s1 - s2;
        s2 = s1;
        s1 = s0;
    }

    s0 = cos_pik_term2 * s1 - s2;

    float complex cc = cexpf(0 + -1.0f * pik_term * I);
    float complex neg_s1 = -s1 + 0 * I;
    float complex pos_s0 = s0 + 0 * I;
    float power = cabsf(cc * neg_s1 + pos_s0);

    return power;
}

Demo

Below you can try out the algorithm. You can choose a frequency to detect and a playback frequency. The magnitude of the frequency is reported via the slider. The demo uses a javascript translation of the code above.

Dual-tone Multi-Frequency - DTMF

DTMF in the browser.

On the right you can find a demo of dual tone frequency modulation and demodulation. A combination of frequencies is played and immediately detected.

The green bars show which frequencies have been detected. If for example 1209 Hz is detected together with 770 Hz then this means that we are looking for the symbol in the first column on the second row. Both the first column and the second row are highlighted in green. At that spot we see 4 so we can decode a 4. By using 2 combinations of four frequencies a total number of 16 symbols can be encoded.

Note that this code does not simply highlight the button press directly but encodes the symbol in audio, feeds it into an Web Audio API format and decodes audio, the result of the decoding step highlights the row and column detected.

Code, UGent, and Command Line Application

~ Olaf: a lightweight, portable audio search system

» By Joren on Tuesday 04 July 2023

Fig: Some AI imagining audio search.

Recently I have published a paper titled ‘Olaf: a lightweight, portable audio search system’ in the Journal of Open Source Software (JOSS). The journal is a ‘hack’ to circumvent the focus on citable papers in the academic world: getting recognition for publishing software as a researcher is not straightforward.

Both Ghent University’s research output tracking system and Flanders FWO academic profile do not allow to enter software as research output. The focus is still solely on papers, even when custom developed research software has become a fundamental aspect in many research areas. My role is somewhere between that of a ‘pure’ researcher and that of a research software engineer which makes this focus on papers quite relevant to me.

The paper aims to make the recent development on Olaf ‘count’. Thanks to the JOSS review process the Olaf software was improved considerably: CI, unit tests, documentation, containerization,… The paper was a good reason to improve on all these areas which are all too easy to neglect. The paper itself is a short, rather general overview of Olaf:

“Olaf stands for Overly Lightweight Acoustic Fingerprinting and solves the problem of finding short audio fragments in large digital audio archives. The content-based audio search algorithm implemented in Olaf can identify a short audio query in a large database of thousands of hours of audio using an acoustic fingerprinting technique.”

Research papers, UGent, Music Information Retrieval, and Command Line Application

~ Identifying memory leaks in C

» By Joren on Thursday 29 June 2023

Fig: Memory leaks.

The C programming language is deceptively simple. The syntax is straightforward, C has a limited amount of keywords and a small standard library. The first edition of the classic book ‘The C Programming Language’ is only about 200 pages. And yet, when programming in C, it is hard to avoid the many exiting footguns: integer type conversions, unchecked indexes and memory leaks can all cause subtle problems. This is part of the appeal of C: shooting yourself in the foot does make you feel alive. Here I want to focus on ways to check for memory leaks for C programs.

Memory leaks come about when memory is claimed but is never released again. If this is done in a loop or during a long running program, the claimed memory adds up and eventually the system may run out of memory. A memory leak is less a problem if a program forgets to free a small amount of memory it only claims once: after program shut down, the operating system reclaims all memory anyhow. However, it does feels very dirty to not clean up after oneself. And I for one, am not a dirty boy.

Another reason to look for memory use and leaks is when you are programming for embedded devices. For these systems memory is very limited: in that world 500kB RAM is considered a massive amount of memory. I have been busy programming a scalable audio search system called Olaf which targets both traditional computers, embedded systems and browsers (via WebAssembly). It is clear that memory use — and memory leaks — need to be kept in check to pull this of.

Now, these memory leaks might not be easy to spot by inspecting the code. There are tools which help to spot memory management problems. One of these is valgrind which is currently not easy to use on Apple system with ARM processors. Luckily there is an alternative which is probably already installed on macOS via the XCode Command Line Tools a command line tool aptly called leaks. To quote the apple documentation on leaks, leaks reports:

the address of the leaked memory
the size of the leak (in bytes)
the contents of the leaked buffer

The most straightforward use of leaks is to run a program and generate a report after program shutdown. See below to run a memory leak inspection, in this case for the bin/olaf_c program which indexes an audio file in a key-value store. For CI purposes it is practical to know that leaks has an exit status of zero only when no leaks have been found. The exit status can be used in an automated test script to break a build if a leak is detected. The --quiet option can be practical in such setting.


  1

  leaks --atExit -- bin/olaf_c store audio.raw audio

In the case of Olaf I made a classic mistake: I had called free() on hash table but I needed to call the hash table destructor: hash_table_destroy() which freed not only the hash table itself but also all memory associated with the hash table entries. After a quick fix the leaks command showed no more leaks!


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

  leaks Report Version: 4.0, multi-line stacks
Process 35293: 2200395 nodes malloced for 135146 KB
Process 35293: 2200171 leaks for 138371200 total leaked bytes.

STACK OF 1 INSTANCE OF 'ROOT LEAK: ':
5 dyld 0x1a16dbf28 ...
4 olaf_c 0x100db145c main ...
3 olaf_c 0x100db53c8 olaf_...
2 olaf_c 0x100db4400 olaf_...
1 olaf_c 0x100da5788 hash_...
0 libsystem_malloc.dylib 0x1a1874d88 _mall...

2200171 (132M) ROOT LEAK:  [64]
2200170 (132M)  [50348032]
2 (80 bytes)  [32]
1 (48 bytes)  [48]

Output of the `leaks` command which shows where a memory leak can be found.

General takaways

leaks is an easy to use memory leak inspector provided by Apple. It is an alternative for valgrind.
Memory leaks can be checked automatically using the leaks exit status in a CI-script. This makes spotting leaks timely and more straightforward to fix.
Programmers should at least once try to target embedded devices. It makes you conscious of the wealth of resources available when targeting modern computing devices.

Programming on C
Programming in C
Memory leeks

Code, UGent, and Command Line Application

~ Optimizing C code with profiling, algorithmic optimizations and 'ChatGPT SIMD'

» By Joren on Monday 26 June 2023

This post details how I went about optimizing a C application. This is about an audio search system called Olaf which was made about 10 times faster but contains some generally applicable steps for optimizing C code or even other systems. Note that it is not the aim to provide a detailed how-to: I want to provide the reader with a more high-level understanding and enough keywords to find a good how-to for the specific tool you might want to use. I see a few general optimization steps:

The zeroth step of optimization is to properly **question the need** and balance the potential performance gains against added code complexity and maintainability.
Once ensured of the need, the first step is to **measure the systems performance**. Every optimization needs to be measured and compared with the original state, having automazation helps.
Thirdly, the second step is to **find performance bottle necks**, which should give you an idea where optimizations make sense.
The third step is to **implement and apply** an optimization and measuring its effect.
Lastly, **repeat** steps zero to three until optimization targets are reached.

More specifically, for the Olaf audio search system there is a need for optimization. Olaf indexes and searches through years of audio so a small speedup in indexing really adds up. So going for the next item on the list above: measure the performance. Olaf by default reports how quickly audio is indexed. It is expressed in the audio duration it can process in a single second: so if it reports 156 times realtime, it means that 156 seconds of audio can be indexed in a second.

The next step is to find performance bottlenecks. A profiler is a piece of software to find such bottle necks. There are many options gprof is a command line solution which is generally available. I am developing on macOS and have XCode available which includes the “Instruments - Time Profiler”. Whichever tool used, the result of a profiling session should yield the time it takes to run each functions. For Olaf it is very clear which function needs optimization:

\ Fig: The results of profiling Olaf in XCode's time profiler. Almost all time is spend in a single function which is the prime target for optimization.

The function is a max filter which is ran many, many times. The implementation is using a naive approach to max filtering. There are more efficient algorithms available. In this case looking into the literature and implementing a more efficient algorithm makes sense. A very practical paper by Lemire lists several contenders and the ‘van Herk’ algorithm hits the sweet spot between being easy to implement and needing only a tiny extra amount of memory. The Lemire paper even comes with example c max-filters. With only a slight change, the code fits in Olaf.

After implementing the change two checks need to be done: is the implementation correct and is it faster. Olaf comes with a number of functional and unit checks which provide some assurance of correctness and a built in performance indicator. Olaf improved from processing audio 156 times realtime to 583 times: a couple of times faster.

After running the profiler again, another method came up as the slowest:


  1
2
3
4
5
6
7
8

  //Naive implementation
float olaf_ep_extractor_max_filter_time(float *array, size_t array_size) {
    float max = -10000000;
    for (size_t i = 0; i < array_size; i++) {
        if (array[i] > max) max = array[i];
    }
    return max;
}

src: naive implementation of finding the max value of an array.

This is another part of the 2D max filter used in Olaf. Unfortunately here it is not easy to improve the algorithmic complexity: to find the maximum in a list, each value needs to be checked. It is however a good contender for SIMD optimization. With SIMD multiple data elements are processed in a single CPU instruction. With 32bit floats it can be possible to process 4 floats in a single step, potentially leading to a 4x speed increase - without including overhead by data loading.

Olaf targets microcontrollers which run an ARM instruction set. The SIMD version that makes most sense is the ARM Neon set of instructions. Apple Sillicon also provides support for ARM Neon which is a nice bonus. I asked ChatGPT to provide a ARM Neon improved version and it came up with the code below. Note that these type of simple functions are ideal for ChatGPT to generate since it is easily testable and there must be many similar functions in the ChatGPT training set. Also there are less ethical issues with ‘trivial’ functions: more involved code has a higher risk of plagiarization and improper attribution. The new average audio indexing speed is 832 times realtime.


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

  #if defined(__ARM_NEON)
#include <arm_neon.h>
// ARM NEON implementation
float olaf_ep_extractor_max_filter_time(float *array, size_t array_size) {
    assert(array_size % 4 == 0);
    float32x4_t vec_max = vld1q_f32(array);
    for (size_t j = 4; j < array_size; j += 4) {
        float32x4_t vec = vld1q_f32(array + j);
        vec_max = vmaxq_f32(vec_max, vec);
    }
    float32x2_t max_val = vpmax_f32(vget_low_f32(vec_max), vget_high_f32(vec_max));
    max_val = vpmax_f32(max_val, max_val);
    return vget_lane_f32(max_val, 0);
}
#else
//Naive implementation
#endif

src: a ARM Neon SIMD implementation of a function finding the max value of an array, generated by ChatGPT, licence unknown, informed consent unclear, correct attribution impossible.

Next, I asked ChatGPT for an SSE SIMD version targeting the x86 processors but this resulted in noticable slowdown. This might be related to the time it takes to load small vectors in SIMD registers. I did not pursue the SIMD SSE optimization since it is less relevant to Olaf and the first performance optimization was the most significant.

Finally, I went over the code again to see whether it would be possible exit a loop and simply skip calling olaf_ep_extractor_max_filter_time in most cases. I found a condition which prevents most of the calls without affecting the total results. This proved to be the most significant speedup: almost doubling the speed from about 800 times realtime to around 1500 times realtime. This is actually what I should have done before resorting to SIMD.

In the end Olaf was made about ten times faster with only two local, testable, targeted optimizations.

General takeways

Only think about optimization if there is a need and set a target: otherwise it is infinite.
Try to find a balance between complexity, maintainability and performance.
Changing a naive algorithm to a more intelligent one can have a significant performance increase. Check the literature for inspiration.
Check for conditions to skip hot code paths before trying fancy optimization techniques.
Profilers are crucial to identify where to optimize your code. Applying optimizations blindly is a waste of time.
Try to keep optimizations local and testable. Sprinkling your code with small, hard to test performance oriented improvements might not be worthwile.
SIMD generated by ChatGPT can be a very quick way to optimize critical, hot code paths. I would advise to only let ChatGPT generate small, common, easily testable code: e.g. finding the maximum in an array.
Having only localized ‘trivial’ ChatGPT parts means you can take them out once it is clear that you have copied code without proper attribution or licensing.
The use of SIMD can slow down your code if you are not careful, measure the effects of your ‘optimizations’!

After optimization, a new method takes most time.
Pre optimization, a single method takes most of the time.

Music Information Retrieval, Command Line Application, UGent, and Code

~ Running MAX/FTS for NeXTStep in an emulator

» By Joren on Monday 22 May 2023

The aptly named emulator Previous allows you to emulate NeXT hardware on modern platforms. It allows you to run NeXTStep and run software made for this environment. I have prepared a downloadable disk image for this emulator to experiment with an early version of MAX, a visual programming environment for music applications.

I am interested in the early days of MAX an influential visual programming environment geared towards music applications. This software originated at IRCAM and an early version was build for the NeXTcube and required a specialized soundcard (ISPW). I have been lucky to have been able to restore a NeXTCube with an ISPW soundcard but this hardware is extremely rare.

A functioning NeXTcube is already a rare collectors item and the ISPW soundcard is even more uncommon. The card was developed at IRCAM and later commercialised by Ariel Inc. which brought it to market right when NeXT announced to phase out hardware production. Overnight, the market for NeXT peripheral hardware collapsed and nobody wanted to buy an expensive soundcard for an obsolete hardware platform. The soundcard was never mass-produced: there are only the prototype boards made at IRCAM and a few batches of the commercial version. The total number ISPW soundcards must have been around a few dozen worldwide, in 1992. Now, over 30 years later, it is virtually impossible to find an ISPW card. Running an early version of MAX on original hardware is nearly impossible.

\ Fig: MAX 0.25 running in the Previous emulator on a modern MacOS system.

To make experiencing an early version of MAX more accessible I have prepared a disk image for the Previous emulator. The emulator emulates several NeXT machines and lets you run NeXT software on modern machines. The disc image uses NeXTStep 3.3 for OS and has MAX 0.25 pre-installed. Several NeXT machines can be emulated by Previous but, crucially for MAX, the ISPW soundcard is not emulated. This puts a severe limitation on MAX: there is no audio or MIDI input or output. However, you can start MAX in simulation mode and experience the patcher and see the original documentation and have a feel for the original supplied patches and follow the logic of patches.

To run MAX, download the disc image with NeXTStep and MAX [about 100MB]. By downloading this material you agree to use it only for academic, educational, historic or documentary use and not for commercial or other purposes. The zip-file also contains some information on how to get started and boot the system. Note that it has only been tested on an M1 mac.

For some more context please listen to Philippe Manoury on ‘Informatique musicale’ (1992) for insights into the ISPW at IRCAM. Read “Steve Jobs & the Next Big Thing (1993)” by Randall E. Stross which paints the commercially bleak history of NeXT and the (managereal, commercial, emotional) incompetence of Steve Jobs. It complements the “Steve Jobs (2011)” biography by Walter Isaacson nicely. Also see Electronic Music and the NeXTcube - Running MAX on the IRCAM Musical Workstation and USB MIDI interface for the NeXTCube - ISPW board

Starting MAX/FTS in Previous
Running MAX/FTS in Previous

UGent

nextstep_max_emulation.zip and max_running.png

~ USB MIDI interface for the NeXTCube - ISPW board

» By Joren on Friday 05 May 2023

I have recently restored a NeXTCube with an ISPW Ariel soundcard with the aim to put it in the hands of artists and researchers in the context of a living electronic music instrument heritage project. To make the cube talk to keyboards, synths or other audio workstations I have built a MIDI interface for the NeXTcube.

Fig: the NeXTCube with the Ariel ProPort and MIDI input/output interface.

Recently, I was able to restore a NeXTCube and install an early version of MAX - a graphical music programming environment. However, a crucial part of the system was missing: there was no way to do MIDI input/output. MIDI is used to connect controllers, keyboards, synthesizers or other musical instruments to the audio workstation. The NeXTCube itself has a serial port which allows users to connect MIDI devices. Next to the serial port on the mainboard, the NeXTCube I am working with also has a RS-422 serial port on the ISPW ‘soundcard’. The serial port uses RS-422 and mini DIN 8 connectors which provide MIDI input and output. While the MIDI data bytes are transmitted according to spec, the connector and the electrical signals are not compatible with standard MIDI.

Fig: the IRCAM/Ariel ISPW soundcard with mini DIN-8 RS-433 serial port on the right.

For MIDI I/O we need a device which allows to connect the RS-422 MIDI to both legacy MIDI devices and to computers via USB MIDI. If a MIDI event arrives from the NeXTCube’s RS-422 it needs to be passed through to the USB and legacy MIDI ports and the other way around. The Teensy platform is ideal: it supports hardware serial and USB MIDI. In this retro-computing project, it seems wasteful to use the 600MHz Teensy 4.0 only for message passing: the Teensy has much more computing power than NeXTcube but it is cheap, easy to program, available and practical.

The RS-422 serial port uses –6V to 6V logic which needs to be transformed to the 0V to 3.3V logic for the Teensy microcontroller. A PCB provides this capability and is connected to a hardware serial port of the Teensy. The pinout of the RS-422 port was measured via a scope and matched the documentation. The Teensy has an usbMIDI mode and can present itself as a standard MIDI device to a PC. Two opto-isolated legacy MIDI DIN-5 ports were connected to another hardware serial port. The software on the Teensy conducts the “three-way MIDI message passing”:[midi_passthrough.ino].

Vid: Max/FTS FM synth reacting to USB MIDI input.

The electronics were fixed into a reused metal enclosure. The front panel of the enclosure was replaced by a custom 3D printed panel. The front contains the RS-422 port, two MIDI DIN 5 ports and a micro usb port either for power alone or MIDI messages and power. Feel free to check out the “OpenSCAD design with a level MINI DIN8 hole”:[midi_box.scad].

With a working MIDI interface for the NeXTcube allows interfacing with MIDI keyboards and controllers. It can also be used to measure roundtrip latency. MIDI to sound latency determines how long it takes between pressing a MIDI key and hearing sound. MIDI to MIDI roundtrip latency determines how long it takes to process, parse and return a MIDI message. For a responsive, reliable system both types of latencies should be constant and preferably in the range of 10ms or below.

Fig: Measured MIDI roundtrip latency on the ISPW board for the NeXTCube.

Measuring the MIDI roundtrip latency shows that the system is able to respond in 3.6+–0.4 ms (N=300). A combination of a MAX patch and “Teensy firmware”:[next_midi_roundtrip_latency.ino] was used to measure this automatically. The MIDI-to-audio latency was measured a few times manually and always was around 13ms. These figures show that the system is ideal for low-latency real-time music making in its default configuration. In MAX the audio buffer sizes could be reduced to achieve an even lower latency but with the risk of running into buffer underruns and audio glitches.

Small discussion the USB MIDI interface for the NeXTCube on Hackaday

Back of the Ariel ProPort interface
Back of the NeXTcube with ISPW soundcard
The original RS-422 MIDI message
The TTL MIDI message
Measuring MIDI-to-Audio latency
Front of the Ariel ProPort interface
NeXTcube with MIDI I/O box
OpenSCAD model of the front
It looks better on the outside..
Metal enclosure with 3D-printed front for IO
Metal enclosure with 3D-printed front for IO

Muziek, Harde waren, UGent, and Code Attachments

midi_box_example_web.webp, nextcube_midi_proport.webp, midi_roundtrip_test.pat, next_midi_roundtrip_latency.ino, midi_passthrough.ino, midi_roundtrip_latency.svg, midi_box.scad, ariel_soundcard_IRCAM_ISPW.jpg, and midi_box_example_web.mp4

~ Electronic Music and the NeXTcube - Running MAX on the IRCAM Musical Workstation

» By Joren on Tuesday 02 May 2023

The NeXTcube context and the IRCAM Musical Workstation

In 1990 NeXT started selling the NeXTcube, a high-end workstation. It introduced or brought together many concepts (objective-c, the Mach kernel, postscript, an app store) which are still in use today. The NeXTcube’s influence is especially felt in the Apple ecosystem with Mac OS X, iPhones and iPads being direct decedents of NeXT’s line of computers.

Due to its high price, the NeXTcube was not a commercial success. It mainly ended up at companies or in the hands of researchers. Two of those researchers, Tim Berners-Lee and Robert Cailliau created the first http server and web browser at CERN on a NeXTcube. Coincidently, the http software was publicly released exactly 30 years ago today. Famously, the cube was also used to develop games like the original Doom and Quake. So yes, the NeXTcube runs Doom.

Fig: the NeXTcube's design stood out compared to the contemporary beige box PCs.

Less well known is the fact that the NeXTcube is also one of the first computing devices capable enough for real-time, high-quality interactive music applications. In the mid 1980s this was still a dream at IRCAM, a French research institute with the aim to ‘contribute to the renewal of musical expression through science and technology’. The bespoke hardware and software systems for music applications from the mid 80s were further developed and commercialised in the early 90s. Together these developments resulted in a commercially available version of the “IRCAM Musical Workstation (IMW)”, an early, if not the first, off-the-shelf computer for interactive music applications.

The IRCAM Musical Workstation (IMW), sometimes called the IRCAM Signal Processing Workstation (ISPW), consisted of several hard and software modules working together to enable interactive music applications. An important component was a ‘soundcard’ which had two beefy 40MHz i860 intel CPUs for DSP. When installed in the NeXTcube, the soundcard had more computing power than the rest of the computer. This is similar to modern computers where some graphics cards have more raw computing power than the main CPU. The soundcard was developed at IRCAM and commercialized by Ariel inc. under the name “Ariel ProPort”.

The IRCAM Ariel DSP coprocessor, soundcard.

A few software environments were developed at IRCAM which made use of the new hardware. One was Animal, another, was the much more influential MAX. MAX provides a graphical programming environment specific for music applications. Descendants of MAX are still used today, see Ableton Max for Live and Pure Data. I consider the introduction of MAX as a pivotal point in electronic music history. Up until the introduction of MAX, creating a new electronic music instrument meant bespoke hardware development. With MAX, this is done purely in software. This made electronic sound or instrument design not only faster but also accessible to a much wider audience of composers, artists and thinkerers.

The NeXTcube at IPEM

IPEM was an early electronic music production studio embedded at Ghent University, Belgium. Now it is active as a internationally acclaimed research center for interdisciplinary music research. In the early 90s IPEM acquired a NeXTcube Turbo with an internal diskette drive, SCSI hard disk, NextDimension color graphics card and an Ariel ProPort DSP/ISPW module. The cube was preserved well and came with many of the original software, books and manuals. I have been trying to get this machine working and configure it as an “IRCAM Musical Workstation”.

IPEM's NeXTcube with IRCAM Ariel ProPort.

There were a few practical issues: the mouse was broken, the hard drive unreliable and the main system fan loud and full of dust. The mouse had a broken cable which was fixed, the hard drive was replaced by a SCSI2SD setup and the fan was replaced with a new one. On the software side of things, the Internet Archive hosts NeXTStep 3.3 which, after many attempts, was installed on the cube. Unfortunately there seemed to be a compatibility issue. The Ariel ProPort kernel module did not work. I started over installed NeXTStep 3.1, with the same result. Finally, I installed NeXTStep 3.0 which was compatible with the kernel module and MAX/FTS!

Vid: Max/FTS with a commercial Ariel soundcard running on a NeXTcube Turbo.

The restoration of the IRCAM Signal Processing Workstation instruments fits in a university project on living heritage The idea is to get key historic electronic music instruments into the hands of researchers and artists to pull the fading knowledge on these devices back into a living culture of interaction. This idea already resulted in an album: DEEWEE Sessions vol. 01. Currently the collection includes a 1960s reverb plate, an EMS Synti 100 analog synthesizer from the 70s, a Yamaha DX7 (80s) and finally the NeXTCube/ISPW represents the early 90s and the departure of physical instruments to immaterial software based systems.

Acknowledgements & Further reading

This project was made possible with the support of the Belgian Music Instrument Museum and IPEM, Ghent University. I was fortunate to get assistance by Ivan Schepers and Marc Leman at IPEM but also by the main developers of MAX: Miller Puckette. I would also like to thank Anthony Agnello formerly at Ariel Corp for additional image material and info. I also found the WinWorld and NeXTComputers communities and resources extremely helpful. Below a picture from the CERN public archives and Ghent University Archive is included. Thanks a lot!

See also the discussion on this article at Hacker News\ Lindemann, E., Dechelle, F., Smith, B., & Starkier, M. (1991). The Architecture of the IRCAM Musical Workstation - Computer Music Journal, 15(3), 41–49.
\ Puckette, M. (1991). FTS: A Real-Time Monitor for Multiprocessor Music Synthesis. Computer Music Journal, 15(3), 58–67.
\ Puckette, M. 1988. The Patcher, Proceedings, ICMC. San Francisco: International Computer Music Association, pp. 420-429.
\ Puckette, M. 1991. Combining Event and Signal Processing in the MAX Graphical Programming Environment. Computer Music Journal 15(3): 68-77.\

ISPW IRCAM Ariel soundcard
An original MAX manual
ISPW PCB provided by Anthony Agnello, Ariel Corp
Newsletter excerpt provided by Anthony Agnello, Ariel Corp
The Ariel ISPW software
The manual for the soundcard / DSP coprocessor
MAX/FTS screenshot
The first http server at CERN, Photograph by CERN
Marc Leman and IPEM's NeXTcube. From the UGhent Achives
MAX/FTS running
The original NextSTEP 3.0 software
Ariel soundcard inputs
Ariel installation procedure

Harde waren, Music Information Retrieval, Muziek, and UGent

~ An Arduino Trigger Box

» By Joren on Friday 31 March 2023

Vid: the trigger box set in recording mode via a button or a MIDI key press.

A while back I have build a trigger box. Such device can be used for various synchronisation tasks. It can be used to synchronise camera’s, capture devices and sensors. All compatible devices have a 5V TTL input, often a BNC connector. For a camera, TTL input could control the shutter time. For a sensor a TTL clock could determine the sample time or simply be registered along side an other data stream. The trigger box allows to either pass-through (or block) an incoming TTL clock. It also outputs a recording level.

There are two ways to use the trigger box. The first is by operating a manual switch to start (and later stop) a recording. When recording, the recording level output is set to 5V and the clock at the CLOCK IN is passed through to the CLOCK OUT port. The second way to set the recording state is by MIDI over USB. While a MIDI key is pressed, the recording state is high, when the key is released the state is low. The MIDI key input makes it compatible and controllable from any DAW. Both ways are shown in the video.

For practical reasons there are two microcontrollers in the device, a Teensy 3.2 and an Arduino. The Arduino is there for its 5V capabilities and is essentially a rather beefy level-shifter. The Teensy is there for the USB MIDI compatibility and controls everything.

For aesthetic reasons the trigger box has been build into a 1950s ‘Sieger portable explosive gas detector’. I did not feel too bad about gutting the original electronics since a battery leak had destroyed most of it. Also, the late WII era knobs are still unmatched for durability and tactile satisfaction.

The code running on the microcontrollers and some documentation can be found on the Trigger Box github repository.

Knobs. Dials.
WWII era hardware with a micro-usb port

Harde waren and UGent

~ Gabber - Visualizing constant-Q transform in the browser

» By Joren on Wednesday 01 March 2023

Today I have released Gabber a bit of code to transform audio from the time domain to a constant-Q frequency domain in the browser. To be more precise it does a constant-Q non-stationary Gabor transform. The heavy-lifting is done by ‘the gaborator’, a C library by Andreas Gustafsson.

I have compiled the Gaborator library to WebAssembly, added some glue code to bridge the Javascript and WASM worlds, implemented a Web Audio AudioWorkletProcessor to transform audio in the background and and finally visualized the results via WebGL2.

There is a Gabber live demo below. If you press start and grant microphone access, incoming audio is transformed and plotted onto a canvas. Thanks to WebAssembly and WebGL2 this should run relatively smoothly even on less powerful devices. Please do play around with the perspective slider.

While Gabber is currently a proof of concept, with some attention the library could be used as a front end for browser based music information retrieval applications. My main goal with Gabber is to use it in educational settings to explain the properties of sound, and more concretely pitch, via spectrograms and interactive demos. Also I plan to use it in a browser based tool to extract pitch patterns from music.

The name Gabber, refers to both Dennis Gabor and an infamous, mainly Dutch style of music which was popular when I was young. More information can be found in the GitHub repository: Gabber-High resolution spectral transforms for the web.

A constant Q transform in the browser

UGent

~ Pitch content on historic field recordings

» By Joren on Monday 20 February 2023

Wednesday the 15th of February I presented a collaboration with Olmo Cornelis titled Pitch content on historic field recordings: Analyzing 400 wax cylinder recordings at ULB, Brussels. The session was organized by the Belgian branch of ICTM (International Council for Traditional Music)\

The presentation is linked below and is interactive: it contains music analysis software being discussed within the presentation itself.

Thanks to ICTM Belgium for the invitation and the platform! The Ghent University BOF funded project PaPiOM allows me work on these topics.

UGent

~ DiscStitch at Deezer HQ

» By Joren on Wednesday 08 February 2023

I have presented DiscStitch at the MIR (Music Information Retrieval) get together at the Deezer headquarters in Paris.

DiscStitch is a solution to identify, align and mix digitized audio originating from (overlapping) laquer discs. The main contribution lays in the novel audio to audio alignment algorithm which is robust against some speed differences and variabilities.

Below a presentation can be found introducing DiscStitch. You can also try out the browser based DiscStitch audio-to-audio alignment page.

Rooftop View
Rooftop View

Music Information Retrieval and UGent

~ Updates for Olaf - The Overly Lightweight Acoustic Fingerprinting system

» By Joren on Tuesday 31 January 2023

Fig: Olaf fingerprinter.

I have updated Olaf - the Overly Lightweight Acoustic Fingerprinting system. Olaf is a piece of technology that uses digital signal processing to identify audio files by analyzing unique, robust, and compact audio characteristics - or “fingerprints”. The fingerprints are stored in a database for efficient comparison and matching. The database index allows for fast and accurate audio recognition, even in the presence of distortions, noise, and other variations.

Olaf is unique because it works on traditional computing devices, embedded microprocessors and in the browser. To this end tried to use ANSI C. C is a relatively small programming language but has very little safeguards and is full of exiting footguns. I enjoy the limitations of C: limitations foster creativity. I also made ample use of the many footguns C has to offer: buffer overflows, memory leaks, … However, with the current update I think most serious bugs have been found. Some of the changes to Olaf include:

Fixed a rather nasty array out of bounds bug. The bug remained elusive due to the fact that a segfault was rare on macOS. Linux seems to be more diligent in that regard.
Added a quick way to skip already indexed files. Which improves usability significantly when working with larger datasets.
Improved command line output and fixed incorrectly reported times. The reported start and stop time of a query was wrong and is now fixed.
Olaf now supports caching fingerprints in simple text files. This makes fingerprint extraction much faster since all cores of the system can be used to extract fingerprints and dump them to text files. Writing prints to the database from multiple threads is slow since they need to wait for access to the locked database. There is also a command to store all cashed fingerprints in a single go.
Added support for basic profiling with gprof. The profiler shows where optimizations can have the most impact.
Olaf now includes an algorithm for efficient max-filtering. The min-max filter algorithm by Daniel Lemire is implemented. The profiler showed that most time was spend during max-filtering: replacing the naive max-filter with the Lemire max-filter improved performance drastically.
CI with Github Actions which checks if checked in sources compile and tests some of the basic functionality automatically.
Updated the Zig build script for cross-compilation and updated the pre-build Windows version.
Tested the system with larger databases. The FMA-full datasets, which comprises almost a full year of audio was indexed and queried without problems on a single pc. The limits of Olaf with respect to indexed size is probably a few times larger.
Tested, fixed and improved the ‘memory database’ version. Also added documentation to the readme.
Made a basic web example to call the WASM version of Olaf.
Added an ESP32 example, showing how Olaf can run on this microprocessor. It runs without an external microphone but uses a test audio file. Previously some small changes were needed to Olaf to run on the ESP32, now the exact same code is used.

Anyhow, what originally started as a rather quick and dirty hack has been improved quite a bit. The takeaway message: in the world of software it does seem possible to polish a turd.

UGent and Music Information Retrieval

~ mot - MIDI and OSC Tools - Sending UDP messages from the browser

» By Joren on Thursday 26 January 2023

As a way to get to know the Rust programming language I have developed a couple of practical tools for OSC and MIDI debugging. OSC and MIDI are protocols which are almost always used for applications dealing with music. In these applications latency should be kept in check. Languages with garbage collection (Java, Go) and scripting languages (Ruby, Python, …) are hard to tune for low-latency applications and do not really have real-time guarantees. Rust, as a modern alternative for C/C, is a better fit for cross platform CLI low-latency applications.

The MIDI and OSC Tools (mot) are bundled in a single CLI application. The application includes:

midi_echo prints MIDI messages coming from a connected MIDI device.
osc_echo prints OSC messages arriving at a certain UDP port.
midi_to_osc a MIDI to OSC bridge which sends MIDI messages coming from a connected MIDI device to an OSC target.
osc_to_mid an OSC to MIDI bridge which receives OSC messages and sends them to a connected MIDI device.
midi_roundtrip_latency measure MIDI round-trip latency.

This opens a couple of possibilities which are discussed below.

Sending UDP messages from the browser

One of the ways to send OSC messages from a browser to a local network is by using the MIDI out capability of browsers and - using mot - translating MIDI to OSC an example can be seen below.

Fig: sending an UDP message to a network from a Browser using a the mot MIDI to OSC bridge, click the image for a better readable version.

Measuring UDP message latency

Both MIDI and OSC can be seen as rather general data encapsulation protocols with wide support in terms of libraries and cross platform support. Their value goes beyond mere musical applications. The same holds for mot. In this example we are using mot to measure UDP message latency between two hosts.

On the first host we send MIDI messages from MIDI device 0 over OSC to another host with e.g. mot midi_to_osc 192.168.1.12:3000 /m 0. At the other host we receive the OSC messages and send them to a virtual device: mot osc_to_midi 192.168.1.12:3000 /m 6666.

At the second host we return messages from the virtual device to the first host: mot midi_to_osc 192.168.1.4:5000 /m 1. Perhaps you first need to do mot midi_to_osc -l to find the index of the virtual device. As a final step the messages can be received at the first host and returned to the original midi device. On the first host: mot osc_to_midi 192.168.1.4:5000 /m 0.

If the original MIDI device is a Teensy running the “roundtrip patch” then finally the roundtrip time is accurately measured and shown in the serial console. I am sure the previous text is cromulent, totally not contrived and not confusing. Anyway, to make it more confusing: this is what happens when you use a single host to do midi to osc to midi to osc to midi and use the loopback networking device:

Fig: MIDI to OSC to MIDI to OSC to MIDI roundtrip latency.

Visualizing sensor data in the browser

Fig: Sensor data as MIDI.

When capturing sensor data on microcontrollers, data can be encoded into MIDI. This makes almost any sensor practically useful in Ableton Live or similar environments. It also makes it compatible with all other MIDI supporting devices. With mot it becomes trivial to send MIDI encoded sensor data over OSC e.g. to a central place to log that data.

Another use case is to visualize the incoming data in real-time. A single web page which reads and visualizes incoming MIDI-sensor data becomes much more useful if streams from other devices can be visualized as well with the mot midi_to_osc and mot osc_to_midi commands.

Cross-platform support

With the Rust compiler it is relatively easy to cross-compile for different targets. There is however an important limitation in mot. Windows has no support for virtual MIDI ports which limits the usefulness of mot on that platform.

Check the mot - MIDI and OSC Tools GitHub repository for the software. Perhaps also of interest for MIDI debugging are VMPK, MIDI Monitor and the web MIDI tools.

UGent and Code

~ Attempting humor in academic writing

» By Joren on Tuesday 24 January 2023

Screenshot of a browser based pitch organization extraction tool

Fig: Advances in Speech and Music Technology book cover.

I have recently published an chapter in an academic book published by Springer. The topic of the book is of interest to me but can be perceived as rather dry: Advances in Speech and Music Technology.

The chapter I co-authored presented two case studies on detecting duplicates in music archives. The fist case study deals with segmentation reuse in an archive of early electronic music. The second with meta-data reuse in an archive of a public broadcaster containing digitized commercial shellac disc recordings with many duplicates.

Duplicate detection being the main topic, I decided to title the article Duplicate Detection for for Digital Audio Archive Management. It is easy to miss, and not much is lost if you do, but there is a duplicate‘for’ in the title. If you did detect the duplicate you have detected the duplicate in the duplicate detection article. Since I have fathered two kids I see it as an hard earned right to make dad-jokes like that. Even in academic writing.

It was surprisingly difficult to get the title published as-is. At every step of the academic publishing process (review, editorial, typesetting, lay-outing) I was asked about it and had to send an email like the one below. Every email and every explanation made my second-guess my sense of humor but I do stand by it.

From: Joren
\ To: Editors ASMT
\
\ Dear Editors,
\
\ I have updated my submission on easychair in...\
\ I would like to keep the title however as is an attempt at word-play. These things tend to have less impact when explained but the article is about duplicate detection and is titled 'Duplicate detection for for digital audio archive management'. The reviewer, attentively, detected the duplicate 'for' but unfortunately failed to see my attempt at humor. To me, it is a rather harmless witticism. Regards Joren </pre>

Anyway, I do think that humor can serve as a gateway to direct attention to rather dry, academic material. Also the message and the form of the message should not be confused. John Oliver, for example, made his whole career on delivering serious sometimes dry messages with heaps of humor: which does not make the topics less serious. I think there are a couple of things to be learned there. Anyway, now that I have your attention, please do read the author version of Duplicate Detection for for Digital Audio Archive Management: Two Case Studies.

Poging tot humor, Research papers, and UGent

~ Updates for TarsosDSP

» By Joren on Friday 20 January 2023

TarsosDSP is a Java library for audio processing I have started working on more than 10 years ago. The aim of TarsosDSP is to provide an easy-to-use interface to practical music processing algorithms. Obviously, I have been using it myself over the years as my go-to library for audio-processing in Java. However, a number of gradual changes in the java ecosystem made TarsosDSP more and more difficult to use.

Since I have apparently not been the only one using it, there was a need to give it some attention. During the last couple of weeks I have found the time to give it this much needed attention. This resulted in a number of updates, some of the changes include:

Change of the build system from Apache Ant to Gradle
Make use of Java Modules to make TarsosDSP compatible with the ModulePath introduced in Java 9.
Packaged the software into a maven compatible format, which makes it easy to use as a dependency.
CI with GitHub actions to automatically build and test the software.
Updated some examples shipped with the TarsosDSP. I have still still some examples to verify.
Improved handling of errors on reading audio via ffmpeg

\ Fig: The updated TarsosDSP release contains many CLI and GUI example applications.

Notably the code of TarsosDSP has not changed much apart from some cosmetic changes. This backwards compatibility is one of the strong points of Java. With this update I am quite confident that TarsosDSP will also be usable during the next decade as well.

Please check out the updated TarsosDSP repository on GitHub.

Pitch estimator
Flanger
Oscilloscope

Music Information Retrieval, UGent, TarsosDSP, and Code

~ The automatic HTTPS capabilities of Caddy

» By Joren on Wednesday 18 January 2023

This blog has been running on Caddy for the last couple of months. Caddy is a http server with support for reverse proxies and automatic https. The automatic https feature takes care of requesting, installing and updating SSL certificates which means that you need much less configuration settings or maintenance compared with e.g. lighttpd or Nginx. The underlying certmagic ACME client is responsible for requesting these certificates.

Before, it was using lighttpd but the during the last decade the development of lighttpd has stalled. lighttpd version 2 has been in development for 7 years and the bump from 1.4 to 1.5 has been taking even longer. lighttpd started showing its age with limited or no support for modern features like websockets, http/3 and finicky configuration for e.g. https with virtual domains.

Caddy with Ruby on Rails

I really like Caddy’s sensible defaults and the limited lines of configuration needed to get things working. Below you can find e.g. a reusable https enabled configuration for a Ruby on Rails application. This configuration does file caching, compression, http to https redirection and load balancing for two local application servers. It also serves static files directly and only passes non-file requests to the application servers.

(cachestaticfiles) {
    @staticFiles {
        file
    }
    header @staticFiles Cache-Control "public, max-age=604800, must-revalidate"
}

(railsdefaults) {
    #compress responses
    encode zstd gzip

    #redirect from http to https
    @http {
        protocol http
    }

    redir @http https://{host}{uri}

    @notStatic {
        not file
    }

    import cachestaticfiles
    reverse_proxy @notStatic localhost:{args.0} localhost:{args.1}
    file_server
}

example.com {
    root * /var/www/example.com/current/public
    import railsdefaults 10000 10001
}

If you are self-hosting I think Caddy is a great match in all but the most exotic or demanding setups. I definitely am kicking myself for not checking out caddy sooner: it could have saved me countless hours installing and maintaining https certs or configuring lighttpd in general.

UGent and 0110.be

~ Crossplatform JNI builds with Zig

» By Joren on Friday 13 January 2023

JNI is a way to use C or C code from Java and allows developers to reuse and integrate C/C in Java software. In contrast to the Java code, C/C code is platform dependent and needs to be compiled for each platform/architecture. Also it is generally not a good idea to make users compile a C/C library: it is best provide precompiled libraries. As a developer it is, however, a pain to provide binaries for each platform.

With the dominance of x86 processors receding the problem of having to compile software for many platforms is becoming more pressing. It is not unthinkable to want to support, for example, intel and M1 macOS, ARM and x86_64 Linux and Windows. To support these platforms you would either need access to such a machine with a compiler or configure a cross-compiler for each system: both are unpractical. Typically setting up a cross-compiler can be time consuming and finicky and virtual machines can be tough to setup. There is however an alternative.

Zig is a programming language but, thanks to its support for C/C, it also ships with an easy-to-use cross-compiler which is of interest here even if you have no intention to write a single line of Zig code. The built-in cross-compiler allows to target many platforms easily.

#### The Zig cross-compiler in practice

Cross compilation of C code is possible by simply replacing the gcc command with zig cc and adding a target argument, e.g. for targeting a Windows. There is more general information on zig as a cross-compiler here.

Cross-compiling a JNI library is not different to compiling other libraries. To make things concrete we will cross-compile a library from a typical JNI project: JGaborator packs the C/C library gaborator. In this case the C/C code does a computationally intensive spectral transformation of time domain data. The commands below create an x86_64 Windows DLL from a macOS with zig installed:


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

  {style="overflow-x:scroll"}
<code>
bash
#wget https://aka.ms/download-jdk/microsoft-jdk-17.0.5-windows-x64.zip
#unzip microsoft-jdk-17.0.5-windows-x64.zip
#export JAVA_HOME=`pwd`/jdk-17.0.5+8/
git clone --depth 1 https://github.com/JorenSix/JGaborator
cd JGaborator/gaborator
echo $JAVA_HOME
JNI_INCLUDES=-I"$JAVA_HOME/include"\ -I"$JAVA_HOME/include/win32" 
zig cc  -target x86_64-windows-gnu -c -O3 -ffast-math -fPIC pffft/pffft.c -o pffft/pffft.o
zig cc  -target x86_64-windows-gnu -c -O3 -ffast-math -fPIC -DFFTPACK_DOUBLE_PRECISION pffft/fftpack.c -o pffft/fftpack.o
zig c++ -target x86_64-windows-gnu -I"pffft" -I"gaborator-1.7"  $JNI_INCLUDES -O3\
        -ffast-math -DGABORATOR_USE_PFFFT  -o jgaborator.dll jgaborator.cc pffft/pffft.o pffft/fftpack.o
file jgaborator.dll
# jgaborator.dll: PE32+ executable (console) x86-64, for MS Windows
</code>

Note that, when cross-compiling from macOS, to target Windows a Windows JDK is needed. The windows JDK has other header files like jni.h. Some commands to download and use the JDK are commented out in the example above. Also note that targeting Linux from macOS seems to work with the standard macOS JDK. This is probably due to shared conventions regarding compilation of libraries.

To target other platforms, e.g. ARM Linux, there are only two things that need to be changed: the -target switch should be changed to aarch64-linux-gnu and the name of the output library should be (by Linux convention) changed to libjgaborator.so. During the build step of JGaborator a list of target platforms it iterated and a total of 9 builds are packaged into a single Jar file. There is also a bit of supporting code to load the correct version of the library.

Using a GitHub action or similar CI tools this cross compilation with zig can be automated to run on a software release. For Github the Setup Zig action is practical.

Loading the correct library

In a first attempt I tried to detect the operating system and architecture of the environment to then load the library but eventually decided against this approach. Mainly because you then need to keep an exhaustive list of supporting platforms and this is difficult, error prone and decidedly not future-proof.

In my second attempt I simply try to load each precompiled library limited to the sensible ones - only dll’s on windows - until a matching one is loaded. The rationale here is that the system itself knows best which library works and failing to load a library is computationally cheap. There is some code to iterate all precompiled libraries in a JAR-file so supporting an additional platform amounts to adding a precompiled library in the JAR folder: there is no need to be explicit in the Java code about architectures or OSes.

Trying multiple libraries has an additional advantage: this allows to ship multiple versions targeting the same architecture: e.g. one with additional acceleration libraries enabled and one without. By sorting the libraries alphabetically the first, then, should be the one with acceleration and the fallback without. In the case of JGaborator for mac aarch64 there is one compiled with -framework Accelerate and one compiled by the Zig cross-compiler without.

Takehome messages

If you find yourself cross-compiling C or C for many platforms, consider the Zig cross-compiler. Even when you have no intention to write a single line of Zig code.
For JNI and Java the JGaborator source code might offer some inspiration to pre-compile and load libraries for many platforms with little effort.
CI tools can help to verify builds and automate Zig cross-compilation.
If you build for Windows make sure to include windows header-files even when there are no compilation errors using UNIX-header files.

If you find this valuable consider sponsoring the work on Zig

Code, Music Information Retrieval, and UGent

~ Emotopa - Patterns in Pitch Organization

» By Joren on Saturday 10 December 2022

\ Fig: Screenshot of *Emotopa*: a browser based tool to extract pitch organization from audio.

A couple of days ago I participated in the Music Hack Day - India. The event was organized the 10th and 11th of December in Bangaluru, India. During the event a representative of Smule suggested a task to evaluate the performance of karaoke-singers in terms of intonation. The idea was to employ pitch histogram like features to estimate pitch use of singers.

I offered to build a browser based application to extract pitch histograms from audio. At the end of the hack day I presented the first release of Emotopa with some limited functionality:

The application is able to decode and use audio from any format or container by using an audio focused webassambly build of ffmpeg.
Next, a pitch detector runs on the audio and returns a list of pitch estimates.
Finally a histogram (technically a kernel density estimate) is constructed using the pitch estimates.

The user can export the pitch histogram, the pitch class histogram and the pitch annotations. These features successfully show the intonation quality of singers but the applications are much broader. Some potential applications have been described in (amongst others) the Tarsos article.

The Emotopa name alludes to the Apotome browser based app where, starting from a scale you can make music. With Emotopa you do the reverse. Also very much of interest are Leimma and the rationale behind both Apotome and Leimma

The source can be verified on the Emotopa GitHub repository

This contribution was made possible thanks to travel funds by the FWO travel grant K1D2222N and the Ghent University BOF funded project PaPiOM.

UGent

« Newer blog posts Previous blog posts »