Hi, I'm Joren. Welcome to my website. I'm a research software engineer in the field of Music Informatics and Digital Humanities. Here you can find a record of my research and projects I have been working on. Learn more »
This morning I gave a guest lecture introducing the field of music information retrieval (MIR) to musicology students at Ghent University. Besides the general MIR introduction, two specific topics are fleshed out: duplicate detection and pitch patterns in music around the world. Two topics I have worked on before.
The presentation takes the form of an interactive website built with reveal.js. It features a couple of slides which are full-blown applications or have an interactive sound visualization component. Please do try out the slides and check the Music Information Retrieval - Opportunities for digital musicology presentation or try it below.
If you’re considering adding USB MIDI functionality to a music project with an ESP32, it’s crucial to choose the right variant of the chip. The ESP32-S3 is currently the go-to model for USB-related tasks thanks to its native USB capabilities. Unlike other ESP32 models, the S3 can handle USB MIDI directly without the need for additional components, making it an ideal choice for integrating MIDI devices into your setup. For more details on using USB MIDI with the ESP32-S3, check out the ESP32USBMIDI project.
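To give an idea of how little code native USB MIDI takes on the S3, here is a minimal sketch. It assumes the USBMIDI class that ships with recent arduino-esp32 cores; the exact API may differ between core versions, so treat it as a starting point rather than a definitive implementation.

// Minimal USB MIDI device on an ESP32-S3 using the native USB stack.
// Assumes the USBMIDI class from the arduino-esp32 USB library.
#include "USB.h"
#include "USBMIDI.h"

USBMIDI MIDI;

void setup() {
  MIDI.begin(); // enumerate as a USB MIDI device
  USB.begin();  // start the native USB stack
}

void loop() {
  MIDI.noteOn(60, 127, 1);  // middle C, full velocity, channel 1
  delay(250);
  MIDI.noteOff(60, 0, 1);
  delay(250);
}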
When combined with the ESP32-S3’s built-in WiFi and support for OSC (Open Sound Control) or ESP-NOW, the platform becomes very versatile for music controllers or applications. A quick tip: after flashing your device in MIDI mode, the serial connection is no longer available, and ordinary re-flashing becomes impossible as well. If you need to reflash your device, the fix is simple: just hold down the Boot button and press Reset.
Another short tip: for troubleshooting and logging, the mot project provides useful tools for debugging OSC or MIDI messages. The support is currently still in flux, but do not make the mistake I made: do not try to do USB MIDI with the ESP32-C3 series, which lacks the native USB support needed.
The last few Halloweens I have been building one-off interactive installations for visiting trick-or-treaters. I did not document last year’s build, but the year before I built an interactive doorbell with a jump-scare door projection. This year I was trying to take it easy, but my son came up with the idea of doing something with a talking pumpkin. I mumbled something about feasibility, so he promptly invited all his friends to come over on Halloween to talk to a pumpkin. So I got to work and tried to build something. This blog post documents that build.
A talking pumpkin needs a few functions. It needs to understand kids talking in Dutch, respond with a somewhat logical reply, and ideally remember previous interactions. It also needs a way to do turn-taking: indicating who is speaking and who is listening. Finally, it needs a face and a name. For the name we quickly settled on Pompernikkel.
For the face I tried a few interactive visualisations: a 3D implementation with three.js and a shader-based approach, but eventually settled on using an SVG with CSS animations to make the face come alive. This approach makes it doable to control animations with JavaScript, since animating a part of the pumpkin means adding or removing a CSS class. See below for the result.
For the other functions I used the following components:
A decent quality bluetooth speaker for audio output and clear voice projection
A microphone setup to capture and record children’s voices speaking to the pumpkin
A glass door serving as a projection surface with a projector mounted behind it
Speech-to-text recognition powered by nvidia/parakeet-tdt-0.6b-v3 (see this paper), implemented via transcribe-rs for transcribing Dutch speech
Text-to-speech synthesis using the built-in macOS ‘say’ command to give Pompernikkel a voice
A controllable interactive projection system displaying the animated SVG website mentioned above
Response generation handled by the Gemma 3 12B large language model (paper), running locally through Ollama with a custom system prompt
A real pumpkin augmented with an ESP32 microcontroller and a capacitive touch sensor embedded inside to detect physical touch - the microphone would only activate while someone was touching the pumpkin (see the sketch after this list)
A custom Ruby websocket driver orchestrating turn-taking behavior and managing the interactive loop of questions and responses
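As a sketch of that touch gating: touchRead() is part of the arduino-esp32 core, but the pin, the threshold, and the comparison direction are assumptions you would calibrate for your own pumpkin by printing raw values first.

const int TOUCH_PIN = T1;           // a touch-capable GPIO, an assumption
const int TOUCH_THRESHOLD = 40000;  // calibrate for your setup

void setup() {
  Serial.begin(115200);
}

void loop() {
  // On the ESP32-S2/S3 the reading increases when touched;
  // on the original ESP32 it decreases, so flip the comparison there.
  bool touched = touchRead(TOUCH_PIN) > TOUCH_THRESHOLD;
  Serial.println(touched ? "LISTENING" : "idle");
  delay(100);
}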
As an extra feature, I implemented a jump scare where a sudden movement would trigger lightning and thunder:
EMI-Kit for responsive movement detection. The mDNS support really makes it easy to use together with mot.
A 5V SK6812 LED strip controlled by an ESP32 programmed to react to EMI-Kit events, creating lightning effects synchronized with audio and visual elements on the HTML page
Lessons learned
Asking a code-assisting LLM to add animations to a part of an SVG only works after manually adding identifiers to the paths of the SVG: eyes, mouth, nose, … Once added, CSS animations are generated with ease. Understanding which SVG path corresponds to which semantic object seems out of reach for now for state-of-the-art systems.
Gemma 3 is not a truly multilingual LLM. It generates responses in Dutch, but these seem very translated: it appears responses are generated in English and translated to Dutch in a final step. This becomes very clear when the LLM attempts jokes. Of course these nonsensical jokes do work on a certain level.
Gemma 3 has a personality that is difficult to get around if a system prompt contains trigger words like dark or scary. In my case its responses became philosophical, nihilistic and very dark. Which was unexpectedly great.
Parakeet speech-to-text in Dutch is faster than the several OpenAI Whisper based systems I managed to get running on my Mac. It also gave better results for short excerpts.
The SK6812 RGBWW strip is not well supported by the FastLED library. I managed to get it working with a hack found on GitHub, which is not ideal.
There are a few end-to-end systems for voice chat with local LLMs on GitHub, but they are not easy to get going on anything other than CUDA/Linux and almost never support languages other than English. The same holds for VAD (voice activity detection) and transcription.
Looking closely at several speech-to-text and text-to-speech systems, the default of CUDA/Linux is difficult to get around.
The focus of open source models and tools on the English language is problematic: while Dutch is still relatively well represented, many systems are limited to English only.
While some kids interacted with the pumpkin, the jump-scare lightning and thunder effect worked better in that environment.
WebSockets seem a decent way to do inter-process communication; even without web technologies involved they could be considered. I never thought of WebSockets in this way. See the Pompernikkel GitHub repository for example Ruby scripts.
Most trick-or-treaters were at least intrigued by it, my son’s friends were impressed, and I got to learn a couple of things (see above). Next year, however, I will try to take it easy.
For a couple of months now, FFmpeg has supported audio transcription via OpenAI’s Whisper model, implemented through whisper.cpp. This makes it possible to automatically transcribe interviews and podcasts or generate subtitles for videos. Most packaged versions of the command-line tool ffmpeg do not ship with this option enabled. Here we show how to enable it on macOS with the Homebrew package manager. On other platforms similar configuration will apply.
On macOS there is a community-maintained Homebrew tap which allows enabling or disabling the many ffmpeg options. If you already have a plain ffmpeg installed, you may need to uninstall the current version and install a version with the chosen options. See below on how to do this:
# check if you already have ffmpeg with whisper enabled
ffmpeg --help filter=whisper
# uninstall current ffmpeg, it will be replaced with a version with whisper
brew uninstall ffmpeg
# add a brew tap which provides options to install ffmpeg from source
brew tap homebrew-ffmpeg/ffmpeg
# this command enables the most common functionality on top of the defaults
brew install homebrew-ffmpeg/ffmpeg/ffmpeg \
--with-fdk-aac \
--with-jpeg-xl \
--with-libgsm \
--with-libplacebo \
--with-librist \
--with-librsvg \
--with-libsoxr \
--with-libssh \
--with-libvidstab \
--with-libxml2 \
--with-openal-soft \
--with-openapv \
--with-openh264 \
--with-openjpeg \
--with-openssl \
--with-rav1e \
--with-rtmpdump \
--with-rubberband \
--with-speex \
--with-srt \
--with-webp \
--with-whisper-cpp
Installation will take a while since many dependencies are required for the many options. Once the build is finished, the whisper filter should be available in FFmpeg, and the ffmpeg --help filter=whisper command from above should list its options.
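From there, a transcription run looks roughly like this. A hedged example: the option names follow the FFmpeg whisper filter documentation, and the ggml model file is an assumption - download one from the whisper.cpp project first.

# transcribe Dutch speech to an SRT subtitle file
# (assumes a whisper.cpp ggml model in the current directory)
ffmpeg -i interview.wav -af "whisper=model=ggml-large-v3.bin:language=nl:destination=subs.srt:format=srt" -f null -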
Last Friday, I had the pleasure of facilitating a hands-on workshop in Luxembourg as part of the MuTechLab workshop series, organized by Luc Nijs at the University of Luxembourg. Together with Bart Moens from XRHIL and IPEM, we presented a system to control musical parameters with body movement.
MuTechLab is a series of workshops for music teachers who wish to dive into the world of music technology. Funded by the Luxembourgish National Research Fund (FNR, PSP-Classic), the initiative brings together educators eager to explore how technology can enhance music education and creative practice.
What we built and presented
During the workshop, participants got hands-on experience with the EMI-Kit (Embodied Music Interface Kit) – an open-source, low-cost system that allows musicians to control Digital Audio Workstation (DAW) parameters through body movement.
The EMI-Kit consists of:
- A wearable sensor device (M5StickC Plus2) that captures body orientation and gestures
- A receiver unit (M5Stack STAMP S3A) that converts sensor data to MIDI messages
Unlike expensive commercial alternatives, EMI-Kit is fully open source, customizable, and designed specifically for creative music practice and for research on embodied music interaction.
The Experience
Teachers experimented with mapping natural body movements – pitch, yaw, roll, and tap gestures – to various musical parameters in their DAWs. The low-latency wireless system made it possible to move and control sound, opening up new possibilities for expressive musical performance and pedagogy.
Learn More
Interested in exploring embodied music interaction yourself? Check out the EMI-Kit:
The EMI-Kit project as-is is a demonstrator to inspire educators to embrace these tools and imagine new ways of teaching and creating music. As a platform, the EMI-Kit can - with some additional programming - be a good basis to control musical parameters using various sensors. Have fun checking out the EMI-Kit.
Fig: ESP32-S3 USB MIDI receivers.
Fig: Participant package, with sender and receiver pair.
I’ve just pushed some updates to mot — a command-line application for working with OSC and MIDI messages. My LLM tells me that these are exciting updates, but I am not entirely sure that this is the case. Let me know if this ticks your boxes - and then seek professional help.
1. Scriptable MIDI Processor via Lua
I have implemented a MIDI processor that lets you transform, filter, and generate MIDI messages using Lua scripts.
Why is this useful? MIDI processors act as middlemen between your input devices and output destinations. You can do the following with incoming MIDI messages:
Filter - Block unwanted messages or channels, or select specific ranges
Route - Send different notes to different channels
Generate - Create complex patterns from simple input
The processor reads incoming MIDI from a physical device, processes it through your Lua script, and outputs the modified messages to a virtual MIDI port that your DAW or synth can receive. Some examples:
# Generate chords from single notes
mot midi_processor --script scripts/chord_generator.lua

# Transpose notes up by one octave
mot midi_processor --script scripts/example_processor.lua
2. Network Discovery via mDNS
OSC receivers now advertise themselves on the network using mDNS/Bonjour with the _osc._udp service type.
This makes mot compatible with the EMI-Kit — the Embodied Music Interface Kit developed at IPEM, Ghent University. OSC-enabled devices can automatically discover mot receivers on your network, eliminating manual configuration, provided the OSC sources implement this discovery.
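To check that a receiver is actually visible on the network, you can browse for the service type; on macOS something like this should work:

# browse for OSC services advertised via mDNS/Bonjour (macOS)
dns-sd -B _osc._udp local.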
This weekend the more-or-less yearly conference of Hackerspace Ghent took place: Newline.gent. Hackers, makers, and curious minds gathered to share ideas, tools, experiments and a few beers.
I had a small contribution with a short lecture-performance which covered how to control your computer with a flute. The lecture part covered the technical part of the build, the performance part included playing Flappy Bird with a flute. A third significant part of the talk — arguably the main focus — was devoted to bragging about the global attention the project received.
Other highlights of the Newline conference included talks on Home Assistant, 3D design, BTRFS and workshops that invited everyone to get involved.
Big thanks to the organizers and everyone who joined. I’m already looking forward to the next one!
This short guide will help you set up Caddy as a local web server that provides locally generated TLS certificates, so you can develop websites over HTTPS immediately. Having a local HTTPS server in development can help with e.g. debugging CORS issues, accessing resources which require an HTTPS connection, or trying out analytics platforms.
1. Configure your hosts file
If you want to use a domain name, you first need to add a line to /etc/hosts which, in this case, makes example.com point to localhost:
echo "127.0.0.1 example.com" | sudo tee -a /etc/hosts
2. Configure Caddy
In a directory of your choosing, create a Caddyfile with the following content. It tells Caddy to automatically generate certificates on the fly for example.com or any other domain name; you may need to trust the main Caddy certificate on first use (see step 5).
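A minimal Caddyfile along these lines should do; the tls internal directive forces Caddy’s own local certificate authority instead of a public one (the root and file_server setup is an assumption, chosen to serve the test page below):

example.com {
	tls internal
	root * .
	file_server
}

3. Create a test page

Create a simple index.html in the same directory for Caddy to serve:

echo "<h1>It works!</h1>" > index.html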
4. Start Caddy

Still in the same directory as the Caddyfile and the index.html file, run the following command to start the Caddy web server: caddy run
5. Trust the locally generated certificate
On macOS this means adding the local Caddy root certificate to your keychain. It can be found at /data/caddy/pki/authorities/local/root.crt. In other environments a similar step is needed.
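Recent Caddy versions can also install their root certificate into the system trust store for you. Once Caddy has run at least once, this single command should achieve the same thing:

caddy trust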
6. Access the test site
Open your web browser and navigate to https://example.com, or open the test site from the command line: open https://example.com. If you inspect the certificate, it should be issued by the ‘Caddy local authority’.
Power banks have become a staple for charging smartphones, tablets, and other devices on the go. They seem ideal for powering small microcontroller projects, but they often pose a problem for low-current applications. Most modern power banks include an auto-shutdown feature to conserve energy when they detect a current draw below a specific threshold, often around 50–200mA. The idea is that the power bank can shut off once a smartphone is done charging. However, if you rely on power banks to power DIY electronics projects or remote applications with low current draw, this auto-off feature can be a significant inconvenience.
To address this issue, consider using power banks designed with an “always-on” or “low-current” mode. These power banks are engineered to sustain power delivery even when the current draw is minimal. Look for models that explicitly mention support for low-power devices in their specifications. If replacing a power bank isn’t an option, you can add a small load resistor or a USB dummy load to artificially increase the current draw. It works, but feels wrong and dirty.
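To size such a dummy load: at the 5V USB bus voltage, a 100Ω resistor draws 5V / 100Ω = 50mA and dissipates 0.25W, so use at least a half-watt part. A shutdown threshold of 150mA would need roughly 33Ω and burn 0.75W of battery as pure heat, which is exactly why it feels wrong.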
For a previous electronics project I bought a power bank more or less at random. After a bit of testing, I determined that the minimum current draw to keep it awake was around 150mA, so I added a resistor to increase the current draw. Only afterwards did I check the manual of the power bank and noticed, luckily, that there was a low-current mode. I removed the resistor and improved the battery life of the project considerably. If you want to power your DIY Arduino or electronics project, first check the manual of the power bank you want to use!
Edit: after further testing, it turned out that the low-current mode of this specific power bank still shuts down after a couple of hours. Your mileage may vary, and the main point of this post still holds: check the manual of your power bank. Eventually I went with a solution designed for electronics projects.
There is this thing that starts playing birdsong when it detects movement. It is ideal to connect with nature while nature calls. It is a good idea, executed well, but it got me thinking: this can be made less reliable, more time-consuming, more expensive, and with a shorter battery life. So I started working on a DIY version.
Vid: Playing birdsong when presence is detected with an ESP32 microcontroller.
The general idea is to start playing birdsong if someone is present in the necessary room. In addition to a few electronics components, the project needs birdsong recordings. Freesound is a great resource for all kinds of environmental sounds and has a collection of birdsong which was used for this project.
For the electronics, the project needs a microcontroller and a way to detect presence. I had a laser-ranging sensor lying around: it measures distance but can be repurposed to detect presence in a small room. Most of the time, the distance to the opposite wall is reported; if a smaller distance is measured, it is probably due to a person being present. The other components include an I2S audio module for playback and a LiPo battery for power.
As is often the case with builds like this, neither the software nor the hardware is conceptually challenging; making hardware and software cooperate is. Some pitfalls I encountered: the ESP32-C6 needs the USB CDC option set in the Arduino IDE, the non-standard I2C GPIO pins, getting the many I2S parameters right, dealing with a nasty pop sound once audio started, and a broken LiPo battery. Most of the fixes can be found in the Arduino code.
I use a polling strategy to detect presence: a distance measurement is taken, and then the ESP32 goes into deep sleep until the next measurement. A sensor with the ability to wake up the microcontroller would be a better approach.
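In outline - with readDistanceMM() and playRandomBirdsong() as hypothetical stand-ins for the real sensor and playback drivers - the polling loop looks like this:

#include "esp_sleep.h"

const uint64_t SLEEP_US = 5ULL * 1000000ULL; // poll every 5 seconds
const int PRESENCE_MM = 1200;                // anything closer than the wall

// hypothetical stand-ins for the real drivers
int readDistanceMM() { return 2000; }
void playRandomBirdsong() {}

void setup() {
  int distance = readDistanceMM();
  if (distance > 0 && distance < PRESENCE_MM) {
    playRandomBirdsong();
  }
  // schedule the next measurement and power almost everything down;
  // waking from deep sleep restarts the sketch from setup()
  esp_sleep_enable_timer_wakeup(SLEEP_US);
  esp_deep_sleep_start();
}

void loop() {} // never reached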
Once everything was installed it worked well enough — motion triggered a random birdsong, creating a soothing, natural vibe. It may be less practical than the off-the-shelf version but I did learn quite a lot more than I would have by simply filling in a form and providing payment details…