
~ FFmpeg with Whisper support on macOS via Homebrew

For a couple of months now, FFmpeg has supported audio transcription via OpenAI's Whisper models and whisper.cpp. This makes it possible to automatically transcribe interviews and podcasts, or to generate subtitles for videos. However, most packaged versions of the ffmpeg command line tool do not ship with this option enabled. Here we show how to enable it on macOS with the Homebrew package manager; a similar configuration applies on other platforms.

On macOS there is a ready-made Homebrew tap which lets you enable or disable the many ffmpeg options. If you already have a plain ffmpeg installed, you need to uninstall the current version first and replace it with a build that has the options of your choice. The commands below show how to do this:

# check if you already have ffmpeg with whisper enabled
ffmpeg --help filter=whisper

# uninstall the current ffmpeg; it will be replaced with a whisper-enabled build
brew uninstall ffmpeg

# add a brew tap which provides options to install ffmpeg from source
brew tap homebrew-ffmpeg/ffmpeg

# this command installs ffmpeg with the most common options enabled, including whisper
brew install homebrew-ffmpeg/ffmpeg/ffmpeg \
--with-fdk-aac \
--with-jpeg-xl \
--with-libgsm \
--with-libplacebo \
--with-librist \
--with-librsvg \
--with-libsoxr \
--with-libssh \
--with-libvidstab \
--with-libxml2 \
--with-openal-soft \
--with-openapv \
--with-openh264 \
--with-openjpeg \
--with-openssl \
--with-rav1e \
--with-rtmpdump \
--with-rubberband \
--with-speex \
--with-srt \
--with-webp \
--with-whisper-cpp

Installation will take a while since the many options pull in many dependencies. Once the build is finished, the whisper filter should be available in FFmpeg. Below is how the filter help should look once ffmpeg is correctly installed:

ffmpeg version 8.0 Copyright (c) 2000-2025 the FFmpeg developers
  built with Apple clang version
        ...
Filter whisper
  Transcribe audio using whisper.cpp.
    Inputs:
       #0: default (audio)
    Outputs:
       #0: default (audio)
whisper AVOptions:
   model             <string>     ..F.A...... Path to the whisper.cpp model file
   language          <string>     ..F.A...... Language for transcription ('auto' for auto-detect) (default "auto")
   queue             <duration>   ..F.A...... Audio queue size (default 3)
   use_gpu           <boolean>    ..F.A...... Use GPU for processing (default true)
   gpu_device        <int>        ..F.A...... GPU device to use (from 0 to INT_MAX) (default 0)
   destination       <string>     ..F.A...... Output destination (default "")
   format            <string>     ..F.A...... Output format (text|srt|json) (default "text")
   vad_model         <string>     ..F.A...... Path to the VAD model file
   vad_threshold     <float>      ..F.A...... VAD threshold (from 0 to 1) (default 0.5)
   vad_min_speech_duration <duration>   ..F.A...... Minimum speech duration for VAD (default 0.1)
   vad_min_silence_duration <duration>   ..F.A...... Minimum silence duration for VAD (default 0.5)
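
With the filter available, transcription comes down to pointing it at a whisper.cpp model file. Below is a minimal sketch, assuming a ggml model (for example ggml-base.en.bin, downloadable via the whisper.cpp project) and an audio file interview.mp3; the file names are placeholders:

# transcribe an audio file to SRT subtitles with the whisper filter;
# model, input and output paths are placeholders, adjust as needed
ffmpeg -i interview.mp3 -af "whisper=model=ggml-base.en.bin:language=en:destination=interview.srt:format=srt" -f null -

The -f null - output discards the processed audio; the transcription itself is written to the destination file.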

~ MuTechLab - Music Technology Workshop in Luxembourg

Last Friday, I had the pleasure of facilitating a hands-on workshop in Luxembourg as part of the MuTechLab workshop series, organized by Luc Nijs at the University of Luxembourg. Together with Bart Moens from XRHIL and IPEM, we presented a system to control musical parameters with body movement.

MuTechLab is a series of workshops for music teachers who wish to dive into the world of music technology. Funded by the Luxembourgish National Research Fund (FNR, PSP-Classic), the initiative brings together educators eager to explore how technology can enhance music education and creative practice.

What we built and presented

During the workshop, participants got hands-on experience with the EMI-Kit (Embodied Music Interface Kit) – an open-source, low-cost system that allows musicians to control Digital Audio Workstation (DAW) parameters through body movement.

The EMI-Kit consists of:

- A wearable sensor device (M5StickC Plus2) that captures body orientation and gestures
- A receiver unit (M5Stack STAMP S3A) that converts sensor data to MIDI messages

Unlike expensive commercial alternatives, EMI-Kit is fully open source, customizable, and designed specifically for creative music practice and for research on embodied music interaction.

The Experience

Teachers experimented with mapping natural body movements – pitch, yaw, roll, and tap gestures – to various musical parameters in their DAWs. The low-latency wireless system made it possible to move and control sound, opening up new possibilities for expressive musical performance and pedagogy.

Learn More

Interested in exploring embodied music interaction yourself? As it stands, the EMI-Kit is a demonstrator meant to inspire educators to embrace these tools and imagine new ways of teaching and creating music. As a platform, and with some additional programming, it can serve as a good basis for controlling musical parameters with various sensors. Have fun checking out the EMI-Kit.


~ MIDI and OSC tools improvements - MIDI processing and mDNS support

I’ve just pushed some updates to mot — a command-line application for working with OSC and MIDI messages. My LLM tells me that these are exciting updates, but I am not entirely sure that this is the case. Let me know if this ticks your boxes, and seek professional help.

1. Scriptable MIDI Processor via Lua

I have implemented a MIDI processor that lets you transform, filter, and generate MIDI messages using Lua scripts.

Why is this useful? MIDI processors act as middlemen between your input devices and output destinations. You can transform, filter, and generate new messages from the incoming MIDI stream:

[Diagram: a MIDI device (keyboard, pad, etc.) sends a Note On C4 into mot midi_processor, where a Lua script's process_message() transforms, filters, or generates messages (here, C4 becomes a C4 + E4 + G4 chord); the resulting notes go out over a virtual MIDI port to a DAW, synth, etc.]

The processor reads incoming MIDI from a physical device, processes it through your Lua script, and outputs the modified messages to a virtual MIDI port that your DAW or synth can receive. Some examples:

# Generate chords from single notes
mot midi_processor --script scripts/chord_generator.lua 0 6666

# Transpose notes up by one octave
mot midi_processor --script scripts/example_processor.lua 0 6666
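
For a feel of what such a script looks like, here is a minimal sketch of a chord generator. This is a hypothetical illustration: the process_message() callback name is taken from the diagram above, and the message fields are assumptions, so consult the example scripts in the repository for the real API:

-- hypothetical sketch of a chord-generating processor script;
-- process_message() and the message fields are assumed here,
-- see the mot repository's example scripts for the actual API
function process_message(msg)
  if msg.type == "note_on" or msg.type == "note_off" then
    -- build a major triad on top of the incoming note
    local third = { type = msg.type, channel = msg.channel, note = msg.note + 4, velocity = msg.velocity }
    local fifth = { type = msg.type, channel = msg.channel, note = msg.note + 7, velocity = msg.velocity }
    return { msg, third, fifth } -- C4 in, C4 + E4 + G4 out
  end
  return { msg } -- pass all other messages through unchanged
end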

2. Network Discovery via mDNS

OSC receivers now advertise themselves on the network using mDNS/Bonjour with the _osc._udp service type.

This makes mot compatible with the EMI-Kit — the Embodied Music Interface Kit developed at IPEM, Ghent University. OSC-enabled devices can automatically discover mot receivers on your network, which eliminates manual configuration, provided the OSC source implements this kind of discovery.
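
To verify that a receiver is being advertised, you can browse for the service type with standard mDNS tools. On macOS the built-in dns-sd works; on Linux, avahi-browse does the same:

# browse for OSC services advertised via mDNS/Bonjour (macOS)
dns-sd -B _osc._udp local.

# equivalent on Linux with Avahi
avahi-browse _osc._udp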

Get started

Installation via Rust’s cargo:

git clone https://github.com/JorenSix/mot.git
cd mot
cargo install --path .
mot midi_processor -h

Check out the mot repository for full documentation and example Lua scripts!