~ Pompernikkel - the interactive speaking pumpkin 🎃
» By Joren on Monday 03 November 2025

The last few Halloweens I have been building one-off interactive installations for visiting trick-or-treaters. I did not document last year's build, but the year before I built an interactive doorbell with a jump-scare door projection. This year I was trying to take it easy, but my son came up with the idea of doing something with a talking pumpkin. I mumbled something about feasibility, so he promptly invited all his friends to come over on Halloween to talk to a pumpkin. So I got to work and tried to build something. This blog post documents that build.
A talking pumpkin needs a few capabilities. It needs to understand kids talking in Dutch, it needs to respond with a somewhat logical answer, and ideally it has a memory of previous interactions. It also needs a way to do turn-taking: indicating who is speaking and who is listening. Finally, it needs a face and a name. For the name we quickly settled on Pompernikkel.
For the face I tried a few interactive visualisations: a 3D implementation with three.js and a shader-based approach, but I eventually settled on an SVG with CSS animations to make the face come alive. This approach makes it doable to control animations with JavaScript, since animating a part of the pumpkin means adding or removing a CSS class. See below for the result.
For the other functions I used the following components:
- A decent-quality Bluetooth speaker for audio output and clear voice projection
- A microphone setup to capture and record children’s voices speaking to the pumpkin (a recording sketch follows this list)
- A glass door serving as a projection surface with a projector mounted behind it
- Speech-to-text recognition powered by nvidia/parakeet-tdt-0.6b-v3 (see this paper), implemented via transcribe-rs for transcribing Dutch speech
- Text-to-speech synthesis using the built-in macOS ‘say’ command to give Pompernikkel a voice (sketched below)
- A controllable interactive projection system displaying the animated SVG website mentioned above
- Response generation handled by the Gemma 3 12B large language model (paper), running locally through Ollama with a custom system prompt (sketched below)
- A real pumpkin augmented with an ESP32 microcontroller and a capacitive touch sensor embedded inside to detect physical touch: the microphone would only activate while someone was touching the pumpkin
- A custom Ruby websocket driver orchestrating turn-taking behavior and managing the interactive loop of questions and responses (a stripped-down sketch follows this list)
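As a minimal sketch of the capture step, the snippet below records a short mono clip with sox’s `rec` command. This is a stand-in for my actual recording setup: it assumes sox is installed (e.g. via Homebrew), and the path and duration are illustrative.

```ruby
# Record a short utterance from the default microphone using sox's `rec`.
# Assumes sox is installed (`brew install sox`); 16 kHz mono suits most
# speech-to-text models.
def record_utterance(path, seconds: 5)
  system('rec', '-q', '-r', '16000', '-c', '1', path, 'trim', '0', seconds.to_s)
end

record_utterance('utterance.wav')
```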
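Text-to-speech is the simplest part: shelling out to `say` from Ruby is enough. A sketch, assuming the Dutch (Belgium) system voice ‘Ellen’ is installed; run `say -v '?'` to list available voices.

```ruby
# Speak a reply through the built-in macOS `say` command.
# The voice 'Ellen' is an assumption; any installed Dutch voice works.
def speak(text)
  system('say', '-v', 'Ellen', text) # blocks until the speech finishes
end

speak('Dag! Ik ben Pompernikkel, de pratende pompoen.')
```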
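Response generation boils down to a POST request to the local Ollama server. A minimal sketch against Ollama’s `/api/chat` endpoint; the model tag and the (abbreviated) system prompt are illustrative, not the ones used on Halloween.

```ruby
require 'net/http'
require 'json'

# Ask the locally running Ollama server (default port 11434) for a reply.
def generate_reply(question)
  uri = URI('http://localhost:11434/api/chat')
  body = {
    model: 'gemma3:12b',
    stream: false,
    messages: [
      { role: 'system', content: 'Je bent Pompernikkel, een pratende pompoen.' },
      { role: 'user', content: question }
    ]
  }
  response = Net::HTTP.post(uri, body.to_json, 'Content-Type' => 'application/json')
  JSON.parse(response.body).dig('message', 'content')
end

puts generate_reply('Wie ben jij?')
```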
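The driver itself is essentially a small websocket hub: every process (the SVG face in the browser, the recorder, the ESP32 bridge) connects to it and receives state broadcasts. A stripped-down sketch, assuming the em-websocket gem; the message format is illustrative, see the repository for the real scripts.

```ruby
require 'em-websocket'
require 'json'

# Turn-taking hub: clients connect over websockets and receive state
# broadcasts such as {"state":"listening"}; the face page maps states
# to CSS classes on the SVG.
clients = []

EM.run do
  EM::WebSocket.run(host: '0.0.0.0', port: 8080) do |ws|
    ws.onopen  { clients << ws }
    ws.onclose { clients.delete(ws) }
    ws.onmessage do |raw|
      event = JSON.parse(raw)
      # Example: the touch bridge reports a touch, so everyone starts listening.
      if event['type'] == 'touch_start'
        clients.each { |c| c.send({ state: 'listening' }.to_json) }
      end
    end
  end
end
```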
As an extra feature, I implemented a jump scare where a sudden movement would trigger lightning and thunder:
- EMI-Kit for responsive movement detection. The mDNS support really makes it easy to use together with mot.
- A 5V SK6812 LED strip controlled by an ESP32 programmed to react to EMI-Kit events, creating lightning effects synchronized with audio and visual elements on the HTML page
Lessons learned
- Asking a code-assisting LLM to add animations to a part of an SVG only works after manually adding identifiers to the paths of the SVG: eyes, mouth, nose, … Once those are added, CSS animations are generated with ease. Understanding which SVG path corresponds to which semantic object seems out of reach for state-of-the-art systems for now.
- Gemma 3 is not a truly multilingual LLM. It generates responses in Dutch, but these seem very translated: it appears that responses are generated in English and translated to Dutch in a final step. This becomes very clear when the LLM attempts jokes. Of course, these nonsensical jokes do work on a certain level.
- Gemma 3 has a personality that is difficult to get around if a system prompt contains trigger words like ‘dark’ or ‘scary’. In my case its responses became philosophical, nihilistic and very dark, which was unexpectedly great.
- Parakeet speech-to-text in Dutch is faster than the several OpenAI Whisper-based systems I managed to get running on macOS. It also gave better results for short excerpts.
- The SK6812 RGBWW strip is not well supported by the FastLED library. I managed to get it working with a hack found on GitHub, which is not ideal.
- There are a few end-to-end systems for voice chat with local LLMs on GitHub, but they are not easy to get going on anything other than CUDA/Linux and almost never support languages other than English.
- Looking closely at several speech-to-text and text-to-speech systems, including their VAD and transcription components, the CUDA/Linux default is difficult to get around.
- The focus of open-source models and tools on English is problematic: while Dutch is still relatively well represented, many systems are limited to English only.
- While some kids interacted with the pumpkin, the jump scare with its lightning and thunder effect worked better in that setting.
- WebSockets seem a decent way to do inter-process communication; they could be considered even outside web technologies. I never thought of WebSockets in this way (a minimal client sketch follows this list). See the Pompernikkel GitHub repository for example Ruby scripts.
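To illustrate that last point: any local process can join the hub as a plain IPC client in a few lines. A sketch assuming the websocket-client-simple gem and the hub from the earlier sketch; the message format is again illustrative.

```ruby
require 'websocket-client-simple'
require 'json'

# Join the hub as an ordinary local process: no browser or web page involved.
ws = WebSocket::Client::Simple.connect('ws://localhost:8080')
ws.on(:message) { |msg| puts "new state: #{msg.data}" }

# Pretend to be the touch bridge and report a touch event.
ws.send({ type: 'touch_start' }.to_json)

sleep # keep the process alive to receive broadcasts
```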
Most trick-or-treaters were at least intrigued by it, my son’s friends were impressed, and I got to learn a couple of things; see above. Next year, however, I will try to take it easy.