~ How To: Generate an Audio Fingerprinting Data Set With Sox Audio Effects

» By Joren on Wednesday 07 December 2011

A small part of Tarsos has been turned into a audio fingerprinting application. The idea of audio fingerprinting is to create a condensed representation of an audio file. A perceptually similar audio file should generate similar fingerprints. To test how robust a fingerprinting technique is, a data set with audio files that are alike in some way is practical.

SoX – Sound eXchange is a command line utility for sound processing. It can apply audio effects to a sound. Using these effects and a set of unmodified songs an audio fingerprinting data set can be created. To generate such a data set SoX can be used to:

Trim the first x seconds of a file
Speed-up or slow-down the audio
Change the pitch of a file without modifying the tempo
Generate background noise (white noise is used)
Reverse the audio stream


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

  #Trim the first 10 seconds
sox input.wav output.wav trim 10

#speed-up of 10%
sox input.wav output.wav speed 1.10

#change the pitch upwards 100 cents (one semitone)
#without changing the tempo
sox input.wav output.wav pitch 100

#generate white noise with the length of input.wav
sox input.wav noise.wav synth whitenoise
#mix the white noise with the input to generate noisy output
#-v defines how loud the white noise is
sox -m input.wav -v 0.1 noise.wav output.wav

#reverse the audio
sox input.wav output.wav reverse

A ruby script to generate a lot of these files can be found attached.

Code and HoGent Attachments

audio_fingerprinting_dataset_generator.rb.txt