Panako – Acoustic Fingerprinting

Panako is an acoustic fingerprinting system. The system is able to extract fingerprints from an audio stream, and either store those fingerprints in a database, or find a match between the extracted fingerprints and stored fingerprints. Several acoustic fingerprinting algorithms are implemented within Panako. The main algorithm, the Panako algorithm, can identify audio queries reliably and quickly even if they have been sped up, time stretched or pitch shifted with respect to the reference audio. The main focus of Panako is to serve as a demonstration of the Panako algorithm. Other acoustic fingerprinting schemes are implemented to serve as a baseline for comparison. More information can be found in the article about Panako.

Panako can also be used to synchronize data streams. For those applications, please see the article titled Synchronizing Multimodal Recordings Using Audio-To-Audio Alignment.

Please be aware of the patents US7627477 B2 and US6990453 and perhaps others. They describe techniques used in some algorithms implemented within Panako. The patents limit the use of some algorithms under various conditions and in several regions. Please consult your intellectual property rights specialist if you are in doubt about these restrictions. If these restrictions apply, respect the patent holders' rights. The first aim of Panako is to serve as a research platform to experiment with and improve fingerprinting algorithms.

This document covers installation, usage and configuration of Panako.

The Panako source code is licensed under the GNU Affero General Public License.

Overview

  1. Why Panako?
  2. Getting Started
  3. Usage
    1. Store Fingerprints
    2. Query for Matches
    3. Print Storage Statistics
    4. Print Configuration
    5. Synchronize Media Files
    6. Synchronize Sensor Streams with SyncSink
    7. Compare Media Files
    8. Run the REST webservice
    9. Query the REST service
  4. Further reading
  5. Credits
  6. Reproduce the ISMIR Paper Results

Why Panako?

Content based music search algorithms make it possible to identify a small audio fragment in a digital music archive with potentially millions of songs. Current search algorithms are able to respond quickly and reliably to an audio query, even if there is noise or other distortions present. During the last decade they have been used successfully as digital music archive management tools, as music identification services for smartphones, or for digital rights management.


Fig 1. General content based audio search scheme.

Most algorithms, as they are described in the literature, do not allow substantial changes in replay speed. The speed of the audio query needs to be the same as the reference audio for the current algorithms to work. This poses a problem, since changes in replay speed occur commonly: they are either introduced by accident during an analog to digital conversion or introduced deliberately.

Analogue physical media such as wax cylinders, wire recordings, magnetic tapes and gramophone records can be digitized at an incorrect or varying playback speed. Even when calibrated mechanical devices are used in a digitization process, the media could already have been recorded at an undesirable speed. To identify duplicates in a digitized archive, a music search algorithm should compensate for changes in replay speed.

Next to accidental speed changes, deliberate speed manipulations are sometimes introduced during radio broadcasts: occasionally songs are played a bit faster to fit into a timeslot. During a DJ-set speed changes are almost always present. To correctly identify audio in these cases as well, a music search algorithm robust against pitch shifting, time stretching and speed changes is desired.

The Panako algorithm allows such changes while maintaining other desired features such as scalability, robustness and reliability. Next to two versions of the Panako algorithm (internally identified as CTEQ and NCTEQ), two versions of the algorithm described in An Industrial-Strength Audio Search Algorithm are implemented (internally identified as FFT and NFFT). The algorithm in A Robust Audio Fingerprinter Based on Pitch Class Histograms – Applications for Ethnic Music Archives is available as well. To make comparisons between fingerprinting systems easy, researchers are kindly invited to contribute algorithms to the Panako project.

Alternative open source music identification systems are audfprint and echoprint. Alternative systems with similar features are described in US7627477 B2 and in Quad-based Audio Fingerprinting Robust To Time And Frequency Scaling by Reinhard Sonnleitner and Gerhard Widmer.

Getting started

To download the latest Panako version, wget needs to be installed on your system. Once downloaded, the contents look like this:

To compile Panako, JDK 7 or later is required. See the installation instructions on the Java website for your operating system. See for example how to install Java 7 on Debian 7.

Panako uses the Apache Ant build system. Install it on your system. Once ant and the other components are installed correctly the following commands should get you started:

wget http://panako.be/releases/Panako-latest/Panako-latest-src.zip
unzip Panako-latest-src.zip
# Optionally install the JDK, Ant and libav-tools (on Ubuntu):
# sudo apt-get install default-jdk ant libav-tools
cd Panako/build
ant #Builds the core Panako library
ant install #Installs Panako in the /opt/panako directory
ant doc #Creates the documentation in Panako/doc
cp ../doc/panako /usr/bin #copies the panako startup script to your path

The last command copies the startup script in doc/panako to a directory in your path. The script allows for easy access to Panako’s functionality. If this does not work for you, you can still call Panako using java -jar /opt/panako/panako.jar [..args].

Panako decodes audio by calling a utility that should be present on your system. Internally it reads the output of a sub-process via a pipe; in this case, the output consists of decoded PCM audio samples. By default Panako calls avconv, included in libav. avconv should be installed correctly and should be available on your system's path. Alternatively, Panako can be configured to use any utility (like ffmpeg) that can pipe decoded audio as PCM, one channel, 16 bits per sample. Libav version 9 or later is advised. On a Debian-like system:

apt-get install libav-tools
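
The pipe mechanism described above can be illustrated with a minimal Python sketch. It is not Panako's own (Java) code. The function names, the 44100 Hz sample rate and the ffmpeg arguments in decoder_command are assumptions chosen for illustration; Panako's actual decoder command is defined in its configuration.

```python
import struct
import subprocess
import sys


def decoder_command(path, sample_rate=44100):
    """Hypothetical ffmpeg call writing mono, signed 16-bit little-endian PCM to stdout.

    The sample rate and argument list are assumptions, not Panako's configured command.
    """
    return ["ffmpeg", "-i", path, "-vn", "-ac", "1", "-ar", str(sample_rate),
            "-f", "s16le", "pipe:1"]


def read_pcm_samples(command):
    """Run a decoder sub-process and interpret its stdout as 16-bit PCM samples."""
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, check=True)
    raw = result.stdout
    usable = len(raw) - (len(raw) % 2)  # ignore a trailing odd byte, if any
    count = usable // 2                 # two bytes per sample
    return list(struct.unpack("<%dh" % count, raw[:usable]))


if __name__ == "__main__" and len(sys.argv) > 1:
    samples = read_pcm_samples(decoder_command(sys.argv[1]))
    print(len(samples), "samples decoded")
```

Any command that writes the same raw format to its standard output can be passed to read_pcm_samples, which is the property that makes the decoder swappable.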

Test Panako. You might need a new shell to use panako.

panako -v #prints version
panako stats #db info

Panako Usage

Panako provides a command line interface, which can be called using panako subapplication [argument...]. For each task Panako has to perform, there is a subapplication. There are subapplications to store fingerprints, query audio fragments, monitor audio streams, and so on. Each subapplication has its own list of arguments, options, and output format that define the behavior of the subapplication.

To save some keystrokes, the name of the subapplication can be shortened to a unique prefix. For example, panako m file.mp3 is expanded to panako monitor file.mp3. Since both stats and store are valid subapplications, the store call can be shortened to panako sto *.mp3, whereas panako s *.mp3 gives an invalid application message. A trie is used to find a unique prefix.
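
The prefix expansion can be sketched with a small trie in Python. This is an illustration, not Panako's implementation. The class name is hypothetical, the list of names is taken from the subapplications mentioned in this document, and letting an exact name win over longer names (sync versus syncsink) is an assumption.

```python
SUBAPPLICATIONS = ["store", "query", "stats", "config", "resolve", "browser",
                   "sync", "syncsink", "compare", "server", "client", "monitor"]

_END = ""  # marks the end of a complete name; never collides with a character


class PrefixTrie:
    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for character in word:
                node = node.setdefault(character, {})
            node[_END] = word

    def resolve(self, prefix):
        """Return the single name matching prefix, or None if unknown or ambiguous."""
        node = self.root
        for character in prefix:
            if character not in node:
                return None  # no name starts with this prefix
            node = node[character]
        if _END in node:
            return node[_END]  # assumption: an exact name always wins
        # Follow the branch for as long as there is only one way to continue.
        while len(node) == 1:
            key, child = next(iter(node.items()))
            if key == _END:
                return child
            node = child
        return None  # several names share this prefix


trie = PrefixTrie(SUBAPPLICATIONS)
print(trie.resolve("m"), trie.resolve("sto"), trie.resolve("s"))
```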

What follows is a list of those subapplications, their arguments, and respective goal.

Store Fingerprints – panako store

The store instruction extracts fingerprints from audio tracks and stores those in the datastorage. The command expects a list of audio files, video files or a text file with a list of file names.

#Audio is converted automatically
panako store audio.mp3 audio.ogg audio.flac audio.wav 

#The first audio stream of video container formats is used.
panako store audio.mpc audio.mp4 audio.aac audio.avi audio.ogv audio.wma

#Glob characters can be practical
panako store */*.mp3

#A text file with audio files can be used as well
#The following searches for all mp3's (recursively) and
#stores them in a text file
find . -name '*.mp3' > list.txt
#Iterate the list
panako store list.txt
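
When several audio formats need to be collected in one list, a short script can replace the find call. The following Python sketch writes one path per line. The function name and the extension list are illustrative assumptions, not part of Panako.

```python
import os

# Illustrative choice of extensions; extend as needed.
AUDIO_EXTENSIONS = (".mp3", ".ogg", ".flac", ".wav")


def write_file_list(root, list_path, extensions=AUDIO_EXTENSIONS):
    """Recursively collect matching files under root and write one path per line.

    Returns the number of paths written.
    """
    paths = []
    for directory, _subdirectories, names in os.walk(root):
        for name in names:
            if name.lower().endswith(extensions):
                paths.append(os.path.join(directory, name))
    paths.sort()
    with open(list_path, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
    return len(paths)
```

The resulting file can then be passed to panako store in the same way as list.txt above.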

Query for Matches – panako query

The query command extracts fingerprints from a short audio fragment and tries to match the fingerprints with the database.

panako query short_audio.mp3

Print Storage Statistics – panako stats

The stats command prints statistics about the stored fingerprints and the number of audio fragments. If an integer argument is given it keeps printing the stats every x seconds.

panako stats # print stats once
panako stats 20 # print stats every 20s 

Print Configuration – panako config

The config subapplication prints the configuration currently in use.

panako config

To override configuration values there are two options. The first option is to create a configuration file, by default at the following location: /opt/config.properties. The configuration file is a properties text file. A commented configuration file should be included in the doc directory at doc/config.properties.

The second option to override configuration values is by adding them to the arguments of the command line call as follows:

panako subapplication CONFIG_KEY=value

For example, if you do not want to check for duplicate files while building a fingerprint database the following can be done:

panako store file.mp3 CHECK_FOR_DUPLICATES=FALSE

The configuration values provided as command line arguments have priority over the ones in the configuration file. If no value is configured, a default is used automatically. To find out which configuration options are available and what they do, consult the documented example configuration file doc/config.properties.
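
The precedence rules (command line over configuration file over defaults) can be sketched in Python as follows. This is an illustration only: the default values shown are hypothetical, the properties parser is deliberately simplified, and treating every argument of the form UPPER_CASE_KEY=value as an override is an assumption.

```python
import re

# Hypothetical defaults, for illustration only.
DEFAULTS = {"CHECK_FOR_DUPLICATES": "TRUE", "HTTP_SERVER_PORT": "8080"}

_OVERRIDE = re.compile(r"^([A-Z][A-Z0-9_]*)=(.*)$")


def parse_properties(text):
    """Simplified properties parser: KEY=value lines, '#' starts a comment line."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def split_arguments(arguments):
    """Separate CONFIG_KEY=value overrides from ordinary arguments such as file names."""
    overrides, remaining = {}, []
    for argument in arguments:
        match = _OVERRIDE.match(argument)
        if match:
            overrides[match.group(1)] = match.group(2)
        else:
            remaining.append(argument)
    return overrides, remaining


def effective_configuration(defaults, file_text, arguments):
    """Merge the three sources; later updates win, so the command line has priority."""
    overrides, remaining = split_arguments(arguments)
    configuration = dict(defaults)
    configuration.update(parse_properties(file_text))
    configuration.update(overrides)
    return configuration, remaining
```

With a configuration file that sets HTTP_SERVER_PORT=9090 and the arguments file.mp3 CHECK_FOR_DUPLICATES=FALSE, the merged result uses port 9090 from the file, FALSE from the command line, and leaves file.mp3 as the only ordinary argument.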

Resolve an identifier for a filename – panako resolve

This application simply returns the identifier that is used internally for a filename. The following call returns for example 54657653:

panako resolve test.mp3

The internal identifiers are currently defined using integers.

Browse Fingerprints – panako browser

Shows a Swing user interface with a spectrogram and the detected event points and fingerprints. It can be used
to check the effect of configuration parameters on fingerprint detection and to optimize the configuration.

Synchronize media files – panako sync

This operation extracts fingerprints from a reference media file and returns how much time offset there is to other files with similar audio. This is practical to synchronize video streams with similar audio or recordings of the same performance from several microphones.

The synchronize option uses a two-stage algorithm. First, fingerprints are extracted from the media files and used to check whether the media files contain similar audio. If they do, a rough offset is determined (e.g. with 32ms accuracy). Subsequently, the cross-covariance of one audio frame is used to improve the offset estimate. This refined offset is returned.

panako sync reference.avi other.mkv other.mp3
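
The two stages can be sketched in Python as shown here. This is a simplified illustration under stated assumptions, not Panako's implementation: the rough offset is taken as the most common time difference between matched fingerprints (a histogram vote), a positive lag means the audio appears later in the other file, and all function and parameter names are hypothetical.

```python
from collections import Counter


def rough_offset(matches, resolution_ms=32):
    """Stage 1: most common time difference (in ms) between matched fingerprints.

    matches holds (time_in_reference_ms, time_in_other_ms) pairs.
    """
    bins = Counter(round((other - reference) / resolution_ms)
                   for reference, other in matches)
    best_bin, _count = bins.most_common(1)[0]
    return best_bin * resolution_ms


def cross_covariance(a, b):
    """Cross-covariance at lag zero of two equally long sample blocks."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def refine_offset(reference, other, rough_lag, frame_start, frame_size, search):
    """Stage 2: search around rough_lag (in samples) for the lag that maximizes
    the cross-covariance between one reference frame and the other signal."""
    frame = reference[frame_start:frame_start + frame_size]
    best_lag, best_score = rough_lag, float("-inf")
    for lag in range(rough_lag - search, rough_lag + search + 1):
        start = frame_start + lag
        if start < 0 or start + frame_size > len(other):
            continue  # candidate block falls outside the other signal
        score = cross_covariance(frame, other[start:start + frame_size])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The first stage only needs the fingerprint matches, so it is cheap; the second stage touches raw samples but only within a small window around the rough estimate.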

Synchronize Sensor Streams with SyncSink – panako syncsink

This command starts a Swing user interface to synchronize several audio/video and data files. The aim is to make it easy to synchronize various media files and to allow synchronization of sensor data streams. Synchronization of sensor streams can be done effectively if for each sensor stream there is a corresponding synchronized ambient audio recording. The problem of synchronizing sensor streams is then reduced to audio-to-audio alignment. SyncSink facilitates this type of data-stream synchronization.

The synchronize option uses a two-stage algorithm. First, fingerprints are extracted from the media files and used to check whether the media files contain similar audio. If they do, a rough offset is determined (e.g. with 32ms accuracy). Subsequently, the cross-covariance of one audio frame is used to improve the offset estimate. This refined offset is returned.


Fig 2. SyncSink user interface.

To use the application, start it with panako syncsink. Subsequently add various audio or video files using drag and drop. If the same audio is found in the various media files, a timebox plot appears, as in Fig 2. To add corresponding data files, click one of the boxes on the timeline and choose a data file that is synchronized with the audio. The data file should be a CSV file. The separator should be ‘,’ and the first column should contain a timestamp in fractional seconds. After pressing Sync, a new CSV file is created with the first column containing correctly shifted time stamps. If this is done for multiple files, a synchronized sensor stream is created. Also, ffmpeg commands to synchronize the media files themselves are printed to the command line.
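
The time stamp shift applied to a data file can be sketched in Python. This is an illustration of the idea, not SyncSink's code. The function name, the sign convention (the offset is added to each time stamp) and the three-decimal output format are assumptions.

```python
import csv
import io


def shift_timestamps(csv_text, offset_seconds):
    """Add offset_seconds to the first column of a ','-separated CSV text.

    The first column must hold time stamps in fractional seconds; all other
    columns are copied unchanged.
    """
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip empty lines
        shifted = float(row[0]) + offset_seconds
        writer.writerow(["%.3f" % shifted] + row[1:])
    return output.getvalue()


print(shift_timestamps("0.000,1.5\n0.010,1.7\n", 2.5))
```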

It is also possible to start the application with media streams immediately present. The first file is considered to be the reference stream.

panako syncsink reference.avi first.mkv second.mp3 third.wav

By default Panako is configured to synchronize music and to minimize the number of event points and fingerprints per second. This reduces the storage requirements and processing needs. However, it can help to change the default configuration to synchronize sparse audio or speech. In that case, try starting Panako with the following set of configuration values:

panako syncsink \
  SYNC_MIN_ALIGNED_MATCHES=4 \
  NFFT_EVENT_POINT_MIN_ENERGY_RATIO_THRESHOLD=0.10 \
  NFFT_EVENT_POINT_MAX_ENERGY_RATIO_THRESHOLD=0.90 \
  NFFT_EVENT_POINT_MIN_ENERGY=0.01 \
  NFFT_MAX_FINGERPRINTS_PER_EVENT_POINT=7

Compare the structure of media files – panako compare

This operation extracts fingerprints from a reference media file and searches for the fingerprints in another media file. The aim is to identify the parts of audio that are present in both input files. If only one audio file is given, a self similarity matrix is constructed. Using the matrix it becomes clear when the exact same audio is repeated within the same song. It can be used to automatically detect where cuts have been made in a long original album edit to end up with a shorter radio edit of the same song. Another use case is musical structure analysis, which works especially well for electronic music. To call the application:

panako compare audio.wav > self_similarity.csv
panako compare original_edit.wav radio_edit.wav > cuts_where.csv

The output is a CSV file with millisecond values. When only one file is given, the diagonal is detected in a self similarity matrix. If the audio between 3s and 3.5s is repeated at 90s to 90.5s, the output could look like the example below. The data is easy to visualize in a spreadsheet application.

10,10
15,15
20,20
...
3000,3000,90000
3010,3010,90010
3500,3500,90500
...
4000,4000
4010,4010
...
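
Besides a spreadsheet, the output can be post-processed with a short script. The Python sketch below groups consecutive rows that carry a third column into repeated segments. It assumes the column layout of the example above (time, time on the diagonal, time of the repetition, all in milliseconds); the function name is hypothetical.

```python
def repeated_segments(csv_text):
    """Return (start, end, repeat_start, repeat_end) tuples in milliseconds.

    Rows with only two values lie on the diagonal and close any open segment.
    Lines that are not purely numeric (such as elisions) are skipped.
    """
    segments = []
    current = None  # [start, end, repeat_start, repeat_end] of the open segment
    for line in csv_text.splitlines():
        try:
            values = [int(field.strip()) for field in line.split(",")]
        except ValueError:
            continue
        if len(values) >= 3:
            time, repeat = values[0], values[2]
            if current and repeat - time == current[2] - current[0]:
                current[1], current[3] = time, repeat  # same lag: extend the segment
            else:
                if current:
                    segments.append(tuple(current))
                current = [time, time, repeat, repeat]
        elif current:
            segments.append(tuple(current))
            current = None
    if current:
        segments.append(tuple(current))
    return segments
```

Applied to the example output above, this yields a single segment: the audio from 3000 ms to 3500 ms is repeated from 90000 ms to 90500 ms.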

Run the REST webservice – panako server

Panako contains a lightweight HTTP server to provide a REST webservice. The REST webservice provides two
methods /v1.0/match and /v1.0/metadata to query for a match and to fetch metadata
for a match respectively. The server can be configured using the HTTP_SERVER_PORT configuration setting.

panako server

Query the REST service – panako client

Panako contains functionality to send match requests to the REST webservice.

panako client "localhost:8080" test.mp3

Further Reading

The following is relevant reading material about acoustic fingerprinting. The order gives an idea of relevance to the Panako project.

  1. Six, Joren and Leman, Marc Panako – A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification (2014)
  2. Wang, Avery L. An Industrial-Strength Audio Search Algorithm (2003)
  3. Cano, Pedro and Batlle, Eloi and Kalker, Ton and Haitsma, Jaap A Review of Audio Fingerprinting (2005)
  4. Six, Joren and Cornelis, Olmo A Robust Audio Fingerprinter Based on Pitch Class Histograms – Applications for Ethnic Music Archives (2012)
  5. Arzt, Andreas and Bock, Sebastian and Widmer, Gerhard Fast Identification of Piece and Score Position via Symbolic Fingerprinting (2012)
  6. Fenet, Sebastien and Richard, Gael and Grenier, Yves A Scalable Audio Fingerprint Method with Robustness to Pitch-Shifting (2011)
  7. Ellis, Dan and Whitman, Brian and Porter, Alastair Echoprint – An Open Music Identification Service (2011)
  8. Sonnleitner, Reinhard and Widmer, Gerhard Quad-based Audio Fingerprinting Robust To Time And Frequency Scaling (2014)

Credits

The Panako software was developed at IPEM, Ghent University by Joren Six.

Some parts of Panako were inspired by the Robust Landmark-Based Audio Fingerprinting Matlab implementation by Dan Ellis.

If you use Panako for research purposes, please cite the following work:

@inproceedings{six2014panako,
  author      = {Joren Six and Marc Leman},
  title       = {{Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification}},
  booktitle   = {{Proceedings of the 15th ISMIR Conference (ISMIR 2014)}}, 
  year        =  2014
}

If you use the synchronization algorithms for research purposes, please cite the following work:

@article{six2015synksink_jmui,
  author = {Six, Joren and Leman, Marc},
  title = {Synchronizing Multimodal Recordings Using Audio-To-Audio Alignment - In Press},
  journal = {Journal on Multimodal User Interfaces}
}

Reproduce the ISMIR Paper Results

The directory doc/Reproducibility contains scripts to reproduce the results found in the
Panako ISMIR 2014 paper. The scripts follow this procedure:

  1. The Jamendo dataset is downloaded.
  2. The fingerprints are extracted and stored for each file in the data set.
  3. Query files are created from the Jamendo data set.
  4. Panako is queried for each query file and the results are logged.
  5. Results are parsed.

Requirements

To run the scripts, a UNIX-like system with the following utilities is required:

The Panako software is also needed. Please see the Getting started section above to install Panako.
The configuration used during the test is also included in the doc/Reproducibility directory.

If all requirements are met, running the test is done using bash run_experiment.bash