Publication type: Book Chapters
Abstract: This chapter focuses on identifying duplicate audio material in large digital music archives. Efficiently finding duplicates in large collections is a solved problem in music information retrieval (MIR), and off-the-shelf systems for finding duplicates are even available. The applications of this technology, however, remain largely unknown and underexploited. This chapter describes duplicate detection and its many applications, which include meta-data quality verification, improving listening experiences, re-use of meta-data, informed noise cancellation, optimising storage space, and linking and merging archives. These applications are illustrated with two case studies. 1) The first case study uses a collection of digitised shellac discs from the Belgian national public-service broadcaster. It reveals a surprisingly high proportion of duplicate material, around 38%. Because some discs are better preserved (and digitised) than others, linking duplicate material makes it possible to redirect listeners to the higher-quality audio. 2) The second case study focuses on an archive of early electronic music that has been digitised twice. Segmentation timestamps and other meta-data from the first digitisation campaign are reused to annotate the higher-fidelity digital audio from the second campaign. The main contribution of this chapter is to highlight practical uses of duplicate detection. A secondary contribution is the set of findings detailed in the case studies. A third contribution is an evaluation of an updated fingerprinting system.
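Fingerprint-based duplicate detection commonly aligns two recordings by letting matching fingerprint hashes vote for a time offset; that same offset is what allows segmentation timestamps from one digitisation to be carried over to another. The following is a minimal sketch under stated assumptions, not the fingerprinting system evaluated in the chapter: fingerprints are assumed to be given as (hash, frame) pairs from some landmark-style fingerprinter, both digitisations are assumed to play back at the same speed (so one constant offset suffices), and the function names `best_offset` and `transfer_segments` are hypothetical.

```python
from collections import Counter, defaultdict


def best_offset(query, reference, min_votes=3):
    """Return the most-voted frame offset between two fingerprint lists, or None.

    query, reference: lists of (hash_value, frame_index) pairs, assumed to come
    from a hypothetical landmark-style fingerprinter. Assumes both recordings
    play back at the same speed, so a single constant offset aligns them.
    """
    # Index reference fingerprints by hash for fast lookup.
    ref_index = defaultdict(list)
    for h, t in reference:
        ref_index[h].append(t)

    # Every shared hash votes for the offset (reference time - query time).
    votes = Counter()
    for h, t_q in query:
        for t_r in ref_index.get(h, ()):
            votes[t_r - t_q] += 1

    if not votes:
        return None
    offset, count = votes.most_common(1)[0]
    # Too few consistent votes: treat the pair as not being duplicates.
    return offset if count >= min_votes else None


def transfer_segments(segments, offset_frames, frame_duration):
    """Shift (start_s, end_s, label) segments from the first digitisation onto the second."""
    shift = offset_frames * frame_duration
    return [(start + shift, end + shift, label) for start, end, label in segments]


if __name__ == "__main__":
    first = [(11, 0), (22, 5), (33, 9), (44, 12)]
    second = [(99, 1), (11, 100), (22, 105), (33, 109), (44, 112), (22, 3)]
    offset = best_offset(first, second)
    print(offset)  # 100: four hashes agree, one spurious match votes for -2
    print(transfer_segments([(0.0, 12.5, "part A")], offset, 0.25))
```

Voting on the offset rather than counting raw hash matches keeps spurious collisions (such as the stray match at frame 3) from producing a false alignment; handling speed differences between digitisation campaigns would need a more elaborate matcher than this sketch.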
Cite this chapter: @inbook{six2023forfor, author = "Joren Six and Federica Bressan and Koen Renders", title = "Advances in Speech and Music Technology", chapter = "Duplicate detection for digital audio archive management: two case studies", publisher = "Springer", year = "2023" }