The Papers
Click here to view papers associated with this dataset.
A "cover song" is a different version of the same song, usually performed by a different artist, and often with different instruments, recording settings, mixing/balance, tempo, and key. Although humans can readily identify cover songs, automatically identifying them with a machine remains a challenging problem. One might wonder why it isn't possible to use an app like Shazam to automatically identify, say, a live recording of a song. As it turns out, the algorithm that powers Shazam looks for exact clips of recordings using a technique known as audio fingerprinting. It is extremely good at its job, especially given a large database, but it is unable to detect re-renditions, even by the same artist. To help move research in automatic cover song identification forward, we present a medium-sized cover songs dataset consisting of a collection of features from 395 groups of cover songs, which have been checked by hand. We also have a live demo of our recent technique for identifying and aligning cover songs beat-by-beat, which currently achieves state-of-the-art results on automatic cover song identification. Finally, we have implemented an algorithm to synthesize new cover songs in a fully automated fashion from raw audio, and we present two tools (LoopDitty and GraphDitty) which we created to help design our algorithms.
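To illustrate why a change of key alone can defeat a direct comparison, here is a minimal sketch in Python. It is a toy under stated assumptions, not how Shazam works and not the technique described on this page: it assumes each recording is summarized by a single 12-bin pitch-class (chroma) vector, and the function names `transpose`, `cosine`, and `best_transposition` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def transpose(chroma: Sequence[float], semitones: int) -> List[float]:
    """Circularly shift a chroma vector upward by `semitones` bins."""
    n = len(chroma)
    k = semitones % n
    return [chroma[(i - k) % n] for i in range(n)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_transposition(a: Sequence[float], b: Sequence[float]) -> Tuple[int, float]:
    """Try every circular shift of `b` and return (shift, similarity) for the best one."""
    scores = [(cosine(a, transpose(b, s)), s) for s in range(len(b))]
    similarity, shift = max(scores)
    return shift, similarity


# Toy data (assumed): a C major triad and the same chord played two semitones higher.
original = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # pitch classes C, E, G
cover = transpose(original, 2)                   # pitch classes D, F#, A

direct_similarity = cosine(original, cover)      # compares the vectors as-is
shift, shifted_similarity = best_transposition(original, cover)
```

In this toy case the direct comparison finds no overlap at all, while searching over the 12 possible shifts recovers a perfect match. Real cover songs also differ in tempo, instrumentation, and structure, which a shift search alone does not address.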
Click here to download the dataset.
Click here to view examples of cover songs which have been aligned by the algorithms in the paper.
Click here to view examples from our cover song synthesis algorithm.
LoopDitty | An app that was the seed for this research, showing how music can be thought of as a shape. |
GraphDitty | A geometric music structure visualization app. |
The Covers 80 Dataset | A dataset of low-quality audio for 160 songs, split into two disjoint subsets A and B. Each subset contains exactly one version of each song, for a total of 80 pairs. Mostly '80s and early '90s pop music. |
Kara1k Karaoke Songs Dataset | A dataset with features for 2000 songs: 1000 originals and 1000 corresponding karaoke versions. Also a great dataset for singing voice analysis. |
http://www.secondhandsongs.com | A community project of annotations of cover songs which formed the basis of this dataset. |
The Second Hand Songs Dataset | Another dataset based on annotations from secondhandsongs.com. It is a subset of the Million Song Dataset consisting of about 20,000 tracks with Echo Nest features. |
The Youtube Covers Dataset | A collection of chroma, CRP, and CENS features for 350 songs of various genres. |
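A paired split such as the A/B subsets described for the Covers 80 dataset lends itself to a simple retrieval check: for each song in A, find its closest song in B and count how often that is the true cover. The sketch below assumes a precomputed dissimilarity matrix in which row `i` and column `i` refer to the two versions of the same song. That ordering, the function name `count_top1_matches`, and the toy numbers are assumptions for illustration, not something specified by the dataset.

```python
from typing import Sequence


def count_top1_matches(dist: Sequence[Sequence[float]]) -> int:
    """Count queries whose nearest candidate is their true cover.

    dist[i][j] is the dissimilarity between song i in subset A and
    song j in subset B; the true cover of A[i] is assumed to be B[i].
    """
    correct = 0
    for i, row in enumerate(dist):
        # Index of the candidate in B with the smallest dissimilarity.
        nearest = min(range(len(row)), key=lambda j: row[j])
        if nearest == i:
            correct += 1
    return correct


# Toy 3x3 example (made-up numbers): the third query is closest to the wrong song.
toy_dist = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.6],
    [0.3, 0.5, 0.4],
]
toy_score = count_top1_matches(toy_dist)
```

With 80 pairs, the same function applied to an 80 by 80 matrix would return a count out of 80. How the dissimilarities are computed is left open here.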