Internet services expose vast amounts of multimedia data for exchange and browsing, which creates the need for effective and unambiguous ways of describing data on the basis of small fractions of information. Digital music databases cannot be searched easily. The diversity of musical trends and genres and of uncommon instruments, as well as the variety of performers and their compositions, calls for numerous music recognition systems. The key issue in automatic querying, however, is parametrization. Although parametrization has developed extensively, some important areas of Music Information Retrieval, such as music genre classification, still require research in this respect. In the first task, 'Genres', we ask you to devise an algorithm for recognizing the music genre of given fragments of music tracks.
A database of recordings by 60 music performers has been prepared for the competition. The material is divided into six categories: classical music, jazz, blues, pop, rock and heavy metal. For each performer, 15-20 music pieces have been collected. Each music piece is partitioned into 20 segments, which are then parameterized. The descriptors used in parametrization, including those formulated within the MPEG-7 standard, are only listed here, since they have already been thoroughly reviewed and explained in many studies.
The feature vector consists of 191 parameters. The first 127 parameters are based on the MPEG-7 standard; the remaining ones are cepstral coefficient descriptors and dedicated time-related parameters:
a) parameter 1: Temporal Centroid,
b) parameter 2: Spectral Centroid average value,
c) parameter 3: Spectral Centroid variance,
d) parameters 4-37: Audio Spectrum Envelope (ASE) average values in 34 frequency bands,
e) parameter 38: ASE average value (averaged over all frequency bands),
f) parameters 39-72: ASE variance values in 34 frequency bands,
g) parameter 73: ASE variance values averaged over all frequency bands,
h) parameters 74, 75: Audio Spectrum Centroid – average and variance values,
i) parameters 76, 77: Audio Spectrum Spread – average and variance values,
j) parameters 78-101: Spectral Flatness Measure (SFM) average values for 24 frequency bands,
k) parameter 102: SFM average value (averaged over all frequency bands),
l) parameters 103-126: SFM variance values for 24 frequency bands,
m) parameter 127: SFM variance values averaged over all frequency bands,
n) parameters 128-147: average values of the first 20 mel-frequency cepstral coefficients,
o) parameters 148-167: the same as 128-147,
p) parameters 168-191: dedicated time-domain parameters based on the analysis of the distribution of the envelope in relation to the RMS value.
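When working with the data it can help to address these groups by name rather than by raw column number. The following is a minimal sketch of such an index map, transcribed from the list above; the dictionary keys, `FEATURE_GROUPS`, `group_slice` and `check_layout` are hypothetical names chosen for illustration, and group o) is simply labelled by its index range because the description only says it is "the same as 128-147".

```python
# Hypothetical index map of the 191-element feature vector, transcribed from
# the parameter list. Ranges are 1-based and inclusive, as in the description.
FEATURE_GROUPS = {
    "temporal_centroid": (1, 1),
    "spectral_centroid_mean": (2, 2),
    "spectral_centroid_var": (3, 3),
    "ase_mean_bands": (4, 37),          # 34 frequency bands
    "ase_mean_all": (38, 38),
    "ase_var_bands": (39, 72),          # 34 frequency bands
    "ase_var_all": (73, 73),
    "audio_spectrum_centroid": (74, 75),  # average, variance
    "audio_spectrum_spread": (76, 77),    # average, variance
    "sfm_mean_bands": (78, 101),        # 24 frequency bands
    "sfm_mean_all": (102, 102),
    "sfm_var_bands": (103, 126),        # 24 frequency bands
    "sfm_var_all": (127, 127),
    "mfcc_mean_1_20": (128, 147),
    "params_148_167": (148, 167),       # described only as "the same as 128-147"
    "time_domain": (168, 191),
}


def group_slice(name):
    """Return a 0-based Python slice selecting a named parameter group."""
    first, last = FEATURE_GROUPS[name]
    return slice(first - 1, last)


def check_layout(total=191):
    """Check that the groups tile 1..total contiguously, without gaps or overlaps."""
    expected = 1
    for first, last in sorted(FEATURE_GROUPS.values()):
        if first != expected or last < first:
            return False
        expected = last + 1
    return expected == total + 1


if __name__ == "__main__":
    print(check_layout())  # True
    mpeg7 = sum(b - a + 1 for a, b in FEATURE_GROUPS.values() if b <= 127)
    print(mpeg7)  # 127
    row = list(range(1, 192))  # stand-in vector whose values equal their 1-based index
    print(row[group_slice("mfcc_mean_1_20")][:3])  # [128, 129, 130]
```

The layout check is a cheap way to confirm that the group boundaries add up: 127 MPEG-7 parameters plus 20 + 20 + 24 further parameters give 191.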
The training and test datasets contain tracks by different performers. They can be found in the Repository.
Note: you must be registered for this challenge in order to access the files.
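Because no performer appears in both the training and the test data, a local validation split that mixes a performer's tracks across both sides is likely to overestimate accuracy. The sketch below holds out whole performers instead. It assumes a per-row performer identifier is available, which the task description does not state; `performer_disjoint_split` and its parameters are hypothetical.

```python
import random


def performer_disjoint_split(performer_ids, val_fraction=0.2, seed=0):
    """Split row indices so that no performer appears in both parts.

    performer_ids: sequence where performer_ids[i] is the (assumed) performer
    identifier of training row i. Returns (train_rows, val_rows).
    """
    performers = sorted(set(performer_ids))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(performers)
    n_val = max(1, round(len(performers) * val_fraction))
    val_set = set(performers[:n_val])
    train_rows = [i for i, p in enumerate(performer_ids) if p not in val_set]
    val_rows = [i for i, p in enumerate(performer_ids) if p in val_set]
    return train_rows, val_rows


if __name__ == "__main__":
    ids = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"]
    train, val = performer_disjoint_split(ids, val_fraction=0.2)
    # Each performer lands entirely on one side, so the overlap is empty.
    print({ids[i] for i in train} & {ids[i] for i in val})  # set()
```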
Solutions and evaluation
A solution should be a text file containing one label (Classical, Jazz, Rock, Blues, Metal or Pop) per line. The labels are case-insensitive.
The baseline solution can be found in the Repository. It was generated by the 1-NN (nearest neighbour) algorithm without any data preprocessing.
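A minimal sketch of producing and re-reading a solution in this format is shown below. The function names and the `solution.txt` file name in the comment are hypothetical; the label set and the case-insensitivity come from the rules above.

```python
VALID_LABELS = {"classical", "jazz", "rock", "blues", "metal", "pop"}


def format_solution(labels):
    """Return solution-file text with one genre label per line."""
    lines = []
    for label in labels:
        # Validate case-insensitively, since the labels are not case sensitive.
        if label.strip().lower() not in VALID_LABELS:
            raise ValueError(f"unknown genre label: {label!r}")
        lines.append(label.strip())
    return "\n".join(lines) + "\n"


def write_solution(path, labels):
    """Write the labels to a text file, e.g. write_solution("solution.txt", preds)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_solution(labels))


def parse_solution(text):
    """Read labels back, normalised to lower case."""
    return [line.strip().lower() for line in text.splitlines()]


if __name__ == "__main__":
    text = format_solution(["Jazz", "METAL", "pop"])
    print(parse_solution(text))  # ['jazz', 'metal', 'pop']
```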
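The 1-NN rule assigns each test vector the label of its single closest training vector. The sketch below illustrates it on raw, unscaled features, in the spirit of "no preprocessing"; the description does not name the distance measure, so Euclidean distance is an assumption here, and `one_nn_predict` is a hypothetical name.

```python
import math


def one_nn_predict(train_X, train_y, test_X):
    """Label each test vector with the label of its nearest training vector.

    Uses Euclidean distance (an assumption) on the raw feature values.
    """
    predictions = []
    for x in test_X:
        # Index of the training vector closest to x.
        best = min(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
        predictions.append(train_y[best])
    return predictions


if __name__ == "__main__":
    train_X = [[0.0, 0.0], [10.0, 10.0]]
    train_y = ["Jazz", "Metal"]
    test_X = [[1.0, 2.0], [9.0, 8.0]]
    print(one_nn_predict(train_X, train_y, test_X))  # ['Jazz', 'Metal']
```

Without scaling, parameters with large numeric ranges dominate the distance, so standardizing the 191 features is a common first step when trying to improve on such a baseline.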
The metric used for evaluating the solutions is standard accuracy, i.e. the ratio of the correctly classified samples to the total number of samples.
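As a worked illustration of the metric, the sketch below counts matching labels and divides by the number of samples; labels are compared case-insensitively, as the solution format allows. The function name `accuracy` is chosen for illustration.

```python
def accuracy(predicted, expected):
    """Ratio of correctly classified samples to the total number of samples."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(
        p.strip().lower() == e.strip().lower() for p, e in zip(predicted, expected)
    )
    return correct / len(expected)


if __name__ == "__main__":
    # Three of the four labels match (case ignored), so accuracy is 3/4.
    print(accuracy(["Jazz", "rock", "Pop", "Metal"], ["jazz", "Rock", "Blues", "Metal"]))  # 0.75
```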
Examples of references related to parametrization:
The International Society for Music Information Retrieval / International Conference on Music Information Retrieval (ISMIR) website
KOSTEK B., CZYZEWSKI A., Representing Musical Instrument Sounds for their Automatic Classification, J. Audio Eng. Soc., vol. 49, 768-785, 2001.
KOSTEK B., Soft Computing in Acoustics, Applications of Neural Networks, Fuzzy Logic and Rough Sets to Musical Acoustics, Studies in Fuzziness and Soft Computing, Physica Verlag, Heidelberg, New York, 1999.
KOSTEK B., Perception-Based Data Processing in Acoustics. Applications to Music Information Retrieval and Psychophysiology of Hearing, Springer Verlag, Series on Cognitive Technologies, Berlin, Heidelberg, New York 2005.
LINDSAY A., HERRE J., MPEG-7 and MPEG-7 Audio - An Overview, J. Audio Eng. Soc., vol. 49, No. 7/8, 589-594, 2001.
KIM H.-G., MOREAU N., SIKORA T., MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, Wiley, 2005.
ZWAN P., KOSTEK B., System for Automatic Singing Voice Recognition, J. Audio Eng. Soc., vol. 56, No. 9, 710-723, 2008.
Task 1, Music Genres, was prepared by prof. Bozena Kostek, dr Pawel Zwan, Andrzej Sitek and prof. Andrzej Czyzewski from the Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology.
This material is partly based upon research carried out within the project entitled Establishment of the universal, open, hosting and communication, repository platform for network resources of knowledge to be used by science, education and open knowledge society, financed by the Polish National Centre for Research and Development (NCBiR, Poland) under grant No. SP/I/1/77065/10 and realized within the strategic scientific research and experimental development program called Interdisciplinary System for Interactive Scientific and Scientific-Technical Information.