Feature calculation specificities

Post by boblsturm » Thu Jan 27, 2011 5:09 pm


I am wondering about the decision to compute features from 40 ms audio segments for the instrument recognition task (what is the sample rate, by the way?). Also, what was the overlap between windows? Is this purely an exercise in classifier building rather than musical instrument recognition? I am asking because classifying musical instruments using bags of features extracted from arbitrarily aligned frames of a single scale will not perform as well as approaches that use features computed over several scales, with late integration.

Similarly, for the music genre features, what window size and overlap were used to compute the features? The description says, "All music pieces are partitioned into 20 segments and parameterized." What is the length of each segment? It would be useful to know this in order to build higher-level models incorporating tempo, rhythm, etc.

Thank you!
Re: Feature calculation specificities

Post by zwan » Fri Mar 11, 2011 7:25 pm

For the "Genres" task:
the sample rate was 44,100 Hz (44.1 kHz),
there is no overlap, and the parameterized excerpts are 5 s long (evenly distributed through each musical piece).
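
For anyone who wants to line these excerpts up against the original audio, here is a rough sketch of how 20 non-overlapping 5 s excerpts could be placed in a piece sampled at 44,100 Hz. The exact placement rule was not stated, so the "equal gaps between excerpts, first excerpt at the start, last excerpt at the end" layout below is only an assumption, and the function name segment_bounds is made up for illustration.

```python
def segment_bounds(total_samples, n_segments=20, segment_seconds=5.0,
                   sample_rate=44100):
    """Return (start, end) sample indices for evenly spread excerpts.

    Assumption (not confirmed in the thread): the samples not covered by
    any excerpt are split into equal gaps between consecutive excerpts.
    """
    seg_len = int(round(segment_seconds * sample_rate))  # 220500 samples for 5 s
    needed = n_segments * seg_len
    if total_samples < needed:
        # Non-overlapping excerpts cannot fit in a piece this short.
        raise ValueError("piece is too short for non-overlapping excerpts")
    if n_segments == 1:
        return [(0, seg_len)]

    slack = total_samples - needed  # samples that fall between excerpts
    bounds = []
    for i in range(n_segments):
        # Excerpt i starts after i excerpts plus its share of the slack.
        start = i * seg_len + (i * slack) // (n_segments - 1)
        bounds.append((start, start + seg_len))
    return bounds


if __name__ == "__main__":
    # Example: a 3-minute piece at 44.1 kHz.
    for start, end in segment_bounds(180 * 44100)[:3]:
        print(start / 44100.0, end / 44100.0)
```

Note that 20 excerpts of 5 s each need at least 100 s of audio, so under this layout a shorter piece would have to be handled some other way (the sketch simply raises an error).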
