How teams did well on the Genres contest

Questions, answers, discussions related to ISMIS 2011 Contest

Postby JRM » Tue Mar 22, 2011 8:40 pm

I know a number of teams on the leaderboard were using this contest as a way to learn more about data mining.

Rather than waiting for the best approaches to be formalized in papers and at ISMIS, it would greatly aid our learning to read short summaries of others' approaches via this forum.

Personally, I reached an accuracy of 0.74 by combining several submodels (in part by using class-probability estimates as weights). The two most effective classifiers in these models were MultilayerPerceptron and Logistic, both of which I ran in Weka. PCA also helped reduce the number of variables (from 191 to 75, while retaining 95% of the variance).
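As a rough illustration of weighting by class-probability estimates, here is a minimal sketch in plain Python. It is not the Weka setup described above: the `soft_vote` helper, the genre labels, and the probabilities are all made up for the example, and it simply assumes each submodel outputs a probability per class for a segment.

```python
from collections import defaultdict

def soft_vote(model_probs):
    """Combine per-model class-probability estimates for one instance.

    model_probs: list of dicts (one per submodel) mapping label -> probability.
    Returns the label whose summed probability across submodels is largest.
    """
    totals = defaultdict(float)
    for probs in model_probs:
        for label, p in probs.items():
            totals[label] += p
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(totals), key=lambda label: totals[label])

# Hypothetical outputs of three submodels for a single segment.
example = [
    {"rock": 0.60, "jazz": 0.40},
    {"rock": 0.30, "jazz": 0.70},
    {"rock": 0.55, "jazz": 0.45},
]
prediction = soft_vote(example)
```

Here two of the three submodels prefer "rock", but the summed probabilities favour "jazz" (1.55 vs 1.45), so a confident submodel can outweigh two lukewarm ones.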

Could others share (at least brief) insights from their learnings?

Thanks very much,
John McDowell
The Wharton School at the University of Pennsylvania
Posts: 4
Joined: Mon Mar 21, 2011 9:13 pm

Re: How teams did well on the Genres contest

Postby cdasilva » Tue Mar 22, 2011 8:54 pm

Hello! I think that would be a great idea. I'm also beginning to learn data mining, and I can say I learnt a lot from this competition, mostly by trial and error.

Until last week my best result came from a mixture of models. I used Weka for classification and Excel for aggregating the data. With about 18 different models, taking the most frequent class for each instance, I got about 0.75. But this week I learnt about Random Forests, and with 320 trees I got 0.7639 on my final submission (final result: 0.76723).

I tried to find which classifiers performed better in certain regions and to use the classes chosen by those, but the results were not so good.

I'm really eager to know how domcastro got that amazing result!

I hope everyone shares their experience :)

EDIT: I did try PCA, but the results were not good. This week I started using ReliefF to select variables but didn't have time to finish the experiment.
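The "most frequent class for each instance" aggregation can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Excel workflow used above; `majority_vote` and the three tiny prediction lists are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Return, for each instance, the label predicted by the most models.

    predictions_per_model: list of lists; each inner list holds one model's
    predicted labels, in the same instance order for every model.
    On a tie, the label that appears first among the votes wins.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three models on four instances.
m1 = ["rock", "jazz", "pop", "rock"]
m2 = ["rock", "pop", "pop", "jazz"]
m3 = ["jazz", "jazz", "pop", "jazz"]
ensemble = majority_vote([m1, m2, m3])
```

With around 18 models the same function applies unchanged; only the number of inner lists grows.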

Best Regards,
Posts: 4
Joined: Thu Jan 13, 2011 5:40 pm

Re: How teams did well on the Genres contest

Postby JRM » Tue Mar 22, 2011 11:13 pm

Thanks for the pointer to Random Forests; it looks like a powerful classification method to learn about. When TunedIT posts the full test set, it will be fun to experiment with this and any other methods that participants post.

Regarding PCA, I didn't see a real accuracy improvement either, but the time required to run some models (like Neural Networks with many sublayers and trials) went down significantly without losing much accuracy.
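For the 95%-of-variance cutoff mentioned earlier in the thread, the number of components to keep follows from the cumulative explained variance. A minimal sketch, assuming the PCA eigenvalues are already available (the helper name and the eigenvalues below are invented for illustration):

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of principal components whose eigenvalues
    account for at least `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)

# Hypothetical eigenvalues: the first three carry about 97% of the variance.
n_keep = components_for_variance([6.0, 3.0, 0.7, 0.2, 0.1])
```

Fewer retained components means fewer inputs to each model, which is where the training-time saving comes from.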

In addition to others' model-building efforts, I'm also curious about any attempts to separate the training data by artist for improved cross-validation. If others did this, how? (I originally believed this would be important for reducing overfitting to the preliminary test set through submission of many models. However, looking through the results now, it looks like there was very little drop-off between preliminary and final evaluation.)
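One way such a split could work, sketched in plain Python: keep all segments of one artist in the same fold, so no artist appears in both training and validation data. This assumes every training segment already has some artist ID (for instance from clustering); the `group_k_fold` helper and the IDs below are hypothetical.

```python
from collections import defaultdict

def group_k_fold(groups, k):
    """Yield (train_idx, test_idx) pairs such that no group (e.g. artist)
    has instances on both sides of a split."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest instances.
    folds = [[] for _ in range(k)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[g])
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train, test

# Hypothetical artist ID for each of eight training segments.
artists = ["a", "a", "b", "b", "b", "c", "d", "d"]
splits = list(group_k_fold(artists, 3))
```

Each segment lands in exactly one test fold, and accuracy measured this way reflects performance on artists the model has not seen.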
Posts: 4
Joined: Mon Mar 21, 2011 9:13 pm

Re: How teams did well on the Genres contest

Postby wahoo » Thu Mar 24, 2011 3:33 am

Bootstrapping (a.k.a. pseudo-labeling, self-training) methods worked the best for me. I'll submit a short writeup for the blog soon. I am looking forward to hearing about what domcastro did!
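For readers new to the term, self-training in general looks roughly like the sketch below. It is not the method referred to above, whose details are not given here: the nearest-centroid classifier, the margin threshold, and the toy points are all assumptions chosen to keep the example short.

```python
import math

def centroids(X, y):
    """Mean point per class label, returned as {label: tuple}."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(v / counts[lab] for v in s) for lab, s in sums.items()}

def predict_with_margin(cents, x):
    """Nearest-centroid label, plus how much closer it is than the runner-up."""
    dists = sorted((math.dist(c, tuple(x)), lab) for lab, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Repeatedly add confidently predicted unlabeled points as pseudo-labels."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X, y)
        confident, rest = [], []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing left that the current model is sure about
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = [x for x, _ in rest]
    return centroids(X, y)

# Toy data: two labeled points and four unlabeled ones.
model = self_train([(0.0, 0.0), (10.0, 10.0)], ["a", "b"],
                   [(1.0, 1.0), (9.0, 9.0), (4.0, 4.0), (2.0, 2.0)])
label, _ = predict_with_margin(model, (3.0, 3.0))
```

In a contest setting the unlabeled pool would be the test set itself, which lets the model adapt to the test distribution.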

Posts: 3
Joined: Thu Jan 07, 2010 1:15 am

Re: How teams did well on the Genres contest

Postby domcastro » Thu Mar 24, 2011 1:36 pm

One of the most important things I did was to forget about performers, segments, 17-20 per song, etc. I had started to think that one song had many labels. I therefore chose to forget about it all and treat each segment as an independent instance. With this in mind, I could try things that didn't fit the data "statistics". We are writing up our report and will come up with a name at the end, but it was an iterative semi-supervised mixture model of classifiers and clustering.
Posts: 10
Joined: Fri Nov 26, 2010 5:00 pm
