VideoLectures.Net Recommender System Challenge: Task 1, Cold Start

Status Closed
Type Scientific
Start 2011-04-18 10:00:00 CET
End 2011-07-08 11:59:59 CET
Prize 5,500€

Registration is required.

The first task of the challenge is to solve the “cold start problem” commonly associated with pure collaborative filtering (CF) recommenders.
Ideally, the success of a cold-start recommender would be measured through a user satisfaction survey or similar analysis. For the challenge, however, a quantitative measure was needed, so the cold-start situation has been simulated: the “new” video lectures are those that entered the site more recently but for which some viewing information is already available, which makes it possible to score solutions. Competitors are required to predict which of these newly acquired lectures should be recommended after viewing some of the “older” lectures (see Submission and evaluation below).
The ground truth for task 1 is the ranking of the new lectures by their withheld co-viewing frequencies with each old lecture, in descending order.
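The ground-truth construction above can be sketched as follows. This is a minimal illustration, not the organizers' code: it assumes co-viewing events are available as simple (lecture, lecture) pairs, whereas the actual challenge data ships as CSV files.

```python
from collections import Counter

def rank_new_lectures(coview_pairs, old_lecture_id, new_lecture_ids, top_n=30):
    """Rank 'new' lectures by how often each was co-viewed with a given 'old' lecture.

    coview_pairs: iterable of (lecture_a, lecture_b) pairs, one per co-viewing
    event (hypothetical input format for illustration).
    """
    counts = Counter()
    new_set = set(new_lecture_ids)
    for a, b in coview_pairs:
        if a == old_lecture_id and b in new_set:
            counts[b] += 1
        elif b == old_lecture_id and a in new_set:
            counts[a] += 1
    # Descending co-viewing frequency; ties broken by lecture id for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lecture for lecture, _ in ranked[:top_n]]
```

A solution that reproduces this withheld ranking for every query lecture would score perfectly.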

Submission and evaluation

In this task, we assume that the user has seen one of the lectures listed in the task1_query.csv file; these are characterized by earlier times of entering the site (“older” lectures). The solution is, for each such lecture, a ranked list of lectures from the lectures_test.csv file (“new” lectures, introduced into the system more recently) to be recommended after viewing it.

The length of each recommended list is fixed to 30 lectures, and the submission format is:
oldL_ID1: rcm_L1(oldL_ID1), rcm_L2(oldL_ID1), …, rcm_L30(oldL_ID1)
oldL_ID2: rcm_L1(oldL_ID2), rcm_L2(oldL_ID2), …, rcm_L30(oldL_ID2)

where rcm_Lx(oldL_IDy) denotes the x-th recommended new lecture for old lecture oldL_IDy.

The overall score for a submission is the mean average R-precision (MARp). For task 1, the cut-off lengths used in the MARp calculation are z ∈ {5, 10, 15, 20, 25, 30}.
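One common reading of this metric is sketched below: R-precision at cut-off z is the fraction of the top-z predictions that also appear in the top-z of the withheld ground-truth ranking; these are averaged over the cut-offs for each query, then averaged over all queries. This interpretation is an assumption, not the organizers' reference implementation.

```python
def r_precision_at(predicted, relevant, z):
    """Fraction of the top-z predictions appearing in the top-z ground-truth list."""
    return len(set(predicted[:z]) & set(relevant[:z])) / z

def mean_avg_r_precision(predictions, ground_truth, cutoffs=(5, 10, 15, 20, 25, 30)):
    """MARp: average R-precision over the cut-offs for each query lecture,
    then the mean over all query lectures.

    predictions, ground_truth: dicts mapping query id -> ranked list of lecture ids.
    """
    per_query = []
    for qid, predicted in predictions.items():
        relevant = ground_truth[qid]
        avg_rp = sum(r_precision_at(predicted, relevant, z) for z in cutoffs) / len(cutoffs)
        per_query.append(avg_rp)
    return sum(per_query) / len(per_query)
```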

Preliminary results, computed on a randomly sampled 20% of the test set, are evaluated after each submission and published on the Leaderboard, allowing comparison with other participants. The final results are scored on the full test dataset.

An example submission file for the baseline random solution for track 1 can be found here.


Altogether, the data for task 1 and task 2 comprises 12 textual CSV (Comma-Separated Values) files, with attribute names given in the first row of each file. Two of these are query files: they have to be filled in with recommendations and submitted as solutions in order to be scored. The files are briefly described below.

NOTE: To download the data, you must be logged in and registered to the challenge.
Common data (for both tracks)
Track 1 specific data:

Whole dataset (for both tracks) download: (zip),(rar).

Copyright © 2008-2013 by TunedIT