VideoLectures.Net Recommender System Challenge / 2. Pooled sequences

In order to comply with privacy-preserving constraints, lecture viewing sequences have been transformed into what we call pooled sequences. A pooled viewing sequence is given by a set of three lectures on the left side (a triplet) and a ranked list of at most ten lectures on the right side. The set of three lectures does not imply an ordering; it is merely a set that comes “upstream” of the lectures given on the right side of the pooled viewing sequence. The ranked list on the right side of a pooled viewing sequence is constructed from all the clickstreams that contain the particular triplet on the left side. The transformation process and the format of the data in the triplets_train_left.csv and triplets_train_right.csv files are described here.

In this task, contestants are asked to recommend a ranked list of ten lectures that should be recommended after viewing a set of three lectures. In contrast to task 1, this situation is close to a typical recommendation scenario (submission and evaluation for task 2). The solution for task 2 is based on ranking lectures in descending order of their frequencies in withheld pooled lecture viewing sequences.

Submission and evaluation

In this task, we assume that the user has seen one of the triplets from the task2_query.csv file. Each triplet in the task2_query.csv file is a set of three lectures {x1, x2, x3} from the lectures_train.csv file. As a solution for this task, we expect a recommendation list, drawn from all possible lectures, for each triplet given in the task2_query.csv file. Each recommended list is a ranked list of lectures, which for this task can come from both files (lectures_train.csv and lectures_test.csv).
The length of each recommended list is fixed at 10, and the submission format is:

triplet_ID1: rcm_L1(triplet_ID1), rcm_L2(triplet_ID1), … , rcm_L10(triplet_ID1)
triplet_ID2: rcm_L1(triplet_ID2), rcm_L2(triplet_ID2), … , rcm_L10(triplet_ID2)
…

where rcm_Lx(triplet_IDy) represents the x-th recommendation for triplet y. Cutoff lengths for the calculation of MARp in task 2 are z ∈ {5, 10}. The preliminary results, comprising a randomly sampled 20% of the final results, are evaluated after submission and published on the Leaderboard, allowing comparison with other participants. The final results are scored on the full test dataset. An example submission file for the baseline random solution for track 2 can be found here.

Creating pooled viewing sequences

Consider a sequence of viewed lectures:

id1 > id7 > id2 > id1 > id4 > id5 > id6 > id3

We first filter out duplicates (here id1):

id1 > id7 > id2 > id4 > id5 > id6 > id3

Then we determine all possible unordered triplets in the sequence. For each triplet, we cut the sequence after the rightmost lecture from the triplet. In the above example, if {id1, id4, id5} is the triplet, the sequence is cut right after id5. Finally, we increase the triplet-specific counts for all the lectures after the cut. In the above example, given the triplet {id1, id4, id5}, the triplet-specific counts for id6 and id3 are increased:

{id1, id4, id5} > id6 : 1, id3 : 1

Suppose there is another clickstream sequence that, among others, contains the unordered triplet {id1, id4, id5}, and that id6, id3, and id7 are the lectures appearing after the cut. Then the counts for {id1, id4, id5} are increased as follows:

{id1, id4, id5} > id6 : 2, id3 : 2, id7 : 1

Dataset

Altogether, the data for task 1 and task 2 comprises 12 text files in CSV (comma-separated values) format, with the attribute names given in the first row of each file. Two of these files are query files; they have to be filled in with recommendations and submitted as solutions in order to be scored.
The files are briefly described in the following text.

Download

NOTE: To download the data, you must be logged in and registered for the challenge.

Common data (for both tracks)
Track 2 specific data:
Whole dataset (for both tracks) download: (zip), (rar).
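
The pooling procedure described under Creating pooled viewing sequences can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the organizers' code: the function names (dedupe_keep_first, add_clickstream, top_n) are hypothetical, duplicates are assumed to be removed by keeping the first occurrence of each lecture (as in the id1 example), the second clickstream is made up to match the description of "another clickstream sequence", and the tie-breaking rule used for ranking is an assumption. The sketch works on in-memory lists and does not read the challenge CSV files.

```python
from collections import Counter, defaultdict
from itertools import combinations


def dedupe_keep_first(sequence):
    """Drop repeated lectures, keeping the first occurrence of each (assumed)."""
    return list(dict.fromkeys(sequence))


def add_clickstream(pooled, sequence):
    """Update the triplet-specific counts in `pooled` from one viewing sequence."""
    seq = dedupe_keep_first(sequence)
    # combinations() yields index triples i < j < k, so k is the position of
    # the rightmost lecture of the triplet; the cut falls right after it.
    for i, j, k in combinations(range(len(seq)), 3):
        tail = seq[k + 1:]
        if tail:
            triplet = frozenset((seq[i], seq[j], seq[k]))
            pooled[triplet].update(tail)


def top_n(pooled, triplet, n=10):
    """Rank lectures by descending count; ties broken by lecture id (assumed)."""
    counts = pooled.get(frozenset(triplet), Counter())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [lecture for lecture, _ in ranked[:n]]


pooled = defaultdict(Counter)
# The sequence from the text, followed by a made-up second clickstream that
# contains {id1, id4, id5} with id6, id3 and id7 after the cut.
add_clickstream(pooled, ["id1", "id7", "id2", "id1", "id4", "id5", "id6", "id3"])
add_clickstream(pooled, ["id4", "id1", "id5", "id6", "id3", "id7"])

example = pooled[frozenset({"id1", "id4", "id5"})]
print(dict(example))
print(top_n(pooled, {"id1", "id4", "id5"}))
```

For the triplet {id1, id4, id5}, the accumulated counts match the worked example in the text (id6 : 2, id3 : 2, id7 : 1). Using a frozenset as the key reflects that the triplet is unordered.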