Questions, answers, discussions related to VideoLectures.Net Recommender System Challenge

Postby Maxus » Fri Jun 03, 2011 9:23 am

Hello, dear organizers!

In readme.txt I read the following about the views column:
views --- the total (aggregated) number of views of this lecture since
the day it was published online to the day the snapshot was

Can you tell us exactly when this snapshot was taken?
At least, was the snapshot taken at the same time for all lectures?

Thank you.
Re: views column

Postby ninoaf2 » Fri Jun 03, 2011 10:05 am

Dear Contestant,

We were hoping somebody would ask this kind of question :)
The snapshot time, i.e. the moment when the lecture (co)viewing frequencies were taken (for all lectures!),
was July 2010.
The date 1.7.2009 was chosen in order to have a reasonable training/test split for task 1. Note, however,
that the training/test split is both "vertical" (all test lectures are published after 1.7.2009!) and
"horizontal" (approximately half of the lectures published after 1.7.2009 are in the training set!).
By this split we wanted to provide samples in the training set that can be used to learn
the temporal impact on lecture (co)viewing frequencies.
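A minimal sketch for checking these two properties on your own copy of the data. It assumes each lecture has been reduced to a hypothetical (lecture_id, published_on, in_training_set) tuple, because the actual column names in lectures_train.csv are not given here. It also reads "after 1.7.2009" as strictly later than that date.

```python
from datetime import date

# Split date mentioned above (1.7.2009, i.e. 1 July 2009).
CUTOFF = date(2009, 7, 1)

def describe_split(lectures, cutoff=CUTOFF):
    """Check the "vertical" and "horizontal" properties of a train/test split.

    lectures: iterable of hypothetical (lecture_id, published_on, in_training_set)
    tuples. Returns (all_test_after_cutoff, train_share_after_cutoff).
    """
    lectures = list(lectures)
    # Vertical: every test lecture must be published after the cutoff.
    test_dates = [pub for _, pub, in_train in lectures if not in_train]
    all_test_after = all(pub > cutoff for pub in test_dates)
    # Horizontal: fraction of post-cutoff lectures that ended up in training.
    after = [in_train for _, pub, in_train in lectures if pub > cutoff]
    share = sum(after) / len(after) if after else 0.0
    return all_test_after, share
```

On toy data where two of four post-cutoff lectures are in the training set and both test lectures are post-cutoff, this returns `(True, 0.5)`.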

Best regards,
Re: views column

Postby haibin » Mon Jun 13, 2011 9:26 pm

Regarding the views column, I found something inconsistent.
In lectures_train.csv, lecture 13245 has an aggregated view count of 2, while in the pairs data it has the following entries:

This means the co-viewing frequency is apparently larger than the aggregated total number of views. Is this because the two snapshots were not taken at the same time?
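One way to look for other cases like this is to flag every pair whose co-view count exceeds the total views of either lecture, since each co-view also counts as a view of both lectures. The sketch below assumes the data has already been loaded into a dict of view counts and a list of (lecture_a, lecture_b, count) tuples. Those structures are hypothetical and are not the challenge's file format. Pairs that mention a lecture with no known view count are skipped.

```python
def find_inconsistent_pairs(views, pairs):
    """Return (a, b, count, limit) for pairs whose co-view count is too large.

    views: dict mapping lecture_id -> aggregated view count
    pairs: iterable of (lecture_a, lecture_b, coview_count) tuples
    """
    bad = []
    for a, b, count in pairs:
        # Skip pairs that reference lectures with no known view total.
        if a not in views or b not in views:
            continue
        # A pair cannot be co-viewed more often than either lecture was viewed.
        limit = min(views[a], views[b])
        if count > limit:
            bad.append((a, b, count, limit))
    return bad
```

With `views = {1: 2, 2: 50, 3: 10}` and `pairs = [(1, 2, 3), (2, 3, 4)]`, only `(1, 2, 3, 2)` is reported.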

Re: views column

Postby ninoaf2 » Tue Jun 14, 2011 4:27 pm

Dear contestant,

You are correct: in ideal circumstances, that "3" should not be there, given that the aggregated number of views is only 2. We acquired the data from the database "table by table": the lecture data first and the click-streams last. In the meantime, the site was live. Although only a few minutes passed between the acquisitions, we believe this resulted in one or more anomalies of the kind you report. Such anomalies should be quite harmless with respect to solving the challenge task, but contestants should be aware of them and either ignore them or make simple corrections if consistency is required.
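If consistency is required, here are two possible simple corrections. These are our own choices, not ones prescribed by the organizers. The first caps each co-view count at the smaller of the two lectures' view totals. The second raises each lecture's view total to at least its largest co-view count. The second fits the acquisition order described above, in which lecture totals were read before the click-streams and may therefore lag slightly behind. Both assume the same hypothetical in-memory structures: a dict of view counts and a list of (lecture_a, lecture_b, count) tuples.

```python
def clip_pair_counts(views, pairs):
    """Cap each co-view count at min(views[a], views[b]).

    Pairs that mention a lecture missing from `views` are left unchanged.
    """
    out = []
    for a, b, count in pairs:
        if a in views and b in views:
            count = min(count, views[a], views[b])
        out.append((a, b, count))
    return out

def raise_view_counts(views, pairs):
    """Return a copy of `views` where each total is >= its largest co-view count."""
    out = dict(views)  # do not mutate the caller's dict
    for a, b, count in pairs:
        for lec in (a, b):
            if lec in out and out[lec] < count:
                out[lec] = count
    return out
```

For `views = {1: 2, 2: 50}` and `pairs = [(1, 2, 3)]`, clipping yields `[(1, 2, 2)]`, while raising the totals yields `{1: 3, 2: 50}`.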

Best regards,
