JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers

Status Closed
Type Scientific
Start 2012-01-02 00:00:00 CET
End 2012-03-30 23:59:59 CET
Prize 1,500 USD

Registration is required.



JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers is a special event of the Joint Rough Sets Symposium (JRS 2012), which will take place in Chengdu, China, on August 17-20, 2012. The task is to predict the topical classification of scientific publications in the field of biomedicine. Cash prizes worth a total of 1,500 USD will be awarded to the most successful teams. The contest is funded by the organizers of the JRS 2012 conference, Southwest Jiaotong University, with support from the University of Warsaw, the SYNAT project and TunedIT.

Introduction: The development of freely available biomedical databases allows users to search for documents containing highly specialized biomedical knowledge. The rapidly increasing size of scientific article metadata and text repositories, such as MEDLINE [1] or PubMed Central (PMC) [2], emphasizes the growing need for accurate and scalable methods for automatic tagging and classification of textual data. For example, medical doctors often search biomedical documents for information regarding diagnostics, drug dosage and effects, or possible complications resulting from specific treatments. In their queries, they use highly specialized terminology that can be properly interpreted only with the use of a domain ontology, such as Medical Subject Headings (MeSH) [3]. To facilitate the search process, documents in a database should be indexed with concepts from the ontology. Additionally, the search results could be grouped into clusters of documents that correspond to meaningful topics matching different information needs. Such clusters need not be disjoint, since one document may contain information related to several topics.

In this data mining competition, we would like to address both of the above-mentioned problems: we are interested in identifying efficient algorithms for topical classification of biomedical research papers based on information about concepts from the MeSH ontology that were automatically assigned by our tagging algorithm. In our opinion, this challenge may appeal to all members of the Rough Set Community, as well as to other data mining practitioners, due to its strong relations to well-established subjects such as generalized decision rule induction [4], feature extraction [5], soft and rough computing [6], semantic text mining [7], and scalable classification methods [8].

To ensure the scientific value of this challenge, each participating team will be required to prepare a short report describing its approach. These reports may be used for further validation of the results. Apart from the prizes for the top three teams, authors of selected solutions will be invited to prepare a paper for presentation at the JRS 2012 special session devoted to the competition. Chosen papers will be published in the conference proceedings.
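The multi-label nature of the task described above (one document may belong to several topics at once) can be illustrated with a small sketch. It is not the organizers' method or the competition's data format: the dictionary representation of documents, the concept weights, the topic names "oncology" and "pharmacology", and the 0.5 threshold are all assumptions made for illustration. The sketch builds one centroid per topic from documents described by weighted MeSH concepts and assigns every topic whose centroid is sufficiently similar to a new document.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(docs, labels):
    """Average the concept vectors of all documents carrying each topic."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc, topics in zip(docs, labels):
        for topic in topics:
            counts[topic] += 1
            for concept, weight in doc.items():
                sums[topic][concept] += weight
    return {
        topic: {c: s / counts[topic] for c, s in vec.items()}
        for topic, vec in sums.items()
    }


def predict(doc, centroids, threshold=0.5):
    """Return every topic whose centroid is similar enough to the document.

    Topics are scored independently, so the predicted sets may overlap.
    """
    return {t for t, c in centroids.items() if cosine(doc, c) >= threshold}


# Hypothetical toy data: concept weights and topic names are made up.
docs = [
    {"Neoplasms": 0.9, "Drug Therapy": 0.4},
    {"Neoplasms": 0.8, "Radiotherapy": 0.5},
    {"Drug Therapy": 0.9, "Pharmacokinetics": 0.6},
]
labels = [{"oncology", "pharmacology"}, {"oncology"}, {"pharmacology"}]

centroids = train_centroids(docs, labels)
both = predict({"Neoplasms": 0.7, "Drug Therapy": 0.7}, centroids)
single = predict({"Neoplasms": 0.9, "Radiotherapy": 0.5}, centroids)
print(sorted(both), sorted(single))
```

Scoring each topic independently is only one simple way to allow overlapping topic assignments; the fixed threshold would normally be tuned on held-out data.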

Contest Participation Rules:

  • The competition is open to all interested researchers, specialists and students. Only members of the Contest Organizing Committee may not participate.
  • Participants may submit solutions as teams made up of one or more persons. Each team must designate a leader responsible for communication with the Organizers. One person may belong to at most two teams.
  • The total number of submissions for any single team is limited to 200 solutions.
  • Each team is obliged to provide a short report describing its final solution. Reports must contain the name of the team, the names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Reports should not exceed 1000 words and should be submitted in PDF format by April 2, 2012. Only submissions made by teams that provided a report will qualify for the final evaluation.

JRS 2012 conference special session: There will be a special session at the JRS 2012 conference devoted to the competition. We will invite the authors of selected reports to extend them for publication in the proceedings (after review by Organizing Committee members) and for presentation at the conference. The invited teams will be chosen based on their rank and the innovativeness of their approach.

Awards: The top-ranked solutions (based on the final evaluation scores) will be awarded the following prizes:

  • First Prize: 1,000 USD + free JRS 2012 conference registration,
  • Second Prize: 500 USD + free JRS 2012 conference registration,
  • Third Prize: free JRS 2012 conference registration.
Additionally, at the conference, authors of all papers accepted for presentation at the special session will receive a diploma and a competition T-shirt.


Schedule:

  • Jan. 2, 2012: start of the challenge, data sets become available,
  • Mar. 30, 2012: deadline for submitting the predictions,
  • Apr. 2, 2012: deadline for sending the reports, end of the challenge,
  • Apr. 6, 2012: on-line publication of the final results, invitations sent for submitting short papers for the special session,
  • May 10, 2012: deadline for submitting camera-ready papers selected for presentation at the JRS special session.

Contest Organizing Committee:

  • Andrzej Janusz (Chairman), University of Warsaw
  • Hung Son Nguyen, University of Warsaw
  • Dominik Ślęzak, University of Warsaw & Infobright Inc.
  • Sebastian Stawicki, University of Warsaw
  • Adam Krasuski, Main School of Fire Service & University of Warsaw


References:

[1] National Library of Medicine: PubMed: The Bibliographic Database. In McEntyre J., Ostell J. (Eds.): The NCBI Handbook. Available online.

[2] National Library of Medicine: PubMed Central (PMC): An Archive for Literature from Life Sciences Journals. In McEntyre J., Ostell J. (Eds.): The NCBI Handbook. Available online.

[3] National Library of Medicine: Introduction to MeSH - 2012. Available online (2012).

[4] Greco S., Pawlak Z., Słowiński R.: Generalized Decision Algorithms, Rough Inference Rules, and Flow Graphs. In Alpigini J. J., Peters J. F., Skowron A., Zhong N. (Eds.): Rough Sets and Current Trends in Computing 2002, LNCS 2475. Springer-Verlag, London, UK (2002)

[5] Guyon I. et al.: Feature Extraction: Foundations and Applications. Studies in Fuzziness and Soft Computing. Springer (August 2006)

[6] Hassanien A. E., Suraj Z., Ślęzak D., Lingras P. (Eds.): Rough Computing: Theories, Technologies and Applications. Idea Group Inc (2007)

[7] Stavrianou A., Andritsos P., Nicoloyannis N.: Overview and semantic issues of text mining. SIGMOD Rec. 36, 3, pp. 23-34 (September 2007)

[8] Nguyen H. S.: Scalable Classification Method Based on Rough Sets. In Alpigini J. J., Peters J. F., Skowron A., Zhong N. (Eds.): Rough Sets and Current Trends in Computing 2002, LNCS 2475, pp. 433-440. Springer-Verlag, London, UK (2002)
