The JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers is a special event of the Joint Rough Sets Symposium (JRS 2012, http://sist.swjtu.edu.cn/JRS2012/), which will take place in Chengdu, China, August 17-20, 2012. The task is to predict the topical classification of scientific publications in the field of biomedicine. Cash prizes worth 1,500 USD will be awarded to the most successful teams. The contest is funded by the organizers of the JRS 2012 conference, Southwest Jiaotong University, with support from the University of Warsaw, the SYNAT project and TunedIT.
Introduction: The development of freely available biomedical databases allows users to search for documents containing highly specialized biomedical knowledge. The rapidly increasing size of scientific article meta-data and text repositories, such as MEDLINE or PubMed Central (PMC), emphasizes the growing need for accurate and scalable methods for automatic tagging and classification of textual data. For example, medical doctors often search through biomedical documents for information regarding diagnostics, drug dosage and effects, or possible complications resulting from specific treatments. In their queries, they use highly sophisticated terminology that can be properly interpreted only with the use of a domain ontology, such as Medical Subject Headings (MeSH). In order to facilitate the searching process, documents in a database should be indexed with concepts from the ontology. Additionally, the search results could be grouped into clusters of documents that correspond to meaningful topics matching different information needs. Such clusters need not be disjoint, since one document may contain information related to several topics. In this data mining competition, we would like to address both of the above-mentioned problems, i.e. we are interested in identifying efficient algorithms for topical classification of biomedical research papers based on information about concepts from the MeSH ontology that were automatically assigned by our tagging algorithm. In our opinion, this challenge may be appealing to all members of the Rough Set Community, as well as to other data mining practitioners, due to its strong relations to well-founded subjects, such as generalized decision rules induction, feature extraction, soft and rough computing, semantic text mining, and scalable classification methods. In order to ensure the scientific value of this challenge, each of the participating teams will be required to prepare a short report describing their approach.
Those reports can be used for further validation of the results. Apart from prizes for the top three teams, authors of selected solutions will be invited to prepare a paper for presentation at the JRS 2012 special session devoted to the competition. Chosen papers will be published in the conference proceedings.
Contest Participation Rules:
JRS 2012 conference special session: There will be a special session at the JRS 2012 conference devoted to the competition. We will invite authors of selected reports to extend them for publication in the proceedings (after reviews by Organizing Committee members) and for presentation at the conference. The invited teams will be chosen based on their rank and the innovativeness of their approach.
Awards: Top-ranked solutions (based on the final evaluation scores) will be awarded prizes:
Contest Organizing Committee:
 National Library of Medicine: PubMed: The Bibliographic Database. In McEntyre J., Ostell J.(Eds.): The NCBI Handbook. Available online, http://www.ncbi.nlm.nih.gov/books/NBK21094/
 National Library of Medicine: PubMed Central (PMC): An Archive for Literature from Life Sciences Journals. In McEntyre J., Ostell J. (Eds.): The NCBI Handbook. Available online, http://www.ncbi.nlm.nih.gov/books/NBK21087/
 National Library of Medicine: Introduction to MeSH - 2012. Available online (2012), http://www.nlm.nih.gov/mesh/introduction.html
 Greco S., Pawlak Z., Słowiński R.: Generalized Decision Algorithms, Rough Inference Rules, and Flow Graphs. In Alpigini J. J., Peters J. F., Skowron A., Zhong N. (Eds.): Rough Sets and Current Trends in Computing 2002, LNCS 2475, Springer-Verlag, London, UK (2002)
 Guyon I. et al.: Feature Extraction: Foundations and Applications. Studies in Fuzziness and Soft Computing. Springer (August 2006)
 Hassanien A. E., Suraj Z., Ślęzak D., Lingras P. (Eds.): Rough Computing: Theories, Technologies and Applications. Idea Group Inc (2007)
 Stavrianou A., Andritsos P., Nicoloyannis N.: Overview and semantic issues of text mining. SIGMOD Rec. 36, 3, pp. 23-34 (September 2007)
 Nguyen H. S.: Scalable Classification Method Based on Rough Sets. In Alpigini J. J., Peters J. F., Skowron A., Zhong N. (Eds.): Rough Sets and Current Trends in Computing 2002, LNCS 2475, pp. 433-440. Springer-Verlag, London, UK (2002)