Question of PMID and MESHID release

Q&A related to the challenge: JRS 2012 DM Competition: Topical Classification of Biomedical Research Papers

Postby rick » Wed Jan 04, 2012 6:31 pm

Hi organizers,

I am a text mining researcher and learned about this competition today.
I have a question about it.
Will the organizers provide the PubMed IDs and MeSH IDs for the training and test data? I cannot find any information about that.
PubMed IDs and MeSH IDs are essential for a text mining team. If you could provide this information, that would be great.

Best,
Rick
rick
 
Posts: 6
Joined: Wed Jan 04, 2012 5:29 pm

Re: Question of PMID and MESHID release

Postby cwilkes » Wed Jan 04, 2012 7:17 pm

Looking at this page: http://tunedit.org/challenge/JRS12Contest?m=task

It looks like the information relating the MeSH numbers to qualifier names (the terminology from http://www.nlm.nih.gov/mesh/topsubscope.html), and the document ID numbers to abstracts, will only be available on request after the contest:

In order to ensure that participants who are not familiar with biomedicine, and with the MeSH ontology in particular, have equal chances as domain experts, the names of concepts and topical classifications are removed from data. Those names and relations between data columns, as well as a dictionary translating decision class identifiers into MeSH subheadings, can be provided on request after completion of the challenge.
cwilkes
 
Posts: 3
Joined: Fri Feb 25, 2011 8:06 am

Re: Question of PMID and MESHID release

Postby rick » Wed Jan 04, 2012 8:03 pm

Thanks for your reply, I got it!

Cheers,
Rick

Re: Question of PMID and MESHID release

Postby NosferatoCorp » Wed Jan 04, 2012 8:16 pm

Dear Rick,

During the contest we cannot disclose PMIDs, MeSH IDs, or any other information that could be used to de-anonymize the data, since it might spoil the competitive spirit. Someone could use such information to obtain the classification of the test documents directly from publicly available PubMed resources, which would be cheating. It would make the whole competition pointless.

If your team wants to use the competition data for text analysis, then please contact us after completion of the contest. Until that time, we sincerely invite your team to participate :-).

Best regards,
Andrzej Janusz
NosferatoCorp
 
Posts: 15
Joined: Tue Jul 13, 2010 5:44 pm

Re: Question of PMID and MESHID release

Postby rick » Thu Jan 05, 2012 5:19 am

Hi Andrzej,

Many thanks for the invitation. That sounds great.
Our team still wants to join this competition, and we agree with your reasoning.
Treating MeSH term annotation as a classification problem makes the task accessible to all machine learning and data mining groups. We look forward to joining the competition. Thanks again for your response.

Rick

Re: Question of PMID and MESHID release

Postby rick » Thu Jan 05, 2012 5:48 pm

Hi organizers,

I have a question about the values in training and test data.

In the data format description, you mentioned that:
"the consecutive columns contain integers ranging from 0 to 1000, expressing association strengths to the corresponding MeSH terms."

Does this mean that those values are related to MeSH terms, and that a higher value represents higher relevance?

Best,
Rick

Re: Question of PMID and MESHID release

Postby NosferatoCorp » Thu Jan 05, 2012 6:58 pm

rick wrote: Does this mean that those values are related to MeSH terms, and that a higher value represents higher relevance?


Yes, it does :-). All attributes in this data set correspond to MeSH terms. Each value measures the relevance of the term in its column to the document in its row of the data table. The values were assigned by our tagging algorithm: a higher value means higher relevance. More details can be found in the task description.
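As a minimal sketch of how these values might be used, the function below ranks the columns of one document row by association strength, relying only on the stated rule that a higher value means higher relevance. It assumes the row has already been parsed into a list of integers in the 0 to 1000 range; the name `top_terms` and the default `k` are hypothetical. Because term names are withheld during the contest, the result is column indices, not MeSH names.

```python
from typing import List, Tuple


def top_terms(row: List[int], k: int = 5) -> List[Tuple[int, int]]:
    """Return (column_index, strength) pairs for the k strongest associations.

    Hypothetical helper: assumes `row` holds one document's association
    strengths, one integer per MeSH-term column, each in [0, 1000].
    """
    for value in row:
        if not 0 <= value <= 1000:
            raise ValueError(f"strength out of range: {value}")
    # Higher value means higher relevance, so sort by descending strength;
    # ties are broken by the lower column index for a deterministic order.
    ranked = sorted(enumerate(row), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]
```

For example, `top_terms([0, 850, 120, 850, 999], k=3)` returns `[(4, 999), (1, 850), (3, 850)]`.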

Best,
Andrzej Janusz

Re: Question of PMID and MESHID release

Postby rick » Thu Jan 05, 2012 8:25 pm

Hi organizers,

Thank you for patiently answering my questions.
I have one more question, which may not be directly related to the competition:
how did you choose the 83 topics (MeSH subheadings) from the huge MeSH ontology?

Best,
Rick

Re: Question of PMID and MESHID release

Postby NosferatoCorp » Thu Jan 05, 2012 9:35 pm

Dear Rick,

The MeSH ontology comprises a total of 83 subheadings (qualifiers). The 2011 MeSH release also contains 26,142 headings (also called terms, descriptors or just concepts). In our data set, they correspond to the attributes. Actually, in the competition data there are only 25,640 columns, since information available in MeSH for the remaining 502 headings was insufficient for our tagging method. The remaining part of MeSH is called Supplementary Concept Records and it mostly contains information about specific chemical compounds, which is not in the scope of this competition.
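The column count follows from the figures above: 26,142 headings minus the 502 that could not be tagged leaves 25,640 attribute columns. The sketch below is a sanity check a participant might run on each parsed row. It assumes a comma-separated line containing only the attribute values. The actual file layout, including where document IDs or labels are stored, is defined in the task description and may differ. The constant and function names are hypothetical.

```python
from typing import List

# Figures stated by the organizers for the 2011 MeSH release.
MESH_2011_HEADINGS = 26_142
UNTAGGABLE_HEADINGS = 502
EXPECTED_COLUMNS = MESH_2011_HEADINGS - UNTAGGABLE_HEADINGS  # 25,640


def parse_row(line: str, expected: int = EXPECTED_COLUMNS) -> List[int]:
    """Parse one comma-separated data row and check its width.

    Hypothetical helper: assumes the line holds only attribute values
    (no document ID or labels), separated by commas.
    """
    values = [int(field) for field in line.strip().split(",")]
    if len(values) != expected:
        raise ValueError(f"expected {expected} columns, got {len(values)}")
    return values
```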

Best regards,
Andrzej Janusz
NosferatoCorp
 
Posts: 15
Joined: Tue Jul 13, 2010 5:44 pm

Re: Question of PMID and MESHID release

Postby rick » Fri Jan 06, 2012 3:50 pm

Dear Andrzej,

Thanks for explaining the relationship between the MeSH ontology and the competition data. I understand it clearly now.

Best,
Rick