Incorrect evaluation for 2008 Obesity Challenge?

From the way labels are handled in the [training script](https://github.com/AndriyMulyar/bert_document_classification/blob/572883204cb1aca50d346979319905f698ad7049/examples/ml4health_2019_replication/predict_n2c2_2008.py) and [prediction script](https://github.com/AndriyMulyar/bert_document_classification/blob/572883204cb1aca50d346979319905f698ad7049/examples/ml4health_2019_replication/predict_n2c2_2008.py), it appears that for each label (i.e. "Obesity" and the co-morbidities) the classes are simply converted to binary (`if intuitive[name] is not None and intuitive[name] == 'Y': label[idx] = 1`). More importantly, the evaluation seems to have been conducted in a (multi-label) binary classification setting.

Is this correct, or have I missed something important here? If it is correct, why would the results be comparable to those of the original challenge?
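
To make the concern concrete, here is a minimal sketch of how collapsing labels to binary can change a macro-averaged F1 score. Everything in it is an assumption for illustration: the label values (`'Y'`, `'N'`, `'Q'`), the toy gold and predicted labels, and the helper names `binarize` and `macro_f1` are mine and are not taken from the repository or from the challenge's official scorer. Only the condition inside `binarize` mirrors the line quoted above.

```python
def binarize(label):
    """Mirror the quoted condition: only 'Y' maps to 1; anything else (including None) maps to 0."""
    return 1 if label is not None and label == 'Y' else 0


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class with no true positives scores 0, which pulls the macro average down.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for one disease; not real challenge data.
gold = ['Y', 'Y', 'N', 'N', 'Q', 'N']
pred = ['Y', 'N', 'N', 'N', 'N', 'N']  # a model that never predicts 'Q'

multiclass_score = macro_f1(gold, pred)

# After binarization the 'Q' document is indistinguishable from an 'N' document.
gold_bin = [binarize(g) for g in gold]
pred_bin = [binarize(p) for p in pred]
binary_score = macro_f1(gold_bin, pred_bin)

print(f"multi-class macro F1: {multiclass_score:.4f}")
print(f"binarized macro F1:   {binary_score:.4f}")
```

In this toy case the binarized score comes out higher, because the rare class that the model never predicts no longer counts as a separate class in the average. If the original challenge scored systems over more than two classes, numbers computed in the binary setting would not be directly comparable to it.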
