The dataset for this challenge was obtained by carefully annotating tissue images of several patients with tumors of different organs who were diagnosed at multiple hospitals. It was created by downloading H&E-stained tissue images captured at 40x magnification from the TCGA archive. H&E staining is a routine protocol for enhancing the contrast of a tissue section and is commonly used for tumor assessment (grading, staging, etc.). Given the diversity of nuclei appearances across multiple organs and patients, and the variety of staining protocols adopted at multiple hospitals, the training dataset is intended to enable the development of robust and generalizable nuclei segmentation techniques that work right out of the box.
The challenge data is released under a Creative Commons license (CC BY-NC-SA 4.0).
Training data containing 30 images and around 22,000 nuclear boundary annotations was previously released to the public as a dataset article in IEEE Transactions on Medical Imaging in 2017. The details can be found in the related publication:
- N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane and A. Sethi, "A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology," in IEEE Transactions on Medical Imaging, vol. 36, no. 7, pp. 1550-1560, July 2017 [code]
- The dataset (images and annotations) can be downloaded at the following link: MoNuSeg 2018 Training data
- A supplementary document containing organ information is available at the following link: Training data organ information
- MATLAB code for reading the XML annotations can be downloaded at the following link: HE_to_binary_nary_masks.m
- No external data should be used for training; doing so violates the spirit and rules of the challenge.
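For readers who do not use MATLAB, the annotation-to-mask step can be sketched in Python. This is a minimal illustration and not the organizers' HE_to_binary_nary_masks.m script. It assumes each nucleus is stored as a `Region` element whose `Vertex` children carry `X` and `Y` attributes (the layout is an assumption and should be checked against the downloaded files). The function names and the inline `SAMPLE_XML` are hypothetical and exist only for this example.

```python
import math
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the ASSUMED layout (Region -> Vertices -> Vertex X/Y).
# It is not taken from the real dataset.
SAMPLE_XML = """
<Annotations><Annotation><Regions>
  <Region Id="1"><Vertices>
    <Vertex X="1" Y="1"/><Vertex X="4" Y="1"/>
    <Vertex X="4" Y="4"/><Vertex X="1" Y="4"/>
  </Vertices></Region>
  <Region Id="2"><Vertices>
    <Vertex X="5" Y="5"/><Vertex X="7" Y="5"/>
    <Vertex X="7" Y="7"/><Vertex X="5" Y="7"/>
  </Vertices></Region>
</Regions></Annotation></Annotations>
"""


def read_polygons(xml_text):
    """Return one list of (x, y) vertices per annotated nucleus."""
    root = ET.fromstring(xml_text)
    polygons = []
    for region in root.iter("Region"):
        pts = [(float(v.get("X")), float(v.get("Y"))) for v in region.iter("Vertex")]
        if len(pts) >= 3:  # a boundary needs at least three vertices
            polygons.append(pts)
    return polygons


def polygons_to_nary_mask(polygons, width, height):
    """Rasterize polygons into an n-ary mask: 0 is background, k is the k-th nucleus.

    Uses an even-odd scanline fill evaluated at pixel centers. Where nuclei
    overlap, the later polygon overwrites the earlier one.
    """
    mask = [[0] * width for _ in range(height)]
    for label, pts in enumerate(polygons, start=1):
        edges = list(zip(pts, pts[1:] + pts[:1]))  # closed polygon
        for row in range(height):
            yc = row + 0.5  # y coordinate of the pixel centers in this row
            crossings = []
            for (x1, y1), (x2, y2) in edges:
                if (y1 <= yc) != (y2 <= yc):  # edge crosses the scanline
                    crossings.append(x1 + (yc - y1) * (x2 - x1) / (y2 - y1))
            crossings.sort()
            for x_in, x_out in zip(crossings[0::2], crossings[1::2]):
                first = max(0, math.ceil(x_in - 0.5))
                last = min(width - 1, math.floor(x_out - 0.5))
                for col in range(first, last + 1):
                    mask[row][col] = label
    return mask


def nary_to_binary(mask):
    """Collapse the n-ary mask to a binary mask: 1 is nucleus, 0 is background."""
    return [[1 if value else 0 for value in row] for row in mask]
```

To process a real file, read its contents into a string and pass it to `read_polygons`, then call `polygons_to_nary_mask` with the width and height of the corresponding image. For full-size images a library rasterizer would be much faster than this pure-Python loop.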
Test set images with an additional 7,000 nuclear boundary annotations are available here: MoNuSeg 2018 Testing data. Please cite the following papers if you use the training and testing datasets of this challenge:
N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane and A. Sethi, "A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology," in IEEE Transactions on Medical Imaging, vol. 36, no. 7, pp. 1550-1560, July 2017 [Code]
Instructions given to the participants
Five weeks before the challenge, a test set of 14 new images will be released only to the participating teams who have submitted their methodology manuscripts before the deadline, upon signing the agreements. The format of the new cases will be exactly the same as that of the training data, but without annotations.
A few other nuclei segmentation datasets can be downloaded from the following links:
Janowczyk and Madabhushi, JPI 2016 - http://www.andrewjanowczyk.com/use-case-1-nuclei-segmentation/
Wienert et al., Scientific Reports 2012 - https://www.nature.com/articles/srep00503
Naylor et al., IEEE TMI 2019 - https://zenodo.org/record/1175282#.WyP61xy-l5E
Irshad et al., PSB 2015 - https://becklab.hms.harvard.edu/software/psb-crowdsourced-nuclei-annotation-data-1
Gelasca et al., BMC Bioinformatics 2009 - https://bioimage.ucsb.edu/research/bio-segmentation
Please cite the respective papers if you use any of these datasets in your work.