Solution to overcome the sparsity issue of annotated data in medical domain
Appan K. Pujitha* and Jayanthi Sivaswamy
Appan K. Pujitha, Jayanthi Sivaswamy: Center for Visual Information Technology, IIIT Hyderabad, Hyderabad, India
Appan K. Pujitha: email@example.com
Under the Creative Commons Attribution-NonCommercial License
Open Access funded by Chongqing University of Technology
Received 17/07/2018, Accepted 19/07/2018, Published 30/07/2018
Annotations are critical for machine learning and for developing computer-aided diagnosis (CAD) algorithms. Good performance is critical to the adoption of CAD algorithms, and it generally relies on training with a wide variety of annotated data. However, a vast amount of medical data is either unlabeled or annotated only at the image level. This poses a problem for exploring data-driven approaches such as deep learning for CAD. In this paper, we propose novel crowdsourcing and synthetic image generation solutions for training deep neural network-based lesion detection. The noisy nature of crowdsourced annotations is overcome by assigning a reliability factor to crowd subjects based on their performance and by requiring region-of-interest markings from the crowd. A generative adversarial network-based solution is proposed to generate synthetic images with lesions to control the overall severity level of the disease. We demonstrate the reliability of the crowdsourced annotations and synthetic images by presenting a solution for training a deep neural network (DNN) with data drawn from a heterogeneous mixture of annotations. Experimental results obtained for hard exudate detection from retinal images show that training with refined crowdsourced data and with synthetic images is effective: detection sensitivity improves by 25% and 27%, respectively, over training with expert markings alone.
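As an illustration of how a per-subject reliability factor could be used to refine noisy crowd markings, the following is a minimal sketch and not the method of this paper. It assumes that reliability is measured as the mean Dice overlap between a subject's region markings and expert markings on a few reference images, and that markings on a new image are fused by a reliability-weighted pixel-wise vote. The function names (`dice`, `reliability`, `weighted_vote`), the Dice measure, and the 0.5 threshold are all hypothetical choices made for this example.

```python
# Hypothetical sketch: reliability-weighted fusion of crowd markings.
# Masks are flat lists of 0/1 values, one entry per pixel.

def dice(a, b):
    """Dice overlap between two flat binary masks."""
    inter = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * inter / total

def reliability(crowd_masks, expert_masks):
    """Assumed reliability factor: mean Dice against expert markings."""
    scores = [dice(c, e) for c, e in zip(crowd_masks, expert_masks)]
    return sum(scores) / len(scores)

def weighted_vote(masks, weights, threshold=0.5):
    """Keep pixels whose reliability-weighted support reaches the threshold."""
    total = sum(weights)
    if total == 0:
        return [0] * len(masks[0])
    fused = []
    for pixel_votes in zip(*masks):
        support = sum(w * v for w, v in zip(weights, pixel_votes)) / total
        fused.append(1 if support >= threshold else 0)
    return fused

# Reference image with an expert marking, used to score three subjects.
expert = [[1, 1, 0, 0]]
weights = [
    reliability([[1, 1, 0, 0]], expert),  # subject A agrees fully
    reliability([[0, 0, 1, 1]], expert),  # subject B marks the wrong region
    reliability([[1, 0, 0, 0]], expert),  # subject C agrees partially
]

# Markings by the same three subjects on a new, unlabeled image.
new_masks = [[0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 0]]
fused = weighted_vote(new_masks, weights)
```

In this toy case the unreliable subject B receives zero weight, so the pixels marked only by B are discarded, while pixels supported by the more reliable subjects are kept.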