A research team at DGIST (President Kuk Yang), led by Professor Park Sang-Hyun of the Department of Robotics and Mechatronics Engineering, has developed a weakly supervised deep learning model that can accurately indicate both the presence and the location of cancer in pathology images using only data labeled with whether cancer is present. Existing deep learning models required datasets annotated with the precise location of the cancer in order to identify the cancer site.
Solving this zoning (segmentation) problem, which recovers the location information of the cancer, typically requires the cancer site to be marked accurately in advance, which takes a great deal of time and consequently increases cost.
Weakly supervised learning models, which localize cancer sites from coarse labels such as whether cancer is present in an image, could solve this problem. However, existing weakly supervised models perform noticeably worse on large pathology image datasets, where a single image can be several gigabytes. Because there is a limit to how much data can be processed at once, researchers have tried to improve performance by splitting each pathology image into patches. Dividing the image into patches, however, discards the correlation between the patches and the location information associated with each piece of data.
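The patch-splitting step described above can be illustrated with a small sketch. This is not the team's code; it assumes a hypothetical helper `split_into_patches` and uses a tiny dummy array in place of a gigapixel slide, but it shows the key idea: unless each patch's grid coordinates are recorded, the spatial relationship between patches is lost.

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split an image into non-overlapping patches, keyed by grid position.

    Keeping the (row, col) key preserves where each patch came from;
    dropping it is exactly the loss of location information the article
    describes.
    """
    h, w = image.shape[:2]
    patches = {}
    for i in range(0, h - patch_size + 1, patch_size):
        for j in range(0, w - patch_size + 1, patch_size):
            patches[(i // patch_size, j // patch_size)] = (
                image[i:i + patch_size, j:j + patch_size]
            )
    return patches

slide = np.arange(16 * 16).reshape(16, 16)  # stand-in for a huge slide
patches = split_into_patches(slide, 4)      # 4x4 grid of 4x4 patches
```

A real whole-slide image would be tiled lazily from disk rather than held in memory, but the bookkeeping of patch coordinates is the same.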
To address this, Professor Park Sang-Hyun's team devised a method that narrows down to the cancer site using only slide-level training data indicating the presence of cancer. The team first developed a pathology image compression technique: a network is trained with unsupervised contrastive learning to extract important features from the patches, reducing the size of the image while preserving the correlation between patches. This network is then used to detect the main features while retaining the data for each location. Finally, using a pixel correlation module (PCM) and a class activation map, the team built a model that identifies regions in the compressed pathology images that are highly likely to contain cancer and delineates those regions across the entire pathology image.
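The class activation map mentioned above can be sketched in a few lines. This is an illustrative toy, not the team's implementation: given a compressed feature map (C channels over an HxW grid) and the classifier weights for a hypothetical "cancer" class, a CAM is the channel-wise weighted sum of the features, highlighting the regions the classifier relied on.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Compute a class activation map.

    features: array of shape (C, H, W), e.g. compressed-slide features.
    class_weights: array of shape (C,), the weights of one output class.
    Returns an (H, W) map normalized to [0, 1] for thresholding.
    """
    cam = np.tensordot(class_weights, features, axes=([0], [0]))  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

rng = np.random.default_rng(0)
feats = rng.random((8, 4, 4))  # dummy feature map
w = rng.random(8)              # dummy class weights
cam = class_activation_map(feats, w)
```

Thresholding such a map yields a rough cancer region; refining it with correlations between neighboring pixels is, per the article, the role of the pixel correlation module.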
Trained with only slide-level cancer labels, the newly developed deep learning model achieved a Dice similarity coefficient (DSC) score of up to 81-84 on the cancer zoning problem, significantly outperforming other weakly supervised learning techniques and previously proposed patch-level techniques.