MGH Capstone

Fighting Cancer with a Web-App


Machine Learning


[Animated GIFs of sample nodule annotations]

Does the shape of a nodule annotation itself reveal something about malignancy?

Spoilers: The malignancy scores (in order) are 5, 5, 4, 4, 2, 3, 3

SAKE is an annotation framework aimed at making nodule labeling easier for radiologists. At the heart of this software is the belief that accurate, standardized annotations are essential for precise algorithms. We aim to provide analytics behind this claim by investigating how annotation shape affects predicted malignancy.

In our analysis, we use convolutional neural networks to extract shape and structural features from nodule annotations.


We are isolating the effect of annotation shape alone. There are clearly other factors at play in determining malignancy that this network does not account for.


We use data from the National Cancer Institute (NCI) that includes a 1-5 malignancy rating on DICOM images. For training and validation, we chose a subset of 1202 annotations.


Our ML contribution consists of two parts: the ML pipeline and ShapeNet, a shape-based ConvNet for malignancy prediction.

ML Pipeline


ShapeNet: a shape-based ConvNet for Malignancy Prediction

How should we model malignancy?

Input: Generated Masks

The input volume is a 10x128x128 binary mask of the segmentation polygon. Below are samples animated in the z-direction.

[Animated GIFs of sample input masks, stepping through slices in the z-direction]
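For concreteness, here is a minimal sketch of how such a mask volume could be rasterized from per-slice annotation polygons. The function name, polygon format, and use of NumPy/scikit-image are our assumptions for illustration, not the exact SAKE implementation.

```python
import numpy as np
from skimage.draw import polygon

def build_mask_volume(slice_polygons, depth=10, height=128, width=128):
    """Rasterize per-slice annotation polygons into a binary mask volume.

    slice_polygons: hypothetical dict mapping slice index -> (rows, cols)
    vertex arrays of the annotation polygon on that slice.
    Returns a (depth, height, width) array of 0s and 1s.
    """
    volume = np.zeros((depth, height, width), dtype=np.float32)
    for z, (rows, cols) in slice_polygons.items():
        rr, cc = polygon(rows, cols, shape=(height, width))  # fill the polygon interior
        volume[z, rr, cc] = 1.0
    return volume

# Toy usage: a triangular annotation on slice 4
mask = build_mask_volume({4: ([40, 80, 60], [40, 50, 90])})
print(mask.shape)  # (10, 128, 128)
```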

How do we build the architecture?

We build our CNN with the following architecture:


We perform two convolutions in a row before shrinking the image by half, for a total of 8 layers. After flattening, we add a dropout of 50% to allow for generalization. Finally, we use a ReLU activation on the final layer to accommodate the fact that we are performing a regression.
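As a rough illustration, a Keras-style sketch of this kind of network might look like the following. The filter counts, dense width, and pooling strategy are assumptions on our part; only the overall pattern (paired convolutions, downsampling by half, 50% dropout after flattening, and a ReLU regression output) follows the description above.

```python
from tensorflow.keras import layers, models

def build_shapenet(input_shape=(10, 128, 128, 1)):
    """Sketch of a ShapeNet-like 3D CNN: pairs of convolutions followed by a
    downsampling step, then dense layers with 50% dropout and a single
    ReLU output for regression. Filter counts are illustrative only."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (16, 32, 64, 128):
        model.add(layers.Conv3D(filters, kernel_size=3, padding="same", activation="relu"))
        model.add(layers.Conv3D(filters, kernel_size=3, padding="same", activation="relu"))
        model.add(layers.MaxPooling3D(pool_size=(1, 2, 2)))  # halve the in-plane size
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.5))                 # 50% dropout for generalization
    model.add(layers.Dense(1, activation="relu"))  # non-negative malignancy score
    return model
```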

While parameter selection and tuning can turn into an endlessly time-consuming and complex task, we selected several key meta-parameters to focus on.

The results of parameter tuning are displayed below (full results in next section):

Key takeaways:

normal => flipped

How do we evaluate this model?

We trained and validated our model using a 70/30 split on 1202 total nodule annotations. We measure loss using mean squared error (MSE) since this is a regression problem. However, because the final ReLU is unbounded above, predictions can exceed 1, so we introduce a slightly modified MSE that clips predicted values greater than 1 down to 1.
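A custom loss along these lines could be written as follows; this is a minimal sketch assuming a TensorFlow/Keras setup, not necessarily the exact formulation used in training.

```python
import tensorflow as tf

def clipped_mse(y_true, y_pred):
    """Mean squared error with predictions clipped to [0, 1], since the
    final ReLU output is unbounded above."""
    y_pred = tf.clip_by_value(y_pred, 0.0, 1.0)
    return tf.reduce_mean(tf.square(y_true - y_pred))

# Hypothetical usage:
# model.compile(optimizer="adam", loss=clipped_mse)
```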

The following results are reported with respect to our validation data only.

First-layer Kernels

Loss across Epochs

Finally, to evaluate results, we compare both MSE and residual plots when testing on our validation data.

Residual Plot of ShapeNet and Random Noise

Since an explicit formulation of R^2 is not available, we chose a rough proxy: comparing the variance of residuals. In effect, we are measuring the spread of the errors. The reported variance is 0.1373 for random predictions and 0.03677 for our model. Note that the red lines indicate the 95th percentile of values.
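Concretely, this proxy amounts to comparing the variance of (actual − predicted) for the model against the same quantity for a random baseline. A small sketch, using hypothetical toy values rather than the actual validation data:

```python
import numpy as np

def residual_spread(y_true, y_pred):
    """Variance of the residuals (y_true - y_pred), used here as a rough
    stand-in for R^2 to compare the spread of model vs. random errors."""
    return float(np.var(np.asarray(y_true) - np.asarray(y_pred)))

# Hypothetical toy values (normalized malignancy scores in [0, 1]):
y_true = [0.8, 0.8, 0.6, 0.6, 0.2, 0.4, 0.4]
y_pred = [0.7, 0.9, 0.5, 0.6, 0.3, 0.5, 0.4]
print(residual_spread(y_true, y_pred))
```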


We set out to investigate how much effect the shape of an annotation has on the probability of malignancy of a proposed region. Using a CNN optimized for performance and speed, we modeled malignancy based on features extracted from the contours and edges of 3D annotations. Our model produces a mean squared error of 0.036774, which is 3.79 times better than random predictions. In addition, the variance of residuals for our model is 0.036771, which is 3.73 times better than random predictions. Another way to interpret these results is through the mean absolute error of 0.15809: on average, the predicted malignancy probability differs from the actual malignancy by about 16 percentage points.
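The improvement ratios follow directly from the numbers reported above; as a quick check:

```python
# Quick arithmetic check using the values reported in the text above
random_residual_variance = 0.1373
model_residual_variance = 0.036771
model_mae = 0.15809

print(random_residual_variance / model_residual_variance)  # ~3.73x lower spread than random
print(round(model_mae * 100))                               # ~16 percentage points off, on average
```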

Discussion and Future Work

Our results suggest that shape alone can play a significant role in determining the malignancy of a nodule. We believe this information can be very useful in helping radiologists validate the semi-automated segmentations produced by SAKE. Our hope is that ShapeNet and the SAKE ML pipeline can serve as a benchmark for future work in assisting high-quality annotations. In particular, we believe that incorporating image-wise binary classification could be a useful “double-check” mechanism for radiologists.