research on medical image analysis using ImageNet” (19) discusses this very topic. After inspecting tens of research papers and studies that utilize ImageNet models to train on medical datasets, the author shows that transfer learning of ImageNet models is a viable option for training on medical datasets. The idea behind transfer learning is that although medical datasets are different from non-medical datasets, the low-level features (e.g., the straight and curved lines that construct images) are universal to most image analysis tasks (21). Therefore, the transferred parameters (i.e., weights) may serve as a powerful set of features, which reduces the need for a large dataset as well as the training time and memory cost (21). The structure of DenseNet is shown in Figure 4:
Figure 4: An ultrasound image of size (224, 224, 1) as input to the DenseNet model, which uses its weights and architecture to make a prediction.
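As a brief illustration of this idea (a sketch, not the study's actual code), an ImageNet-pretrained DenseNet201 can be loaded in PyTorch via torchvision (assuming torchvision 0.13 or newer) and its transferred weights frozen so that they serve as a ready-made feature extractor:

from torchvision import models

# Load DenseNet201 with weights pretrained on ImageNet (transfer learning).
model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)

# Freeze the transferred parameters so the pretrained low-level features
# are reused as-is rather than relearned from a small medical dataset.
for param in model.parameters():
    param.requires_grad = False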
Model Fine-Tuning

Transfer learning can be utilized to import the DenseNet201 model and fine-tune it to adapt it to the dataset. In this study, fine-tuning involves two stages (a code sketch of both stages is given after this list):

1. Adjusting the very first layer to make it accept grayscale images that are composed of 1 color channel, as opposed to the 3 color channels of the ImageNet dataset on which the DenseNet201 model is trained. This is a better method than expanding the single channel to 3 channels, because such expansion requires additional resources to store and process additional channels that don’t provide any new information.

2. The ImageNet dataset consists of 1000 classes; therefore, the DenseNet201 model also has 1000 corresponding outputs, one output probability for each class. This dataset consists of 2 classes only. Thus, the very last layer is adjusted to output a single probability, which is rounded to 0 if the value is below 0.5, indicating that the image is `infected`, or to 1 if the value is above 0.5, signifying that the image is `not infected`. The raw output of the model, i.e., the logit, which is the final unnormalized score of the model (an unbounded real number), is converted into a probability (range 0–1) using the sigmoid function: σ(x) = 1 / (1 + e^(-x)), where x is the model logit. After the logit is converted into a probability, it is converted into a label (0 or 1) by rounding, which indicates the prediction of the model.

After fine-tuning, the model contains 18,088,577 parameters in total, of which 1,921 are trainable and 18,086,656 are non-trainable (frozen).
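The two stages can be sketched in PyTorch roughly as follows. This is an illustrative reconstruction rather than the authors' published code; in particular, reusing the pretrained RGB kernels by summing them over the channel dimension is an assumption. Freezing everything except the new single-output head leaves exactly 1,920 weights + 1 bias = 1,921 trainable parameters, matching the count reported above.

import torch
import torch.nn as nn
from torchvision import models

model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)

# Stage 1: replace the first convolution so the network accepts 1-channel
# (grayscale) ultrasound images instead of 3-channel RGB images.
old_conv = model.features.conv0
new_conv = nn.Conv2d(1, old_conv.out_channels, kernel_size=7, stride=2,
                     padding=3, bias=False)
with torch.no_grad():
    # Assumption: reuse the pretrained RGB kernels by summing them over
    # the channel dimension.
    new_conv.weight.copy_(old_conv.weight.sum(dim=1, keepdim=True))
model.features.conv0 = new_conv

# Stage 2: replace the 1000-class ImageNet head with a single output,
# i.e., the logit for the binary infected / not-infected decision.
model.classifier = nn.Linear(model.classifier.in_features, 1)

# Freeze all transferred parameters; only the new head remains trainable.
for param in model.parameters():
    param.requires_grad = False
for param in model.classifier.parameters():
    param.requires_grad = True

# Logit -> probability (sigmoid) -> label (rounding at 0.5).
x = torch.randn(1, 1, 224, 224)      # one grayscale 224x224 image
logit = model(x)                     # unbounded real number
prob = torch.sigmoid(logit)          # range 0-1
label = torch.round(prob)            # 0 or 1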
Picking Loss Function and Optimizer

When working on binary classification in PyTorch, the most common loss function to use is binary cross-entropy loss, also known as log loss. This loss function is appropriate for binary classification problems where the output of the model is a probability, and the goal is to minimize the difference between the predicted probabilities and the true labels. The loss is calculated as:

loss = -(y * log(p) + (1 - y) * log(1 - p))
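In PyTorch this corresponds to nn.BCELoss applied to the predicted probabilities (or nn.BCEWithLogitsLoss applied directly to the logits, which is numerically more stable). A small sketch with illustrative tensors, not the study's data:

import torch
import torch.nn as nn

p = torch.tensor([0.9, 0.2, 0.7])   # predicted probabilities
y = torch.tensor([1.0, 0.0, 1.0])   # true labels

# Binary cross-entropy (log loss), averaged over the batch by default.
criterion = nn.BCELoss()
loss = criterion(p, y)

# Equivalent to the formula above, computed element-wise then averaged.
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()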
As for the optimizer, the two most common optimizers used are Adam and Stochastic Gradient Descent (SGD). The latter was the choice for this project, as this paper (22) mentions