

research on medical image analysis using ImageNet" (19) discusses this very topic. After inspecting tens of research papers and studies that utilize ImageNet models to train on medical datasets, the author demonstrates that transfer learning of ImageNet models is a viable option for training on medical datasets. The idea behind transfer learning is that although medical datasets differ from non-medical datasets, the low-level features (e.g., the straight and curved lines that construct images) are universal to most image analysis tasks (21). Therefore, transferred parameters (i.e., weights) may serve as a powerful set of features, which reduces the need for a large dataset as well as the training time and memory cost (21). The structure of DenseNet is shown in figure 4:













Figure 4: An ultrasound image of size (224, 224, 1) given as input to the DenseNet model, which uses its pretrained weights and architecture to make a prediction.
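
As a minimal sketch of this starting point (assuming the torchvision implementation of DenseNet201, which the text does not name), the pretrained ImageNet weights can be loaded and a forward pass run on a 224 x 224 input; note that the unmodified model expects 3-channel input, which motivates the first-layer adjustment described in the next section:

    import torch
    from torchvision import models

    # Load DenseNet201 with its ImageNet-pretrained weights (torchvision >= 0.13 API).
    model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
    model.eval()

    # Dummy 3-channel 224x224 batch; the paper's single-channel (224, 224, 1)
    # ultrasound input requires the first-layer change described below.
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        logits = model(x)   # shape (1, 1000): one unnormalized score per ImageNet class
    print(logits.shape)
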
Model Fine-Tuning
Transfer learning can be utilized to import the DenseNet201 model and fine-tune it so that it fits the dataset. The fine-tuning in this study involves two stages, sketched in code after this list:
1. Adjusting the very first layer so that it accepts grayscale images, which consist of 1 color channel, as opposed to the 3 color channels of the ImageNet dataset that the DenseNet201 model was trained on. This is preferable to expanding the single channel into 3 channels, which would require additional resources to store and process channels that provide no new information.
2. The ImageNet dataset consists of 1000 classes, so the DenseNet201 model has 1000 corresponding outputs, one output probability per class. The dataset in this study consists of only 2 classes. Thus, the very last layer is adjusted to output a single probability, which is rounded to 0 if the value is below 0.5, indicating that the image is `infected`, or to 1 if the value is above 0.5, signifying that the image is `not infected`. The raw output of the model, i.e., the logit (the final unnormalized score of the model, an unbounded real number), is converted into a probability in the range 0 to 1 using the sigmoid function σ(x) = 1/(1 + e^(-x)), where x is the model logit. After the logit is converted into a probability, it is converted into a label (0 or 1) by rounding, which yields the prediction of the model.
After fine-tuning the model, the total number of parameters is 18,088,577, of which 1,921 are trainable and 18,086,656 are non-trainable (frozen).
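
A minimal sketch of these two adjustments, assuming the torchvision layer names for DenseNet201 (features.conv0 for the first convolution, classifier for the output layer); initializing the single-channel kernel by summing the pretrained RGB kernels is an added assumption rather than something stated in the text:

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)

    # Stage 1: swap the first convolution for a 1-input-channel version (grayscale input).
    # Summing the pretrained RGB kernels to initialize it is an assumption; the paper only
    # states that the first layer is adjusted to accept a single channel.
    old_conv = model.features.conv0
    new_conv = nn.Conv2d(1, old_conv.out_channels, kernel_size=7, stride=2,
                         padding=3, bias=False)
    with torch.no_grad():
        new_conv.weight.copy_(old_conv.weight.sum(dim=1, keepdim=True))
    model.features.conv0 = new_conv

    # Stage 2: replace the 1000-class classifier with a single-logit output layer.
    model.classifier = nn.Linear(model.classifier.in_features, 1)   # 1920 weights + 1 bias

    # Freeze every layer except the new classifier.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():
        p.requires_grad = True

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    print(f"trainable: {trainable:,}  frozen: {frozen:,}")

Because only the new single-output classifier (1,920 weights plus 1 bias) is left trainable, the printed counts match the 1,921 trainable and 18,086,656 frozen parameters reported above.
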
Picking Loss Function and Optimizer
When working on binary classification in PyTorch, the most common loss function to use is binary cross-entropy loss, also known as log loss. This loss function is appropriate for binary classification problems where the output of the model is a probability and the goal is to minimize the difference between the predicted probabilities and the true labels. The loss is calculated as:
    loss = -(y * log(p) + (1 - y) * log(1 - p))
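
In PyTorch this corresponds to nn.BCELoss applied to the sigmoid probability (nn.BCEWithLogitsLoss on the raw logit is the numerically safer equivalent); a short illustration with made-up logit and label values:

    import torch
    import torch.nn as nn

    logit = torch.tensor([0.8])        # example raw model output (unbounded real number)
    prob = torch.sigmoid(logit)        # sigma(x) = 1 / (1 + e^(-x)), range (0, 1)
    label = torch.tensor([1.0])        # example ground-truth label y

    loss = nn.BCELoss()(prob, label)   # -(y*log(p) + (1 - y)*log(1 - p))
    pred = torch.round(prob)           # threshold at 0.5 to obtain the hard label 0 or 1
    print(loss.item(), pred.item())
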
As for the optimizer, the two most common optimizers used are Adam and Stochastic Gradient Descent (SGD). The latter was the choice for this project, as this paper (22) mentions
