

corresponding image dimension. The pixel values of the image are also normalized to the range [0, 1] when converted to tensors.

3. Normalize the images: It is common to normalize images by subtracting the mean and dividing by the standard deviation of the image pixels. This helps to center the data and improves the model's ability to learn from it. The mean and standard deviation values for each color channel are:

mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]

This is done because it is listed as a necessary pre-processing step in the documentation (16) of DenseNet201.

4. Transform all ultrasound images to grayscale: this eliminates unnecessary color channels where they exist, so the resulting image has only one color channel.

5. The images and their corresponding labels are grouped into batches of 32 images each. This is beneficial for a variety of reasons (an illustrative sketch of the resulting pipeline follows this list):

• Memory constraints: Training a deep learning model on a large dataset can require a lot of memory. Grouping the images into batches allows the model to train on a subset of the data at a time, which is more memory efficient.

• Computational efficiency: Processing the images in batches can be more computationally efficient, as it allows the model to make better use of the parallel processing capabilities of modern hardware such as GPUs.

• Stochastic gradient descent: When training a model with stochastic gradient descent, it is common to update the model's weights using gradients computed on a small batch of data rather than on the entire dataset, because computing the gradients on the full dataset is computationally expensive and a smaller batch provides a good approximation.

• Regularization: Using small batches of data also introduces noise into the learning process, which can act as a form of regularization and help the model generalize better to new data.
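To make steps 2–5 concrete, the following is a minimal sketch of how such a pipeline could be assembled, assuming a PyTorch/torchvision setup (which the quoted normalization statistics suggest). The 224×224 target size, the dataset folder layout, and the decision to keep three identical grayscale channels (so that the three-channel statistics and the pre-trained DenseNet201 input layer still apply) are assumptions, not details taken from this paper; only the normalization values, the grayscale conversion, and the batch size of 32 come from the text above. The grayscale step is placed before tensor conversion, since that is the order the torchvision transforms expect.

```python
import torch
from torchvision import datasets, transforms

# Illustrative pre-processing pipeline (a sketch, not the authors' exact code).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                # step 2: resize (224x224 is an assumed size)
    transforms.Grayscale(num_output_channels=3),  # step 4: grayscale; 3 identical channels kept
                                                  # so the 3-channel ImageNet stats still apply
    transforms.ToTensor(),                        # step 2: scales pixel values to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # step 3: per-channel normalization
])

# Hypothetical dataset layout: one sub-folder per class of ultrasound images.
train_data = datasets.ImageFolder("ultrasound/train", transform=preprocess)

# Step 5: group the images and their labels into batches of 32.
train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
```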
Model Description

After pre-processing the data, a classification model is needed. Reviewing related work, most of the research done in the medical field with deep learning uses transfer learning, i.e. training on the data through a pre-trained model. In fact, the conclusion of paper (17) mentions that transfer learning is the best available option for training, even when the data the model was pre-trained on is only weakly related to the data at hand.

Thus, in order to train on this dataset, the DenseNet architecture (18) was chosen. DenseNet is one of the leading architectures used in ultrasonography analysis when conducting transfer learning, as table 3 of paper (19) shows. DenseNet is a variation of the traditional CNN architecture that uses dense blocks, where each layer in a block is connected to every other layer in the block. This allows the network to pass information from earlier layers to later layers more efficiently, alleviates the problem of vanishing gradients in deeper networks, and can improve performance on tasks such as image classification.
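As an illustration of this dense-connectivity idea, a much-simplified dense block can be written in PyTorch roughly as follows. This is a sketch only, not the authors' implementation and not the full DenseNet layer design (which also uses 1×1 bottleneck convolutions and transition layers); the growth rate and layer composition here are assumptions.

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    """Sketch of DenseNet-style connectivity: each layer receives the
    concatenation of the block input and all previous layers' outputs."""

    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))
            channels += growth_rate  # every layer adds growth_rate feature maps

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # dense connection
        return torch.cat(features, dim=1)

# Example: 8 input channels, 3 layers with growth rate 4 -> 8 + 3*4 = 20 output channels.
block = TinyDenseBlock(in_channels=8, growth_rate=4, num_layers=3)
out = block(torch.randn(1, 8, 64, 64))  # shape: (1, 20, 64, 64)
```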
There are several variations of the DenseNet architecture, such as DenseNet121, DenseNet201, and DenseNet264; the number in each name refers to the number of layers the network consists of. DenseNet201 is the variation chosen for this project, as it has a moderate number of layers and is not too computationally demanding.
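A minimal sketch of how a pre-trained DenseNet201 can be loaded and adapted for transfer learning with torchvision is shown below. The number of target classes and the choice to fine-tune the whole network rather than freeze the pre-trained features are assumptions, not details taken from this paper.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # hypothetical number of ultrasound classes for this task

# Load DenseNet201 with ImageNet weights as the transfer-learning starting point.
model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)

# Replace the 1000-class ImageNet classifier with a new head for this dataset.
model.classifier = nn.Linear(model.classifier.in_features, NUM_CLASSES)

# Optionally, the pre-trained feature extractor could be frozen instead of fine-tuned:
# for param in model.features.parameters():
#     param.requires_grad = False
```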
DenseNet and many other pre-trained architectures are pre-trained on the ImageNet dataset, a long-standing landmark in computer vision (20). It consists of 1,281,167 training images, 50,000 validation images, and 100,000 test images belonging to 1,000 classes. These images cover a wide variety of scenes and a diverse range of categories, such as animals, vehicles, household objects, food, clothing, musical instruments, body parts, and much more. A question might arise from this information: why would a model trained on such an unrelated dataset be used for training on and predicting medical images? In fact, the paper titled “A scoping review of transfer learning
