As Figure 9 shows, the train accuracy increases steadily until it reaches 83.33%, while the test accuracy remains relatively unchanged at 62.92%. The same pattern can be observed for the train and test loss: the train loss decreases steadily, but the test loss does not. This is a clear indication of overfitting and of the model's inability to generalize to unseen data. The poor performance is also confirmed by the confusion matrix, where the numbers of true positives and true negatives are unsatisfactory.
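This overfitting signature can be read directly off the training history. The following is a minimal sketch, assuming a Keras-style history dictionary; the paper does not include its training code, so the recorded values and the gap threshold below are illustrative assumptions.

```python
# Minimal sketch: flagging overfitting from training-history curves.
# The paper's training script is not shown, so the history values below
# are illustrative stand-ins shaped like Keras's History.history dict.

def diagnose_overfitting(history, acc_gap=0.10):
    """Compare final train/test accuracy and the loss trend."""
    train_acc, test_acc = history["accuracy"][-1], history["val_accuracy"][-1]
    train_loss, test_loss = history["loss"][-1], history["val_loss"][-1]
    gap = train_acc - test_acc
    # Overfitting signature: train accuracy keeps rising while test
    # accuracy stalls, and test loss stays high while train loss falls.
    print(f"train acc {train_acc:.4f} vs test acc {test_acc:.4f} (gap {gap:.4f})")
    return gap > acc_gap and test_loss > train_loss

# Final values echo the curves reported above (83.33% train vs 62.92% test).
hist = {
    "accuracy":     [0.55, 0.70, 0.78, 0.8333],
    "val_accuracy": [0.60, 0.62, 0.63, 0.6292],
    "loss":         [1.10, 0.80, 0.55, 0.40],
    "val_loss":     [0.95, 0.90, 0.88, 0.89],
}
print("overfitting suspected:", diagnose_overfitting(hist))  # True
```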
Table 1 presents the precision, recall, and F1-score for the infected data points in both datasets:

Table 1: Precision, recall, and F1-score for the infected data points in both datasets

                 Dataset A    Dataset B
  Precision (%)      99.57        55.47
  Recall (%)        100.0         36.98
  F1-score (%)       99.78        44.38
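These metrics follow their standard definitions over the confusion-matrix counts for the infected class. A minimal sketch follows, with hypothetical counts chosen only to land near the dataset B row; the paper does not report the raw counts.

```python
# Minimal sketch: the standard definitions behind Table 1. The
# true/false positive and false negative counts are hypothetical,
# not taken from the paper.

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)   # fraction of predicted-infected that truly are
    recall = tp / (tp + fn)      # fraction of truly infected that are caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=532, fp=426, fn=906)
print(f"precision {p:.2%}, recall {r:.2%}, F1 {f1:.2%}")
# precision 55.53%, recall 37.00%, F1 44.41% -- close to the dataset B row
```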
Further inspection of dataset A, after consulting a professional specialist in this medical field, revealed that the dataset is highly erroneous and misleading. The images in the `notinfected` class, which is supposed to represent healthy ovaries showing no sign of PCOS, are in fact not images of ovaries at all. Rather, they are ultrasound images of the uterus, which completely invalidates this dataset.
Conclusion

Two experiments were conducted on the two datasets referred to in this paper as dataset A and dataset B. Dataset A gave much better results, but it turned out that the dataset is highly erroneous and misleading. Data quality is therefore of the utmost importance when training deep learning models, especially in the medical and health fields. The results of this study show the ability of CNNs and deep learning models to detect suspicious findings in the datasets. The DenseNet201 model's poor performance on dataset B could be due to a variety of reasons, such as the complexity of the ultrasound images or the relatively small number of data points available in the original dataset. To address this, the entire DenseNet201 model could be trained rather than freezing the feature extractor, as sketched below, which might produce better accuracy on the test data of dataset B, since training only the classifier appears to be insufficient for this task. Experimenting with different learning rates and optimizers could also yield more satisfactory results.

However, the accuracy and reliability of a model's predictions depend heavily on the quality of the data used to train it. If the data is flawed or biased, the model will likely produce inaccurate or unreliable results even if those results appear satisfactory. In the medical and health field, this can have serious consequences, as it can lead to incorrect diagnoses or treatment recommendations, potentially causing harm to patients. It is therefore essential to ensure that the data used to train these models is of the highest quality and accurately represents the population it is intended to serve. This includes ensuring that the data is free from errors, represents the target population, and has been collected using appropriate methods. Ensuring data quality is an ongoing process that requires continuous monitoring and improvement.
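The full fine-tuning suggested above could look roughly like the following. This is a minimal sketch assuming a Keras setup with a binary infected/not-infected head; the paper does not show its code, so the input size, classifier head, and learning rate are assumptions.

```python
# Minimal sketch of full fine-tuning: unfreeze the DenseNet201 feature
# extractor instead of training only the classifier head. Input size,
# head, and hyperparameters are assumptions, not the paper's settings.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet201

base = DenseNet201(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3))
base.trainable = True  # unfreeze the entire feature extractor

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),  # infected vs. not infected
])

# A small learning rate helps avoid destroying the pretrained weights
# early in fine-tuning.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# model.fit(train_images, validation_data=test_images, epochs=...)
```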