Abstract:
Breast cancer is the most common cancer among women in Sri Lanka and carries a significant mortality rate. Over the years, considerable effort has gone into developing more efficient and convenient screening methods and into identifying diagnostically sensitive, non-invasive or minimally invasive biomarkers. DNA and microRNAs are among the most promising biomarkers currently in use, and Biglycan is emerging as a further candidate. Biglycan is a small leucine-rich extracellular proteoglycan that has been associated with the aggressiveness of cancers. This study presents a novel approach that combines Convolutional Neural Networks (CNNs) with the exploration of Biglycan as a potential biomarker for breast cancer prediction, using the Biglycan breast cancer dataset. The dataset consists of histological images of cancerous (n = 203) and non-cancerous (n = 133) breast tissue showing expression of the Biglycan biomarker. The class imbalance in the dataset was handled using several data augmentation techniques. The CNN architecture used two fully connected layers to limit the risk of overfitting given the relatively small dataset. The model was trained for 40 epochs with the RMSprop optimizer on a 70:30 train:validation split and achieved training and validation accuracies of 61% and 60%, respectively. Cancerous and non-cancerous images were classified with a precision of 0.70 and 0.65 and a recall of 0.80 and 0.70, respectively. Given this modest performance, additional model fine-tuning and further validation on data representing diverse populations are required to assess the potential of Biglycan as a biomarker for breast cancer prediction.
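The per-class precision and recall reported above can be combined into a single F1 score, and the reported class counts give the size of the imbalance that augmentation had to address. The following is a minimal sketch of that arithmetic only. The abstract does not report F1, and the function name `f1_score` and the idea that balancing means matching the majority class count are illustrative assumptions, not details taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, as reported in the abstract.
reported = {
    "cancerous": (0.70, 0.80),
    "non-cancerous": (0.65, 0.70),
}
f1_by_class = {label: f1_score(p, r) for label, (p, r) in reported.items()}

# Image counts per class as reported in the abstract, before augmentation.
class_counts = {"cancerous": 203, "non-cancerous": 133}

# Assumption: "balanced" means the minority class matches the majority count.
minority_deficit = max(class_counts.values()) - min(class_counts.values())

for label, f1 in f1_by_class.items():
    print(f"{label}: F1 = {f1:.2f}")
print(f"Extra minority-class images needed to balance: {minority_deficit}")
```

F1 is useful here because it summarizes each class in one number, which makes the gap between the cancerous and non-cancerous classes easier to compare than two separate precision and recall pairs.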