Background: Bladder cancer (BC) is a life-threatening malignancy that can be treated successfully if diagnosed at an early stage. Machine learning techniques applied to large biological databases have been proposed as important approaches for identifying accurate diagnostic biomarkers. The present study aimed to develop a simple and accurate model for the diagnosis of BC.
Methods: RNA-sequencing data from 412 primary bladder tumors and 19 normal bladder tissues in The Cancer Genome Atlas (TCGA) were analyzed with the TCGAbiolinks R package to identify differentially expressed genes (DEGs). Gene Ontology terms and the associated pathways of the DEGs were investigated using the online ShinyGO tool. To develop a diagnostic model for BC, two binary classification algorithms, C5.0 and CHAID, were applied to training, testing, and validation subsets using SPSS Modeler version 18.1. Model performance was evaluated using standard binary classification metrics.
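The binary classification measures referred to above (accuracy, F1-score, AUC) can be sketched as follows. This is a minimal, library-free Python illustration, not the SPSS Modeler implementation used in the study; it assumes labels coded as 1 (tumor) and 0 (normal), and the function names (`confusion_counts`, `binary_metrics`, `roc_auc`) are hypothetical. AUC is computed through its pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half.

```python
from itertools import product


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn); assumption: the positive class (tumor) is coded as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (e.g. no predicted positives)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute common threshold-based metrics from hard class predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up labels and scores for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6]
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise AUC loop runs in O(n_pos × n_neg) time, which is adequate for cohorts of a few hundred samples; a rank-based formula scales better for larger datasets.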
Results: Most of the identified DEGs were associated with microtubule organization, coagulation, and myelination. Based on the constructed models, four RNAs (Tubulin Polymerization-Promoting Protein: ENSG00000171368, Proteolipid Protein-1: ENSG00000123560, RP11-473E2: ENSG00000228877, and Coagulation Factor X: ENSG00000126218) were identified as key diagnostic classifiers in both the C5.0 and CHAID models. The CHAID model performed better on the testing dataset, achieving an accuracy of 98.75%, an F1-score of 99.36%, and an AUC of 99.4%.
Conclusion: These results suggest that machine learning algorithms can aid in the diagnosis of BC and may help improve personalized medicine for BC patients. The developed model may serve as a non-invasive, data-driven tool to support early diagnosis and personalized treatment planning in clinical settings. Further laboratory validation is recommended to confirm these findings.