Department of Urology, Kunming Medical University Second Hospital, NO. 374 Dianmian Avenue, Wuhua District, Kunming, China.
*Corresponding Author: Jianhe Liu
Department of Urology, Kunming Medical University Second Hospital, NO. 374 Dianmian Avenue, Wuhua District, Kunming, China.
Email: 972306000@qq.com
Received : Mar 18, 2023
Accepted : Apr 21, 2023
Published : Apr 28, 2023
Archived : www.jclinmedimages.org
Copyright : © Liu J (2023).
Objective: This study aimed to develop a machine learning prediction model for upper urinary tract infected stones, to provide a basis for assisted diagnosis and personalized treatment decisions for infected stones in vivo.
Methods: We retrospectively analyzed the preoperative plain CT images and other clinical data of 780 patients with upper urinary tract stones (165 with infected stones and 615 with noninfected stones) whose stone composition was determined by infrared spectroscopy at the Second Affiliated Hospital of Kunming Medical University from January 2016 to December 2021. The stones were manually segmented on the plain CT images, and radiomics (Rad) features were extracted; Deep Transfer Learning (DTL) features were extracted from the same images using the pretrained ResNet34 algorithm. The t test, the Spearman rank correlation test and Least Absolute Shrinkage and Selection Operator (LASSO) regression were used for feature selection among the clinical, Rad, DTL and fused Rad-DTL features. Machine learning classifiers, including Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF) and XGBoost, were then trained separately, and their performance was compared by the Area Under the Curve (AUC). The accuracy and AUC of the three image models (Rad, DTL and Rad-DTL) were compared, and the model with the best performance was selected. A nomogram was created by combining this model with the classification model built from the clinical data.
Results: Radiomics and deep transfer learning yielded 1218 and 512 image features, respectively, and screening of the clinical features identified 7 clinical risk factors with P < 0.05. In the test set, the accuracy and AUC were 82.1% and 0.763 for the Rad feature model, 80.7% and 0.806 for the DTL feature model, and 87.2% and 0.902 for the Rad-DTL feature model; for the classification model built with clinical features, they were 80.8% and 0.630. The optimal image feature model was therefore Rad-DTL, and the nomogram built by combining it with the clinical features had an AUC of 0.917 (95% CI: 0.850-0.985).
Conclusion: The prediction model for upper urinary tract infected stones, established by fusing radiomics features, deep transfer learning features and clinical features, can distinguish infected from noninfected stones in vivo before surgery, and the resulting nomogram has good clinical utility.
Keywords: Infected stones; Radiomics; Transfer learning; Machine learning; Predictive models.
Yuan Zhou, Yujie Zhang and Guiming Zhou contributed equally to this work and should be considered co-first authors.
Urolithiasis is a common disease of the urinary system; according to statistics, 1%-15% of the world's population has a history of urolithiasis [1]. The prevalence of kidney stones among adults in China is approximately 5.88% [2]. The main components of urinary stones are calcium oxalate, calcium phosphate, uric acid, magnesium ammonium phosphate, and cystine [3]. Infected stones account for approximately 15% of urinary stone disease and are therefore an important group [4]. Infected stones have a higher rate of recurrence and of loss of kidney function, and they can cause severe intraoperative and postoperative urogenital sepsis, which may lead to death from septic shock. Therefore, preoperative identification of infected stones is of great significance. However, accurate determination of stone composition can currently be performed only in vitro: after stone specimens are obtained, the removed fragments are analyzed by Fourier Transform Infrared Spectroscopy (FTIR), X-ray diffraction [5,6] or polarized light microscopy. A rapid, noninvasive method for predicting stone composition in vivo before surgery is therefore important for the treatment of urolithiasis and the prevention of recurrence.
Nonenhanced CT is the gold standard for the diagnosis of urolithiasis, and CT image features have great value in predicting stone composition [3,5,6]. CT attenuation values of stones in Hounsfield units have been used to predict stone composition [7,8], but with less than satisfactory results. Dual-energy CT has also been used, but several studies have shown that it is highly accurate only in differentiating uric acid stones from nonuric acid stones and fails to differentiate effectively among nonuric acid stones such as calcium stones, cystine stones, and infected stones [9,10]. Radiomics is a research approach that mines large numbers of quantitative features from medical images and uses statistical or machine learning methods to select the most valuable imaging features for interpreting clinical information [11]. In recent years, deep Convolutional Neural Networks (CNNs) have achieved significant results in computer vision and have been applied to similar tasks in medical imaging [12-14]. Successful use of these methods in medical imaging requires sufficiently large training cohorts, but acquiring a large number of medical images is difficult [15]. The reuse of pretrained CNNs, known as Transfer Learning (TL), has therefore been increasingly adopted in medical image analysis in recent years [16,17]. TL improves model performance on the target task by transferring features previously learned on a source task.
The purpose of this study is to compare the performance of radiomics features and deep transfer learning features in building an infected stone prediction model, and to fuse the two feature types into a model with better performance for predicting infected stones of the upper urinary tract in vivo and supporting personalized treatment.
Patients
The case data for this study were extracted from the XX Hospital Urolithiasis Specialized Database. The preoperative plain CT images and other clinical data of 1780 patients with upper urinary tract stones in this database, whose stone composition was determined by infrared spectroscopy from January 2016 to December 2021, were retrospectively analyzed. The inclusion and exclusion criteria and the case screening process are shown in Supplementary Figure 1.
The CT images in this study were all preoperative nonenhanced CT images of urinary stones, and the acquisition settings are shown in Supplementary Table 1. All CT images were de-identified and then reviewed by an experienced radiologist and a urological clinician, who recorded the CT image data while processing the images; any disagreement was resolved by discussion. According to the literature [18,19], the CT predictive factors included the maximum cross-sectional area of stones, stone location (renal/ureteral), stone side (left/right/bilateral), number of stones (single/multiple), and CT values of stones. Other clinical features included sex, age, hypertension, diabetes mellitus, urine culture, urine white blood cell count, urine nitrite, C-reactive protein, procalcitonin, interleukin-6, urine pH, triglycerides, and blood white blood cell count. Stone composition was analyzed by FTIR, and a component accounting for more than 50% of the stone was considered the major component. After the inclusion and exclusion criteria were applied, 780 patients were finally enrolled, including 165 with infected stones and 615 with noninfected stones.
Image preprocessing and stone segmentation
The preoperative nonenhanced CT scans of all enrolled patients were DICOM files. To reduce errors caused by the different slice thicknesses of different scanners, all CT images were normalized and resampled. The Region of Interest (ROI), i.e., the area where the urinary stones are located, was semiautomatically segmented using the threshold segmentation method in MITK (v2021.02). The stones were segmented slice by slice to ensure that the stone in every slice was included, as shown in Figure 1, and the window width was set manually to help distinguish stones from surrounding tissue and to outline the volume of interest (VOI) more accurately.
Feature extraction
The PyRadiomics platform (version 3.0), based on Python (version 3.6), was used to extract radiomics features from each volume of interest; we named these Rad features. We adjusted the built-in parameters of PyRadiomics to set the image types and the types of extracted features. The image types were Original, Laplacian of Gaussian (LoG) and Wavelet, and the feature types were shape, first-order and texture features.
In this study, we built a transfer learning network from a MedicalNet-based pretrained ResNet34 network to reduce the overfitting that conventional deep learning suffers when training data are insufficient. The steps are as follows: the 3D ROI images are fed to the pretrained network; the average probability from all ROI images is used to generate TL features; and the output of the penultimate FC layer is used as the TL features [20]. We call the features extracted by this pretrained deep learning network DTL features. The DTL feature extraction process is shown in Figure 2.
Feature fusion
To improve the accuracy of predicting infected stones among upper urinary tract stones, we fused the radiomics features and the deep transfer learning features. The fusion scheme combines the two feature sets for subsequent analysis, and the fused features are named "Rad-DTL" features.
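The text does not specify the fusion scheme. The feature counts reported in the Results (1196 retained Rad features plus 512 DTL features giving 1708 Rad-DTL features) are consistent with simple per-patient concatenation, so the following minimal sketch assumes that. The function name, patient IDs and values are hypothetical.

```python
def fuse_features(rad, dtl):
    """Concatenate each patient's Rad and DTL feature vectors (assumed scheme)."""
    if rad.keys() != dtl.keys():
        raise ValueError("Rad and DTL features must cover the same patients")
    return {pid: list(rad[pid]) + list(dtl[pid]) for pid in rad}


# Hypothetical toy input: 2 patients, 3 Rad features and 2 DTL features each.
rad = {"p1": [0.1, 0.2, 0.3], "p2": [0.4, 0.5, 0.6]}
dtl = {"p1": [1.0, 2.0], "p2": [3.0, 4.0]}

fused = fuse_features(rad, dtl)
print(len(fused["p1"]))  # number of fused features per patient
```

With this scheme the fused vector length is always the sum of the two input lengths, which is what the reported counts suggest.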
Feature selection
Intra- and interclass correlation coefficients (ICCs) were used to assess the consistency of the radiomics features. CT images of 50 enrolled patients were randomly selected, and the ROIs were labeled by both one radiologist and one urology clinician to calculate the interobserver ICC. Ten days later, the same radiologist labeled the ROIs of these 50 patients again, and the intraobserver ICC was calculated against the labels made 10 days earlier. Features with ICC > 0.75 were considered consistent, reproducible and stable [21] and were retained for subsequent analysis.
For each of the three feature sets (Rad, DTL and Rad-DTL), we used a three-step feature selection method to select the best features for distinguishing infected stones from noninfected stones. First, the Mann-Whitney U test was applied to every feature, and only features with P < 0.05 were retained. Second, we used Spearman's rank correlation coefficient to assess the correlation between pairs of features and to eliminate redundancy [22]: the more strongly two features are correlated, the higher the absolute value of their correlation coefficient, and when the coefficient between two features was > 0.9, only one of them was kept for subsequent analysis. Finally, feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression, and the features with nonzero coefficients were taken as valuable predictors in each feature set [23]. In addition, clinical features were screened using a t test, with significance at P < 0.05.
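The ICC form used is not stated. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from the usual two-way ANOVA mean squares and applies the 0.75 cutoff. The choice of ICC(2,1), the function name and the toy ratings are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(ratings)           # subjects (here: patients)
    k = len(ratings[0])        # raters (or repeated sessions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subject mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical values of one feature measured by two raters on three patients.
identical = icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
offset = icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
is_stable = identical > 0.75
```

Because this form measures absolute agreement, a constant offset between raters lowers the ICC even when the two raters rank the patients identically.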
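The redundancy-elimination step can be sketched with the standard library alone. The text does not say which feature of a highly correlated pair is kept; the greedy rule below (keep the first one encountered) is an assumption, as are the function names and toy data. Constant features would need to be removed beforehand, since their rank variance is zero.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |rho| <= threshold against every kept feature."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: f2 is a monotone function of f1, so rho(f1, f2) = 1.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.1, 6.3, 8.0, 50.0],
    "f3": [3.0, 1.0, 5.0, 2.0, 4.0],
}
selected = drop_redundant(features)
```

Because Spearman's coefficient works on ranks, it flags any monotone relationship as redundant, not only linear ones.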
Model construction
After feature selection, the dataset was divided by 10-fold cross-validation for subsequent modeling, and we used a Python machine learning library to develop machine learning classification models for each feature set. The performance of Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), and XGBoost classifiers was compared. The discriminative power of the models was assessed by comparing their Area Under the Curve (AUC) and accuracy line graphs. The model-building process for the three feature sets is shown in Figure 3.
The clinical feature data followed the same 10-fold cross-validation partitioning as the image features, and the best-performing classifier identified in the CT image feature modeling above was used to model the clinical baseline data.
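A 10-fold partition of the 780 enrolled patients can be sketched as follows. Whether the original split was shuffled or stratified by stone type is not stated; this sketch shuffles with a fixed seed and does not stratify, and the function name and seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(780, k=10))
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))
```

Each patient appears in exactly one test fold, so every prediction used for evaluation comes from a model that did not see that patient during training.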
Nomogram
The three image feature models (Rad, DTL, and Rad-DTL) were compared, and the optimal model was selected. The model built on the clinical baseline data was named the "clinic model". The optimal image feature model and the clinic model were each used to calculate the probability that a patient's stone was an infected stone, and these two outputs were named the "Rad-DTL_signature" and the "clinic signature". A nomogram was created from the two signatures for visualization and application of the models.
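How the two signatures are combined is not described. Nomograms are commonly a graphical rendering of a logistic regression, so, assuming that is the case here, the predicted probability would be the logistic function of a weighted sum of the two signatures. The coefficients below are placeholders for illustration and are not values from this study.

```python
import math


def nomogram_probability(rad_dtl_sig, clinic_sig, intercept, w_rad_dtl, w_clinic):
    """Probability of an infected stone under an assumed logistic model."""
    linear_predictor = intercept + w_rad_dtl * rad_dtl_sig + w_clinic * clinic_sig
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Placeholder coefficients only; NOT fitted values from the paper.
COEFS = {"intercept": -3.0, "w_rad_dtl": 5.0, "w_clinic": 1.0}

low = nomogram_probability(0.1, 0.2, **COEFS)   # low Rad-DTL signature
high = nomogram_probability(0.9, 0.2, **COEFS)  # high Rad-DTL signature
```

On a printed nomogram, each weighted term corresponds to a points scale and the logistic function to the final axis that maps total points to probability.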
Statistical analysis
All statistical analyses were performed in Python (version 3.6). Image features were extracted with PyRadiomics and the pretrained ResNet34 network and were screened using the U test, the Spearman correlation coefficient and LASSO. Clinical baseline data were screened using a t test. The dataset was divided by 10-fold cross-validation to reduce the effect of any single data split on model evaluation and to avoid results that are better or worse merely because of a particular split.
Clinical features
Supplementary Table 2 summarizes the characteristics of the patients in the training and test sets. Of these patients, 21.2% (165 of 780) had infected stones; the stone composition of all enrolled patients is shown in Supplementary Table 3. Age, sex, stone size, interleukin-6, urinary leukocyte count, urinary pH, urine culture, and nitrite were significant by t test.
Image features
A total of 1218 Rad features were extracted from the CT images of each patient. Because the edges of urinary tract stones on CT are very clear, interobserver reproducibility of feature extraction was satisfactory: 22 features were excluded after ICC screening, and 1196 Rad features were retained. A total of 512 DTL features were extracted by the pretrained ResNet34 network. Fusing the Rad and DTL features gave 1708 Rad-DTL features. All three feature sets were standardized (Z score) so that each feature had a mean of 0 and a standard deviation of 1. The features were then screened by the U test, with P < 0.05 considered significant; the Spearman correlation coefficient was calculated between the remaining features, and for each pair with a coefficient greater than 0.9, only one feature was retained. LASSO with cross-validation was then used to choose the best penalty coefficient lambda. Finally, 10 Rad features, 21 DTL features and 41 "Rad-DTL" features with nonzero coefficients were retained. The three feature sets and their corresponding coefficients are shown in Figure 3.
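The Z-score standardization mentioned above rescales each feature column to zero mean and unit standard deviation; it does not by itself make a feature normally distributed. A minimal sketch, using the population standard deviation (an assumption; the text does not say which estimator was used) and hypothetical values:

```python
import statistics


def z_score(column):
    """Standardize one feature column to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)      # population standard deviation
    if sd == 0:
        return [0.0 for _ in column]    # constant feature: no spread to scale
    return [(x - mean) / sd for x in column]


# Hypothetical CT attenuation values (HU) for five stones.
scaled = z_score([450.0, 900.0, 1200.0, 620.0, 830.0])
```

Standardizing puts Rad and DTL features on a common scale, which matters for scale-sensitive steps such as LASSO, SVM and KNN.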
Image feature model construction and model comparison
The accuracy and test set AUC of the five models for the three feature sets are shown in Figure 5. By comparison, the SVM model had the best performance. With SVM, the accuracy and AUC of the Rad feature model were 81.1% and 0.826 in the training set and 82.1% and 0.763 in the test set; those of the DTL feature model were 84.4% and 0.965 in the training set and 80.7% and 0.806 in the test set; and those of the Rad-DTL feature model were 91.8% and 0.985 in the training set and 87.2% and 0.902 in the test set. The Rad-DTL feature model therefore had the best prediction performance among the three image feature models for infected stones, and its ROC plots in the training and test sets are shown in Supplementary Figure 2.
Clinical characteristics model evaluation
The clinical feature model was built with SVM, the optimal classifier identified from the CT image features. Its accuracy was 79.1% in the training set and 80.8% in the test set, and its AUC was 0.585 in the training set and 0.630 in the test set; the ROC graph and accuracy line graph are shown in Supplementary Figure 3. The clinical feature model performed worse than the models built from CT image features.
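Accuracy and AUC can diverge when classes are imbalanced. In the whole cohort, 615 of 780 stones were noninfected, so a classifier that always answers "noninfected" would already reach about 78.8% accuracy while having no discriminative ability. The sketch below computes both metrics from scratch on hypothetical scores; the 0.5 decision threshold is an assumption.

```python
def auc(labels, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed threshold."""
    hits = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return hits / len(labels)


# Hypothetical predictions: 1 = infected stone, 0 = noninfected stone.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
model_auc = auc(labels, scores)
model_acc = accuracy(labels, scores)
majority_baseline = 615 / 780    # accuracy of always predicting "noninfected"
```

This is why the AUC, which does not depend on a threshold or on class proportions, is the more informative of the two figures when comparing the clinical and image models.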
Nomogram mapping
To make the models easier to apply in clinical scenarios, a nomogram was developed by combining the optimal CT image feature model (the Rad-DTL feature model) with the clinical feature model (Figure 6). Rad-DTL_Sig and Clinic Sig represent the probability, for each patient, that the stone is an infected stone according to the optimal image model and the clinical model, respectively.
Model evaluation and comparison
The diagnostic performance of the nomogram was tested in the test set, and ROC curves were plotted (Figure 7a) to compare the optimal CT image model (the Rad-DTL feature model), the clinical feature model and the nomogram. The nomogram had the best diagnostic performance and discriminated well between infected and noninfected stones, with an AUC of 0.917 (95% CI: 0.850-0.985). Calibration curves (Figure 7b) were plotted to assess the calibration of the three models. The calibration curves depict the agreement between predicted and observed probabilities, and the figure shows that the Rad-DTL feature model, the clinical feature model and the nomogram were all well calibrated. Decision curve analysis (DCA) curves (Figure 7c) were plotted to evaluate the clinical utility of the prediction models; the curves of all three models lay above the two reference lines, indicating that all three provided a net clinical benefit. Taken together, the three evaluation methods suggest that using the nomogram for preoperative prediction of infected stones offers the greatest clinical benefit.
Urolithiasis is the most common disease in urology, and its prevalence has increased globally in recent decades [24-26]. China is one of the high-prevalence areas for urolithiasis, especially southern China [24]. Different types of stones call for different treatment strategies. Infected stones, a special subset of urinary stones caused mostly by urinary tract infections, are particularly challenging to manage [27]. The risk of infectious complications after surgery for infected stones has been reported to be high, and infected stones may cause urogenital sepsis, which can be life-threatening [28-30]. In recent years, with advances in technology and medical standards, minimally invasive surgical techniques have been widely adopted; however, the incidence of intraoperative and postoperative urogenital sepsis has also risen significantly [31]. Sepsis can lead to death from septic shock and is mainly related to the release into the blood, during surgical lithotripsy, of bacteria inside the stone and the endotoxin they produce. Compared with stones of other compositions, infected stones have a higher rate of recurrence and of loss of renal function and carry greater risk, and urologists are often faced with the challenge of managing them. Treatment of infected stones relies on antibiotics (before and after stone removal) to clear planktonic bacteria from the urinary tract and on surgery to remove all stone fragments. Complete stone removal is critical because residual stones after surgery are an independent risk factor for recurrence of infected stones [32,33].
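In decision curve analysis, the two reference lines are the "treat all" and "treat none" strategies, and a model is useful at a given threshold probability when its net benefit lies above both. The standard net benefit formula is TP/n - (FP/n) * pt/(1 - pt), where pt is the threshold probability. The sketch below applies it to hypothetical predictions.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference line: every patient is treated as having an infected stone."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predictions: 1 = infected stone, 0 = noninfected stone.
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2, 0.1]

nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
nb_none = 0.0    # reference line: no patient is treated
```

Repeating the calculation over a range of thresholds and plotting the three values gives the decision curve.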
Therefore, clinically, preoperative determination of stone composition helps to identify the underlying etiology of urinary stones and facilitates early interventions such as preoperative prophylactic antibiotics, more aggressive removal of all stone fragments and extraction of pelvic urine for culture and drug sensitivity testing at the time of surgery, and postoperative maintenance antibiotic therapy based on drug sensitivity test results to reduce the risk of postoperative urogenital sepsis [34].
To date, there have been many studies on preoperative prediction models for urinary stone composition. Some use dual-energy CT (DECT), which measures stone attenuation at two photon energies and color-codes different stone types to identify stone composition [35-37]. However, the results showed that DECT was significantly effective only in predicting uric acid stones, and DECT is not routinely used in general clinical settings, so its use in clinical practice is limited. In addition, Black KM et al. [10] used deep learning to automatically detect kidney stone composition from digital photographs of stones from 63 patients, building a multiclass model with a deep Convolutional Neural Network (CNN), ResNet-101. Although that study suggests that deep learning algorithms can classify and predict urinary stone composition with good results, its specimens were stones removed after surgery, so it did not achieve preoperative prediction and could not guide preoperative intervention based on the patient's stone composition. In recent years, radiomics has been widely used in various medical disciplines and has achieved good results in the classification and prognosis of diseases.
Zheng J et al. [38] established a nomogram for predicting infected stones by extracting radiomics features from CT images of stones through machine learning. The nomogram achieved satisfactory results in the training set and three validation sets (area under the curve [95% confidence interval] 0.898 [0.840-0.956], 0.832 [0.742-0.923], 0.825 [0.783-0.866], and 0.812 [0.710-0.914], respectively). Building on this, our study not only extracted the deep transfer learning features of CT images and built machine learning models on the radiomics and deep transfer learning features separately, but also fused the two feature types to build an infected stone prediction model. The final nomogram had AUCs of 0.985 (95% CI: 0.972-0.998) and 0.917 (95% CI: 0.850-0.985) in the training and test sets, respectively, which were higher than those of that study. Our study not only established a model for preoperative in vivo prediction of infected stones but also mined more features from CT images of upper urinary tract stones, which enriched the feature engineering of urinary stone prediction models and provided more references for the prediction of urinary stone components.
In this study, we identified radiomics and deep transfer learning features that help to distinguish infected stones from noninfected stones. The prediction models built with radiomics and with deep transfer learning features each performed well in terms of accuracy and AUC in the training set, and the accuracy and AUC of the Rad-DTL model built by fusing the two feature types were better than those of the models built with either feature type alone.
This study is an exploratory comparison and fusion of radiomics and deep transfer learning feature models for differentiating stone components. The accuracy and AUC of radiomics models for stone composition still need to be improved. We found no similar studies on deep learning features for stone composition classification, and our study shows that deep learning features are effective for this task. Nor have studies directly compared the performance of radiomics and deep transfer learning models. In this study, we address these gaps and aim to further enhance the interpretability of such machine learning models. By fusing radiomics and deep learning features, we also established a model with better predictive performance, which can distinguish infected from noninfected stones with higher accuracy before stone removal and can assist targeted prevention and treatment of infected stones in the upper urinary tract.
Our study has some limitations. First, this is a single-center study, and because medical equipment and scanning parameters differ between hospitals, it is unclear how well the machine learning model will work at other centers. In the next step, we will consider integrating data from multiple centers to create an external test set and further test the generalization ability of the model. Second, although we extracted CT image features using a pretrained ResNet34 model, these deep transfer learning features have no explicit definition; in the future, we will explore the image coding process that generates each deep learning feature to further enhance the interpretability of these features.
This study reveals deep transfer learning and radiomics features associated with infected stones that contribute to the understanding of the imaging phenotype of infected stones. In addition, we compared the performance of deep transfer learning features, radiomics features, and fusion features (Rad-DTL features) for predicting infected stones from nonenhanced CT images and showed that machine learning prediction models have the potential to improve preoperative assisted diagnosis of infected stones. Finally, the nomogram built by combining Rad-DTL features and clinical features performed better in predicting infected stones and can be used as a noninvasive adjunctive diagnostic tool to identify infected stones of the upper urinary tract in vivo, better informing preoperative personalized prevention and treatment decisions for these patients.
Ethics declarations: This study did not involve informed consent; all CT images were de-identified.
Data statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgments: This work was supported by the National Natural Science Foundation of China (82060137 and 82100808).
Conflicts of interest/competing interests: The authors report no conflicts of interest.