In this experiment, run the scripts in the '数据预处理' (data preprocessing) folder in order. We begin by counting the original RGB images per class to assess how the data volumes of the selected disease categories relate to one another. We then split the data into training, validation, and test sets at an 8:1:1 ratio, balanced by age, gender, and left/right eye. Next, we perform data augmentation with Gaussian blur and mirror flipping to balance the categories, and apply elliptical cropping to trim irregularly shaped images into a circular region that is easier to feed into a model. Because different models expect different input shapes, the normalized npy files must also be resized and restructured; all of these methods are provided in the Python scripts. Finally, we use the trained U-Net model for vessel segmentation: simply substitute your own dataset to obtain vessel segmentation results immediately. The performance metrics are listed in file 7.
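As a minimal sketch of the augmentation, cropping, and npy-conversion steps (assuming OpenCV and NumPy; the file names, blur kernel, and 224×224 target size are illustrative assumptions, so consult the scripts in '数据预处理' for the exact settings):

```python
import cv2
import numpy as np

def augment(img: np.ndarray) -> list[np.ndarray]:
    """Generate extra samples to balance classes: Gaussian blur + mirror flip."""
    blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=0)  # kernel size is an assumed setting
    flipped = cv2.flip(img, 1)                         # horizontal (left/right) mirror
    return [blurred, flipped]

def circular_crop(img: np.ndarray) -> np.ndarray:
    """Mask an irregularly shaped fundus photo to an inscribed ellipse/circle."""
    h, w = img.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.ellipse(mask, center=(w // 2, h // 2), axes=(w // 2, h // 2),
                angle=0, startAngle=0, endAngle=360, color=255, thickness=-1)
    return cv2.bitwise_and(img, img, mask=mask)

def to_model_input(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize and normalize to [0, 1]; 224x224 suits ResNet-style models."""
    resized = cv2.resize(img, (size, size))
    return resized.astype(np.float32) / 255.0

img = cv2.imread("example_fundus.jpg")            # hypothetical input image
samples = [to_model_input(circular_crop(x)) for x in [img, *augment(img)]]
np.save("example_fundus.npy", np.stack(samples))  # stack into one array for training
```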
Next, use the code in the '上游实验' (upstream experiments) folder to select and evaluate upstream deep learning models. We chose six upstream models: MobileNetV3, ResNet50, DenseNet, Swin Transformer, EfficientNetV2, and InceptionV3. We also ran model-stacking experiments organized as a binary tree. The goal is to pinpoint the categories on which a model's precision suffers, which in turn shows which network extracts the corresponding high-dimensional features most effectively. Even when a single network's direct predictions are only moderately accurate, this balanced stacking is valuable for classification by the downstream networks. In addition, we used Grad-CAM to analyze the models' heatmaps in detail and visualize the high-dimensional features; these visualizations agree with the downstream results and improve the experiment's interpretability.
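For the heatmap analysis, the following is a hedged Grad-CAM sketch built on PyTorch forward/backward hooks, taking ResNet50's `layer4` as the target layer; the pretrained weights and random input are stand-ins for the repository's trained model and preprocessed fundus images:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()  # pretrained backbone as a stand-in

feats, grads = {}, {}
layer = model.layer4  # last conv stage of ResNet50; other models need their own target layer
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed fundus image
score = model(x)[0].max()        # score of the top predicted class
score.backward()

# Grad-CAM: weight each feature map by its average gradient, then ReLU and normalize.
weights = grads["a"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```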
In the final part, the '多模态实验' (multimodal experiments) folder, run the provided code and read the accompanying txt documentation. The input format of the ResNet models differs significantly from that of the Transformer models and requires extra adjustment of the npy file size. The npy inputs for the other selected networks, however, are fully consistent with ResNet50, so the desired results can be obtained by simply changing the model name and the number of trainable layers; this makes the approach easy to extend and understand. We combined the original RGB images with the binary vessel segmentation images and fed them to the six deep learning models above, together with SVM, MLP, and XGBoost classifiers, so that the model results are presented thoroughly. We also used Grad-CAM to validate the models' interpretability and ran the corresponding ablation experiments to show that the multimodal performance of ResNet50 is both efficient and reasonable.
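As one possible illustration of combining the RGB image with its binary vessel map, the sketch below performs early fusion by stacking the vessel map as a fourth input channel of ResNet50; the fusion level, file names, and six-class head are assumptions and should be checked against the actual multimodal code:

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Hypothetical early fusion: append the binary vessel map as a 4th input channel.
rgb = np.load("example_fundus.npy")[0]       # (224, 224, 3) normalized RGB image
vessel = np.load("example_vessel.npy")       # (224, 224) binary U-Net segmentation
x = np.concatenate([rgb, vessel[..., None]], axis=-1)          # (224, 224, 4)
x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0).float()  # (1, 4, 224, 224)

model = resnet50(weights="IMAGENET1K_V2")
# Widen the stem to 4 channels, reusing the pretrained RGB filters for channels 0-2
# and initializing the vessel channel with their mean.
old = model.conv1
model.conv1 = nn.Conv2d(4, old.out_channels, kernel_size=7, stride=2,
                        padding=3, bias=False)
with torch.no_grad():
    model.conv1.weight[:, :3] = old.weight
    model.conv1.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)
model.fc = nn.Linear(model.fc.in_features, 6)  # assumed number of disease classes

logits = model(x)  # (1, 6) class scores for end-to-end classification

# Penultimate 2048-d features can instead feed the SVM / MLP / XGBoost classifiers:
backbone = nn.Sequential(*list(model.children())[:-1])
features = backbone(x).flatten(1)  # (1, 2048)
```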
Tips: Because the additional 'Upstream Experiments' code is large, it is hosted separately. If needed, you can download it from https://pan.baidu.com/s/182NsZYOdKDyiD0srNSoUqg with the extraction code: mstr.