FRD-Net: a full-resolution dilated convolution network for retinal vessel segmentation

Hua Huang; Zhenhong Shang; Zhenhong Shang; Chunhui Yu

doi:10.1364/BOE.522482

1. Introduction

Diabetic retinopathy, glaucoma, and age-related macular degeneration are major causes of blindness in the elderly [1]. In clinical practice, physicians diagnose these retinal diseases by analyzing the morphology of blood vessels, and based on the structure and location of vessels, they plan surgeries and guide interventions [2,3]. However, due to the limitations of imaging devices and the inherent characteristics of biological tissues, the initially acquired medical images often fail to provide an accurate representation of structural information. As a result, experienced clinicians are required to manually annotate retinal vessel lesions, a time-consuming and tedious task. This becomes especially crucial in acute cases that require timely treatment [4]. Therefore, automatic retinal vessel segmentation using computers is of significant importance and has become a research hotspot in the field of computer-aided medical diagnosis in recent years.

Initially, research on retinal vessel segmentation primarily focused on mathematical morphology methods [5], matched filtering methods [6], multi-scale methods [7], and region growing methods [8]. These methods aimed to make final predictions through manually designed feature extractors. However, due to diverse and complex backgrounds in images, such as low-contrast vessels, these methods tend to misclassify vessels as background. Benefiting from the influence of data-driven approaches and innovations in computing devices, deep learning methods have made significant progress in retinal vessel segmentation [9,10]. By leveraging the outstanding automatic feature learning and end-to-end learning capabilities of deep neural networks (DNNs), the accuracy of retinal vessel segmentation has been significantly improved [11–13]. Especially, after the introduction of the landmark U-Net architecture [14], various outstanding variants for vessel segmentation have emerged [15–17]. Despite achieving good segmentation results based on evaluation metrics, these approaches still face challenges in accurately segmenting fine vessels within complex backgrounds.

We conducted experiments on CE-Net to investigate the loss of detailed information, such as edges and textures, in the encoder-decoder structure. Vessel images differ significantly from other medical images, such as cardiac and cellular images, with the former often being relatively thicker and having lower pixel ratios, especially in capillary sections (see Fig. 1(a)). Traditional encoder-decoder segmentation networks use down-sampling operations to expand the receptive field [18] and reduce computational complexity. This leads to a reduction in valuable spatial information. As a result, extracting semantic information about small vessels in low-contrast areas poses a challenge for the model. In the decoder, up-sampling layers struggle to recover these fine structures, causing the network to prioritize identifying larger vessels while overlooking smaller vessels (see Fig. 1(c)). When reducing the number of down-sampling rounds, information about some smaller vessels is preserved (see Fig. 1(d) and (e)).

Fig. 1. Visualization of segmentation results with different down-sampling operations on CE-Net. (a) Original image from the DRIVE dataset; (b) Ground truth; (c) Segmentation result of the original CE-Net; (d) Segmentation result of CE-Net without one round of down-sampling; (e) Segmentation result of CE-Net without two rounds of down-sampling.

Download Full Size | PDF

Inspired by the aforementioned challenges, this paper proposes an effective full-resolution retinal vessel segmentation network called FRD-Net, comprising two core components: the backbone network and the Multi-Scale Feature Fusion Module (MFFM). To address the issue of spatial information loss due to excessive down-sampling, existing methods attempt to alleviate this problem to some extent through multi-scale output strategies, but limitations persist. In order to tackle this problem, the backbone network replaces pooling down-sampling with convolution down-sampling, aiming to reduce the loss of spatial details during the pooling process. While reducing the number of down-sampling rounds can mitigate detail loss, this action also diminishes the network’s receptive field. To balance the loss of detailed information and expand the receptive field, we opt for three down-sampling rounds. To mitigate detail a loss, we employ a strategy in the backbone network where the dilation rate of dilated convolutions is first increased and then decreased in both the horizontal and vertical directions. Initially, dilated convolutions increase the dilation rate to balance large receptive fields and high spatial resolution. However, due to the sparsity of the kernel, further increasing the dilation rate fails to aggregate local features, leading to loss of information about fine vessels. Consequently, reducing the dilation rate restores spatial consistency to address this issue. Additionally, we introduce MFFM to fuse shallow multi-scale features with deeper multi-scale information, aiding in complementing deeper feature information to restore more vessel edge details, thus achieving precise retinal vessel segmentation.

Specifically, the main contributions of this paper include:

1. We propose a new and effective full-resolution retinal vessel segmentation network, named FRD-Net, which consists of interconnected multi-resolution dilated convolution layers.FRD iteratively learns full-resolution representations to mitigate the loss of spatial information. Simultaneously, an effective sequence of dilated convolutions is employed to compensate for their limitations, preserving fine vessel details.
2. To integrate multi-scale features and enhance segmentation performance, we introduce MFFM for extracting vessel structural information at different scales. This enables detail-rich features from shallow layers to be directly transmitted to deeper layers, thus protecting both thick and thin vessels from down-sampling degradation. Concurrently, while retaining edge details, it suppresses background noise, further improving the precision of the segmentation results.
3. Experimental results on publicly available datasets, including DRIVE, STARE, CHASE_DB1, and HRF, demonstrate that FRD-Net achieves superior segmentation performance with fewer model parameters compared to existing retinal vessel segmentation methods. The code for FRD-Net is available on the following website: https://github.com/papercodeHua/FRD-Net.

The upcoming work is outlined as follows: The second section discusses representative deep learning methods for retinal vessel segmentation, while the third section provides a detailed exposition of the proposed methodology and network architecture. The fourth section reports on experimental parameter settings, datasets, and data preprocessing. In the fifth section, experimental results and ablation studies are presented, followed by a comparative analysis of performance against other state-of-the-art methods. The sixth section concludes the research findings of this paper.

2. Related work

2.1 Retinal vessel segmentation

In recent years, deep learning has emerged as the predominant approach for retinal vessel segmentation, with U-Net being one of the most widely applied deep learning frameworks in medical image segmentation tasks. Numerous researchers have employed U-Net for retinal vessel segmentation tasks [14]. Compared to traditional unsupervised learning methods, U-Net-based approaches can automatically learn complex features, enhancing the accuracy of retinal vessel segmentation. To address the issue of spatial information loss caused by down-sampling operations in deep convolutional neural networks, U-Net introduced a mechanism of skip connections to fuse low-level and high-level features. While U-Net models benefit from feature fusion, some spatial information from the encoder’s shallow stages is challenging to recover in the decoder. Moreover, U-Net has limitations in handling fine and irregular retinal vessel structures, as well as issues related to limited data annotations. Therefore, recent research has focused on improvements to U-Net to further enhance its performance. For instance, Guo, Pei, et al. [19] proposed an enhanced SD-Net model based on U-Net. They introduced the DropBlock structure into U-Net to alleviate network overfitting issues through normalized convolutional architecture. SA-Net [20] incorporated batch normalization layers (BN) into the convolutional blocks of SD-Net, utilizing spatial attention mechanisms to enhance the network’s representational capacity. Although these methods further improved retinal vessel segmentation, they primarily addressed the issue of limited data annotations leading to network overfitting, neglecting the irregular characteristics of retinal vessel structures. Mou et al. addressed the curved structure of vessels with CS$^{2}$-Net, utilizing 1$\times$3 and 3$\times$1 convolutions in two directions to capture vessel morphology features. They emphasized regions of interest through channel and spatial attention mechanisms [21]. To address spatial information loss caused by consecutive convolution and pooling operations, Gu, Cheng et al. proposed CE-Net, constructing a context extractor module using dense dilated convolution blocks and residual multi-kernel pooling blocks to obtain more contextual information [12]. These methods add extra modules between the U-Net encoder and decoder to obtain more high-level semantic features, thus making the model’s ability to segment blood vessels greatly improved. However, these methods ignore the importance of underlying spatial information for retinal fine blood vessel segmentation. In order to preserve the underlying features with rich spatial information, Wang et al [22] introduced a spatial refinement path and semantic refinement path structure, focusing on fusing blood vessel features with different resolutions and levels in the network, but the method failed to fully fuse the underlying and high-level semantic features, resulting in the loss of some features representing fine blood vessels.Inspired by these works, we made targeted improvements to the network architecture, such as multi-scale fusion, changing pooling methods and the number of pooling layers, as well as the application of the DropBlock structure.

2.2 High/full-resolution network

In semantic segmentation, high/full-resolution networks typically have deeper and more complex structures, enabling them to learn and represent richer feature information, thus preserving more local and contextual details. Sun et al. [23,24] proposed High-Resolution Network (HRNet), specifically designed for tasks such as human pose estimation, object detection, and semantic segmentation. HRNet leverages multi-resolution fusion and cross-stage connections to effectively utilize information from different scales, thereby enhancing the understanding of semantic information in images. Additionally, HRNet’s efficient feature learning mechanism, while maintaining high-resolution representations and effectively utilizing multi-scale features, improves the model’s performance and efficiency. UNet++ [25,26] redesigns rich skip connections to reduce the semantic gap between feature maps of encoder and decoder sub-networks, maintaining richer feature representations. Furthermore, a pruning method is designed to accelerate the inference speed of UNet++. To address the discontinuity issue in vessel segmentation results, Liu et al. proposed FR-UNet [27], which expands in horizontal and vertical directions through multi-resolution convolutional interaction mechanisms while preserving the full image resolution. Finally, a Dual Threshold Iterative (DTI) algorithm is employed to extract fine vessel pixels, improving vessel connectivity. Previous studies have shown that full or high-resolution convolutional networks perform well in medical image segmentation tasks. However, these networks still face challenges such as redundant skip connections, complex network architectures, and high parameter computation. We aim to address these issues and apply them to retinal vessel segmentation to achieve better vessel segmentation performance.

3. Method

3.1 Network architecture

This paper proposes a simple yet effective retinal vessel segmentation model, and Fig. 2 illustrates the complete architecture of the proposed FRD-Net. FRD-Net consists of two components: the backbone network and MFFM. The backbone network is responsible for extracting retinal vessel information at different scales. The outputs of the different layers of the backbone network are used as inputs to the MFFM to generate the final fusion result. The following sections will provide a detailed explanation of the backbone network and MFFM.

Fig. 2. The network architecture of FRD-Net

Download Full Size | PDF

3.2 Backbone network

As shown in Fig. 2, the backbone network adopts a three-layer architecture, where each layer utilizes dilated residual convolutional blocks with distinct dilation rates to extract retinal vessel features. The interaction mechanism is employed to achieve horizontal and vertical expansion. Simultaneously, the effective utilization sequence of dilated convolutions is maintained in both horizontal and vertical directions. This approach preserves the details of fine vessels while retaining the integrity of the entire image resolution.

3.2.1 Effective sequential utilization of dilated convolutions

In many encoder-decoder models, a common strategy to balance large receptive fields and high spatial resolution is the substitution of traditional convolutions with dilated convolutions. However, continuous use of dilated convolutions gives rise to two issues:(1) Grid Effect: During feature learning in the network, not every pixel contributes to the computation, resulting in information discontinuity. This negatively impacts the learning of fine vessel details within the retina. (2) Weak Correlation between Distant Information: The structural form of dilated convolutions indicates their suitability for capturing long-range information. This implies that employing high dilation rates is effective only for segmenting coarse vessels, providing limited benefit for the segmentation of fine vessels. Therefore, it is essential to carefully consider the sequence of applying dilated convolutions to address these challenges in medical image segmentation for retinal vessels.

Handling the relationship between retinal vessel thickness effectively is a critical challenge in the design of well-constructed dilated convolutional networks. The introduction of the Hybrid Dilated Convolution (HDC) structure, as proposed in [28], aims to address the issue of information discontinuity. This structure incorporates two key design principles: (1) the dilation rates of the stacked convolution cannot exceed 1, and (2) the null rate is designed as a sawtooth structure, such as [1,2,5,1,2,5]. For tackling the segmentation challenges associated with fine vessels within the vasculature, and considering the sparse connectivity characteristics of dilated kernels, as illustrated in Fig. 3, further increasing dilation rates can result in decreased spatial consistency between adjacent information units [29]. we opt to gradually increase the dilation rates (1,2,3) to mitigate the speed of spatial consistency decline, to facilitate subsequent enhanced recovery of spatial consistency between adjacent information units. Simultaneously, the decline in spatial consistency poses challenges for higher-level information units, as they can only capture partial information from non-overlapping units, leading to difficulties in extracting local structural information and potential information loss. To address this issue, similar to HDC, we decrease the dilation rates after their augmentation to restore spatial consistency. The distinction lies in our choice of symmetric dilated convolution sequences for rate reduction, facilitating the reconnection of information pyramids between adjacent units and enabling the extraction of higher-level local structures [29]. Specifically, as depicted in Fig. 3, we propose a horizontal dilated rate sequence of 1,2,3,2,1,3. This study introduces a backbone network structure that utilizes an interaction mechanism, employing dilated convolution sequences in both horizontal and vertical directions. In the horizontal direction, the effective application of dilated convolution sequences enables us to balance the demands of large receptive fields and high spatial resolution in vessel segmentation tasks. In the vertical direction, leveraging the interaction mechanism realizes the designed dilated convolution sequence, facilitating accurate detection of smaller objects (such as tiny blood vessels) while achieving cross-resolution information sharing and feature fusion, thereby enhancing the capability to accurately detect local structures like tiny blood vessels.

Fig. 3. The order of the dilated convolutions.

Download Full Size | PDF

3.2.2 Multi-resolution interaction mechanisms and residual modules

Figure 4 illustrates the structure of the backbone network. The backbone network employs various convolutional operations (2x2 convolution and deconvolution, dilated convolutions with different dilation rates) to achieve horizontal and vertical expansions, similar to the structure of HRNet. Additionally, we introduce a multi-resolution interaction mechanism at each feature map stage to facilitate information exchange between adjacent stages. Shallow stages contribute refined semantic information, while deeper stages augment high-level context information and local receptive fields of feature maps. In order to reduce parameter count and maintain the effective utilization sequence of dilated convolutions, in contrast to the feature fusion module in FR-Net that employs parallel multiple dilated convolutions, we exclusively utilize dilated residual modules (as depicted in Fig. 4) for feature extraction from horizontally and vertically concatenated feature maps. This approach helps alleviate potential semantic misunderstandings that may arise from using a fixed receptive field during feature learning, while also reducing the overall number of parameters.

Fig. 4. Backbone network of FRD-Net.

Download Full Size | PDF

Our multi-resolution interaction mechanism (highlighted by the red box in Fig. 4) operates as follows:

(1)$$C_i,j= \begin{cases} & \left [U(X_{i+1,j-1}),X_{i,j-1} \right ],i=0\\ & \left[D(X_{i-1,j-1}),U(X_{i+1,j-1}),X_{i,j-1}\right],i=1 \\ & \left[D(X_{i-1,j-1}),X_{i,j-1}\right],i=2 \\ \end{cases}$$

Where $D(x)$ and $U(x)$ represent down-sampling and up-sampling operations, respectively. [u,v,…] represents the concatenation operation. As illustrated in Fig. 4, $X_{i,j}$ represents a fusion stage of feature maps, where i and j correspond to the rows and columns of the defined backbone network. Depending on the hierarchical level of the fusion stage, the feature fusion methods can be categorized into three stages:(1) Concatenation of the output of $X_{i,j-1}$ and the up-sampling output of $X_{i+1,j-1}$; (2) Concatenation of the output of $X_{i,j-1}$ and the down-sampling output of $X_{i-1,j-1}$; (3) Concatenation of the output of $X_{i,j-1}$, the down-sampling output of $X_{i-1,j-1}$, and the up-sampling output of $X_{i+1,j-1}$. The resulting three different concatenation outputs are used as inputs to the dilated residual module.

The structure of the backbone network is relatively simple, primarily consisting of the following components: dilated residual blocks, up-sampling, and down-sampling. Figure 5 illustrates the configuration of dilated residual blocks with varying dilation rates. Considering the limited number of publicly available retinal datasets, which poses a challenge of overfitting during training, we introduce the DropBlock structure within the dilated residual blocks as an effective regularization method. The incorporation of these dilated residual modules not only addresses overfitting challenges but also contributes to accelerated convergence during network training, enhancing the model’s performance and generalization ability. The core components of up-sampling and down-sampling include convolution layers, BN layers, and LeakyReLU activation functions. Specifically, down-sampling utilizes a 2$\times$2 convolution with a stride of 2 to increase channel numbers and reduce spatial dimensions. Up-sampling, on the other hand, employs a 2$\times$2 transpose convolution with a stride of 2 to halve channel numbers and increase spatial dimensions. The channel count in FRD-Net starts from 32 and gradually doubles to meet the feature extraction requirements at different levels.

Fig. 5. Dilated Residual Module.

Download Full Size | PDF

3.3 Multi-scale feature fusion module (MFFM)

Retinal vessels exhibit diverse thicknesses and sizes, imposing higher demands on accurate segmentation across multiple scales. Recognizing this characteristic, and aiming to better integrate feature maps from different layers of the backbone network to enhance the final segmentation accuracy, we introduce MFFM block, as depicted in Fig. 6. The primary functionalities of this block include effective background noise suppression and fine segmentation of retinal vessels. Prior research commonly employed the strategy of restoring all scale feature maps to the same scale, followed by pixel-wise addition and concatenation. However, these approaches often failed to adequately consider the spatial relationships between feature maps of different scales, posing a challenge to improving segmentation performance. Considering the semantic gap between feature maps of different scales, we decide to initiate the fusion process from the lowest layer of the backbone network, i.e., the smallest scale feature map. We then progressively up-sample and concatenate with the feature map from the previous layer, ultimately employing a 1x1 convolution for feature fusion. This process is repeated until the features are restored to the original size of the image. To compensate for the detailed information lost during the fusion process, we feed the original image into the Multi-Scale Fusion Module and concatenate it with the final fused feature map from the backbone network. This ensures that shallow multi-scale information can directly propagate to deeper layers, preserving the details of both coarse and fine retinal vessels from the effects of down-sampling. Subsequently, the MFFM employs dilated convolutions with varying dilation rates on the concatenated feature maps to perform feature learning, building upon traditional convolutional operations. Specifically, we utilize traditional convolutions with a kernel size of 1x1, coupled with dilated convolutions with different rates (1, 2, and 3), to extract multi-scale vessel features, thereby enhancing the final segmentation accuracy. By fusing features of different scales, we can more accurately capture and represent the details and diversity of retinal vessels.

Fig. 6. Internal Structure of the MFFM.

Download Full Size | PDF

4. Materials and data preprocessing

4.1 Dataset

In this study, training and testing of the FRD-Net network were conducted on four publicly available retinal image datasets: DRIVE [30], STARE [31], CHASE_DB1 [32], and HRF [33]. The DRIVE dataset comprises 40 color fundus images of the retina (33 from non-diabetic individuals and 7 from patients with mild diabetic retinopathy), collected from different patients aged 25-90 in the Netherlands. The STARE dataset consists of 20 color fundus images (10 depicting pathological retinas and 10 normal retinas). The CHASE_DB1 dataset includes 28 color fundus images taken from 14 school children, providing binocular retinal color images. The HRF dataset contains 45 color fundus images (15 from healthy individuals, 15 from patients with diabetic retinopathy, and 15 from patients with glaucoma). For the first three datasets, where two expert annotations are available, to maintain consistency with other methods, we use the annotations from the first expert as labels, which are input into the network alongside the original images. The annotations from the second expert are treated as human observers in these three datasets.

Due to the limited number of images in the four datasets (a total of 133 images), overfitting issues may arise during network training. To address this, we employed data augmentation methods. The specific augmentation techniques include random horizontal flipping with a probability of 0.5, random vertical flipping with a probability of 0.5, random rotation within the range of [0, 360], and random cropping. By applying these methods, the number of training images was increased fivefold, mitigating overfitting issues and significantly improving the model’s performance. Table 1 summarizes the quantity of each dataset, the training/testing set split, the resolutions before and after cropping, and the number of training images after data augmentation.

Table 1. Details of the Four Datasets

View Table | View all tables in this article

4.2 Data preprocessing

Due to significant variations in color tones and contrast within the retinal datasets’ color fundus images, grayscale conversion is initially applied to mitigate these interfering factors. Notably, the low contrast between vessels and background in retinal images often results in unclear vessel details. To address this issue, an Adaptive Histogram Equalization (CLAHE) method [34,35] is employed to enhance the contrast between vessels and background. CLAHE performs histogram equalization locally, significantly improving image contrast. Finally, Gamma Correction (GC) is introduced as an effective contrast enhancement technique. GC effectively highlights darker vessel structures in retinal images [36]. Figure 7 illustrates the image processing results of the aforementioned steps.

Fig. 7. Image Preprocessing: (a) Original image from the DRIVE dataset, (b) Grayscale processed image, (c) Image after CLAHE processing, (d) Image after Gamma Correction.

Download Full Size | PDF

4.3 Experimental environment and parameter settings

The experiments were conducted on a Dell Intel Xeon Gold 6226R processor and a Dell RTX A6000 graphics card. The experimental environment employed the Windows 10 (64-bit) operating system, with development and testing carried out using PyCharm Community Edition 2022.2.3 x64. The PyTorch open-source framework was utilized for training and testing the network models. The optimization algorithm for network training in this study was the Adam algorithm [37], with a learning rate (lr) of 0.001. The network underwent 250 training iterations. The DropBlock configuration for each dataset involved a dropout block size of 7, while maintaining an output probability of 0.9 for each neuron.

4.4 Evaluation metrics

To quantitatively analyze the effectiveness of the proposed model and assess its segmentation performance, the manually segmented results provided by the datasets served as standard segmentation images. Five performance evaluation metrics–Accuracy (Acc), Sensitivity (Se), Specificity (Sp), F1 score, and Area Under the ROC Curve (AUC)–were introduced for objective quantitative evaluation. The calculation formulas for the evaluation metrics are defined as follows:

(2)$$Acc = \frac{TP+TN}{TP+TN+FP+FN},$$

(3)$$Se = \frac{TP}{TP+FN},$$

(4)$$Sp = \frac{TN}{TN+FP},$$

(5)$$F1 = \frac{2TP}{2TP+FP+FN},$$

where TP, TN, FP, and FN represent True Positive, True Negative, False Positive, and False Negative, respectively. In the calculation, TP, TN, FP, and FN values are obtained by comparing the pixel-wise results of the retinal vessel segmentation from the testing method with the corresponding pixel values in the Ground Truth label images. The Area Under the ROC Curve (AUC) is defined as the area enclosed by the Receiver Operating Characteristic (ROC) curve and the axes. The values for Acc, Se, Sp, and F1 range from 0 to 1, while AUC ranges from 0.5 to 1. Larger values for these five metrics indicate better model classification performance, i.e., improved retinal vessel segmentation results.

4.5 Loss function

In the task of retinal image segmentation, the choice of a loss function is crucial for accurately segmenting essential structures in the retina, such as vessels and lesions. In retinal images, background pixels typically occupy the majority, while the target structures of interest (e.g., vessels) constitute a minority, leading to a class imbalance issue. To better train the proposed model, we employ the binary cross-entropy loss function to optimize the network parameters, measuring the difference between predicted values P(i) and actual values G(i) as the basis for network parameter optimization:

(6)$$L_{bce} ={-}\frac{1}{N}\sum\nolimits_{i}^{N}([target[i]*log(y[i]))+(1-[target[i])*log(1-y[i]),$$

where, P(i) represents the predicted value, where $P(i) \in (0,1)$, reflecting the likelihood of a pixel being predicted as a vessel pixel; a higher value indicates a higher likelihood. G(i) denotes the label, taking values of 0 or 1. N represents the total number of pixels in the image.

5. Analysis of experimental results

5.1 Quantitative comparison experiments between FRD-Net and other methods

To validate the superior performance of FRD-Net, we conducted comparisons with widely recognized retinal vessel segmentation methods on four public datasets. We comprehensively assessed five key evaluation metrics, including sensitivity (Se), specificity (Sp), accuracy (Acc), F1 score, and area under the curve (AUC). The best-performing results are highlighted in red, while the second-best results are marked in blue. "N/A" indicates that the corresponding results were not provided in the respective papers. We re-implemented five methods, CE-Net [12], SA-Unet [20], CS$^{2}$-Net [21], FR-Unet [27], and FRD-Net, and compared their performance. For other methods without available code, we referenced the comparison data provided in the relevant literature.

As shown in Table 2, on the DRIVE dataset, FRD-Net outperforms other methods in Se, Acc, F1, and AUC metrics. While CS$^{2}$-Net achieves the best result in Sp, it lags behind FRD-Net in Se, Acc, F1, and AUC metrics, highlighting the impact of severe class imbalance due to the small pixel proportion occupied by retinal vessels. In comparison to Sp, Se and F1 effectively evaluate overall segmentation performance. CS$^{2}$-Net falls short of FRD-Net by 1.55% in Se and 0.64% in F1. Table 3 provides quantitative comparative experimental results of FRD-Net with other methods on the STARE dataset. FRD-Net exhibits superior performance in Sp, Acc, AUC, and F1 metrics. Specifically, FRD-Net achieves a 0.45% higher Acc than the second-best SCS-Net. Although FRD-Net’s Se metric slightly decreases compared to DM-Net and SDDC-Net, with gaps of 0.19% and 0.23%, it outperforms both methods in Se, Sp, Acc, and AUC metrics. Notably, in terms of Acc and F1 metrics, FRD-Net surpasses SDDC-Net by 1.12% and 2.89%, respectively, and DM-Net by 0.49% and 0.58%.

Table 2. Comparative Experimental Results of FRD-Net with Other Methods on the DRIVE Dataset.

View Table | View all tables in this article

Table 3. Comparative Experimental Results of FRD-Net with Other Methods on the STARE Dataset.

View Table | View all tables in this article

Tables 4 and 5 demonstrate the quantitative comparative experimental results of FRD-Net with other methods on the CHASE_DB1 and HRF datasets. The results indicate that on the CHASE_DB1 dataset, FRD-Net achieves optimal performance in all five metrics except for the Sp indicator. On the HRF dataset, FRD-Net outperforms other methods in all five metrics. Despite a 0.03% decrease in Sp on the CHASE_DB1 dataset compared to FR-Unet, FRD-Net exhibits superior performance in Se, Acc, F1, AUC, and other indicators. Notably, FRD-Net surpasses FR-Unet by 2.86% in correctly segmented vessel pixels (Se) and by 1.1% in the F1 indicator, which comprehensively measures segmentation performance. Moreover, on the HRF dataset, FRD-Net outperforms SA-Unet by 3.12% in the F1 indicator.

Table 4. Comparative Experimental Results of FRD-Net with Other Methods on the CHASE_DB1 Dataset.

View Table | View all tables in this article

Table 5. Comparative Experimental Results of FRD-Net with Other Methods on the HRF Dataset.

View Table | View all tables in this article

The above experimental results indicate that the proposed FRD-Net consistently achieves the highest values in key evaluation metrics, including Se, Acc, and F1 scores, across four public image datasets. FRD-Net is designed to enhance the extraction capability of retinal vessels, particularly in the presence of complex backgrounds and small retinal vessels. When compared to other state-of-the-art segmentation methods on retinal vessels, FRD-Net demonstrates superior performance in Se, Acc, and F1 scores. Higher Se values indicate stronger detail detection capabilities, while elevated Acc and F1 scores signify excellent performance in retinal vessel segmentation tasks.

5.2 Visualization comparison of the method with other methods

To demonstrate the performance of the proposed FRD-Net model in terms of vessel segmentation performance. under the same experimental conditions, Figs. 8, 9 show a visual comparison of FRD-Net with four methods such as CE-Net [12], SA-Unet [20], CS$^{2}$-Net [21], and FR-Unet [27] in terms of local details for evaluating the model on four datasets, namely, DRIVE, STARE, CHASE-DB1, and HRF, which have been performance testing. We selected two images in each dataset for testing and used green and red borders to mark the local images of tiny blood vessels and provide a zoomed-in display of the local images. In Fig. 8, for the DRIVE, STARE, and CHASE_DB1 datasets, comparative methods exhibit instances of missed, incomplete, or discontinuous segmentation in small vessels (indicated in red). FRD-Net excels in accurately segmenting these small vessels. Additionally, other methods may experience missing or over-segmentation issues when dealing with larger vessels, as observed in the first image of the STARE dataset. In this case, both other methods and FRD-Net successfully segment the larger vessel within the red border. However, compared to the ground truth, CS$^{2}$-Net segments the larger vessel with a diameter that is excessively large, while FRD-Net accurately segments these larger vessels. In terms of small vessels in this image, other methods struggle to accurately segment the small vessels adjacent to the larger vessel, while FRD-Net demonstrates more accurate segmentation of these small vessels. Figure 9 illustrates the comparison results of FRD-Net and other methods on the HRF dataset, which contains a higher proportion of small vessels, providing a good reflection of the methods’ performance in segmenting small vessels. In Fig. 9, other methods exhibit instances of missed, incomplete, or discontinuous segmentation in small vessels (indicated in red and green boxes with yellow arrows). In contrast, FRD-Net excels in accurately segmenting these small vessels. In addition to this, we have appended key quantitative metrics, i.e., Acc and F1 scores, next to each of the subfigures in Figs. 8, 9. Through the dual validation of quantitative metrics and qualitative analysis, we can clearly see that the proposed method exhibits a significant advantage in vessel segmentation performance compared to the comparison methods. It is not only accurate, but also capable of capturing the details of the vessels more comprehensively, thus providing us with more accurate and reliable vessel segmentation results. Overall, compared to the comparative methods, FRD-Net demonstrates more accurate segmentation of small vessels and excellent performance in larger vessel segmentation.

Fig. 8. Comparative analysis of local regions in retinal vessel. (a) Original image; (b) Ground truth; (c) FRD-Net prediction; (d) FR-UNet prediction; (e) CS$^{2}$-UNet prediction; (f) SA-UNet prediction; (g) CE-Net prediction.

Download Full Size | PDF

Fig. 9. Comparative analysis of local regions in retinal vessel. (a) Original image; (b) Ground truth; (c) FRD-Net prediction; (d) FR-UNet prediction; (e) CS$^{2}$-UNet prediction; (f) SA-UNet prediction; (g) CE-Net prediction.

Download Full Size | PDF

5.3 Performance of FRD-Net in challenging areas of retinal vessel segmentation

The intricate and curved structure of retinal blood vessels, low-contrast retinal images, and interference from vessel lesions pose significant challenges in the task of retinal vessel segmentation. The presence of numerous tiny vessels in the peripheral and intermediate regions of the retina further complicates the segmentation process by causing blurred outlines of these micro-vessels, making their accurate identification difficult. To assess the outstanding robustness of FRD-Net in such complex scenarios, we selected specific segmentation results from four datasets and illustrated the segmentation performance of FRD-Net in these challenging situations through Fig. 10. The first row demonstrates that FRD-Net excels in successfully segmenting micro-vessels under low-contrast conditions, showcasing its superiority in low-contrast environments. The second and third rows showcase the accurate segmentation of complex and curved vessels in pathological regions, highlighting the robustness of the FRD-Net against interference. The fourth and fifth rows depict FRD-Net’s ability to accurately segment micro-vessels despite the challenges posed by low contrast and pathological factors. In the quantitative metrics presented in Fig. 10, by comparison, we can clearly see that FRD-Net is able to capture the fine structure of blood vessels more accurately when dealing with the segmentation challenges, thus achieving more precise vessel segmentation results.

Fig. 10. Comparison of Challenging Local Areas in Retinal vessel Segmentation.(a) Original Image; (b) Ground Truth; (c) FRD-Net Prediction; (d) FR-UNet Prediction; (e) CS$^{2}$-UNet Prediction; (f) SA-UNet Prediction; (g) CE-Net Prediction.

Download Full Size | PDF

Figure 11 illustrates that when comparing the segmentation results of the FRD-Net model with ground truth, our method successfully extracted tiny blood vessels that were not annotated by the first expert. Despite being trained using annotations from the first expert, during testing, we observed that the model successfully identified more vessels than those marked by the first expert. This outcome indicates that the FRD-Net model has learned superior vessel representations. Even in regions where the first expert missed some tiny vessels, our model still performs well in the task of vessel segmentation.The quantitative indicators in Fig. 11 provide further evidence of this performance.

Fig. 11. Comparison of Fine Blood Vessel Segmentation between the Proposed Method and Two Observers. (a) Original Image; (b) First Observer; (c) Second Observer; (d) Our Proposed Method.

Download Full Size | PDF

5.4 Computational complexity

Recent studies [48] suggest that increasing the complexity of a network typically enhances its representational capacity, leading to improved performance. However, this might not be the optimal choice in many medical applications, where sufficient computational resources for deploying and running highly complex models are often unavailable in clinical settings. The number of model parameters serves as an objective metric for evaluating computational complexity. We compare the proposed FRD-Net with other state-of-the-art methods in terms of complexity by estimating the number of parameters. As shown in Figs. 12, our approach achieves superior performance without a significant increase in the number of parameters compared to other methods, reaching the highest values for sensitivity (Se) and F1 at 83.11% and 82.96%, respectively. This indicates that the performance enhancement of our FRD-Net is achieved not by sacrificing network complexity but by introducing a novel and effective set of techniques for extracting additional scale and semantic information, resulting in more representative features. In comparison with state-of-the-art methods, our FRD-Net remains relatively lightweight, utilizing only 2.78 million parameters.

Fig. 12. F1 and Sensitivity (Se) Scores on the DRIVE Dataset . The values in parentheses represent the number of parameters (in MB). Larger circles indicate a higher number of parameters.

Download Full Size | PDF

In addition, the floating-point operations per second (FLOPs) of the model is also a measure of the model complexity metrics, while the training time (one epoch) and inferencing time (one image) can help to prove the efficiency of the model in practical applications. As shown in Table 6, by comparing the above metrics, the proposed method FRD_Net performs well in terms of the number of parameters, training time and inference time, and the FLOPs of the model are lower than those of U-Net and FR-Unet.The fast training time and inference time of the model indicate the value of FRD-Net for practical applications.

Table 6. Comparison of computational complexity and time (on the DRIVE dataset)

View Table | View all tables in this article

5.5 Experimental ablation

To validate the effectiveness of the modules proposed in our FRD-Net, we conducted ablation experiments on the well-established DRIVE dataset. The U-Net architecture served as the baseline model for the ablation study. Sequentially, we introduced the multi-resolution interaction mechanism and residual module (referred to as FR in this section), the effective utilization of dilated convolutions (referred to as DCEU in this section), and the multi-scale feature fusion module (referred to as MFFM in this section) into the baseline model, resulting in five distinct models. The experimental results are summarized in Table 7, providing an intuitive understanding of the contribution of each module to the model’s performance. In comparison to the baseline, the multi-resolution interaction mechanism (Baseline + FR) significantly improved overall segmentation performance, with increases of 4.67%, 0.22%, 0.74%, 1.03%, and 0.8% in Se, Sp, Acc, F1, and AUC, respectively. As illustrated in Figs. 13(g) and (h), local visual results from the ablation experiments further demonstrate the enhancement in segmentation performance. The experimental results indicate that introducing multi-resolution interaction mechanisms between adjacent stages enables interactive fusion of contextual information, thereby enhancing the model’s segmentation ability for retinal vessels.

Fig. 13. Visual Results of Ablation Experiments(a) Original image; (b) Local details of the original image; (c) Ground Truth; (d) Baseline + FR + DCEU + MFFM; (e) Baseline + FR + MFFM; (f) Baseline + FR + DCEU; (g) Baseline + FR; (h) Baseline.

Download Full Size | PDF

Table 7. Ablation Experiments on the DRIVE Dataset

View Table | View all tables in this article

Building upon the multi-resolution interaction mechanism (Baseline + FR), we successively introduced the effective use of dilated convolutions (DCEU) and the multi-scale feature fusion module (MFFM). The former constitutes the backbone network of our FRD-Net, while the latter aims to highlight the role of the MFFM module. The experimental results are depicted in Table 7 as "Baseline + FR + DCEU" and "Baseline + FR + MFFM". In the effective use of dilated convolutions, we employed a strategy of increasing and then decreasing dilation rates applied in both horizontal and vertical directions, replacing conventional 3x3 convolutions. This strategy led to improvements in Se, Sp, Acc, F1, and AUC metrics, with notable advancements in Se and F1 by 0.62% and 0.69%, respectively, in "Baseline + FR + DCEU" compared to "Baseline + FR." We posit that the combination of the effective use of dilated convolutions and the multi-resolution interaction mechanism aggregates richer contextual information before fusion, advantageous for the segmentation of heavily imbalanced vessel pixels and delicate vessels. While the multi-scale fusion module is widely utilized in various mainstream network architectures, we modified its internal fusion approach, co-learning with the original image. In comparison to "Baseline + FR," the multi-scale fusion module facilitated the direct transmission of multi-scale information-rich features from shallow layers to deeper ones, preserving thick and thin vessels from downsampling degradation. This improvement enhances the overall performance, particularly achieving the highest score in the F1 metric. Finally, we further incorporate DCEU and MFB into the model (Baseline+FR+DCEU+MFB) to validate the combined effect of all modules. As shown in Table 7, this method achieves the highest Sp, Acc, and AUC in the ablation study, with slightly lower scores in Se and F1 metrics, but with marginal differences compared to the highest scores. Visual details from the local segments of the ablation experiment, as shown in Fig. 13, demonstrate that incorporating each module further elevates segmentation performance. These findings collectively underscore the efficacy of FRD-Net for vessel segmentation. In conclusion, the results suggest that maintaining a full-resolution interaction mechanism and overlaying effective dilated convolution sequencing and multi-scale aggregation modules form a strategic approach for vessel segmentation.

To assess the impact of down-sampling frequency on the performance of the FRD-Net model, we designate a model with a single down-sampling as a 2-layer FRD-Net. Subsequently, we incrementally increase the down-sampling frequency, resulting in four models ranging from 2-layer to 5-layer FRD-Nets. As depicted in Table 8, 3-layer FRD-Net and 5-layer FRD-Net exhibit distinct advantages and disadvantages among the four models. In comparison to the 5-layer FRD-Net, the 3-layer FRD-Net demonstrates improvements in the Se (Sensitivity), F1, and AUC metrics by 1.11%, 0.16%, and 0.09%, respectively. However, the Sp (Specificity) and Acc (Accuracy) metrics experience reductions by 0.09% and 0.01%, respectively. From a parameter perspective, it is noteworthy that the 3-layer FRD-Net, relative to the 5-layer FRD-Net, experiences an 8.4-fold reduction. Regarding the five conventional metrics, the performance enhancement of the 3-layer FRD-Net is more pronounced, considering the significantly lower parameter count compared to the 5-layer FRD-Net. As shown in Table 8, experimental results indicate that the 3-layer FRD-Net effectively balances the compromise between the loss of detailed information and the influence of expanding the receptive field on model performance.

Table 8. Ablation experiments of FRD-Net with varying numbers of layers on the DRIVE dataset.

View Table | View all tables in this article

5.6 Cross-validation

In order to assess the generalization performance of our proposed model, we employed a cross-training strategy using the DRIVE and STARE datasets. In the first set of experiments, we trained the FRD-Net on the DRIVE dataset and subsequently tested its performance on the STARE dataset. The experimental results demonstrate that the FRD-Net achieved optimal scores across all five metrics. Specifically, in terms of Sensitivity (Se) reflecting accurate vessel segmentation and the overall segmentation performance measured by Accuracy (Acc), our FRD-Net attained 82.38% and 97.31%, respectively, in the cross-training experiment on the STARE dataset. However, when trained on the STARE dataset and tested on the DRIVE dataset, we observed suboptimal vessel segmentation and damage due to the scarcity of fine vessels in the ground truth of the STARE dataset. As shown in Table 9, the results of the second set of experiments reveal that compared to the first set, other methods exhibited inferior performance in the second set. In comparison to the other five methods, our proposed FRD-Net achieved superior performance in Se, Acc, Area Under the Curve (AUC), and F1-score, reaching 80.79%, 96.63%, 80.68%, and 98.07%, respectively. Overall, our FRD-Net, trained through cross-validation on both the DRIVE and STARE datasets, demonstrates satisfactory generalization performance as evidenced by the experimental results.

Table 9. Performance Comparison Using Cross-Training Strategy

View Table | View all tables in this article

6. Conclusion and future work

In this paper, we proposed a novel Retinal Vessel Segmentation Network (FRD-Net) that effectively utilizes dilated convolutions at full resolution. FRD-Net comprises two components: the main network and the Multi-Scale Feature Fusion module (MFFM). The main network consists of interactive multi-resolution dilated convolution layers, enabling the continuous learning of full-resolution representations to mitigate spatial information loss. Simultaneously, we employed an effective sequence of dilated convolutions to address their limitations, thereby preserving vessel details. To fuse features across scales for enhanced segmentation performance, we introduced the MFFM block to efficiently extract vessel structural information at different scales. This facilitates the direct transmission of multi-scale information-rich features from shallow layers to deep layers, preserving both thick and thin vessels from degradation due to down-sampling. Concurrently, it enhances the segmentation results’ precision while retaining edge details. Experimental results on four publicly available datasets (DRIVE, STARE, CHASE_DB1, and HRF) demonstrate that, compared to state-of-the-art retinal vessel segmentation methods, FRD-Net achieves better segmentation performance with fewer model parameters. In future work, we will attempt to validate the segmentation performance of the FRD-Net method in other medical image segmentation tasks to further confirm its effectiveness and generalization.

Funding

National Natural Science Foundation of China (12063002).

Hua Huang: Conceptualization, Methodology, Software, Visualization, Writing - original draft,Writing - review & editing.

Zhenhong Shang: Supervision, Project administration, Funding acquisition, Writing - review & editing.

Chunhui Yu: Data curation, Investigation, Validation.

Disclosures

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data underlying the results presented in this paper are available in [30–33]. We also appreciate the efforts devoted to collect and share the DRIVE, STARE, CHASE DB1 and HRF databases for retinal vessel segmentation.

References

1. Q. Guo, S. P. Duffy, K. Matthews, et al., “Microfluidic analysis of red blood cell deformability,” J. Biomech. 47(8), 1767–1776 (2014). [CrossRef]

2. W. Xu, H. Yang, M. Zhang, et al., “Decnet: A dual-stream edge complementary network for retinal vessel segmentation,” in 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), (2021), pp. 1595–1600.

3. R.-Q. Li, X.-L. Xie, X.-H. Zhou, et al., “Real-time multi-guidewire endpoint localization in fluoroscopy images,” IEEE Trans. Med. Imaging 40(8), 2002–2014 (2021). [CrossRef]

4. M. D. Abràmoff, M. K. Garvin, and M. Sonka, “Retinal imaging and image analysis,” IEEE Rev. Biomed. Eng. 3, 169–208 (2010). [CrossRef]

5. E. Imani, M. Javidi, and H.-R. Pourreza, “Improvement of retinal blood vessel detection using morphological component analysis,” Comput. Methods Programs Biomed. 118(3), 263–279 (2015). [CrossRef]

6. N. P. Singh and R. Srivastava, “Retinal blood vessels segmentation by using gumbel probability distribution function based matched filter,” Comput. Methods Programs Biomed. 129, 40–50 (2016). [CrossRef]

7. U. T. Nguyen, A. Bhuiyan, L. A. Park, et al., “An effective retinal blood vessel segmentation method using multi-scale line detection,” Pattern Recognit. 46(3), 703–715 (2013). [CrossRef]

8. R. Panda, N. Puhan, and G. Panda, “New binary Hausdorff symmetry measure based seeded region growing for retinal vessel segmentation,” Biocybern. Biomed. Eng. 36(1), 119–129 (2016). [CrossRef]

9. J. Mo and L. Zhang, “Multi-level deep supervised networks for retinal vessel segmentation,” Int. J. Comput. Assist. Radiol. Surg. 12(12), 2181–2193 (2017). [CrossRef]

10. C. Chen, J. H. Chuah, R. Ali, et al., “Retinal vessel segmentation using deep learning: a review,” IEEE Access 9, 111985–112004 (2021). [CrossRef]

11. O. O. Sule, “A survey of deep learning for retinal blood vessel segmentation methods: taxonomy, trends, challenges and future directions,” IEEE Access 10, 38202–38236 (2022). [CrossRef]

12. Z. Gu, J. Cheng, H. Fu, et al., “Ce-net: context encoder network for 2d medical image segmentation,” IEEE Trans. Med. Imaging 38(10), 2281–2292 (2019). [CrossRef]

13. J. Wei, G. Zhu, Z. Fan, et al., “Genetic u-net: automatically designed deep networks for retinal vessel segmentation using a genetic algorithm,” IEEE Trans. Med. Imaging 41(2), 292–307 (2022). [CrossRef]

14. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, eds. (Springer International Publishing, 2015), pp. 234–241.

15. O. Oktay, J. Schlemper, L. L. Folgoc, et al., “Attention u-net: learning where to look for the pancreas,” arXiv, arXiv:1804.03999 (2018). [CrossRef]

16. M. Z. Alom, M. Hasan, C. Yakopcic, et al., “Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation,” arXiv, arXiv:1802.06955 (2018). [CrossRef]

17. L. Li, M. Verma, Y. Nakashima, et al., “Iternet: retinal image segmentation utilizing structural redundancy in vessel networks,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2020).

18. W. Luo, Y. Li, R. Urtasun, et al., “Understanding the effective receptive field in deep convolutional neural networks,” in Advances in Neural Information Processing Systems, vol. 29 D. Lee, M. Sugiyama, U. Luxburg, et al., eds. (Curran Associates, Inc., 2016).

19. C. Guo, M. Szemenyei, Y. Pei, et al., “Sd-unet: a structured dropout u-net for retinal vessel segmentation,” in 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE) (2019), pp. 439–444.

20. C. Guo, M. Szemenyei, Y. Yi, et al., “Sa-unet: spatial attention u-net for retinal vessel segmentation,” in 2020 25th International Conference on Pattern Recognition (ICPR) (2021), pp. 1236–1242.

21. L. Mou, Y. Zhao, H. Fu, et al., “Cs2-net: deep learning segmentation of curvilinear structures in medical imaging,” Med. Image Anal. 67, 101874 (2021). [CrossRef]

22. D. Wang, G. Hu, and C. Lyu, “Frnet: an end-to-end feature refinement neural network for medical image segmentation,” Vis. Comput. 37(5), 1101–1112 (2021). [CrossRef]

23. K. Sun, B. Xiao, D. Liu, et al., “Deep high-resolution representation learning for human pose estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

24. J. Wang, K. Sun, T. Cheng, et al., “Deep high-resolution representation learning for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3349–3364 (2021). [CrossRef]

25. Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, et al., “Unet++: a nested u-net architecture for medical image segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4 (Springer, 2018), pp. 3–11.

26. Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, et al., “Unet++: redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Trans. Med. Imaging 39(6), 1856–1867 (2020). [CrossRef]

27. W. Liu, H. Yang, T. Tian, et al., “Full-resolution network and dual-threshold iteration for retinal vessel and coronary angiograph segmentation,” IEEE J. Biomed. Health Inform. 26(9), 4623–4634 (2022). [CrossRef]

28. P. Wang, P. Chen, Y. Yuan, et al., “Understanding convolution for semantic segmentation,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (2018), pp. 1451–1460.

29. R. Hamaguchi, A. Fujita, K. Nemoto, et al., “Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (2018), pp. 1442–1450.

30. J. Staal, M. Abramoff, M. Niemeijer, et al., “Ridge-based vessel segmentation in color images of the retina,” IEEE Trans. Med. Imaging 23(4), 501–509 (2004). [CrossRef]

31. A. Hoover, V. Kouznetsova, and M. Goldbaum, “Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response,” IEEE Trans. Med. Imaging 19(3), 203–210 (2000). [CrossRef]

32. M. M. Fraz, P. Remagnino, A. Hoppe, et al., “An ensemble classification-based approach applied to retinal blood vessel segmentation,” IEEE Trans. Biomed. Eng. 59(9), 2538–2548 (2012). [CrossRef]

33. J. I. Orlando, E. Prokofyeva, and M. B. Blaschko, “A discriminatively trained fully connected conditional random field model for blood vessel segmentation in fundus images,” IEEE Trans. Biomed. Eng. 64(1), 16–27 (2017). [CrossRef]

34. Q. Jin, Z. Meng, T. D. Pham, et al., “Dunet: A deformable network for retinal vessel segmentation,” Knowledge-Based Syst. 178, 149–162 (2019). [CrossRef]

35. Y. Wu, Y. Xia, Y. Song, et al., “Nfn+: A novel network followed network for retinal vessel segmentation,” Neural Networks 126, 153–162 (2020). [CrossRef]

36. S. Hussain, F. Guo, W. Li, et al., “Dilunet: A u-net based architecture for blood vessels segmentation,” Comput. Methods Programs Biomed. 218, 106732 (2022). [CrossRef]

37. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014). [CrossRef]

38. H. Wu, W. Wang, J. Zhong, et al., “Scs-net: A scale and context sensitive network for retinal vessel segmentation,” Med. Image Anal. 70, 102025 (2021). [CrossRef]

39. J. Zhang, Y. Zhang, and X. Xu, “Pyramid u-net for retinal vessel segmentation,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2021), pp. 1125–1129.

40. K. Ren, L. Chang, M. Wan, et al., “An improved u-net based retinal vessel image segmentation method,” Heliyon 8(10), e11187 (2022). [CrossRef]

41. H. Zhang, X. Zhong, G. Li, et al., “Bcu-net: Bridging convnext and u-net for medical image segmentation,” Comput. Biol. Med. 159, 106960 (2023). [CrossRef]

42. X. Li, Y. Jiang, M. Li, et al., “Lightweight attention convolutional neural network for retinal vessel image segmentation,” IEEE Trans. Ind. Inf. 17(3), 1958–1967 (2021). [CrossRef]

43. X. Deng and J. Ye, “A retinal blood vessel segmentation based on improved d-mnet and pulse-coupled neural network,” Biomed. Signal Process. Control. 73, 103467 (2022). [CrossRef]

44. B. Yang, L. Qin, H. Peng, et al., “Sddc-net: a u-shaped deep spiking neural p convolutional network for retinal vessel segmentation,” Digit. Signal Process. 136, 104002 (2023). [CrossRef]

45. L. Pan, Z. Zhang, S. Zheng, et al., “Msc-net: multitask learning network for retinal vessel segmentation and centerline extraction,” Appl. Sci. 12(1), 403 (2022). [CrossRef]

46. Y. Li, Y. Zhang, W. Cui, et al., “Dual encoder-based dynamic-channel graph convolutional network with edge enhancement for retinal vessel segmentation,” IEEE Trans. Med. Imaging 41(8), 1975–1989 (2022). [CrossRef]

47. Y. Huang and T. Deng, “Multi-level spatial-temporal and attentional information deep fusion network for retinal vessel segmentation,” Phys. Med. Biol. 68(19), 195026 (2023). [CrossRef]

48. G. Huang, Z. Liu, L. van der Maaten, et al., “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).

Datasets	DRIVE	STARE	CHASE_DB1	HRF
Total number	40	20	28	45
Train/Test number	20/20	16/4	20/8	30/15
Resolution(pixel)	$565 \times 584$	$605 \times 700$	$999 \times 960$	$3504 \times 2336$
Resize(pixel)	$400 \times 400$	$400 \times 400$	$720 \times 720$	$1008 \times 1008$
Augmented Training Data	100	80	100	150

Methods	Years	Se(%)	Sp(%)	Acc (%)	F1%)	AUC(%)
U-net [14]	2015	78.00	98.10	96.34	80.99	97.68
AttU-Net [15]	2018	79.06	98.31	96.62	80.39	97.74
R2U-Net [16]	2018	77.92	98.13	96.56	N/A	97.84
U-Net++[25]	2018	78.91	98.50	96.79	81.14	98.25
Iter-Net [17]	2020	81.77	98.32	96.87	81.81	98.39
SCS-Net [38]	2021	81.89	98.39	96.97	N/A	98.37
Pyramid U-Net [39]	2021	82.13	98.07	96.15	N/A	98.15
Ren et al [40]	2022	79.55	98.48	96.82	N/A	98.34
BCU-Net [41]	2023	81.42	98.16	96.67	80.96	97.91
CE-Net [12]	2019	80.90	98.24	96.72	80.32	98.35
SA-Unet [20]	2021	80.78	98.44	96.89	82.32	98.58
CS $^{2}$ -Unet [21]	2021	81.54	98.76	96.75	82.38	98.53
FR-Unet [27]	2022	81.27	98.75	96.82	82.42	98.56
FRD-Net(Ours)	2024	83.11	98.47	97.19	82.96	98.75

Methods	Years	Se(%)	Sp(%)	Acc (%)	F1%)	AUC(%)
U-net [14]	2015	80.90	98.38	96.78	78.68	96.64
AttU-Net [15]	2018	81.35	98.44	97.17	81.27	98.62
R2U-Net [16]	2018	82.45	98.62	97.12	N/A	98.56
U-Net++ [25]	2018	82.64	98.51	97.33	82.20	98.83
Iter-Net [17]	2020	82.18	98.46	96.87	82.18	98.74
SCS-Net [38]	2021	82.07	98.39	97.36	N/A	98.58
LA-Net [42]	2021	82.52	98.23	96.78	N/A	98.75
DM-Net [43]	2022	82.72	97.79	97.32	81.96	98.55
SDDC-Net [44]	2023	82.68	97.89	96.69	79.65	98.45
CE-Net [12]	2019	81.83	98.61	97.27	81.95	98.73
SA-Unet [20]	2021	79.05	98.66	96.85	78.22	97.78
CS $^{2}$ -Unet [21]	2021	81.54	98.56	96.75	82.38	98.53
FR-Unet [27]	2022	81.27	98.60	96.72	82.41	98.56
FRD-Net(Ours)	2024	82.49	98.84	97.81	82.54	99.04

Methods	Years	Se(%)	Sp(%)	Acc (%)	F1%)	AUC(%)
U-net [14]	2015	77.13	98.08	96.76	81.18	97.83
AttU-Net [15]	2018	82.25	98.43	97.41	79.98	98.47
R2U-Net [16]	2018	78.56	98.20	96.34	N/A	98.15
U-Net++ [25]	2018	83.17	98.50	97.43	80.94	98.61
Iter-Net [17]	2020	83.03	98.50	97.52	80.58	98.61
SCS-Net [38]	2021	83.65	98.39	97.44	N/A	98.67
MSC-Net [45]	2022	80.56	98.69	96.86	N/A	97.14
BCU-Net [41]	2023	80.13	98.63	97.46	79.99	97.70
CE-Net [12]	2019	81.78	98.31	97.46	81.20	98.59
SA-Unet [20]	2021	80.43	98.43	97.33	81.43	98.71
CS $^{2}$ -Unet [21]	2021	81.54	98.56	96.75	82.38	98.53
FR-Unet [27]	2022	81.27	98.75	97.42	82.41	98.56
FRD-Net(Ours)	2024	84.13	98.72	97.71	83.51	99.02

Methods	Years	Se(%)	Sp(%)	Acc (%)	F1%)	AUC(%)
U-net [14]	2015	77.38	98.27	96.63	80.27	97.70
AttU-Net [15]	2018	77.13	98.22	96.56	77.89	97.86
U-Net++ [25]	2018	79.50	98.32	96.84	79.80	98.11
Iter-Net [17]	2020	80.57	98.28	96.90	80.36	98.27
SCS-Net [38]	2021	81.14	98.23	96.87	N/A	98.42
DE-DCGCN-EE [46]	2022	81.69	98.25	96.95	80.97	98.45
MSAFNet [47]	2023	73.79	98.82	96.48	N/A	98.27
CE-Net [12]	2019	79.33	98.25	96.76	79.39	98.08
SA-Unet [20]	2021	77.30	98.75	96.86	82.51	98.67
CS $^{2}$ -Unet [21]	2021	81.24	98.76	96.75	82.38	98.53
FR-Unet [27]	2022	76.56	98.67	96.87	80.67	98.55
FRD-Net(Ours)a	2024	83.78	98.87	97.38	85.63	98.80

FRD-Net: a full-resolution dilated convolution network for retinal vessel segmentation

Abstract

1. Introduction

2. Related work

2.1 Retinal vessel segmentation

2.2 High/full-resolution network

3. Method

3.1 Network architecture

3.2 Backbone network

3.2.1 Effective sequential utilization of dilated convolutions

3.2.2 Multi-resolution interaction mechanisms and residual modules

3.3 Multi-scale feature fusion module (MFFM)

4. Materials and data preprocessing

4.1 Dataset

4.2 Data preprocessing

4.3 Experimental environment and parameter settings

4.4 Evaluation metrics

4.5 Loss function

5. Analysis of experimental results

5.1 Quantitative comparison experiments between FRD-Net and other methods

5.2 Visualization comparison of the method with other methods

5.3 Performance of FRD-Net in challenging areas of retinal vessel segmentation

5.4 Computational complexity

5.5 Experimental ablation

5.6 Cross-validation

6. Conclusion and future work

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (13)

Tables (9)

Equations (6)

Biomedical Optics Express

Methods	Paremeters(M)	FLOPs(G)	Training time(S)	Inferencing time(S)
U-Net	7.78	533.81	29.25	0.034
CE-Net	29.00	54.64	20.07	0.020
Attu-Net	34.88	163.48	26.42	0.038
U-Net++	9.16	139.47	22.16	0.021
Iter-Net	13.60	194.46	23.24	0.019
CS $^{2}$ -Unet	8.40	136.77	21.34	0.024
FR-Unet	5.72	575.19	25.14	0.045
FRD-Net(Ours)	2.78	309.84	17.21	0.018

Datesets	Model Layers	Se(%)	Sp(%)	Acc (%)	F1%)	AUC(%)	Params(M)
	2	82.49	98.44	97.16	82.81	98.70	0.77
	3	83.11	98.47	97.19	82.96	98.75	2.78
DRIVE	4	83.12	98.42	97.16	82.83	98.73	8.62
	5	82.00	98.56	97.20	82.80	98.66	23.24

Training	Test	Method	Se(%)	Sp(%)	Acc (%)	F1%)	AUC(%)
		U-Net	69.71	98.06	95.91	72.14	96.60
		CE-Net	79.14	98.35	96.89	79.47	97.42
DRIVE	STARE	SA-Unet	78.52	98.17	96.64	78.08	97.77
		CS $^{2}$ -Net	79.89	98.29	96.98	78.76	97.45
		FR-Unet	80.25	98.37	96.86	79.81	97.82
		Ours	82.38	98.51	97.31	80.38	98.02
		U-Net	77.24	97.45	96.31	79.38	96.93
		CE-Net	77.41	97.52	96.31	78.49	96.98
STARE	DRIVE	SA-Unet	75.36	97.35	96.28	78.64	97.12
		CS $^{2}$ -Net	78.45	98.06	96.45	79.72	97.78
		FR-Unet	79.23	98.23	96.53	80.15	97.79
		Ours	80.79	98.14	96.63	80.68	98.07