
Adaptive brightness fusion method for intraoperative near-infrared fluorescence and visible images

Open Access

Abstract

An adaptive brightness fusion method (ABFM) for near-infrared fluorescence imaging is proposed to adapt to different lighting conditions and make the equipment operation more convenient in clinical applications. The ABFM is designed based on the network structure of Attention Unet, which is an image segmentation technique. Experimental results show that ABFM has the function of adaptive brightness adjustment and has better fusion performance in terms of both perception and quantification. Generally, the proposed method can realize an adaptive brightness fusion of fluorescence and visible images to enhance the usability of fluorescence imaging technology during surgery.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Near-infrared fluorescence imaging has been widely used in the intraoperative phase of surgery for target tissue identification [1]. It has also been proven to accurately identify small lesions of lung cancer [2], liver cancer [3], and colorectal cancer peritoneal metastasis [4] during surgery. For this technique, a contrast agent is usually injected into the patient before surgery and, after metabolic circulation, accumulates in the specific tissue. Because contrast agents are fluorescent, irradiation with excitation light produces fluorescence signals, and after signal acquisition and processing, localization and imaging of specific tissues can be achieved. Intraoperative fluorescence imaging collects fluorescence signals in the near-infrared spectrum and combines them with visible light signals to achieve precise imaging of specific tissues. Therefore, the fusion algorithm for near-infrared fluorescence and visible images is essential for making fluorescence imaging convenient to use. The fusion method can simultaneously obtain structural and detailed information from visible and fluorescence images, realizing fusion across multi-source information [5]. However, image brightness is easily influenced by ambient light and excitation light during surgical applications [6]. Imaging frames are often captured under suboptimal lighting conditions due to the inevitable interference of the imaging environment, so the exposure and contrast parameters must be constantly adjusted by hand to adapt to lighting changes and image the biological tissue clearly. This is not conducive to the convenient application of intraoperative optical imaging. Therefore, the lack of a fusion method that provides adaptive brightness adjustment for near-infrared fluorescence imaging is a problem that needs to be solved.

With regard to the fusion of near-infrared fluorescence imaging, the traditional approach to obtaining effective image overlays depends mainly on the selection of a pseudo-color map and transparency functions [7], without involving a specific fusion algorithm. For the fusion of infrared and visible light images, various methods have been proposed in recent years. Considering how the fusion algorithms are implemented, they can be divided into two categories: methods based on traditional algorithms and methods based on deep learning. Traditional algorithms have concise and stable structures but poor flexibility, whereas deep learning-based algorithms are more flexible and adaptive but usually require high-performance hardware. Among traditional methods, multi-scale transform-based methods, such as the wavelet transform and pyramid algorithms, are some of the most popular [8]. The wavelet transform and pyramid fusion methods decompose the image at different scales, extract the feature maps, and then generate a fused image. Furthermore, sparse representation [9], subspace methods [10], and hybrid models [11] are also representative fusion methods, but they are often limited by complex fusion rule design [12]. Another class of fusion methods is based on inference, such as Bayesian Fusion [5], which uses probability and statistics to infer the fused image. Besides these traditional methods, fusion algorithms based on deep learning have also shown good fusion effects. The DenseFuse network [13] adopts an encoder-decoder design: the encoder module learns feature detection and extraction, the decoder module learns to restore images from feature maps, and image fusion is achieved through the addition of feature maps, avoiding the need for large amounts of infrared and visible image data. The deep learning fusion method of [14] is based on deep convolutional neural networks (DCNN) and can realize the fusion of RGB and infrared images. In addition, FusionGAN [15] shows good infrared fusion performance based on a generative adversarial network trained on infrared and visible data pairs. At the same time, multi-functional fusion methods have gradually emerged. U2Fusion [12] is a unified, unsupervised image fusion network that can solve multi-modal, multi-exposure, and multi-focus image fusion tasks. NestFuse [16] is a multi-scale fusion network based on an attention fusion strategy. Similar multi-scale decomposition fusion networks include MDLatLRR [17] and the target-enhanced multiscale transform fusion method [18]. VIF-Net [19] is an unsupervised framework for infrared and visible image fusion. SeAFusion [20] combines high-level vision tasks with image fusion and proposes a semantic-aware image fusion framework. SDNet [21] is a general fusion framework that can be used for a variety of tasks. RFN-Nest [22] is an end-to-end, multi-scale residual fusion network. Li et al. proposed a meta learning-based deep framework for infrared and visible image fusion [23]. Li et al. proposed a multigrained attention network for fusion [24]. A latent low-rank representation method has been proposed for infrared and visible image fusion [25]. Liu et al. proposed a fusion network that combines saliency detection with a CNN [26]. However, most of the aforementioned fusion methods are developed on infrared data and do not provide fusion with brightness adjustment.
Different from them, our proposed method is designed to realize fusion on intraoperative near-infrared fluorescence imaging data under different lighting conditions. Fluorescence images show the distribution of contrast agents in tissues and contain less texture and structure information than infrared images.

According to the demands of fluorescence imaging, this paper proposes an image fusion method with adaptive brightness adjustment for fluorescence and visible images. The main contributions of this work are as follows. First, we modify the network structure of Attention Unet [28] so that it can be used to fuse intraoperative near-infrared fluorescence images (FI) and visible images (VI). To achieve that, the ABFM network is proposed. The network can be divided into three parts: an encoder that extracts multi-dimensional features of FI and VI, a fusion strategy part that adds same-scale features, and an attention-based multi-scale decoder that generates fusion images. Second, network training integrates natural images and intraoperative imaging data (including both fluorescence and visible images). Natural images with adjusted brightness are used to train the network to adapt to lighting changes while it learns feature extraction and restoration. In addition, seven groups of intraoperative imaging data are used as the internal validation dataset to further assist network training. Third, the proposed ABFM integrates four types of loss functions. By combining the network design, training strategy, and loss functions, ABFM can automatically generate fused images with adaptive brightness adjustment, avoiding the manual adjustment of brightness parameters during the fusion process. This ability meets the actual demand of surgeons using near-infrared fluorescence imaging for surgical navigation in operating rooms. Lastly, we conduct qualitative and quantitative experiments comparing ABFM with multiple existing approaches on 70 groups of intraoperative imaging data, containing near-infrared fluorescence, visible, and fused images, as the external testing dataset. The results show that ABFM achieves image fusion and brightness adjustment at the same time. The dataset used in this article is available in Ref. [27].

The remainder of the paper is organized as follows. We first introduce the methodology of the adaptive brightness fusion method. Then, the implementation details of the experiments are described, and the experimental results are analyzed with regard to perception and quantification. Clinical application of ABFM under different light conditions is also analyzed. Finally, we briefly summarize the conclusions of the study.

2. Methods

Near-infrared fluorescence imaging generally adopts a hardware-based spectroscopic optical system design, as shown in Fig. 1. The collected optical beams are transmitted separately to different charge-coupled device (CCD) cameras and then filtered to collect the fluorescence signals in a specific spectrum. The beam-splitting design realizes the pre-registration of the fluorescence and visible images. To clearly present the in vivo distribution of the fluorescence contrast agent, the fusion algorithm in this paper follows the fluorescence imaging convention of green pseudo-color; the fluorescence images are therefore preprocessed to achieve a green imaging effect (shown in Fig. 1). In general, the imaging equipment obtains overlay images through transparency processing and superposition, and ground truth (GT) images are then obtained after manual adjustment of the imaging parameters. The ABFM is proposed to omit the manual adjustment step and make the output fusion image close to the GT, which requires the fusion algorithm to adjust brightness adaptively. For this purpose, a training dataset is created from natural images with different brightness to realize adaptive brightness fusion, and a loss function is designed to constrain the network parameters effectively during training. In the following subsections, the network model, training strategy and loss function of the adaptive brightness fusion method are described in detail.

Fig. 1. Schematic diagram of near-infrared fluorescence imaging based on optical beam splitting. The light signal is divided by the splitting unit, the fluorescence signal is transmitted to the NIR camera, and the white light signal is transmitted to the visible light camera. The equipment obtains overlay images through transparency processing and superposition, and finally obtains GT images after manual adjustment of imaging parameters.

2.1 Adaptive brightness fusion network

The brightness adaptive fusion network in this paper follows the fusion approach of DenseFuse [13], which builds the network structure around learned features instead of a directly designed fusion method. In this way, the lack of intraoperative near-infrared fluorescence imaging data for training can be overcome. However, the feature extraction capability of DenseFuse needs to be enhanced [12]. The network proposed in this paper therefore adopts the Attention Unet [28] structure to extract multi-scale feature maps from the image; the Attention Unet module [29] can retain more key information in fluorescence and visible images through the addition of multi-scale features [30]. The Attention Unet network has shown its advantages regarding feature extraction and regression fitting in the field of image segmentation [28]. The attention module focuses on the key features of the image [28,31], making full use of self-similarity to capture the correlation between features across scales [32,33]. According to the fusion function, the network structure of ABFM is proposed, which includes a multi-dimensional feature extraction part, a same-scale feature addition module for feature fusion, and an attention-based multi-scale feature reconstruction part. The solid lines indicate the data transmission process during training, and the solid and dashed lines together indicate the data fusion process during validation and testing (shown in Fig. 2). The ABFM network approximately follows an encoder-decoder architecture: the encoder part is used for feature extraction and the decoder part for feature reconstruction. The numbers subscripted at the output of each layer indicate the channels of the feature maps. The network includes an addition module for multi-dimensional feature fusion between images, and an attention module is integrated into each scale of the decoder.

Fig. 2. Network structure of adaptive brightness fusion method, including encoder module for feature extraction, decoder module for fusion image generation, addition module (yellow filled triangle and rectangle) and attention module (red filled circle) to each layer.

More details of the network structure and training strategy are shown in Fig. 3. The overall structure and function of the network can be summarized into three parts: multi-dimensional feature extraction, the same-scale feature addition module, and attention-based feature reconstruction. When training the network, we focus on the feature extraction of images with different brightness, so that the network adapts to the influence of brightness changes. Therefore, the feature addition module is discarded during training. The original high-resolution natural images are sent to the encoder part after random brightness adjustment. The input is divided into four scales by max pooling, and image features are extracted at each scale. The multi-dimensional features are then sent to the corresponding up-sample blocks. Before each up-sample block, an attention module calculates feature weights to highlight the key features at that image scale. Thus, important features are preserved and the influence of brightness changes on the image is reduced, so the brightness of the generated images approximates that of the original images.
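
For clarity, a minimal PyTorch sketch of this data flow is given below. The `encoder` and `decoder` submodules, their interfaces, and the list-of-feature-maps convention are illustrative assumptions and do not reproduce the released implementation.

```python
import torch.nn as nn

class ABFMSketch(nn.Module):
    """High-level sketch of the ABFM data flow in Figs. 2-3 (illustrative only).

    `encoder` is assumed to return a list of four multi-scale feature maps;
    `decoder` is assumed to apply the attention gates and up-sample blocks to
    reconstruct an image from such a list. Channel widths are omitted.
    """
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, img):
        # Training: one brightness-adjusted natural image in, restored image out.
        # The addition module is not used at this stage.
        return self.decoder(self.encoder(img))

    def fuse(self, vi, fi):
        # Validation/testing: extract features from VI and FI separately,
        # average the same-scale feature maps (formula (1)), then reconstruct
        # the fused image with the attention-based decoder.
        fused = [0.5 * (x_vi + x_fi)
                 for x_vi, x_fi in zip(self.encoder(vi), self.encoder(fi))]
        return self.decoder(fused)
```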

Fig. 3. Network structure details and training, validation and testing strategy.

In particular, we add an exposure loss to the loss function design, which has been proven to realize the adjustment of image brightness [34]. Combined with the other loss functions, the trained network achieves better results. Retaining the attention module during training effectively optimizes the feature extraction at each image scale. During network design, we found that a fusion strategy that averages the features at each scale is already sufficient to fuse fluorescence and visible images.

Because the implementation of ABFM focuses on adaptive brightness adjustment, our design differs from NestFuse [16], where the attention module is used in the fusion strategy: in our method, the attention module is used in the decoder part for image reconstruction. Since the attention module participates in the training process, the network becomes more adaptable to image brightness adjustment. The addition modules are used only during validation (network parameter adjustment) and testing, as shown on the right in Fig. 3. The validation strategy is designed to make the network adapt to the characteristics of fluorescence images. After each epoch of network training, we use 7 groups of fluorescence imaging data to calculate the evaluation metrics and adjust the weights of each loss function to gradually optimize the fusion effect. Finally, the network model is fixed, and another 70 groups of fluorescence imaging data are used to test the fusion results.

The detailed structure of the addition and attention modules, and the connection between them, is shown in Fig. 4. The addition module performs calculations only during network validation and testing to achieve image fusion (as shown in formula (1)). It averages the feature maps of each layer and then sends the averaged feature maps to the decoder. Formula (1) is given as follows:

$$\begin{aligned} {f_1} &= \frac{1}{2}({x_{{1_{VI}}}} + {x_{{1_{FI}}}})\\ {f_2} &= \frac{1}{2}({x_{{2_{VI}}}} + {x_{{2_{FI}}}})\\ {f_3} &= \frac{1}{2}({x_{{3_{VI}}}} + {x_{{3_{FI}}}})\\ {f_4} &= \frac{1}{2}({x_{{4_{VI}}}} + {x_{{4_{FI}}}}) \end{aligned}$$
where ${x_{{1_{VI}}}}$-${x_{{4_{VI}}}}$ are the feature maps output by each layer for the visible images (VI) after convolution, and ${x_{{1_{FI}}}}$-${x_{{4_{FI}}}}$ are the feature maps calculated from the fluorescence images (FI). After the averaging operation, the fused feature maps ${f_1}$-${f_4}$ are obtained.

Fig. 4. Structure of attention and addition module.

The attention module follows the attention gate of [28]. Each attention gate learns to focus on a subset of target structures, so that salient features can be more accurately incorporated for image fusion. It is formulated as follows [28]:

$$q_{att}^l = {\psi ^T}({\sigma _1}(W_x^Tx_i^l + W_g^T{g_i} + {b_g})) + {b_\psi }$$
$$\alpha _i^l = {\sigma _2}(q_{att}^l(x_i^l,{g_i};{\Theta _{att}}))$$
where $x_i^l$ and ${g_i}$ are the feature maps sent to the module; ${W_x} \in {R^{{F_l} \times {F_{{\mathop{\rm int}} }}}}$, ${W_g} \in {R^{{F_g} \times {F_{{\mathop{\rm int}} }}}}$ and $\psi \in {R^{{F_{{\mathop{\rm int}} }} \times 1}}$ represent the convolution operation, and they are used for linear transformation of vectors. ${b_\psi } \in R$ and ${b_g} \in {R^{{F_{{\mathop{\rm int}} }}}}$ are bias terms, ${\sigma _1}$ represents the ReLU module, and ${\sigma _2}({x_{i,c}}) = \frac{1}{{1 + \exp ( - {x_{i,c}})}}$ is the sigmoid activation function. The mechanism is to learn the salient features of the image and obtain the weight as an attention map to act on the image.
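
As an illustration of formulas (2)-(3), a minimal PyTorch sketch of such an attention gate is shown below. It assumes that the feature map $x$ and the gating signal $g$ have already been brought to the same spatial size, and the channel counts F_l, F_g, F_int are placeholders rather than the values used in ABFM.

```python
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate following formulas (2)-(3) and Ref. [28]."""
    def __init__(self, F_l, F_g, F_int):
        super().__init__()
        self.W_x = nn.Conv2d(F_l, F_int, kernel_size=1, bias=False)  # W_x^T x
        self.W_g = nn.Conv2d(F_g, F_int, kernel_size=1, bias=True)   # W_g^T g + b_g
        self.psi = nn.Conv2d(F_int, 1, kernel_size=1, bias=True)     # psi^T(.) + b_psi
        self.relu = nn.ReLU(inplace=True)    # sigma_1
        self.sigmoid = nn.Sigmoid()          # sigma_2

    def forward(self, x, g):
        q_att = self.psi(self.relu(self.W_x(x) + self.W_g(g)))  # formula (2)
        alpha = self.sigmoid(q_att)                              # formula (3), attention map in [0, 1]
        return x * alpha                                         # weight the features
```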

2.2 Loss functions

In network training, four types of loss are designed for regression fitting, including pixel loss, perceptual loss, structural loss and exposure loss. Among them, pixel loss is used to narrow the distance between the output image and the reference image at the pixel level, including mean square error loss (Mse_loss) and SmoothL1_loss [35], as shown in formulas (4) and (5):

$${L_{mse}}(x,y) = {(x - y)^2}$$
$${L_{SmoothL1}}(x,y) = \left\{ \begin{array}{l} 0.5{(x - y)^2}\quad \quad \;\;\,{\kern 1pt} |{x - y} |< 1\\ |{x - y} |- 0.5\quad \quad (x - y) < - 1\;or\;(x - y) > 1 \end{array} \right.$$
where y is the output image, and x is the reference image, which is usually used as the ground truth (GT).
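
A minimal PyTorch sketch of these two pixel-level terms is given below; it simply wraps the built-in losses, which match formulas (4)-(5) under their default settings, and is not taken from the released code.

```python
import torch.nn.functional as F

def pixel_losses(output, target):
    """Pixel-level terms of formulas (4)-(5)."""
    mse = F.mse_loss(output, target)              # mean of (x - y)^2
    smooth_l1 = F.smooth_l1_loss(output, target)  # 0.5(x-y)^2 if |x-y| < 1, else |x-y| - 0.5
    return mse, smooth_l1
```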

Perceptual loss is used to bring the output image close to the reference image in terms of perceptual characteristics, and includes Perceptual_loss [36] and Style_loss [37]. Both are computed on features extracted by the pre-trained VGG19 network (by the Visual Geometry Group (VGG)) [38], as shown in formulas (6) and (7), respectively:

$${l_{VGG(i,j)}} = \frac{1}{{{W_{i,j}}{H_{i,j}}}}\sum\limits_{x = 1}^{{W_{i,j}}} {\sum\limits_{y = 1}^{{H_{i,j}}} {{{({\phi _{i,j}}{{({I_{target}})}_{x,y}} - {\phi _{i,j}}{{({I_{output}})}_{x,y}})}^2}} }$$
where ${W_{i,j}}$, ${H_{i,j}}$ are the dimension parameters of feature maps. ${\phi _{i,j}}$ is the feature map obtained by the jth convolution after activation and before the ith max pooling layer within the VGG19 [36]. Then, we compare the Euclidean distance between the features of the generated image and the target image as perceptual loss.
$${L_{style}} = G_{i,j}^l = \sum\nolimits_k {F_{i,k}^lF_{j,k}^l}$$

For the Style_loss presented in formula (7), $F_{i,k}^l$ and $F_{j,k}^l$ are flattened feature vectors of the lth layer, and G stands for the Gram matrix, whose entries are the inner products of pairs of feature vectors. The style loss compares the Gram matrices of the output image and the target image.
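
A sketch of the Gram-matrix computation on a batch of VGG19 feature maps is given below. The normalization by the feature dimensions and the mean-square comparison of the two Gram matrices are common implementation choices assumed here, since the paper only states formula (7).

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a feature map (formula (7)): inner products of flattened channels."""
    n, c, h, w = feat.shape
    flat = feat.view(n, c, h * w)                         # F_{i,k}: channel i, position k
    return torch.bmm(flat, flat.transpose(1, 2)) / (c * h * w)

def style_loss(feat_output, feat_target):
    """Compare the Gram matrices of the output and target VGG19 features."""
    return F.mse_loss(gram_matrix(feat_output), gram_matrix(feat_target))
```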

Structural loss is used to reduce structural error between images, including SSIM_loss (structural similarity index (SSIM)) [39], as shown in formula (8):

$$SSIM(x,y) = \frac{{2{\mu _x}{\mu _y} + {C_1}}}{{\mu _x^2 + \mu _y^2 + {C_1}}} \times \frac{{2{\sigma _{xy}} + {C_2}}}{{\sigma _x^2 + \sigma _y^2 + {C_2}}}$$
where ${\mu _x}$ and ${\mu _y}$ are the average values of image pixels, $\sigma _x^2$, $\sigma _y^2$ and ${\sigma _{xy}}$ are the variances and covariance. The SSIM includes the product of three parts, i.e., the image illumination comparison, the image contrast comparison and the image structure comparison. The value of SSIM is between 0 and 1; when it is equal to 1, the structure is consistent.

The exposure loss [34] is expected to control the exposure level of the image and suppress under- or over-exposure by measuring the distance between the average light intensity of local areas and the exposure value E. Based on previous studies [34,40,41], we set E to 0.6, and formula (9) is shown below:

$${L_{\exp }} = \frac{1}{M}\sum\nolimits_{k = 1}^M {|{{Y_k} - E} |}$$
where M represents the number of non-overlapping image regions of size $16 \times 16$, and ${Y_k}$ represents the average intensity of the kth region of the output image.
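
A minimal PyTorch sketch of formula (9) is shown below; averaging the RGB channels to obtain the local intensity and assuming inputs scaled to [0, 1] are implementation choices, not details given in the paper.

```python
import torch
import torch.nn.functional as F

def exposure_loss(output, E=0.6, patch=16):
    """Exposure loss of formula (9): mean |Y_k - E| over non-overlapping 16x16 regions."""
    gray = output.mean(dim=1, keepdim=True)                  # per-pixel intensity, (N, 1, H, W)
    y = F.avg_pool2d(gray, kernel_size=patch, stride=patch)  # Y_k for each region
    return torch.mean(torch.abs(y - E))
```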

Therefore, the total training loss can be expressed as:

$$\begin{aligned} {L_{total}} &= {W_{mse}}{L_{mse}} + {W_{smt}}{L_{SmoothL1}} + {W_{stl}}{L_{style}}\\ & + {W_{per}}{L_{perception}} + {W_{sim}}{L_{ssim}} + {W_{\exp}}{L_{exposure}} \end{aligned}$$
where ${W_{mse}}$, ${W_{smt}}$, ${W_{stl}}$, ${W_{per}}$, ${W_{sim}}$ and ${W_{\exp}}$ are the weights of the losses.
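
For illustration, the weighted combination of formula (10) can be written as below, using the weight values reported later in Section 3.1; the dictionary layout is an assumption for readability.

```python
# Loss weights as reported in the experimental settings (Section 3.1).
WEIGHTS = {"mse": 10.0, "smt": 10.0, "stl": 0.05, "per": 0.76, "sim": 13.4, "exp": 10.0}

def total_loss(terms, weights=WEIGHTS):
    """Weighted sum of formula (10); `terms` maps the same keys to scalar loss tensors."""
    return sum(weights[k] * terms[k] for k in weights)
```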

2.3 Implementation details and evaluation methods

2.3.1 Implementation details

We implemented the proposed ABFM with PyTorch 1.7.0 under Python 3.8. The code was run on a GPU (NVIDIA GeForce RTX 3080, 10 GB) and a CPU (Intel Core i7-10700K @ 3.80 GHz).

2.3.2 Evaluation methods

The entropy (EN) [42], standard deviation (SD) [15], root mean square error (RMSE) [43], peak signal-to-noise ratio (PSNR) [44], SSIM [39], Brightness and Contrast [39] are used to compare the proposed method with five other fusion algorithms.

EN is used in this paper to measure the amount of information contained in the fusion image, and is defined as follows:

$$EN = - \sum\limits_{l = 0}^{L - 1} {{p_l}{{\log }_2}{p_l}}$$
where L is the number of gray levels and ${p_l}$ is the normalized histogram value at gray level l of the fused image. The larger the EN, the more information the fused image contains [15].
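
A short NumPy sketch of formula (11) for an 8-bit grayscale fused image is given below; the choice of 256 gray levels and the prior conversion to grayscale are assumptions.

```python
import numpy as np

def entropy(img, levels=256):
    """Shannon entropy (formula (11)) of a grayscale image with `levels` gray levels."""
    hist, _ = np.histogram(img.ravel(), bins=levels, range=(0, levels))
    p = hist.astype(np.float64) / hist.sum()
    p = p[p > 0]                       # drop empty bins so log2 is defined
    return -np.sum(p * np.log2(p))
```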

SD reflects the deviation of the pixel from the average and can be defined as follows:

$$SD = \sqrt {\sum\limits_{i = 1}^M {\sum\limits_{j = 1}^N {{{(F(i,j) - \mu )}^2}} } }$$
where M × N is the size of the fusion image F and $\mu$ is the mean value of F. The higher the SD, the better the visual effect.

RMSE is an evaluation metric for image quality based on pixel error. It is used to measure the difference between the fusion image x and ground truth y, as shown in formula (13).

$$RMSE(y,x) = \sqrt {\frac{1}{{M \times N}}\sum\limits_{i = 1}^M {\sum\limits_{j = 1}^N {{{(y(i,j) - x(i,j))}^2}} } }$$
where M and N represent the length and width of the image, respectively. The smaller the RMSE, the better the quality of the fused image.

PSNR is shown in formulas (14) and (15). It is used to measure the ratio between image information and noise. The larger the PSNR value, the better the quality of the fused image.

$$MSE = \frac{1}{{MN}}\sum\limits_{i = 1}^M {\sum\limits_{j = 1}^N {{{(y(i,j) - x(i,j))}^2}} }$$
$$PSNR = 10 \times {\log _{10}}(\frac{1}{{MSE}})$$
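
The two error-based metrics can be computed as in the sketch below, assuming images normalized to [0, 1] so that the peak value in formula (15) is 1.

```python
import numpy as np

def rmse(gt, fused):
    """Formula (13): root mean square error between ground truth and fused image."""
    return np.sqrt(np.mean((gt.astype(np.float64) - fused.astype(np.float64)) ** 2))

def psnr(gt, fused):
    """Formulas (14)-(15): PSNR for images normalized to [0, 1]."""
    mse = np.mean((gt.astype(np.float64) - fused.astype(np.float64)) ** 2)
    return 10.0 * np.log10(1.0 / mse)
```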

Brightness and contrast are the luminance and contrast components separated from SSIM [39]. We use the brightness and contrast similarity separately to evaluate the fusion performance; they are defined in formulas (16) and (17):

$$Brightness(x,y) = \frac{{2{\mu _x}{\mu _y} + {C_1}}}{{\mu _x^2 + \mu _y^2 + {C_1}}}$$
$$Contrast(x,y) = \frac{{2{\sigma _x}{\sigma _y} + {C_2}}}{{\sigma _x^2 + \sigma _y^2 + {C_2}}}$$
where ${\mu _x}$ and ${\mu _y}$ are the average values of image pixels, ${\sigma _x}$ and ${\sigma _y}$ are the standard deviation. Both of them are between 0 and 1. The closer they are to 1, the more similar they are to the ground truth in brightness and contrast.

The Bayesian Fusion method (as the traditional one) is implemented in Matlab and the running code is available [45]. The DCNN (as one of the deep learning fusion methods) is implemented in PyTorch [46]. The PyTorch version of DenseFuse is available [47]. U2Fusion is implemented in PyTorch [48]. NestFuse is implemented in PyTorch [49] with the max fusion strategy. Because the original fusion codes operate on 1-channel images, we adjusted the data input to 3 channels and retrained the networks. For ABFM development, natural images are used for training, and intraoperative fluorescence and visible images are used for validation and testing. Intraoperative fluorescence imaging data are obtained using the FLARE R1 system (Curadel, LLC, 11 Erie Drive, Natick, MA 01760, USA). The device has a color video camera and two near-infrared cameras, which can collect 700 nm and 800 nm emission signals [50]. The 400-650 nm and 689-725 nm bandpass signals are used in the experiment. During device operation, the fusion images obtained after manually adjusting brightness and contrast are regarded as the ground truth to evaluate the fusion results of the above methods.

3. Experiments and results

3.1 Dataset preparation and experimental settings

Dataset preparation. This paper adopts a fusion method based on feature extraction and addition, i.e., when training the network, the focus is on feature extraction and image restoration from the feature maps. When image fusion is needed, fluorescence and visible images are used simultaneously as inputs to extract features, and image fusion is realized by the addition of feature maps. Therefore, different schemes are adopted for the training, validation and test datasets. The training dataset of our method is the natural image set of VOC2012 [51], which contains 16700 natural images with rich textures. During training, the addition module is ignored, and we only perform feature extraction and image restoration. To increase the dynamic range of brightness adjustment, our method uses preprocessed images with different intensities for training the network. Specifically, the preprocessing step consists of converting the RGB image to the hue, lightness, saturation (HLS) representation, separating the lightness component, and multiplying the lightness component by a random brightness coefficient. This random procedure ensures the brightness inconsistency of the training dataset. As for the brightness coefficient, the dynamic range set in this paper is [0.5, 1.5], a preliminary setting based on the clinical imaging effect. In the training procedure, the brightness-adjusted image and the original image are used as inputs in pairs and cropped to a size of 60 × 60, so that the image processing procedure is learnt from deviated brightness to moderate brightness.
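
A minimal OpenCV sketch of this brightness augmentation is shown below; the clipping to the valid range and the uint8 round-trip are implementation assumptions.

```python
import cv2
import numpy as np

def random_brightness(img_bgr, low=0.5, high=1.5):
    """Brightness augmentation for training pairs: scale the HLS lightness channel
    by a random coefficient in [0.5, 1.5] and convert back to BGR. The original
    image is kept as the training target."""
    hls = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HLS).astype(np.float32)
    coeff = np.random.uniform(low, high)
    hls[:, :, 1] = np.clip(hls[:, :, 1] * coeff, 0, 255)  # L (lightness) channel
    return cv2.cvtColor(hls.astype(np.uint8), cv2.COLOR_HLS2BGR)
```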

To evaluate the fusion effect, intraoperative fluorescence imaging data are collected and used. Our validation and test datasets are composed of near-infrared fluorescence imaging data of breast cancer lymphatic vessels and breast tumors. From 25 surgical videos recorded in real time, 77 groups of images with significantly different feature distributions are extracted at a frame size of 1024 × 1024. Among them, seven groups form the internal validation dataset to assist network training, and the other 70 groups are used as the external test dataset. Each group includes fluorescence, visible, and fusion images. The fluorescence image is black and white, with white representing the fluorescence signal. To improve the recognition of fluorescence signals, green pseudo-color is added to the fluorescence images, which are then used in the fusion step. The pseudo-color procedure sets the R and B channels of the RGB-format fluorescence image to zero. In clinical applications, the brightness and contrast parameters of the output fusion images need to be manually adjusted to maintain clear identification of the tissue under different lighting conditions. When testing the method, we use these manually adjusted fusion images as the ground truth to analyze the output of the ABFM. The intraoperative near-infrared (NIR) fluorescence imaging videos were acquired by the Wuxi People's Hospital (clinical trial number: ChiCTR1900020801 in http://www.chictr.org.cn/). The Institutional Review Board of the hospital approved the imaging study (KS2019004), and all patients provided informed consent.
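
The pseudo-color step can be sketched as below for a single-channel fluorescence frame; the array layout (H × W, uint8) is assumed.

```python
import numpy as np

def green_pseudo_color(fluo_gray):
    """Green pseudo-color: keep the fluorescence signal in the G channel and
    set the R and B channels to zero, as described in the text."""
    rgb = np.zeros((*fluo_gray.shape, 3), dtype=fluo_gray.dtype)
    rgb[:, :, 1] = fluo_gray
    return rgb
```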

Experimental settings. During training, the loss weights are set as follows: ${W_{mse}} = 10$, ${W_{smt}} = 10$, ${W_{stl}} = 0.05$, ${W_{per}} = 0.76$, ${W_{sim}} = 13.4$, ${W_{\exp}} = 10$, and the batch size is set to 47. To keep the results reproducible, the random seed is set to 970. The Adam optimizer is used to optimize the network parameters, with the learning rate set to $10^{-3}$. To prevent the network from overfitting to natural image features, we train for only one epoch on natural images. Then, PSNR and SSIM are calculated on the validation dataset to further optimize the network parameters. Finally, the set of parameters with the best results is selected for the final network.
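
For completeness, the optimizer setup described above can be sketched as follows; `model` stands for the ABFM network and is assumed to be defined elsewhere.

```python
import torch

def make_optimizer(model):
    """Training setup from the experimental settings: fixed seed 970, Adam, lr = 1e-3."""
    torch.manual_seed(970)
    return torch.optim.Adam(model.parameters(), lr=1e-3)
```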

3.2 Ablation Study

To explore the influence of the different losses on the fusion effect, an ablation study is carried out. We remove each loss in turn and retrain the network, keeping the remaining loss weights at the previous settings. After training, a subset of the fluorescence and visible images is used for testing. The test results are shown in Fig. 5.

Fig. 5. Experimental results of ABFM under different loss ablation. Columns 1 and 2 exhibit 4 groups of fluorescence images with pseudo-color and the corresponding visible images. The last column shows the manually adjusted fusion images as the ground truth and column 9 is the ABFM results with all losses included.

From left to right in Fig. 5, columns 3 to 8 show the results obtained without the Exposure loss, Mse loss, SmoothL1 loss, Perception loss, SSIM loss and Style loss, respectively. The first two columns exhibit the fluorescence and visible images, and the last two columns show the ABFM results with the complete loss functions and the ground truth for comparison.

Contribution of Each Loss. The test results show that after excluding the Exposure loss, the image has a brightness deviation: for low-light input images, the output fusion image is also darker. The Exposure loss therefore constrains the brightness of the output image to a constant interval. After excluding the Mse loss or SmoothL1 loss, the probability of introducing noise increases; we speculate that these losses act on image pixels to reduce noise generation. In the absence of the Perception loss, the fluorescence sharpness is reduced, so the Perception loss plays a major role in increasing the sharpness of image details and textures. After the SSIM loss is removed, the fused image shows a white-fog effect, which reduces the visibility of the structural details; the SSIM loss thus has a greater effect on the restoration of the image structure. After the Style loss is removed, the fused image shows a degree of color deviation.

3.3 Intraoperative dataset evaluation

To test the proposed method, 70 groups of intraoperative breast cancer fluorescence imaging data are used in this paper. The test results are compared with those of the five other fusion methods mentioned previously.

Visual and Perceptual Comparisons. The network can be divided into encoder and decoder parts according to function. We add pseudo-color to the fluorescence images and then input them to the encoder part of the network together with the visible images. The feature maps of the two images are obtained separately and added through the addition module; the fused feature maps are then fed to the attention-based decoder module to restore the fusion image. Part of the fusion results is shown in Fig. 6. The first row is the fluorescence image with added pseudo-color, the second row is the visible image, and the third row is the overlay image, which simulates the imaging device process of transparency processing and superposition. Rows 4 to 8 are the fusion results of DenseFuse, DCNN, Bayesian Fusion, U2Fusion and NestFuse, respectively. The penultimate row is the fusion image of our method, and the last row is the ground truth, i.e., the fusion image after manually adjusting brightness and contrast.

Fig. 6. Fusion results of the proposed method and comparison methods. Rows 1 and 2 are the fluorescence and visible images; row 3 shows overlay images obtained directly by transparency processing and superposition (the device fusion method). Rows 4-8 are the fusion results of DenseFuse, DCNN, Bayesian, U2Fusion and NestFuse, respectively. The penultimate row shows the ABFM results (our method) and the last row is the ground truth (manually adjusted imaging parameters).

The comparison shows that the proposed ABFM has better performance in terms of adaptive brightness adjustment. Under low illumination conditions, fusion images from our method have higher brightness than the original overlay image and the images produced by DenseFuse, DCNN, Bayesian Fusion, U2Fusion and NestFuse. Under normal brightness conditions, our method still shows a stable fusion result with moderate lightness. Furthermore, under over-exposure conditions, as shown in the fifth group of Fig. 6 (column 5), the brightness of the over-exposed area is also reduced.

Quantitative Comparisons. We perform a quantitative analysis on the results of the above fusion methods. Our example is the third group of the fusion results in Fig. 6 (column 3). We draw a red line at the same position of the fusion images, obtain the intensity change curves and display them all in a single graph. Then, we compare the intensity difference between the fusion images of our method (ABFM), existing methods and the ground truth. As shown in Fig. 7, the intensity curve that is quantified by the ABFM (blue line) is closer to the intensity curve of the ground truth (red line), compared to the other five methods. This shows that our method has a better brightness adaptation effect under a low-light fusion condition.
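
The intensity-curve comparison of Fig. 7 can be reproduced with a short sketch such as the one below; the grayscale conversion beforehand and the use of matplotlib are assumptions about the analysis, not part of the paper's code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_line_profiles(images, labels, row):
    """Plot intensity along the same image row for several fusion results (cf. Fig. 7).

    `images` is a list of grayscale arrays of identical size and `row` is the
    index of the horizontal line drawn on each image."""
    for img, label in zip(images, labels):
        plt.plot(np.asarray(img)[row, :], label=label)
    plt.xlabel("Pixel position along the line")
    plt.ylabel("Intensity")
    plt.legend()
    plt.show()
```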

Fig. 7. Comparison of image intensity curves of different fusion methods, including the ground truth, ABFM, DenseFuse, DCNN, Bayesian, U2Fusion and NestFuse. The lines are drawn at the same position (red horizontal lines in the seven images), and the intensity curves are drawn in a single graph.

Further, the EN, SD, RMSE, PSNR, SSIM, Brightness and Contrast are calculated to evaluate the fusion results of different algorithms. The test dataset consists of 70 groups of intraoperative fluorescence and visible images. We calculate the aforementioned metrics between the fusion results and the ground truth, respectively. EN and SD are calculated only on the output fusion results. As shown in Table 1, the statistical results of the above metrics are used for comparison purposes.

Table 1. Statistics of Fusion Results Using Different Methods

It can be seen from Table 1 that the proposed ABFM has better performance, mainly because it can adjust the brightness of the fused image adaptively, which makes the fusion result closer to the reference acquired from manual adjustment. However, the SSIM of ABFM (0.7491) is lower than that of NestFuse (0.7664). This indicates that NestFuse preserves more structural information and is closer to the visible images, but it does not interpret the fluorescence information well; the contrast of the fluorescence signal can even be low with NestFuse, which is not conducive to identification, as shown in the first group of images (column 1 in Fig. 6). Besides, with the ABFM method the Contrast value (0.9503) is not as high as the Brightness value (0.9775), indicating that compared with brightness adjustment, the proposed method has a limited effect on contrast adjustment. These defects need further optimization in future studies.

Further, we compare the running time of the above algorithms. Because the original codes of DCNN and Bayesian run on the CPU and do not use GPU acceleration, these two methods are not included in the comparison. The remaining four methods are tested on 70 image groups of 1024 × 1024 pixels. The operation time includes image reading, model loading, and image generation. The average time (s) per image is reported in Table 2. From the statistical results, it can be seen that DenseFuse, with a relatively simple network, takes less time. NestFuse is slow due to its dense network connections and many parameters. The average processing time of the proposed ABFM is 0.349 s, and further optimization is needed to shorten it.

Table 2. Statistics of Processing Time Using Different Methods

4. Analysis of clinical application in different light conditions

Near-infrared fluorescence imaging is an intraoperative imaging technique in which a contrast agent accumulates in specific tissue. Fluorescence signals in the near-infrared spectrum are excited by laser irradiation and then collected to achieve imaging, thereby realizing real-time visualization of specific tissues. In clinical use of the device, the fluorescence and visible images in the same field of view need to be fused to localize the fluorescent tissue. The fusion method applied on the device superimposes the fluorescence image with green pseudo-color on the visible light image after transparency processing (the parameter is usually 50%). Overlay images simulating this procedure are shown in Fig. 8. Because the effect of this fusion method is limited, it is often necessary to manually adjust parameters such as exposure and contrast of the overlay image to better distinguish and locate the fluorescent tissue. However, manual adjustment increases the operation time and reduces the convenience of using the equipment, and in some cases it still may not achieve a better fusion effect.

Fig. 8. Comparison of fusion results and image intensity curves under different illumination and fluorescence intensities, obtained by the equipment (Overlay), the fusion method proposed in this paper (ABFM) and further optimization by manual adjustment (GT). The brightness evaluation curves are obtained by quantifying the dashed yellow lines shown in the GT of the 4 groups (A, B, C, D), respectively.

In the following, the function of ABFM in adaptive brightness adjustment is analyzed in combination with several situations in clinical application. Examples of several different illumination and fluorescence intensity situations are shown in Fig. 8.

Figure 8 shows four cases: low ambient light, normal ambient light, low ambient light with a weak fluorescence signal, and normal ambient light with a strong fluorescence signal.

Low ambient light. When the ambient light is weak, the captured VI shown in row 1 of Fig. 8 is dark. The FI with green pseudo-color is shown in row 1, column 2, and the overlay image, i.e., the initial imaging result without manual adjustment, is shown in row 1, column 3. It can be seen that the fluorescent tissue is difficult to identify in the overlay image because of the low ambient light. After manual adjustment, a better result is obtained and used as the ground truth (row 1, column 5 in Fig. 8). Feeding VI and FI into the ABFM algorithm yields the output image in column 4 of Fig. 8. We draw a yellow line at the same position in the Overlay, ABFM and GT images, obtain the intensity change curves, and display them in a single graph. After the fusion processing of ABFM, the output is close to the manual adjustment result: the overall brightness is improved and the contrast of the fluorescence is raised, thereby improving the recognizability of the fluorescent tissue.

Normal ambient light. When the ambient light is sufficient and the fluorescence intensity in the FI image is moderate, the fluorescent tissue can already be identified in the overlay image. On this basis, further manual adjustment of the exposure parameter improves the overall brightness and yields the GT image. The ABFM algorithm can still adjust the brightness adaptively under normal ambient light to achieve a better fusion effect. Quantitative analysis yields the brightness evaluation curves, where the triangles mark the fluorescence positions along the yellow line. It can be seen that ABFM improves the brightness and highlights the fluorescent tissues compared with the Overlay.

Low ambient light and weak fluorescence signal. The third row of Fig. 8 shows the case of low ambient light and weak fluorescence. The overlay image is affected by the light and fluorescence intensity, making it difficult to observe the tissue and fluorescence, so the parameters must be adjusted manually to obtain a better fusion image (GT) in which the fluorescent tissue can be observed. Using ABFM, a fusion image with adaptive brightness adjustment is obtained directly, allowing identification of the fluorescent tissue. The brightness evaluation curves also show that the intensity of ABFM along the yellow line is close to the GT curve, and both are better than the overlay image.

Strong fluorescence signal. The fourth row of Fig. 8 shows a group with strong fluorescence and normal ambient light. The fluorescent tissue can be observed in the overlay image; however, because the overlay involves 50% transparency processing, it is slightly dark. After adjusting the exposure parameter, the fluorescence in the GT is over-exposed. In contrast, ABFM improves the overall brightness while preserving the characteristics of the fluorescence image and avoiding over-exposure. The brightness evaluation curves also show that the curve inside the circle of the GT deviates from the curve of the original overlay image, while the ABFM curve is a proportional increase of the overlay curve and retains the original details.

In summary, ABFM can fuse FI and VI with adaptive brightness, avoiding manual parameter adjustment, and its fusion effect approaches the GT. In dark environments in particular, it improves the recognition of fluorescent tissue. At the same time, for strong fluorescence imaging conditions, ABFM conveniently improves the imaging brightness while reducing the possibility of over-exposure.

5. Conclusions

This paper proposes a fusion method (ABFM) that can adjust brightness adaptively. The method is based on the Attention Unet structure. The addition module is added hierarchically to achieve multi-scale feature map extraction and addition, which better supports the task of brightness adaptive fusion. Network training only requires learning the parameters of the encoder module for feature extraction and the decoder module for image restoration from the feature maps. To enhance the ability of the network to process images with different brightness, an attention module is placed on each scale layer of the decoder to realize brightness adjustment during image reconstruction. A natural image dataset preprocessed with brightness adjustment is used for training, which effectively alleviates the shortage of intraoperative fluorescence images. Seven and 70 groups of fluorescence and visible images from intraoperative breast cancer surgery are used for internal validation and external testing, respectively. Taking the fusion images with manually adjusted brightness and contrast as the references, the experimental results show that our method has more flexible adaptive brightness adjustment than DenseFuse, DCNN, U2Fusion, NestFuse and Bayesian Fusion. The ABFM enhances the brightness of low-illumination images and adjusts the brightness of over-exposed images, but it also reduces the brightness of strong fluorescence images, which leads to a certain degree of contrast reduction in the fused images. In future research on the fusion of intraoperative fluorescence imaging, the various imaging problems that may arise will be considered comprehensively. In particular, contrast adjustment can be taken into account in the design of the image fusion algorithm to further improve image quality. The structure of ABFM can be further adjusted to reduce computational complexity and processing time. In the field of near-infrared fluorescence imaging, artificial intelligence can also enable work on target tracking and large-scale image stitching. Especially for endoscopic imaging, image stitching can help doctors observe a larger surgical field. In addition, fluorescence imaging combined with 3D reconstruction of the lesion area can achieve more precise tissue positioning, which is expected to promote the development of clinical precision surgery.

Funding

Ministry of Science and Technology of China (2017YFA0205200); National Natural Science Foundation of China (62027901, 81930053); Chinese Academy of Sciences (QYZDJ-SSW-JSC005, YJKYYQ20180048); the Project of High-Level Talents Team Introduction in Zhuhai City (Zhuhai HLHPTP201703).

Acknowledgments

The authors would like to acknowledge the instrumental and technical support of Multimodal Biomedical Imaging Experimental Platform, Institute of Automation, CAS and the clinical trials support of the Wuxi People's Hospital.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Near-infrared fluorescence imaging data underlying the results presented in this paper are available in Ref. [27].

References

1. A. L. Vahrmeijer, M. Hutteman, J. R. Van Der Vorst, C. J. Van De Velde, and J. V. Frangioni, “Image-guided cancer surgery using near-infrared fluorescence,” Nat. Rev. Clin. Oncol. 10(9), 507–518 (2013). [CrossRef]  

2. Y. Mao, C. Chi, F. Yang, J. Zhou, K. He, H. Li, X. Chen, J. Ye, J. Wang, and J. Tian, “The identification of sub-centimetre nodules by near-infrared fluorescence thoracoscopic systems in pulmonary resection surgeries,” European Journal of Cardio-Thoracic Surgery 52(6), 1190–1196 (2017). [CrossRef]  

3. K. Gotoh, T. Yamada, O. Ishikawa, H. Takahashi, H. Eguchi, M. Yano, H. Ohigashi, Y. Tomita, Y. Miyamoto, and S. Imaoka, “A novel image-guided surgery of hepatocellular carcinoma by indocyanine green fluorescence imaging navigation,” J. Surg. Oncol. 100(1), 75–79 (2009). [CrossRef]  

4. G. Liberale, S. Vankerckhove, M. G. Caldon, B. Ahmed, M. Moreau, I. E. Nakadi, D. Larsimont, V. Donckier, and P. Bourgeois, “Fluorescence imaging after indocyanine green injection for detection of peritoneal metastases in patients undergoing cytoreductive surgery for peritoneal carcinomatosis from colorectal cancer,” Ann. Surg. 264(6), 1110–1115 (2016). [CrossRef]  

5. Z. Zhao, S. Xu, C. Zhang, J. Liu, and J. Zhang, “Bayesian fusion for infrared and visible images,” Signal Processing 177, 107734 (2020). [CrossRef]  

6. R. R. Zhang, A. B. Schroeder, J. J. Grudzinski, E. L. Rosenthal, J. M. Warram, A. N. Pinchuk, K. W. Eliceiri, J. S. Kuo, and J. P. Weichert, “Beyond the margins: real-time detection of cancer using targeted fluorophores,” Nat. Rev. Clin. Oncol. 14(6), 347–364 (2017). [CrossRef]  

7. J. T. Elliott, A. V. Dsouza, S. C. Davis, J. D. Olson, K. D. Paulsen, D. W. Roberts, and B. W. Pogue, “Review of fluorescence guided surgery visualization and overlay techniques,” Biomed. Opt. Express 6(10), 3765–3782 (2015). [CrossRef]  

8. J. Ma, Y. Ma, and C. Li, “Infrared and visible image fusion methods and applications: a survey,” Inform. Fusion 45, 153–178 (2019). [CrossRef]  

9. J. Wang, J. Peng, X. Feng, G. He, and J. Fan, “Fusion method for infrared and visible images by using non-negative sparse representation,” Infrared Phys. Technol. 67, 477–489 (2014). [CrossRef]  

10. D. P. Bavirisetti, G. Xiao, and G. Liu, “Multi-sensor image fusion based on fourth order partial differential equations,” in 2017 20th International Conference on Information Fusion (Fusion) (IEEE, 2017), 1–9.

11. Y. Liu, S. Liu, and Z. Wang, “A general framework for image fusion based on multi-scale transform and sparse representation,” Inform. fusion 24, 147–164 (2015). [CrossRef]  

12. H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, “U2Fusion: a unified unsupervised image fusion network,” IEEE Trans. on Pattern Analysis and Machine Intelligence (2020).

13. H. Li and X.-J. Wu, “Densefuse: A fusion approach to infrared and visible images,” IEEE Trans. on Image Process. 28(5), 2614–2623 (2019). [CrossRef]  

14. H. Li, X.-J. Wu, and J. Kittler, “Infrared and visible image fusion using a deep learning framework,” in 2018 24th International Conference on Pattern Recognition (ICPR) (IEEE, 2018), 2705–2710.

15. J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, “FusionGAN: A generative adversarial network for infrared and visible image fusion,” Inform. Fusion 48, 11–26 (2019). [CrossRef]  

16. H. Li, X.-J. Wu, and T. Durrani, “NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models,” IEEE Trans. Instrum. Meas. 69(12), 9645–9656 (2020). [CrossRef]  

17. H. Li, X.-J. Wu, and J. Kittler, “MDLatLRR: A novel decomposition method for infrared and visible image fusion,” IEEE Trans. on Image Process. 29, 4733–4746 (2020). [CrossRef]  

18. J. Chen, X. Li, L. Luo, X. Mei, and J. Ma, “Infrared and visible image fusion based on target-enhanced multiscale transform decomposition,” Inf. Sci. 508, 64–78 (2020). [CrossRef]  

19. R. Hou, D. Zhou, R. Nie, D. Liu, L. Xiong, Y. Guo, and C. Yu, “VIF-Net: an unsupervised framework for infrared and visible image fusion,” IEEE Trans. Comput. Imaging 6, 640–651 (2020). [CrossRef]  

20. L. Tang, J. Yuan, and J. Ma, “Image fusion in the loop of high-level vision tasks: a semantic-aware real-time infrared and visible image fusion network,” Inform. Fusion 82, 28–42 (2022). [CrossRef]  

21. H. Zhang and J. Ma, “SDNet: a versatile squeeze-and-decomposition network for real-time image fusion,” Int. J. Comput. Vis. 129(10), 2761–2785 (2021). [CrossRef]  

22. H. Li, X.-J. Wu, and J. Kittler, “RFN-Nest: An end-to-end residual fusion network for infrared and visible images,” Inform. Fusion 73, 72–86 (2021). [CrossRef]  

23. H. Li, Y. Cen, Y. Liu, X. Chen, and Z. Yu, “Different input resolutions and arbitrary output resolution: a meta learning-based deep framework for infrared and visible image fusion,” IEEE Trans. on Image Processing 30, 4070–4083 (2021). [CrossRef]  

24. J. Li, H. Huo, C. Li, R. Wang, C. Sui, and Z. Liu, “Multigrained attention network for infrared and visible image fusion,” IEEE Trans. Instrum. Meas. 70, 1–12 (2021). [CrossRef]  

25. H. Li and X.-J. Wu, “Infrared and visible image fusion using latent low-rank representation,” arXiv preprint arXiv:1804.08992 (2018).

26. D. Liu, D. Zhou, R. Nie, and R. Hou, “Infrared and visible image fusion based on convolutional neural network model and saliency detection via hybrid l0-l1 layer decomposition,” J. Electronic Imaging 27(6), 063036 (2018). [CrossRef]  

27. J. Tian, "ABFM dataset," Radiomics, 2022, http://www.radiomics.net.cn/owncloud/index.php/s/SyDcAEz5H380aUA.

28. O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, and B. Kainz, “Attention u-net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999 (2018).

29. Q. Yan, B. Wang, W. Zhang, C. Luo, W. Xu, Z. Xu, Y. Zhang, Q. Shi, L. Zhang, and Z. You, “An attention-guided deep neural network with multi-scale feature fusion for liver vessel segmentation,” IEEE J. Biomed. Health Inform. 25(7), 2629–2642 (2021). [CrossRef]  

30. W. Cao, M. J. Pomeroy, Y. Gao, M. A. Barish, A. F. Abbasi, P. J. Pickhardt, and Z. Liang, “Multi-scale characterizations of colon polyps via computed tomographic colonography,” Vis. Comput. Ind. Biomed. Art 2(1), 25 (2019). [CrossRef]  

31. L. Chen, R. Liu, D. Zhou, X. Yang, and Q. Zhang, “Fused behavior recognition model based on attention mechanism,” Vis. Comput. Ind. Biomed. Art 3(1), 7 (2020). [CrossRef]  

32. Y. Mei, Y. Fan, Y. Zhang, J. Yu, Y. Zhou, D. Liu, Y. Fu, T. S. Huang, and H. Shi, “Pyramid attention networks for image restoration,” arXiv preprint arXiv:2004.13824 (2020).

33. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint arXiv:1706.03762 (2017).

34. C. Guo, C. Li, J. Guo, C. C. Loy, J. Hou, S. Kwong, and R. Cong, “Zero-reference deep curve estimation for low-light image enhancement,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), 1780–1789.

35. R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, (2015), 1440–1448.

36. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, and Z. Wang, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2017), 4681–4690.

37. L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2016), 2414–2423.

38. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556 (2014).

39. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing 13(4), 600–612 (2004). [CrossRef]  

40. T. Mertens, J. Kautz, and F. Van Reeth, “Exposure fusion,” in 15th Pacific Conference on Computer Graphics and Applications (PG'07), (IEEE, 2007), 382–390.

41. T. Mertens, J. Kautz, and F. Van Reeth, “Exposure fusion: A simple and practical alternative to high dynamic range photography,” in Computer graphics forum (Wiley Online Library, 2009), 161–171.

42. J. W. Roberts, J. A. Van Aardt, and F. B. Ahmed, “Assessment of image fusion procedures using entropy, image quality, and multispectral classification,” J. Appl. Remote Sens 2(1), 023522 (2008). [CrossRef]  

43. J.-B. Martens and L. Meesters, “Image dissimilarity,” Signal processing 70(3), 155–176 (1998). [CrossRef]  

44. M. Mathieu, C. Couprie, and Y. LeCun, “Deep multi-scale video prediction beyond mean square error,” arXiv preprint arXiv:1511.05440 (2015).

45. Bayesian Fusion, running code available at https://github.com/Zhaozixiang1228/Bayesian-Fusion.

46. DCNN, running code available at https://github.com/GrimReaperSam/imagefusion_pytorch.

47. DenseFuse, running code available at https://github.com/hli1221/densefuse-pytorch.

48. U2Fusion, running code available at https://github.com/ytZhang99/U2Fusion-pytorch.

49. NestFuse, running code available at https://github.com/hli1221/imagefusion-nestfuse.

50. S. L. Troyan, V. Kianzad, S. L. Gibbs-Strauss, S. Gioux, A. Matsui, R. Oketokoun, L. Ngo, A. Khamene, F. Azar, and J. V. Frangioni, “The FLARE™ intraoperative near-infrared fluorescence imaging system: a first-in-human clinical trial in breast cancer sentinel lymph node mapping,” Ann. Surg. Oncol. 16(10), 2943–2952 (2009). [CrossRef]  

51. M. Everingham, S. Eslami, L. V. Gool, C. Williams, J. Winn, and A. Zisserman, “The Pascal visual object classes challenge: a retrospective,” Int. J. Comput. Vis. 111(1), 98–136 (2015). [CrossRef]  


