Multi-kernel driven 3D convolutional neural network for automated detection of lung nodules in chest CT scans

Ruoyu Wu; Changyu Liang; Jiuquan Zhang; QiJuan Tan; QiJuan Tan; Hong Huang; Hong Huang

doi:10.1364/BOE.504875

1. Introduction

Lung cancer is one of the malignant tumors that pose the greatest menace to human life and health. As reported in a statistical analysis of the global cancer burden, lung cancer accounts for nearly 11.4% and 18% of 19.3 million new cancer cases and 10.0 million cancer deaths, respectively [1]. The five-year survival of early-stage lung cancers is significantly higher than that of advanced lung cancers, thus early detection and timely treatment are effective solutions for lung cancer [2,3]. Chest computed tomography (CT) is a frequent way for noninvasive lung cancer screening, and it contributes to decreasing the mortality of high-risk individuals [4,5]. Lung nodule is the principal clinical manifestation of early lung cancer, and CT-based nodule location detection is an indispensable procedure in lung cancer screening [6–9].

In clinical diagnosis, to conduct a thorough examination, radiologists are usually required to read dozens or even hundreds of CT slices for each patient in a slice-by-slice manner, while such work is labor-intensive and easy to cause operator bias [10–12]. As a result, computerized lung nodule position detection is an active research topic in the medical imaging analysis field, which aims at assisting clinicians to improve diagnostic efficiency [13–18]. As illustrated in Fig. 1, lung nodules change greatly in scale, appearance and intensity, and they may occur anywhere in lungs and are often surrounded by complex background tissues. Therefore, it is crucial to extract 3D multi-scale discriminative features for achieving accurate nodule detection.

Fig. 1. Visualization of lung nodules on transverse, sagittal and coronal planes in thoracic CT scans, and it can be found that lung nodules change greatly in scale, shape, density, and lesion location. The lung nodule positions are marked by red arrows.

Download Full Size | PDF

In recent years, many methodologies have been designed for automated lung nodule location, and they could be grouped into two main patterns, including traditional detection modes and deep learning-based fashions [19–21]. In the former detection means, threshold-based algorithms, morphological operations, energy optimization approaches and clustering methods are generally used to detect nodules from CT images [22]. For example, Rezaie et al. [23] first exploited image segmentation and threshold limit to select the regions of interest which may have nodule objects, and then an edge detection method was used for nodule location. EI-Regaily et al. [24] presented a hybrid algorithm to detect nodules that incorporates several traditional approaches, including thresholding algorithm, region growth and morphological operation. Generally, when candidate nodules are located, some hand-crafted features such as shape, intensity, context feature and texture feature are further designed for false positive reduction [25–27]. However, the traditional nodule detection methods heavily depend on domain experiences to extract features, and they could merely describe single or partial nodule characteristics.

Deep learning possesses excellent feature extraction capability, and it has shown outstanding performance in the computer vision field [28–32]. Convolutional neural network (CNN) is one of the most influential deep learning paradigms, and a variety of CNN-based models have been developed for the medical image analysis work [33–37]. In the nodule detection task, 2D CNN-based approaches have been widely employed [38–40]. Xie et al. [41] constructed a 2D CNN-based nodule detection architecture composed of three sub-models with the same network structure, in which each sub-model consists of a feature extraction module, a deconvolutional layer and two region proposal modules, and the results produced by the three sub-models are merged to acquire nodule candidates. Nguyen et al. [42] employed a Faster Region-based 2D CNN structure as the backbone, and an adaptive anchor box size generating strategy was proposed to improve the nodule detection sensitivity. Ramachandran et al. [43] introduced the You Only Look Once (YOLO) architecture which is a frequently-employed model for object detection in 2D natural image set into the lung nodule detection task, and they used a large-sized input to get good detection results. Although improved detection performance has been attained, it is difficult for 2D CNN models to learn the 3D spatial structure information of CT images, which may limit the further improvement of detection accuracy.

Recently, developing a 3D CNN-based learning model is the mainstream research for the analysis tasks of medical images with inherent 3D attributes [44–49]. Wang et al. [50] exploited 3D convolutional networks to simultaneously learn spatial and spectral features of medical hyperspectral images, and a specific loss function was designed to improve network performance. Guan et al. [51] proposed an attention mechanism-based 3D model to automatically analyze brain magnetic resonance imaging data, the model introduced a squeeze-excite module and attention guide filter module to enhance feature learning ability. Dou et al. [52] designed a two-stage framework with 3D convolutional operation for lung nodule detection, they first established a fully convolutional network based on a training strategy with online sample filtering for candidate screening, then developed a hybrid-loss residual model to identify nodule objects from the candidates. Liao et al. [53] developed a 3D UNet-like structure to achieve automatic nodule location, and a set of 3D residual learning blocks was designed to extract 3D presentation information of lung nodules. Zhu et al. [54] developed a nodule detection framework called DeepLung, which can benefit both from the advantages of dense connection and residual learning by using a 3D dual-path network structure. Mei et al. [55] designed a slice-aware network where a slice grouped non-local structure is introduced for learning long-distance relationships in the feature map, and a new dataset was collected for estimating the nodule detection performance of the network. Lin et al. [56] presented a 3D nodule detection architecture named Inception Residual UNet++ (IR-UNet++), the IR-UNet++ model combined the ResNet and Inception as the building block, and embedded a squeeze-and-excitation architecture into the building block for better feature learning. Zhu et al. [57] employed a U-shaped 3D residual structure to achieve computer-aided lung nodule detection, and an improved attention gate and a channel interaction unit were designed to improve detection sensitivity. Jian et al. [58] developed a 3D convolutional model termed 3D Deep Attention and Global Search Network (3DAGNet) for lung nodule detection, the 3DAGNet included a global and channel module to strengthen the global and spatial information learning capability, and a multi-layer module to capture the multi-level feature. Xu et al. [59] proposed a slice grouped domain attention (SGDA)-based nodule detection method to enhance the generalization performance, the SGDA module worked in the axial, sagittal, and coronal directions for exploring the inter-dependencies of each group feature mapping in different directions. However, the above-mentioned methods are based on CNN with fixed kernel size and single-mode pooling operation, and this will limit their ability to describe complex lung nodule CT images with variable lesion size, appearance and density.

To address the aforementioned limitations, a multi-kernel driven 3D convolutional neural network (MK-3DCNN) is proposed to improve nodule detection accuracy. Unlike conventional two-stage detection frameworks (including nodule candidate detection and false positive reduction), our approach abandons the false-positive reduction procedure and trains an end-to-end model to achieve automated nodule detection in a one-stage learning paradigm. The main contributions of this work are concluded as follows.

(1) To overcome the limitation that traditional single receptive field-based convolutional networks are difficult to effectively cope with the variable imaging morphology of lung nodules, a multi-kernel joint learning algorithm is proposed to fully explore the 3D multi-scale discriminative information of lung nodule CT images, which contributes to improving nodule detection performance.

(2) A multi-mode mixed pooling strategy integrating the max pooling, average pooling, and center cropping pooling is designed to surrogate the conventional single-mode pooling fashion, these three different types of pooling operations can complement each other, and more comprehensive nodule descriptions can be obtained in this way.

(3) To evaluate the effectiveness of our MK-3DCNN in clinical application, a new dataset CQUCH-LND annotated through the biopsy-based cytological analysis is collected. Experimental results on the public dataset LUNA16 and the clinical dataset CQUCH-LND prove that the MK-3DCNN achieves superior performance than some state-of-the-art nodule detection approaches and possesses a good generalization ability.

The rest of this article is arranged as follows. Section 2 describes the two employed datasets LUNA16 and CQUCH-LND. In Section 3, the proposed MK-3DCNN method is detailed. To evaluate the nodule detection performance of the MK-3DCNN, experimental results on the LUNA16 and the CQUCH-LND are provided and analyzed in Section 4. The advantages and disadvantages of the proposed method are discussed in Section 5. Section 6 concludes this paper.

2. Dataset description

In this study, the most commonly used public dataset LUNA16 [13] and the Chongqing University Cancer Hospital Lung Nodule Diagnosis (CQUCH-LND) dataset constructed from a grade-A tertiary cancer hospital are adopted for evaluating the lung nodule detection performance of the developed MK-3DCNN framework.

2.1 LUNA16 dataset

The LUNA16 is the largest public dataset for computerized lung nodule location in chest CT images at present, and it is constructed from the well-known publicly available database LIDC-IDRI collected by 7 academic institutions and 8 medical imaging corporations [13,60]. After removing those cases with missing slices, inconsistent pixel spacing and scanning thickness $>$ 3 mm from the LIDC-IDRI, the LUNA16 has a total of 888 scans, and these scans are provided in the MHD/RAW format.

In the LUNA16, the slice number of each scan varies from 95 to 764, the scanning thickness alters from 0.45 to 2.5 mm, the pixel spacing changes from 0.46 to 0.98 mm, and all slices have the same size of $512 \times 512$ pixels. Moreover, for the LUNA16, only the nodules $\,\ge \,$ 3 mm and annotated by at least 3 out of 4 radiologists are considered as positive samples. In this sampling rule, 1186 lung nodule examples are collected in the dataset. The LUNA16 gives rich annotations including the center coordinate and scale information for each sampled lung nodule, as well as the lung segmentation images of all CT scans. In addition, the LUNA16 explicitly affords the patient-level data split based on a 10-fold cross-validation, and more details concerning the LUNA16 could be gained at https://luna16.grand-challenge.org/.

2.2 CQUCH-LND dataset

The CQUCH-LND is collected from Chongqing University Cancer Hospital, and it is consented by the review committee for use in this work. This dataset includes 263 low-dose CT scans (DICOM format) that are acquired from the Philips Brilliance 64 spiral CT scanner, and all CT data have been anonymized to protect patient privacy. In the CQUCH-LND, the slice quantity in each case alters from 128 to 715, the scanning thickness changes from 0.5 to 2.0 mm, the slice resolution ranges from 0.54 to 0.97 mm, and all slices are fixed to the scale of $512 \times 512$ pixels. In this dataset, a total of 263 lung nodules are labeled in the light of the biopsy-based cytological analysis, and the lung nodule scale alters from 3 to 30 mm, which is similar to that of the LUNA16 dataset.

3. Proposed method

3.1 Overview

Lung nodules possess great discrepancies in size, appearance and density, and exploring 3D multi-scale discriminative representations is a remarkable approach to boost detection performance. Given this fact, a multi-kernel driven 3D convolutional neural network (MK-3DCNN) is proposed to fulfill automated lung nodule detection, and the general structure of the MK-3DCNN model is displayed in Fig. 2. As exhibited in Fig. 2, the MK-3DCNN framework uses a UNet-like encoder-decoder structure as the backbone network to utilize the multi-layer features of the deep model, and introduces a region proposal network (RPN) [61] as the output module to generate high-quality proposals. In the encoder part of the MK-3DCNN, a multi-kernel joint learning model is developed to capture multi-scale lung nodule information. Furthermore, a residual learning module combining a multi-model mixed pooling (M$^3$P) operation is designed to learn more comprehensive descriptions of nodule CT images, which could relieve the problem of information loss caused by the traditional single-model pooling manner. In addition, the decoder part mainly involves three components, including the deconvolution layer, residual learning unit, and concatenation operation.

Fig. 2. The general framework of the presented MK-3DCNN method. $M$ represents the spatial scale (height=width=number of slices), ${C_{e1}}$-${C_{e5}}$ and ${C_{d1}}$-${C_{d4}}$ denote the channel number, and ${L_{inf}}$ means the embedded position information. The MK-3DCNN is trained end-to-end adopting a multi-task loss ${{\rm {{\cal L}}}_{dec}}$ composed of regression loss ${{\rm {{\cal L}}}_{reg}}$ and classification loss ${{\rm {{\cal L}}}_{cla}}$.

Download Full Size | PDF

In the following sections, the above contents will be detailed. For convenience, Table 1 sketches the mathematical symbols used in this paper.

Table 1. Notations and definitions.

View Table | View all tables in this article

3.2 Image preprocessing and input patch cropping

Thoracic CT scans in a dataset are generally gotten from diverse scanners and patients, which unavoidably leads to the inconsistencies of intensity distribution and spatial resolution (slice thickness and pixel pitch). As a result, standardizing original CT data is essential for data-driven models to achieve satisfactory detection performance. Moreover, due to the actuality that lung nodules only occur in the lung region, thus it is important to remove the background area for reducing computation overhead and improving detection accuracy. Figure 3 illustrates the main processes of our designed CT image preprocessing approach.

Fig. 3. Image preprocessing procedure.

Download Full Size | PDF

As shown in Fig. 3, we first normalize the image intensity of all original CT data to a unified distribution. Specifically, a window-level $\left [ {{\zeta _{min}},{\zeta _{max}}} \right ]$ is employed to prune the intensity values of CT images, and then pruned values are further mapped to $\left [ {0,{\zeta _{nor}}} \right ]$. Then, the lung region mask are used to eliminate background areas, and this processing procedure mainly includes the following steps. (a) Mask images are binarized by setting a threshold. (b) Convex hull computation operation is performed to effectively involve the lung nodules sticking to the lung walls. (c) The lung region is extracted by multiplying the mask with the intensity normalized image, and the background area outside the mask is filled by using an intensity of common tissues. (d) The intensities of the background areas with high luminance (e.g., bone) are clipped for interference information suppression. In the public dataset LUNA16, the lung segmentation annotation provided in this dataset is employed as the mask label. For the clinical dataset CQUCH-LND, a threshold-based method [53] is introduced to obtain the lung region mask. Further, the cubic spline interpolation approach is exploited to resample each scan to a unified spatial resolution $\gamma \times \gamma \times \gamma$.

After the above steps, redundant background areas are removed to improve computational efficiency. In addition, considering limited GPU memory, small 3D patches with a voxel size $h \times w \times s$ (height $\times$ width $\times$ slice) are cropped from the pre-processed CT scans to be used as the inputs of the proposed model. Following the settings in [12,53,54], $\left [ {{\zeta _{min}},{\zeta _{max}}} \right ]$ is set to $\left [ {{\rm {\ }\hbox{-}{\rm \ }}1000,400} \right ]$, $\gamma \times \gamma \times \gamma$ is set to $1 \times 1 \times 1$ mm, ${\zeta _{nor}}$ is set to 255, and $h \times w \times s$ is set to $96 \times 96 \times 96$ voxels.

Data augmentation is a functional strategy for avoiding the overfitting problem of data-driven methods in small sample learning tasks. To increase the diversity and quantity of training samples, the extracted patches are stochastically flipped and rotated. Furthermore, inspired by several previous works [53,54], we also use a cropping ratio between 0.75 to 1.25 to augment the dataset. In this augmentation operation, if a cropped patch is smaller than the set input size, the patch is padded with a constant to adjust it to the input size. Similarly, if a cropped patch is larger than the set input size, the excess part is removed. The above size adjustment process is only performed on one side of each dimension of the patch to change the center position of the patch.

3.3 Multi-kernel joint learning block

In traditional lung nodule detection networks, the convolutional kernels in each layer are generally designed to share the same size, and this learning paradigm using fixed receptive fields is difficult to effectively capture the discriminative feature of nodule CT images with variable lesion sizes. Aiming at this restriction, a multi-kernel joint learning module (MKJLM) is developed to enhance the nodule detection ability. Furthermore, given the truth that the size information of a lung nodule will be gradually encoded into the high-level representations as the network depth increases and the scale of convolution kernel matters less compared to the case in shallower layers [62], two MKJLM modules with the same network structure are stacked and then embedded in the MK-3DCNN framework for continuous multi-scale feature learning from input patches. Figure 4 shows the first embedded MKJLM module.

Fig. 4. Network details of the designed MKJLM.

Download Full Size | PDF

As described in Fig. 4, to effectively capture the feature information of lung nodule CT images with various lesion scales and appearances, three 3D convolution operations ${\rm {{\cal F}}}_{conv}^1$, ${\rm {{\cal F}}}_{conv}^2$ and ${\rm {{\cal F}}}_{conv}^3$ with different kernel sizes $k_d^1 \times k_d^1 \times k_d^1$, $k_d^2 \times k_d^2 \times k_d^2$ and $k_d^3 \times k_d^3 \times k_d^3$ are performed in parallel on the input ${X_{in}} \in {{\rm {\Re }}^{1 \times h \times w \times s}}$, which can be defined as ${\rm {{\cal F}}}_{conv}^1$: ${X_{in}} \to {U_1} \in {{\rm {\Re }}^{C \times H \times W \times S}}$, ${\rm {{\cal F}}}_{conv}^2$: ${X_{in}} \to {U_2} \in {{\rm {\Re }}^{C \times H \times W \times S}}$ and ${\rm {{\cal F}}}_{conv}^3$: ${X_{in}} \to {U_3} \in {{\rm {\Re }}^{C \times H \times W \times S}}$ , respectively. Then, an element-wise summation $\oplus$ is introduced for fusing the feature tensors generated by the multiple convolution branches:

(1)$$V = {U_1} \oplus {U_2} \oplus {U_3}$$

Subsequently, the global average pooling ${{\rm {{\cal F}}}_{gap}}$ is added to obtain the global information as ${G_{inf}} \in {{\rm {\Re }}^{C \times 1 \times 1 \times 1}}$, and this operation can be described by

(2)$${G_{inf}}(i) = {{\rm{{\cal F}}}_{gap}}(V(i,j,k,l)) = \frac{1}{{H \times W \times S}}\sum_{j = 1}^H {\sum_{k = 1}^W {\sum_{l = 1}^S {V(i,j,k,l)} } } {\kern 1pt} {\kern 1pt} ,{\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} 1 \le i \le C$$

in which ${G_{inf}}(i)$ is $i$th element of ${G_{inf}}$. Further, the output feature map of this multiple convolutional kernel collaborative learning module ${F_{out}} \in {{\rm {\Re }}^{C \times H \times W \times S}}$ can be obtained by

(3)$$\begin{array}{l} {F_{out}} = {{\tilde U}_1} \oplus {{\tilde U}_2} \oplus {{\tilde U}_3}\\ \;\;\;\;\;\;\;\; = \left( {{{\tilde q}_c} \otimes {U_1}} \right) \oplus \left( {{{\tilde k}_c} \otimes {U_2}} \right) \oplus \left( {{{\tilde v}_c} \otimes {U_3}} \right) \end{array}$$

where $\otimes$ represents element-wise multiplication. ${\tilde q_c}$, ${\tilde k_c}$ and ${\tilde v_c}$ refer to the Softmax operation for the elements among ${q_c}$, ${k_c}$ and ${v_c}$, and the motive of this stage is to adaptively aggregate the feature maps produced by different convolution operations in an end-to-end learnable fashion. Furthermore, ${q_c}$ could be computed as

(4)$$\begin{array}{l} {q_c}{\rm{ = }}\varphi \left( {W_{fc}^2\left( {\varphi \left( {W_{fc}^1\left( {{G_{inf}}} \right)} \right)} \right)} \right) \end{array}$$

in which $\varphi$ denotes the activation function Rectified Linear Unit (ReLU), $W_{fc}^1 \in {{\rm {\Re }}^{{C \mathord {\left / {\vphantom {C r}} \right. } r} \times C}}$ and $W_{fc}^2 \in {{\rm {\Re }}^{C \times {C \mathord {\left / {\vphantom {C r}} \right. } r}}}$ are the weights of the fully-connected layers ${\rm {{\cal F}}}_{fc}^1$ and ${\rm {{\cal F}}}_{fc}^2$, and $r$ means reduction ratio. The computational procedures of ${k_c}$ and ${v_c}$ are like that of ${q_c}$.

3.4 Multi-mode mixed pooling-based residual learning block

The pooling layer is an essential functional module in the standard CNN structure, and it is employed for scale reducing and information refining of feature maps. The average pooling and max pooling are two popular selections in existing CNN-based nodule detection approaches, but both of them only focus on a certain information component (average component or maximum component) in feature maps, which will inevitably cause information loss during the pooling process and is detrimental to the feature learning of lung nodule CT images with variable lesion sizes and image intensities (e.g., the intensities of solid nodules is significantly higher than that of ground glass nodules [10]).

Considering the above fact and inspired by the recent advance in residual learning [63], four multi-mode mixed pooling(M$^3$P)-based residual learning modules are designed and placed after the multi-kernel joint learning block for progressive deep feature extraction. As shown in Fig. 5, each M$^3$P-based residual learning module is composed of three 3D residual learning units and one M$^3$P unit. Moreover, a residual learning unit includes two identical convolutional layers with kernel size ${k_r} \times {k_r} \times {k_r}$, two batch normalization (BN) layers, two ReLU operations, and one residual connection. The learning procedure of the residual learning unit is defined as

(5)$$\begin{array}{l} X_r^{l + 1} = \varphi \left( {{{\rm{{\cal F}}}_{res}}\left( {X_r^l,{W_{res}}} \right) \oplus X_r^l} \right) \end{array}$$

where $\oplus$ denotes element-wise summation, $X_r^l \in {{\rm {\Re }}^{C \times H \times W \times S}}$ and $X_r^{l + 1} \in {{\rm {\Re }}^{C \times H \times W \times S}}$ are 4D tensors of input and output in a residual learning unit, $\varphi$ is the activation function ReLU, ${{\rm {{\cal F}}}_{res}}\left ( {X_r^l,{W_{res}}} \right )$ represents a learning function, and it is calculated as

(6)$$\begin{array}{l} {{\rm{{\cal F}}}_{res}}\left( {X_r^l,{W_{res}}} \right) = W_{res}^2\left( {\varphi \left( {W_{res}^1\left( {X_r^l} \right)} \right)} \right) \end{array}$$

in which $W_{res}^1$ and $W_{res}^2$ denote the weights of two continuous 3D convolutional operations.

Fig. 5. The structure of the developed M$^3$P-based residual learning module. The spatial sizes of feature tensors $X_r^l$, ${X'_r}$, ${X''_r}$ and $X_r^{l + 1}$ are kept at $H \times W \times S$ by a padding strategy, the scale of the convolutional kernels ${K'_r}$ and ${K''_r}$ are both ${k_r} \times {k_r} \times {k_r}$, The shortcut between $X_r^l$ and $X_r^{l + 1}$ is employed to prevent gradient vanishing.

Download Full Size | PDF

In the MK-3DCNN framework, the traditional single-mode pooling computing is extended to the designed M$^3$P operation which allows the extraction of more comprehensive sensitive information of lung nodules. As exhibited in Fig. 6, the M$^3$P model reasonably integrates three different pooling operations to complement each other, including max pooling, average pooling, and center cropping pooling [64].

Fig. 6. Illustration of the developed M$^3$P module. The input $X_p^l$ of M$^3$P is the convolutional feature gained from the residual learning unit, ${X_{cenp}}$ is the center area cropped from $X_p^l$, ${X_{maxp}}$ and ${X_{avep}}$ are max pooled and average pooled feature maps, respectively. The above three pooled features are concatenated to form the mixed map ${X_{mixp}}$, and the ${X_{mixp}}$ is further processed through a channel-wise squeeze operation to obtain the final output $X_p^{l + 1}$ of this module.

Download Full Size | PDF

Given a feature tensor $X_p^l \in {{\rm {\Re }}^{C \times H \times W \times S}}$ generated from the previous convolutional layer, the output feature map $X_p^{l + 1} \in {{\rm {\Re }}^{C \times H' \times W' \times S'}}$ of the developed M$^3$P model can be expressed as

(7)$$\begin{array}{l} X_p^{l + 1} = \psi \left[ {{X_{maxp}},{X_{avep}},{X_{cenp}}} \right] \end{array}$$

where $\left [ \cdot \right ]$ denotes concatenation operation, $\psi$ represents a channel-wise squeezing operation, ${X_{maxp}} \in {{\rm {\Re }}^{C \times H' \times W' \times S'}}$, ${X_{avep}} \in {{\rm {\Re }}^{C \times H' \times W' \times S'}}$ and ${X_{cenp}} \in {{\rm {\Re }}^{C \times H' \times W' \times S'}}$ are formed from the max pooling ${{\rm {{\cal F}}}_{maxp}}\left ( {X_p^l} \right )$, average pooling ${{\rm {{\cal F}}}_{avep}}\left ( {X_p^l} \right )$, and center cropping pooling ${{\rm {{\cal F}}}_{cenp}}\left ( {X_p^l} \right )$, respectively. In our proposed MK-3DCNN method, the size of the $H' \times W' \times S'$ is half that of the $H \times W \times S$, and a convolutional layer with kernel scale $1 \times 1 \times 1$ is exploited to achieve the channel-wise squeezing.

3.5 Model output

As exhibited in Fig. 2, in the decoding section of the MK-3DCNN framework, two deconvolutional layers and two residual learning modules are exploited to continuously decode the features extracted from the encoding part. Among them, the residual module is made up of three residual units in the designed M$^3$P-based residual learning model (i.e. without the pooling part), and meanwhile, two concatenation operations are added to learn the multi-level features of lung nodule CT images. In addition, following the work in [53], the location information ${L_{inf}}$ of the proposal is introduced in the MK-3DCNN to get better nodule detection performance.

After decoding the learned features, two convolutional layers with the kernel scale $1 \times 1 \times 1$ are used to map the obtained feature tensor to the result output with the dimension ${C_{d4}} \times \left ( {{M \mathord {\left / {\vphantom {M 4}} \right. } 4}} \right ) \times \left ( {{M \mathord {\left / {\vphantom {M 4}} \right. } 4}} \right ) \times \left ( {{M \mathord {\left / {\vphantom {M 4}} \right. } 4}} \right )$. Further, the output 4D tensor is resized to $5 \times {N_{anc}} \times \left ( {{M \mathord {\left / {\vphantom {M 4}} \right. } 4}} \right ) \times \left ( {{M \mathord {\left / {\vphantom {M 4}} \right. } 4}} \right ) \times \left ( {{M \mathord {\left / {\vphantom {M 4}} \right. } 4}} \right )$ (i.e. ${C_{d4}}{\rm {\ =\ }}{N_{anc}} \times 5$), in which ${N_{anc}}$ is the number of anchors. Inspired by the RPN, to cope with the variable lung nodule sizes, three different anchors are designed at every location, with the side lengths of 5, 10, and 20, respectively (ie. the value of ${N_{anc}}$ is 3). Moreover, the 5 regression values are $\left ( {\tilde g,{{\tilde v}_x},{{\tilde v}_y},{{\tilde v}_z},{{\tilde v}_r}} \right )$, the activation function Sigmoid is performed for the $\tilde g$, and this procedure is defined as

(8)$$\begin{array}{l} {\tilde p_{cla}} = \frac{1}{{1 + \exp \left( { - \tilde g} \right)}} \end{array}$$

In addition, a non-maximum suppression (NMS) operation [61] is introduced for reducing redundancy and optimizing detection results.

3.6 Loss function

The proposed MK-3DCNN framework is a one-stage lung nodule detection model, and it can concurrently predict nodule probabilities and nodule locations. The MK-3DCNN network is trained end-to-end exploiting a multi-task loss that contains a classification loss and a regression loss, and the overall training loss ${{\rm {{\cal L}}}_{dec}}$ is defined as

(9)$$\begin{array}{l} {{\rm{{\cal L}}}_{dec}} = {\lambda _{add}}{{\rm{{\cal L}}}_{cla}} + {p_{gt}}{{\rm{{\cal L}}}_{reg}} \end{array}$$

in which ${\lambda _{add}}$ denotes the summation weight, ${p_{gt}} \in \left \{ {0,1} \right \}$ means the label of an anchor box (1 for positive samples and 0 for negative samples), ${{\rm {{\cal L}}}_{cla}}$ and ${{\rm {{\cal L}}}_{reg}}$ are the classification loss and regression loss, respectively. In the MK-3DCNN architecture, the cross entropy loss [64] is adopted as the ${{\rm {{\cal L}}}_{cla}}$, and it is defined by

(10)$$\begin{array}{l} {{\rm{{\cal L}}}_{cla}} ={-} {p_{gt}}log\left( {{{\tilde p}_{cla}}} \right) - \left( {1 - {p_{gt}}} \right)log\left( {1 - {{\tilde p}_{cla}}} \right) \end{array}$$

in which ${\tilde p_{cla}}$ means predicted probability.

We denote the bounding box of an anchor as $\left ( {{a_x},{a_y},{a_z},{a_r}} \right )$ and the ground truth bounding box of a lung nodule object as $\left ( {{g_x},{g_y},{g_z},{g_r}} \right )$, in which the first three elements represent the center point coordinates and the last one means the size of the bounding box. The intersection over union (IoU) [61] is exploited for determining anchor label. Specifically, if the IoU between an anchor and the ground truth bounding box is larger than ${V_{ps}}$, the anchor is labeled as a positive sample (i.e. ${p_{gt}} = 1$), meanwhile, if the IoU is smaller than ${V_{ns}}$, the corresponding anchor is served as a negative sample (i.e. ${p_{gt}} = 0$). The other cases are not considered in the training procedure.

Moreover, the regression labels $\left ( {{v_x},{v_y},{v_z},{v_r}} \right )$ of bounding boxes are calculated by

(11)$$\begin{array}{l} \begin{array}{l} {v_x} = {{\left( {{g_x} - {a_x}} \right)} \mathord{\left/ {\vphantom {{\left( {{g_x} - {a_x}} \right)} {{a_r}}}} \right. } {{a_r}}}\\ {v_y} = {{\left( {{g_y} - {a_y}} \right)} \mathord{\left/ {\vphantom {{\left( {{g_y} - {a_y}} \right)} {{a_r}}}} \right. } {{a_r}}}\\ {v_z} = {{\left( {{g_z} - {a_z}} \right)} \mathord{\left/ {\vphantom {{\left( {{g_z} - {a_z}} \right)} {{a_r}}}} \right. } {{a_r}}}\\ {v_r} = \log \left( {{{{g_r}} \mathord{\left/ {\vphantom {{{g_r}} {{a_r}}}} \right. } {{a_r}}}} \right) \end{array} \end{array}$$

The predicted values corresponding to the above regression labels are ${\tilde v_x}$, ${\tilde v_y}$, ${\tilde v_z}$ and ${\tilde v_r}$, respectively. Further, regression loss ${{\rm {{\cal L}}}_{reg}}$ could be defined as

(12)$$\begin{array}{l} {{\rm{{\cal L}}}_{reg}} = \sum \limits_{ip \in \left\{ {x,y,z,r} \right\}} {{{\rm{{\cal L}}}_{smoothl1}}\left( {{v_{ip}},{{\tilde v}_{ip}}} \right)} \end{array}$$

in which, the loss function smooth L1 [65] is exploited as the ${{\rm {{\cal L}}}_{smoothl1}}$, thus ${{\rm {{\cal L}}}_{smoothl1}}$ is defined by

(13)$$\begin{array}{l} {{\rm{{\cal L}}}_{smoothl1}}\left( {v,\tilde v} \right) = \left\{ \begin{array}{l} 0.5{\left( {v - \tilde v} \right)^2},\;\;\;\;\;\;\;\;\;if\left| {v - \tilde v} \right| < 1\\ \left| {v - \tilde v} \right| - 0.5,\;\;\;\;\;\;\;otherwise \end{array} \right. \end{array}$$

In our experiments, ${V_{ps}}$ and ${V_{ns}}$ are set to 0.5 and 0.02, respectively.

4. Experimental results and analysis

4.1 Experimental design

In this work, the benchmark dataset LUNA16 and the clinical dataset CQUCH-LND are employed to evaluate the nodule detection performance of the presented MK-3DCNN, and several state-of-the-art (SOTA) approaches are utilized for result comparing. To obtain reliable nodule detection results, we conduct 10-fold cross-validation experiments according to the dataset split provided in the LUNA16 (i.e., in one iteration, one fold of the dataset is exploited for testing and the others for training, and this operation is iterated until each fold has been tested). In the proposed MK-3DCNN framework, the convolution kernel size ${k_r} \times {k_r} \times {k_r}$ in the residual learning unit is set to $3 \times 3 \times 3$, the sizes of the three convolution kernels $k_d^1 \times k_d^1 \times k_d^1$, $k_d^2 \times k_d^2 \times k_d^2$ and $k_d^3 \times k_d^3 \times k_d^3$ in the MKJLM module are respectively set to $3 \times 3 \times 3$ , $5 \times 5 \times 5$ and $7 \times 7 \times 7$ , the stride of all the above convolution operations is set to 1. The reduction ratio $r$ is set to 2. Moreover, the spatial scale $M$ (height=width=number of slices) is set to 96, the number of channels ${C_{e1}}$-${C_{e5}}$ are set to 24, 32, 64, 64 and 64, ${C_{d1}}$-${C_{d4}}$ are set to 64, 64, 128 and 15, respectively. In our experiments, the batch size is set to 16 based on the GPU memory, and the stochastic gradient descent operator is selected for optimization. Because the number of samples is small, a dynamic learning rate mechanism with an initial value of 0.01 is performed for training, and the learning rate decays ten times every 50 epochs. In the training, we employ 200 epochs in total to optimize the deep model to convergence.

All experiments are conducted on a server that holds the following major configurations: 6 NVIDIA RTX TITAN GPUs, 2 Intel Xeon 3.6 GHz CPUs, 256 GB memory and Ubuntu 18.04.1 system. Moreover, the Python-based PyTorch library is used for implementing the developed MK-3DCNN model.

4.2 Evaluation metrics

Following previous studies [13], the free-response receiver operating characteristic (FROC) analysis is employed for evaluating the lung nodule detection performance of the proposed MK-3DCNN method. In the FROC curve, the recall is plotted as a function of the average number of false positives per scan (FPs/scan). The recall denotes a rate of the quantity of detected true positives to the quantities of all nodules, and it is calculated as

(14)$$\begin{array}{l} Recall = \frac{{T{P_{det}}}}{{T{P_{det}} + F{N_{mis}}}} \end{array}$$

where $T{P_{det}}$ and $F{N_{mis}}$ represent the number of detected and undetected nodules, respectively. Clearly, the sum of $T{P_{det}}$ and $F{N_{mis}}$ is the total nodule sample number.

Furthermore, the competition performance metric (CPM) [13,66] is introduced to extract one overall score from m the FROC curve. The CPM is defined as the mean of the recall at seven fixed false positives, and it can be computed by

(15)$$\begin{array}{l} CPM = \frac{1}{{{N_{fps}}}}\sum \limits_{{i_{fps}} \in \left\{ {0.125,\;0.25,\;0.5,\;1,\;2,\;4,\;8} \right\}} {Recall\left( {{i_{fps}}} \right)} \end{array}$$

in which $Recall\left ( {{i_{fps}}} \right )$ is the recall when the average number of false positives per scan is set to ${i_{fps}} \in \left \{ {0.125,\;0.25,\;0.5,\;1,\;2,\;4,\;8} \right \}$, and the value of ${N_{fps}}$ is 7. Obviously, the lowest possible CPM value is 0, and a perfect detection model will have a CPM with a score of 1.

4.3 Parameter sensitivity analysis

The summation weight ${\lambda _{add}}$ about the classification loss ${{\rm {{\cal L}}}_{cla}}$ and regression loss ${{\rm {{\cal L}}}_{reg}}$ is an important hyper-parameter in our proposed MK-3DCNN method, and it will affect the detection results. In this part, parameter sensitivity experiments are performed to analyze the influence of the summation weight and select the most appropriate value. To evaluate the detection performance concerning different weighting coefficients, the parameter ${\lambda _{add}}$ is tuned with a given set $\left \{ {0.1,\;0.2,\;0.3,\; \ldots,\;1.0} \right \}$. In this experiment, we stochastically choose 90% of the CT scans in the LUNA16 as the training set and the other 10% as the test set. The histogram provided in Fig. 7 illustrates the variations of CPM regarding the parameter ${\lambda _{add}}$.

Fig. 7. The lung nodule detection performance with different values of weight coefficient ${\lambda _{add}}$ on the LUNA16.

Download Full Size | PDF

As can be seen from Fig. 7, the CPM first improves and then reaches a peak value with the increase of weight ${\lambda _{add}}$. When the ${\lambda _{add}}$ exceeds a certain value, the detection result begins to decline slightly. The reason is that the introduction of classification loss can enable the deep model to learn the representation information of nodules more effectively and boost the detection ability. However, a large ${\lambda _{add}}$ value will weaken the influence of the position regression operation, and lead to a decrease in the detection performance. To balance the importance of the classification and position detection operations, the parameter ${\lambda _{add}}$ is set to 0.3 in the following experiments.

4.4 Ablation study

In the proposed MK-3DCNN framework, the MKJLM component and M$^3$P component play key roles in achieving accurate nodule detection. To validate the contributions of different functional modules, a set of ablation studies is performed by constructing three comparison models, including the BaseNet, BaseNet-MK, and BaseNet-M$^3$P. BaseNet is the baseline model (i.e. the MKJLM and M$^3$P modules in the MK-3DCNN are replaced by the standard convolutional layer and the maximum pooling operation, respectively). BaseNet-MK model is designed via embedding the MKJLM into the BaseNet (i.e. the M$^3$P module is removed in the MK-3DCNN). Similarly, the BaseNet-M$^3$P model is a combination of the BaseNet and M$^3$P (i.e. the MKJLM module is removed in the MK-3DCNN). Moreover, considering that our proposed MKJLM module includes the squeeze-and-excitation technique, we also designed two contrast models BaseNet-SE and BaseNet-SE-M$^3$P. The BaseNet-SE model is constructed by placing a squeeze-and-excitation module [67] behind each of the two standard convolution layers of the baseline model BaseNet. The BaseNet-SE-M$^3$P model is a combination of the BaseNet-SE and the M$^3$P.

In addition, the standard deviation (SD) of the CPM and the statistical significance (P-value) of the FROC (versus the proposed MK-3DCNN) are added as evaluation metrics to fully evaluate the nodule detection performance of the MK-3DCNN. The Wilcoxon signed-rank test [68] is introduced for statistical significance testing. To ensure sufficient samples and effective P-value calculations, 6 additional sample points are extracted (one point between each two adjacent values in the 7 fixed false positives is extracted), resulting in a total of 13 observation points.

Table 2 reports the nodule detection results of the MK-3DCNN and the above three component-based algorithms. As shown in Table 2, the developed MKJLM and M$^3$P modules can improve the nodule detection capability of the deep model. Specifically, the detection performance gains obtained from the multi-kernel joint learning (BaseNet-MK vs BaseNet) and multi-mode mixed pooling learning (BaseNet-M$^3$P vs BaseNet) are 1.2% and 0.92% in terms of CPM, respectively. Furthermore, the CPM of the BaseNet-SE-M$^3$P is higher than that of the BaseNet-SE, which also indicates the effectiveness of the designed M$^3$P module. Obviously, the proposed MK-3DCNN method achieves the highest CPM with a low SD value, and the lung nodule detection performance of the MK-3DCNN is better than all the component-based contrast approaches at the P < 0.05 level. The aforementioned experimental results prove that the developed MKJLM module could effectively cope with the intrinsic characteristics of the variable nodule scale through cooperatively learning the 3D multi-scale spatial information of nodule CT images. By designing a multi-mode mixed pooling architecture, the M$^3$P module reasonably incorporates the max pooling, average pooling, and center cropping pooling operations to simultaneously learn high-intensity, low-intensity, and scale information, thus it can strengthen the nodule detection ability in complex lung environments compared to traditional single-mode pooling-based learning manner. The FROC curves in Fig. 8 visually display the superior nodule detection capability of the MK-3DCNN compared with the component-based models.

Fig. 8. Comparisons of FROC curves generated by the proposed MK-3DCNN and different components-based methods.

Download Full Size | PDF

Table 2. The nodule detection performance of the proposed MK-3DCNN and different components-based algorithms.

View Table | View all tables in this article

4.5 Validity analysis of data augmentation operation

Data augmentation is a frequently used method for resolving the overfitting issue of the deep learning model in the nodule CT image analysis work with limited training samples. In this work, an augmentation approach based on flipping, rotating, and resizing operations is introduced to increase sample quantity. To assess the effectiveness of the designed data augmentation strategy in the nodule detection task, a controlled experiment is organized by devising a comparison model MK-3DCNN without DA. The MK-3DCNN without DA has the same network structure as the developed MK-3DCNN model, but data augmentation processing is not used during the model training. Figure 9 shows the nodule detection performance of the MK-3DCNN model under different conditions.

Fig. 9. The nodule detection performance of the proposed MK-3DCNN in the cases with and without data augmentation operation.

Download Full Size | PDF

As illustrated in Fig. 9, compared with the comparison method MK-3DCNN without DA, the proposed MK-3DCNN method achieves higher recalls at six of the seven set false positive rates, and the CPM gain reaches 0.79%. The above experimental results evidence that the designed data augmentation operation can effectively improve the nodule detection performance of the deep model by expanding the diversity of samples. The FROC curves in Fig. 10 intuitively show the validity of the used data augmentation strategy.

Fig. 10. Comparisons of FROC curves generated by the proposed MK-3DCNN method in the cases with and without data augmentation operation.

Download Full Size | PDF

4.6 Comparison with some state-of-the-art methods

To comprehensively assess the lung nodule detection ability of the proposed MK-3DCNN, several SOTA approaches are chosen for comparison on the LUNA16. Following the previous researches [48,52], the nodule detection results provided in corresponding articles are employed for contrast, and the particulars are reported in Table 3. All these results listed in Table 3 are produced by one-stage detection models to make a fair comparison.

Table 3. The nodule detection performance of the MK-3DCNN and some existing models on the benchmark dataset LUNA16.

View Table | View all tables in this article

In Table 3, the 3D-RES [54] and the LNOR-Net [53] construct an encoder-decoder structure as the backbone of the nodule detection model to learn multi-layer information. In the 3D-DPN [54], a dual path block is designed to simultaneously explore the advantages of residual learning and dense connection. The YOLOv3-Net [14] is a YOLO architecture-based method. The SGDA [59] is an attention mechanism-based model. In addition, a multi-scale feature learning-based approach MSANet [48] is also chosen for comparison, in which a multi-scale attention block is constructed to boost nodule detection sensitivity.

According to Table 3, due to the utilization of multi-scale features of nodule CT images, the MSANet method gets good detection results. It is obvious that our developed MK-3DCNN framework obtains more competitive nodule detection performance (the highest CPM value) when compared with the aforementioned SOTA methods. Owing to variable lesion sizes and complicated anatomic structures, extracting 3D discriminative features is critical for achieving accurate nodule position detection. In the proposed MK-3DCNN method, a multi-kernel joint learning module is constructed to fully learn the 3D multi-scale spatial information of nodule CT images. By building a multi-mode mixed pooling-based residual learning block for feature extraction, the MK-3DCNN model can effectively alleviate the issue of information loss in traditional single-mode pooling-based detection models, as a result, more discriminative nodule representations can be obtained.

4.7 Visual analysis of detection results

To further analyze the nodule detection capability of the proposed MK-3DCNN model, some representative examples in the detection results generated by the MK-3DCNN model and the baseline model BaseNet (i.e. without the MKJLM and M$^3$P modules) are illustrated in Fig. 11. Since thoracic CT scan is volumetric imaging data, only the central slice where a detected nodule is located is plotted for visualization. In Fig. 11, the red rectangular boxes in the first row of images anchor the position ground truths of nodule samples, the blue rectangular boxes in the second and the third rows of images anchor the detection results produced by the BaseNet and MK-3DCNN models, respectively. The central slice number is provided below each image. The second row below each image of the detection result part exhibits the nodule detection probabilities. The side length of the rectangular box corresponds to the nodule scale.

Fig. 11. Visualization of several representative lung nodule detection results yielded by the proposed MK-3DCNN and comparison model BaseNet. The red and blue rectangular boxes show the position ground truths and detection results in the central slices, respectively. The central slice number is provided below each image. The second row below each image of the detection result part shows the detection probabilities. The side length of the rectangular box is relative to the nodule scale. The Null indicates a missed detection case.

Download Full Size | PDF

As shown in Fig. 11, both our proposed MK-3DCNN method and the contrasted method BaseNet can achieve good detection results for those nodules with noticeable visual differences from the background, such as #1 and #2 nodule samples. Furthermore, when nodule lesions share many visual similarities with surrounding tissues (e.g., #3, # 4 and #7 nodules are similar in size and appearance to neighboring tissues, #5, #6 and # 8 nodules have similar intensity to the background), the baseline approach BaseNet cannot effectively locate the nodule objects, which will result in low detection confidence value, even missed detection. By comparison, the proposed MK-3DCNN model is able to locate the nodule bodies more accurately, and produce better detection performance. The above results visually demonstrate that the MK-3DCNN can effectively cope with variable nodule size, appearance and intensity, and work well to detect nodules from complex lung environments.

4.8 External validation on clinical dataset CQUCH-LND

In clinical practice, the golden standard in lung nodule diagnosis is the cytological analysis based on biopsy, not just the radiological characteristics. In view of this, the CQUCH-LND dataset annotated through the diagnosis golden standard is built and exploited to validate the generalization capability of our proposed MK-3DCNN method.

To fully assess the nodule detection performance of the deep model, two kinds of experiments are performed on the CQUCH-LND: (1) the MK-3DCNN model and three component-based comparison models that have been trained on the LUNA16 are directly tested using the CQUCH-LND, and (2) the trained MK-3DCNN and three comparison models are fine-tuned via employing 2-fold cross-validation on this dataset. As with the experiments on the LUNA16, the FROC analysis is exploited to quantitatively evaluate the detection performance. Similar to Table 2, the standard deviation (SD) of the CPM and the statistical significance (P-value) of the FROC are added as evaluation metrics to fully evaluate the nodule detection performance of the our proposed MK-3DCNN method. In direct testing and fine-tuning experiments, the P-value is calculated versus the MK-3DCNN and the Fine-tuned MK-3DCNN, respectively.

As can be observed from the detection results listed in Table 4, fine-tuning operations attain better detection performance compared to directly testing experiments, and the CPM of the MK-3DCNN, BaseNet-M$^3$P, BaseNet-MK and BaseNet increases from 0.8523, 0.8322, 0.8419 and 0.8229 to 0.8642, 0.8479, 0.8561 and 0.8360, respectively. In both fine-tuning and direct testing experiments, the lung nodule detection performance of our proposed MK-3DCNN approach is better than all contrast approaches at the P < 0.05 level, which indicates that the MK-3DCNN has a good prospect in real application. The FROC curves in Fig. 12 intuitively display the detection performance of the developed MK-3DCNN and three component-based contrasted models. Moreover, the SD value of the MK-3DCNN is not lower than that of all the comparison methods. This is because there are too few samples to cover all types of lung nodules, resulting in a certain fluctuation in the detection results. In the future work, constructing large-scale clinical datasets to provide dataset support for the development of high-performance models is one of the promising directions.

Fig. 12. Comparisons of FROC curves produced by the presented MK-3DCNN and the contrasted methods on the clinical dataset CQUCH-LND.

Download Full Size | PDF

Table 4. The nodule detection performance of the proposed MK-3DCNN and the comparison methods on the clinical dataset CQUCH-LND.

View Table | View all tables in this article

In addition, the visualization of several representative nodule detection results produced by fine-tuning the MK-3DCNN and the BaseNet on the dataset CQUCH-LND is shown in Fig. 13. Similar to the detection results on the dataset LUNA16, for easy samples (e.g. #1 and #2 nodule samples), both the proposed MK-3DCNN approach and the comparison approach BaseNet can obtain good detection performance. Furthermore, for some hard samples (e.g. #3-#8 nodule samples), the MK-3DCNN is significantly superior to the BaseNet. These experiment results prove the effectiveness of the MK-3DCNN in the nodule detection task.

Fig. 13. Visualization of some representative nodule detection results generated by the fine-tuned MK-3DCNN and BaseNet models on the clinical dataset CQUCH-LND. The red and blue rectangular boxes illustrate the position ground truths and detection results in the central slices, respectively. The central slice number and detection probabilities are provided below the images. The side length of the rectangular box corresponds to the nodule size. The Null denotes a missed detection case.

Download Full Size | PDF

4.9 Limitation

Although the developed MK-3DCNN model has achieved promising results on both the public dataset LUNA16 and the clinical dataset CQUCH-LND, there are still some limitations that need to be further considered. On the one hand, the 3D deep learning network with large parameters is required to be fully trained, thus the MK-3DCNN possesses high computational overhead. In the future, we will focus on designing efficient optimization algorithms to accelerate the training of the MK-3DCNN. On the other hand, as exhibited in Fig. 14, several ground glass nodules have small scales and low densities, which makes the model unable to accurately detect them. Likewise, some background organizations share many visual similarities with nodules in terms of shape and size, which results in them not being correctly identified by the MK-3DCNN. One solution is to combine the theories of graph learning and manifold learning to achieve more reasonable characterizations of nodule CT images.

Fig. 14. Missed detection and false detection cases by exploiting the MK-3DCNN on the LUNA16 dataset.

Download Full Size | PDF

5. Discussion

5.1 False positive reduction experiment analysis

As with some previous related work [53,54,59], our study focuses on developing a one-stage end-to-end 3D model for automated detection of lung nodules in chest CT scans. To further evaluate the performance of the proposed MK-3DCNN method, and considering limited nodule samples, we introduce a 3D self-supervised transfer learning method [69] to conduct an additional false positive reduction (FPR) experiment on the benchmark dataset LUNA16.

In the FPR process, a 3D encoder-decoder structure with residual connection [69] is used to implement self-supervised pre-training to learn valuable representation information from large amounts of randomly cropped unlabeled data, which helps reduce the dependence on labeled samples. Then, the pre-trained encoder part is transferred as the feature extractor, and the global average pooling operation is exploited to convert the feature map generated from the last convolutional layer of the encoder into a 512-dimensional feature vector. Finally, a classifier consisting of two fully connected layers (the number of neurons is respectively set to 512 and 256) and a Sigmoid unit is constructed to achieve the FPR. In this experiment, five image perturbation strategies (nonlinear transformation, local pixel shuffling, local pixel swapping, inner pixel cutout, and outer pixel cutout [69,70]) are integrated to enhance the image representation ability of the self-supervised learning network. Furthermore, conventional image rotation and image flipping approaches are used for data augmentation. The mean square error loss function and the stochastic gradient descent (SGD) optimizer with an initial learning rate of 1.0 are selected for self-supervised pre-training, and the cross-entropy loss function and the adaptive moment estimation (Adam) optimizer with an initial learning rate of 0.001 are adopted for the FPR training. The learning rate will be halved when the model performance is not improved over 10 epochs, the input size is set to $64 \times 64 \times 32$, the batch size is set to 32, and the early stop mechanism is employed to get a better model.

The nodule detection performance of our proposed MK-3DCNN with the FPR procedure (MK-3DCNN-FPR) and some existing models with the FPR step are contrasted in Table 5. In Table 5, the CNN-OSFHRL [52] attempts to solve the easy/hard sample imbalance issue by designing an online sample filtering algorithm. The DeepMed [71] develops a lightweight network to overcome the small sample problem. The NSADC-CNN [72] and the AA-3DCNN [9] are committed to solving the challenge of variable nodule sizes. For the A-CNN [73] and the DS-CMSF [3], a multi-stage detection framework is employed to achieve more accurate nodule detection. In the V-Net-SVM [74], a hard mining scheme is designed to improve the FPR performance. The SANet [55] tries to enhance the detection capability of the deep learning model by exploring long-range dependencies among one slice group and channels of the feature map. Compared to the above existing methods with the FPR process, the MK-3DCNN-FPR method obtains better lung nodule detection performance. Besides, from Table 3 and Table 5, it can be found that our proposed method achieves competitive performance compared with some SOTA one-stage, two-stage, and multi-stage methods, which indicates the effectiveness of our method in the lung nodule detection task.

Table 5. The nodule detection performance of the MK-3DCNN-FPR and some existing models with the FPR process on the benchmark dataset LUNA16.

View Table | View all tables in this article

5.2 Advantage and disadvantage analysis

Compared with the existing lung nodule detection studies, our work has the following advantages. (1) We propose a deep learning-based one-stage 3D lung nodule detection model that does not require human intervention and can achieve automated nodule detection in an end-to-end trainable way. Unlike traditional radiomics algorithms, our model is based on a neural network, and it does not demand hand-crafted feature designing. Different from frequently used 2D models that need to combine 2D proposals into 3D proposals, our method can directly generate 3D detection results. (2) The proposed method is tested and verified in multiple aspects as follows: the parameter sensitivity experiment is used to select the optimal parameter; the ablation study is conducted to demonstrate the effectiveness of the developed key modules; the visualization experiment is designed to intuitively analyze the detection performance of the proposed method; and the external validation experiment is performed to validate the generalization ability of our model. (3) We construct a new lung nodule CT dataset. The gold standard for lung nodule diagnosis is biopsy-based cytological analysis, not just the radiological characteristics. Therefore, a diagnosis gold standard-based clinical dataset is built to evaluate the detection performance of the proposed method in practical application.

There are also some disadvantages in our study as follows: (1) applying 3D CNN to lung nodule detection can capture rich spatial information of nodule CT images and achieve one-stage detection, but it occupies more memory compared to 2D networks, so the batch size and running speed are limited. Image patches are used instead of the entire CT scan as input to eliminate this issue. (2) The benchmark dataset LUNA16 only contains 888 CT scans and 1186 lung nodule objects, and too few samples are not enough to cover all types of lung nodules. The data augmentation strategy is designed to alleviate the model overfitting problem. (3) Although our method can deal with the problem of variable size, shape, and density of lung nodules in chest CT images to a certain extent, there are still some gaps compared to several top-level methods on the LUNA16 Grand Challenge. In future work, we will introduce new foundation model-based techniques (e.g. generative pre-trained Transformer) to further improve the lung nodule detection performance.

6. Conclusion

In this paper, a multi-kernel driven 3D convolutional neural network (MK-3DCNN) is developed for the automatic detection of lung nodules in thoracic CT scans. The MK-3DCNN method adopts a residual learning-based encoder-decoder structure as the backbone to exploit the multi-layer features of the deep network. Different from previous traditional convolutional networks with fixed kernel size, a multi-kernel joint learning block is designed to drive the detection model to capture 3D multi-scale spatial information from the nodule CT images with variable lesion sizes and shapes. In addition, a multi-mode mixed pooling strategy is proposed to surrogate the conventional single-mode pooling way, the designed pooling method reasonably incorporates three different types of pooling operations, including max pooling, average pooling, and center cropping pooling, and they can complement each other to attain more comprehensive nodule CT image representations. To fully evaluate the validity of the presented MK-3DCNN, systematic experiments are performed on the public dataset LUNA16 and the clinical dataset CQUCH-LND, and experimental results indicate the MK-3DCNN method outperforms some SOTA nodule detection approaches and possesses a good generalization ability in the clinical practice.

Funding

National Natural Science Foundation of China (42071302); Innovation Program for Chongqing Overseas Returnees (cx2019144); Graduate Research and Innovation Foundation of Chongqing (CYB21060); Visiting Scholar Foundation of Key Laboratory of Optoelectronic Technology and Systems (Chongqing University), Ministry of Education.

Acknowledgments

The authors would like to thank the Editor in Chief, Associate Editor, and anonymous Reviewers for their insightful comments and suggestions.

Disclosures

The authors declare no conflicts of interest.

Data availability

The public dataset LUNA16 is available in Ref. [13], and the external validation dataset CQUCH-LND can be obtained from the corresponding author upon reasonable request.

References

1. H. Sung, J. Ferlay, R. L. Siegel, et al., “Global cancer statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” Ca-Cancer J. Clin. 71, 209–249 (2021). [CrossRef]

2. G. X. Wu and D. J. Raz, “Lung cancer screening,” Lung Cancer: Treat. Res. 170, 1–23 (2016). [CrossRef]

3. Z. Zhou, F. Gou, Y. Tan, et al., “A cascaded multi-stage framework for automatic detection and segmentation of pulmonary nodules in developing countries,” IEEE J. Biomed. Health Inform. 26(11), 5619–5630 (2022). [CrossRef]

4. H. Huang, R. Wu, Y. Li, et al., “Self-supervised transfer learning based on domain adaptation for benign-malignant lung nodule classification on thoracic ct,” IEEE J. Biomed. Health Inform. 26(8), 3860–3871 (2022). [CrossRef]

5. H. Mkindu, L. Wu, and Y. Zhao, “Lung nodule detection of ct images based on combining 3d-cnn and squeeze-and-excitation networks,” Multimed. Tools Appl 82(17), 25747–25760 (2023). [CrossRef]

6. S. A. Agnes, J. Anitha, and A. A. Solomon, “Two-stage lung nodule detection framework using enhanced unet and convolutional lstm networks in ct images,” Comput. Biol. Med. 149, 106059 (2022). [CrossRef]

7. Z. Guo, L. Zhao, J. Yuan, et al., “Msanet: multiscale aggregation network integrating spatial and channel information for lung nodule detection,” IEEE J. Biomed. Health Inform. 26(6), 2547–2558 (2021). [CrossRef]

8. L. Zhu, H. Zhu, S. Yang, et al., “Pulmonary nodule detection based on hierarchical-split hrnet and feature pyramid network with atrous convolution,” Biomed. Signal Process. Control. 85, 105024 (2023). [CrossRef]

9. D. Zhao, Y. Liu, H. Yin, et al., “An attentive and adaptive 3d cnn for automatic pulmonary nodule detection in ct image,” Expert Syst. with Appl. 211, 118672 (2023). [CrossRef]

10. H. MacMahon, D. P. Naidich, J. M. Goo, et al., “Guidelines for management of incidental pulmonary nodules detected on ct images: from the fleischner society 2017,” Radiology 284(1), 228–243 (2017). [CrossRef]

11. Y. Chen, X. Hou, Y. Yang, et al., “A novel deep learning model based on multi-scale and multi-view for detection of pulmonary nodules,” J Digit Imaging 36(2), 688–699 (2023). [CrossRef]

12. R. Wu, C. Liang, Y. Li, et al., “Self-supervised transfer learning framework driven by visual attention for benign–malignant lung nodule classification on chest ct,” Expert Syst. with Appl. 215, 119339 (2023). [CrossRef]

13. A. A. A. Setio, A. Traverso, T. De Bel, et al., “Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge,” Med. Image Anal. 42, 1–13 (2017). [CrossRef]

14. Y.-S. Huang, P.-R. Chou, H.-M. Chen, et al., “One-stage pulmonary nodule detection using 3-d dcnn with feature fusion and attention mechanism in ct image,” Comput. Methods Programs Biomed. 220, 106786 (2022). [CrossRef]

15. Y. Han, H. Qi, L. Wang, et al., “Pulmonary nodules detection assistant platform: An effective computer aided system for early pulmonary nodules detection in physical examination,” Comput. Methods Programs Biomed. 217, 106680 (2022). [CrossRef]

16. J. Xu, H. Ren, S. Cai, et al., “An improved faster r-cnn algorithm for assisted detection of lung nodules,” Comput. Biol. Med. 153, 106470 (2023). [CrossRef]

17. Z. Zhang, Y. Tie, D. Zhang, et al., “Quantum-involution inspire false positive reduction in pulmonary nodule detection,” Biomed. Signal Process. Control. 84, 104850 (2023). [CrossRef]

18. H. Mkindu, L. Wu, and Y. Zhao, “3d multi-scale vision transformer for lung nodule detection in chest ct images,” Signal, Image Video Process. 17(5), 2473–2480 (2023). [CrossRef]

19. Q. Mao, S. Zhao, D. Tong, et al., “Hessian-mrlog: Hessian information and multi-scale reverse log filter for pulmonary nodule detection,” Comput. Biol. Med. 131, 104272 (2021). [CrossRef]

20. X. Luo, T. Song, G. Wang, et al., “Scpm-net: An anchor-free 3d lung nodule detection network using sphere representation and center points matching,” Med. Image Anal. 75, 102287 (2022). [CrossRef]

21. O. Ozdemir, R. L. Russell, and A. A. Berlin, “A 3d probabilistic deep learning system for detection and diagnosis of lung cancer using low-dose ct scans,” IEEE Trans. Med. Imaging 39(5), 1419–1429 (2019). [CrossRef]

22. H. Cao, H. Liu, E. Song, et al., “A two-stage convolutional neural networks for lung nodule detection,” IEEE J. Biomed. Health Inform. 24, 2006–2015 (2020). [CrossRef]

23. A. A. Rezaie and A. Habiboghli, “Detection of lung nodules on medical images by the use of fractal segmentation,” International Journal of Interactive Multimedia and Artificial Intelligence 4(5), 15–19 (2017). [CrossRef]

24. S. A. El-Regaily, M. A. M. Salem, M. H. A. Aziz, et al., “Lung nodule segmentation and detection in computed tomography,” in 2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS), (IEEE, 2017), pp. 72–78.

25. Q. Dou, H. Chen, L. Yu, et al., “Multilevel contextual 3-d cnns for false positive reduction in pulmonary nodule detection,” IEEE Trans. Biomed. Eng. 64(7), 1558–1567 (2016). [CrossRef]

26. M. M. N. Abid, T. Zia, M. Ghafoor, et al., “Multi-view convolutional recurrent neural networks for lung cancer nodule identification,” Neurocomputing 453, 299–311 (2021). [CrossRef]

27. B.-C. Kim, J. S. Yoon, J.-S. Choi, et al., “Multi-scale gradual integration cnn for false positive reduction in pulmonary nodule detection,” Neural Networks 115, 1–10 (2019). [CrossRef]

28. Z. Zhang, K. W. Leong, K. Van Vliet, et al., “Deep learning for label-free nuclei detection from implicit phase information of mesenchymal stem cells,” Biomed. Opt. Express 12(3), 1683–1706 (2021). [CrossRef]

29. F. Shi, B. Chen, Q. Cao, et al., “Semi-supervised deep transfer learning for benign-malignant diagnosis of pulmonary nodules in chest ct images,” IEEE Trans. Med. Imaging 41(4), 771–781 (2021). [CrossRef]

30. Y. Luo, Q. Xu, R. Jin, et al., “Automatic detection of retinopathy with optical coherence tomography images via a semi-supervised deep learning method,” Biomed. Opt. Express 12(5), 2684–2702 (2021). [CrossRef]

31. Y. Zhao, L. Zhang, Y. Liu, et al., “Two-stream graph convolutional network for intra-oral scanner image segmentation,” IEEE Trans. Med. Imaging 41(4), 826–835 (2021). [CrossRef]

32. D. Zhao, Y. Liu, H. Yin, et al., “A novel multi-scale cnns for false positive reduction in pulmonary nodule detection,” Expert Syst. with Appl. 207, 117652 (2022). [CrossRef]

33. H. Chen, Y. Zhang, W. Zhang, et al., “Low-dose ct via convolutional neural network,” Biomed. Opt. Express 8(2), 679–694 (2017). [CrossRef]

34. Y. Xie, Y. Xia, J. Zhang, et al., “Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest ct,” IEEE Trans. Med. Imaging 38(4), 991–1004 (2018). [CrossRef]

35. L. Jiang, C. Tang, and H. Zhou, “White blood cell classification via a discriminative region detection assisted feature aggregation network,” Biomed. Opt. Express 13(10), 5246–5260 (2022). [CrossRef]

36. Y. Peng, Z. Chen, W. Zhu, et al., “Ads-net: attention-awareness and deep supervision based network for automatic detection of retinopathy of prematurity,” Biomed. Opt. Express 13(8), 4087–4101 (2022). [CrossRef]

37. L. Fan, Z. Wang, and J. Zhou, “Ldadn: a local discriminant auxiliary disentangled network for key-region-guided chest x-ray image synthesis augmented in pneumoconiosis detection,” Biomed. Opt. Express 13(8), 4353–4369 (2022). [CrossRef]

38. N. Tajbakhsh and K. Suzuki, “Comparing two classes of end-to-end machine-learning models in lung nodule detection and classification: Mtanns vs. cnns,” Pattern Recognit. 63, 476–486 (2017). [CrossRef]

39. H. Jiang, H. Ma, W. Qian, et al., “An automatic detection system of lung nodule based on multigroup patch-based deep learning network,” IEEE J. Biomed. Health Inform. 22(4), 1227–1237 (2017). [CrossRef]

40. Y. Zhao, Z. Wang, X. Liu, et al., “Pulmonary nodule detection based on multiscale feature fusion,” Computational And Mathematical Methods In Medicine 2022, 1–13 (2022). [CrossRef]

41. H. Xie, D. Yang, N. Sun, et al., “Automated pulmonary nodule detection in ct images using deep convolutional neural networks,” Pattern Recognit. 85, 109–119 (2019). [CrossRef]

42. C. C. Nguyen, G. S. Tran, J.-C. Burie, et al., “Pulmonary nodule detection based on faster r-cnn with adaptive anchor box,” IEEE Access 9, 154740–154751 (2021). [CrossRef]

43. J. George, S. Skaria, V. Varun, et al., “Using yolo based deep learning network for real time detection and localization of lung nodules from low dose ct scans,” in Medical Imaging 2018: Computer-Aided Diagnosis, vol. 10575 (SPIE, 2018), pp. 347–355.

44. S. Zheng, S. Kong, Z. Huang, et al., “A lower false positive pulmonary nodule detection approach for early lung cancer screening,” Diagnostics 12(11), 2660 (2022). [CrossRef]

45. D. Wang, Y. Zhang, K. Zhang, et al., “Focalmix: Semi-supervised learning for 3d medical image detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 3951–3960.

46. Y. Li and Y. Fan, “Deepseed: 3d squeeze-and-excitation encoder-decoder convolutional neural networks for pulmonary nodule detection,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), (IEEE, 2020), pp. 1866–1869.

47. H. Yuan, Y. Wu, J. Cheng, et al., “Pulmonary nodule detection using 3-d residual u-net oriented context-guided attention and multi-branch classification network,” IEEE Access 10, 82–98 (2021). [CrossRef]

48. H. Zhang, Y. Peng, and Y. Guo, “Pulmonary nodules detection based on multi-scale attention networks,” Sci. Rep. 12(1), 1466 (2022). [CrossRef]

49. Q. Wang, J. Wang, M. Zhou, et al., “A 3d attention networks for classification of white blood cells from microscopy hyperspectral images,” Opt. Laser Technol. 139, 106931 (2021). [CrossRef]

50. Q. Wang, L. Sun, Y. Wang, et al., “Identification of melanoma from hyperspectral pathology image using 3d convolutional networks,” IEEE Trans. Med. Imaging 40(1), 218–227 (2020). [CrossRef]

51. X. Guan, G. Yang, J. Ye, et al., “3d agse-vnet: an automatic brain tumor mri data segmentation framework,” BMC Med. Imaging 22(1), 6–18 (2022). [CrossRef]

52. Q. Dou, H. Chen, Y. Jin, et al., “Automated pulmonary nodule detection via 3d convnets with online sample filtering and hybrid-loss residual learning,” in Medical Image Computing and Computer Assisted Intervention- MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, Part III 20, (Springer, 2017), pp. 630–638.

53. F. Liao, M. Liang, Z. Li, et al., “Evaluate the malignancy of pulmonary nodules using the 3-d deep leaky noisy-or network,” IEEE Trans. Neural Netw. Learning Syst. 30(11), 3484–3495 (2019). [CrossRef]

54. W. Zhu, C. Liu, W. Fan, et al., “Deeplung: Deep 3d dual path nets for automated pulmonary nodule detection and classification,” in 2018 IEEE winter conference on applications of computer vision (WACV), (IEEE, 2018), pp. 673–681,

55. J. Mei, M.-M. Cheng, G. Xu, et al., “Sanet: A slice-aware network for pulmonary nodule detection,” IEEE Trans. Pattern Anal. Mach. Intell. 44, 4374–4387 (2021). [CrossRef]

56. J. Lin, Q. She, and Y. Chen, “Pulmonary nodule detection based on ir-unet++,” Med. Biol. Eng. Comput. 61(2), 485–495 (2023). [CrossRef]

57. X. Zhu, X. Wang, Y. Shi, et al., “Channel-wise attention mechanism in the 3d convolutional network for lung nodule detection,” Electronics 11(10), 1600 (2022). [CrossRef]

58. M. Jian, L. Zhang, H. Jin, et al., “3dagnet: 3d deep attention and global search network for pulmonary nodule detection,” Electronics 12(10), 2333 (2023). [CrossRef]

59. R. Xu, Z. Liu, Y. Luo, et al., “Sgda: towards 3d universal pulmonary nodule detection via slice grouped domain attention,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (2023).

60. S. G. Armato III, G. McLennan, L. Bidaut, et al., “The lung image database consortium (lidc) and image database resource initiative (idri): a completed reference database of lung nodules on ct scans,” Med. Phys. 38(2), 915–931 (2011). [CrossRef]

61. S. Ren, K. He, R. Girshick, et al., “Faster r-cnn: Towards real-time object detection with region proposal networks,” Adv. neural information processing systems28 (2015).

62. X. Li, W. Wang, X. Hu, et al., “Selective kernel networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2019), pp. 510–519.

63. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2016), pp. 770–778.

64. W. Shen, M. Zhou, F. Yang, et al., “Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification,” Pattern Recognit. 61, 663–673 (2017). [CrossRef]

65. R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, (2015), pp. 1440–1448.

66. M. Niemeijer, M. Loog, M. D. Abramoff, et al., “On combining computer-aided detection systems,” IEEE Trans. Med. Imaging 30(2), 215–223 (2010). [CrossRef]

67. J. Hu, L. Shen, S. Albanie, et al., “Squeeze-and-excitation networks,” IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 2011–2023 (2020). [CrossRef]

68. Q. Ma, S. Li, W. Zhuang, et al., “Self-supervised time series clustering with model-based dynamics,” IEEE Trans. Neural Netw. Learning Syst. 32(9), 3942–3955 (2020). [CrossRef]

69. Z. Zhou, V. Sodha, J. Pang, et al., “Models genesis,” Med. Image Anal. 67, 101840 (2021). [CrossRef]

70. L. Chen, P. Bentley, K. Mori, et al., “Self-supervised learning for medical image analysis using image context restoration,” Med. Image Anal. 58, 101539 (2019). [CrossRef]

71. A. Pezeshk, S. Hamidian, N. Petrick, et al., “3-d convolutional neural networks for automatic detection of pulmonary nodules in chest ct,” IEEE J. Biomed. Health Inform. 23(5), 2080–2090 (2018). [CrossRef]

72. J. Wang, J. Wang, Y. Wen, et al., “Pulmonary nodule detection in volumetric chest ct scans using cnns-based nodule-size-adaptive detection and classification,” IEEE Access 7, 46033–46044 (2019). [CrossRef]

73. W. Huang, Y. Xue, and Y. Wu, “A cad system for pulmonary nodule prediction based on deep three-dimensional convolutional neural networks and ensemble learning,” PLoS One 14(7), e0219369 (2019). [CrossRef]

74. Y. Ye, M. Tian, Q. Liu, et al., “Pulmonary nodule detection using v-net and high-level descriptor based svm classifier,” IEEE Access 8, 176033–176041 (2020). [CrossRef]

NO.	Notations	Definitions
1	$ζ_{n o r}$	The maximum intensity of normalized images
2	$γ \times γ \times γ$	The spatial resolution of normalized images
3	$h \times w \times s$	Size of cropped input patch
4	$X_{i n}$	Input of multi-kernel joint learning module
5	$F_{o u t}$	Output of multi-kernel joint learning module
6	$φ$	Rectified linear unit activation function
7	$r$	Reduction ratio
8	$F_{g a p}$	The global average pooling
9	$X_{r}^{l}$	Input of a residual learning unit
10	$X_{r}^{l + 1}$	Output of a residual learning unit
11	$F_{r e s}$	Learning function with two convolutional operations
12	$X_{p}^{l}$	Input of multi-mode mixed pooling module
13	$X_{p}^{l + 1}$	Output of multi-mode mixed pooling module
14	$F_{m a x p}$	The max pooling
15	$F_{a v e p}$	The average pooling
16	$F_{c e n p}$	The center cropping pooling
17	$ψ$	Channel-wise squeezing operation
18	$N_{a n c}$	The number of anchors
19	$p_{g t}$	The label of the anchored object
20	${\tilde{p}}_{c l a}$	Predicted probability
21	$λ_{a d d}$	Summation weight

Methods	False positives per scan							CPM	SD	P-value
Methods	0.125	0.25	0.5	1	2	4	8	CPM	SD	P-value
BaseNet	0.6788	0.7504	0.8069	0.8609	0.9013	0.9275	0.9427	0.8384	0.0906	0.0015
BaseNet-MK	0.6762	0.7597	0.8272	0.8794	0.9140	0.9401	0.9562	0.8504	0.0950	0.0076
BaseNet-M $^{3}$ P	0.6762	0.7622	0.8153	0.8761	0.9132	0.9393	0.9511	0.8476	0.0941	0.0019
BaseNet-SE	0.6686	0.7513	0.8196	0.8676	0.9106	0.9325	0.9452	0.8422	0.0949	0.0015
BaseNet-SE-M $^{3}$ P	0.6771	0.7614	0.8246	0.8794	0.9165	0.9376	0.9545	0.8502	0.0943	0.0023
MK-3DCNN	0.7099	0.7723	0.8356	0.8836	0.9174	0.9384	0.9562	0.8591	0.0846	–

Method	False positives per scan							CPM
Method	0.125	0.25	0.5	1	2	4	8	CPM
3D-RES [54]	0.6620	0.7460	0.8150	0.8640	0.9020	0.9180	0.9320	0.8340
3D-DPN [54]	0.6920	0.7690	0.8240	0.8650	0.8930	0.9170	0.9330	0.8420
LNOR-Net [53,55]	0.5938	0.7266	0.7813	0.8438	0.8750	0.8906	0.8984	0.8013
MSANet [48]	0.7320	0.7740	0.8300	0.8660	0.9170	0.9290	0.9460	0.8560
YOLOv3-Net [14]	0.7040	0.7650	0.8320	0.8730	0.9090	0.9320	0.9350	0.8500
SGDA [59]	0.5917	0.6513	0.7752	0.8532	0.9311	0.9633	0.9678	0.8191
MK-3DCNN	0.7099	0.7723	0.8356	0.8836	0.9174	0.9384	0.9562	0.8591

Methods	False positives per scan							CPM	SD	P-value
Methods	0.125	0.25	0.5	1	2	4	8	CPM	SD	P-value
BaseNet	0.6312	0.7567	0.8061	0.8479	0.8821	0.9125	0.9240	0.8229	0.0955	0.0015
Fine-tuned BaseNet	0.6464	0.7605	0.8137	0.8669	0.9049	0.9240	0.9354	0.8360	0.0967	0.0014
BaseNet-MK	0.6692	0.7681	0.8213	0.8707	0.9087	0.9202	0.9354	0.8419	0.0893	0.0076
Fine-tuned BaseNet-MK	0.6692	0.7757	0.8555	0.8859	0.9240	0.9316	0.9506	0.8561	0.0938	0.0046
BaseNet-M $^{3}$ P	0.6464	0.7376	0.8099	0.8631	0.8897	0.9316	0.9468	0.8322	0.1009	0.0014
Fine-tuned BaseNet-M $^{3}$ P	0.6730	0.7833	0.8289	0.8745	0.9049	0.9278	0.9430	0.8479	0.0882	0.0021
MK-3DCNN	0.6692	0.7605	0.8479	0.8897	0.9049	0.9392	0.9544	0.8523	0.0957	–
Fine-tuned MK-3DCNN	0.6882	0.7833	0.8593	0.8897	0.9240	0.9468	0.9582	0.8642	0.0906	–

Method	False positives per scan							CPM
Method	0.125	0.25	0.5	1	2	4	8	CPM
CNN-OSFHRL [52]	0.6590	0.7450	0.8190	0.8650	0.9060	0.9330	0.9460	0.8390
DeepMed [71]	0.6370	0.7230	0.8040	0.8650	0.9070	0.9380	0.9520	0.8320
NSADC-CNN [72]	0.7880	0.8470	0.8950	0.9340	0.9520	0.9590	0.9630	0.9030
A-CNN [73]	0.8170	0.8510	0.8690	0.8830	0.8910	0.9070	0.9140	0.8760
V-Net-SVM [74]	0.6210	0.7330	0.8490	0.8990	0.9180	0.9240	0.9340	0.8397
SANet [55]	0.7117	0.8018	0.8649	0.9009	0.9369	0.9459	0.9550	0.8739
DS-CMSF [3]	0.7420	0.8406	0.8989	0.9250	0.9444	0.9546	0.9595	0.8950
AA-3DCNN [9]	0.6560	0.7540	0.8330	0.9170	0.9510	0.9700	0.9770	0.8654
MK-3DCNN-FPR	0.7985	0.8575	0.9073	0.9317	0.9519	0.9562	0.9570	0.9086

Multi-kernel driven 3D convolutional neural network for automated detection of lung nodules in chest CT scans

Abstract

1. Introduction

2. Dataset description

2.1 LUNA16 dataset

2.2 CQUCH-LND dataset

3. Proposed method

3.1 Overview

3.2 Image preprocessing and input patch cropping

3.3 Multi-kernel joint learning block

3.4 Multi-mode mixed pooling-based residual learning block

3.5 Model output

3.6 Loss function

4. Experimental results and analysis

4.1 Experimental design

4.2 Evaluation metrics

4.3 Parameter sensitivity analysis

4.4 Ablation study

4.5 Validity analysis of data augmentation operation

4.6 Comparison with some state-of-the-art methods

4.7 Visual analysis of detection results

4.8 External validation on clinical dataset CQUCH-LND

4.9 Limitation

5. Discussion

5.1 False positive reduction experiment analysis

5.2 Advantage and disadvantage analysis

6. Conclusion

Funding

Acknowledgments

Disclosures

Data availability

References

Data availability

Cited By

Figures (14)

Tables (5)

Equations (15)

Biomedical Optics Express