Learned regularization for image reconstruction in sparse-view photoacoustic tomography

Tong Wang; Menghui He; Kang Shen; Wen Liu; Chao Tian; Chao Tian

doi:10.1364/BOE.469460

1. Introduction

As a noninvasive biological imaging technology based on photoacoustic effect, photoacoustic computed tomography (PACT) combines distinguished optical absorption contrast of optical imaging and superb spatial resolution of ultrasound imaging in deep tissues [1–3]. It has achieved increasingly wide applications in early cancer detection [4], cell imaging [3,5], biopsy guide [6,7], and other areas of the medical imaging problems. To achieve rapid imaging, sparse-view PACT measurements can be utilized, where the imaging time can be greatly shortened by decreasing the detector density and the complexity of transducer array can be reduced thus lower the cost. However, the reconstructed photoacoustic image usually suffers from quality degradation and the occurrence of artifacts and blur [8,9]. Great efforts have been devoted to improving sparse-view PACT through image reconstruction in recent years.

Conventional photoacoustic imaging algorithms can be classified into two main categories: analytical and model-based methods. The most widely used analytical method is filtered back projection (FBP) that directly solves the photoacoustic equation [10,11]. This method can reconstruct accurate photoacoustic images with large amount of measured data and small noise. Alternatively, the model-based algorithms try to find optimal solution by minimizing the error between measured signals and signals obtained by forward models [12–15]. In situation of sparse view or limited view problems, the reconstruction problem is ill-conditioned and prior constraints are introduced as regularization to improve image qualities. Different forms of regularization have been proposed to improve reconstructed image qualities such as Tikhonov [16,17], total variation (TV) [18,19], and some self-defined regularizations combining different types of regularizations [20]. Despite promising performance of model-based methods, they have some drawbacks: on the one hand, the solving process usually requires tens of iterations which means heavy computational burdens; on the other hand, the performance of handcrafted regularization on complicated structures is limited.

In recent years, convolutional neural networks (CNN) have shown great potential in medical image processing such as image classification [21], segmentation [22] and artifact removal [23]. In the field of high-quality PACT image reconstruction, the application of CNN can be divided into three main types: 1) post-processing of the reconstructed images: using networks to improve the quality of degraded images reconstructed by conventional algorithms with limited conditions; 2) direct reconstruction of the initial pressure: using large networks to directly reconstruct photoacoustic images from raw data; 3) model-based learning: using networks to calculate the modified update of model-based iteration method. Many researches have been concentrating on the first approach that reconstruct the photoacoustic images using simple and fast but not accurate algorithms, and then the CNN is applied to reduce artifacts [24–27]. CNN performs excellent in such image-to-image tasks. However, their performance is limited by the severely degraded images reconstructed by conventional methods.

The second approach is a signal-to-image task that transform data between different domains which exploits the powerful performance of deep learning networks [28,29]. The structure of the networks is large and complex for learning the accurate backward imaging model and the training of the networks is difficult and time consuming.

The third approach tries to combine the advantages of physical models and data-driven deep learning schemes more effectively [30,31]. This method uses CNN structure to mimic the conventional gradient decent or primal-dual algorithms and the information of physical model is incorporated to the network. High quality images with small networks can be obtained through this method.

Previous researchers have also developed different types of deep regularization learning methods to optimize model-base iteration method. Some researchers used denoisers trained by sequential CNN layers [32] and adversarial networks [33] as regularization. Li et al. and Antholzer et al. used the L₂ norm of the network as Tikhonov regularization [34,35]. Obmann et al. trained an encoder-decoder network and used the L₁ norm of the output to serve as sparse regularization [36]. These deep denoisers serve as regularization terms distinguishing artifacts and noises from true images in conventional model-based iteration frameworks. The iteration framework enables these methods to effectively insert knowledge about the forward operator and the noise model into the reconstruction, allowing the networks to be trained on little training data. However, these deep regularizers are trained before iteration steps, the separation of network training and iteration process means the trained regularization may not be the best for the optimization algorithms. The training of deep denoisers should take iteration process into account to further improve the performance of reconstruction.

In this paper, we adopt the model-based approach and proposed a learned regularization specially designed for iterative reconstruction using gradient decent scheme. A network is designed to learn deep regularization for iterative sparse view photoacoustic reconstruction. The network is trained step by step to extract different image prior constraints for each iteration step. To decrease computational burden, the training of the network and calculating of gradient information are decoupled. The performance of the network is evaluated on both simulation and in vivo datasets. The proposed method converges within five iterations while conventional iteration methods need tens of iteration steps. The size of the regularized learning network is much smaller than that of the U-Net based post-processing method, while realizes improvement of image quality on simulation and in vivo dataset.

2. Methods

2.1 Regularized image reconstruction in PACT

In PACT, a pulsed laser illuminates the target tissue and the energy of the photons will be absorbed by chromophores such as hemoglobin and converted to heat. Thermal expansion will induce an instantaneous local pressure that travels through the tissue as ultrasound waves. The generation and propagation of ultrasound waves can be described by the following function:

(1)$${\nabla ^2}p({\mathbf {r}},t) - \frac{1}{{{c^2}}}\frac{{{\partial ^2}}}{{\partial {t^2}}}p({\mathbf {r}},t) ={-} \frac{\beta }{{{C_p}}}\frac{\partial }{{\partial t}}H({\mathbf {r}},t)\textrm{, }$$

where p(r, t) is the photoacoustic signal at position r and time t, H(r, t) is heating function representing the energy absorbed by target tissue per unit volume per unit time, c is the speed of sound, β is the isobaric thermal volume expansion coefficient and C_p denotes the specific heat capacity at constant pressure. The initial condition of Eq. (1) is

(2)$$\textrm{ }p({\mathbf {r}},t = 0) = \mathbf{x}\textrm{, }\frac{\partial }{{\partial t}}p(\mathbf{r},t = 0) = 0.$$

where x is the initial pressure.

The measurement process of PACT signals can be written in matrix form as

(3)$$\mathbf{Ax} = \mathbf{y},$$

where A is a system matrix, y is the detected photoacoustic signal. Equation (3) describes the mapping from x to y, which constitutes the forward problem in PACT and the corresponding reconstruction process constitutes the inverse problem in PACT.

The system matrix is established based on Kaiser–Bessel (KB) Expansion Functions [37]. The temporal Fourier transform of the acoustic pressure generated by a KB function located at the origin $\tilde{p}_0^{\textrm{KB}}(f)$ is expressed as

(4)$$\tilde{p}_0^{\textrm{KB}}(f) ={-} \frac{{i4{\pi ^2}f{a^3}\beta }}{{{C_p}{I_m}(\gamma )}}\frac{{{j_{m + 1}}\left( {\sqrt {4{\pi^2}{a^2}{f^2}/c_0^2 - {\gamma^2}} } \right)}}{{{{({4{\pi^2}{a^2}{f^2}/c_0^2 - {\gamma^2}} )}^{(m + 1)/2}}}},$$

where i is imaginary unit, j_m(x) is the mth-order spherical Bessel function of the first kind, c₀ is the speed of sound, a and γ determine the support radius and the smoothness of KB function, respectively. Than the time domain expression of acoustic pressure is obtained by inverse Fourier transform of $\tilde{p}_0^{\textrm{KB}}(f)$. Note that it is usually unpractical to calculate the inverse or pseudo-inverse of system matrix A directly since the size of A is usually very large [38]. Suppose the targeted imaging area has a size of M × M pixels, the signal is detected by N transducer elements and the length of each channel of the signal is L, the size of x is M × M and the size of y is N × L. The system matrix A contains M × M columns and N × L rows, each column represents the photoacoustic signals associated to one pixel. In our experimental condition, M = 256, N = 128, L = 1856, the size of A will be very large, and the inverse problem should be solved with iteration methods.

The inverse process for Eq. (3) can be considered as an optimization problem:

(5)$$\mathbf{x} = \arg \mathop {\min }\limits_\mathbf{x} \left\|{\mathbf{Ax} - \mathbf{y}}\right\|_2^2,$$

where $\left\|\cdot \right\|_{_2}^2$ denotes the square of L₂ norm. The goal of reconstruction is to minimize the difference between the detected signals and the signals predicted by the forward model. Since the problem is usually ill-conditioned, the inversion of Eq. (5) is numerically unstable and the reconstructed images will contain artifacts. Regularization that encode the prior information of the images are often incorporated to improve the reconstructed image qualities, and the optimization problem can be reformulated as

(6)$$\mathbf{x} = \arg \mathop {\min }\limits_\mathbf{x} E(\mathbf{x}) = \arg \mathop {\min }\limits_\mathbf{x} \frac{1}{2}\left\|{\mathbf{Ax} - \mathbf{y}} \right\|_2^2 + \lambda R(\mathbf{x}),$$

where $\left\|{\mathbf{Ax} - \mathbf{y}} \right\|_2^2$ is a fidelity term addressing the data consistency between the reconstructed x and the measurement y, R(x) is a regularization term and the parameter λ is a regularization factor, controlling the contribution of the regularization term to the whole model.

Various types of regularization have been developed. Among them, the Tikhonov regularization is one of the most widely used and has a form of

(7)$$R(\mathbf{x}) = \left\|{\mathbf{Lx}} \right\|_2^2\textrm{,}$$

where L is the Tikhonov matrix and usually set as the identity matrix, i.e., L = I. This regularization tends to reduce the total energy of x to prevent unrealistically large entries in the solution to reduce noises. Another widely used regularization is total variation, having a form of [39]:

(8)$$R(\mathbf{x}) = {\left\|{\nabla \mathbf{x}} \right\|_2} = \sum\limits_{i = 1}^{n - 1} {\sum\limits_{j = 1}^{n - 1} {\sqrt {{{|{{x_{i + 1,j}} - {x_{i,j}}} |}^2} + {{|{{x_{i,j + 1}} - {x_{i,j}}} |}^2}} } } ,$$

where ${\left\|\cdot \right\|_2}$ denotes the L₂ norm and $\nabla$ denotes the gradient operator. The ability of reducing noises while preserving sharp edges makes TV regularization popular in medical image reconstruction.

2.2 Learned regularization

If the regularization R(x^k) is convex and differentiable, the problem in Eq. (6) can be solved using a simple gradient decent scheme. The iterative process of step k can be described as

(9)$${\mathbf{x}^{k + 1}} = {\mathbf{x}^k} - \alpha \frac{{\partial E}}{{\partial \mathbf{x}}} = {\mathbf{x}^k} - \alpha {\mathbf{A}^T}({\mathbf{A}{\mathbf{x}^k} - \mathbf{y}} )- \alpha \lambda \frac{{\partial R({\mathbf{x}^k})}}{{\partial \mathbf{x}}},$$

where α is a step size parameter. The gradient decent scheme is simple and straightforward, but the choice of parameters α and λ are not easy and it may take many steps to converge thus causing huge computational burden. Moreover, in many cases the problem (6) is convex but not differentiable, indicating that the gradient based methods cannot be used directly.

Conventional regularization terms try to extract specific structure information of the images for accurate reconstruction. For example, TV regularization will preserve the edges and reduce the noises of the images to improve image quality. CNN methods have been widely used in such image enhancing tasks, which means they have the potential of learning suitable regularization functions. It is straightforward to replace the regularization term R(x^k) with a network R_θ(x^k) where θ represents the parameters of the network. Compare to conventional handcrafted regularization terms, the learned regularization will not only capture simple structure edge information, but also reduce noises and learn large scale structure information. However, the term R_θ(x^k) cannot be used directly in the gradient decent process and the derivative in Eq.(9) should be calculated. The derivation is an operation that concerns edges and can also be incorporated in the network. Therefore, the CNN method is used to learn the gradient of regularization instead of the regularization term itself, which can be rewritten as

(10)$${R_{\textrm{CNN}}}({\mathbf{x}^k}) = \alpha \lambda \frac{{\partial R({\mathbf{x}^k})}}{{\partial \mathbf{x}}},$$

where R_CNN(x^k) is a learned regularization term based on a deep neural network and the parameter α is implicitly included in the network.

Since the learned regularization is supposed to be implemented in the gradient decent algorithm, the regularization parameter λ and the updating parameter α should be taken into account in the training process of R_CNN(x^k) at the same time. These parameters should be learned as well since it’s unpractical to get the parameters manually and the optimal parameters are different for every iteration step. It’s reasonable to put all three parts in Eq. (9) together to learn the optimal parameters at the same time, which results in the following iteration scheme:

(11)$${\mathbf{x}^{k + 1}} = {\mathbf{x}^k} - {\alpha ^k}{\mathbf{A}^T}({\mathbf{A}{\mathbf{x}^k} - \mathbf{y}} )- {R_{\textrm{CNN}}}({\mathbf{x}^k}),$$

where α^k is the learned updating parameter and R_CNN is learned from a deep neural network. The parameter α and λ in Eq. (9) are implicitly contained in α^k and R_CNN(x^k).

In the iteration scheme shows in (11), conventional optimization method and deep learning method work together to solve the optimization problem (6). The gradient decent method is used as iteration scheme to keep the overall structure concise and impact. There are many other optimization methods, but they are usually more complex and have special restrictions on the form of regularization term. For example, the conjugate gradient for least squares (CGLS) [40] require R(x^k) to be L₂ form and Alternating Direction Method of Multipliers (ADMM) [41] require R(x^k) to be L₁ form. Working with these optimization methods is practicable but will need special adjustment of the network structure and it is more complex.

The structure of the regularization learning network is shown in Fig. 1. According to Eq. (11), the iteration process consists of three parts: an identity term x^k, a fidelity term α^kA^T(Ax^k-y) and a regularization term R_CNN. The network structure follows the same logical as Eq. (11) and contains corresponding three parts as well: the identity term x^k does not contain any parameters and is copied from the input directly; the fidelity term α^kA^T(Ax^k-y) contains only one learnable parameter α^k which is learned from a scale layer; the regularization term R^k_CNN is learned by a feature learning block which takes both x^k and A^T(Ax^k-y) as input. The three parts are added together to get updated x^k +¹ image of the iteration step.

Fig. 1. Structure of the network for learned regularization.

Download Full Size | PDF

The architecture of the feature learning block adopts the encoder-decoder structure with an encoding part extracting features from inputs and a decoding part learning new images from the extracted features. Since the regularization term represents structure information of the reconstructed images, it is reasonable for the feature learning block to take x^k as input and learn structure information from it which is called single path strategy. However, the image x^k already loses information and it helps to learn more information from the signals. Motivated by the Y-Net [42], the proposed network consists of two encoders with encoder I extracts features from x^k and encoder II extracts features from A^T(Ax^k-y) which contains information from signals implicitly.

It’s natural to extract information from signals y directly. However, this is a signal-to-image task that transfers data between data domain and image domain which needs large and deep networks such as U-Net or the specially designed Y-net. It is unpractical to learn image information directly from signals y since we tend to keep the network size small. Therefore, the term A^T(Ax^k-y) is chosen as the input instead of y.

This extracting strategy is called dual path strategy. These two encoders extract features from x^k and A^T(Ax^k-y), and the features are merged together then decoded by the following decoder to get R^k_CNN, which is described by the following equation:

(12)$$R_{\textrm{CNN}}^k = R_{_{{\theta _k}}}^k({\mathbf{x},{\mathbf{A}^T}({\mathbf{A}{\mathbf{x}^k} - \mathbf{y}} )} ),$$

where θ_k is the parameters for the networks of step k, and the network has two inputs. The dual path strategy helps to learn information more effectively from both image and data domain.

The detailed structure of encoders and decoder are marked by red blocks in Fig. 1. The two encoders with the same structures take x^k and A^T(Ax^k-y) as inputs, respectively. Take encoder I as example, the input x^k first goes through two convolutional layers with 16 filters and a feature with size of 256 × 256 × 16 is obtained. Then a max pooling layer is used to down-sample the feature channel to 32. Two convolution layers with 32 filters are followed to get the final extracted feature with the size of 128 × 128 × 32. The two inputs x^k and A^T(Ax^k-y) go through encoder blocks I and II, which output features with size of 128 × 128 × 64 then they are merged together with concatenate layer (i) and go through the decoder block.

In the decoder block, the merged feature from encoders first goes through a deconvolution layer and two convolutional layers are next expanded to size of 256 × 256 × 32. To make the best use of the feature of different sizes, the former calculated features with size of 256 × 256 × 16 are merged together here with concatenate layer (ii). Finally, the merged feature is processed by two convolutional layers to learn the regularization term. The kernel sizes of all convolutional layers and deconvolution layers are the same of 3 × 3. Then the three parts, identity term x^k, fidelity term α^kA^T(Ax^k-y) and regularization R_CNN are added and processed by a final convolutional layer with kernel size of 1 × 1 to get x^k +¹.

The size of the network is very small compared with widely used post-processing networks such as U-Net [26,43]. Each encoder of the proposed network consists of only one downsampling step and the channels of convolutional layers vary from 16 to 32; while the encoder in the U-Net consists of four downsampling steps and the channels of convolutional layers vary from 64 to 1024. The parameter number of the network is determined by the channels and feature sizes of the layers [44]. The scale layer contains only one training parameter, and the number of parameters of each convolutional layer can be represented as:

(13)$$P = {C_i} \times {s^2} \times {C_o}\textrm{,}$$

where P is the parameter number of the layer, C_i is the input channel number, s² is convolutional kernel size and C_o is the output channel. The regularization learning network contains only 42 k training parameters while the U-Net for comparison contains 28 M training parameters. The size of the regularization learning network is only 0.15% of the U-Net.

Another motivation of the network is to take advantage of the physical model, the fidelity term A^T(Ax^k-y) does not go through any CNN process before training of final layer and the structures of encoders and decoder are quite simple compared to the U-Net based post-processing method. Since the calculation burden of fidelity term is massive, and the size of the system matrix is too large to fit in GPU, the part A^T(Ax^k-y) is calculated before feeding into the network.

2.3 Training strategy

For every different iteration step k, there are two approaches to train the parameters θ_k. The first approach is an end-to-end method, that the total iteration number is set as K. Every iteration from 0 to K-1 need to be calculated sequentially and only the result of the last iteration is used for evaluation. Given training set (x, y) and corresponding reference image set x_ref, the goal of the whole network is to find the minimum error ɛ between reference image and the prediction result of step K

(14)$$\varepsilon = \mathop {\min }\limits_{{\theta _0},\ldots ,{\theta _{K - 1}}} \left\|{{\mathbf{x}^K} - \mathbf{x}_{\textrm{ref}}^K} \right\|,$$

which means the output of the last iteration step x^K is monitored to approach reference image x_ref and intermediate results from x₀ to x_K-1 are optimized automatically by the network itself.

The second approach is step-by-step method that seek to find

(15)$${\varepsilon _k} = \mathop {\min }\limits_{{\theta _i}} \left\|{{\mathbf{x}^k} - \mathbf{x}_{\textrm{ref}}^k} \right\|,$$

for every k in [0, K-1] which means that the results of every iteration k are optimized to the optimum value separately. In this approach, the optimization problem in Eq. (14) is split into K independent optimization problems. For example, θ₀ is trained to minimize the error between x¹ and x¹_ref, after that, θ₀ is fixed and x¹ is calculated with θ₀ as optimal value. θ₁ is trained to minimize the error between x² and x²_ref. Since the reference images x^k_ref of intermediate iteration are not available and the results should approach the final reference image x^K_ref step by step, it’s reasonable to set all x^k_ref to be the same as x^K_ref.

Coupling parameters of every iterations and updating them simultaneously make it possible to reach global optimum. However, the system matrix has to be evaluated many times for the calculation of x^k and corresponding gradients for every iteration. While the second approach has difficulty finding global optimum as the first approach, it has two main advantages. Firstly, the computation burden is enormously reduced because the computation of A^T(Ax^k-y) can be done outside of network and the system matrix is decoupled with the training process of the network. Secondly, the networks decoupling of different iterations simplifies the training process. In each iteration, goal of the training is to find optimum parameters of the network for single iteration instead of all iterations. The first approach is straightforward and is widely used by many model-based deep learning methods in MRI [45,46], and CT [47,48], while in PACT, the second approach is usually adopted due to the huge system matrix [30].

The network cannot learn any useful information from the start of the iteration with x⁰= 0, and the according regularization should also be zero. Then, the first iteration is performed without the network and the actual initial value is

(16)$${\mathbf{x}^{_0}} = {\mathbf{A}^T}\mathbf{y},$$

where the amplitude of x⁰ is normalized to [0,1] and the parameter λ is deprecated.

The total iteration number is set to five which is enough to acquire high quality reconstruction results. The parameters of the network from the first iteration θ₀ are initialized randomly, the other parameters θ_k are initialized to the value of θ_k_-1.

3. Results

In this section, numerical and in vivo experimental datasets were used to evaluate the performance of the proposed method. Examples from test datasets were presented for visualization-based evaluation. For quantitative evaluation, the mean value of peak signal to noise ratio (PSNR) and structural similarity index measure (SSIM) of all test data were calculated. PSNR represents the error between corresponding pixels and SSIM measures image similarity from three aspects, brightness, contrast and structure.

The performance of the learned regularization was also compared with conventional iteration methods including TV, Tikhonov regularization, and U-Net based post-processing methods:

Firstly, as a strategy of regularization, the proposed method was compared with conventional Tikhonov and TV regularization methods. The conventional methods were implemented as described in [40,49]. In Tikhonov regularization method, we chose the identity matrix I as the Tikhonov matrix and solved the inverse problem using the conjugate gradient for least squares (CGLS) method. The CGLS method with Tikhonov regularization needs 20 iteration steps to converge, therefore the iteration number is set as 20. For TV regularization, we chose the form in Eq. (8) and solved the problem using gradient decent method. CGLS method was deprecated because TV regularization is not in the form of $\left\|{\mathbf{Lx}} \right\|_2^2$. The gradient decent method converges slower than CGLS method and it takes 50 iteration steps to converge. The regularization parameters were chosen to get the best PSNR and SSIM.

Secondly, as an application of CNN to PACT image enhancement, it was compared with post-processing CNN methods. To be consistent with the model-based scheme, initial reconstruction results before post-processing was obtained by CGLS method without any regularization. The popular U-Net was chosen as the post-processing network and the training environment and datasets were the same as those used in the learned regularization. The training of all datasets was performed with Adam for 50 epochs and a learning rate of 1e^-5.

The Adam algorithm [50] was used for network training and the initial learning rate is set to 5e^-5 for every iteration with a batch size of 8. We trained every other iteration for 50 epochs, and the total number of training epochs for five iterations is 250. For the training of each iteration, mean square error (MSE) was used as the loss function. The training was implemented with TensorFlow [51] on a GeForce RTX 2080Ti GPU with 11GB memory.

3.1 Results on simulation data

The simulation samples are selected from a human fundus dataset DRIVE collected by Hoover et al. [52]. The dataset contains only 40 segmented vessels images and need to be augmented to acquire sufficient training samples. The segmented blood vessel images in the dataset DRIVE is augmented to 1024 images with a size of 256 pixels × 256 pixels by cropping and rotating. The images are normalized in intensity and then used as phantoms to generate photoacoustic signals for image reconstruction. The transducer used in forward simulation is a ring array with a diameter of 80 mm and the imaging target is located at the center of the array with a size of 25 mm×25 mm. The center frequency of the transducer is 4.9 MHz and the bandwidth is 67%. The generation of photoacoustic signals is implemented with the k-wave toolbox [53]. The sample rate is 5 MS/s and the signal length is 1856.

To evaluate the performance of the network under different sparsity conditions, the number of transducers of the ring array is set as 32, 64, 128, respectively. The preprocessed fundus vessel images are chosen as references. The total 1024 phantoms are divided into three groups with 768 as the training set, 128 as the validation set and 128 as the test set.

Figure 2 shows an example from the test dataset. All methods suffer most from sparsity condition when transducer number N = 32 and perform best when transducer number N = 128. The Tikhonov regularization does not perform well especially when the transducer number is small. That is because it is hard for Tikhonov regularization to differentiate artifacts from true image details. The TV regularization outperforms the Tikhonov regularization in suppressing artifacts and emphasizing the sharp edges of vessel structures. That’s because the structure of vessel phantom conforms to the piece-wise constant assumption in TV regularization.

Fig. 2. Reconstruction results of different methods on simulation data generated from human fundus vessel phantom. The learned regularization is compared with conventional iteration methods with Tikhonov and TV regularization and the popular post-processing method using U-Net structure [54]. Iterative reconstruction without any regularization is used as initial reconstruction for U-Net post-processing. The preprocessed fundus vessel images are chosen as references. The methods are compared under different views with a transducer number N = 32, N = 64 and N = 128.

Download Full Size | PDF

It can be seen in the figure that the proposed learned regularization shows great advantage over both Tikhonov and TV regularization. The result exhibits a clean background and keeps the vessel shape well. Promising results are also obtained through the U-Net method. Still, in the situation of transducer number N = 32 with large sparsity the images are distorted, and some details are lost while the result of learned regularization shows neat edge of vessel phantom. Visual difference of the two methods with N = 64 and N = 128 is not obvious as they both perform very well, but the quantitative evaluation results in Table 1 show that the learned regularization outperforms the U-Net based post-processing for every situation.

Table 1. Quantitative evaluation of different methods with simulation data

View Table | View all tables in this article

The SSIM and PSNR are calculated and averaged on all the test samples. The PSNR and SSIM of the learned regularization are both superior to those of Tikhonov and TV regularization. It can be seen in the table that when the transducer number N increases from 32 to 128, both performances of Tikhonov and TV regularization improve notably, however, they are far inferior to that of learned regularization with N = 32.

Compared with the U-Net based post-processing method, the learned regularization still shows superiority. The SSIM of these two methods is quite close which means both methods can recover very well structural information from simple images. The PSNR of the learned regularization is superior to U-Net method. These results show great performance of the proposed learned regularization, but it still needs further experiments on in vivo data with more complicated target images and stronger noises in detected signals.

3.2 Results on in vivo data

The experimental conditions including transducer parameters and sample rates are the same as those of above simulations, and the number of train data and test data is also the same. Photoacoustic data is collected from eight female nude mice with whole body PACT imaging experiments. With each mouse, 128 cross-sections are collected at positions equally placed at the trunk position of the target mouse. As in the simulation, the transducer array contains 512 elements, and sparse measurements using 32, 64, and 128 elements are recorded for reconstruction with different methods. The reference images are reconstructed by iteration method without regularization using data collected with 512 elements. For network training, the data of six mice is used for training, and the data of two other mice is used for validation and test. Different datasets are collected from different mice to prevent overfitting and keep the trained network robust.

The reconstruction results of different methods on in vivo data are shown in Fig. 3. It’s shown that results of the method with Tikhonov regularization contain streak artifacts and noises with random amplitudes. Compared with Tikhonov regularization, TV regularization is more effective in reducing the noise level and suppressing the streak artifacts. But it still has problem reserving details of the image. The results of U-Net based post-processing and learned regularization methods are superior to that of conventional methods on performance of keeping details and eliminating streak artifacts. Very clean background can be obtained through the deep learning methods on every situation but details are still lost. It is seen that with an increase of the transducer number from N = 32 to N = 128, more details can be recovered and the final image quality improves significantly. Furthermore, both deep learning methods can denoise the image very well, and the results of learned regularization show more details and clearer edges than the U-Net method.

Fig. 3. Reconstruction results of different methods on in vivo data collected from female nude mouse. The learned regularization is compared with conventional iteration methods with Tikhonov and TV regularization and popular post-processing method using U-Net structure [54]. Iterative reconstruction without any regularization is used as initial reconstruction for U-Net post-process. The reference images are reconstructed by iteration method without regularization using data collected with 512 elements. The different methods are compared under different sparsity with transducer number N = 32, N = 64 and N = 128.

Download Full Size | PDF

The quantitative evaluation of the results is shown in Table 2. The SSIM and PSNR values of the results using Tikhonov and TV regularization are quite close, while the quantitative performances of deep learning methods are obviously superior. Moreover, all metrics of the learned regularization are better than those of the U-Net method. For example, when N = 128, the PSNR of the learned regularization is 12.0% higher than U-Net method. And the SSIM of the learned regularization is 4.6% higher than U-Net method. The results of quantitative evaluation are consistent with the visual results in Fig. 3.

Table 2. Quantitative performance evaluation of different methods with in vivo data

View Table | View all tables in this article

In both simulation dataset with simple images and experimental data with complex images, these two deep learning methods can reconstruct high quality images as they can learn sufficient information for image enhancement. And the combination of neural network and physical model enables the learned regularization with very small network to outperform the U-Net based post-processing method. The visual and quantitative results on both simulation and in vivo data support the conclusion that the learned regularization is superior to conventional regularization and U-Net methods.

4. Discussion

In this section we investigate the iteration process of the learned regularization and analyze the performance of the dual path strategy.

4.1 Iteration process

As introduced in section 2.3, the start point of the iteration is 0, but the initial input to the network is actually x⁰ = A^Ty for the network to learn meaningful information. With the increase of iteration step k, the reconstructed images x^k should be improved gradually and reach optimum on the last iteration. Figure 4 shows the iteration process of an example in the in vivo dataset when the transducer number N = 128.

Fig. 4. Progress of iterations of the in vivo data. (a)-(c) are visual results of reconstruction process with the same sample as Fig. 3. (a) Initial image x⁰ = A^Ty; (b) reconstruction result x⁰ after one iteration; (c) reconstruction result x⁵ after the final iteration; (d)(e) average PSNR and SSIM value after different iteration steps. Iteration 0 means initial image x⁰; (f) training loss during all the iteration steps. Every iteration step contains 50 epochs, for example, epochs 1-50 belong to the first iteration and epochs 200-250 belong to the 5th iteration.

Download Full Size | PDF

The initial image in Fig. 4 (a) contains obvious streak artifacts and it’s hard to recognize true vessels. After one iteration the result in Fig. 4 (b) shows that the streak artifacts are eliminated, and the main structure information is restored. But there are still some noises and the edges of the body is not clear, especially the edges in the left and bottom. After five iterations the image in Fig. 4 (c) reaches high quality, the noises are mostly removed and the edges are distinct. The quantitative evaluations of PSNR and SSIM and training losses shown in Fig. 4 (d-f) are in consistent with the visual results. It can be concluded that a major improvement is achieved in the first iteration and the other iteration steps do the refinement work and improve the performances on details.

In the simulation and in vivo experiments, CGLS method with Tikhonov regularization and gradient decent method with TV regularization iterate for 20 steps and 50 steps respectively. Results in Fig. 4 show that the learned regularization method need only five iteration steps which is much fewer than conventional iteration methods.

The major benefit of fewer iteration steps is much less computing time. The total processing time for different methods to get single sample is shown in Table 3. All the processing time is calculated for 10 samples than averaged on experimental data (It is the same as on simulation data). TV regularization takes about twice the time of Tikhonov method because it takes more iteration steps. The proposed method takes about only one third the time of Tikhonov method. The most time-consuming part of the proposed method is the calculation of A^T(Ax^k-y) and the time spent by CNN is less than 1 second for all the situations.

Table 3. Processing time of different methods

View Table | View all tables in this article

The U-Net post-processing time is also less than 1 second, however it needs conventional methods for initial reconstruction, therefore the total running time is mainly determined by initial reconstruction method. The initial image for U-Net processing is reconstructed by CGLS method with 20 iteration steps, thus the total processing time of U-Net method is approximately the same as that of Tikhonov method. If FBP method is chosen as initial reconstruction method, the time can be further shortened.

4.2 Dual path strategy analysis

As introduced in previous section, the learned regularization features are extracted from both x^k and A^T(Ax^k-y) and combined them to learn the best regularization. We call this strategy the dual path strategy, where encoder I serves as the path extracting features from images and encoder II serves as the path extracting features from signals implicitly.

Figure 5 shows the schematics of different feature extraction strategies. The main difference is that the dual path strategy showed in Fig. 5(a) contains the path marked with red block, while in the single path strategy showed in Fig. 5(b), this path is cut and the regularization is learned solely from x^k.

Fig. 5. Feature extraction strategy comparison. (a) Regularization learning scheme with dual path strategy; (b) regularization learning scheme with single path strategy.

Download Full Size | PDF

The comparison is implanted on both simulation and in vivo datasets and the quantitative evaluation is shown in Fig. 6. It’s shown in the figure that in all situations, the performances of dual path strategy are superior to that of single path strategy. The difference of their performances is more obvious when the transducer number is small, where the information from x^k is not sufficient for regularization learning.

Fig. 6. Comparison of the dual path feature extraction strategy with the single path (a) PSNR evaluation of simulation dataset; (b) PSNR evaluation of experimental dataset; (c) SSIM evaluation of simulation dataset; (d) SSIM evaluation of experimental dataset;

Download Full Size | PDF

This result implies that the information contains in signals is useful for constructing regularization while only information of image structure is taken into account in conventional handcrafted regularizations. The dual path strategy utilizes the signals more effective than single path strategy and performs better.

5. Conclusion

In this work, we established a CNN architecture to learn the regularization for iterative image reconstruction in sparse-view PACT. The regularization information is learned from both data and image domain with a dual path strategy. The training of the learned regularization is implemented step-by-step to reduce the computation burden related to system matrix.

We have evaluated the performance of the proposed learned regularization on both simulation and in vivo datasets. The comparison with conventional regularization and U-Net post-processing method shows the advantages of the learned regularization. This work shows that the deep combination of CNN methods and physical models has great potential. The CNN grants the learned regularization the ability to converge within five iterations which is much less than conventional iteration methods and it takes less than one third the time of conventional iteration method. The adoption of physical models optimizes the method to miniaturize the network and perform better than the classical U-Net method.

Despite all these superiorities, the proposed work also has some limitations and can be further improved in future works.

1. Due to limited computational resources, we adopted a step-by-step method to train the network to reduce the burden. It may fall into local minima in each single iteration and hard to find the best global optima for the whole iteration. Training with an end-to-end method can potentially improve the outcome.
2. This paper focuses on learning deep regularization terms for iterative image reconstruction, which needs the participation of system matrix A and is time consuming. Deep learning post-processing methods with filtered back projection or delay and sum as initial reconstruction methods will be faster.
3. The initial value of the first iteration is set as x⁰ = A^Ty which is heavily affected by sparse view artifacts, and the low-quality initial reconstruction image limits the performance of the proposed method. By replacing A^Ty with FBP or CGLS reconstructed image will further improve the performance.
4. The initial value of U-Net method is reconstructed by iterative method without any regularization, and training with Tikhonov and TV regularized images will further increase the final performance. Furthermore, if using the proposed deep regularization method as initial reconstruction method and using U-Net as post-processing, it is possible to improve the performance much more.

Funding

National Natural Science Foundation of China (12174368, 61705216, 62122072); Anhui Provincial Department of Science and Technology (18030801138, 202203a07020020); Institute of Artificial Intelligence at Hefei Comprehensive National Science Center (21KT016); Zhejiang Lab (2019MC0AB01).

Acknowledgments

The authors thank Songde Liu for collecting in vivo photoacoustic data.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. V. Wang, “Multiscale photoacoustic microscopy and computed tomography,” Nat. Photonics 3(9), 503–509 (2009). [CrossRef]

2. C. Tian, Z. Xie, M. L. Fabiilli, S. Liu, C. Wang, Q. Cheng, and X. Wang, “Dual-pulse nonlinear photoacoustic technique: a practical investigation,” Biomed. Opt. Express 6(8), 2923–2933 (2015). [CrossRef]

3. C. Tian, Z. Xie, M. L. Fabiilli, and X. Wang, “Imaging and sensing based on dual-pulse nonlinear photoacoustic contrast: a preliminary study on fatty liver,” Opt. Lett. 40(10), 2253–2256 (2015). [CrossRef]

4. G. P. Luke and S. Y. Emelianov, “Label-free detection of lymph node metastases with US-guided functional photoacoustic imaging,” Radiology 277(2), 435–442 (2015). [CrossRef]

5. C. Tian, W. Qian, X. Shao, Z. Xie, X. Cheng, S. Liu, Q. Cheng, B. Liu, and X. Wang, “Plasmonic nanoparticles with quantitatively controlled bioconjugation for photoacoustic imaging of live cancer cells,” Adv. Sci. 3(12), 1600237 (2016). [CrossRef]

6. S. Liu, H. Wang, C. Zhang, J. Dong, S. Liu, R. Xu, and C. Tian, “In vivo photoacoustic sentinel lymph node imaging using clinically-approved carbon nanoparticles,” IEEE Trans. Biomed. Eng. 67(7), 2033–2042 (2019). [CrossRef]

7. H. Wang, S. Liu, T. Wang, C. Zhang, T. Feng, and C. Tian, “Three-dimensional interventional photoacoustic imaging for biopsy needle guidance with a linear array transducer,” J. Biophotonics 12(12), e201900212 (2019). [CrossRef]

8. C. Tian, C. Zhang, H. Zhang, D. Xie, and Y. Jin, “Spatial resolution in photoacoustic computed tomography,” Rep. Prog. Phys. 84(3), 036701 (2021). [CrossRef]

9. C. Tian, M. Pei, K. Shen, S. Liu, Z. Hu, and T. Feng, “Impact of system factors on the performance of photoacoustic tomography scanners,” Phys. Rev. Appl. 13(1), 014001 (2020). [CrossRef]

10. M. Xu and L. V. Wang, “Universal back-projection algorithm for photoacoustic computed tomography,” Phys. Rev. E 71(1), 016706 (2005). [CrossRef]

11. L. A. Kunyansky, “Explicit inversion formulae for the spherical mean Radon transform,” Inverse Prob. 23(1), 373–383 (2007). [CrossRef]

12. C. Huang, K. Wang, L. Nie, L. V. Wang, and M. A. Anastasio, “Full-wave iterative image reconstruction in photoacoustic tomography with acoustically inhomogeneous media,” IEEE Trans. Med. Imaging 32(6), 1097–1110 (2013). [CrossRef]

13. L. Ding, D. Razansky, and X. L. Deán-Ben, “Model-based reconstruction of large three-dimensional optoacoustic datasets,” IEEE Trans. Med. Imaging 39(9), 2931–2940 (2020). [CrossRef]

14. Y. Lou, S. Park, F. Anis, R. Su, A. A. Oraevsky, and M. A. Anastasio, “Analysis of the use of unmatched backward operators in iterative image reconstruction with application to three-dimensional optoacoustic tomography,” IEEE Trans. Comput. Imaging 5(3), 437–449 (2019). [CrossRef]

15. L. Ding, X. L. Deán-Ben, and D. Razansky, “Efficient 3-D model-based reconstruction scheme for arbitrary optoacoustic acquisition geometries,” IEEE Trans. Med. Imaging 36(9), 1858–1867 (2017). [CrossRef]

16. D. Calvetti, S. Morigi, L. Reichel, and F. Sgallari, “Tikhonov regularization and the L-curve for large discrete ill-posed problems,” J. Comput. Appl. Math. 123(1-2), 423–446 (2000). [CrossRef]

17. C. B. Shaw, J. Prakash, M. Pramanik, and P. K. Yalavarthy, “Least squares QR-based decomposition provides an efficient way of computing optimal regularization parameter in photoacoustic tomography,” J. Biomed. Opt. 18(8), 080501 (2013). [CrossRef]

18. J. Wang and Y. Y. Wang, “Photoacoustic imaging reconstruction using combined nonlocal patch and total-variation regularization for straight-line scanning,” Biomed. Eng. Online 21(1), 1–24 (2022). [CrossRef]

19. Y. Zhang, Y. Wang, and C. Zhang, “Total variation based gradient descent algorithm for sparse-view photoacoustic image reconstruction,” Ultrasonics 52(8), 1046–1055 (2012). [CrossRef]

20. X. Li, L. Qi, S. Zhang, S. Huang, J. Wu, L. Lu, Y. Feng, Q. Feng, and W. Chen, “Model-based optoacoustic tomography image reconstruction with non-local and sparsity regularizations,” IEEE Access 7, 102136–102148 (2019). [CrossRef]

21. C. Wang, H. Qin, G. Lai, G. Zheng, H. Xiang, J. Wang, and D. Zhang, “Automated classification of dual channel dental imaging of auto-fluorescence and white lightby convolutional neural networks,” J. Innov. Opt. Health Sci. 13(04), 2050014 (2020). [CrossRef]

22. G. Zheng, Y. Jiang, C. Shi, H. Miao, X. Yu, Y. Wang, S. Chen, Z. Lin, W. Wang, and F. Lu, “Deep learning algorithms to segment and quantify the choroidal thickness and vasculature in swept-source optical coherence tomography images,” J. Innov. Opt. Health Sci. 14(01), 2140002 (2021). [CrossRef]

23. S. Xie, X. Zheng, Y. Chen, L. Xie, J. Liu, Y. Zhang, J. Yan, H. Zhu, and Y. Hu, “Artifact removal using improved GoogLeNet for sparse-view CT reconstruction,” Sci. Rep. 8(1), 6700 (2018). [CrossRef]

24. S. Guan, A. A. Khan, S. Sikdar, and P. V. Chitnis, “Fully dense UNet for 2-D sparse photoacoustic tomography artifact removal,” IEEE J. Biomed. Health Inf. 24(2), 568–576 (2020). [CrossRef]

25. H. Zhang, L. Hongyu, N. Nyayapathi, D. Wang, A. Le, L. Ying, and J. Xia, “A new deep learning network for mitigating limited-view and under-sampling artifacts in ring-shaped photoacoustic tomography,” Comput. Med. Imaging Graph. 84, 101720 (2020). [CrossRef]

26. N. Davoudi, X. L. Deán-Ben, and D. Razansky, “Deep learning optoacoustic tomography with sparse data,” Nat. Mach. Intell. 1(10), 453–460 (2019). [CrossRef]

27. H. Shan, G. Wang, and Y. Yang, “Accelerated correction of reflection artifacts by deep neural networks in photo-acoustic tomography,” Appl. Sci. 9(13), 2615 (2019). [CrossRef]

28. T. Tong, W. Huang, K. Wang, Z. He, L. Yin, X. Yang, S. Zhang, and J. Tian, “Domain transform network for photoacoustic tomography from limited-view and sparsely sampled data,” Photoacoustics 19, 100190 (2020). [CrossRef]

29. J. Feng, J. Deng, Z. Li, Z. Sun, H. Dou, and K. Jia, “End-to-end Res-Unet based reconstruction algorithm for photoacoustic imaging,” Biomed. Opt. Express 11(9), 5321–5340 (2020). [CrossRef]

30. A. Hauptmann, F. Lucka, M. Betcke, N. Huynh, J. Adler, B. Cox, P. Beard, S. Ourselin, and S. Arridge, “Model-based learning for accelerated, limited-view 3-D photoacoustic tomography,” IEEE Trans. Med. Imaging 37(6), 1382–1393 (2018). [CrossRef]

31. Y. E. Boink, S. Manohar, and C. Brune, “A partially-learned algorithm for joint photo-acoustic reconstruction and segmentation,” IEEE Trans. Med. Imaging 39(1), 129–139 (2020). [CrossRef]

32. K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2017), pp. 3929–3938.

33. S. Lunz, O. Öktem, and C.-B. Schönlieb, “Adversarial regularizers in inverse problems,” Adv. Neural Inf. Process. Syst. 31, 1 (2018). [CrossRef]

34. H. Li, J. Schwab, S. Antholzer, and M. Haltmeier, “NETT: Solving inverse problems with deep neural networks,” Inverse Prob. 36(6), 065005 (2020). [CrossRef]

35. S. Antholzer, J. Schwab, J. Bauer-Marschallinger, P. Burgholzer, and M. Haltmeier, “NETT regularization for compressed sensing photoacoustic tomography,” in Photons Plus Ultrasound: Imaging and Sensing 2019, (SPIE, 2019), pp. 272–282.

36. D. Obmann, J. Schwab, and M. Haltmeier, “Sparse synthesis regularization with deep neural networks,” in 2019 13th International Conference on Sampling Theory and Applications (SampTA), (IEEE, 2019), pp. 1–5.

37. K. Wang, R. W. Schoonover, R. Su, A. Oraevsky, and M. A. Anastasio, “Discrete imaging models for three-dimensional optoacoustic tomography using radially symmetric expansion functions,” IEEE trans. med. imaging 33(5), 1180–1193 (2014). [CrossRef]

38. A. Rosenthal, “Algebraic determination of back-projection operators for optoacoustic tomography,” Biomed. Opt. Express 9(11), 5173–5193 (2018). [CrossRef]

39. R. C. Aster, B. Borchers, and C. H. Thurber, “Total variation regularization,” in Parameter Estimation and Inverse Problems (Elsevier, 2019), pp. 195–196.

40. R. C. Aster, B. Borchers, and C. H. Thurber, “The CGLS method,” in Parameter Estimation and Inverse Problems (Elsevier, 2019), pp. 165–170.

41. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends® in Mach. learn. 3(1), 1–122 (2011).

42. H. Lan, D. Jiang, C. Yang, F. Gao, and F. Gao, “Y-Net: Hybrid deep learning image reconstruction for photoacoustic tomography in vivo,” Photoacoustics 20, 100197 (2020). [CrossRef]

43. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

44. C. Shi, T. Wang, and L. Wang, “Branch feature fusion convolution network for remote sensing scene classification,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 13, 5194–5210 (2020). [CrossRef]

45. H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: Model-based deep learning architecture for inverse problems,” IEEE Trans. Med. Imaging 38(2), 394–405 (2019). [CrossRef]

46. K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magn. Reson. Med. 79(6), 3055–3071 (2018). [CrossRef]

47. J. Adler and O. Öktem, “Solving ill-posed inverse problems using iterative deep neural networks,” Inverse Prob. 33(12), 124007 (2017). [CrossRef]

48. H. Chen, Y. Zhang, Y. Chen, J. Zhang, W. Zhang, H. Sun, Y. Lv, P. Liao, J. Zhou, and G. Wang, “LEARN: Learned experts’ assessment-based reconstruction network for sparse-data CT,” IEEE Trans. Med. Imaging 37(6), 1333–1347 (2018). [CrossRef]

49. R. C. Aster, B. Borchers, and C. H. Thurber, “The gradient decent method,” in Parameter Estimation and Inverse Problems (Elsevier, 2019), pp. 156–160.

50. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv preprint arXiv: 1412.6980 (2014).

51. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, and M. Devin, “Tensorflow: large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:.04467 (2016).

52. A. Hoover, V. Kouznetsova, and M. Goldbaum, “Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response,” IEEE Trans. Med. Imaging 19(3), 203–210 (2000). [CrossRef]

53. B. E. Treeby and B. T. Cox, “k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields,” J. Biomed. Opt. 15(2), 021314 (2010). [CrossRef]

54. K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image. Process. 26(9), 4509–4522 (2017). [CrossRef]

Views	32		64		128
Metrics	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
Tikhonov	13.91	0.22	17.64	0.33	20.01	0.42
TV	15.62	0.29	20.50	0.43	21.97	0.47
U-Net	29.77	0.94	34.99	0.94	39.61	0.98
Learned regularization	32.27	0.93	38.31	0.97	41.71	0.98

Views	32		64		128
Metrics	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
Tikhonov	21.30	0.27	23.18	0.36	24.97	0.51
TV	22.25	0.41	24.20	0.53	25.75	0.65
U-Net	25.28	0.83	27.23	0.85	27.89	0.87
Learned regularization	26.35	0.84	27.77	0.86	31.24	0.91

Views	32	64	128
Tikhonov	9.05 s	18.03 s	35.76 s
TV	20.88 s	40.24 s	79.61 s
U-Net	11.84 s	19.35 s	39.50 s
Learned regularization	3.42 s	5.39 s	9.01 s

Views	32		64		128
Metrics	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
Tikhonov	13.91	0.22	17.64	0.33	20.01	0.42
TV	15.62	0.29	20.50	0.43	21.97	0.47
U-Net	29.77	0.94	34.99	0.94	39.61	0.98
Learned regularization	32.27	0.93	38.31	0.97	41.71	0.98

Views	32		64		128
Metrics	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
Tikhonov	21.30	0.27	23.18	0.36	24.97	0.51
TV	22.25	0.41	24.20	0.53	25.75	0.65
U-Net	25.28	0.83	27.23	0.85	27.89	0.87
Learned regularization	26.35	0.84	27.77	0.86	31.24	0.91

Learned regularization for image reconstruction in sparse-view photoacoustic tomography

Abstract

1. Introduction

2. Methods

2.1 Regularized image reconstruction in PACT

2.2 Learned regularization

2.3 Training strategy

3. Results

3.1 Results on simulation data

3.2 Results on in vivo data

4. Discussion

4.1 Iteration process

4.2 Dual path strategy analysis

5. Conclusion

Funding

Acknowledgments

Disclosures

Data availability

References

Data availability

Cited By

Figures (6)

Tables (3)

Equations (16)

Biomedical Optics Express