
Photoacoustic imaging with limited sampling: a review of machine learning approaches

Abstract

Photoacoustic imaging combines high optical absorption contrast and deep acoustic penetration, and can reveal structural, molecular, and functional information about biological tissue non-invasively. Due to practical restrictions, photoacoustic imaging systems often face various challenges, such as complex system configuration, long imaging time, and/or less-than-ideal image quality, which collectively hinder their clinical application. Machine learning has been applied to improve photoacoustic imaging and mitigate the otherwise strict requirements in system setup and data acquisition. In contrast to the previous reviews of learned methods in photoacoustic computed tomography (PACT), this review focuses on the application of machine learning approaches to address the limited spatial sampling problems in photoacoustic imaging, specifically the limited view and undersampling issues. We summarize the relevant PACT works based on their training data, workflow, and model architecture. Notably, we also introduce the recent limited sampling works on the other major implementation of photoacoustic imaging, i.e., photoacoustic microscopy (PAM). With machine learning-based processing, photoacoustic imaging can achieve improved image quality with modest spatial sampling, presenting great potential for low-cost and user-friendly clinical applications.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

With theoretical advances and the development of powerful computing hardware, artificial intelligence has become a popular research topic over the last few decades. It has developed many branches, such as expert systems [1,2], machine learning [3–5], and the latest deep learning [6–9]. The application of machine learning in medical imaging has attracted increasing attention from researchers and clinicians [10,11]. As one of these medical imaging modalities, photoacoustic imaging combines the high contrast of optical imaging and the deep penetration of ultrasound imaging. Likewise, photoacoustic imaging also benefits from incorporating knowledge and technical advancements in artificial intelligence [12,13].

In photoacoustic imaging, when biological tissue is irradiated by pulsed laser excitation, it absorbs the incident light, causing a local temperature rise and subsequent thermal expansion. This instantaneous expansion produces an ultrasonic wave, named the photoacoustic wave. Photoacoustic computed tomography (PACT) can map the light absorption distribution inside the tissue by detecting the photoacoustic wave from the outside using ultrasonic transducers. After image reconstruction, PACT can locate the source of the photoacoustic wave, recover the initial wave pressure, and quantify the concentration of absorbing molecules. Several algorithms have been developed for these tasks, including delay and sum (DAS), back projection (BP), and time reversal (TR) [14–16]. The number of detector array elements, i.e., ultrasonic transducers, substantially influences the imaging quality, including spatial resolution, field of view, and imaging speed. Generally, a high-quality image requires data from a large number of transducer elements distributed over a large detection angle. However, a detector array with more elements requires more complicated fabrication, as well as corresponding signal amplifiers and digitizing channels, which all increase the system cost and data processing workload. Depending on the geometry and application, there is also a practical limit on the number of detectors that can be placed around the object [17]. Moreover, it is usually impractical to design a detection system that covers the full detection aperture around the object, since the imaging system needs some space for light delivery and target access. While maneuvering the detector module around the object may achieve larger angular coverage, it would also substantially decrease the imaging speed. Due to these limitations, PACT data are usually collected from a limited view angle, and sometimes below the Nyquist sampling limit in the spatial domain. Figure 1 shows common photoacoustic imaging system setups with limited view and sparse arrays. Incomplete angular coverage causes the loss of signal information in the wave measurements. Images reconstructed from limited-view and under-sampled data are accompanied by serious artifacts, distortions, and aliasing, which degrade the medical image and affect its readability. As another essential implementation of photoacoustic imaging, photoacoustic microscopy (PAM) utilizes focused illumination and/or detection and forms an image by point-by-point scanning. Due to the requirement for a high spatial sampling rate, PAM often sacrifices imaging speed to fulfill the sampling requirements; otherwise, it encounters challenges such as aliasing and artifacts. These limitations hinder the application of photoacoustic technology in preclinical and clinical settings. Therefore, many efforts have been focused on reconstructing PA images from limited-view and under-sampled data, in order to achieve imaging quality comparable to that of complete spatial sampling [18,19].

Researchers have made progress in hardware, as well as in detection and reconstruction algorithms, to address the restricted sampling issue. Lin et al. proposed a design with a dual-foci ultrasonic transducer, which has a high numerical aperture on the transverse plane. The large acceptance angle on the transverse plane and the virtual point detection focusing method enable imaging with only a few scanning steps and a small number of detectors [20]. The design demonstrated the potential of using sparse arrays for low-cost, high-quality, and high-speed PACT. To achieve a large coverage angle and dense sampling, Li et al. performed photoacoustic tomography by rotating a sparse detector array [21]. The illumination wavelength is switched as the detector array rotates, reducing the overall spectral imaging time. The detection view angles can also be increased by using acoustic reflectors, without the need to move the detector [22]. Other experiments have shown that iterative reconstruction can effectively reduce limited-view artifacts in a partial ring array setup, achieving good imaging results with a $135^{\circ }$ detection angle [23]. In addition to these endeavors, machine learning has shown great promise in medical imaging and has been quickly adopted by photoacoustic researchers. As an indication, the literature retrieved from the Web of Science on deep learning methods in photoacoustic imaging increased quickly from 2017 to 2022, as shown in Fig. 2. Both PACT and PAM show substantial growth in the number of publications and citations. Several review articles have also summarized the research on applying deep learning to PACT, including applications in noise reduction, artifact removal, quantitative imaging, image interpretation, etc. [24,25]. This review is primarily focused on using machine learning approaches to deal with the problems originating from limited spatial sampling, including limited view and undersampling. Notably, we also cover the very recent advances in applying machine learning methods to sparse-sampling PAM.

Fig. 1. Diagrams of experiment systems with limited spatial sampling in PACT. (a) The limited-view problem with a linear array [26]. (b) A three-quarter ring array-based PACT system [27]. (c) 50 equidistant transducers on a 12 cm diameter hemisphere simulated by the k-Wave toolbox [17].

Fig. 2. The trend of publications (histogram) and citations (line plot) in recent years. (a) Deep learning-based PACT. (b) Deep learning-based PAM.

The rest of the paper is organized as follows. In Section 2, we briefly introduce machine learning and its applications in the context of the limited spatial sampling problems of photoacoustic imaging. The various methods developed by researchers are then summarized from different perspectives in Sections 3–5. In Section 6, we review the recent studies on applying machine learning to sparse-sampling PAM. The conclusions and outlook are discussed in Section 7.

2. Overview of machine learning for photoacoustic imaging

Machine learning is a branch of artificial intelligence that identifies patterns and relationships in data. It includes categories such as dictionary learning [18,28,29], Bayesian learning [30], and deep learning [31,32], all of which have demonstrated applications in medical image processing. Dictionary learning is usually used to represent complete signals with a set of dimensionally reduced signals called the dictionary, and it has been applied in photoacoustic imaging to recover the object structure from sparsely sampled data. Bayesian learning aims to develop the model from the training data from a probabilistic perspective. There have been applications of Bayesian learning in ultrasound image processing and reconstruction to improve imaging speed and image quality [33]; at present, it has not been applied to the reconstruction of photoacoustic signals under limited sampling. Deep learning is a subset of machine learning that is essentially artificial neural networks with three or more layers. Based on this basic structure, many different networks with distinctive layers have been developed. In Fig. 3(a), we explain the relationship between artificial intelligence, machine learning, and deep learning through a Venn diagram, and summarize the references to the various machine learning approaches mentioned in this paper. In general, U-Net, generative adversarial networks (GAN), and dictionary learning are the most commonly used designs in photoacoustic imaging. The structures of U-Net and GAN are shown in Fig. 3(b) and (c). At the same time, many researchers have also derived their own distinctive network structures.

Fig. 3. Representative machine learning models and networks. (a) The relationship between artificial intelligence, machine learning, and deep learning, and corresponding research works mentioned in this review. (b) The general architecture of U-Net [34]. (c) The schematic diagram of GAN [44].

U-Net was presented at the Medical Image Computing and Computer-Assisted Intervention (MICCAI) conference in 2015 [34]. Originally designed for biomedical image segmentation, it showed results superior to other networks. Later, U-Net and its variants were also used for other image processing applications, such as photoacoustic image reconstruction under limited sampling conditions [35,36], artifact removal [37,38], and denoising [39].
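As an illustration of the contracting/expanding path with skip connections, below is a minimal two-level U-Net-style network in PyTorch. This is a sketch of the general architecture in Fig. 3(b), not a reproduction of any cited model; layer sizes are illustrative.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3 convolutions with ReLU, as in the original U-Net
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """Two-level U-Net sketch: contract, expand, and concatenate skip features."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)      # 32 = 16 (skip) + 16 (upsampled)
        self.out = nn.Conv2d(16, 1, 1)      # 1x1 conv to the output image

    def forward(self, x):
        s1 = self.enc1(x)                   # skip-connection source
        b = self.enc2(self.pool(s1))        # bottleneck features
        d = self.up(b)
        d = self.dec1(torch.cat([s1, d], dim=1))
        return self.out(d)

y = TinyUNet()(torch.randn(1, 1, 64, 64))   # -> shape (1, 1, 64, 64)
```

The skip connections pass high-resolution features directly to the expanding path, which is why the architecture preserves fine structures while removing artifacts.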

GAN was designed by Goodfellow and colleagues in 2014 [40]. The GAN structure consists of a generator and a discriminator. The main task of the generator is to generate a batch of new data with the same characteristics as the training data, often called the fake data. The discriminator compares the real input data with the fake data and outputs the prediction label. GAN is flexible and allows unsupervised model training. It has also been used to improve image reconstruction quality in PACT [41–43].
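The adversarial game can be sketched as a minimal training loop. This is an illustrative PyTorch sketch of the generic GAN objective, assuming a `loader` that yields batches of flattened real samples; it is not the configuration of any cited PACT work.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 256))        # noise -> fake sample
D = nn.Sequential(nn.Linear(256, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1)) # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for real in loader:  # real: (batch, 256) training samples; loader is assumed
    z = torch.randn(real.size(0), 64)
    fake = G(z)
    # Discriminator step: label real data as 1, generated ("fake") data as 0
    loss_d = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: try to fool the discriminator into predicting 1 for fake data
    loss_g = bce(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```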

3. Training data for machine learning-based PACT

3.1 Source of training data

Generating the data for the machine learning model is a critical step. The scope of the data used for model development also determines how well the model can generalize. Most applications in PACT adopt supervised learning approaches and require paired data of limited sampling and the corresponding ground truth for training and testing. In this section, we summarize the common sources of data used in machine learning PACT research.

(1) Simulated data.

Data simulation usually involves (1) generating the detector channel signals through a photoacoustic forward model, based on images of relevant structures, i.e., a digital phantom; (2) simulating sparse and limited-view sampling by applying a downsampling operation retrospectively; and (3) reconstructing images using the downsampled channel data. The reconstructed images have artifacts from undersampling and limited view, and can be paired with either the original digital phantom image or the full-data reconstruction for supervised training. The digital phantom can be based on simple structures, such as disks and tubes. For better relevance and generalization to medical imaging, image databases for other medical imaging modalities are also used, including CT [19,45] and MRI databases [46]. For photoacoustic forward modeling, software toolboxes such as COMSOL [47] and k-Wave [48] have been adopted by many researchers to generate the raw sensor signals. Simulation is also preferable when full-bandwidth radiofrequency data is needed in studying the degradation effect of limited bandwidth [49]. For reconstructing from the sensor data to generate input to a post-processing model, direct reconstruction methods such as DAS, BP, and TR are often used, because they are computationally efficient and the artifacts can be removed from the reconstructed results during post-processing [50].
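This three-step workflow can be sketched in NumPy as follows. Here `forward_model` and `reconstruct` are hypothetical placeholders for a k-Wave-style simulator and a DAS/BP/TR implementation, and the element counts and view angle are illustrative.

```python
import numpy as np

def make_training_pair(phantom, n_elements=512, keep=64, view_deg=270):
    """Turn one digital phantom into a (degraded input, ground truth) pair."""
    sinogram = forward_model(phantom, n_elements)        # (n_elements, n_time) channel data

    # Limited view: keep only detectors inside the covered angular sector
    angles = np.linspace(0, 360, n_elements, endpoint=False)
    in_view = angles < view_deg

    # Sparse sampling: keep every k-th detector within the view
    mask = np.zeros(n_elements, dtype=bool)
    mask[::n_elements // keep] = True
    idx = np.where(in_view & mask)[0]

    degraded = reconstruct(sinogram[idx], idx)             # image with artifacts -> model input
    target = reconstruct(sinogram, np.arange(n_elements))  # full-data image -> ground truth
    return degraded, target
```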

As an example, Zhang et al. generated a 3D blood vessel structure from the Insight Segmentation and Registration Toolkit (ITK) [51], and then projected it into a 2D vasculature image [27]. Subsequently, they used the k-Wave toolbox to simulate the PA raw pressure data generated by the 2D vasculature and detected by a $270^{\circ }$ sparse array. The data were then used to reconstruct images with the universal back-projection (UBP) algorithm. Lastly, the reconstructed images with artifacts were used as the input for network training. Figure 4 shows the representative workflow for generating photoacoustic image training data by Zhang et al. Similarly, the k-Wave toolbox is used to generate initial raw pressure data in [19], [35], [42], [43], and [52]. As for the data source, besides the ITK toolkit, color images of the retina [53] and MRI images in The Cancer Imaging Archive [54] have also been used to simulate the raw photoacoustic signals [46].

Fig. 4. An example of generating simulated training data for machine learning in limited sampling PACT reconstruction. The flow chart is adapted from [27].

For a more realistic simulation, optical modeling such as the Monte Carlo method is also used during signal generation to mimic the non-uniform fluence of the excitation light inside biological tissue [35]. In addition, by adding noise to the training data, the model is able to learn features of the noise and related artifacts, and can be trained to remove them from the image [45].

(2) Experiment data.

Real experiment data contains information on the imaging system and experiment environment, including noise features, exact sensor positions, and detector responses [55]. Therefore, adding experiment data to the training is beneficial for the model to generalize and process real experiment data. Phantom experiments are performed on samples that are precisely controlled in terms of geometry, absorption, speed of sound, and acoustic impedance. The prior knowledge of the phantom can be used in tuning the imaging system, developing the processing algorithm, and quantitatively comparing different machine learning models [56].

Aiming toward in vivo imaging applications, the machine learning network also benefits greatly from adding in vivo data to the training set. Research by Davoudi et al. underlines the importance of training with in vivo images acquired by full-view scanners, because the residual between the high-quality in vivo results and the predicted results is helpful for the training to achieve good performance in artifact removal [57]. They collected data with a full-ring dense array, and used the filtered back-projection (FBP) algorithm to reconstruct numerical phantom and in vivo mouse data with 64, 128, and 256 projections, which were later used as the input of their network. Their experiment data is shared for public use. In vivo mouse data is also used in [42] and [56]. In addition, Hauptmann et al. tested their network with in vivo data of the human palm [45].

Compared to the high demand for training data, the available photoacoustic data is usually limited, especially in vivo experiment data. Therefore, data augmentation methods are used. The signals and images can be scaled, rotated, and cropped into smaller patches to generate sufficient data for model development [27,43].
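A minimal sketch of such augmentation is shown below; this is one illustrative realization (amplitude scaling, right-angle rotation, flipping, and random cropping), and the cited works define their own transform sets.

```python
import numpy as np

def augment(image, patch=128, n_patches=8, rng=None):
    """Random scale/rotate/flip, then random crops into smaller patches."""
    rng = rng or np.random.default_rng()
    image = image * rng.uniform(0.8, 1.2)             # amplitude scaling
    image = np.rot90(image, k=int(rng.integers(4)))   # rotate by a multiple of 90 degrees
    if rng.random() < 0.5:
        image = np.fliplr(image)
    h, w = image.shape
    for _ in range(n_patches):
        y, x = rng.integers(h - patch), rng.integers(w - patch)
        yield image[y:y + patch, x:x + patch]
```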

3.2 Sampling scheme

Many works have reported the application of machine learning methods to reconstruct or enhance PACT images under different view angles of a circular array, which typically covers view angles up to $360^{\circ }$ and contains as many as 512 or 1024 transducer elements in an imaging plane [57,58]. $60^{\circ }$ and $270^{\circ }$ detection angles are two frequently used limited-view setups in experiments, as shown in Fig. 5(a) and (b). Guan et al. acquired simulation data of a vasculature phantom over a $180^{\circ }$ detection angle, with the number of transducer elements varied from 16 to 64, as shown in Fig. 5(c) [19]. As another example of sparse sampling, Tong et al. used 32 elements of a 128-element circular array in [46]. Linear transducer arrays have also been widely used in PACT. However, due to the limited detection angle, a linear array cannot recover the full features of the object. Therefore, Vu et al. proposed an approach that combined a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to reduce limited-view problems and artifacts in linear array PACT [42]. Liu et al. adopted the dictionary learning technique for linear array PACT reconstruction [18]. They used 16- and 32-element data of a 48-element linear array for sparse sampling schemes. In Table 1, we summarize the common sampling schemes in the corresponding research works.

Fig. 5. Some representative limited sampling schemes. (a) $60^{\circ }$ and $135^{\circ }$ by [57]. (b) $270^{\circ }$ by [46]. (c) $180^{\circ }$ by [19].

Table 1. Examples of different sampling schemes.

4. Incorporation into the image reconstruction process

Machine learning has been integrated into different stages of the PACT reconstruction process to improve the image results. Firstly, it can be used to extend the limited sampling data to approximate the fully sampled sensor data. Secondly, it can be used to develop a learned reconstruction model that achieves high-quality images from limited data. Last but not least, machine learning can be used to enhance the reconstructed images and remove the artifacts caused by limited sampling.

4.1 Extension of the limited sensor channel data

Due to limited view and undersampling, signals from some view angles or sensor channels are missing from the data. A few techniques have been developed to estimate the unknown signals based on the available signals and complete the data. The completed data can then undergo the conventional PACT reconstruction process and directly adopt the extensively studied arsenal of processing and reconstruction techniques, in exactly the same way as full-view PACT. Dreier et al. proposed an operator learning approach to extend the limited-view data through statistical learning [59]. Based on knowledge of the image region, the operator was solved to approximately extend the limited-view wave data to the full view angles around the image region. Then, the explicit inversion formula of back projection is used for image reconstruction. Figure 6 shows an illustration of limited-view data extended by the learned extension operator and the corresponding reconstructed images. Currently, the learning of this data extension model needs knowledge of the imaging object. Further studies are needed to develop pre-trained models that do not need information about the object. A deep learning method to upsample the photoacoustic data channels was developed by Awasthi et al. [49]. The authors developed a U-Net model to increase the number of data channels from 100 to 200, along a ring trajectory around the object. At the same time, the signal bandwidth is broadened and the noise is reduced. While rectified linear unit (ReLU) activation is used in the first few layers as it yields better performance, the final layer of their network uses an exponential linear unit (ELU) to accommodate the negative portion of the photoacoustic signal.
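A deliberately small sinogram-domain sketch of this channel-upsampling idea is shown below. It is a stand-in for the U-Net of [49], not its architecture; only the choice of an ELU output layer, which preserves the bipolar nature of photoacoustic signals, follows the cited design.

```python
import torch
import torch.nn as nn

class ChannelUpsampler(nn.Module):
    """Double the number of detector channels of a sinogram (channels x time)."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.ELU())  # ELU keeps negative signal values

    def forward(self, sino):                     # sino: (batch, 1, 100, n_time)
        up = nn.functional.interpolate(
            sino, scale_factor=(2, 1), mode='bilinear', align_corners=False)
        return self.refine(up)                   # -> (batch, 1, 200, n_time)
```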

Fig. 6. Illustration of the operator learning approach [59] for limited-view photoacoustic tomography.

4.2 Learned reconstruction model for limited data

Learned reconstruction models can be designed specifically for limited sampling data to transform sensor channel data into an image while suppressing artifacts, achieving better image quality than directly applying conventional reconstruction methods. As the model is pre-trained, it can be computationally more convenient than iterative reconstruction.

One type of approach builds on the structure of a conventional reconstruction model, modified to handle limited sampling data. Schwab et al. proposed an improved algorithm for limited sampling cases based on the UBP algorithm and machine learning [60]. In their method, adjustable weight factors are applied to the reconstruction formula and their values can be optimized, as opposed to the constant weight factors of 1 in conventional UBP. The learning aims to minimize the error between the reconstruction result and the corresponding ground truth photoacoustic sources, by finding the optimal weights in the discretized UBP formula.
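Schematically, following the UBP formula of [15], the discretized reconstruction with learnable weights takes the form (notation simplified)

$$\hat{p}_0(\mathbf{r}) \;=\; \sum_{i=1}^{N} w_i \left[\, 2\,p(\mathbf{r}_i, t) \;-\; 2t\,\frac{\partial p(\mathbf{r}_i, t)}{\partial t} \right]_{t = |\mathbf{r} - \mathbf{r}_i|/c},$$

where $\mathbf{r}_i$ are the detector positions, $c$ is the speed of sound, and the weights $w_i$, all equal in conventional UBP, are learned from training data.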

Sparse representation approaches have also been used for PACT reconstruction with limited sampling data. A data-driven regularization method was proposed by Schwab et al. to address the severe ill-posedness of limited-view data in photoacoustic image reconstruction [61]. In their first step, singular value decomposition (SVD) was applied to the discretized inverse matrix for the specific geometry, and the small singular values were truncated to prevent the reconstruction from amplifying noise and artifacts. The truncated SVD constitutes an intermediate reconstruction of a coarse image. Secondly, a deep CNN is trained to restore the truncated coefficients and form the residual. The intermediate reconstruction result and the residual image output by the network were summed to provide the final result, which has better image quality than the direct inversion of the imaging model. Liu et al. adapted the dictionary learning technique to PACT reconstruction [18]. The authors imaged different 2D frames of a 3D object with different sampling scenarios. Then, they used the K-SVD algorithm to form the dictionary based on the fully sampled frames. The dictionary is then used to reconstruct the remaining sparse-sampling frames. Compared with the traditional sparse transform, this method improved the mean-squared error (MSE) of the reconstructed images by 3.7 times on average in the case of 50$\%$ undersampling.
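A patch-based restoration in this spirit can be sketched with scikit-learn, using MiniBatchDictionaryLearning and orthogonal matching pursuit as stand-ins for the K-SVD pipeline of [18]; `full_frame` and `sparse_frame` are assumed 2D image arrays.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

# 1. Learn a patch dictionary from a fully sampled reference frame
patches = extract_patches_2d(full_frame, (8, 8), max_patches=5000)
X = patches.reshape(len(patches), -1)
dico = MiniBatchDictionaryLearning(n_components=128, transform_algorithm='omp',
                                   transform_n_nonzero_coefs=5).fit(X)

# 2. Sparse-code the patches of a degraded frame and rebuild the image
deg = extract_patches_2d(sparse_frame, (8, 8)).reshape(-1, 64)
codes = dico.transform(deg)                      # sparse coefficients per patch
rec = (codes @ dico.components_).reshape(-1, 8, 8)
image = reconstruct_from_patches_2d(rec, sparse_frame.shape)
```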

Waibel et al. used a U-Net deep neural network to reconstruct images from the raw sensor channel data [35]. The skip connections of the U-Net are modified to include anisotropic convolution layers for the channel-number and time dimensions of the sensor channel data. The reconstructed images not only show fewer artifacts, but also achieve a more accurate estimation of the initial pressure. Hauptmann et al. developed a deep gradient descent (DGD) algorithm for 3D PACT reconstruction by incorporating gradient information [45]. A characteristic of this network is that the input contains both gradient information and images [Fig. 7(a)]; it is essentially a learned iterative reconstruction. Compared with the conventional iterative method based on total variation (TV), the DGD algorithm achieves faster 3D reconstruction with fewer artifacts. The reconstruction results of the DGD model, U-Net post-processing, and TV are shown in Fig. 7(b), (c), and (d), respectively. Quantitative results also show that DGD achieves a higher peak signal-to-noise ratio (PSNR) than the U-Net post-processing method.
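One learned iteration of this kind can be sketched as follows. This is a simplified 2D analogue of the 3D scheme in [45]; `grad_op`, the gradient of the data-fidelity term, is a hypothetical placeholder.

```python
import torch
import torch.nn as nn

class DGDStep(nn.Module):
    """x_{k+1} = x_k + CNN(x_k, grad): a learned update replacing a fixed step size."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x, grad):
        # Both the current image estimate and the gradient enter the network
        return x + self.cnn(torch.cat([x, grad], dim=1))

x = torch.zeros(1, 1, 128, 128)
for step in [DGDStep() for _ in range(5)]:       # unrolled iterations, one network each
    x = step(x, grad_op(x))                      # grad_op: gradient of data fidelity
```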

Fig. 7. Model-based learning in [45]. (a) Diagram of one iteration of the deep gradient descent. (b), (c), and (d) Reconstruction results of a tumor phantom by DGD, U-Net, and TV, respectively.

4.3 Post-processing of reconstructed images

Researchers have also focused on post-processing techniques to remove artifacts. The image is first reconstructed from the limited-view and under-sampled data using conventional methods. Usually, fast and computationally simple reconstruction methods are utilized, and the results may have serious artifacts. The reconstructed images are then passed to a network as input and undergo enhancement such as artifact removal and noise reduction. The resultant images have improved quality and look similar to the results of fully sampled data.

Deng et al. proposed a machine learning method to generate high-quality images under the limited-view condition [58]. The images reconstructed by the DAS algorithm are put into the network for post-processing. The model takes full advantage of U-Net in preserving the key features at different scales and mitigating the artifacts. In addition, the authors also proposed using the principal component analysis (PCA) method to extract image features. A set of feature bases of the photoacoustic images was constructed by applying PCA to full-view images, and then the VGG16 network [62] was used to establish the relationship between limited-view images and their PCA coefficient representation. The reconstruction results show that this model effectively suppressed the artifacts when sampling with only $1/4$ of a full ring.

Zhang et al. designed the ring array deep learning network (RADL-Net) to specifically address the limited-view and undersampling artifacts of a three-quarter sparse ring array [27]. The photoacoustic signal was generated through the k-Wave toolbox from 2D vascular structures, and the images were then reconstructed using UBP from limited sampling data. After that, the images were cropped into overlapping patches and fed to the network for training. Figure 8(a) is the flowchart of RADL-Net training. Results show that the proposed RADL-Net can significantly improve the quality of reconstructed images compared with compressed sensing (CS) algorithms [63], as shown in Fig. 8(b), especially for extremely sparse sampling conditions.

Awasthi et al. developed another post-processing method by using a CNN to combine the image results from distinct reconstruction algorithms applied to the same data [64]. The deep fusion network, named PA-Fuse, uses two reconstructed images as the input. Since different reconstruction algorithms preserve different image features and display different artifacts, the network can extract the real structures of the object from the two inputs, and remove the reconstruction artifacts due to limited sampling and noise in the data.

Waibel et al. compared the results of post-processing and deep learning-based direct reconstruction for recovering the initial photoacoustic pressure distribution with limited sampling data [35]. The network for direct reconstruction has a more complicated architecture because it needs additional convolutional layers in the skip connections of the U-Net to transform the time-series data of the sensors to the image domain. For objects with simple structures, the post-processing method has the advantages of higher processing efficiency and better accuracy.

Fig. 8. RADL-Net proposed in [27]. (a) The flowchart of RADL-Net training. (b) Reconstruction results of the mouse brain with different methods. From left to right: photograph, UBP, CS with 500 iterations, and RADL-Net.

5. Network architecture

5.1 Single-network architecture

Many machine learning models use one data flow from a single input to the output, i.e., the single-network architecture. As a popular deep learning model, U-Net is widely used to improve image quality for limited sampling data.

As shown in Fig. 9(a), the pixel-wise deep learning (Pixel-DL) method by Guan et al. [19] is an effective model that combines U-Net with interpolation to process photoacoustic data with limited sampling. The time-series signals of each sensor were first mapped to the grid-based image space by interpolating the wave propagation pixel by pixel. The pixel-interpolated data of all sensors were then used to reconstruct images with the TR algorithm, and the TR reconstructed images were used as the input of a U-Net model. Training and test data were derived from mouse brains, lungs, and phantoms. Lung and fundus vasculature experiments show that the Pixel-DL approach has a higher average PSNR than deep learning post-processing at different sparsity levels. Comparison reconstruction results of mouse brain vasculature by TR and Pixel-DL are shown in Fig. 9(b) and (c), respectively.

Fig. 9. Pixel-DL model proposed by Guan et al. [19]. (a) Pixel-wise deep learning method. (b) Reconstruction results by TR. (c) Reconstruction results by Pixel-DL.

Besides applying image reconstruction before the network processing, Antholzer et al. argue that the reconstruction and post-processing enhancement can be implemented together as a single neural network architecture [32], i.e., the reconstruction can be integrated as the first layer of the network. With FBP as the first layer, the rest of the network follows the U-Net architecture for PACT image post-processing. With ellipse phantom data, the authors were able to improve the image quality, with a relative $l^{2}$ error of 0.0087, compared with 0.1811 for the FBP method. In a similar fashion, Farnia et al. reported a method combining TR and deep learning for intra-operative photoacoustic image enhancement [52]. The first layer of their network is TR reconstruction and the other layers form a U-Net. Experiment results on synthetic vessels show that the proposed model outperforms the TR algorithm with a smaller number of detectors and less data acquisition time. In [19], [27], and [57], similar single-network architectures are used, with reconstruction as the first layer.
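The idea can be sketched as a fixed, non-trainable linear layer in front of a trainable post-processing network. This is a schematic of the approach in [32,52], with `A_pinv` standing in for a precomputed discretized reconstruction matrix (e.g., FBP) and `post_net` for a U-Net.

```python
import torch
import torch.nn as nn

class ReconThenEnhance(nn.Module):
    def __init__(self, A_pinv, post_net, img_shape=(128, 128)):
        super().__init__()
        # First "layer": a fixed linear reconstruction (e.g., discretized FBP)
        self.recon = nn.Linear(A_pinv.shape[1], A_pinv.shape[0], bias=False)
        self.recon.weight = nn.Parameter(
            torch.as_tensor(A_pinv, dtype=torch.float32), requires_grad=False)
        self.post_net = post_net                  # e.g., a U-Net for artifact removal
        self.img_shape = img_shape

    def forward(self, sino):                      # sino: (batch, n_elements * n_time)
        img = self.recon(sino).view(-1, 1, *self.img_shape)
        return self.post_net(img)
```

Because the reconstruction layer is frozen, only the post-processing weights are learned, while the whole pipeline still maps raw sensor data directly to an enhanced image.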

5.2 Multiple-network architecture

In addition to the single-network architecture, other works propose models that use multiple inputs and/or stages, for additional information and improved flexibility.

Tong et al. proposed a deep learning approach that integrates a feature projection network (FP-net) and U-Net, as shown in Fig. 10(a) [46]. The pre-processing and network design used prior knowledge of the photoacoustic imaging model. Since both the channel data and its first-order time derivative play key roles in the pressure reconstruction, the FP-net takes both as inputs: the normalized signal and the time-derivative signal. The convolution kernel dimension is also chosen based on the imaging model and experiment setup. The feature maps from the two inputs are summed and transformed into an image. The output of FP-net is used as the input of U-Net, which serves as a post-processing step to improve the quality of the summed image. The experiment system contains a circular transducer array with 128 elements covering an angle of $270^{\circ }$ around the imaging objects. In FP-net processing, 32 detector signals were used for the sparse sampling conditions, as shown in Fig. 10(b). Retinal blood vessel images [53] and a brain dataset from The Cancer Imaging Archive [54] were used to verify the model. Compared with the FBP method and FBP+U-Net, quantitative results indicated that FP-net+U-Net has the best PSNR performance on the brain dataset: 27.69 (FP-net+U-Net), 9.94 (FBP), and 26.56 (FBP+U-Net). The reconstruction results of a test sample in the cancer dataset are shown in Fig. 10(c) and (d), which indicate that the reconstruction of the liver tumor by FP-net+U-Net is clearer.

Fig. 10. FP-net proposed by Tong et al. [46]. (a) Diagram of the deep learning approach. (b) Schematic of the undersampling scenario. (c) Reconstruction results by FBP+U-Net. (d) Reconstruction results by FP-net+U-Net.

Vu et al. [42] proposed an approach that combined a Wasserstein generative adversarial network with gradient penalty (WGAN-GP), as shown in Fig. 11, to reduce limited-view artifacts in linear array PACT. The generator network takes the reconstructed images with artifacts as the input, and outputs processed images that approximate the artifact-free images. The discriminator is trained simultaneously to distinguish images enhanced by the generator from real artifact-free images. Based on the classical GAN model, WGAN-GP adds MSE and gradient information to the loss function: the MSE term transmits the image reconstruction information to the loss function, and the added gradient information makes the optimization process more stable. The simulated training and test data were generated by k-Wave with a size of $256 \times 256$. Experiment results show that both U-Net and WGAN-GP can improve the TR reconstructed image quality. WGAN-GP has a better contrast-to-noise ratio (CNR = 9.3) than U-Net (CNR = 6.1) and a slightly better PSNR (26.5) than U-Net (25.7), at the cost of longer training time.
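The gradient penalty that stabilizes the critic can be sketched as follows. This is a standard implementation of the WGAN-GP penalty from the general literature, not code from [42]; the additional MSE term of [42] is omitted.

```python
import torch

def gradient_penalty(D, real, fake, lam=10.0):
    """Penalize deviation of the critic's gradient norm from 1 on interpolated samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_hat = D(x_hat)
    grads = torch.autograd.grad(outputs=d_hat, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_hat),
                                create_graph=True)[0]
    return lam * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# Critic loss: Wasserstein estimate plus the penalty
# loss_d = D(fake).mean() - D(real).mean() + gradient_penalty(D, real, fake)
```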

Fig. 11. Illustration of the WGAN-GP model [42].

GANs have shown advantages in conditions of extremely small view angles and small numbers of detectors. A GAN model for the limited-view problem, named LV-GAN, was developed by Lu et al. [43]. Based on results from a hybrid dataset of both simulation and experiment data, LV-GAN is able to reconstruct data from a view angle as small as $15^{\circ }$ and achieved better accuracy than U-Net. For extremely sparse sampling, Lu et al. proposed a cyclic generative adversarial network, named PA-GAN, to improve imaging with as few as 8 detector elements [56]. With two generators and two discriminators, PA-GAN follows a cyclic training procedure and is learned in an unsupervised way. One generator network removes artifacts from the limited sampling data, making the image a fake full-sampling result; the other generator performs the inverse process, i.e., generating a fake limited-sampling image. The two discriminators classify real and fake images of the two processes. Compared with the supervised-learning U-Net method, results of 8 projections on a circle phantom show that the unsupervised PA-GAN achieves a 66$\%$ improvement in PSNR. The improvements of LV-GAN and PA-GAN are presented in Fig. 12. LV-GAN clearly improves the image quality in the limited-view case of $90^{\circ }$ in Fig. 12(a) and (b), and reconstruction results of the in vivo mouse abdomen with 128 projections show that PA-GAN removes artifacts effectively in Fig. 12(c) and (d).

Fig. 12. Comparison results for LV-GAN: (a) reconstruction results of a phantom containing vessel branches in the limited-view case of $90^{\circ }$; (b) reconstruction results of LV-GAN. Comparison results for PA-GAN: (c) reconstruction results of in vivo mouse abdomen in the case of 128 projections; (d) images recovered by PA-GAN.

The performance of the imaging results is evaluated using different metrics, including the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), Pearson correlation coefficient (PCC), mean absolute error (MAE), MSE, normalized 2D cross-correlation (NCC), signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), etc. [6]. In our study of the literature, the SSIM and PSNR metrics are the most frequently used. Computation requirements and processing time are another important aspect of evaluating machine learning models, with a great impact on their applications. Training a neural network model typically takes several hours. On the same platform, training a GAN takes longer than training a U-Net because of its more complex network structure and larger number of parameters [42]. Machine learning-based post-processing models and direct reconstruction models also require different training times because of their different network complexity. Once model training is completed, image reconstruction using the pre-trained model usually takes only tens of milliseconds for a $512 \times 512$ image [56]. Guan et al. compared the reconstruction and processing time of different approaches: the post-processing method requires the image to be reconstructed with conventional methods first, making it more time consuming ($\sim$2.58 s) than model-based direct reconstruction ($\sim$7.9 ms) [19].
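For reference, the two most frequently used metrics can be computed with scikit-image:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(recon, truth):
    """PSNR and SSIM between a reconstructed image and its ground truth."""
    rng = truth.max() - truth.min()
    psnr = peak_signal_noise_ratio(truth, recon, data_range=rng)
    ssim = structural_similarity(truth, recon, data_range=rng)
    return psnr, ssim
```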

In Table 2, we summarize the dataset, sampling scheme, input dimension for training, and quantitative performance (with PSNR metrics) of different machine learning models.

Table 2. Main characteristics of representative PACT studies.

6. Machine learning-based sparse sampling PAM

The high resolution of PAM requires dense spatial sampling. However, most lasers have a limited pulse repetition rate, which constrains the imaging speed of PAM. In the point-by-point scanning process, reducing the number of sampling points can effectively improve the overall speed. A sparse sampling scheme in PAM is shown in Fig. 13, corresponding to $1/4$ and $1/16$ sampling. However, compared with full scanning, the image from sparse sampling usually suffers from low resolution and aliasing. Recently, low-rank and sparse matrix recovery methods have been proposed for sparse-sampling PAM [65–67]. At the same time, machine learning methods have also been demonstrated as an alternative way to recover high-quality PAM images from under-sampled data [68,69]. Sathyanarayana et al. used dictionary learning to reconstruct PAM images with 50% random sampling [70]. Their method learns the dictionary matrix from the sub-sampled data instead of fully sampled data, so the reconstruction coefficients are solved at the same time as the dictionary is learned. Vu et al. proposed the use of the deep image prior (DIP) model to improve the image quality of under-sampled PAM [71]. The DIP model is designed for the restoration of undersampled or distorted images. It uses iterative optimization to solve for CNN parameters that approximate the fully sampled image, which then goes through the known undersampling operator and is compared with the experiment data. The method does not need to be pre-trained, so it can be applied when fully sampled ground truth is not available.
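The DIP loop can be sketched as follows; `net` (an untrained CNN), `undersample` (the known sampling operator), and `measured` (the acquired sparse data) are illustrative names, not the implementation of [71].

```python
import torch

z = torch.randn(1, 1, 256, 256)                  # fixed random input
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for it in range(2000):
    full_est = net(z)                            # candidate fully sampled image
    loss = ((undersample(full_est) - measured) ** 2).mean()  # match only measured pixels
    opt.zero_grad(); loss.backward(); opt.step()
# full_est now approximates the fully sampled image without any pre-training
```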

Fig. 13. Illustration of a type of sparse sampling in PAM [68].

To recover the dense sampling pixels from an under-sampled image with deep learning approaches, the processing algorithm needs to fill in the pixels missed by the sampling. One method is to add blocks of up-sampling and convolution to the network to increase the image size from the under-sampled input. With each up-sampling block restoring $1/2$ undersampling along each dimension, Zhou et al. used different numbers of up-sampling blocks successively in their CNN-based model to deal with different amounts of undersampling, as shown in Fig. 14(a) [68]. The network uses 16 residual blocks and 8 squeeze-and-excitation (SE) blocks for feature extraction, and a maximum of 2 up-sampling blocks, i.e., 4$\times$ scaling, was demonstrated. Instead of using the MSE of the image as the loss function, they adopted two feature maps from the 7th convolutional layer of a pre-trained VGG19 to calculate a perceptual loss, which reduces over-smoothing of the image. Initially trained with leaf vein data, the model needs transfer learning to be applied to mouse ear data, while the model for the mouse ear can be directly applied to mouse eye blood vessel imaging. In Fig. 14(c), (d), and (e), the comparison between under-sampled images, bicubic interpolated images, and model-recovered images shows that the proposed CNN-based model can effectively improve the image quality at a $1/16$ sampling rate. Alternatively, methods such as zero-filling and interpolation can be used before inputting the image to the network, so the image dimension does not need to change through the network. DiSpirito et al. compared zero-filling and bicubic interpolation (BI) methods for resizing the image, and found that zero-filling has better performance in terms of restored image quality [69]. By changing to the desired pixel dimension before inputting the image to the network, this type of pre-processing is more flexible and can be readily modified to process different undersampling ratios and non-uniform sampling, while the network architecture remains unchanged.
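The zero-filling pre-processing can be sketched as follows, as a minimal NumPy illustration for a regular sampling grid; the cited work also handles other sampling patterns.

```python
import numpy as np

def zero_fill(sparse_img, step=4):
    """Place sparse samples on the dense grid and leave missing pixels as zero."""
    h, w = sparse_img.shape
    dense = np.zeros((h * step, w * step), dtype=sparse_img.dtype)
    dense[::step, ::step] = sparse_img           # 1/16 sampling = 1/4 along each axis
    return dense                                 # network input already at the target size
```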

Fig. 14. CNN-based model in [68]. (a) Diagram of the CNN-based architecture. (b) The full-scanning image. (c) The low-sampling image at a $1/16$ sampling rate. (d) The bicubic interpolated image. (e) The recovered image by the proposed CNN-based model.

Deep learning methods are also used to process PAM data with an even higher degree of sparsity. Zhao et al. proposed a multitask residual dense network (MT-RDN) that, for the first time, obtained high-quality images from under-sampled dual-wavelength PAM data [72]. Specifically, the MT-RDN has two inputs and three subnetworks. The first subnetwork processes under-sampled data acquired with low-energy 532 nm excitation, and the second subnetwork processes 560 nm data with the same spatial undersampling. The third subnetwork combines information from both 532 nm and 560 nm. Each subnetwork is a residual dense network, with weight factors allocated to the three subnetworks to achieve the best image quality. The three subnetworks were trained with ground truths of in vivo fully sampled 532 nm data, fully sampled 560 nm data, and a vasculature filter-enhanced 532 nm image, respectively. The model is able to enhance images acquired at a 16$\times$ undersampling condition (4$\times$ undersampling along each scan dimension). Trained for mouse brain vasculature imaging, the multiple-network approach performs better than a single network (residual dense network or U-Net), and generalizes well to mouse ear vasculature imaging.

A Fully Dense U-Net (FD U-Net) model was proposed by DiSpirito et al. to reconstruct PAM images with as little as 2$\%$ of the original pixels [69]. FD U-Net adopts dense connections in both the contraction and expansion paths of U-Net. Dense connections can enhance information flow and further reduce network parameters while maintaining similar performance. The other major changes made to the FD U-Net structure are: firstly, using ReLU activation rather than ELU activation; secondly, adding spatial dropout with a probability of 0.05 in each convolution block. These modifications are beneficial for improving learning speed and model generalization. The dataset contains 381 images of in vivo mouse brain vasculature [65]. The experiment results show that the PSNR and mean absolute error (MAE) values of FD U-Net are better than those of BI and U-Net at different undersampling conditions. Figure 15(a) shows the image in the condition of 2% effective pixels, and the image results of FD U-Net and BI are shown in Fig. 15(b) and (c). Although the FD U-Net results degrade as the sampling rate decreases, the imaging results are significantly improved compared with the BI method, and the imaging speed is potentially increased by 50 times.

Fig. 15. Comparison of image results of FD U-Net in [69]. (a), (b), and (c): image at a downsampling ratio of 2%, result by FD U-Net, and result by BI, respectively.

So far, all the sparse-sampling PAM images are processed at the 2D maximum amplitude projection (MAP) level. However, the image size is still large, due to the high resolution and large scan field. For processing efficiency, the images are usually divided into small patches first, and are stitched back together after enhancement; stitching algorithms have been developed to remove the edge artifacts of the small image patches, as sketched below.
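An overlap-and-average stitching scheme can be sketched as follows; this is one simple illustrative approach, and the cited works use their own stitching algorithms.

```python
import numpy as np

def stitch(patches, coords, shape, patch=128):
    """Average overlapping enhanced patches back into one image."""
    out = np.zeros(shape)
    weight = np.zeros(shape)
    for p, (y, x) in zip(patches, coords):
        out[y:y + patch, x:x + patch] += p
        weight[y:y + patch, x:x + patch] += 1
    return out / np.maximum(weight, 1)           # averaging suppresses seam artifacts
```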

In Table 3, we summarize the main characteristics of different methods used in limited sampling PAM.

Table 3. Main characteristics of representative sparse sampling PAM studies.

7. Conclusions and outlook

Many research works have demonstrated that machine learning provides an effective tool to deal with the image degradation caused by limited spatial sampling in photoacoustic imaging. Depending on the system setup and sampling scenario, as well as the imaging object, various algorithms have been developed to remove artifacts for better anatomical visibility, improve the accuracy of locating the wave source, and recover the initial wave pressure for quantitative measurement. The application of machine learning also alleviates the strict requirements for system development and data acquisition, showing the potential for low-cost and fast imaging systems for broader applications. For objects with simple structures, the post-processing method can be tried first, for considerable improvements at a low computational cost [35]. Learned reconstruction models can usually achieve better performance for complex object structures and in vivo imaging, and may be considered if computational power and time are not restricting issues [45]. Hsu et al. argued that it is the effective amount of information carried by the input that has a major effect on the result [50]. Strategies such as including signal derivatives [46] and pixel interpolation [19] can enhance the amount of input information and therefore provide better results.

Meanwhile, machine learning methods also face several challenges in practical applications, which need to be addressed in the future. (1) Due to design variations and the prototype nature of photoacoustic imaging systems, models learned from data of one specific photoacoustic system cannot be directly applied to another system. System setup and data collection protocols, such as the illumination profile and sensor arrangement, all play key roles in determining the features in the images. Studies are needed to investigate methods for generalizing a model from one system to others, such as calibration and transfer learning methods [73]. (2) Another hurdle to overcome is the limited availability of photoacoustic imaging datasets. Machine learning model development needs high-quality standardized data to optimize the parameters and validate the performance. More powerful and versatile model architectures have increased the demand for larger photoacoustic imaging datasets, especially in vivo data. In addition to data on blood vessel structures, other anatomical structures such as the brain and tumors are also of great interest to many researchers. Simulated data can be combined with in vivo experiment data for sufficient model training. New applications can also start from a pre-trained model of a related task and apply transfer learning to modify the model for the new task, which needs less data than developing a new model from scratch [68]. (3) Using correct and reliable ground truth information is essential for successful model development. For in vivo imaging experiments, it is challenging to obtain the ground truth. Sometimes imaging results based on full sampling have to be used as the ground truth for training models to process limited sampling data. Imaging results of other modalities, such as MRI and X-ray angiography, may be considered to serve as the ground truth. (4) Most of the methods process the photoacoustic images at the 2D level. Exploiting the inherent 3D nature of photoacoustic imaging would greatly enhance its relevance and usefulness for clinical applications. Photoacoustic imaging also has the capabilities of spectral measurement and time-lapse imaging; information from an adjacent location or another domain can potentially help complement the sampling in the 2D plane [74]. (5) For high-dimensional data processing, the data volume and model complexity increase substantially, highlighting the need for more powerful computing equipment and more efficient algorithms. (6) Machine learning studies in the context of photoacoustic imaging also face the interpretability issue, which is a universal challenge for machine learning. Further understanding of the model and the development process, and incorporating knowledge of the photoacoustic physical model, can contribute to the design and optimization of machine learning models.

In conclusion, machine learning provides new ideas for solving bottlenecks in both PACT and PAM technology development. The algorithms for reconstructing high-quality images under limited sampling conditions are well worth investigating, and they also contribute to reducing the cost of system construction and to developing low-cost, operator-friendly imaging equipment. In the future, there will certainly be more studies on processing limited-sampling photoacoustic data with machine learning.

Funding

Zhejiang Lab Research Funds (Grant No. 2020MC0AD01); Zhejiang Provincial Key Research and Development Program (Grant No. 2021C0305); Youth Foundation Project of Zhejiang Lab (Grant No. K2023BA0AA04); National Natural Science Foundation of China (Grant Nos. T2293751 and T2293752).

Acknowledgments

This work is supported by Zhejiang Lab Research Funds (Grant No. 2020MC0AD01), Zhejiang Provincial Key Research and Development Program (Grant No. 2021C0305), Youth Foundation Project of Zhejiang Lab (No. K2023BA0AA04), and National Natural Science Foundation of China (Grant No. T2293752, T2293751).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

No data were generated or analyzed in the presented research.

References

1. J. Livio and R. Hodhod, “AI Cupper: a fuzzy expert system for sensorial evaluation of coffee bean attributes to derive quality scoring,” IEEE Trans. Fuzzy Syst. 26(6), 3418–3427 (2018). [CrossRef]  

2. I. van de Poel, “Embedding values in artificial intelligence (AI) systems,” Minds Mach. 30(3), 385–409 (2020). [CrossRef]  

3. T. Kirchner, J. Groehl, and L. Maier-Hein, “Context encoding enables machine learning-based quantitative photoacoustics,” J. Biomed. Opt. 23(05), 1 (2018). [CrossRef]  

4. Y. Chen, C. Xu, Z. Zhang, A. Zhu, X. Xu, J. Pan, Y. Liu, D. Wu, S. Huang, and Q. Cheng, “Prostate cancer identification via photoacoustic spectroscopy and machine learning,” Photoacoustics 23, 100280 (2021). [CrossRef]  

5. I. A. M. Huijben, B. S. Veeling, K. Janse, M. Mischi, and R. J. G. van Sloun, “Learning sub-sampling and signal recovery with applications in ultrasound imaging,” IEEE Trans. Med. Imaging 39(12), 3955–3966 (2020). [CrossRef]  

6. J. Groehl, M. Schellenberg, K. Dreher, and L. Maier-Hein, “Deep learning for biomedical photoacoustic imaging: a review,” Photoacoustics 22, 100241 (2021). [CrossRef]  

7. Y. Yang, C. Feng, and R. Wang, “Automatic segmentation model combining U-Net and level set method for medical images,” Expert Syst. Appl. 153, 113419 (2020). [CrossRef]  

8. J. Zhang, Y. Xie, Q. Wu, and Y. Xia, “Medical image classification using synergic deep learning,” Med. Image Anal. 54, 10–19 (2019). [CrossRef]  

9. N.-K. Chlis, A. Karlas, N.-A. Fasoula, M. Kallmayer, H.-H. Eckstein, F. J. Theis, V. Ntziachristos, and C. Marr, “A sparse deep learning approach for automatic segmentation of human vasculature in multispectral optoacoustic tomography,” Photoacoustics 20, 100203 (2020). [CrossRef]  

10. L. K. S. Sundar, O. Muzik, I. Buvat, L. Bidaut, and T. Beyer, “Potentials and caveats of AI in hybrid imaging,” Methods 188, 4–19 (2021). [CrossRef]  

11. S. Agrawal, T. Suresh, A. Garikipati, A. Dangi, and S.-R. Kothapalli, “Modeling combined ultrasound and photoacoustic imaging: Simulations aiding device development and artificial intelligence,” Photoacoustics 24, 100304 (2021). [CrossRef]  

12. L. Lin, P. Hu, J. Shi, C. M. Appleton, K. Maslov, L. Li, R. Zhang, and L. V. Wang, “Single-breath-hold photoacoustic computed tomography of the breast,” Nat. Commun. 9(1), 2352 (2018). [CrossRef]  

13. L. V. Wang and J. Yao, “A practical guide to photoacoustic tomography in the life sciences,” Nat. Methods 13(8), 627–638 (2016). [CrossRef]  

14. M. Mozaffarzadeh, A. Hariri, C. Moore, and J. V. Jokerst, “The double-stage delay-multiply-and-sum image reconstruction method improves imaging quality in a led-based photoacoustic array scanner,” Photoacoustics 12, 22–29 (2018). [CrossRef]  

15. M. Xu and L. V. Wang, “Universal back-projection algorithm for photoacoustic computed tomography,” Phys. Rev. E 71(1), 016706 (2005). [CrossRef]  

16. Y. Xu and L. V. Wang, “Time reversal and its application to tomography with diffracting sources,” Phys. Rev. Lett. 92(3), 033902 (2004). [CrossRef]  

17. K. Kratkiewicz, R. Manwar, M. Zafar, S. Mohsen Ranjbaran, M. Mozaffarzadeh, N. de Jong, K. Ji, and K. Avanaki, “Development of a stationary 3d photoacoustic imaging system using sparse single-element transducers: phantom study,” Appl. Sci. 9(21), 4505 (2019). [CrossRef]  

18. F. Liu, X. Gong, L. V. Wang, J. Guan, L. Song, and J. Meng, “Dictionary learning sparse-sampling reconstruction method for in-vivo 3D photoacoustic computed tomography,” Biomed. Opt. Express 10(4), 1660–1677 (2019). [CrossRef]  

19. S. Guan, A. A. Khan, S. Sikdar, and P. V. Chitnis, “Limited-view and sparse photoacoustic tomography for neuroimaging with deep learning,” Sci. Rep. 10(1), 8510 (2020). [CrossRef]  

20. X. Lin, C. Liu, J. Meng, X. Gong, R. Lin, M. Sun, and L. Song, “Dual-foci detection in photoacoustic computed tomography with coplanar light illumination and acoustic detection: a phantom study,” J. Biomed. Opt. 23(5), 050501 (2018). [CrossRef]  

21. X. Li, S. Zhang, J. Wu, S. Huang, Q. Feng, L. Qi, and W. Chen, “Multispectral interlaced sparse sampling photoacoustic tomography,” IEEE Trans. Med. Imaging 39(11), 3463–3474 (2020). [CrossRef]  

22. G. Li, J. Xia, K. Wang, K. Maslov, M. A. Anastasio, and L. V. Wang, “Tripling the detection view of high-frequency linear-array-based photoacoustic computed tomography by using two planar acoustic reflectors,” Quant. Imaging Med. Surg. 5(1), 57–62 (2015). [CrossRef]  

23. Y. Han, S. Tzoumas, A. Nunes, V. Ntziachristos, and A. Rosenthal, “Sparsity-based acoustic inversion in cross-sectional multiscale optoacoustic imaging,” Med. Phys. 42(9), 5444–5452 (2015). [CrossRef]  

24. H. Deng, H. Qiao, Q. Dai, and C. Ma, “Deep learning in photoacoustic imaging: a review,” J. Biomed. Opt. 26(04), 1–32 (2021). [CrossRef]  

25. C. Yang, H. Lan, F. Gao, and F. Gao, “Review of deep learning for photoacoustic imaging,” Photoacoustics 21, 100215 (2021). [CrossRef]  

26. X. Lin, G. Leng, Y. Shen, Y. Wang, M. Zhang, and M. Sun, “Photoacoustic tomography based on a novel linear array ultrasound transducer configuration,” in 2017 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), (2017), pp. 1–5.

27. H. Zhang, H. Li, N. Nyayapathi, D. Wang, A. Le, L. Ying, and J. Xia, “A new deep learning network for mitigating limited-view and under-sampling artifacts in ring-shaped photoacoustic tomography,” Comput. Med. Imaging Graph. 84, 101720 (2020). [CrossRef]  

28. P. Farnia, E. Najafzadeh, A. Hariri, S. N. Lavasani, B. Makkiabadi, A. Ahmadian, and J. V. Jokerst, “Dictionary learning technique enhances signal in led-based photoacoustic imaging,” Biomed. Opt. Express 11(5), 2533–2547 (2020). [CrossRef]  

29. O. Lorintiu, H. Liebgott, M. Alessandrini, O. Bernard, and D. Friboulet, “Compressed sensing reconstruction of 3D ultrasound data using dictionary learning and line-wise subsampling,” IEEE Trans. Med. Imaging 34(12), 2467–2477 (2015). [CrossRef]

30. R. Fuentes, C. Mineo, S. G. Pierce, K. Worden, and E. J. Cross, “A probabilistic compressive sensing framework with applications to ultrasound signal processing,” Mech. Syst. Signal Proc. 117, 383–402 (2019). [CrossRef]  

31. H. Lan, D. Jiang, F. Gao, and F. Gao, “Deep learning enabled real-time photoacoustic tomography system via single data acquisition channel,” Photoacoustics 22, 100270 (2021). [CrossRef]  

32. S. Antholzer, M. Haltmeier, and J. Schwab, “Deep learning for photoacoustic tomography from sparse data,” Inverse Probl. Sci. Eng. 27(7), 987–1005 (2019). [CrossRef]  

33. O. Lorintiu, H. Liebgott, and D. Friboulet, “Compressed sensing Doppler ultrasound reconstruction using block sparse Bayesian learning,” IEEE Trans. Med. Imaging 35(4), 978–987 (2016). [CrossRef]  

34. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, (Springer International Publishing, Cham, 2015), pp. 234–241.

35. D. Waibel, J. Gröhl, F. Isensee, T. Kirchner, K. Maier-Hein, and L. Maier-Hein, “Reconstruction of initial pressure from limited view photoacoustic images using deep learning,” in Photons Plus Ultrasound: Imaging and Sensing 2018, vol. 10494 (SPIE, 2018), pp. 196–203.

36. S. Antholzer, M. Haltmeier, R. Nuster, and J. Schwab, “Photoacoustic image reconstruction via deep learning,” in Photons Plus Ultrasound: Imaging and Sensing 2018, vol. 10494 (SPIE, 2018), pp. 433–442.

37. J. Deng, J. Feng, Z. Li, Z. Sun, and K. Jia, “Unet-based for photoacoustic imaging artifact removal,” in Imaging and Applied Optics Congress, (Optica Publishing Group, 2020), p. JTh2A.44.

38. S. Guan, A. A. Khan, S. Sikdar, and P. V. Chitnis, “Fully dense UNet for 2-D sparse photoacoustic tomography artifact removal,” IEEE J. Biomed. Health Inform. 24(2), 568–576 (2020). [CrossRef]

39. N. Awasthi, R. Pardasani, S. K. Kalva, M. Pramanik, and P. K. Yalavarthy, “Sinogram super-resolution and denoising convolutional neural network (SRCN) for limited data photoacoustic tomography,” arXiv, arXiv:2001.06434 (2020). [CrossRef]

40. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, (MIT Press, 2014), NIPS’14, pp. 2672–2680.

41. H. Lan, K. Zhou, C. Yang, J. Cheng, J. Liu, S. Gao, and F. Gao, “Ki-GAN: knowledge infusion generative adversarial network for photoacoustic image reconstruction in vivo,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P.-T. Yap, and A. Khan, eds. (Springer International Publishing, Cham, 2019), pp. 273–281.

42. T. Vu, M. Li, H. Humayun, Y. Zhou, and J. Yao, “A generative adversarial network for artifact removal in photoacoustic computed tomography with a linear-array transducer,” Exp. Biol. Med. 245(7), 597–605 (2020). [CrossRef]  

43. T. Lu, T. Chen, F. Gao, B. Sun, V. Ntziachristos, and J. Li, “LV-GAN: a deep learning approach for limited-view optoacoustic imaging based on hybrid datasets,” J. Biophotonics 14(2), e202000325 (2021). [CrossRef]  

44. H. Alqahtani, M. Kavakli-Thorne, and G. Kumar, “Applications of generative adversarial networks (GANs): an updated review,” Arch. Comput. Methods Eng. 28(2), 525–552 (2021). [CrossRef]

45. A. Hauptmann, F. Lucka, M. Betcke, N. Huynh, J. Adler, B. Cox, P. Beard, S. Ourselin, and S. Arridge, “Model-based learning for accelerated, limited-view 3-d photoacoustic tomography,” IEEE Trans. Med. Imaging 37(6), 1382–1393 (2018). [CrossRef]  

46. T. Tong, W. Huang, K. Wang, Z. He, L. Yin, X. Yang, S. Zhang, and J. Tian, “Domain transform network for photoacoustic tomography from limited-view and sparsely sampled data,” Photoacoustics 19, 100190 (2020). [CrossRef]  

47. S. Chandramoorthi and A. K. Thittai, “Simulation of photoacoustic tomography (PAT) system in COMSOL® and comparison of two popular reconstruction techniques,” in Medical Imaging 2017: Biomedical Applications in Molecular, Structural, and Functional Imaging, vol. 10137 (SPIE, 2017), p. 101371O.

48. B. E. Treeby and B. T. Cox, “k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields,” J. Biomed. Opt. 15(2), 021314 (2010). [CrossRef]

49. N. Awasthi, G. Jain, S. K. Kalva, M. Pramanik, and P. K. Yalavarthy, “Deep neural network-based sinogram super-resolution and bandwidth enhancement for limited-data photoacoustic tomography,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr. 67(12), 2660–2673 (2020). [CrossRef]  

50. K.-T. Hsu, S. Guan, and P. V. Chitnis, “Comparing deep learning frameworks for photoacoustic tomography image reconstruction,” Photoacoustics 23, 100271 (2021). [CrossRef]  

51. G. Hamarneh and P. Jassi, “VascuSynth: simulating vascular trees for generating volumetric image data with ground-truth segmentation and tree analysis,” Comput. Med. Imaging Graph. 34(8), 605–616 (2010). [CrossRef]

52. P. Farnia, M. Mohammadi, E. Najafzadeh, M. Alimohamadi, B. Makkiabadi, and A. Ahmadian, “High-quality photoacoustic image reconstruction based on deep convolutional neural network: towards intra-operative photoacoustic imaging,” Biomed. Phys. Eng. Express 6(4), 045019 (2020). [CrossRef]  

53. J. Staal, M. Abramoff, M. Niemeijer, M. Viergever, and B. van Ginneken, “Ridge-based vessel segmentation in color images of the retina,” IEEE Trans. Med. Imaging 23(4), 501–509 (2004). [CrossRef]  

54. K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, and F. Prior, “The cancer imaging archive (TCIA): Maintaining and operating a public information repository,” J. Digit. Imaging 26(6), 1045–1057 (2013). [CrossRef]  

55. C. Dehner, I. Olefir, K. B. Chowdhury, D. Jüstel, and V. Ntziachristos, “Deep-learning-based electrical noise removal enables high spectral optoacoustic contrast in deep tissue,” IEEE Trans. Med. Imaging 41(11), 3182–3193 (2022). [CrossRef]  

56. M. Lu, X. Liu, C. Liu, B. Li, W. Gu, J. Jiang, and D. Ta, “Artifact removal in photoacoustic tomography with an unsupervised method,” Biomed. Opt. Express 12(10), 6284–6299 (2021). [CrossRef]  

57. N. Davoudi, X. L. Dean-Ben, and D. Razansky, “Deep learning optoacoustic tomography with sparse data,” Nat. Mach. Intell. 1(10), 453–460 (2019). [CrossRef]  

58. H. Deng, X. Wang, C. Cai, J. Luo, and C. Ma, “Machine-learning enhanced photoacoustic computed tomography in a limited view configuration,” in Advanced Optical Imaging Technologies II, vol. 11186 International Society for Optics and Photonics (SPIE, 2019), pp. 52–59.

59. F. Dreier, S. Pereverzyev Jr., and M. Haltmeier, “Operator learning approach for the limited view problem in photoacoustic tomography,” Comput. Methods Appl. Math. 19(4), 749–764 (2019). [CrossRef]

60. J. Schwab, S. Antholzer, and M. Haltmeier, “Learned backprojection for sparse and limited view photoacoustic tomography,” in Photons Plus Ultrasound: Imaging and Sensing 2019, vol. 10878 (SPIE, 2019).

61. J. Schwab, S. Antholzer, R. Nuster, G. Paltauf, and M. Haltmeier, “Deep learning of truncated singular values for limited view photoacoustic tomography,” Proc. SPIE 10878, 1087836 (2019). [CrossRef]

62. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv, arXiv:1409.1556 (2014). [CrossRef]

63. Z. Guo, C. Li, L. Song, and L. V. Wang, “Compressed sensing in photoacoustic tomography in vivo,” J. Biomed. Opt. 15(2), 021311 (2010). [CrossRef]  

64. N. Awasthi, K. R. Prabhakar, S. K. Kalva, M. Pramanik, R. V. Babu, and P. K. Yalavarthy, “PA-Fuse: deep supervised approach for the fusion of photoacoustic images with distinct reconstruction characteristics,” Biomed. Opt. Express 10(5), 2227–2243 (2019). [CrossRef]

65. T. Liu, M. Sun, Y. Liu, D. Hu, Y. Ma, L. Ma, and N. Feng, “ADMM based low-rank and sparse matrix recovery method for sparse photoacoustic microscopy,” Biomed. Signal Process. Control 52, 14–22 (2019). [CrossRef]  

66. T. Liu, M. Sun, N. Feng, M. Wang, D. Chen, and Y. Shen, “Sparse photoacoustic microscopy based on low-rank matrix approximation,” Chin. Opt. Lett. 14(9), 091701 (2016).

67. S. G. Sathyanarayana, Z. Wang, N. Sun, B. Ning, S. Hu, and J. A. Hossack, “Recovery of blood flow from undersampled photoacoustic microscopy data using sparse modeling,” IEEE Trans. Med. Imaging 41(1), 103–120 (2022). [CrossRef]  

68. J. Zhou, D. He, X. Shang, Z. Guo, S.-L. Chen, and J. Luo, “Photoacoustic microscopy with sparse data by convolutional neural networks,” Photoacoustics 22, 100242 (2021). [CrossRef]  

69. A. DiSpirito III, D. Li, T. Vu, M. Chen, D. Zhang, J. Luo, R. Horstmeyer, and J. Yao, “Reconstructing undersampled photoacoustic microscopy images using deep learning,” IEEE Trans. Med. Imaging 40(2), 562–570 (2021). [CrossRef]

70. S. G. Sathyanarayana, B. Ning, S. Hu, and J. A. Hossack, “Simultaneous dictionary learning and reconstruction from subsampled data in photoacoustic microscopy,” in 2019 IEEE International Ultrasonics Symposium (IUS), (2019), pp. 483–486.

71. T. Vu, A. DiSpirito III, D. Li, Z. Wang, X. Zhu, M. Chen, L. Jiang, D. Zhang, J. Luo, Y. S. Zhang, Q. Zhou, R. Horstmeyer, and J. Yao, “Deep image prior for undersampling high-speed photoacoustic microscopy,” Photoacoustics 22, 100266 (2021). [CrossRef]

72. H. Zhao, Z. Ke, F. Yang, K. Li, N. Chen, L. Song, C. Zheng, D. Liang, and C. Liu, “Deep learning enables superior photoacoustic imaging at ultralow laser dosages,” Adv. Sci. 8(3), 2003097 (2021). [CrossRef]  

73. J. Kim, G. Kim, L. Li, P. Zhang, J. Y. Kim, Y. Kim, H. H. Kim, L. V. Wang, S. Lee, and C. Kim, “Deep learning acceleration of multiscale superresolution localization photoacoustic imaging,” Light: Sci. Appl. 11(1), 131 (2022). [CrossRef]

74. H. Zhang, W. Bo, D. Wang, A. DiSpirito, C. Huang, N. Nyayapathi, E. Zheng, T. Vu, Y. Gong, J. Yao, W. Xu, and J. Xia, “Deep-e: a fully-dense neural network for improving the elevation resolution in linear-array-based photoacoustic tomography,” IEEE Trans. Med. Imaging 41(5), 1279–1288 (2022). [CrossRef]  

Data availability

No data were generated or analyzed in the presented research.

Figures (15)

Fig. 1. Diagrams of experimental systems with limited spatial sampling in PACT. (a) The limited-view problem with a linear array [26]. (b) A three-quarter ring array-based PACT system [27]. (c) 50 equidistant transducers on a 12 cm diameter hemisphere simulated by the k-Wave toolbox [17]; a minimal geometry sketch for this layout follows the figure list.
Fig. 2. The trend of publications (histogram) and citations (line plot) in recent years. (a) Deep learning-based PACT. (b) Deep learning-based PAM.
Fig. 3. Representative machine learning models and networks. (a) The relationship between artificial intelligence, machine learning, and deep learning, with the corresponding research works mentioned in this review. (b) The general architecture of U-Net [34]. (c) The schematic diagram of GAN [44].
Fig. 4. An example of generating simulated training data for machine learning in limited-sampling PACT reconstruction. The flow chart is adapted from [27].
Fig. 5. Some representative limited sampling schemes. (a) $60^{\circ}$ and $135^{\circ}$ in [57]. (b) $270^{\circ}$ in [46]. (c) $180^{\circ}$ in [19].
Fig. 6. Illustration of the operator learning approach [59] for limited-view photoacoustic tomography.
Fig. 7. Model-based learning in [45]. (a) Diagram of one iteration of the deep gradient descent (DGD). (b), (c), and (d) Reconstruction results of a tumor phantom by DGD, U-Net, and TV, respectively.
Fig. 8. RADL-net proposed in [27]. (a) The flowchart of RADL-net training. (b) Reconstruction results of the mouse brain with different methods; from left to right: photograph, UBP, CS with 500 iterations, and RADL-net.
Fig. 9. Pixel-DL model proposed by Guan et al. [19]. (a) The pixel-wise deep learning method. (b) Reconstruction results by TR. (c) Reconstruction results by Pixel-DL.
Fig. 10. FP-net proposed by Tong et al. [46]. (a) Diagram of the deep learning approach. (b) Schematic of the undersampling scenario. (c) Reconstruction results by FBP+U-Net. (d) Reconstruction results by FP-net+U-Net.
Fig. 11. Illustration of the WGAN-GP model [42].
Fig. 12. Comparison results for LV-GAN: (a) reconstruction results of a phantom containing vessel branches in the limited-view case of $90^{\circ}$; (b) reconstruction results of LV-GAN. Comparison results for PA-GAN: (c) reconstruction results of an in vivo mouse abdomen in the case of 128 projections; (d) images recovered by PA-GAN.
Fig. 13. Illustration of a type of sparse sampling in PAM [68].
Fig. 14. CNN-based model in [68]. (a) Diagram of the CNN-based architecture. (b) The full-scanning image. (c) The undersampled image at a 1/16 sampling rate. (d) The bicubic-interpolated image. (e) The image recovered by the proposed CNN-based model.
Fig. 15. Comparison of image results for the FD U-Net in [69]. (a), (b), and (c) The fully sampled image, and the images recovered by FD U-Net and bicubic interpolation (BI) at a downsampling ratio of 2%, respectively.
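
As a practical note on Fig. 1(c), the sparse hemispherical detection geometry is straightforward to reproduce for simulation studies. The short Python sketch below is our illustrative example, not the code used in [17]: it places 50 approximately equidistant point sensors on a 12 cm diameter hemisphere using a golden-angle spiral, and the resulting Cartesian coordinates could then be converted into a sensor mask for an acoustic simulator such as the k-Wave toolbox [48].

import numpy as np

def hemisphere_sensors(n=50, radius_m=0.06):
    # Golden-angle (Fibonacci) spiral: uniform spacing in z gives equal
    # surface area per sensor, so the layout is approximately equidistant.
    golden = np.pi * (3.0 - np.sqrt(5.0))     # golden angle, ~2.4 rad
    k = np.arange(n)
    z = radius_m * (k + 0.5) / n              # heights spread over (0, R)
    r_xy = np.sqrt(radius_m**2 - z**2)        # ring radius at each height
    theta = golden * k                        # azimuth advance per sensor
    return np.stack([r_xy * np.cos(theta), r_xy * np.sin(theta), z], axis=1)

pos = hemisphere_sensors()                    # (50, 3) positions in meters
d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
np.fill_diagonal(d, np.inf)                   # ignore self-distances
print(f"median nearest-neighbor spacing: {np.median(d.min(axis=1)) * 1e3:.1f} mm")

The near-uniform spacing follows from the sphere's area element dA = 2*pi*R*dz: equal steps in z enclose equal surface area per sensor, while the golden-angle azimuth increment keeps sensors from lining up along meridians.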

Tables (3)

Table 1. Examples of different sampling schemes.
Table 2. Main characteristics of representative PACT studies.
Table 3. Main characteristics of representative sparse sampling PAM studies.
