
Self-supervised next view prediction for limited-angle optical projection tomography

Open Access

Abstract

Optical projection tomography captures 2-D projections of rotating biological samples and computationally reconstructs their 3-D structure from these projections, where hundreds of views covering an angular range of π radians are desired for a reliable reconstruction. Limited-angle tomography tries to recover the structure of the sample from projections spanning fewer angles. However, the result is far from satisfactory due to the missing wedge of information. Here we introduce a novel view prediction technique that extends the angular range of the captured views for limited-angle tomography. Following a self-supervised strategy that learns the relationship among the captured limited-angle views, unseen views can be computationally synthesized without any labeled data. Combined with an optical tomography system, the proposed approach robustly generates new projections of unknown biological samples and extends the angular range of the projections from the original 60° to nearly 180°, thereby yielding high-quality 3-D reconstructions of samples even from highly incomplete measurements.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Tomography is a long-established technique for reconstructing the 3-D structure of objects from a series of 2-D images [1,2]. Parallel light rays from a planar light source, or cone-beam X-rays from a point source, travel through the biological sample and are attenuated by tissues of various densities. A sensor collects the transmitted beams to form a 2-D projection of the sample. By rotating the sample horizontally, a series of projections from different angles is obtained. Traditional tomography techniques such as filtered back projection (FBP) [3,4] and the simultaneous iterative reconstruction technique (SIRT) [5] reconstruct the distribution of absorbance from this series of 2-D projections. For most practical purposes, an angular range of approximately 180° (π radians) must be captured to solve the underdetermined back-projection problem.
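For readers unfamiliar with this pipeline, the following minimal sketch illustrates forward projection and filtered back projection using scikit-image and a standard test phantom. It is an illustration of the general principle only, not the toolchain used in this work, and all names come from scikit-image rather than from the paper.

```python
# Illustrative sketch (not the authors' pipeline): parallel-beam projection and
# filtered back projection with scikit-image, showing why a ~180-degree angular
# range is needed for a reliable reconstruction.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

phantom = shepp_logan_phantom()                            # 2-D test object
theta_full = np.linspace(0.0, 180.0, 180, endpoint=False)

# Forward model: each column of the sinogram is one 1-D parallel projection.
sinogram = radon(phantom, theta=theta_full)

# Full-angle FBP gives a faithful reconstruction...
recon_full = iradon(sinogram, theta=theta_full, filter_name="ramp")

# ...while keeping only a 60-degree wedge of projections leaves the problem
# underdetermined and produces strong limited-angle artefacts.
recon_limited = iradon(sinogram[:, :60], theta=theta_full[:60], filter_name="ramp")
```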

Instead of recording the full π-radian information, limited-angle tomography techniques [6–8] capture projections within a smaller angular range and try to recover a convincing reconstruction by computational means. Among them, [6] applied total variation (TV) regularization to a traditional iterative reconstruction algorithm to remove the star-like artefacts caused by the missing wedge of information. [8] performed sinogram inpainting and reconstruction simultaneously by minimizing a joint cost function, recovering better boundaries of the sample. Unlike these model-based approaches, deep-learning techniques have been widely applied to image restoration, owing to their strong capability for inverse inference based on learned data priors [9–11]. Compared to the algebraic methods, deep-learning-based approaches [12–16] take advantage of prior knowledge from a large training dataset and thus outperform traditional iterative reconstruction methods in the case of very-limited-angle tomography. References [12,14] first obtained a low-quality limited-angle reconstruction using the SART algorithm and then trained a U-Net to enhance it into a high-quality one, converting the limited-angle reconstruction problem into an image recovery task. Reference [13] directly used a neural network to transform a sinogram into its reconstruction and then adopted a U-Net to further improve the reconstruction quality. Although these methods can achieve better image quality with fewer artefacts, either a complete 180° capture or a reliable 3-D sample is required as a reference to train the neural networks. Unfortunately, such high-quality label datasets are usually not available in limited-angle tomography, making conventional label-driven deep-learning models difficult to apply in real cases.

In the field of computer vision, view synthesis can generate photorealistic novel views of a scene, either by light-field interpolation [17,18] using densely sampled views, or from sparsely observed views by predicting the geometry and appearance of the scene [19–21]. These techniques suggest a potential solution to limited-angle tomography. Inspired by image-based rendering, which takes very few 2-D images of a 3-D scene as inputs and synthesizes novel 2-D views of the same scene in other directions, we propose a self-supervised deep-learning approach that combines the physical model of optical projection with data priors to extend the angular range of the sinogram for limited-angle tomography. Without involving either 3-D targets or complete π-radian projections, a neural network is trained on just the limited-angle projections with a self-supervised task and generates projections at new angles incrementally. By extending the angular range of the sinograms, a far better 3-D reconstruction can be obtained with regular reconstruction algorithms.

2. Methods

2.1 Self-supervised next view prediction

We propose a self-supervised deep-learning method to predict novel 2-D projections (views) for limited-angle tomography, extending the sinogram by synthesizing views at unseen angles. Taking a sequence of 2-D projections as input, the proposed next-view-prediction method (termed NVP) trains a neural network to predict the pixel intensities at the next angle with a self-supervised task, eliminating the need for 3-D references or complete π-radian sinograms. The pipeline of our method is shown in Fig. 1. First, limited-angle (about 1/3 π radian, i.e., 60° of sample rotation) 2-D projections are captured using a conventional optical projection tomography (OPT) imaging system (Fig. 1 a1). To compose a view-synthesis task, the 60° of projections are split into two parts, with the first part as the inputs of a neural network and the second part as the reference targets. The neural network samples several views at a fixed angle interval from the input part and aggregates them to predict the subsequent view in the target part (Fig. 1 a2). Once trained, the neural network predicts novel views that were not captured in the original imaging setup. By recurrently utilizing the synthesized views, further views are predicted (Fig. 1 a3), extending the angular range of the limited-angle sinogram incrementally.

Fig. 1. Self-supervised next view prediction. (a) Self-supervised training and inference. (a1) Limited-angle optical projection acquisition. A series of 2-D projections of the sample with an angular range of 60° are captured. (a2) Self-supervised training of the proposed model. The first 2/3 of the captured projections are taken as inputs, and the last 1/3 as targets, to train a view-prediction neural network. (a3) The trained network then uses the captured series as inputs to predict the un-captured projections at the successive views, completing the π-radian sinograms. The already synthesized projections are taken as inputs to predict views at further angles incrementally. (b) The physics-based density aggregation and view synthesis. (b1) A 3-D point in space is first projected onto each input view to find the corresponding information, which is extracted to aggregate the density of the 3-D point. (b2) A new pixel at the view to be predicted is rendered by integrating the densities along a light ray, and the whole image is rendered by repeating the integral for all parallel rays. (c) 3-D reconstruction can be obtained from the extended views together with the original views using traditional reconstruction algorithms.

As shown in Fig. 1(b), our view-synthesizing algorithm integrates the physics-based projection geometry of OPT imaging, in which uniform light rays from a planar light source travel through the semi-transparent biological sample, are attenuated by tissues of various densities, and are collected by a CMOS sensor to compose a 2-D projection of the sample. For a clearer illustration, we interpret the whole scene as a static sample imaged by a moving camera, where the camera changes its position on a circle centered at the sample while always looking toward the sample. Intuitively, we regard a projection as the radiances of transmitted light rays travelling perpendicularly to the current image plane. To render the intensity of a cast ray, the densities of the 3-D points in the scene that intersect the ray are required. We re-project each 3-D point back to each input 2-D view to obtain the corresponding information, such as pixel intensities and high-dimensional features (Fig. 1 b1). A neural network (specifically, several multi-layer perceptrons) then aggregates the information from the different views into the density of the 3-D point. The final radiance of this ray is obtained by integrating all the points along the ray according to the Beer-Lambert law:

$$I = I_0\, e^{-\int_r \sigma(r(s))\,ds}$$
where $I_0$ is the radiance of the planar light source, $\sigma(r(s))$ is the density of a point on the ray, and $\int_r ds$ denotes the integral along the ray. In practice the 3-D points are sampled uniformly along the ray and the integral is implemented as a discretized sum. By repeating the above steps for all parallel rays at the current angular direction, a novel projection of the sample can be rendered (Fig. 1 b2). After novel views are generated, the synthesized views together with the originally captured views are used for 3-D reconstruction by well-established algorithms such as FBP and SIRT (Fig. 1(c)). The wedge information inpainted by NVP helps reach a much better reconstruction than that obtained from the original limited-angle views alone.
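As a sketch of how this rendering could be organized in code, the snippet below re-projects a 3-D point into a view at a given rotation angle (parallel projection, camera orbiting a static sample) and renders one pixel with a discretized Beer-Lambert sum. The coordinate conventions, helper names, and nearest-neighbour sampling are illustrative assumptions; the density aggregator of Section 2.2 would replace the placeholder callable.

```python
# Minimal sketch of the projection geometry and the discretized Beer-Lambert
# integral; names, axis conventions, and sampling are assumptions, not the
# authors' implementation.
import numpy as np

def reproject_point(p, theta_deg):
    """Map a 3-D point p = (x, y, z) to (u, v) pixel coordinates in the parallel
    projection captured at rotation angle theta_deg (camera orbiting the sample)."""
    t = np.deg2rad(theta_deg)
    u = np.cos(t) * p[0] + np.sin(t) * p[1]   # horizontal axis of the image plane
    v = p[2]                                  # vertical axis (the rotation axis)
    return u, v

def sample_views(p, views, angles_deg):
    """Look up the pixel value of p's re-projection in every input view
    (nearest-neighbour here for brevity; bilinear sampling in practice)."""
    vals = []
    for img, ang in zip(views, angles_deg):
        u, v = reproject_point(p, ang)
        h, w = img.shape
        vals.append(img[int(round(v)) % h, int(round(u)) % w])
    return np.array(vals)

def render_pixel(ray_points, aggregate_density, I0=1.0, ds=1.0):
    """Discretized Beer-Lambert: I = I0 * exp(-sum(sigma * ds)) over samples taken
    uniformly along one ray; `aggregate_density` stands in for the learned MLPs."""
    sigma = np.array([aggregate_density(p) for p in ray_points])
    return I0 * np.exp(-np.sum(sigma) * ds)
```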

2.2 Density aggregation

While a 3-D point can be re-projected onto each input view to find the corresponding intensity and features of that point in each view, the relationship between the density of the 3-D point and the information obtained from each view is not known. In fact, it is difficult to find an analytic solution that maps this information to a density. Instead, we use a learning-based approach that automatically infers an appropriate density from the extracted information. We first filter the input projections with a pre-trained Visual Geometry Group (VGG) network, taking the feature maps of the first 4 layers as high-dimensional features. As shown in Fig. 2, each 3-D point is first re-projected onto each input view; the pixel intensity and high-dimensional features of its re-projection in a given input view are then concatenated into a 1-D 193-channel vector (Fig. 2, “x”). To allow awareness of the multi-view inputs, we calculate the mean (Fig. 2, “µ”) and the variance (Fig. 2, “σ”) of these 1-D vectors across all input views and concatenate the mean and variance to the end of each vector. Two multi-layer perceptrons (MLPs) transform the 1-D vectors into 1-D density features (Fig. 2, “f”) and 1-D weight vectors (Fig. 2, “ω”). Another perceptron takes as input the weighted mean (Fig. 2, “µω”) and variance (Fig. 2, “σω”) of the density features and predicts the density (Fig. 2, “d”) of the 3-D point. Densities of the other points (Fig. 2, “d2” ∼ “dN”) along this ray are obtained by repeating the same density-aggregation process. By defining an appropriate objective function, the networks (i.e., the 3 MLPs) can be optimized as follows.
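A compact sketch of this three-MLP aggregator, written in PyTorch, is given below. The 193-channel per-view feature and the weighted mean/variance pooling follow the description above and Fig. 2; the layer widths, activation functions, and the softmax weighting are assumptions made for illustration, not the authors' released configuration.

```python
# Sketch of the three-MLP density aggregator: per-view features are augmented with
# their mean/variance over views, mapped to density features and weights, pooled by
# a weighted mean/variance, and reduced to one scalar density per 3-D point.
import torch
import torch.nn as nn

class DensityAggregator(nn.Module):
    def __init__(self, feat_dim=193, hidden=64):
        super().__init__()
        self.mlp_feature = nn.Sequential(nn.Linear(feat_dim * 3, hidden), nn.ReLU(),
                                         nn.Linear(hidden, hidden), nn.ReLU())
        self.mlp_weight = nn.Sequential(nn.Linear(feat_dim * 3, hidden), nn.ReLU(),
                                        nn.Linear(hidden, 1))
        self.mlp_density = nn.Sequential(nn.Linear(hidden * 2, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 1), nn.Softplus())

    def forward(self, x):                              # x: (N_views, 193) for one 3-D point
        mu = x.mean(dim=0, keepdim=True).expand_as(x)  # mean over input views
        var = x.var(dim=0, keepdim=True).expand_as(x)  # variance over input views
        x_aug = torch.cat([x, mu, var], dim=-1)        # (N_views, 3*193)
        f = self.mlp_feature(x_aug)                    # per-view density features "f"
        w = torch.softmax(self.mlp_weight(x_aug), dim=0)   # per-view weights "w"
        mu_w = (w * f).sum(dim=0)                      # weighted mean of density features
        var_w = (w * (f - mu_w) ** 2).sum(dim=0)       # weighted variance of density features
        return self.mlp_density(torch.cat([mu_w, var_w], dim=-1))  # scalar density "d"
```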

Fig. 2. Density aggregation and network training. The dashed-line boxes are tensors engaged in the computation, with their shapes listed at the top right. “x” is the concatenation of the pixel intensity and VGG features of the re-projected point, N is the number of input views, and MLP is a multi-layer perceptron with the number of units of each layer listed in the following bracket. The mean and variance operations are performed along the first dimension of the tensors. The output “d” of the last MLP is the density of one 3-D point, and the densities of the other points along the same ray are obtained by the same steps. The pixel intensity at the new view to be generated is then calculated by ray integration. By minimizing the loss function, i.e., the difference between the predicted pixel intensity and the real pixel intensity, the parameters of the MLPs are optimized.

2.3 Networks training

Each MLP consists of several cascaded fully-connected layers. To determine their parameters, we use a projection-error-based optimization. The loss function is defined as the error between the pixel intensities of the rendered view and those of the target view:

$$Loss_{proj} = \left\| I_0\, e^{-\int_r P_{\theta_p}(F(r(s)))\,ds} - I_t \right\|_2^2$$
where $F(r(s))$ denotes the above-mentioned feature extraction for a 3-D point $r(s)$ on ray r; $P_{\theta_p}$ denotes the density-aggregating MLPs parameterized by $\theta_p$; $I_0$ is the intensity of the illumination; the exponential term represents the attenuation by the predicted densities along ray r; and $I_t$ is the pixel intensity of the ground-truth view. The loss function is minimized by iteratively updating $\theta_p$ along the negative gradient of $Loss_{proj}$ with respect to $\theta_p$. It is noteworthy that no 3-D labels are required during the optimization, making this approach far more accessible than supervised methods in which volumetric information is needed for training. In our experiment, 9 sequential projections with an angle interval of 5° are used as the inputs in each iteration to predict a novel projection at the next view. The real projection at this view is used as the target.
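The optimization itself reduces to a standard gradient-descent loop over the projection loss. The sketch below reuses the illustrative DensityAggregator above and stubs out the ray-rendering step, so only the self-supervised optimization pattern is meant literally; the toy data, optimizer choice, and learning rate are assumptions.

```python
# Sketch of the self-supervised optimization over the projection loss; the renderer
# is a stub that only mimics tensor shapes, and the optimizer settings are assumed.
import torch

inputs = torch.rand(9, 480, 256)         # 9 input projections at 5-degree spacing (toy data)
target = torch.rand(480, 256)            # real projection at the next angle
model = DensityAggregator()              # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def render_next_view(model, views):
    """Placeholder for the physics-based renderer: the real pipeline re-projects
    ray samples into `views`, aggregates a density per sample with `model`, and
    applies the discretized Beer-Lambert sum for every pixel of the new view."""
    per_view = views.reshape(views.shape[0], -1).mean(dim=-1, keepdim=True).expand(-1, 193)
    d = model(per_view)                          # one aggregated density (toy stand-in)
    return torch.exp(-d) * torch.ones_like(target)

for step in range(200):
    pred = render_next_view(model, inputs)
    loss = torch.mean((pred - target) ** 2)      # squared projection error Loss_proj
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```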

2.4 Dataset preparation

There are two types of OPT data in our experiment, the simulated projections of micro tubulins and the real images of a vessel-tagged zebrafish. In the tubulin experiment, 180 projections (480 × 256 pixels) equally distributed in an angular range of 180° were generated using the Astra-toolbox [22] from a 3-D tubulin image. The first 60 projections with an angular range of 60° were used in the training and testing of the NVP network.
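For reference, a minimal 2-D example of the kind of forward projection the ASTRA toolbox provides is sketched below. The actual tubulin data were generated from a 3-D volume, so the geometry, projector type, and sizes here are simplified assumptions rather than the exact configuration used.

```python
# Simplified 2-D ASTRA sketch: simulate a sinogram over 180 degrees and keep only
# the first 60 degrees, analogous to the limited-angle data used for NVP training.
# This is an assumption-laden illustration of the toolbox usage, not the exact setup.
import numpy as np
import astra

phantom = np.zeros((256, 480), dtype=np.float32)
phantom[100:160, 200:280] = 1.0                        # toy object

vol_geom = astra.create_vol_geom(256, 480)
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
proj_geom = astra.create_proj_geom('parallel', 1.0, 480, angles)
proj_id = astra.create_projector('linear', proj_geom, vol_geom)

sino_id, sinogram = astra.create_sino(phantom, proj_id)    # (180, 480) sinogram
limited = sinogram[:60]                                    # 60-degree wedge

astra.data2d.delete(sino_id)
astra.projector.delete(proj_id)
```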

For the zebrafish experiment, 360 projections (512 × 256 pixels each) of a zebrafish embryo covering a rotation angle of 360° were captured using a conventional OPT imaging system, among which the first 60 projections covering 60° of rotation were used for training and testing of the NVP network. To obtain the desired projection sequence, we selected 3-dpf wild-type and scotch tape zebrafish embryos and added 4.5 µL of 75 mg/mL NBT and 3.5 µL of 50 mg/mL BCIP per mL for 10 min, labelling all the blood vessels. Each stained sample was then embedded in a 0.4–0.6% agarose gel inside a circular glass tube, which was immersed in a water solution to match the refractive indices. During imaging, a commercial white-light source composed of a 5 × 6 LED array and a slab diffuser (WorldView, Beijing, China) provided uniform illumination over the entire sample. The fixed zebrafish embryo was rotated about its vertical axis with a custom rotation stage. The transmitted light was collected using a 1/2″ CMOS monochrome camera (EO-5012M, Edmund Optics, New Jersey, USA) and a 2X telecentric lens (REV 02, Edmund Optics, New Jersey, USA). The exposure time was 30 ms. See Supplementary Note 2 for more details.

The training dataset consists of a projection sequence with an angular range of 60° for both experiments. During training, 10 projections with an angle interval of 5° between adjacent pairs were randomly sampled from the sequence; the first 9 were used as the input views and the last as the target view for prediction. During testing, the sequence was reversed (i.e., turning the original permutation (0°, 1°, 2°, …, 58°, 59°) into (59°, 58°, …, 2°, 1°, 0°)) and 10 projections were sampled in the same way as in training, with the first 9 as inputs and the last as the target. The prediction of new angles thus proceeds clockwise during training and counter-clockwise during testing. Note that projections at the first several angles (0°, 1°, 2°, 3°, 4°) of the original sequence can appear in the inputs during training but become targets during testing, and vice versa for the last several angles (55°, 56°, 57°, 58°, 59°). Consequently, the network establishes its view-predicting ability using only the limited-angle projections.
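To make the sampling scheme concrete, the following sketch draws one training tuple and one (reversed-sequence) testing tuple; the angle indices and helper name are illustrative.

```python
# Sketch of the train/test view sampling: 10 projections spaced 5 degrees apart are
# drawn from the 60-degree sequence; the first 9 are inputs and the last is the
# prediction target. At test time the sequence is reversed so prediction proceeds
# in the opposite rotational direction.
import random

angles = list(range(60))                  # captured projections at 0..59 degrees

def sample_tuple(sequence, interval=5, n_views=10):
    start = random.randint(0, len(sequence) - (n_views - 1) * interval - 1)
    picked = [sequence[start + k * interval] for k in range(n_views)]
    return picked[:-1], picked[-1]        # 9 input angles, 1 target angle

train_inputs, train_target = sample_tuple(angles)         # clockwise (training)
test_inputs, test_target = sample_tuple(angles[::-1])     # counter-clockwise (testing)
```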

3. Results

3.1 View prediction on simulated OPT data

We first evaluated our method on the simulated OPT images of 3-D micro tubulins. Projections within 60° (Fig. 3(a), original views from -30° to 30°, with the central one located at 0° for convenience) were used to train the neural networks. After optimization, the network took as input 9 projections uniformly sampled from the original 60° sequence and predicted new views. The above-mentioned incremental strategy was used to extend the angular range of the views; that is, the newly synthesized views were engaged to predict further angles. For example, real projections No. 1–9 were used to predict the unseen projection No. 10, then real projections No. 2–9 together with the synthesized projection No. 10 were used to predict unseen projection No. 11, and so on. Meanwhile, the prediction can be performed simultaneously in both the clockwise (Fig. 3(a), predicted views (CW) on the left) and counter-clockwise (Fig. 3(a), predicted views (CCW) on the right) directions, extending the view angles at both ends. The proposed method not only predicts the perspective change at the target angles successfully, but also maintains rich details of the tubular structures with relatively high fidelity, as indicated by the error maps between the predicted views and the real images in Fig. 3(c). The 3-D reconstruction of the tubulins can then be obtained by the SIRT algorithm. As shown in Fig. 3(b), the reconstruction using the original and the NVP-extended views (with 160° of rotation in total) shows a notable improvement over that using only the original limited-angle views (with 60° of rotation in total), with most artefacts eliminated. The topology of the structures is also confirmed by comparison with the reconstruction using 180 real views covering the complete 180°.

Fig. 3. View prediction on synthetic OPT images of micro tubulins. (a) The original limited-angle (60°) views and the bi-directional predicted views up to an angular range of 160°. ROIs in the dashed-line box were used to calculate the error maps between the predicted views and the real views. (b) The SIRT reconstructions using the limited-angle views, using the original and the NVP-extended views, and using real views over 180°, respectively. The reconstruction using the 180° views is regarded as the ground truth. (c) The error map defined as the pixel-wise difference between the ROIs of the predicted views and those of the ground truths. (d) The 3-D reconstruction improves as more predicted views are involved in the SIRT, while the 2-D prediction errors accumulate due to the incremental prediction. Extra predicted views beyond ±80° (160° in total together with the original angles) do not benefit the 3-D reconstruction.

The incremental prediction strategy makes it flexible to generate projections at more angles and effectively eases the task, since the model considers only fixed relative angles in each prediction. However, the prediction error accumulates as more synthesized views are involved, as indicated by the mean-square error (MSE) between the predictions and the target views in Fig. 3(d). We extended the angular range of the captured projection series from the original 60° to 160°, with an extra 50° on each side, beyond which the accumulated prediction errors become too large to benefit the subsequent 3-D SIRT reconstructions. This “optimal” angular range can be found by a brute-force search through all the predicted angles.

3.2 View prediction on experimental OPT data

We also evaluated our method in a real OPT imaging experiment on a zebrafish. The raw projections represent the attenuation of the transmitted light by the tissues of the sample, so darker pixels correspond to the stained biological structures more than brighter ones do. We therefore inverted the intensity of the images so that brighter pixels correspond to the tagged signals. Again, 60° of projections (Fig. 4(a)) were captured and used for training the networks, after which the views were extended to an optimal angular range of 120° in total (Fig. 4(b)).

Fig. 4. Novel projection predictions for OPT imaging of a zebrafish. (a) Inverted 2-D projections of the zebrafish with a total angular range of 60°. The central one is set as 0° for convenience. (b) Predicted views at the next several angles. Predictions are performed along both the clockwise (2nd row) and the counter-clockwise (1st row) directions. The insets are the pixel-wise error maps between the predictions and the corresponding ground truths at each angle. The normalized root MSE between the predictions and the ground truths is used to measure the prediction error quantitatively. Due to the progressive prediction strategy, the MSE accumulates as the predictions go further. (c) The 3-D reconstruction of the zebrafish. Shown are the SIRT reconstruction using the original limited-angle (60°) views, the SIRT reconstruction with TV regularization using the original views, the directional inpainting and reconstruction result using the original views, the CNOPT enhancement, the SIRT reconstruction using the original and the NVP-extended projections (120°), and the SIRT reconstruction using the real views of π radians as the ground truth, respectively. (d) 3 lateral sections of the 3-D reconstructions by each method. The PSNR and SSIM indices between each section of the reconstruction and the ground truth are shown.

The 3-D reconstructions are shown in Fig. 4(c). Besides the SIRT reconstruction using the limited-angle views, we also evaluated the limited-angle SIRT reconstruction with total variation (TV) regularization [6], the directional inpainting and reconstruction method [8], and the CNN-based OPT (CNOPT) [12] method on the limited-angle inputs for comparison. A SIRT reconstruction using 180 real images covering an angular range of 180° was regarded as the ground truth. Because of the more severely missing wedge information, the structures of the fish vessels cannot be localized correctly from the limited input angles. As a result, the reconstructions using only the original views show massive artefacts and distortion (Fig. 4(c), “limited-angle”, “TV”, “Directional inpainting”), which are eliminated by a large margin when the extra predicted views take part in the reconstruction (Fig. 4(c), “NVP”). Although CNOPT may achieve a better reconstruction, reconstructions from complete 180° views are required to train its U-Net on an image restoration task before it can be used for inference. In our experiment, the CNOPT was trained in a supervised way using data from another zebrafish, with the SIRT reconstructions from limited-angle (60°) views as inputs and the SIRT reconstructions from the complete 180° as targets. By contrast, the proposed NVP only requires views of a very limited angle to establish its angle-extending ability, yet reaches a comparable quality. Similarly, the cross-sections of the limited-angle reconstructions have broken contours because of the very limited angular range of the input views, while the angle extension by our method helps recover a more reliable contour of the sample (Fig. 4(d)). For quantitative evaluation of these methods, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were used, defined as follows:

$$PSNR(x, y) = 10\log_{10}\frac{L^2}{\frac{1}{HW}\left\| x - y \right\|_2^2}$$
$$SSIM(x, y) = \frac{(2\mu_x\mu_y + (0.01L)^2)(2\sigma_{xy} + (0.03L)^2)}{(\mu_x^2 + \mu_y^2 + (0.01L)^2)(\sigma_x^2 + \sigma_y^2 + (0.03L)^2)}$$
where x and y are two images with height H and width W, and L is the pixel dynamic range (e.g., 255 for 8-bit images). $\mu$, $\sigma$, and $\sigma_{xy}$ denote the mean pixel intensity of an image, the standard deviation of an image, and the covariance of the two images, respectively. Both the PSNR and the SSIM index between the cross-section of the reconstruction and that of the reference volume indicate a substantial improvement by the proposed NVP. It is noteworthy that the cross-sections recovered by CNOPT exhibit a rotational misalignment due to the inaccuracy of the image-recovery process. By contrast, the proposed NVP avoids this by utilizing the physics-based projection geometry for both density aggregation and view prediction.
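These two metrics map directly onto short NumPy functions; the sketch below mirrors the global (whole-image) form of the formulas above rather than any particular library implementation, which would typically compute SSIM over local windows.

```python
# Direct NumPy transcription of the PSNR and SSIM definitions above; a global
# (single-window) SSIM is computed here for simplicity.
import numpy as np

def psnr(x, y, L=255.0):
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(L ** 2 / mse)

def ssim_global(x, y, L=255.0):
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()              # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()    # sigma_xy
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```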

4. Discussion

4.1 Incremental prediction

Our NVP algorithm takes as input a view sequence with a fixed angular range and interval, and always outputs the projection at the very next angle. The angle information is implicitly encoded by the order of the input sequence. To predict more than one view, NVP generates them one after another, taking the last predicted views into account for the synthesis of future views. As a result, the prediction error inevitably accumulates. Although more views lead to better SIRT reconstructions, there is an optimal number of predicted views beyond which the accumulated prediction error harms the SIRT reconstruction; this number can easily be found by a brute-force search through all the predicted angles.
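The incremental strategy amounts to a simple auto-regressive loop, sketched below; `predict_next_view` is a placeholder standing in for the trained NVP renderer, and the stopping criterion (the brute-force search for the optimal angular range) is omitted.

```python
# Sketch of the auto-regressive view extension: each step feeds the 9 most recent
# views (captured or already synthesized) to the trained predictor and appends the
# result, so every prediction sees the same fixed relative angles.
def extend_views(captured_views, predict_next_view, n_extra):
    views = list(captured_views)       # oldest to newest, at the model's angle interval
    synthesized = []
    for _ in range(n_extra):
        new_view = predict_next_view(views[-9:])   # predict the very next angle
        views.append(new_view)                     # reuse it in later predictions
        synthesized.append(new_view)
    return synthesized
```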

4.2 Relationship with the back projection

Conventional back-projection algorithms try to recover the volumetric information by putting the intensities of the projected pixels back onto their preimages in 3-D space, in either the spatial or the frequency domain. Our density-aggregation algorithm resembles this process by sampling information for a 3-D point from its corresponding projections in the input views. The difference is that we take advantage of a learning-based technique to infer densities not only from the projection intensities, but also from the high-dimensional features extracted by a pre-trained VGG network. It is the flexibility of the learning that reintroduces reasonable 3-D information that was lost in the volume-to-plane projection and the limited-angle capture.

4.3 Ability of generalization

Thanks to the self-supervised strategy, the proposed method can easily be generalized to any other limited-angle situation, simply by dividing the captured projections into training and target views and re-establishing its view-predicting ability for the new case. This is one of its most notable advantages compared to other supervised learning methods. Moreover, we believe the proposed algorithm is applicable to other projection-based tomography, such as X-ray computed tomography and electron tomography, simply by switching to the correct photon trajectory model, which is one of the directions for future research on this work.

5. Conclusion

We proposed a next-view-prediction technique for limited-angle optical projection tomography, which completes the missing wedge information of limited-angle imaging by computationally synthesizing projections at new view angles. Taking a few limited-angle projections captured by a conventional optical tomography system, the proposed NVP aggregates corresponding information from the input views to infer the density of the sample and generates geometry-consistent new projections using a physics-based ray integral. The view-prediction process is implemented by several small neural networks, which are trained through a self-supervised learning task. Instead of using 3-D references or π-radian views, the proposed NVP only requires limited-angle views: it takes part of the views as inputs and the remaining views as references to establish its view-predicting ability, making it much more applicable to limited-angle cases than conventional deep-learning-based approaches that require complete measurements as label datasets. As a result, the unseen views of a limited-angle optical tomography system can be computationally synthesized, and the subsequent 3-D reconstruction can be improved. The proposed method was validated on imaging experiments with simulated tubulins and a zebrafish embryo, indicating its feasibility for limited-angle optical tomography imaging. It offers a potential tool for other tomography modalities, such as X-ray computed tomography and electron tomography, by adopting appropriate optical trajectory models in future work.

Funding

Key Technologies Research and Development Program (2017YFA0700501); National Natural Science Foundation of China (21874052, T2225014).

Acknowledgments

The authors thank Fang Zhao (School of Optical and Electronic Information, Huazhong University of Science and Technology) for help with maintaining the zebrafish dataset.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. D. J. Brenner and E. J. Hall, “Computed tomography — an increasing source of radiation exposure,” N. Engl. J. Med. 357(22), 2277–2284 (2007). [CrossRef]  

2. D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, C. A. Puliafito, and J. G. Fujimoto, “Optical coherence tomography,” Science 254(5035), 1178–1181 (1991). [CrossRef]  

3. L. A. Feldkamp, L. C. Davis, and J. W. Kress, “Practical cone-beam algorithm,” J. Opt. Soc. Am. A 1(6), 612 (1984). [CrossRef]  

4. J. D. Pack, F. Noo, and R. Clackdoyle, “Cone-beam reconstruction using the backprojection of locally filtered projections,” IEEE Trans. Med. Imaging 24(1), 70–85 (2005). [CrossRef]  

5. J. Trampert and J. J. Leveque, “Simultaneous iterative reconstruction technique: physical interpretation based on the generalized least squares solution,” J. Geophys. Res. 95(B8), 12553 (1990). [CrossRef]  

6. Z. Chen, X. Jin, L. Li, and G. Wang, “A limited-angle CT reconstruction method based on anisotropic TV minimization,” Phys. Med. Biol. 58(7), 2119–2141 (2013). [CrossRef]  

7. B. Goris, W. Van den Broek, K. J. Batenburg, H. Heidari Mezerji, and S. Bals, “Electron tomography based on a total variation minimization reconstruction technique,” Ultramicroscopy 113, 120–130 (2012). [CrossRef]  

8. R. Tovey, M. Benning, C. Brune, M. J. Lagerwerf, S. M. Collins, R. K. Leary, P. A. Midgley, and C. B. Schönlieb, “Directional sinogram inpainting for limited angle tomography,” Inverse Probl. 35(2), 024004 (2019). [CrossRef]  

9. M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, M. Rocha-Martins, F. Segovia-Miranda, C. Norden, R. Henriques, M. Zerial, M. Solimena, J. Rink, P. Tomancak, L. Royer, F. Jug, and E. W. Myers, “Content-aware image restoration: pushing the limits of fluorescence microscopy,” Nat. Methods 15(12), 1090–1097 (2018). [CrossRef]  

10. B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2017), 2017-July.

11. Z. Wang, L. Zhu, H. Zhang, G. Li, C. Yi, Y. Li, Y. Yang, Y. Ding, M. Zhen, S. Gao, T. K. Hsiai, and P. Fei, “Real-time volumetric reconstruction of biological dynamics with light-field microscopy and deep learning,” Nat. Methods 18(5), 551–556 (2021). [CrossRef]  

12. S. P. X. Davis, S. Kumar, Y. Alexandrov, A. Bhargava, G. da Silva Xavier, G. A. Rutter, P. Frankel, E. Sahai, S. Flaxman, P. M. W. French, and J. McGinty, “Convolutional neural networks for reconstruction of undersampled optical projection tomography data applied to in vivo imaging of zebrafish,” J. Biophotonics 12(12), e201900128 (2019). [CrossRef]  

13. Y. Wang, T. Yang, and W. Huang, “Limited-angle computed tomography reconstruction using combined FDK-based neural network and U-Net,” in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS (2020), 2020-July.

14. J. Wang, J. Liang, J. Cheng, Y. Guo, and L. Zeng, “Deep learning based image reconstruction algorithm for limited-angle translational computed tomography,” PLoS One 15(1), e0226963 (2020). [CrossRef]  

15. T. A. Bubba, G. Kutyniok, M. Lassas, M. März, W. Samek, S. Siltanen, and V. Srinivasan, “Learning the invisible: A hybrid deep learning-shearlet framework for limited angle computed tomography,” Inverse Probl. 35(6), 064002 (2019). [CrossRef]  

16. S. Barutcu, S. Aslan, A. K. Katsaggelos, and D. Gürsoy, “Limited-angle computed tomography with deep image and physics priors,” Sci. Rep. 11(1), 17740 (2021). [CrossRef]  

17. M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH1996 (1996).

18. S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, “Lumigraph,” in Proceedings of the ACM SIGGRAPH Conference on Computer Graphics (1996).

19. J. Flynn, M. Broxton, P. Debevec, M. Duvall, G. Fyffe, R. Overbeck, N. Snavely, and R. Tucker, “Deepview: view synthesis with learned gradient descent,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2019), 2019-June.

20. P. E. Debevec, C. J. Taylor, and J. Malik, “Modeling and rendering architecture from photographs: A hybrid geometry- And image-based approach,” in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH1996 (1996).

21. Q. Wang, Z. Wang, K. Genova, P. Srinivasan, H. Zhou, J. T. Barron, R. Martin-Brualla, N. Snavely, and T. Funkhouser, “IBRNet: learning multi-view image-based rendering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2021).

22. W. van Aarle, W. J. Palenstijn, J. De Beenhouwer, T. Altantzis, S. Bals, K. J. Batenburg, and J. Sijbers, “The ASTRA Toolbox: A platform for advanced algorithm development in electron tomography,” Ultramicroscopy 157, 35–47 (2015). [CrossRef]  


