## Abstract

X-ray fluorescence tomography is based on the detection of fluorescence x-ray photons produced following x-ray absorption while a specimen is rotated; it provides information on the 3D distribution of selected elements within a sample. One limitation in the quality of sample recovery is the separation of elemental signals due to the finite energy resolution of the detector. Another limitation is the effect of self-absorption, which can lead to inaccurate results with dense samples. To recover a higher quality elemental map, we combine x-ray fluorescence detection with a second data modality: conventional x-ray transmission tomography using absorption. By using these combined signals in a nonlinear optimization-based approach, we demonstrate the benefit of our algorithm on real experimental data and obtain an improved quantitative reconstruction of the spatial distribution of dominant elements in the sample. Compared with single-modality inversion based on x-ray fluorescence alone, this joint inversion approach reduces ill-posedness and should result in improved elemental quantification and better correction of self-absorption.

© 2017 Optical Society of America

## 1. Introduction

The use of characteristic x-ray emission lines to distinguish between different chemical elements in a specimen goes back to the birth of quantum mechanics [1]. X-ray fluorescence can be stimulated by energy transfer from electron or proton beams, but the best combination of sensitivity and minimum radiation damage is provided by using x-ray absorption [2–4] for this purpose. This is usually done in a scanning microscope mode, where a small x-ray beam spot is raster-scanned across the specimen while x-ray photons are collected by an energy dispersive detector that provides a measure of the energy of each emitted photon [5]. Following early demonstrations [6], x-ray fluorescence microscopy is now commonplace in many laboratories and in particular at a wide range of synchrotron radiation light source facilities worldwide. Because the x-ray beam from synchrotron light sources is usually linearly polarized in the horizontal direction, the energy dispersive detector is usually located at a position 90° in the horizontal from the incident beam so as to be centered on the direction of minimum elastic scattering as shown in Fig. 1 (other energy dispersive detector positions can be used [7], with various relative merits [8]). Because the depth of focus of the x-ray beam is usually large compared to the specimen size, one can treat the incident x-ray beam as a pencil beam of constant diameter and thus translate and rotate the specimen to obtain a set of 2D projections from each x-ray fluorescence line for tomographic reconstruction [9,10], even for elements present at low concentration such as trace elements in biological specimens [11].

A common approach is to collect the photon counts within predetermined energy windows, or to use per-pixel spectral fitting [12], so as to get immediate elemental concentration maps. These maps are then used in conventional tomographic reconstruction schemes, such as filtered back projection and iterative reconstruction techniques [13, 14]. A more recent approach has been to use a penalized maximum likelihood estimation method on the per-pixel spectra recorded by the energy dispersive detector for improved quantification and elemental separation [15]; we refer to this as full-spectrum analysis. A separate complication involves the correction of fluorescence self-absorption, where characteristic x-ray fluorescence photons emitted from one voxel in a 3D specimen might suffer absorption in specimen material that lies between this voxel and the energy dispersive detector. There have been interesting approaches to correct for self-absorption as will be discussed below, but these approaches have not been combined with full-spectrum analysis. For these reasons, we consider here a combined approach that incorporates both full-spectrum fluorescence analysis, and transmission imaging using absorption, as part of an optimization-based approach to fluorescence tomography analysis by using a complete forward model of the x-ray imaging process, and provide an initial demonstration of our approach on a set of experimental data obtained from the Advanced Photon Source.

## 2. Fluorescence self-absorption

We begin with a simple illustration of self-absorption. Consider a specimen that consists of a 200 *μ*m diameter borosilicate glass cylinder with a 10 *μ*m diameter tungsten wire off to one side, and a 10 *μ*m gold wire off to another side (Fig. 2). The borosilicate glass was assumed to consist of 81% SiO_{2}, 13% B_{2}O_{3}, 3.5% Na_{2}O, 2% Al_{2}O_{3}, and 0.5% K_{2}O, with a density of 2.23 g/cm^{3}. If a chromatic x-ray focusing optic like a Fresnel zone plate is used to produce the x-ray pencil beam, a monochromatic x-ray beam should be used for scanning and its energy might be set to 12.1 keV to be well-separated in energy from the Au *Lβ*_{1} fluorescence line at 11.4 keV. As the specimen is rotated, one obtains x-ray transmission (XRT) sinograms based on the attenuation of 12.1 keV x rays in Si, W, and Au as shown at right in Fig. 2. The situation with the x-ray fluorescence (XRF) signal is different; when the x-ray beam is at the right edge of the Si cylinder, the Si *Kα*_{1} x-rays with 1.74 keV photon energy will have to traverse nearly 200 *μ*m of Si before they reach the XRF detector located at left (Fig. 1). Since Table 1 shows that the absorption length of 1.74 keV x-rays is 1.66 *μ*m in glass, Si x-ray fluorescence in particular will be strongly self-absorbed. The fraction of the signal reaching the XRF detector is only exp[−200/1.66] ≃ 5 × 10^{−53} so that essentially none of the Si XRF signal is detected in this case. In fact, only the Si XRF signal from the side nearest to the XRF detector is registered, so that from the Si XRF signal one cannot distinguish between a solid Si cylinder versus one that is hollowed out as shown in the bottom row of Fig. 2. The Au and W XRF signals can better traverse through the Si cylinder to reach the XRF detector, and moreover the 12.1 keV incident beam is also only partly absorbed so by combining all of these signals one can indeed distinguish between a solid and hollow Si cylinder.

Correction for the self-absorption effect can be achieved from either algorithmic or experimental acquisition perspectives. One possibility to minimize data acquisition time is to use a half rotation/full translation scanning method [16]. From an algorithmic perspective, several methods have been used to correct for the self-absorption effect, including earlier approaches used for radionuclide emission tomography [17–19]. If one can measure the transmission sinograms of the specimen at the energies of all x-ray fluorescence lines, it is possible to correct for self absorption [20]. However, this approach is exceedingly difficult to realize experimentally, since a large number of x-ray fluorescence lines are present in many specimens and one would need to collect a transmission tomography dataset at each of these energies. In the case of uniform absorption and illumination at a single x-ray energy, analytical approaches have been developed [21, 22] and these have been shown [14] to provide a good starting point for the iterative methods we now describe. One approach is to use algebraic (rather than filtered back projection) reconstruction methods to better handle limited rotational sampling, and least-squares fitting to better handle quantum noise [23]; other approaches have used ordered-subsets expectation maximization [24]. In more recent work, optimization approaches have been introduced where the transmission tomography data at a single x-ray energy was used to estimate the absorption at all x-ray fluorescence energies using the fact that (in the absence of x-ray absorption edges) x-ray absorption scales in a power-law relationship with x-ray energy [25–27]. One can also add the Compton scattered signal as another measurement of overall specimen electron density, and use the tabulated absorption coefficients
${\mu}_{e}^{E}$ of all elements *e* at each fluorescence energy *E* [28]. Other approaches classify the specimen as being composed of a finite number of material phases for the calculation of self-absorption [29]. The optimization approaches in particular serve as inspiration for our approach, which we believe is unique in combining both full-spectrum analysis and transmission imaging along with fluorescence.

We begin by briefly describing the mathematical “forward models” of XRF and XRT. Next, we detail our joint reconstruction approach and the formulation of the objective function and corresponding optimization algorithm. We then discuss choices of scaling parameters in the numerical implementation of the algorithm and present the performance of our joint inversion compared with existing approaches on real datasets.

## 3. Mathematical model

We start from an earlier approach [30], which we extend considerably here to include a different model of XRF self-absorption effect and the ability to better balance differences in variability of acquired data. We use *θ* ∈ Θ and
$\tau \in \mathcal{T}$ to denote, respectively, the index of the x-ray beam angle and discretized beamlet from a collection of |Θ| angles and
$|\mathcal{T}|$ beamlets. The set
$\mathcal{V}$ denotes the collection of
$|\mathcal{V}|$ spatial voxels used to discretize the reconstructed sample. By
$\mathbf{L}=[{L}_{v}^{\theta ,\tau}]$, we denote the intersection lengths (in cm) of beamlet (*θ*, *τ*) with the voxel
$v\in \mathcal{V}$. We use *ℇ* to denote the collection of |*ℇ*| possible elements and
${\mu}_{e}^{E}$ to denote the mass attenuation coefficient (in cm^{2}g^{−1}) of element *e* at beam incident energy *E*. Our goal is to recover
$\mathcal{W}=[{\mathcal{W}}_{v,e}]$, the concentration (in g cm^{−3}) of element *e* in voxel *v*.

#### 3.1. XRT imaging model

A conventional way (see, e.g., [31]) to model the XRT projection
${\tilde{F}}_{\theta ,\tau}^{T}$ (in counts/sec) of a sample from beamlet (*θ*, *τ*) is

*I*

_{0}is the incident x-ray intensity (in counts/sec) and ${\tilde{\mathit{\mu}}}^{E}=[{\tilde{\mu}}_{v}^{E}]$ is the linear attenuation absorption coefficient (in cm

^{−1}) at incident energy

*E*.

To better explore the correlation of XRF and XRT and to link these two data modalities by the common variable $\mathcal{W}$, we note that the coefficients ${\tilde{\mathit{\mu}}}^{E}$ depend on $\mathcal{W}$ by way of ${\tilde{\mu}}_{v}^{E}={\displaystyle \sum _{e}{\mathcal{W}}_{v,e}{\mu}_{e}^{E}}$. Incorporating this fact in Eq. (1), we obtain a new XRT forward model based on $\mathcal{W}$

To obtain a simple proportional relationship, we divide both sides of Eq. (2) by *I*_{0} and then take the logarithm to obtain the XRT forward model used in this work:

We similarly take the logarithm of the raw XRT sinograms used in this paper.

#### 3.2. XRF imaging model

Our discrete model follows an elemental approach, in the sense that we model the XRF energy emitted from a single elemental atom by its corresponding elemental unit spectrum. We first apply spectral blurring to each unit spectrum according to the detector’s response function. Then, justified by the fact that photon counts are additive, the total XRF spectrum detected from the given specimen model comes in after which is estimated as a weighted sum of the unit spectra of the elements being recovered.

First, we model the net XRF intensity *I _{e,ℓ,s}*, which corresponds to the characteristic XRF energy

*E*emitted from element

_{e}*e*, by Sherman’s equation [32] up to first order (i.e., neglecting effects such as Rayleigh and Compton scattering):

*c*is the total concentration of element

_{e}*e*(

*c*= 1 in the case of our unit spectra),

_{e}*ω*is the XRF yield of

_{e,ℓ}*e*for the spectral line

*ℓ*, and

*r*is the probability that a shell

_{e,s}*s*electron (rather than other shell electrons) will be ejected.

In our calculations, the quantity
${\omega}_{e,\ell}\left(1-\frac{1}{{r}_{e,s}}\right){\mu}_{e}^{E}$ is approximated by the XRF cross sections provided from xraylib [33]. Next, we convert the intensity to a spectrum by incorporating the practical experimental environment. Given an energy-dispersive XRF detector with energy channels *x _{i}*,
$i=1,\dots ,|\mathcal{I}|$, we define an indicator function

Then we have the ideal, delta-function peak
${\mathbf{I}}_{e,l,s}^{\mathbf{x}}={I}_{e,l,s}{\mathbf{1}}_{{E}_{e}}^{\mathbf{x}}$. In practice, because of the detector energy resolution [2], discrete x-ray lines get broadened by a Gaussian distribution with a standard deviation *σ*. The resulting unit spectrum of element *e* is thus given by
${\mathbf{M}}_{e}={\displaystyle \sum _{\ell ,s}{\mathbf{M}}_{e,\ell ,s}}$, where

^{−1}) is the (inverse) Fourier transform. To simplify the model, we consider only the

*K*,

_{α}*K*,

_{β}*L*,

_{α}*L*, and

_{β}*M*lines as tabulated [34].

_{α}We then model the total XRF spectrum of a sample with multiple elements by explicitly considering the attenuation of the beam energy and the self-absorption effect of the XRF energy. We represent the attenuation experienced by beamlet (*θ*, *τ*) (at incident beam energy *E*) as it travels toward voxel *v* by

*X*and ${\mathcal{U}}_{v}^{\theta ,\tau}$ is the set of voxels that are intersected by beamlet (

*θ*,

*τ*) before it enters voxel

*v*.

We let
${P}_{v,e}^{\theta ,\tau}(\mathcal{W})$ be the attenuation of XRF energy emitted from element *e* at voxel *v* by beamlet (*θ*, *τ*). To reduce the complexity of the calculation, instead of tracking all the emitted photons isotropically, we consider only the emission from the region Ω* _{v}*. This region is the part of the sample discretization that intersects the pyramid determined by the centroid of the voxel

*v*and the XRF detector endpoints; see Fig. 1 for a 2D illustration. In a slight abuse of notation, we let

*v*″ ∈ Ω

*indicate that the centroid of voxel*

_{v}*v*″ is contained in the region Ω

*. Then the self-absorbed XRF energy is approximated by*

_{v}*a*(Ω) is the volume of Ω (or area of Ω for a 2D sample) and ${\mu}_{{e}^{\prime}}^{{E}_{e}}$ is the linear attenuation coefficient of element

*e*′ at the XRF energy

*E*of element

_{e}*e*. Accordingly, the fluorescence spectrum ${\mathbf{F}}_{\theta ,\tau}^{R}$ (in counts/sec) of the sample resulting from beamlet (

*θ*,

*τ*) is the $|\mathcal{I}|$-dimensional vector

## 4. Optimization-based reconstruction formulations and algorithms

We take ${D}_{\theta ,\tau}^{\tilde{T}}\in \mathbb{R}$ and ${\mathbf{D}}_{\theta ,\tau}^{R}\in {\mathbb{R}}^{|\mathcal{I}|}$ to denote the experimental data for XRT and XRF, respectively. We now solve reconstruction problems involving the models ${F}_{\theta ,\tau}^{T}(\mathcal{W})$ and ${\mathbf{F}}_{\theta ,\tau}^{R}(\mathcal{W})$. Given that both these data sources are derived from measured photon counts, we follow a maximum likelihood approach that assumes the measurements are subject to independent Poisson noise [35, 36]. First, we take a logarithm of ${D}_{\theta ,\tau}^{\tilde{T}}$ and work with ${D}_{\theta ,\tau}^{T}=-\mathrm{ln}({D}_{\theta ,\tau}^{\tilde{T}}/{I}_{0})$. Maximizing the likelihood (derived in App. A) for our joint inverse problem then can be written as

where the non-negativity constraint $\mathcal{W}\ge \mathbf{0}$ is enforced to respect the physical nature of mass;
${\tilde{\varphi}}^{R}$ and
${\tilde{\varphi}}^{T}$ correspond to the XRF and XRT objective terms, respectively; and *β*_{1} ≥ 0, *β*_{2} ≥ 0 are scaling parameters. The scaling parameter *β*_{1} balances the ability of each modality to fit the data, and *β*_{2} detects the relative variability between the data sources
${\mathbf{D}}_{\theta ,\tau}^{R}$ and
${D}_{\theta ,\tau}^{T}$.

Advances in x-ray sources, optics, and detectors mean that the datasets to be analyzed can be large; thus, having a fast and memory-efficient algorithm to solve (8) is highly desirable. Therefore, we apply an alternating direction approach described in Alg. 1. In this approach, instead of directly minimizing Eq. (8), we first solve an “inner iteration” subproblem:

with*i*of Alg. 1.

To approximately solve Eq. (9), we adapt a form of the inexact truncated Newton (TN) method in [37]. We write TN as a function of the form

which applies*k*iterations of TN to the problem in Eq. (9) with initial guess ${\mathcal{W}}^{i}$ to obtain ${\mathcal{W}}^{i+1}$. In particular, we use a bound-constrained preconditioned conjugate gradient obtain the search direction, followed by a backtracking line search (see Alg. 2) to obtain the next iterate ${\mathcal{W}}^{i+1}$. The process is repeated until desired stopping criteria are satisfied; in Alg. 1 we repeat until consecutive iterates are within a user-specified distance

*ϵ*of one another. Since the focus of this work is to show the potential of joint inversion with multimodal data, future work will address convergence analysis of Alg. 1.

## 5. Experimental reconstruction

We now demonstrate the benefit of the proposed joint inversion approach by using experimental data. We constructed a simple test sample consisting of a borosilica glass rod with a composition as described in Sec. 2, wrapped with a W wire of 10 *μ*m diameter, and a Au wire of 10 *μ*m diameter. This test specimen was scanned by an incident beam energy of 12.1 keV at beamline 2-ID-E at the Advanced Photon Source at Argonne National Laboratory. Each projection was acquired by raster scanning horizontally and vertically with a 200 nm scanning step size and 73 scanning angles over an angular range of 360°, with the data stored as HDF5 files with metadata included [38]. At each scanning step, the transmission signal was acquired using a four-segment charge-integrating silicon drift detector [39], while the fluorescence signals were collected by using an energy-dispersive detector (Vortex ME-4) located at 90° relative to the incident beam, covering an energy range of 0–20 keV with 2,000 energy channels. We then sum the signal from the 4 transmission detector elements without distinguishing the fact that each element sees a slightly different angle to obtain the final data; this results in 73 projections, with each slice involving 1, 750 × 51 pixels, and leads to a dataset of dimension 73 × 1, 750 × 51 × 2, 000.

As expected from the attenuation lengths given in Table 1 and the simulations shown in Fig. 2, this dataset shows strong self-absorption in the Si fluorescence measurements. We compare our reconstruction result with the output of TomoPy 0.1.15 [40], a widely used tomographic data-processing and image reconstruction library. TomoPy takes the elemental concentration map decomposed from the raw spectrum by the program MAPS 1.2 [12] for improved photon statistics compared with the raw data. The MAPS program fits the full energy spectrum recorded at each scan to a set of x-ray fluorescence peaks plus background signals, and it returns a 2D dataset corresponding to a certain elemental concentration per scan. Figure 3 shows three elemental XRF sinograms of interest as calculated by this approach. Within TomoPy, we used a maximum likelihood expectation maximization algorithm to reconstruct the three sinograms separately. All numerical experiments are performed on a platform with 1.5 TB DDR3 memory and four Intel E7-4820 Xeon CPUs.

For the purposes of algorithm testing with reduced computational cost, we reconstructed only a 2D middle slice (*x*, *y*) of the 3D glass rod dataset (*x*, *y*, *z*). For each angle, we summed together 9 adjacent *y* slices of both XRF spectra and XRT measurements as the input experiment data. Furthermore, instead of using the full 2000 energy channels, we pick 40 energy channels around each of the 20 emission lines provided by xraylib for each element’s *K _{α}*,

*K*,

_{β}*L*,

_{α}*L*, and

_{β}*M*lines; this process returns a total of 800 energy channels to be considered in the reconstruction. Therefore, in our illustration, |

_{α}*θ*| = 73, |

*τ*| = 195, |

*ℇ*| = 800, and $|\mathcal{V}|=(|\tau |)\times (|\tau |)$.

#### 5.1. Selection of β values

Next, we explain one way to select values for *β*_{1} and *β*_{2} for use in the objective in Eq. (8). We recall that the effect of *β*_{2} is to balance the magnitudes of the XRF and XRT measurement data, and that its exact value is not critical. Therefore, according to the magnitude difference of two data sources shown in Fig. 4, we chose to use *β*_{2} = 100 to balance this difference so that both measurements have maxima near 15 in their respective units. The selection of *β*_{1} is accomplished by applying the so-called “L-curve method” [41]. In Fig. 5, we plot the L-curve defined by the curve of XRT terms
${\tilde{\varphi}}^{T}$ versus XRF terms
${\tilde{\varphi}}^{R}$ obtained from solving Eq. (8) with different *β*_{1} values. This curve displays the tradeoffs between these two modalities and provides an aid in choosing an appropriate balancing parameter *β*_{1}. The curvature, defined as the curvature of a circle drawn through three successive points on the L-curve, is calculated and plotted in Fig. 6. As suggested by [41], we choose the point on the L-curve with the maximum curvature; according to the objective values of
${\tilde{\varphi}}^{R}$ and
${\tilde{\varphi}}^{T}$ shown in Fig. 5 and the curvature shown in Fig. 6, this is *β*_{1} = 1. Figure 10 shows the reconstructed elemental maps corresponding to different *β*_{1} values. For the two extreme cases *β*_{1} = 0.001 and *β*_{1} = 100, the reconstruction results resemble the single-modality reconstructions. However, there is no significant difference between the results returned for *β*_{1} ∈ [0.25, 10] based on our prior knowledge of the sample. This lack of sensitivity to precise balancing parameter values indicates robustness for practical utilization.

#### 5.2. Joint inversion results

Given
${\mathcal{W}}^{0}=\mathbf{0}$ as the initial guess for the joint inversion, Fig. 7 shows the reconstruction result for each element by using Alg. 1 with *ϵ* = 10^{−6}. In particular, Fig. 8 shows the performance of the inner iteration by TN to reduce both the XRF and XRT objective values. Correspondingly, Fig. 9 shows the reconstructed result of each outer iteration of Alg. 1. The reconstructed elemental maps show the benefits of our joint inversion mainly from two perspectives. First, because of the imperfections of spectral fitting and background rejection, the decomposed elemental concentrations show certain artifacts—which we call the “elemental crosstalk ”—where certain elemental sinograms pick up other elements’ signals. For example, according to our knowledge of the sample, we know that Si exists only in the rod part with a cylinder shape; but its corresponding sinogram from MAPS (Fig. 3) would suggest that it also exists in the two wires; this is caused by imperfect data fitting. Those two extra curves in the sinogram are actually picked up from Au and W signals because certain emission lines of Au and W overlap those of Si. As a result, the reconstruction from TomoPy based on these decomposed sinograms will contain the “elemental-crosstalk points,” which are shown as two small dots around Si and a dot in the W map in the left bottom corner of Fig. 7. Comparing the results from XRF single inversion using our forward model with the TomoPy output, we see that our forward model is able to better distinguish the different elemental signals; that is, the “elemental crosstalk” is greatly mitigated. Furthermore, by introducing the XRT modality into the reconstruction, the joint inversion not only suppressed the artifacts from the “elemental crosstalk” introduced by the imperfect fitting and background rejection, but also more accurately recovered Si by filling the inside of the cylinder and thereby correcting the self-absorption effect. Also, we provide a quantity evaluation of our reconstruction. We simulate the XRF spectrum based on the reconstructed elemental composition and compare it to the real experimental data as shown in Fig. 11. We can see that, except the background region that our forward model does not include to simulate, the essential peaks corresponding to the main elements we are interested to cover agree very well with the experimental data. This comparison indicates that not only our joint reconstruction improves the solution from a visualization perspective, but also from quantification point of view. Furthermore, it indicates a satisfying accuracy of our XRF forward model.

One limitation of our method is the high requirement of computing resources for generating the forward mapping tensor incorporating the known mass attenuation coefficients. Overall, the reconstruction time spent on this particular test is 4301 seconds plus a couple of hours on generating the forward mapping. The main bottleneck is in the large memory requirement and the calculation of the self-absorption term. Parallelizing our TN solver and projections/beamlets, or using multilevel schemes [42], would also accelerate the performance. This will be the focus of our future work.

## 6. Conclusion

Guided by the multimodal analysis methodology developed in [30], we apply a joint-inversion framework to solve XRF reconstruction problem more accurately by incorporating a second data modality as XRT. We investigate the correlations between XRF and XRT data, and establish a link between datasets by reformulating their models so that they share a common set of unknown variables. We develop an iterative algorithm by alternatively maximizing a Poisson likelihood objective to estimate the unknown elemental distribution, and then updating the self-absorption term in the forward model. The initial demonstration presented in the paper show that when facing strong self-absorption effects, significant improvements are achieved by performing joint inversion. Furthermore, because of the improved accuracy provided by our XRF forward model, the artifacts arising from the “elemental crosstalk” are greatly mitigated.

The bottleneck of the current code is in its extensive memory requirement to evaluate large-scale tensor product involving raw spectra with many energy channels. In the future, we expect that larger-size problems will be achievable once we move beyond our prototype code. Parallelizing our TN solver and projections/beamlines, or using multilevel schemes [42] would also accelerate the performance. On the other hand, we will explore the combination of our method with different type of data acquisition (e.g., suggested in [16]) to achieve better performance.

## Appendix: Maximum likelihood derivation

We assume that the measurement data are independent and that each measurement *D _{j}* follows a Poisson distribution with mean
${F}_{j}(\mathcal{W})$. The likelihood for any

*D*is then

_{j}By the assumed independence of the measurements, the joint likelihood is $\prod _{j}f({D}_{j};{F}_{j}(\mathcal{W}))$.

The problem of maximizing the log likelihood is thus

Since each *D _{j}* is a scalar (independent of
$\mathcal{W}$), it is therefore equivalent to solve

Our approach requires first-order derivatives, which are easily derived in the Poisson noise setting. For a particular (voxel *v*, element *e*) pair, the first-order derivative of (10) with respect to the concentration
${\mathcal{W}}_{v,e}$ is

The calculation of the remaining derivatives $\frac{\partial}{\partial {\mathcal{W}}_{v,e}}{F}_{j}(\mathcal{W})$ is described in our previous paper [30].

## Funding

Department of Energy (DOE) (DE-AC02-06CH11357); National Institutes of Health (NIH) (R01 GM104530).

## Acknowledgments

The work of YPH and CJ was supported in part by the National Institutes of Health. The authors are grateful to Doga Gursoy and Stefan Vogt for their help with and support of the TomoPy and MAPS codes.

## References and links

**1. **H. Moseley, “Atomic models and x-ray spectra,” Nature **92**, 554 (1914). [CrossRef]

**2. **C. J. Sparks Jr., “X-ray fluorescence microprobe for chemical analysis,” in “*Synchrotron Radiation Research*,” H. Winick and S. Doniach, eds. (Plenum Press, 1980), 459–512. [CrossRef]

**3. **J. Kirz, “Specimen damage considerations in biological microprobe analysis,” Scan Electron Microsc. **2**, 239–249 (1979).

**4. **L. Grodzins, “Intrinsic and effective sensitivities of analysis by x-ray fluorescence induced by protons, electrons, and photons,” Nucl. Instrum. Methods Phys. Res., Sect. A **218**, 203–208 (1983). [CrossRef]

**5. **V. Cosslett and P. Duncumb, “Micro-analysis by a flying-spot x-ray method,” Nature **177**, 1172–1173 (1956). [CrossRef]

**6. **P. Horowitz and J. A. Howell, “A scanning x-ray microscope using synchrotron radiation,” Science **178**, 608–611 (1972). [CrossRef] [PubMed]

**7. **C. G. Ryan, R. Kirkham, R. M. Hough, G. Moorhead, D. P. Siddons, M. D. de Jonge, D. J. Paterson, G. De Geronimo, D. L. Howard, and J. S. Cleverley, “Elemental x-ray imaging using the Maia detector array: The benefits and challenges of large solid-angle,” Nucl. Instrum. Methods Phys. Res., Sect. A **619**, 37–43 (2010). [CrossRef]

**8. **Y. Sun, S.-C. Gleber, C. Jacobsen, J. Kirz, and S. Vogt, “Optimizing detector geometry for trace element mapping by x-ray fluorescence,” Ultramicroscopy **152**, 44–56 (2015). [CrossRef] [PubMed]

**9. **P. Boisseau and L. Grodzins, “Fluorescence tomography using synchrotron radiation at the NSLS,” Hyperfine Interact. **33**, 283–292 (1987). [CrossRef]

**10. **R. Cesareo and S. Mascarenhas, “A new tomographic device based on the detection of fluorescent x-rays,” Nucl. Instrum. Methods Phys. Res., Sect. A **277**, 669–672 (1989). [CrossRef]

**11. **M. D. de Jonge and S. Vogt, “Hard x-ray fluorescence tomography - an emerging tool for structural visualization,” Curr. Opin. Struct. Biol. **20**, 606–614 (2010). [CrossRef] [PubMed]

**12. **S. Vogt, “MAPS: A set of software tools for analysis and visualization of 3D x-ray fluorescence data sets,” J. Phys. IV **104**, 635–638 (2003).

**13. **A. Muñoz-Barrutia, C. Pardo-Martin, T. Pengo, and C. Ortiz-de Solorzano, “Sparse algebraic reconstruction for fluorescence mediated tomography,” Proc. SPIE **7446**, 744604 (2009). [CrossRef]

**14. **E. X. Miqueles and A. R. De Pierro, “Iterative reconstruction in x-ray fluorescence tomography based on Radon inversion,” IEEE Trans. Med. Imaging **30**, 438–450 (2011). [CrossRef]

**15. **D. Gürsoy, T. Biçer, A. Lanzirotti, M. G. Newville, and F. De Carlo, “Hyperspectral image reconstruction for x-ray fluorescence tomography,” Opt. Express **23**, 9014–9023 (2015). [CrossRef] [PubMed]

**16. **P. J. L. Rivière, P. Vargas, M. Newville, and S. R. Sutton, “Reduced-scan schemes for x-ray fluorescence computed tomography,” IEEE Trans. Nucl. Sci. **54**, 1535–1542 (2007). [CrossRef]

**17. **L.-T. Chang, “A method for attenuation correction in radionuclide computed tomography,” IEEE Trans. Nucl. Sci. **25**, 638–643 (1978). [CrossRef]

**18. **J. Nuyts, P. Dupont, S. Stroobants, R. Benninck, L. Mortelmans, and P. Suetens, “Simultaneous maximum a posteriori reconstruction of attenuation and activity distributions from emission sinograms,” IEEE Trans. Med. Imaging **18**, 393–403 (1999). [CrossRef] [PubMed]

**19. **H. Zaidi and B. Hasegawa, “Determination of the attenuation map in emission tomography,” J. Nucl. Medicine **44**, 291–315 (2003).

**20. **J. P. Hogan, R. A. Gonsalves, and A. S. Krieger, “Fluorescent computer-tomography - a model for correction of x-ray absorption,” IEEE Trans. Nucl. Sci. **38**, 1721–1727 (1991). [CrossRef]

**21. **P. J. La Rivière, “Approximate analytic reconstruction in x-ray fluorescence computed tomography,” Phys. Med. Biol. **49**, 2391–2405 (2004). [CrossRef] [PubMed]

**22. **E. X. Miqueles and A. R. De Pierro, “Exact analytic reconstruction in x-ray fluorescence CT and approximated versions,” Phys. Med. Biol. **55**, 1007–1024 (2010). [CrossRef] [PubMed]

**23. **T. Yuasa, M. Akiba, T. Takeda, M. Kazama, A. Hoshino, Y. Watanabe, K. Hyodo, F. A. Dilmanian, T. Akatsuka, and Y. Itai, “Reconstruction method for fluorescent x-ray computed tomography by least-squares method using singular value decomposition,” IEEE Trans. Nucl. Sci. **44**, 54–62 (1997). [CrossRef]

**24. **Q. Yang, B. Deng, W. Lv, F. Shen, R. Chen, Y. Wang, G. Du, F. Yan, T. Xiao, and H. Xu, “Fast and accurate x-ray fluorescence computed tomography imaging with the ordered-subsets expectation maximization algorithm,” J. Synchrotron Radiat. **19**, 210–215 (2012). [CrossRef] [PubMed]

**25. **C. G. Schroer, “Reconstructing x-ray fluorescence microtomograms,” Appl. Phys. Lett. **79**, 1912 (2001). [CrossRef]

**26. **P. J. La Rivière, D. Billmire, P. Vargas, M. Rivers, and S. R. Sutton, “Penalized-likelihood image reconstruction for x-ray fluorescence computed tomography,” Opt. Eng. **45**, 077005 (2006). [CrossRef]

**27. **Q. Yang, B. Deng, G. Du, H. Xie, G. Zhou, T. Xiao, and H. Xu, “X-ray fluorescence computed tomography with absorption correction for biomedical samples,” X-ray Spectrom. **43**, 278–285 (2014). [CrossRef]

**28. **B. Golosio, A. Simionovici, A. Somogyi, L. Lemelle, M. Chukalina, and A. Brunetti, “Internal elemental microanalysis combining x-ray fluorescence, Compton and transmission tomography,” J. Appl. Phys. **94**, 145–156 (2003). [CrossRef]

**29. **B. Vekemans, L. Vincze, F. E. Brenker, and F. Adams, “Processing of three-dimensional microscopic x-ray fluorescence data,” J. Anal. At. Spectrom. **19**, 1302–1308 (2004). [CrossRef]

**30. **Z. Di, S. Leyffer, and S. M. Wild, “Optimization-based approach for joint x-ray fluorescence and transmission tomographic inversion,” SIAM J. Imaging Sci. **9**, 1–23 (2016). [CrossRef]

**31. **A. C. Kak and M. Slaney, *Principles of Computerized Tomographic Imaging* (SIAM, 2001). [CrossRef]

**32. **J. Sherman, “The theoretical derivation of fluorescent x-ray intensities from mixtures,” Spectrochimica Acta **7**, 283–306 (1955). [CrossRef]

**33. **T. Schoonjans, A. Brunetti, B. Golosio, M. S. del Rio, V. A. Solé, C. Ferrero, and L. Vincze, “The xraylib library for x-ray-matter interactions. Recent developments,” Spectrochim. Acta, Part B **66**, 776–784 (2011). [CrossRef]

**34. **W. T. Elam, B. D. Ravel, and J. R. Sieber, “A new atomic database for x-ray spectroscopic calculations,” Radiat. Phys. Chem. **63**, 121–128 (2002). [CrossRef]

**35. **J. A. Browne and T. J. Holmes, “Developments with maximum-likelihood x-ray computed tomography: initial testing with real data,” Appl. Opt. **33**, 3010–3022 (1994). [CrossRef] [PubMed]

**36. **T. J. Holmes and Y.-H. Liu, “Acceleration of maximum-likelihood image restoration for fluorescence microscopy and other noncoherent imagery,” J. Opt. Soc. Am. A **8**, 893–907 (1991). [CrossRef]

**37. **S. G. Nash, “A survey of truncated-Newton methods,” J. Comput. Appl. Math. **124**, 45–59 (2000). [CrossRef]

**38. **F. De Carlo, D. Gürsoy, F. Marone, M. Rivers, D. Y. Parkinson, F. Khan, N. Schwarz, D. J. Vine, S. Vogt, S.-C. Gleber, S. Narayanan, M. Newville, T. Lanzirotti, Y. Sun, Y. P. Hong, and C. Jacobsen, “Scientific Data Exchange: a schema for HDF5-based storage of raw and analyzed data,” J. Synchrotron Radiat. **21**, 1224–1230 (2014). [CrossRef] [PubMed]

**39. **B. Hornberger, M. D. de Jonge, M. Feser, P. Holl, C. Holzner, C. Jacobsen, D. Legnini, D. Paterson, P. Rehak, L. Strüder, and S. Vogt, “Differential phase contrast with a segmented detector in a scanning x-ray microprobe,” J. Synchrotron Radiat. **15**, 355–362 (2008). [CrossRef] [PubMed]

**40. **D. Gürsoy, F. De Carlo, X. Xiao, and C. Jacobsen, “TomoPy: a framework for the analysis of synchrotron tomographic data,” J. Synchrotron Radiat. **21**, 1188–1193 (2014). [CrossRef] [PubMed]

**41. **P. C. Hansen and D. P. O’Leary, “The use of the L-curve in the regularization of discrete ill-posed problems,” SIAM J. Sci. Comput. **14**, 1487–1503 (1993). [CrossRef]

**42. **S. G. Nash, “A multigrid approach to discretized optimization problems,” Optim. Methods Softw. **14**, 99–116 (2000). [CrossRef]