In-plane rotation classification for coherent X-ray imaging of single biomolecules

Kaiqin Chu; James Evans; Nina Rohringer; Stefan Hau-Riege; Alexander Graf; Matthias Frank; Zachary J. Smith; Stephen Lane

doi:10.1364/OE.19.011691

1. Introduction

With the advent of the X-ray free electron laser, it is possible to image a single biomolecule through coherent diffraction imaging [1–4], albeit at lower wavelengths. Many promising approaches have been suggested for accomplishing this goal [5–18]. In a typical coherent X-ray imaging experiment, single biomolecules are injected into a pulsed X-ray beam. The elastically scattered photons are recorded by a detector array placed downstream. Due to the ultrashort time scale of the pulse, the scattering pattern can be recorded before the molecule is destroyed [19]. The recorded scattering data are then sorted according to the orientations of the biomolecules. Signal strength can be improved by averaging shots of the biomolecules having the same orientation, which will benefit subsequent steps of phase retrieval and image reconstruction. Due to the small size of the biomolecule and small elastic scattering cross-section of the electron, we encounter two problems for the coherent X-ray diffraction imaging. One is that the detected signal will be mostly dominated by Poisson noise. This is especially the case when the scattering angle is large, as the scattering is mostly in the forward direction for coherent illumination. The other difficulty is that the biomolecule itself is intrinsically a 3D structure and its orientation is not controlled. Thus, orientation classification is the first crucial step of data processing in coherent X-ray imaging.

Early work on classification was based on large scattering angle data where the diffraction pattern is modeled as a speckle pattern [12, 20]. In practice, this is primarily true for large-angle scattering where the detected photon level is very low with the current or near future X-ray photon budgets. Thus the classification schemes based on large-angle-scattering-data are not currently practical. Recently there are two interesting developments where Expectation-Maximization algorithms is utilized for classification [21, 22]. The signal level for successful classifications is much reduced. Another imaging scheme is to simultaneously inject many randomly oriented biomolecules [23]. The scattered photons are much more and the orientation classification is unnecessary in this setup.

In this paper we focus on scattering by a single particle and propose a classification scheme based on patterns of low scattering angles where a substantial number of photons can be detected. Further the comparison among different rotations is not directly based on the pattern itself but on compressed signals which are the radial and azimuthal distributions of the scattering pattern. As both of these distributions are signals integrated over many pixels, the variance in the radial or azimuthal distribution is much smaller than that in the individual pixels. Consequentially a classification can be relatively immune to the shot noise and the results of the classification can be trustworthy. An example of utilizing those compressed signals is given in this paper for classification of in-plane rotations (i.e., relative orientations between biomolecules are within a plane parallel to the detector plane).

2. Weak elastic scattering process and the issue of shot noise

In a typical setup for coherent diffraction imaging of single biomolecules, the samples are injected into the waist of the focused coherent X-ray beam and a detector is placed downstream to record the scattering pattern. The scattered pattern recorded by the detector array, I(x, y), can be written as

I (x, y) = Φ_{inc} r_{e}^{2} Ω_{pix} {| \sum_{n = 1}^{N_{atom}} f_{n} (x_{n}^{'}, y_{n}^{'}, z_{n}^{'}) exp (j 2 π \frac{x x_{n}^{'} + y y_{n}^{'}}{λ z}) |}^{2},

where r_e is the free electron scattering length, Φ_inc is the incident fluence and Ω_pix is the solid angle of one detector pixel extended with respect to the biomolecule. The coordinates and atomic form factor of the n^th atom in the biomolecule are described by (

x_{n}^{'}

,

y_{n}^{'}

,

z_{n}^{'}

) and f_n, respectively. The pixel coordinates for the detector array are (x,y,z) and the wavelength of the X-ray is lambda. The center region of the detector array has a hole to pass the unscattered photons. In this paper we are interested in the small-angle scattering patterns, the variation of the atomic form factor over the scattering angles can thus be ignored and f_n is assumed to be a real function. The DNA polymerase delta (PDB ID:3IAY, molecular weight: 113822; total number of atoms: ∼ 8.2K) is used as an example of a typical biomolecule to calculate the diffraction pattern and perform orientation classifications.

In this paper we also assume that the noise is mainly shot noise, the detected photon number in any pixel is a Poisson distributed random variable whose mean value can be calculated from Eq. (1). In Fig. 1 we show a typical scattering pattern by the single biomolecule when the incident photon number is infinite (noise free), 2 × 10¹⁴ and 2 × 10¹² per pulse. The beam is assumed to be 100 nm and the X-ray wavelength is 1.5Å. The detector array is 150 mm away from the sample and the pixel size is 0.11 mm.

Fig. 1 Typical simulated scattering patterns of a DNA polymerase delta (PDB ID:3IAY). The data are shown in logarithmic scale. The radius of the hole in the center of the detector array is 3.72 mm. Left: “noise free” pattern; Middle: incident photon number is 2 × 10¹⁴ and the number of collected photons is 818875; Right: incident photon number is 2 × 10¹² and the number of collected photons is 8114.

Download Full Size | PDF

From Fig. 1 we can see that for the “noise free” case, the scattering pattern looks like speckles in regions corresponding to large scattering angles. However, with a finite incident photon number such as 2 × 10¹⁴ photons per pulse, the contrast of the speckles is diminished and some even disappear. For the 2 × 10¹² photons per pulse case, which is the current available level at LCLS, the number of photons scattered with scattering angles larger than 2° is so little that basically all speckles disappear. It will be very difficult to distinguish any exposures from each other, not to mention conducting orientation classification with this incident photon level.

Consider an annular area of the scattering pattern with radius ρ_min ≤ ρ ≤ ρ_max. We divide this area into regions of radial or azimuthal segments as shown in Fig. 2(a) and 2(b). The boundaries between these radial segments are rings with radii ρ ₁, ρ ₂,..., ρ_N, which can be defined as

\begin{array}{l} ρ_{n + 1} & = & ρ_{n} + Δ ρ_{n}, n = 0, 1, 2, \dots N - 1; \\ Δ ρ_{n + 1} = (1 + {(\frac{Δ ρ_{n}}{ρ_{n}})}^{2}) Δ ρ_{n}, \end{array}

where ρ ₀ = 3.72mm, Δρ ₀ = 0.66mm. We see that the width of the outer ring is larger than the inner rings. This will help to compensate for the decreasing signal strength with increasing radius. The radial distribution, I_ρ (n), can be written as,

I_{ρ} (n) = \frac{\int_{ρ_{n}}^{ρ_{n + 1}} \int_{0}^{2 π} I (ρ, θ) d θ ρ d ρ}{\int_{ρ_{min}}^{ρ_{max}} \int_{0}^{2 π} I (ρ, θ) d θ ρ d ρ}, n = 0, 1, 2, \dots N - 1

Note that I_ρ is normalized with respect to the total scattered photons in the effective detection area. In this paper we choose N = 60, which is able to significantly reduce the variance in the radial distribution and still give us sufficient sensitivity to rotations as we will see in the following parts of this paper.

Fig. 2 Detector array is divided into radial (a) or azimuthal segments (b).

Download Full Size | PDF

In the simulation study we choose δθ = 1°. As the scattering pattern is symmetric about the origin, we limit the value of θ to be within [0, 180°]. The azimuthal distribution can be written as:

I_{θ} (n) = \frac{\int_{θ_{n}}^{θ_{n + 1}} \int_{ρ_{min}}^{ρ_{max}} I (ρ, θ) ρ d ρ d θ}{\int_{0}^{π} \int_{ρ_{min}}^{ρ_{max}} I (ρ, θ) ρ d ρ d θ}, n = 0, 1, 2, \dots 179.

In order to study the effect of the noise, the mean value of the radial and azimuthal distributions are first calculated from the “noise free” diffraction patterns. The noisy versions of the radial or azimuthal distribution are calculated from the noisy diffractions when the incident photon number is 2 × 10¹⁴ and 2 × 10¹² respectively. A typical radial and azimuthal distributions and their mean values are shown in Fig. 3(a) and 3(b) respectively. We see that when the incident photon number is 2 × 10¹⁴, the noisy version of the radial and azimuthal distribution is similar to its mean value. However when the incident photon number is 2 × 10¹², the noisy version of the radial and azimuthal distribution deviates substantially from their corresponding mean values, which will affect the reliability of the classification adversely.

Fig. 3 Typical radical (a) and azimuthal (b) distributions. Three cases are studied: the mean value case (dotted line); the 2 × 10¹⁴ photon number case (solid line) and the 2 × 10¹² photon number case (dashed line).

Download Full Size | PDF

In Figs. 4 we show surface plots when the biomolecule rotates about the x, y, and z axes. We see that with rotations about x or y, both the radial and the azimuthal distributions change smoothly with rotation angles. For rotations about z, the radial distribution stays the same while the azimuthal distribution shifts continuously with the rotation angle. Thus rotation about z (also called an in-plane rotation) can be recognized by studying the evolution of the radial distributions, and the rotation angles can be identified by finding the amount of the shift of the azimuthal distribution. For rotations other than along the z-axis, both the radial and azimuthal distributions are going to change smoothly with the rotation angles. Thus a distance between two distributions, e.g. Euclidean distance, can be a measure describing the global changes during continuous rotation. A small distance between distributions means that the two distributions are quite similar to each other and the relative rotation angle between them is small. In a separate simulation we modified a large biomolecule (PDB ID: 3I55, size ∼ 200Å) by removing all atoms greater than 90Å from the centroid and showed that the same process can be easily applied to spherical molecules.

Fig. 4 Surface plots of the radial (left column) and azimuthal (right column) distributions of the scattering patterns when the biomolecule rotates about the x-axis (a,b), y-axis (c,d), and z-axis (e–f). The incident photon number is 2 × 10¹⁴.

Download Full Size | PDF

3. Euler angles and the possibility of in-plane rotation

In the previous sections we have devised a method to preprocess the scattering data to improve the data consistency for the classification of orientations. In this section, we will use introduce Euler angles, which have been widely used in the cryo-EM field, to represent arbitrary orientations [24–26].

An arbitrary rotation of an angle ϕ about an arbitrary axis can be described by three consecutive rotations, i.e.,

R (ϕ) = R_{z} (γ) R_{y} (β) R_{z} (α); - π \leq α, γ \leq π, 0 \leq β \leq π,

where R is the rotation matrix that depends on the rotation axis and rotation angle ϕ. The first rotation is about the intrinsic z-axis, the second rotation is about the intrinsic y-axis and the last one is again a rotation about the intrinsic z-axis.

As the low-angle scattering pattern is proportional to the square of the amplitude of the Fourier transform of the scattering potential, which is a real and even function, we can limit the ranges of α, β and γ to [0, π].

The last γ-rotation is the so-called in-plane rotation because the z-axis of the intrinsic coordinate and the z-axis of the laboratory coordinates are co-axial. If we can classify the in-plane rotation out of all possible rotations and know the angle of the in-plane rotation of a selected pattern with respect to a reference pattern, we can rotate that pattern to overlap with the reference pattern. This will give us an opportunity to increase the SNR by averaging over all the in-plane rotations. The extent of the SNR improvement depends on the number of in-plane rotations.

Consider the random orientation of a single biomolecule when injected into the passage of the X-ray beam, the probability of the orientation which can be described by the Euler angles (α, β, γ) and can be written as [27]

p (α, β, γ) = \frac{sin β}{8 π^{2}} .

Due to the symmetry of the scattering pattern, the effective ranges for Euler angles are within [0, π], thus we can write the effective probability density function in terms of α, cosβ, γ as

p_{eff} (α, cos β, γ) = \frac{1}{2 π^{2}}, 0 \leq α, β, γ \leq π .

Thus the probability density function is uniform within a rectangular volume of 2π ² with Euler rotations defined by α, γ and cosβ. With a total number N_total of random rotations, the number density for α and γ is $N_{total}^{1 / 3} / pi$ and $N_{total}^{1 / 3} / 2$ for cos β. Thus the mean number of in-plane rotations is expected to be:

N_{inplane} = \frac{N_{total}^{1 / 3}}{n} .

where n is the number of classes for each Euler angles. Thus the SNR of averaging over all the in-plane rotations can be further improved by

\frac{N_{total}^{1 / 6}}{\sqrt{n}}

times compared to averaging over only same orientations.

4. In-plane rotation recognition through the radial distribution

We can also see from the surface plots shown in Fig. 4(e) that the radial distribution stays unchanged with continuous rotation about z. Thus, we may identify the in-plane rotation by studying the radial distribution.

Consider a distance defined as

D_{ρ} (I_{ref}, I_{test}) = {| \sum_{n = 1}^{N} {(I_{ρ}^{ref} (n) - I_{ρ}^{test} (n))}^{2} |}^{1 / 2},

where I_test and I_ref are the scattering pattern before and after the in-plane rotation, respectively,

I_{ρ}^{ref}

and

I_{ρ}^{test}

are the radial distributions calculated according to Eq. (3). This “radial” distance should be zero if there is no shot noise, but with shot noise unavoidable, this distance should still be minimum if it is to be an effective classification measure. Thus we start from the biomolecule in an initial orientation, compute the scattering pattern according to Eq. (1), include the Poisson distributed noise and compute the corresponding radial distribution. This radial distribution is taken as the reference distribution

I_{ρ}^{ref}

. Then we rotate the biomolecule along y- and z-axis by angles from 0 to 180°. The corresponding noisy scattering pattern I_test and radial distribution

I_{ρ}^{test}

are computed in the same way as I_ref and

I_{ρ}^{ref}

. Then the distance between

I_{ρ}^{test}

and the reference distribution

I_{ρ}^{ref}

is computed according to Eq. (9). Clearly this distance is a function of the rotation of the test pattern with respect to the reference pattern. We repeat the distance calculation 100 times independently and calculate the mean and variance of the distance from those results (Fig. 5). In order to understand the effect of the incident photon number on the distance calculation, we have used two incident photon number levels: 2 × 10¹⁴ (Fig. 5(a)) and 2 × 10¹² (Fig. 5(b)). We see that with both cases, the mean value of the distance maintains the lowest level for the in-plane rotations and increases with the out-of-plane rotation (rotation about y-axis) in general. Zoomed-in versions of the distance curves around the reference position are plotted in Fig. 5(c) and 5(d). We can see that for the case of 2 × 10¹⁴ incident photon number, the variance in the distance is so small that we can differentiate even a half degree of out-of-plane rotation from in-plane rotations with high certainty. This means that the sensitivity of the above classification is about 0.5°. With less incident photon number, the scattering patterns are more noisy and have a larger variance (Fig. 5(d)). The sensitivity of the classification is about ±2.5° in this case.

Fig. 5 The radial distance as a function of the rotation angle when the protein rotates about y, and z respectively. The curve for rotation along z is shown in pink; in blue for rotation along y. Two incident photon levels are studied here: (a) 10¹⁴; (b) 10¹²; The mean values of the distances are shown in the solid lines; The color-shaded areas represent ±σ around the mean values for both the rotation about y and z respectively. σ is the variance. (c)–(d) are zoomed-in versions of (a) and (b) respectively.

Download Full Size | PDF

5. Finding the in-plane rotation angles through the azimuthal distribution

After having identified scattering patterns corresponding to in-plane rotations, we can find the angle of the in-plane rotation by studying the azimuthal distribution I_θ. This angle information is needed if we hope to average over in-plane rotations. In this section we describe a simple method to find this angle.

If $I_{θ}^{ref}$ is the azimuthal distribution of the reference pattern and $I_{θ}^{test}$ is the azimuthal distribution of the test pattern with only an in-plane rotation γ with respect to the reference pattern, then these two distributions should be shifted with respect to each other. This can be clearly seen in the surface plot shown in Fig. 4(f). The amount of the shift should be the angle of the in-plane rotation of the biomolecule with respect to its original position.

Consider a new function, $I_{θ}^{'}$ , which is a circularly-shifted version of $I_{θ}^{test}$ , i.e.,

I_{θ}^{'} (θ, θ_{shift}) = I_{θ}^{test} (θ - θ_{shift}),

where θ_shift is the shift angle. When the shift angle equals the in-plane rotation angle γ, the shifted azimuthal distribution

I_{θ}^{'}

shall overlap with the reference azimuthal distribution

I_{θ}^{ref}

. Thus we can define a distance between

I_{θ}^{'}

and

I_{θ}^{ref}

as:

D_{θ} (I_{ref}, I_{test}, θ_{shift}) = {| \int_{0}^{π} {(I_{θ}^{ref} - I_{θ}^{test} (θ - θ_{shift}))}^{2} d θ |}^{1 / 2},

where I_ref and I_test are the scattering patterns before and after the in-plane rotation, respectively. This distance is a function of the shift angle θ_shift. When the shift angle is exactly the in-plane rotation angle, this distance should be zero if there is no shot noise. With shot noise being unavoidable, this distance should reach the minimum when θ_shift = γ. For example, when the in-plane rotation is 90 degrees, we compute the distance according to Eq. (11) as a function of the shift angle θ_shift. In order to study the effect of the incident photon number on the distance, we have again studied two cases, 2 × 10¹⁴ and 2 × 10¹² (Fig. 6). We can see that the distance reaches the minimum when the shift angle equals the in-plane rotation angle for both incident photon number levels. Note the steepness of the curves around the in-plane rotation angle, indicating that the distance is quite sensitive to the shift angle. In this example, it is about 0.5°, which is much smaller than the sensitivity for identifying the in-plane rotations from all possible rotations. Thus in the next few sections, we will use the sensitivity of the radial distance as the limit of the overall sensitivity of in-plane rotation classification.

Fig. 6 Azimuthal distance reaches minimum at the in-plane rotation angle. Two incident photon number levels are studied: 2 × 10¹⁴(solid line) and 2 × 10¹²(dashed line).

Download Full Size | PDF

With in-plane rotations classified (section 5) and the angle of in-plane rotation determined as described in this section, we can rotate those in-plane rotated patterns to overlap the reference patterns, and average those patterns to improve the signal-to-noise ratio. In Fig. 7 we show a typical single shot of the scattering pattern and the averaged pattern from 19 shots with different in-plane rotations. Comparing these two images we can clearly see that the signal-to-noise ratio is improved after the averaging, especially in regions away from the center.

Fig. 7 (a) Scattering pattern from a single measurement; (b) Scattering pattern after averaging over 19 patterns with different in-plane rotations; (c) “Noise free” pattern.

Download Full Size | PDF

6. Accuracy of the classification vs photon budget

As we saw in the previous sections, the simple distances can be used to compare the radial or azimuthal distributions to distinguish between in-plane rotations and out-of-plane rotations, and to find the degree of in-plane rotation. In this section we are going to study the performance of the classification scheme for different incident photon numbers. To be specific, we want to know the ratio of the correct classification results from the results of the proposed classification method.

In order to visualize the success rate of the proposed algorithm, we have generated a number of random orientations and their corresponding diffraction patterns, then used the method described in Section 5 to find the in-plane rotations. The Euler angles for those orientations are limited to {0°, 180°}. Twenty random values for each α, cosβ, γ are generated independently. Thus a total number of orientations is 20 × 20 × 20 = 8000, ignoring the inconsequential fact that the ranges for those three random variables (α, cosβ, γ) are not all equal to each other. The diffraction patterns for the protein with those orientations are generated using Eq. (1). Then we select an arbitrary pattern as a reference frame and assign the rest of the patterns to be the test patterns. The corresponding radial distribution for the reference frame and the test patterns are computed according to Eq. (3). Then the distance is calculated using Eq. (9) to compare the reference and test frames.

Consider a total number N_total of patterns from randomly oriented biomolecules, the total number of pairwise distances, N_d, will be N_total × (N_total – 1) if each pattern is compared to other patterns in the data set. Note that the comparison between any two patterns are performed twice since each one has the opportunity to be a reference pattern. We also assume that the number of random values for each Euler angle is $N_{total}^{1 / 3}$ . Thus the percentage for in-plane rotations among all possible rotations shall be $N_{total}^{1 / 3} / N_{total}$ . The total number of distances that describe in-plane rotations, $N_{in}^{d}$ , is expected to be

N_{in}^{d} = N_{total} \times (N_{total} - 1) \times \frac{N_{total}^{1 / 3}}{N_{total}} .

As the distances describing the in-plane rotations should be zero if there is no noise in the experiment, but with noise, we choose

N_{in}^{d}

shortest distances as arising from comparison of two patterns who differ by at most an in-plane rotation. Of course, the noise inherent in the frames will tend to mix the in-plane rotations and out-of-plane rotations, especially those patterns with small out-of-plane rotations.

As we have generated the patterns and know the orientations of the test frames, we can verify the results of the classification with the known information. Thus we can define the accuracy of the classification as:

η = \frac{N_{true}}{N_{classified}},

where N_true and N_classified are the number of true and assigned in-plane rotations, respectively. This accuracy rate can provide a quantitative measure of the classification performance. For a certain classification scheme, it depends mainly on two factors, noise level and the size of the orientation classes. For patterns that were found to be in-plane rotations, their values for α and β should be identical in the ideal case, but with noise, we say the classification is accurate if α and β differ by an amount smaller than the size of the orientation class. In this section we will test the class size from 1° to 5°. In Table 1 we list the accuracy of these simulation results for different incident photon numbers and different orientation class sizes.

Table 1. Accuracy of the In-Plane Rotation Classification^a

View Table

We can see that even with the current incident level, the success of the classification can be as high as 79% with a class size of 1°, which is required for atomic resolution. If we can relax the requirement on the sensitivity of the classification, we can achieve 92% accuracy for the class size of 3° and 97% accuracy for 5°. In this way we can break the data set into smaller groups and use more sophisticated algorithms such as described in [21] or [22] to find additional information. Research in this direction is underway. With higher photon levels, almost 99% accuracy can be achieved for class size of 1°. The accuracy from a mere guess is expected to be N_inplane/N_total, which is only 0.25% in this case.

Note that with the limited number of generated random orientations due to available computing speed, the orientations which are close to each other in the R ³ space may not have been fully simulated in the above example. Thus we perform another round of simulations where the Euler angles are limited to α ∈ (0, 10°), cosβ ∈ (0.9, 1). As the radial distributions are invariant to in-plane rotations, we choose the limit on the in-plane rotation angle still to be (0, 180°). For each Euler angle we generate 10 random numbers resulting in 1000 patterns. With a much smaller range of Euler angles, the percentage for two orientations whose Euler angles (α_i, β_i, i = 1, 2) are within 1° becomes much higher. In our simulation, the percentage is about 48%, i.e., almost half of the orientations have a similar orientation. The accuracy in this small data set can be lower compared to the entire data set. In Table 1 we list the accuracy (numbers inside the parenthesis) for different photon numbers and orientation class sizes.

We see that even in this smaller data set where almost half of the orientations have a nearby orientation, the success of the classification is still almost 50%. Note that with a smaller data set, the expected success rate by a mere guess is bigger. In the case we have described above, it is about 10/1000 = 1% and so the proposed classification scheme still gives a much better result.

7. Discussion and summary

In this paper we have shown a method to compress the 2-dimensional diffraction patterns into 1-dimensional distributions, namely, radial and azimuthal distributions. With the compressed signals, the variance is greatly reduced. Those signals can be utilized as a first step in processing the data from coherent X-ray experiments. As an example, we have used a simple distance measure computed from the radial and azimuthal distributions to compare the test frame and the reference frames. As the radial distributions are invariant to in-plane rotations, we can use the radial distribution to find the in-plane rotations. The data is divided into classes of (α, β). With the proposed classification scheme we can achieve 79% accuracy even with current incident photon number. With a protein that is bigger than used in this paper or possesses additional symmetries, a good classification may be achieved with an even lower incident photon level. We have performed two simulations to test the performance of the proposed classification scheme. The performance for a real situation should be between these two cases.

With the degree of in-plane rotation identified, we can rotate one pattern to overlap with the corresponding reference pattern. This provides us an opportunity to be able to average over more patterns improving the signal-to-noise level further compared to averaging over only patterns with the same orientations. The method is non-iterative in nature, which makes it a good candidate for data preprocessing.

Another interesting aspect of the suggested compression method is that the radial distribution is invariant to in-plane rotations. This property could be used in conjunction with the manifold method described in [22]. A hyper-space constructed from the radial distribution shall exhibit a 2-manifold depending only on α and β. With fewer parameters to estimate, searching the manifold will be easier, which could potentially lower the required photon level even more. Further as the variance in the radial distributions is much smaller than those in the individual pixels, the searching of the 2-manifold will be more robust to noise, thus offering us a possibility to work with even fewer photons per pixel.

As far as atomic resolution is concerned, it is associated with large-angle scattering data, which lies on the Ewald sphere. As the signal-to-noise ratio is extremely low in these regions, averaging diffraction patterns within the same orientation class is necessary to obtain meaningful large-angle-scattering data. If the size of the orientation class is large, information will be lost during averaging. For a biomolecule of size a, the maximum displacement of the same atom within one orientation class of size δθ can be estimated as

| δ r | = \frac{a}{2} δ θ .

In order to maintain atomic resolution, the displacement of atoms within one class should be smaller than an angstrom. Thus for a biomolecule of size about 10 nm, such as the one considered in this paper, the size of the orientation class should be around 1.2°. From Table 1 we see that the classification scheme proposed in this paper can divide the data into orientation classes with size of 1°, thus maintaining the atomic resolution desired for coherent X-ray imaging.

In this paper, the biomolecules are assumed to be rigid during the injection and measuring time. Other than their orientations and relative positions in the X-ray beam, the biomolecules are also assumed to be identical, i.e. no shot-to-shot variation of the structure or conformation is considered. Different conformations of the molecules will result in different diffraction patterns. The small-angle scattering data and the resulting radial and azimuthal distributions will encode information on different confirmations [28]. The classification scheme proposed in this paper will hence distinguish different classes to different conformations. Whether a certain class belongs to an out-of-plane orientation of the reference molecule in the same conformation or an orientation class of the molecule in a different conformation can be addressed in the next step of the classification. One possible method for such purpose is the manifold method [22]. A general heterogeneous sample may not be solely classified by means of the low-angle scattering data, large-angle-scattering may have to be additionally included in the classification algorithm. This is a research direction we are currently pursuing.

A common technique for solving the orientation problem in cryo-EM is the common line approach [24–26]. Because of the extremely low signal-to-noise level, this method is not applicable in current x-ray diffraction experiments. Current classification work on X-ray diffraction data [21, 22] is primarily focused on the direct 2D diffraction patterns. We propose to compress the diffraction data by radial and azimuthal integrals, so that the orientation information can still be extracted, but the variance is greatly reduced. As we have shown in Section 2, we can use the simple difference measure as described in Section 4 and 5 for classification of in-plane rotations. We can also use more elaborate methods for a complete orientation classification, such as [21, 22], with the advantage of improved SNR.

In summary, the radial and azimuthal distributions of scattering patterns from single biomolecules has shown a great success in classifying in-plane rotations by using a simple distance measure. With more sophisticated algorithms such as the manifold method, we may be able to perform a complete orientation classification and produce the expected diffraction patterns with enough sensitivity to obtain atomic resolution.

Acknowledgment

This work is funded by UCOP LAB FEE. This work is also performed under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory under the contract DE-AC52-07NA27344.

References and links

1. D. Sayre, H. N. Chapman, and J. Miao, “On the extendibility of x-ray crystallography to noncrystals,” Acta Crystallogr., Sect. A: Found. Crystallogr. 54, 323–239 (1998). [CrossRef]

2. J. Miao, Charalambous, J. Kirz, and D. Sayre, “Extending the methodology of x-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens,” Nature 400, 342–344 (1999). [CrossRef]

3. R. Neutze, R. Wouts, D. van der Spoel, E. Weckert, and J. Hajdu, “Potential for biomolecular imaging with femtosecond x-ray pulses,” Nature 406, 753–757 (2000). [CrossRef]

4. S. Marchesini, H. Chapman, S. P. Hau-Riege, R. A. London, A. Szoke, H. He, M. R. Howells, H. Padmore, R. Rosen, J. C. H. Spence, and U. Weierstall, “Coherent x-ray diffractive imaging: applications and limitations,” Opt. Express 11, 2344–2353 (2003). [CrossRef] [PubMed]

5. J. Miao, D. Sayre, and H. N. Chapman, “Phase retrieval from the magnitude of the Fourier transforms of nonperiodic objects,” J. Opt. Soc. Am. A 15, 1662–1669 (1998). [CrossRef]

6. I. Robinson, I. Vartanyants, G. Williams, M.A. Pfeifer, and J. Pitney, “Reconstruction of the shapes of gold nanocrystals using coherent x-ray diffraction,” Phy. Rev. Lett. 87(19), 195505 (2001). [CrossRef]

7. J. Miao, T. Ishkawa, B. Johnson, E. H. Anderson, B. Lai, and K. O. Ohodgson, “High resoluton 3D x-ray diffraction microscopy,” Phys. Rev. Lett. 89, 088302 (2002). [CrossRef]

8. J. C. H. Spence, U. Weierstall, and M. Howells, “Phase recovery and lensless imaging by iterative methods in optical x-ray and electron diffraction,” PPhilos. Trans. R. Soc. London, Ser. A 360, 875–895 (2002). [CrossRef]

9. Y. Nishino, J. Miao, and T. Ishikawa, “Image reconstruction of nanostructured nonperiodic objects only from oversampled hard x-ray diffraction intensities,” Phys. Rev. B 68, 220101 (2003). [CrossRef]

10. S. Marchesini, H. He, H. N. Chapman, S. Hau-Riege, A. Noy, M. R. Howells, U. Weierstall, and J. C. H. Spence, “X-ray image reconstruction from a diffraction pattern alone,” Phy. Rev. B 68, 140101 (2003). [CrossRef]

11. G. Williams, M. A. Pfeifer, I. Vartanyants, and I. Robinson, “Three-dimensional imaging of microstructure in au nanocrystals,” Phys. Rev. Lett. 90, 195505 (2003).

12. G. Huldt, A. Szoke, and J. Hajdu, “Diffraction imaging of single particles and biomolecules,” J. Struct. Biol. 144, 219–227 (2003). [CrossRef] [PubMed]

13. S. Hau-Riege, H. Szoke, H. N. Chapman, A. Szoke, S. Marchesini, A. Noy, H. He, M. Howells, U. Weierstall, and J. C. H. Spence, “Speden: reconstructing single particles from their diffraction patterns,” Acta Crystallogr. Sect. A: Found. Crystallogr. 60, 294–305 (2004). [CrossRef]

14. D. Shapir, P. Thibault, T. Beetz, V. Elser, M. Howells, C. Jacobsen, J. Kirz, E. Lima, H. Miao, A. M. Neiman, and D. Sayre, “Biological imaging by soft x-ray diffraction microscopy,” Proc. Natl. Acad. Sci. U.S.A. 102(43), 15343–15346 (2005). [CrossRef]

15. H. N. Chapman, A. Barty, S. Marchesini, A. Noy, S. P. Hau-Riege, C. Cui, M. R. Howells, R. Rosen, H. He, J. C. H. Spence, U. Weierstall, T. Beetz, C. Jacobsen, and D. Shapiro, “High-resolution ab initio three-dimensional x-ray diffraction microscopy,” J. Opt. Soc. Am. A 23, 1179–1200 (2006). [CrossRef]

16. H. N. Chapman, S. Bajt, A. Barty, W. H. Benner, M. J. Bogan, M. Frank, S. P. Hau-Riege, R. A. London, S. Marchesini, E. Spiller, A. Szke, and B. W. Woods, “Ultrafast coherent diffraction imaging with x-ray free-electron lasers,” Proc. FEL , 805–811 (2006).

17. J. M. Rodenburg, A. C. Hurst, A. G. Cullis, B. R. Dobson, F. Pfeiffer, O. Bunk, C. David, K. Jefimovs, and I. Johnson, “Hard x-ray lensless imaging of extended objects,” Phy. Rev. Lett. 98, 034801 (2007). [CrossRef]

18. C. Song, H. Jiang, A. Mancuso, B. Amirbekian, L. Peng, R. Sun, S. S. Shah, Z. H. Zhou, T. Ishikawa, and J. Miao, “Quantitative imaging of single, unstained viruses with coherent x rays,” Phys. Rev. Lett. 101, 158101 (2008). [CrossRef] [PubMed]

19. M. M. Seibert, T. Ekeberg, F. R. N. C. Maia, M. Svenda, J. Andreasson, O. Jonsson, D. Odic, B. Iwan, A. Rocker, D. Westphal, M. Hantke, D. P. DePonte, A. Barty, J. Schulz, L. Gumprecht, N. Coppola, A. Aquila, M. Liang, T. A. White, A. Martin, C. Caleman, S. Stern, C. Abergel, V. Seltzer, and J.-M. Claverie, “Single mimivirus particles intercepted and imaged with an x-ray laser,” Nature 470, 78–81 (2011). [CrossRef] [PubMed]

20. G. Bortel and G. Faigel, “Classification of continuous diffraction patterns: a numerical study,” J. Struct. Biol. 158, 10–18 (2007). [CrossRef]

21. N.-T. D. Loh and V. Elser, “Reconstruction algorithm for single-particle diffraction imaging experiments,” Phy. Rev. E 80, 026705 (2009). [CrossRef]

22. R. Fung, V. Shneerson, D. K. Saldin, and A. Ourmazd, “Structure from fleeting illumination of faint spinning objects in flight,” Nat. Phys. 5, 64–67 (2009). [CrossRef]

23. D. K. Saldin, V. L. Shneerson, R. Fung, and A. Ourmazd, “Structure of isolated biomolecules obtained from ultrashort x-ray pulses: exploiting the symmetry of random orientations,” J. Phys.: Condens. Matter 21, 134014 (2009). [CrossRef]

24. M. van Heel, “Angular reconstruction: a posteriori assignment of projection directions for 3d reconstruction,” Ultramicroscopy 21, 111–124 (1987). [CrossRef] [PubMed]

25. N. A. Farrow and F. P. Ottensmeyer, “A posteriori determination of relative projection directions of arbitrarily oriented macromolecules,” J. Opt. Soc. Am. A 9, 1749–1760 (1992). [CrossRef]

26. M. van Heel, “Single-particle electron cryo-microscopy: towards atomic resolution,” Q. Rev. Biophys. 33, 307–369 (2000). [CrossRef]

27. R. Miles, “On random rotations in r³,” Biometrika 52, 636–639 (1965).

28. J. Lipfert, D. Herschlag, and S. Doniach, “Riboswitch conformations revealed by small-angle x-ray scattering,” Methods Mol. Biol. 540, 141–159 (2009). [CrossRef] [PubMed]

	Incident photon number
Size of the orientation class	2 × 10¹⁴	2 × 10¹³	2 × 10¹²
1°	98.9% (84.9%)	97.6% (78.0%)	79.3% (49.3%)
2°	99.5% (93.6%)	98.9% (90.3%)	86.2% (73.3%)
3°	99.5% (95.9%)	99.4% (94.5%)	92.1% (82.2%)
4°	99.5% (96.9%)	99.4% (95.9%)	95.0% (85.9%)
5°	99.5% (97.2%)	99.4% (96.8%)	97.3% (90.5%)

In-plane rotation classification for coherent X-ray imaging of single biomolecules

Abstract

1. Introduction

2. Weak elastic scattering process and the issue of shot noise

3. Euler angles and the possibility of in-plane rotation

4. In-plane rotation recognition through the radial distribution

5. Finding the in-plane rotation angles through the azimuthal distribution

6. Accuracy of the classification vs photon budget

7. Discussion and summary

Acknowledgment

References and links

Cited By

Figures (7)

Tables (1)

Equations (14)

Optics Express